-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=50315&oldid=50314
-
GhostyTongue
so do i send the url here or #archiveteam ?
-
pabs
here
-
GhostyTongue
ok, what are the channel commands?
-
pabs
just post what you want archived and the reason for archiving here and an admin will take care of it
-
GhostyTongue
sweet ok
-
GhostyTongue
-
GhostyTongue
-
GhostyTongue
They both are under a google storage bucket
-
GhostyTongue
and there was another but that one closed a few months ago
-
GhostyTongue
and im pretty worried for those
-
pokechu22
go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io just redirects to disney.com if I ignore a certificate error
-
pokechu22
is there more to it?
-
GhostyTongue
pokechu22 use gsutil
-
pokechu22
Ah, so it's possible to generate a list of valid URLs for each bucket? I've only seen similar stuff with Amazon AWS where you get an XML file when loading it, but I guess this is different
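A minimal sketch of the AWS-style case mentioned here, assuming the bucket root returns an S3-style `ListBucketResult` XML document. The sample XML, bucket name, and base URL are made up for illustration, and the helper names `list_keys` and `to_urls` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Made-up sample listing; the bucket name and keys are placeholders.
SAMPLE_XML = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>images/a.jpg</Key><Size>123</Size></Contents>
  <Contents><Key>images/b c.png</Key><Size>456</Size></Contents>
</ListBucketResult>"""


def list_keys(xml_text):
    """Return every <Key> value in the listing, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    keys = []
    for elem in root.iter():
        # Tags look like "{namespace}Key"; compare only the local name.
        if elem.tag.rsplit("}", 1)[-1] == "Key":
            keys.append(elem.text)
    return keys


def to_urls(base_url, keys):
    """Join each key onto the base URL, percent-encoding unsafe characters."""
    return [base_url.rstrip("/") + "/" + quote(key) for key in keys]


keys = list_keys(SAMPLE_XML)
urls = to_urls("http://example-bucket.invalid/", keys)
```

A real listing is usually paginated, so a single XML document like this would only cover the first page of keys.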
-
JAA
dolimg returns an HTTP 200 with 'Oops! We're sorry, but we're having technical problems.' for me.
-
pokechu22
Yeah, same for me
-
pokechu22
I'm taking a look at cloud.google.com/storage/docs/gsutil - first time I've heard of this
-
GhostyTongue
they both work when using gsutil
-
JAA
Doesn't that require auth?
-
GhostyTongue
nope
-
GhostyTongue
i'm pretty sure it's not configured
-
pokechu22
ok, `apt install gsutil` gives me GrandStream BudgeTone phone backup, restore and reboot utility - pkts.ca/gsutil.shtml which is clearly not the right thing
-
JAA
It's in PyPI it seems.
-
JAA
Although the official docs don't seem to mention this, or at least not prominently if it is somewhere. They instead try to make you install an entire set of CLIs.
-
GhostyTongue
-
GhostyTongue
there's the install instructions, it seems to be like aws cli but for google storage buckets
-
JAA
Wow, gsutil's code is disgusting. In the first few lines I'm reading, I'm already seeing `importlib` and `sys.path` madness.
-
GhostyTongue
ew
-
pokechu22
Huh, and `gsutil ls gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/` lists stuff in the root directory (though it doesn't recurse over it)
-
GhostyTongue
it should be -r
-
GhostyTongue
and -h
-
GhostyTongue
and also -l if im not wrong
-
pokechu22
Ah, yep
-
GhostyTongue
yeah
-
GhostyTongue
both cdns combined is 340+ gb of data
-
pokechu22
That should be doable for archivebot as long as we pick a pipeline with enough space
-
GhostyTongue
cool
-
pokechu22
I'm currently doing a listing for gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io - thanks for bringing this up
-
GhostyTongue
:)
-
pokechu22
I am curious where gsutil is getting the data from - it'd be nice to also save all of those metadata URLs
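One possible answer, offered as an assumption rather than something confirmed in this conversation: Google Cloud Storage has a JSON API whose object listing lives at `https://storage.googleapis.com/storage/v1/b/<bucket>/o` and is paged with `pageToken` / `nextPageToken`. The sketch below only builds those listing URLs and parses a made-up response page; the helper names and the sample JSON are hypothetical, and whether gsutil actually used this endpoint here is not established.

```python
import json
from urllib.parse import quote, urlencode

API_ROOT = "https://storage.googleapis.com/storage/v1"


def listing_url(bucket, page_token=None, prefix=None):
    """Build the object-listing URL for one page of a bucket."""
    params = {}
    if prefix:
        params["prefix"] = prefix
    if page_token:
        params["pageToken"] = page_token
    url = f"{API_ROOT}/b/{quote(bucket, safe='')}/o"
    return url + ("?" + urlencode(params) if params else "")


def parse_page(body):
    """Return (object names, next page token or None) from one JSON page."""
    page = json.loads(body)
    names = [item["name"] for item in page.get("items", [])]
    return names, page.get("nextPageToken")


# Made-up response body for illustration only.
SAMPLE_PAGE = '{"items": [{"name": "a.jpg", "size": "1"}], "nextPageToken": "t2"}'
names, next_token = parse_page(SAMPLE_PAGE)
first_url = listing_url("example-bucket")
next_url = listing_url("example-bucket", page_token=next_token)
```

If that is where the data comes from, the metadata URLs worth saving would be the sequence of listing URLs produced by following each `nextPageToken` until none is returned.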
-
GhostyTongue
🤷
-
JAA
The core seems to be a 2000-line wrapper around boto.
-
GhostyTongue
really
-
JAA
-
GhostyTongue
what's the job id for the amazon buckets of jumpstart games?
-
JAA
That didn't run through ArchiveBot because it required deduplication.
-
GhostyTongue
meaning?
-
pokechu22
Interesting, it seems like most of the files on dolimg are dated to 2018?
-
JAA
Meaning there is no job ID.
-
GhostyTongue
yeah my group of friends who discovered it assumed whenever they remove or add a file all file dates get modified
-
GhostyTongue
either that or it's a google cloud update
-
pokechu22
e.g. I see 129304 2018-02-16T20:20:01Z gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/en-US/blogs/wp-content/uploads/2013/03/thomas_pocahontas_imaginary_boyfriends.jpg - that file is clearly from 2013 but is dated 2018 still
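A small sketch of how a listing line like the one above could be split into size, timestamp, and URL, and how the year in the upload path can be compared with the object date. It assumes the three-column layout shown in that line; the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

LINE = (
    "129304 2018-02-16T20:20:01Z "
    "gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/en-US/blogs/"
    "wp-content/uploads/2013/03/thomas_pocahontas_imaginary_boyfriends.jpg"
)


def parse_ls_line(line):
    """Split a 'size timestamp url' listing line into typed fields."""
    size, stamp, url = line.split(None, 2)
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(size), when, url.strip()


def upload_year_month(url):
    """Return (year, month) from a WordPress-style /uploads/YYYY/MM/ path, if any."""
    match = re.search(r"/uploads/(\d{4})/(\d{2})/", url)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


size, when, url = parse_ls_line(LINE)
path_date = upload_year_month(url)
# The path says 2013 while the object timestamp says 2018.
mismatch = path_date is not None and path_date[0] != when.year
```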
-
JAA
Perhaps they migrated to GCP in 2018?
-
GhostyTongue
it's either google or disney that did that
-
GhostyTongue
I agree, JAA
-
GhostyTongue
i do know there's a file in the cdn that should not be archived, it includes ftp logins
-
GhostyTongue
and two ftps do indeed work
-
GhostyTongue
and i dont want anyone messing with the ftps
-
pokechu22
TOTAL: 1508969 objects, 280062417599 bytes (260.83 GiB) for dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io
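As a quick sanity check on that summary line, the byte count can be converted to GiB (2**30 bytes) and compared with the reported figure. This is only a sketch; `parse_total` and `bytes_to_gib` are hypothetical helper names, and the regular expression assumes the exact wording of the line above.

```python
import re

TOTAL_RE = re.compile(r"TOTAL: (\d+) objects, (\d+) bytes \(([\d.]+) GiB\)")


def parse_total(line):
    """Extract (object count, byte count, reported GiB) from a TOTAL line."""
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError("not a TOTAL line")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def bytes_to_gib(num_bytes):
    """Convert bytes to binary gigabytes (GiB)."""
    return num_bytes / 2**30


LINE = "TOTAL: 1508969 objects, 280062417599 bytes (260.83 GiB)"
objects, size, reported_gib = parse_total(LINE)
computed_gib = round(bytes_to_gib(size), 2)
```

The computed value agrees with the reported 260.83 GiB, so the figure is in binary units rather than decimal GB.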
-
pokechu22
Are the URLs used by this CDN also used on other sites? Like, would a site embed dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/en-US/blogs/wp-content/uploads/2013/03/thomas_pocahontas_imaginary_boyfriends.jpg directly? Or is it a duplicate of a file on a different site?
-
GhostyTongue
that was quick damn
-
pokechu22
Not that that's really a problem
-
pokechu22
To be clear: that's the time to list files, not download them :)
-
GhostyTongue
i honestly have no clue
-
GhostyTongue
i also really enjoyed looking through /worldofcars/ within dolimg
-
GhostyTongue
and i did archive it
-
GhostyTongue
-
pokechu22
TOTAL: 509858 objects, 48022996845 bytes (44.72 GiB) for go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io. I'm going to run that as http instead of https since they both seem to be the same content, but https has a bad cert
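A hedged sketch of turning `gs://` listing entries into plain `http://` URLs for a crawl, as described above. It assumes the bucket name doubles as the hostname, which appears to be the case for these domain-named buckets; the function name and the example object key are hypothetical.

```python
from urllib.parse import quote


def gs_to_http(gs_url, scheme="http"):
    """Map gs://bucket/key to scheme://bucket/key with the key percent-encoded."""
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError("expected a gs:// URL")
    bucket, _, key = gs_url[len(prefix):].partition("/")
    # quote() keeps "/" so the path structure of the key is preserved.
    return f"{scheme}://{bucket}/{quote(key)}"


# Hypothetical object key, used only to show the encoding of a space.
example = gs_to_http(
    "gs://go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/example/file name.jpg"
)
```

Passing `scheme="https"` would produce the HTTPS form instead, which is the variant with the bad certificate.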
-
GhostyTongue
cool
-
GhostyTongue
ok
-
JAA
An ArchiveBot job for Yandex.Q has been running since late June, but the site is very obviously too large for AB. There are pretty strict rate limits which trigger captcha page redirects.
-
JAA
Probably needs the DPoS treatment to get anywhere near decent coverage.
-
pokechu22
Worth noting that archivebot also extracts TONS of junk URLs on Yandex, which are ignored on that job but don't help with archivebot's performance
-
JAA
Hard to say how quickly we need to move here. I'd assume it's not extremely urgent, but whenever there's room for getting a project going, this would be good to do I think.
-
JAA
Cc arkiver OrIdow6 ^
-
flashfire42
So I presume we aren't doing trove? I would like to push a project for that too considering how large an archive it is
-
h2ibot
JustAnotherArchivist edited Deathwatch (+178, /* 2023 */ Add Яндекс Кью aka Yandex.Q):
wiki.archiveteam.org/?diff=50316&oldid=50310
-
JAA
Agreed on Trove!
-
OrIdow6
On Yandex, I could do it, though Egloos + Wysp and then Google Drive come first
-
themadpro
It's not the most important site on the web by a long-shot but any chance we can get an archivebot job on koopatv.org ?
-
themadpro
It's a parody blog based on mario characters running since 2013 that recently closed (marked as archived)
-
Barto
themadpro: i'll see what i can do, thanks
-
qyxojzh|m
Hey; could use some info on how to properly report website shutdown announcements over in the other room
-
joepie91|m
qyxojzh: you can just post about it, just try to keep it compact with all the important details put together
-
qyxojzh|m
Sorry for editing so much, I kept forgetting info
-
joepie91|m
wait, did you post in the other room? I'm not seeing it
-
OrIdow6
I maintain that the "no more than 5 words per year in #at!!!" often goes too hard, especially to newcomers
-
qyxojzh|m
Pretty sure I did
-
qyxojzh|m
OrIdow6: Pardon?
-
OrIdow6
We should not have people thinking that this is a formal affair
-
qyxojzh|m
(I must say I'm new, sorry)
-
OrIdow6
qyxojzh|m: Not talking about you directly
-
OrIdow6
Saying that we should stop telling people not to post too much in #archiveteam
-
qyxojzh|m
I guess that's the continuation of a previous convo?
-
OrIdow6
No
-
qyxojzh|m
Oh now I see what you mean
-
qyxojzh|m
Didn't connect the dots earlier… XD
-
OrIdow6
:)
-
OrIdow6
Anyhow yeah, it's not loading for me, could be something based on my location/location on the Internet maybe
-
qyxojzh|m
OrIdow6: Sounds about right, because _I_ can access it
-
qyxojzh|m
would a VPN help?
-
OrIdow6
If you have sources on its being shut down qyxojzh|m those would be appreciated
-
qyxojzh|m
OrIdow6: Yeah good idea, excuse me
-
OrIdow6
There are probably other people for whom it loads correctly, I don't want to bother with that
-
qyxojzh|m
OrIdow6: Turns out it was fake; it's not being shut down at all… Although it would still be a good idea to back the thing up, who knows
-
qyxojzh|m
One of the teachers at the uni I am enrolled in shared the “news” with me and with others in the group I am in
-
OrIdow6
On the topic of being a good thing to back up anyway, we were discussing backing up the online version of the national library of Australia a few days ago, so yeah, definitely
-
qyxojzh|m
qyxojzh|m: Can I keep the link there even if it's not being shut down?
-
OrIdow6
In #archiveteam? Yeah it's fine - FWIW I'm (as well as many other people here) using IRC, not Matrix, so I don't even see deletions
-
spirit
yay console.hetzner.cloud/ is down
-
spirit
or it is me
-
spirit
nvm
-
h2ibot
Bzc6p edited News+C/hu (-192, HVG 2021 done; update on video archiving):
wiki.archiveteam.org/?diff=50317&oldid=49781
-
VickoSaviour
What do you think; in the near future, do you have a plan to preserve MEGA, if it happens to be shut down?
-
TheTechRobo
VickoSaviour: I don’t think much has been done in that regard but I could be weong
-
TheTechRobo
Why, is it shutting down soon?
-
TheTechRobo
Wrong*
-
VickoSaviour
well, i don't think there's enough evidence to consider it endangered, so it is safe.
-
masterx244|m
<VickoSaviour> "What do you think; in the near..." <- its also a PITA to archive, javascript fuckery at its finest that doesnt play back in the WBM
-
FavoritoHJS
appears google is going to delete accounts unused for 2 years. this won't affect much as basically anything will reset the timer, and having done anything on youtube will mean your account will never be garbage collected, so the only thing this should affect are google drive links.
-
FavoritoHJS
tbf that ship already sailed when old shares got made private-only, though.
-
qyxojzh|m
<FavoritoHJS> "appears google is going to..." <- Last I checked it's for after the death of a Google user
-
JAA
-
JAA
Starting no sooner than December, they'll purge accounts that have been inactive for 2 years because security.
-
qyxojzh|m
Not surprised tbh
-
joepie91|m
"security"
-
qyxojzh|m
For their financial security, certainly
-
flashfire42
karayou.com is still up tho the spam is starting to creep in lol
-
fireonlive
'security'
-
fireonlive
google has gotten very 'holy shit how much space are we using?' lately it seems
-
qyxojzh|m
flashfire42: what's that?
-
fireonlive
and has turned up the purge dials
-
JAA
SeCuRitY
-
fireonlive
:D
-
joepie91|m
fireonlive: wonder if it's anything to do with that fire in Paris
-
pokechu22
karayou was supposed to shut down... like january 1st or something?
-
fireonlive
hmmmm
-
joepie91|m
"wait, how many HDDs did we have to replace??"
-
flashfire42
It's a site mentioned on the deathwatch on the archiveteam wiki qyxojzh|m which was supposed to shut down at the start of this year but obviously someone forgot to flip a switch and it's now starting to get spammers
-
fireonlive
lolol
-
qyxojzh|m
flashfire42: Deathwatch, what a name
-
pokechu22
Yeah, 2023年 (year) 1月 (1 month = January) 1日 (1 day = 1st)
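The year/month/day breakdown above can be sketched as a tiny parser. This assumes dates written with Arabic numerals followed by 年, 月, and 日, as in the example; the function name is hypothetical.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})年\s*(\d{1,2})月\s*(\d{1,2})日")


def parse_cjk_date(text):
    """Return the first YYYY年M月D日 date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


shutdown = parse_cjk_date("2023年1月1日")
```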
-
qyxojzh|m
So it's a Japanese website?
-
JAA
-
flashfire42
Also looks like Kaotic found a buyer; the video saying it's going down is gone, so
-
pokechu22
Chinese, but same characters
-
pokechu22
also has a fairly large porn section, though not all of it is porn, just keep that in mind if you browse it
-
pokechu22
especially if you can't read the subforum names :P
-
qyxojzh|m
pokechu22: I _just_ might be able to help a little then
-
JAA
Where? So I can avoid it? Asking for a friend (it's fireonlive).
-
pokechu22
We did save the entirety of it before January 1st 2023
-
pokechu22
It's the last group of subforums if I recall correctly
-
qyxojzh|m
qyxojzh|m: because 我明白一点儿中文
-
JAA
That section has an '18' in the name, so sounds about right.
-
qyxojzh|m
sorry if that's a brag XD
-
pokechu22
成人專區(未滿18請勿進入) - the 18 specifically
-
fireonlive
x3
-
JAA
Anyway, yeah, I archived all threads in late December.
-
qyxojzh|m
Yes, “do not come in if you're under eighteen” at the end
-
qyxojzh|m
Kudos to JAA
-
fireonlive
🔞
-
flashfire42
uktrainsim.com closes in a month and downloads are behind a login wall. I just put the site into archivebot but I don't think it will grab the downloads in retrospect
-
flashfire42