-
thuban
friends, considering that google drive is also at risk from google's inactive account purge, is #googlecrash functional enough to be revived on a submitted-items basis?
-
thuban
(alternatively, can archivebot be made to work with google drive links? i thought not--but some remarks on the wiki suggest otherwise--but they're old)
-
JAA
AB can sort of grab individual files but not their metadata and I believe WBM playback sucks. Also, it doesn't work for some files, e.g. large ones that trigger a prompt due to Google not running a malware scan on them.
-
JAA
But yeah, getting #googlecrash back running would be great.
-
pokechu22
AB does work fine for larger files, I'm pretty sure
-
pokechu22
but on the other hand, I've seen AB get rate-limited once when downloading too many files (I don't remember the exact parameters that led to it though - I think it was related to the learnwithportals.com AB job?)
-
pokechu22
For larger files the uc URL does show a prompt, but it's a link to another URL with a query parameter and AB handles that properly (I remember save page now having issues with that in the past though). I think you previously said it was implemented differently in the past and didn't work then though
-
pokechu22
but I've definitely done files on the order of a few hundred megabytes which definitely give that scanning prompt and they successfully save (which can be seen in the size on the AB dashboard)
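
A minimal sketch of the behaviour described above: the uc page for a large file returns a prompt whose link points at a second URL carrying an extra query parameter, and a crawler that follows links picks it up. The parameter name `confirm`, the `uuid` value, and the sample markup are assumptions for illustration; the real prompt page may look different.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags and action targets of <form> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.targets.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.targets.append(attrs["action"])


def find_confirm_links(page_url, html, param="confirm"):
    """Return absolute URLs found on the page whose query string contains `param`.

    `param` defaults to "confirm", which is an assumed (hypothetical) name.
    """
    parser = LinkCollector()
    parser.feed(html)
    absolute = [urljoin(page_url, target) for target in parser.targets]
    return [u for u in absolute if param in parse_qs(urlsplit(u).query)]
```

Because the confirmation link is an ordinary link on the prompt page, a recursive job records both the prompt and the file behind it, which is why the file can later be reached from the original uc URL.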
-
Vokun
I'll sort through my urls again. I know Google drive creates drive.google.com, docs.google.com, and sites.google.com links. Are there any others?
-
Vokun
jamboard.google.com, script.google.com
-
pokechu22
What I'm talking about is only for files on drive.google.com - docs.google.com is different and much more scripty, and I don't know how it can be handled; sites.google.com is static and is perfectly fine for AB. I don't know anything about the other two
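
For sorting a URL list along the lines above, a rough sketch that buckets URLs by hostname. The notes in `HOST_NOTES` only restate what was said in this conversation, and the function name is made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Notes restated from the discussion; "unknown" marks hosts nobody vouched for.
HOST_NOTES = {
    "drive.google.com": "individual files: reported to work in ArchiveBot",
    "docs.google.com": "script-heavy: handling unclear",
    "sites.google.com": "static: reported fine for ArchiveBot",
    "jamboard.google.com": "unknown",
    "script.google.com": "unknown",
}


def bucket_by_host(urls):
    """Group URLs by hostname; hosts not listed in HOST_NOTES go under 'other'."""
    buckets = defaultdict(list)
    for url in urls:
        # Scheme-less entries (as pasted in chat) need a scheme for urlsplit
        # to recognise the hostname.
        if "://" not in url:
            url = "https://" + url
        host = (urlsplit(url).hostname or "").lower()
        key = host if host in HOST_NOTES else "other"
        buckets[key].append(url)
    return dict(buckets)
```

Each bucket can then be queued separately, with the `drive.google.com` and `sites.google.com` lists going to ArchiveBot and the rest held back until someone knows how to handle them.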
-
JAA
pokechu22: Ah, but you need to know the verification token to find it again in the WBM, I guess.
-
thuban
docs is discussed at wiki.archiveteam.org/index.php/Google#Google_Docs_Editors but i have no idea how much of that information is still accurate
-
pokechu22
No, I'm pretty sure you can use the original uc URL to get the link to the new URL... will dig up an example
-
JAA
There's a random UUID in the prompt confirmation URL.
-
pokechu22
Right, but the link to that random UUID will be saved in an archivebot job for that URL
-
pokechu22
so you can follow that link on web.archive.org just as you would on the original site
-
JAA
Right, if it stays alive long enough to wait for the WARC appearing on IA and extract the URL there, that'd work, I guess.
-
JAA
Oh, or an !a would work, right.
-
pokechu22
Yeah
-
JAA
I was thinking !ao the whole time. :-)
-
pokechu22
I *think* at one point there was a button that used a script to generate the confirmation link, in which case an !a wouldn't work, but it hasn't been that way for a few years
-
pokechu22
And the other thing about doing an !a is that it loads the sitemap and starts loading a bunch of other junk (there's also an infinite loop where it keeps adding share_facebook or something like that to the URL)
-
JAA
Yeah, I was going to say, probably needs some ignores to make it not go nuts.
-
pokechu22
The easiest approach is just to add ^ as an ignore once it's downloaded a large-looking file so that everything at that point gets ignored
-
JAA
Heh, yeah, that works. Unless the download fails, at least.
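
Why `^` works as a catch-all, sketched under the assumption that ignore patterns are regular expressions searched against each candidate URL: `^` matches the empty string at the start of any URL, so every URL queued afterwards is ignored. The `share_facebook` pattern is only an illustrative guess at a narrower ignore for the loop mentioned above.

```python
import re


def is_ignored(url, ignore_patterns):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in ignore_patterns)
```

With `["^"]` as the pattern list, `is_ignored` returns True for any URL, which is the point: the job drains its queue without fetching anything more. A narrower pattern such as `share_facebook` would stop only the looping URLs and leave the rest of the crawl running.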
-
kitonthenet
For the blogger job, how many warrior jobs should I run, and how many are actually useful?
-
JAA
→ #frogger
-
Naruyoko5
twitter.com/kogekidogso/status/1726622453424906649 Came across this randomly; the account will stop operation on 12/3 and be deleted some time in the future. I don't know if this has anything worth saving or if it's just an ordinary personal account
-
Pedrosso
is it just this one twitter account?
-
Naruyoko5
In the post, they say the account's goal was to explain soccer rules and refereeing situations
-
Naruyoko5
They also answer various questions, which you can find through the tag #querie_kogekidogso and separately at querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent
-
Naruyoko5
They have a link to their note account (note.com/goodcall_dogso), but it seems defunct and unarchived so I don't know what that could have been
-
Naruyoko5
I don't know this person, so I don't know the situation
-
Naruyoko5
Ah, they are also a writer of this newsletter: die-acht.theletter.jp
-
Naruyoko5
Another Q&A site they seem to be on: peing.net/ja/18kogekisoccer
-
fireonlive
rewby: some stuff in #frogger for ya (|backup isn’t there)
-
JAA
Naruyoko5: Are you sure it will stay online for some time after they 'stop operation'?
-
rktk
Is Miraheze dying?
-
rktk
Their static content subdomain seems to be... failing?
-
rktk
502 bad Gateway...
-
nicolas17
JAA: did you fix grab-site-docker yet?
-
JAA
nicolas17: No, got stuck last night and was getting tired. I might just do a stupid fix and include the `pip freeze` output from my working image as a requirements.txt for the time being.
-
JAA
But I'd like to fix this 'properly'.
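
A rough sketch of the stopgap mentioned above: capture `pip freeze` from a working environment and keep only the exact `name==version` pins as a requirements.txt. The function names and the decision to drop editable and direct-reference entries are assumptions for illustration, not how grab-site-docker is actually built.

```python
import subprocess
import sys


def capture_freeze():
    """Return the `pip freeze` output of the current interpreter's environment."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def pinned_requirements(freeze_output):
    """Keep only exact `name==version` pins, sorted case-insensitively.

    Editable installs (`-e ...`), comments, and direct references
    (`name @ file:///...`) are dropped, since they may not resolve
    inside another image.
    """
    pinned = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith(("-e", "#")):
            pinned.append(line)
    return "\n".join(sorted(pinned, key=str.lower)) + "\n"
```

Writing `pinned_requirements(capture_freeze())` to requirements.txt inside the working image would freeze the known-good versions until the dependency problem is fixed properly.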
-
that_lurker
Could someone grab neil-gaiman.tumblr.com at some point?
-
that_lurker
Mainly for the sake of archival, but he did also do the thing you don't currently do and commented on the Israel situation, though that is most likely not going to cause anything but a small uproar in the comments
-
fireonlive
that'd be igset singletumblr yeah?
-
fireonlive
it was!