#archiveteam-bs

00:00

thuban

friends, considering that google drive is also at risk from google's inactive account purge, is #googlecrash functional enough to be revived on a submitted-items basis?
00:01

thuban

(alternatively, can archivebot be made to work with google drive links? i thought not--but some remarks on the wiki suggest otherwise--but they're old)
00:04

JAA

AB can sort of grab individual files but not their metadata and I believe WBM playback sucks. Also, it doesn't work for some files, e.g. large ones that trigger a prompt due to Google not running a malware scan on them.
00:04

JAA

But yeah, getting #googlecrash back running would be great.
00:18

pokechu22

AB does work fine for larger files, I'm pretty sure
00:19

pokechu22

but on the other hand, I've seen AB get rate-limited once when downloading too many files (I don't remember the exact parameters that led to it though - I think it was related to the learnwithportals.com AB job?)
00:22

pokechu22

For larger files the uc URL does show a prompt, but it's a link to another URL with a query parameter and AB handles that properly (I remember save page now having issues with that in the past though). I think you previously said it was implemented differently in the past and didn't work then though
00:22

pokechu22

but I've definitely done files on the order of a few hundred megabytes which definitely give that scanning prompt and they successfully save (which can be seen in the size on the AB dashboard)
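A minimal sketch of the idea described above: a crawler that parses the prompt page can discover the confirmation link without knowing the random value in advance. It assumes the prompt page holds an ordinary `<a>` link carrying the same `id` as the original uc URL plus at least one extra query parameter. The helper names, the file ID, and the `extra` parameter in the usage note are hypothetical, not Google's actual markup or parameter names.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_confirmation_links(original_url, prompt_html):
    """Return links for the same file id that add query parameters.

    Assumption: the confirmation link differs from the original URL
    only by carrying additional query parameters.
    """
    original_query = parse_qs(urlsplit(original_url).query)
    file_id = original_query.get("id")

    collector = LinkCollector()
    collector.feed(prompt_html)

    matches = []
    for href in collector.hrefs:
        absolute = urljoin(original_url, href)
        query = parse_qs(urlsplit(absolute).query)
        has_new_params = bool(set(query) - set(original_query))
        if query.get("id") == file_id and has_new_params:
            matches.append(absolute)
    return matches
```

For example, given a made-up prompt page containing `<a href="/uc?id=FILE123&amp;export=download&amp;extra=abc">`, calling `find_confirmation_links("https://drive.google.com/uc?id=FILE123&export=download", html)` returns that link as an absolute URL.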
00:22

Vokun

I'll sort through my urls again. I know Google drive creates drive.google.com, docs.google.com, and sites.google.com links. Are there any others?
00:24

Vokun

jamboard.google.com, script.google.com
00:25

pokechu22

What I'm talking about is only for files on drive.google.com - docs.google.com is different and much more scripty, and I don't know how it can be handled; sites.google.com is static and is perfectly fine for AB. I don't know anything about the other two
00:25

JAA

pokechu22: Ah, but you need to know the verification token to find it again in the WBM, I guess.
00:25

thuban

docs is discussed at wiki.archiveteam.org/index.php/Google#Google_Docs_Editors but i have no idea how much of that information is still accurate
00:25

pokechu22

No, I'm pretty sure you can use the original uc URL to get the link to the new URL... will dig up an example
00:26

JAA

Here's a random one: drive.google.com/uc?id=0B8zhxvoYK3WTVHNpTjNMUzRtTWM&export=download
00:26

JAA

There's a random UUID in the prompt confirmation URL.
00:27

pokechu22

Right, but the link to that random UUID will be saved in an archivebot job for that URL
00:27

pokechu22

so you can follow that link on web.archive.org just as you would on the original site
00:28

JAA

Right, if it stays alive long enough to wait for the WARC appearing on IA and extract the URL there, that'd work, I guess.
00:28

JAA

Oh, or an !a would work, right.
00:28

pokechu22

Yeah
00:28

JAA

I was thinking !ao the whole time. :-)
00:29

pokechu22

I *think* at one point there was a button that used a script to generate the confirmation link, in which case an !a wouldn't work, but it hasn't been that way for a few years
00:31

pokechu22

And the other thing about doing an !a is that it loads the sitemap and starts loading a bunch of other junk (there's also an infinite loop where it keeps adding share_facebook or something like that to the URL)
00:32

JAA

Yeah, I was going to say, probably needs some ignores to make it not go nuts.
00:32

pokechu22

ok, here's an example: web.archive.org/web/20230617223758/…rrpbNHRDx2M8X-zOF-y&export=download
00:32

pokechu22

The easiest approach is just to add ^ as an ignore once it's downloaded a large-looking file so that everything at that point gets ignored
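A minimal sketch of why `^` acts as a catch-all ignore, assuming ignore patterns are regular expressions searched against each queued URL. The `is_ignored` helper and the example URLs are hypothetical and are not ArchiveBot's actual code.

```python
import re


def is_ignored(url, ignore_patterns):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in ignore_patterns)


# "^" matches the empty string at the start of any URL,
# so every URL still in the queue is ignored once it is added.
queued = [
    "https://drive.google.com/sitemap.xml",              # hypothetical
    "https://drive.google.com/uc?id=X&share_facebook",   # hypothetical
]
still_to_fetch = [url for url in queued if not is_ignored(url, ["^"])]
```

Here `still_to_fetch` ends up empty, which is the effect described: everything discovered after the large file was saved is skipped.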
00:36

JAA

Heh, yeah, that works. Unless the download fails, at least.
00:40

kitonthenet

For the blogger job, how many warrior jobs should/can/is useful to run?
00:40

JAA

→ #frogger
00:44

Naruyoko5

twitter.com/kogekidogso/status/1726622453424906649 Came across this randomly; it will stop operation on 12/3 and be deleted some time in the future. I don't know if this has anything worth saving or if it's just an ordinary personal account
00:44

eggdrop

nitter: nitter.net/kogekidogso/status/1726622453424906649
00:48

Pedrosso

is it just this one twitter account?
00:59

Naruyoko5

In the post, they say the account's goal was to explain situations involving soccer rules and refereeing
01:00

Naruyoko5

There's also them answering various questions which you can find through the tag #querie_kogekidogso and separately from querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent
01:01

Naruyoko5

They have a link to their note account (note.com/goodcall_dogso), but it seems defunct and unarchived so I don't know what that could have been
01:03

Naruyoko5

I don't know this person, so I don't know the situation
01:06

Naruyoko5

Same name youtube.com/@user-ym7rm6vy4l/featured
01:07

Naruyoko5

Ah, they are also a writer of this newsletter die-acht.theletter.jp
01:11

Naruyoko5

Another Q&A site they seem to be on peing.net/ja/18kogekisoccer
01:28

fireonlive

rewby: some stuff in #frogger for ya (|backup isn’t there)
03:01

JAA

Naruyoko5: Are you sure it will stay online for some time after they 'stop operation'?
04:10

rktk

Is Miraheze dying?
04:10

rktk

Their static content subdomain seems to be... failing?
04:14

rktk

502 Bad Gateway...
05:00

h2ibot

JAABot edited CurrentWarriorProject (+2): wiki.archiveteam.org/?diff=51177&oldid=51172
15:00

h2ibot

JAABot edited CurrentWarriorProject (-2): wiki.archiveteam.org/?diff=51178&oldid=51177
18:30

fireonlive

+rss- Binance Founder Changpeng Zhao Agrees to Step Down, Plead Guilty: wsj.com/finance/currencies/binance-…hao-step-down-plead-guilty-01f72a40 news.ycombinator.com/item?id=38366729
20:05

nicolas17

JAA: did you fix grab-site-docker yet?
20:11

JAA

nicolas17: No, got stuck last night and was getting tired. I might just do a stupid fix and include the `pip freeze` output from my working image as a requirements.txt for the time being.
20:11

JAA

But I'd like to fix this 'properly'.
21:12

that_lurker

Could someone grab neil-gaiman.tumblr.com at some point
21:18

that_lurker

Mainly for the sake of archival, but he did also do the thing you don't currently do and commented on the Israel situation, but that is most likely not going to cause anything but a small uproar in the comments
22:58

fireonlive

that'd be igset singletumblr yeah?
22:59

fireonlive

it was!
