04:57:42 <hook54321> really not a fan of DRM, but i can see why they sent the takedown request with the surrounding legal stuff going on right now
05:01:03 <flashfire42> I mean there are easier ways to get pirated books than cracking DRM from IA
05:08:38 <fireonlive> 👀
05:08:53 <fireonlive> hook54321: yeah for sure. hope it helps them
05:33:11 <pcr> It does probably mean they are going to resist any attempt to perform a (maybe illegal) archival if they announce they are going to need to delete stuff on a day in the future.
06:03:41 <thuban> flashfire42: depends on the book tbqh
07:18:20 * pabs hmmm at https://godotforums.org/d/35412-sadly-i-think-godot-is-a-scam-im-not-sure-i-can-do-this
07:27:53 <h2ibot> Yts98 edited 半次元 (+2524, Explain alternate image CDN endpoints): https://wiki.archiveteam.org/?diff=50208&oldid=50196
07:32:54 <h2ibot> Yts98 edited 半次元 (+0): https://wiki.archiveteam.org/?diff=50209&oldid=50208
10:18:24 <h2ibot> PaulWise edited Bugzilla (+30, ghostscript bugzilla): https://wiki.archiveteam.org/?diff=50210&oldid=50178
10:18:25 <h2ibot> PaulWise edited Bugzilla (+53, IRC channel topics idea): https://wiki.archiveteam.org/?diff=50211&oldid=50210
10:25:26 <h2ibot> PaulWise edited Bugzilla (+160, security issue lists idea): https://wiki.archiveteam.org/?diff=50212&oldid=50211
10:26:26 <h2ibot> PaulWise edited Bugzilla (-2, syntax fix): https://wiki.archiveteam.org/?diff=50213&oldid=50212
10:33:27 <h2ibot> PaulWise edited Mailman2 (+213, add IRC and sectrackers as sources of mailman2…): https://wiki.archiveteam.org/?diff=50214&oldid=50180
12:02:44 <h2ibot> PaulWise edited Bugzilla (+792, add URLs from Debian sectracker): https://wiki.archiveteam.org/?diff=50215&oldid=50213
16:00:27 <h2ibot> JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=50216&oldid=50153
18:49:13 <pokechu22> JAA: I have a large set of URLs related to germandocsinrussia.org and historyrussia.org book scans, probably on the order of 10 million across all sites and all unsaved zoom levels. They're incremental IDs with gaps (e.g. https://wwii.germandocsinrussia.org/pages/24/zooms/8, https://wwii.germandocsinrussia.org/pages/1505900/zooms/8 - I haven't figured out the exact maximum
18:49:16 <pokechu22> yet) where zoom ranges from 3 to 7 or 8 (0-2 are not used directly, but instead e.g. https://wwii.germandocsinrussia.org/system/pages/000/734/55/images/small/fd4fabbe9f63bf507db8ac35af4e318616146ad4.jpg?1538539960 or x_small or xx_small, and archivebot will have already captured them so we don't need to worry about the random-looking component). I assume qwarc is the best
18:49:18 <pokechu22> tool for that, as giving archivebot an !ao < list job with 10 million entries will result in sadness?
18:49:26 <pokechu22> If so, what kind of information do you need to do a qwarc job?
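A minimal sketch of how the URL list pokechu22 describes could be generated, assuming page IDs start at 1 and run up to a known maximum valid ID, and that zoom levels 3 to 8 are the ones worth requesting. The function name `zoom_urls` and the `first_id` default are hypothetical. Because the IDs have gaps, some of the generated URLs will return 404 or 500, and the crawler is left to record those rather than filtering them up front.

```python
from typing import Iterator


def zoom_urls(host: str, max_page_id: int,
              zooms: range = range(3, 9), first_id: int = 1) -> Iterator[str]:
    """Yield https://<host>/pages/<id>/zooms/<z> for every ID and zoom level.

    Assumption: IDs are contiguous integers from first_id to max_page_id
    (inclusive); invalid IDs inside that range are simply requested too.
    """
    for page_id in range(first_id, max_page_id + 1):
        for z in zooms:
            yield f"https://{host}/pages/{page_id}/zooms/{z}"


if __name__ == "__main__":
    # 48045 is the highest valid tsamo ID reported later in the log.
    count = sum(1 for _ in zoom_urls("tsamo.germandocsinrussia.org", 48045))
    print(count)  # 48045 IDs * 6 zoom levels
```

Generating lazily keeps memory flat even for the roughly 10 million URLs estimated across all sites.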
19:21:40 <pokechu22> For the AB jobs that have finished, I've determined that the highest valid IDs are https://tsamo.germandocsinrussia.org/pages/48045/zooms/8 and https://rgaspi-458-9.germandocsinrussia.org/pages/77762/zooms/8 (and that there are 40593 and 70703 actual valid images in that region respectively, with invalid ones in that range giving 500s and ones outside that range giving 404s).
19:21:43 <pokechu22> It seems like zoom 8 gives errors on some URLs (e.g. https://rgaspi-458-9.germandocsinrussia.org/pages/8/zooms/8) for which zoom 7 does work.
19:23:24 <pokechu22> ah, scratch that about 500s, seems to depend on the site as https://wwii.germandocsinrussia.org/pages/163000/zooms/8 and https://wwii.germandocsinrussia.org/pages/165000/zooms/8 are 200 but https://wwii.germandocsinrussia.org/pages/164000/zooms/8 is 404 instead of 500. I'll just wait for AB to finish to get a maximum valid ID instead of trying to do a binary search
19:45:54 <JAA> pokechu22: Yeah, loading 10M into AB would be slow. The list input importing in wpull is a bit awkward. It'd probably take a few hours. That's the only sad part though. It's certainly better otherwise because it allows for easy monitoring, request rate adjustment, etc., which isn't really the case with qwarc.
19:46:38 <JAA> And could do it in smaller chunks of course rather than one huge list.
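Splitting one huge list into smaller chunks, as JAA suggests, could look like the sketch below. It only writes numbered text files of at most `chunk_size` lines each; the helper name `write_chunks`, the file naming scheme, and any particular chunk size are hypothetical choices, and each file would then be fed to its own `!ao <` job.

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable, List


def write_chunks(urls: Iterable[str], chunk_size: int, out_dir: Path,
                 prefix: str = "urls") -> List[Path]:
    """Write URLs into numbered files of at most chunk_size lines each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    it = iter(urls)
    paths: List[Path] = []
    while True:
        chunk = list(islice(it, chunk_size))  # next batch, empty when exhausted
        if not chunk:
            break
        path = out_dir / f"{prefix}-{len(paths):04d}.txt"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files = write_chunks((f"https://example.org/{i}" for i in range(10)), 4, Path(d))
        print([p.name for p in files])
```

Using `islice` on a single iterator means the full list never has to be held in memory at once.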
19:47:18 <JAA> It's possible of course with qwarc, just doesn't sound like a great fit unless the site is going down soon and can handle several dozen requests per second.
19:47:28 <pokechu22> Alright, I might try it for the smaller ones at least
19:48:10 <pokechu22> I'm not aware of any rate-limiting - I'll try tsamo.germandocsinrussia.org at an aggressive rate with AB to see what happens maybe
19:56:32 <JAA> Well, qwarc is about 1 or 2 orders of magnitude faster than AB...
19:57:02 <JAA> (Without trying hard, that is.)
19:57:33 <JAA> Although AB is able to reach something like 20 req/s quite comfortably for images.
19:58:17 <pokechu22> Probably we'd be limited by the ping time to russia if anything
19:58:38 <JAA> Right
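Rough arithmetic for the rates mentioned above: 10 million URLs at AB's comfortable 20 req/s is about 139 hours (almost six days), while one or two orders of magnitude faster brings that to roughly 14 hours or under 2 hours. The latency point follows from Little's law: throughput is capped at concurrency divided by time per request. The 0.25 s per request and 5 connections below are hypothetical numbers for illustration, not measurements.

```python
def crawl_hours(n_urls: int, req_per_s: float) -> float:
    """Wall-clock hours to fetch n_urls at a steady request rate."""
    return n_urls / req_per_s / 3600


def latency_bound_rate(concurrency: int, seconds_per_request: float) -> float:
    """Little's law: max throughput = requests in flight / time per request."""
    return concurrency / seconds_per_request


if __name__ == "__main__":
    n = 10_000_000  # pokechu22's rough estimate across all sites and zooms
    for rate in (20, 200, 2000):  # AB for images, then 1 and 2 orders faster
        print(f"{rate:>5} req/s -> {crawl_hours(n, rate):7.1f} h")
    # Hypothetical: 0.25 s per request (RTT to Russia plus server time)
    # with 5 concurrent connections.
    print(latency_bound_rate(5, 0.25))
```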
20:03:29 <nicolas17> JAA: https://archive.org/details/csdnsdplist this has a bunch of "screensavers" used on Apple Store demo devices, but it also has the original URLs they were downloaded from, would it be worth putting them in archivebot or something so they're on WBM?
20:09:16 <JAA> (TIL 'H.264 IA' for derived videos.)
20:09:20 <JAA> nicolas17: Maybe, yeah. I wouldn't be opposed to it.
20:14:21 <fireonlive> i wonder why the first one is 'sideways'
20:14:34 <fireonlive> hm they all seem sideways
20:16:01 <fireonlive> the few i checked yday were good though :D
20:16:39 <nicolas17>     Side data:
20:16:41 <nicolas17>       displaymatrix: rotation of -90.00 degrees
20:16:51 <fireonlive> ahh
20:16:59 <nicolas17> which the web player doesn't understand ig
20:17:07 <fireonlive> makes sense :)
20:17:26 <nicolas17> also it seems many of these are h265 and HDR
20:21:15 <pokechu22> ugh, looks like there's also a map view for some pages that's higher resolution, e.g. https://tsamo.germandocsinrussia.org/pages/44716/map - indicated on view-source:https://tsamo.germandocsinrussia.org/ru/nodes/246-delo-234-karta-polozheniya-frantsuzskih-angliyskih-i-belgiyskih-voysk-na-zapadnom-fronte-na-04-05-1918g-m-1-750-000 by map_ids = [44716]; in the JS. Pretty sure
20:21:18 <pokechu22> the only way to find those is to download the full warcs :|
20:22:39 <pokechu22> (you can plug in any page ID, but most will try to load missing images, and I think there's only a few maps, so trying to save them for everything would be a waste of resources)
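Once the node pages are in the WARCs, finding the map pages could be a simple pattern match on their HTML. This sketch assumes the inline JS always has the form `map_ids = [44716];` with plain numeric IDs (observed on only one page here) and that the map URL pattern is `/pages/<id>/map` as in the example. Extracting the HTML records from the WARCs is left out, and `extract_map_ids` / `map_urls` are hypothetical helper names.

```python
import re
from typing import List

# Assumed shape of the inline JS, e.g. "map_ids = [44716];"
MAP_IDS_RE = re.compile(r"map_ids\s*=\s*\[([^\]]*)\]")


def extract_map_ids(html: str) -> List[int]:
    """Return every numeric ID listed in map_ids = [...] assignments."""
    ids: List[int] = []
    for match in MAP_IDS_RE.finditer(html):
        ids.extend(int(x) for x in re.findall(r"\d+", match.group(1)))
    return ids


def map_urls(host: str, html: str) -> List[str]:
    """Build /pages/<id>/map URLs for the IDs found in one page's HTML."""
    return [f"https://{host}/pages/{i}/map" for i in extract_map_ids(html)]


if __name__ == "__main__":
    sample = "<script>var map_ids = [44716];</script>"
    print(map_urls("tsamo.germandocsinrussia.org", sample))
```

This only requests map pages that the site itself references, instead of trying `/map` for every page ID.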
22:25:17 <nicolas17> JAA: transfer.archivete.am is down
22:26:21 <nicolas17> Caddy returns Bad Gateway
22:26:23 <JAA> nicolas17: Yes, we have monitoring for that in #nodeping.
22:26:30 <nicolas17> ok