04:57:42 really not a fan of DRM, but i can see why they sent the takedown request with the surrounding legal stuff going on right now
05:01:03 I mean there are easier ways to get pirated books than crack DRM from IA
05:08:38 👀
05:08:53 hook54321: yeah for sure. hope it helps them
05:33:11 It does probably mean they are going to resist any attempt to perform a (maybe illegal) archival if they announce they are going to need to delete stuff on a day in the future.
06:03:41 flashfire42: depends on the book tbqh
07:18:20 * pabs hmmm at https://godotforums.org/d/35412-sadly-i-think-godot-is-a-scam-im-not-sure-i-can-do-this
07:27:53 Yts98 edited 半次元 (+2524, Explain alternate image CDN endpoints): https://wiki.archiveteam.org/?diff=50208&oldid=50196
07:32:54 Yts98 edited 半次元 (+0): https://wiki.archiveteam.org/?diff=50209&oldid=50208
10:18:24 PaulWise edited Bugzilla (+30, ghostscript bugzilla): https://wiki.archiveteam.org/?diff=50210&oldid=50178
10:18:25 PaulWise edited Bugzilla (+53, IRC channel topics idea): https://wiki.archiveteam.org/?diff=50211&oldid=50210
10:25:26 PaulWise edited Bugzilla (+160, security issue lists idea): https://wiki.archiveteam.org/?diff=50212&oldid=50211
10:26:26 PaulWise edited Bugzilla (-2, syntax fix): https://wiki.archiveteam.org/?diff=50213&oldid=50212
10:33:27 PaulWise edited Mailman2 (+213, add IRC and sectrackers as sources of mailman2…): https://wiki.archiveteam.org/?diff=50214&oldid=50180
12:02:44 PaulWise edited Bugzilla (+792, add URLs from Debian sectracker): https://wiki.archiveteam.org/?diff=50215&oldid=50213
16:00:27 JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=50216&oldid=50153
18:49:13 JAA: I have a large set of URLs related to germandocsinrussia.org and historyrussia.org book scans, probably on the order of 10 million across all sites and all unsaved zoom levels. They're incremental IDs with gaps (e.g. https://wwii.germandocsinrussia.org/pages/24/zooms/8, https://wwii.germandocsinrussia.org/pages/1505900/zooms/8 - I haven't figured out the exact maximum
18:49:16 yet) where zoom ranges from 3 to 7 or 8 (0-2 are not used directly, but instead e.g. https://wwii.germandocsinrussia.org/system/pages/000/734/55/images/small/fd4fabbe9f63bf507db8ac35af4e318616146ad4.jpg?1538539960 or x_small or xx_small, and archivebot will have already captured them so we don't need to worry about the random-looking component). I assume qwarc is the best
18:49:18 tool for that, as giving archivebot an !ao < list job with 10 million entries will result in sadness?
18:49:26 If so, what kind of information do you need to do a qwarc job?
19:21:40 For the AB jobs that have finished, I've determined that the highest valid IDs are https://tsamo.germandocsinrussia.org/pages/48045/zooms/8 and https://rgaspi-458-9.germandocsinrussia.org/pages/77762/zooms/8 (and that there are 40593 and 70703 actual valid images in that region respectively, with invalid ones in that range giving 500s and ones outside that range giving 404s).
19:21:43 It seems like zoom 8 gives errors on some URLs (e.g. https://rgaspi-458-9.germandocsinrussia.org/pages/8/zooms/8) for which zoom 7 does work.
19:23:24 ah, scratch that about 500s, seems to depend on the site as https://wwii.germandocsinrussia.org/pages/163000/zooms/8 and https://wwii.germandocsinrussia.org/pages/165000/zooms/8 are 200 but https://wwii.germandocsinrussia.org/pages/164000/zooms/8 is 404 instead of 500. I'll just wait for AB to finish to get a maximum valid ID instead of trying to do a binary search
19:45:54 pokechu22: Yeah, loading 10M into AB would be slow. The list input importing in wpull is a bit awkward. It'd probably take a few hours. That's the only sad part though. It's certainly better otherwise because it allows for easy monitoring, request rate adjustment, etc., which isn't really the case with qwarc.
19:46:38 And could do it in smaller chunks of course rather than one huge list.
19:47:18 It's possible of course with qwarc, just doesn't sound like a great fit unless the site is going down soon and can handle several dozen requests per second.
19:47:28 Alright, I might try it for the smaller ones at least
19:48:10 I'm not aware of any rate-limiting - I'll try tsamo.germandocsinrussia.org at an aggressive rate with AB to see what happens maybe
19:56:32 Well, qwarc is about 1 or 2 orders of magnitude faster than AB...
19:57:02 (Without trying hard, that is.)
19:57:33 Although AB is able to reach something like 20 req/s quite comfortably for images.
19:58:17 Probably we'd be limited by the ping time to russia if anything
19:58:38 Right
20:03:29 JAA: https://archive.org/details/csdnsdplist this has a bunch of "screensavers" used on Apple Store demo devices, but it also has the original URLs they were downloaded from, would it be worth putting them in archivebot or something so they're on WBM?
20:09:16 (TIL 'H.264 IA' for derived videos.)
20:09:20 nicolas17: Maybe, yeah. I wouldn't be opposed to it.
20:14:21 i wonder why the first one is 'sideways'
20:14:34 hm they all seem sideways
20:16:01 the few i checked yday were good though :D
20:16:39 Side data:
20:16:41 displaymatrix: rotation of -90.00 degrees
20:16:51 ahh
20:16:59 which the web player doesn't understand ig
20:17:07 makes sense :)
20:17:26 also it seems many of these are h265 and HDR
20:21:15 ugh, looks like there's also a map view for some pages that's higher resolution, e.g. https://tsamo.germandocsinrussia.org/pages/44716/map - indicated on view-source:https://tsamo.germandocsinrussia.org/ru/nodes/246-delo-234-karta-polozheniya-frantsuzskih-angliyskih-i-belgiyskih-voysk-na-zapadnom-fronte-na-04-05-1918g-m-1-750-000 by map_ids = [44716]; in the JS. Pretty sure
20:21:18 the only way to find those is to download the full warcs :|
20:22:39 (you can plug in any page ID, but most will try to load missing images, and I think there's only a few maps so trying to save them for everything would be a waste of resources)
22:25:17 JAA: transfer.archivete.am is down
22:26:21 Caddy returns Bad Gateway
22:26:23 nicolas17: Yes, we have monitoring for that in #nodeping.
22:26:30 ok
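
Note on the germandocsinrussia.org discussion above: the URL pattern (https://<host>/pages/<id>/zooms/<zoom>, zoom 3 to 8) and the idea of feeding the list in smaller chunks could be sketched roughly as below. This is only an illustration, not anything that was actually run in the channel. The function names, the starting ID of 1, and the chunk file naming are all assumptions made for the example; the log also notes that zoom 8 errors on some pages where zoom 7 works, so some generated URLs are expected to fail.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def zoom_urls(host: str, max_id: int,
              zooms: Iterable[int] = range(3, 9),
              min_id: int = 1) -> Iterator[str]:
    """Yield one URL per (page ID, zoom level) pair.

    min_id=1 is an assumption; the log only gives example IDs and maxima.
    IDs have gaps, so some of these URLs will return 404 or 500.
    """
    zoom_levels = list(zooms)
    for page_id in range(min_id, max_id + 1):
        for zoom in zoom_levels:
            yield f"https://{host}/pages/{page_id}/zooms/{zoom}"


def chunked(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Split an iterable of URLs into lists of at most `size` entries."""
    it = iter(urls)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_chunks(urls: Iterable[str], size: int, prefix: str) -> List[str]:
    """Write each chunk to <prefix>-NNNN.txt (hypothetical naming scheme)."""
    paths = []
    for index, chunk in enumerate(chunked(urls, size)):
        path = f"{prefix}-{index:04d}.txt"
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

For example, write_chunks(zoom_urls("tsamo.germandocsinrussia.org", 48045), 100000, "tsamo") would produce a handful of list files that could each be submitted as a separate job instead of one huge list.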