01:17:35 so I went looking at my 135G zowa warc on IA. Found it at https://archive.org/download/archiveteam_zowa_20230923012400_df2de1d0 but also at https://archive.org/download/archiveteam_zowa_20230924040422_7fbffef8. Why would there be two copies, uploaded on different days with different filenames/timestamps?
01:18:10 Probably the item was reclaimed and completed twice (or more times).
01:18:59 oh, interesting. I assume IA won't dedupe/reap these and they will show on the WBM as captures on different days?
01:19:10 Yes
01:20:14 ok, good to know the total size displayed on the tracker is not necessarily indicative of the amount shipped to IA
03:06:04 so this debian developer died https://abrahamraji.in/
03:06:39 i'm going to crawl that site and https://wiki.abrahamraji.in/
03:06:50 there's also https://www.youtube.com/@abrahamraji3699/ which i'm not sure what to do with
03:07:34 there's also https://gitlab.com/avron https://aana.site/@avronr - same
03:08:46 oh looks like pabs already did it
03:09:04 anarcat: yeah, well covered
03:09:21 anarcat: did the youtube in #down-the-tube
03:09:38 thanks
03:09:41 so sad
03:10:02 the mastodon I don't think can be saved, too much JS, and I thought AT doesn't save fediverse
03:19:13 ack
03:20:03 if we wanted to, this could be repurposed for that https://github.com/jwilk/zygolophodon
08:10:10 I dunno what happened but I am seeing a lot more movement across the warrior projects
09:35:02 does anybody have access to genspect's chatroom?
09:35:06 https://www.dailydot.com/debug/genspect/
09:35:28 they claim to run a semi-secret forum where they discuss anti-trans extremist talking points
10:29:40 how is it decided how often a website will be crawled/snapshotted? e.g. http://zwisler.de/
13:42:41 thunder_steak: in what context? for ArchiveBot, usually when the site is closing or there is another reason for doing it
14:17:24 pabs: e.g. http://zwisler.de/ has been snapshotted multiple times but at no constant frequency
14:24:25 I guess you mean on web.archive.org. if you click the "About this capture" thing in the top right, you can get some idea
14:25:00 as you can see here, zero of those were ArchiveTeam ArchiveBot snapshots: https://archive.fart.website/archivebot/viewer/?q=zwisler.de
14:39:45 My Canucks forums topic page qwarc grab finished earlier today without any obvious issues.
14:43:20 196068 We could not find that topic.
14:43:21 21026 You do not have permission to view this topic.
14:43:21 122653 There are no posts to show
14:43:40 The rest of the 409104 topic IDs were retrieved.
15:31:52 I got approximately 6007327 posts, which matches the homepage. :-)
15:34:28 I might try to grab new posts as they're being made until the shutdown, if I have time to set that up.
15:35:01 Although the post URLs require a topic ID, it doesn't have to be correct; you can do something like https://forum.canucks.com/topic/0-x/?do=findComment&comment=16942183 instead.
18:32:07 FOIAonline completion rate has slowed down due to larger items, now at about a third done and an estimated 3 TiB total. ETA is still on time but only just (a bit over 4 days).
18:32:21 (That's based on the rate of the past 6 hours.)
18:34:32 Actually, probably closer to 4 TiB.
18:35:27 hm, rough -- chronological ordering suggests sizes will continue to increase
18:41:59 Yeah
18:44:03 I can try throwing more concurrency at it. My machine is nowhere near its limits.
18:44:21 And I haven't seen any rate limiting or blocks whatsoever so far, just some random timeouts.
18:46:54 seems wise, especially if you can adjust on the fly. what tooling are you using?
18:49:59 qwarc
18:50:24 I can't adjust the concurrency of running processes, but I can add more processes. :-)
18:50:48 >:?
18:50:52 (I'd have to stop them, ideally gracefully, for the former.)
18:51:46 I originally had one process at 25 concurrency, but that was far from ideal because it sometimes got blocked by large downloads.
18:51:52 So now it's 5 processes with 5 concurrency each.
18:56:53 ah, i forgot qwarc runs off a database and everything. it's sufficiently self-organizing that you can just tell new processes to jump in, then?
18:57:57 Yep, each process just takes items from the DB, processes them, and writes the new status back (plus any new items it might've discovered, not relevant in this case).
18:58:24 neat
19:01:04 It really is pretty much like a local tracker in that respect. That's what I modelled it after conceptually, anyway.
19:02:36 Also, some of the timeouts I'm seeing are actually due to large downloads taking time to process, similar to the problems in wpull.
19:03:14 Eventually™, I'll refactor that so the actual HTTP stuff happens in a separate thread.
19:36:50 I got a response from Jason Scott about yahoo video: "all I can say is all the data is up there, one way or another. There's no other stores out there."
19:37:20 I'm not really sure if that means it could have been mixed up with something else, or if it wasn't uploaded then it's gone forever
19:37:54 Rootliam: did you ever open that github issue?
19:38:06 No, but I guess I should do that soon
20:52:05 Wait, we are completely clogged? Like completely?
23:29:38 So is it Optane9 again, rewby, or is the transferring stuck?
23:32:29 flashfire42: Please stop.
23:32:19 Ok, looks like it's optane9 that needs a kick if you have access to it, JAA. I did a test and Mediafire uses a separate target, and one of them went through fine
23:33:05 Targets are doing target things as well as they can. The situation isn't great, and everyone's aware of it.
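[Editor's note: the "local tracker" worker model described in the 18:56–19:01 exchange (each process claims pending items from a shared database, processes them, and writes the new status back, so extra processes can join at any time) can be sketched roughly as below. This is not qwarc's actual code; the table and column names are illustrative assumptions.]

```python
import sqlite3

def claim_item(conn):
    """Atomically claim one 'todo' item, or return None if none are left."""
    # Take the write lock up front so two workers can't grab the same row.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id, url FROM items WHERE status = 'todo' LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE items SET status = 'claimed' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row

def worker(db_path):
    # isolation_level=None -> autocommit mode; transactions are managed
    # explicitly in claim_item() above.
    conn = sqlite3.connect(db_path, isolation_level=None)
    while True:
        item = claim_item(conn)
        if item is None:
            break  # nothing left to do; workers can join or leave at any time
        item_id, url = item
        # ... fetch `url` and write WARC records here ...
        conn.execute("UPDATE items SET status = 'done' WHERE id = ?", (item_id,))
```

Because all coordination happens through the database, "adding more concurrency" is just starting another `worker()` process pointed at the same file, which matches how extra qwarc processes were added mid-run above.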