00:08:43 FOIAonline progress has picked up significantly since I added those extra processes. :-)
00:38:09 :)
01:45:32 trouble at Kick? https://old.reddit.com/r/OutOfTheLoop/comments/16rj0g8/whats_going_on_with_the_streaming_platform_kick/
01:52:12 jesus christ
09:25:10 I'm getting "Error response from daemon: Container ... is not running" (with a long alphanumeric string in place of ...)
09:48:50 yert: any context for what you're doing and what you're expecting to see?
09:56:00 @imer I last set the warrior (some time ago) to work on the Telegram project, but now when I switch it on and it gets just past the 'checking for updates' part, it starts printing that error over and over and the browser interface isn't accessible
09:56:47 sometimes prints 'seesaw kit did not successfully boot up and update'
09:58:56 oh, warrior stuff. not quite sure there. #warrior would be the channel for that. I see someone asked a similar question there.. so might not be on your end?
09:59:20 Could try "reinstalling" it from scratch, see if it got corrupted somehow
09:59:23 oh yeah that was me, but I didn't get a response there
09:59:29 ah
10:00:32 Full set of errors I'm getting:
10:00:33 Please wait while the Warrior checks for updates...
10:00:36 over again
10:01:49 Ok telegram is switched off
10:02:12 I dunno what the other issue is but we are hella clogged and telegram is paused
14:35:05 Hello, so I got some info on Discord's CDN which is going to prevent permanent CDN links - https://i.imgur.com/lqgDQbB.png this hasn't been announced publicly yet but they have already rolled out the parameters, you will notice if you copy a new cdn link from inside the Discord app
14:35:06 (https://cdn.discordapp.com/attachments/1046131690621378560/1156235969419288656/image.png?ex=65143c28...)
currently you can still view the link without the parameters but this will soon change
14:59:32 https://techcrunch.com/2023/09/26/google-podcasts-to-shut-down-in-2024-with-listeners-migrated-to-youtube-music/
15:17:59 HP_Archivist: re #archiveteam looks like yes yesterday https://archive.fart.website/archivebot/viewer/job/2023092513511565kk7
15:19:50 job still running but looks close to done
16:09:18 nstrom|m: Ah, very good. Thanks!
16:12:52 interesting re: discord, guess they want to lower their cdn bills or something?
16:15:07 I suspect it's because of all the malware that is hosted on their CDN. Discord's links are legitimate, and are often not blocked by antiviruses and browsers
16:18:06 oh are they quite liberal with what they let you upload?
16:18:41 I think anything can be uploaded
16:18:54 I don't know the size limit though
16:20:41 ahh
16:20:56 would make sense then
16:48:54 I'm not surprised in the slightest, was gonna happen sooner or later
16:49:45 also upload size limit is 25 / 50 / 500 MB for free / paid basic / paid pro
16:56:31 Is there any better priced service than webshare for a lot of shared DC IPV4 proxies with an ok amount of bandwidth? 500 @ 1TB with the high concurrency option costs about $35/mo, but I need more like 1000-200 IPs, ideally, though the "reputation" doesnt matter really at all, I'm making a dump of a service that uses 2^32 random ids and with 600 IPs my eta is about 6 months, which would be fine if these proxies I already have werent going to
16:56:31 expire
17:05:06 1000-2000*
20:25:12 Telegram is going to be paused for some time.
20:28:11 YouTube will also be paused for some time
20:38:14 thanks arkiver
20:42:45 Reddit backlog will be moved away, starting with only new stuff.
20:42:58 URLs backlog will also be moved away, we're going to focus on only news.
20:43:40 Reddit will remain paused while I move items around
20:45:16 What is a good project to archive at the moment, because I am getting either rate limited or the items won't upload (reddit). I am doing urlteam2, because it's the only one that works
20:46:31 Peronikola: i hope to have more on this soon - i'm cutting in what we archive at the moment and am moving data around
21:04:17 JAA: what is the latest on the FOIA stuff?
21:04:54 if we know the possible years and codes we can somewhat easily enumerate all FOIAs
21:05:03 (FOIAs the right word? i don't know)
21:05:49 ah actually you're getting it i read :)
21:07:27 arkiver: Yes, two thirds done now.
21:07:39 did you also enumerate all those that may not have been returned by search results?
21:07:46 ETA: 33 hours
21:07:57 No, I only did the search.
21:08:13 You have to also know the type of the FOIA request, not only its ID.
21:08:15 and - sorry just to confirm - you're also making that request that gives the actual info and PDFs attached to a FOIA right?
21:08:33 JAA: yeah, but if we know the year and type, we can enumerate all lower than those returned by search results
21:08:49 i started a derive task for your first FOIA item with WARCs
21:09:14 I'm fetching the page, the API request for request details, the API request for attachments + referenced files (if applicable via flag in request details response), the API requests for the records list, and the records files.
21:09:28 alright!
21:09:41 Please don't derive yet.
21:09:45 oh
21:09:46 no?
21:09:50 oops i can abort it
21:10:00 I'm still uploading.
21:10:11 is that a problem?
21:10:42 part0 and part1 is probably fine to derive.
21:10:51 part2 would pile up archive.php tasks until uploads get rejected, too.
21:10:53 i'm doing part0
21:11:17 I don't expect to have to change anything on parts 0 and 1.
21:11:23 arkiver: the orange stuff is stressing me a little bit since the AB job has gotten banned twice since the site came back up it seems (with it still running at a delay of 1000ms-2000ms, so for something distributed we probably want to do it even slower), and we're on borrowed time still since the site officially went down a while back; it's only up again because of a special
21:11:26 plea to customer support
21:11:37 But I usually doublecheck that the uploads were fine and only then issue the derive.
21:12:29 JAA: would it be easy for you to also queue IDs into your setup for IDs that we might find through the sequential IDs check? or are you confident search results returned everything?
21:12:31 pokechu22: yeah
21:12:43 pokechu22: hold on
21:13:19 arkiver: It's a gov system. Can't be confident in anything at all. But yes, I can do that when this primary retrieval finishes.
21:13:52 Might need to add some more checks for things that don't exist, but not a big deal.
21:14:01 alright
21:14:14 let's see how playback is in the Wayback Machine :P
21:14:27 Record list retrieval is via POST...
21:14:38 i dont remember - were these POST requests to a generic URL, or was the URL still clearly tied to a FOIA submission?
21:15:00 The latter, but the body is necessary for pagination.
21:15:20 right
21:46:00 "Anthony Rota resigns as Speaker after honouring Ukrainian veteran who fought with Nazi unit" https://www.cbc.ca/news/politics/speaker-anthony-rota-resignation-1.6978422
21:46:36 https://anthonyrota.libparl.ca and https://lop.parl.ca/sites/ParlInfo/default/en_CA/People/Profile?personId=9445 (i'll ask #archivebot) but might be other 'official canada' resources that should be archived
21:58:06 HP_Archivist: re the c64 link on #archiveteam, I did gamebase64 related URLs in #archivebot already.
only remaining jobs are the open directories, need to process WARCs and find them
22:00:53 HP_Archivist: there were 4 jobs www/domain for gamebase64.com and gb64.com, also b22.com and some related sourceforge bits
23:19:32 pabs: Thanks for the update. Glad all the affiliated domains are being captured, too