-
JAA
FOIAonline progress has picked up significantly since I added those extra processes. :-)
-
fireonlive
:)
-
fireonlive
-
joepie91|m
jesus christ
-
yert
I'm getting "Error response from daemon: Container ... is not running" (with a long alphanumeric string in place of ...)
-
imer
yert: any context for what you're doing and what you're expecting to see?
-
yert
@imer I last set the warrior (some time ago) to work on the Telegram project, but now when I switch it on and it gets just past the 'checking for updates' part, it starts printing that error over and over and the browser interface isn't accessible
-
yert
sometimes prints 'seesaw kit did not successfully boot up and update'
-
imer
oh, warrior stuff. not quite sure there. #warrior would be the channel for that. I see someone asked a similar question there.. so might not be on your end?
-
imer
Could try "reinstalling" it from scratch, see if it got corrupted somehow
-
yert
oh yeah that was me, but I didn't get a response there
-
imer
ah
-
yert
Full set of errors I'm getting:
-
yert
Please wait while the Warrior checks for updates...
-
yert
over again
-
flashfire42
Ok telegram is switched off
-
flashfire42
I dunno what the other issue is but we are hella clogged and telegram is paused
-
Darken
Hello, so I got some info on Discord's CDN which is going to prevent permanent CDN links -
i.imgur.com/lqgDQbB.png this hasn't been announced publicly yet, but they have already rolled out the parameters; you will notice if you copy a new CDN link from inside the Discord app
-
Darken
(
cdn.discordapp.com/attachments/1046…5969419288656/image.png?ex=65143c28...) currently you can still view the link without the parameters but this will soon change
-
pabs
-
nstrom|m
-
nstrom|m
job still running but looks close to done
-
HP_Archivist
nstrom|m: Ah, very good. Thanks!
-
fireonlive
interesting re: discord, guess they want to lower their cdn bills or something?
-
Peroniko
I suspect it's because of all the malware that is hosted on their CDN. Discord's links are legitimate, and are often not blocked by antiviruses and browsers
-
fireonlive
oh are they quite liberal with what they let you upload?
-
Peroniko
I think anything can be uploaded
-
Peroniko
I don't know the size limit though
-
fireonlive
ahh
-
fireonlive
would make sense then
-
imer
I'm not surprised in the slightest, was gonna happen sooner or later
-
imer
also upload size limit is 25 / 50 / 500 MB for free / paid basic / paid pro
-
audrooku|m
Is there any better priced service than webshare for a lot of shared DC IPv4 proxies with an ok amount of bandwidth? 500 @ 1TB with the high concurrency option costs about $35/mo, but I need more like 1000-200 IPs, ideally, though the "reputation" doesn't matter really at all, I'm making a dump of a service that uses 2^32 random ids and with 600 IPs my eta is about 6 months, which would be fine if these proxies I already have weren't going to
-
audrooku|m
expire
-
audrooku|m
1000-2000*
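The scaling behind this estimate can be checked with a little arithmetic. The sketch below is only a rough illustration: it assumes throughput scales linearly with the number of proxy IPs and that a month is 365/12 days, neither of which is stated above, and the function names are made up for the example.

```python
# Figures taken from the message above.
ID_SPACE = 2**32          # number of random IDs to try
BASELINE_IPS = 600        # proxies currently in use
BASELINE_MONTHS = 6.0     # reported ETA with those proxies

# Assumption: an average month of 365/12 days.
DAYS_PER_MONTH = 365 / 12


def eta_months(ips: int) -> float:
    """ETA in months, assuming throughput is proportional to the IP count."""
    return BASELINE_MONTHS * BASELINE_IPS / ips


def implied_rate_per_ip() -> float:
    """Requests per second per IP implied by the baseline ETA."""
    seconds = BASELINE_MONTHS * DAYS_PER_MONTH * 86400
    return ID_SPACE / seconds / BASELINE_IPS


if __name__ == "__main__":
    for ips in (600, 1000, 2000):
        print(f"{ips} IPs: about {eta_months(ips):.1f} months")
    print(f"implied rate: {implied_rate_per_ip():.2f} requests/s per IP")
```

Under those assumptions, 1000 to 2000 IPs would bring the ETA down to somewhere between roughly 3.6 and 1.8 months, at a bit under half a request per second per IP.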
-
arkiver
Telegram is going to be paused for some time.
-
arkiver
YouTube will also be paused for some time
-
fireonlive
thanks arkiver
-
arkiver
Reddit backlog will be moved away, starting with only new stuff.
-
arkiver
URLs backlog will also be moved away, we're going to focus on only news.
-
arkiver
Reddit will remain paused while I move items around
-
Peronikola
What is a good project to archive at the moment? I am getting either rate limited or the items won't upload (reddit). I am doing urlteam2, because it's the only one that works
-
arkiver
Peronikola: i hope to have more on this soon - i'm cutting down on what we archive at the moment and am moving data around
-
arkiver
JAA: what is the latest on the FOIA stuff?
-
arkiver
if we know the possible years and codes we can somewhat easily enumerate all FOIAs
-
arkiver
(FOIAs the right word? i don't know)
-
arkiver
ah actually you're getting it i read :)
-
JAA
arkiver: Yes, two thirds done now.
-
arkiver
did you also enumerate all those that may not have been returned by search results?
-
JAA
ETA: 33 hours
-
JAA
No, I only did the search.
-
JAA
You have to also know the type of the FOIA request, not only its ID.
-
arkiver
and - sorry just to confirm - you're also making that request that gives the actual info and PDFs attached to a FOIA right?
-
arkiver
JAA: yeah, but if we know the year and type, we can enumerate all lower than those returned by search results
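The idea of filling in the gaps below the highest ID seen in search results could be sketched roughly as follows. This is an illustration only: the `TYPE-YEAR-SEQUENCE` ID layout, the six-digit zero padding, and the function name `missing_ids` are assumptions made for the example, not the actual FOIAonline ID format.

```python
from collections import defaultdict


def missing_ids(found_ids, width=6):
    """Return candidate IDs below each (type, year) maximum that search missed.

    Assumes hypothetical IDs of the form "<TYPE>-<YEAR>-<SEQUENCE>" where the
    sequence is a zero-padded integer counting up from 1.
    """
    # Group the sequence numbers seen in search results by "<TYPE>-<YEAR>".
    seen = defaultdict(set)
    for fid in found_ids:
        prefix, seq = fid.rsplit("-", 1)
        seen[prefix].add(int(seq))

    # For each group, list every sequence number below the maximum
    # that the search results did not contain.
    candidates = []
    for prefix in sorted(seen):
        seqs = seen[prefix]
        for n in range(1, max(seqs)):
            if n not in seqs:
                candidates.append(f"{prefix}-{n:0{width}d}")
    return candidates


if __name__ == "__main__":
    found = ["TYPEA-2020-000001", "TYPEA-2020-000004", "TYPEB-2019-000002"]
    for candidate in missing_ids(found):
        print(candidate)
```

Each candidate would still have to be requested to find out whether it exists, which fits the later remark about adding checks for things that don't exist.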
-
arkiver
i started a derive task for your first FOIA item with WARCs
-
JAA
I'm fetching the page, the API request for request details, the API request for attachments + referenced files (if applicable via flag in request details response), the API requests for the records list, and the records files.
-
arkiver
alright!
-
JAA
Please don't derive yet.
-
arkiver
oh
-
arkiver
no?
-
arkiver
oops i can abort it
-
JAA
I'm still uploading.
-
arkiver
is that a problem?
-
JAA
part0 and part1 are probably fine to derive.
-
JAA
part2 would pile up archive.php tasks until uploads get rejected, too.
-
arkiver
i'm doing part0
-
JAA
I don't expect to have to change anything on parts 0 and 1.
-
pokechu22
arkiver: the orange stuff is stressing me a little bit since the AB job has gotten banned twice since the site came back up it seems (with it still running at a delay of 1000ms-2000ms, so for something distributed we probably want to do it even slower), and we're on borrowed time still since the site officially went down a while back; it's only up again because of a special
-
pokechu22
plea to customer support
-
JAA
But I usually doublecheck that the uploads were fine and only then issue the derive.
-
arkiver
JAA: would it be easy for you to also queue IDs into your setup for IDs that we might find through the sequential IDs check? or are you confident search results returned everything?
-
arkiver
pokechu22: yeah
-
arkiver
pokechu22: hold on
-
JAA
arkiver: It's a gov system. Can't be confident in anything at all. But yes, I can do that when this primary retrieval finishes.
-
JAA
Might need to add some more checks for things that don't exist, but not a big deal.
-
arkiver
alright
-
arkiver
let's see how playback is in the Wayback Machine :P
-
JAA
Record list retrieval is via POST...
-
arkiver
i don't remember - were these POST requests to a generic URL, or was the URL still clearly tied to a FOIA submission?
-
JAA
The latter, but the body is necessary for pagination.
-
arkiver
right
-
fireonlive
"Anthony Rota resigns as Speaker after honouring Ukrainian veteran who fought with Nazi unit"
cbc.ca/news/politics/speaker-anthony-rota-resignation-1.6978422
-
fireonlive
anthonyrota.libparl.ca and
lop.parl.ca/sites/ParlInfo/default/en_CA/People/Profile?personId=9445 (i'll ask #archivebot) but might be other 'official canada' resources that should be archived
-
pabs
HP_Archivist: re the c64 link on #archiveteam, I did gamebase64 related URLs in #archivebot already. only remaining jobs are the open directories, need to process WARCs and find them
-
pabs
HP_Archivist: there were 4 jobs www/domain for gamebase64.com and gb64.com, also b22.com and some related sourceforge bits
-
HP_Archivist
pabs: Thanks for the update. Glad all the affiliated domains are being captured, too