-
JAA
So turns out that some files aren't downloadable from FOIAonline:
-
JAA
> Message The request was rejected because the URL contained a potentially malicious String "%3B"
-
JAA
The filename contains a semicolon...
-
JAA
No, replacing %3B with a literal semicolon doesn't work either, nor does double-encoding it.
-
JAA
This appears to be the problem for many of the items that failed.
-
project10
rogue WAF strikes again
-
fireonlive
………….
-
fireonlive
fucking christ
-
appledash
lmao
-
JAA
Oh, it's even stupider than I thought.
-
JAA
So those document download links contain the filename, presumably for the Content-Disposition header.
-
JAA
But you can just change it (as long as you still send the right Referer etc.).
-
JAA
Guess I'll add detection for this org.springframework.security.web.firewall.StrictHttpFirewall.rejectedBlacklistedUrls thing and just replace the filename with something generic for those.
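A minimal sketch of that detection and rewrite, assuming the filename is the last path segment of the download URL (the log doesn't say where it sits) and that the rejection text shows up in the response body. The function names, the marker list and the `file` placeholder are hypothetical, not taken from the actual grab script:

```python
from urllib.parse import urlsplit, urlunsplit

# Marker strings taken from the error message and class name quoted above.
REJECTION_MARKERS = (
    "StrictHttpFirewall",
    "potentially malicious String",
)


def is_firewall_rejection(body: str) -> bool:
    """Return True if a response body looks like the firewall rejection."""
    return any(marker in body for marker in REJECTION_MARKERS)


def with_generic_filename(url: str, generic: str = "file") -> str:
    """Swap the last path segment (assumed to be the filename) for a generic one."""
    parts = urlsplit(url)
    head, _, _ = parts.path.rpartition("/")
    return urlunsplit(parts._replace(path=f"{head}/{generic}"))
```

The retry would then request `with_generic_filename(url)` with the same Referer and other headers as the original request.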
-
JAA
The bruteforcing still didn't happen, by the way. Machine was busy with the known existing requests and uploading.
-
JAA
Since I need to retry these items yet again, I'm not sure it will happen, I'm afraid.
-
JAA
I did briefly sample some of the 'missing' IDs, and those all seemed to fail (i.e. 403 on the API), so hopefully there isn't much missing.
-
JAA
I'll also skip over files that can't be downloaded. Until now, it would fail the entire item (= request).
-
JAA
I also bumped the timeout since some API requests and some large file downloads ran into that.
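Roughly, the skip-on-failure behaviour could look like the sketch below. It assumes a caller-supplied `fetch(url, timeout)` function; `download_item`, the 300-second default and the returned dictionaries are all hypothetical and not from the actual script:

```python
from typing import Callable, Iterable


def download_item(
    file_urls: Iterable[str],
    fetch: Callable[[str, float], bytes],
    timeout: float = 300.0,  # hypothetical value; the log only says it was bumped
) -> tuple[dict[str, bytes], dict[str, str]]:
    """Fetch every file of one item, recording failures instead of aborting."""
    downloaded: dict[str, bytes] = {}
    skipped: dict[str, str] = {}
    for url in file_urls:
        try:
            downloaded[url] = fetch(url, timeout)
        except Exception as exc:  # broad on purpose: a failure skips only this file
            skipped[url] = repr(exc)
    return downloaded, skipped
```

Keeping the skipped URLs alongside the successful ones makes it possible to report which files of a request are missing.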
-
JAA
This is probably as good as it'll get.
-
JAA
Oops, I played with a bruteforce sample and got myself banned it seems. Let's hope it doesn't last long.
-
JAA
Actually, can't connect from elsewhere either. Uh oh...
-
JAA
Ok, it's back.
-
JAA
Needless to say, bruteforcing won't work if that's what happens.
-
JAA
But also, not a single hit on that sample.
-
Peroniko
While I love Letterboxd, it is ultimately a reskin of TMDB data with a few social media features. I would love it if they implemented better guidelines about what is a review and what is a comment. There are too many one-liners on any decently popular film.
-
Peroniko
Still better than Goodreads though. Amazon ruined that site
-
FavoritoHJS
fyi it appears discord is changing how cdn links work, this is likely to break the other half of the uploads that didn't get lit ablaze by dropbox dropping the box or imgur failing to image...
-
FavoritoHJS
considering how much important knowledge is in non-crawled, almost certainly non-backed-up guilds there, i wonder if a proper project would be worthwhile...
-
Sanqui
#discard
-
JAA
Turns out that there are entries on FOIAonline which can't be found by the search (at least with how I used it), but they aren't in my bruteforce list either. Two examples:
foiaonline.gov/foiaonline/action/pu…er=DOI-FWS-2023-003849&type=Request
foiaonline.gov/foiaonline/action/pu…Number=DOJ-2020-000763&type=Request
-
JAA
Probably not much that can be done about that. :-/
-
JAA
They don't even show up when you specifically search for those tracking numbers.
-
FavoritoHJS
about the discord cdn shenanigans... it appears it means all cdn links to discord will break outside of discord...
-
FavoritoHJS
and i have seen many MANY cases of a discord cdn link being used for a download that ought to be persistent...
-
JAA
And this is still not the channel to discuss it.
-
imer
-> #discard
-
h2ibot
JustAnotherArchivist edited FOIAonline (+2477, Document site quirks):
wiki.archiveteam.org/?diff=50912&oldid=50898
-
JAA
Something broke at FOIAonline about 15 minutes ago. Getting a lot more errors now.
-
JAA
FOIAonline is offline now, happened sometime in the past hour or so.
-
JAA
I was hoping it'd last a bit longer since they said that today would be the last day of access and it'd be inaccessible tomorrow, but oh well.
-
JAA
I got the vast majority of discoverable content, I think.
-
fireonlive
🪦 rip
-
fireonlive
thanks JAA
-
h2ibot
JustAnotherArchivist edited FOIAonline (-44, It's dead, Jim.):
wiki.archiveteam.org/?diff=50913&oldid=50912
-
thuban
good work, JAA!
-
fireonlive
for sure