-
flashfire42
the fuck
-
Pedrosso
Made a table of Steam Workshops (the table starting out collapsed for obvious reasons)
-
fireonlive
nice
-
Pedrosso
ye
-
nicolas17
>283 kilobytes
-
DJ
anon.cafe is shutting down on March 15
-
nicolas17
what is it?
-
DJ
It's an imageboard, part of a webring. Shutting down because of operating costs
anon.cafe/meta/res/16467.html#16486
-
DJ
Oh sorry, that's not the board owner; it's just speculation. They don't know.
-
pabs
pokechu22: a jira
jira.ecmwf.int
-
pokechu22
thanks
-
pokechu22
I'm going to try to get something started on those soon
-
pokechu22
I'm pretty sure the database doesn't actually need to be saved to get attachments, as the same URL extraction issue that causes a bunch of junk relative URLs for attachments means that all attachments get logged... so that simplifies things a bit
-
h2ibot
JustAnotherArchivist edited Current Projects (+1, Fix date):
wiki.archiveteam.org/?diff=51662&oldid=51658
-
h2ibot
FireonLive edited Current Projects (+16, move Blogger to long-term to reflect new…):
wiki.archiveteam.org/?diff=51663&oldid=51662
-
h2ibot
Pokechu22 edited Jira (+50, /* Not yet archived */…):
wiki.archiveteam.org/?diff=51664&oldid=51661
-
h2ibot
Pokechu22 edited Jira (+244, /* Strategy */ database isn't needed; link script):
wiki.archiveteam.org/?diff=51665&oldid=51664
-
h2ibot
Switchnode edited Deathwatch (+390, /* 2024 */ add world of tanks forums):
wiki.archiveteam.org/?diff=51668&oldid=51649
-
h2ibot
Entartet edited Deathwatch (+231, Added thebillionscompanion.net.):
wiki.archiveteam.org/?diff=51669&oldid=51668
-
h2ibot
Pokechu22 edited Games/Engines, Platforms and Hostings (+12, /* PC and Web */ [[Steam]]):
wiki.archiveteam.org/?diff=51670&oldid=50184
-
pokechu22
Hmm, `(echo a; echo b; echo c) | zstdgrep -e 'a' -e 'b'` gives no output for me but `zstdgrep -e 'a'` does as does `zgrep -e 'a' -e 'b'` or `grep -e 'a' -e 'b'` - this also happened when I used zstdgrep on a .gz file. Is this a bug or have I misunderstood something about zstdgrep?
-
JAA
This is a bug.
-
JAA
zstdless has similar issues with option parsing:
facebook/zstd #2880
-
pokechu22
Oof
-
JAA
Er, zstdless had*, although I haven't verified whether everything behaves correctly now.
-
pokechu22
I didn't even intend to type zstdgrep the first time, glad I noticed the missing output (I was verifying that extracting JIRA attachments from junk that gets logged in the meta-warc would work by comparing it with one where we extracted it from the DB)
-
JAA
Yeah, zstdgrep is fine for very simple cases, but if in doubt, it's better to use `zstdcat | grep ...` instead.
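For scripted checks, the same "decompress first, then match" idea can be sketched in Python. This is only an illustration: it uses gzip as a stand-in because zstd is not in the standard library on most Python versions, the function name `grep_compressed` is hypothetical, and Python regular expressions are not identical to grep's default syntax.

```python
import gzip
import io
import re


def grep_compressed(raw: bytes, patterns: list[str]) -> list[str]:
    """Return the lines of a gzip-compressed blob that match any pattern.

    Hypothetical helper: decompression and matching are separate steps,
    so no wrapper script has to parse grep-style options.
    """
    # Combine the patterns the way repeated `-e` options would: match any.
    regex = re.compile("|".join(f"(?:{p})" for p in patterns))
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as stream:
        return [line.rstrip("\n") for line in stream if regex.search(line)]


# Same input as the shell test above: three lines, two patterns.
sample = gzip.compress(b"a\nb\nc\n")
matches = grep_compressed(sample, ["a", "b"])
print(matches)
```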
-
pokechu22
... ok, new problem, and this seems like it's not a grep one: from view-source:https://web.archive.org/web/20230929192111id_/https://bugs.mojang.com/browse/MC-180529 archivebot saw data-downloadurl="application/zip:Normal_Font_TT_v3.zip:https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip" and extracted
-
pokechu22
bugs.mojang.com/browse/application/…chment/286387/Normal_Font_TT_v3.zip but it *didn't* do anything with data-downloadurl="text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log"
-
pokechu22
It doesn't seem to have extracted anything along the lines of browse/text.*\.log:
-
pokechu22
hmm, it also didn't extract any .nbt or .dat files - does archivebot have a list of extensions it'll assume might be files when doing extraction from data attributes?
-
JAA
This would be on wpull, not AB.
-
JAA
That should pass `is_likely_link`.
-
JAA
Oh hmm, unless it's the `mimetype.guess` check.
-
JAA
`mimetypes.guess_type` *
-
JAA
Yeah, it fails the `is_likely_link` check.
-
pokechu22
alright, I guess we do need the database after all :|
-
JAA
Yep, `mimetypes.guess_type` doesn't know about `.log`.
-
JAA
It wouldn't be in the DB either.
-
JAA
`mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `(None, None)`
-
JAA
Ah
-
pokechu22
this also means I need to find the database for hub.spigotmc.org which we ran a while back and saved the DB for, but I don't think I ever extracted outlinks from
-
pokechu22
I'll start !a < list jobs for several of the JIRA instances since we are running low on time, and then ping you for the DBs to be saved
-
thuban
fwiw on 3.11 `mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `('text/plain', None)`
-
pokechu22
It probably still won't like .dat or .nbt though
-
thuban
indeed not
-
JAA
thuban: I'm still getting `(None, None)` on 3.11.
-
JAA
I think the `mimetypes` module does some discovery stuff in /usr/share or something like that.
-
JAA
So it can differ from system to system.
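A quick way to see this on a given machine is to check which of the candidate files in `mimetypes.knownfiles` actually exist, and to compare the module-level guess with a fresh `mimetypes.MimeTypes()` instance. This is a rough sketch: the helper name `existing_known_files` is made up, and the claim that a fresh instance starts from the built-in table only is my understanding of recent Python 3 versions, not something verified across all of them.

```python
import mimetypes
import os


def existing_known_files() -> list[str]:
    """Return the candidate mime.types paths that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


name = "hs_err_pid9900.log"

# Module-level lookup: may have been augmented by whatever system files exist.
system_guess = mimetypes.guess_type(name, strict=False)[0]

# Fresh instance: assumed to start from the built-in table on recent Pythons.
builtin_guess = mimetypes.MimeTypes().guess_type(name, strict=False)[0]

print("system files found:", existing_known_files())
print("module-level guess:", system_guess)
print("fresh-instance guess:", builtin_guess)
```

The results are deliberately not shown here, since they depend on the system the snippet runs on.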
-
thuban
ah, so it does
-
JAA
Not /usr/share but same concept. :-)
-
thuban
you beat me to it, new github is awful v_v
-
JAA
It sure is, I do more and more stuff locally with a clone instead.
-
JAA
Especially since code search is loginwalled anyway.
-
thuban
anyway, perhaps the ab pipelines should be fitted with local mimetype files?
-
JAA
Perhaps wpull should ship its own list and init the `mimetypes` module with that.
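A minimal sketch of that idea, assuming a bundled list in the usual `mime.types` layout (type followed by extensions). The list contents and the name `BUNDLED_MIME_TYPES` are invented for illustration and are not wpull's; the sketch also loads the list into a separate `mimetypes.MimeTypes` instance rather than re-initialising the module-level state, which is a slightly different approach from the one suggested.

```python
import io
import mimetypes

# Hypothetical bundled list, one "<type> <ext> [<ext> ...]" entry per line.
# The type chosen for .dat and .nbt is a placeholder, not a registered mapping.
BUNDLED_MIME_TYPES = """\
# extensions seen in Jira attachments
text/plain log
application/octet-stream dat nbt
"""

# A dedicated instance, so the lookups below do not depend on which
# mime.types files happen to be installed on the host.
bundled = mimetypes.MimeTypes()
bundled.readfp(io.StringIO(BUNDLED_MIME_TYPES))

for name in ("hs_err_pid9900.log", "level.dat", "chunk.nbt"):
    # The .dat and .nbt file names are made up for the example.
    print(name, bundled.guess_type(name, strict=False))
```

With a list like this, the extension check would give the same answer on every pipeline, regardless of the host's own files.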
-
thuban
ah! i didn't see that option. yes, that would simplify things
-
JAA
Apache's list doesn't even have .gz and .zst...
-
JAA
Looks like they're open to changes:
apache/httpd #372
-
h2ibot
Pokechu22 edited Jira (+163, the database is still needed):
wiki.archiveteam.org/?diff=51671&oldid=51666
-
h2ibot