-
fireonlive
-_-
-
JAA
From another 1% sample, I got a size estimate of 2.6 TiB. Shows how uncertain that is due to the long tail of requests with many/large records.
-
JAA
s/another/a/ I guess, the previous one was 1‰, not 1%.
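A minimal Python sketch of why a small sample gives such an uncertain total when a few requests have many or large records. It scales the sampled sizes up by the sampling fraction and bootstraps a rough 95% range. It assumes you have a list of per-request sizes from the sample. The function name `bootstrap_total` is hypothetical and is not taken from the actual download scripts.

```python
import random

def bootstrap_total(sample_sizes, fraction, rounds=1000, seed=0):
    """Scale a sampled total up to the full population and bootstrap a ~95% range.

    sample_sizes: per-request sizes (e.g. bytes) observed in the sample.
    fraction: sampling fraction, e.g. 0.01 for a 1% sample, 0.001 for 1 per mille.
    Returns (point_estimate, low, high).
    """
    if not sample_sizes or not 0 < fraction <= 1:
        raise ValueError("need a non-empty sample and 0 < fraction <= 1")
    rng = random.Random(seed)
    n = len(sample_sizes)
    point = sum(sample_sizes) / fraction
    # Resample the sample with replacement; a long tail shows up as a wide spread.
    estimates = sorted(
        sum(rng.choices(sample_sizes, k=n)) / fraction for _ in range(rounds)
    )
    low = estimates[int(0.025 * rounds)]
    high = estimates[max(int(0.975 * rounds) - 1, 0)]
    return point, low, high
```

If `low` and `high` are far apart, a handful of huge requests dominate the estimate, and two independent samples can easily disagree by a large factor.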
-
h2ibot
PaulWise edited IRC/Logs (+181, add more, done copyleft.org ones):
wiki.archiveteam.org/?diff=50885&oldid=50312
-
h2ibot
PaulWise edited Mailman2 (+246, add more, done copyleft.org ones):
wiki.archiveteam.org/?diff=50886&oldid=50882
-
h2ibot
PaulWise edited IRC/Logs (+48, AT IRC logs done :)):
wiki.archiveteam.org/?diff=50887&oldid=50885
-
JAA
My FOIAonline search bruteforcing got some duplicates. Wat.
-
JAA
I listed the requests by received date for each day starting from 2000-01-01, then went through the pagination of the results.
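The enumeration described above (one query per received date, then every results page) can be sketched in Python as follows. `fetch_page(day, page)` is a hypothetical callable standing in for whatever actually queries FOIAonline; it is assumed to return the request IDs on one page (1-based) and an empty list once the pagination runs out.

```python
import datetime

def iter_days(start, end):
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)

def crawl(fetch_page, start, end):
    """Yield (day, page, request_id) for every listed request.

    fetch_page(day, page) is hypothetical: it returns the request IDs shown on
    one results page for requests received on `day`, or [] when exhausted.
    """
    for day in iter_days(start, end):
        page = 1
        while True:
            ids = fetch_page(day, page)
            if not ids:  # empty page: no more results for this day
                break
            for request_id in ids:
                yield day, page, request_id
            page += 1
```

Keeping the day and page alongside each ID makes it possible to see later where a given request was listed.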
-
JAA
These dupes are all over the place, oldest one is in 2016. Something is pretty broken there. lol
-
thuban
nice
-
JAA
Yep, can confirm, CBP-2016-043045 on 2023-05-26 shows up on page 10 and 11.
-
JAA
2016-05-26*
-
JAA
And there's one duplicate that appears on two different dates. Are you fucking kidding me?
-
JAA
DON-NAVY-2022-000829 appears on 2021-10-21 and 2022-03-01.
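A minimal Python sketch for spotting duplicates like these. It assumes the listing was recorded as `(date, page, request_id)` tuples in crawl order; `find_duplicates` is a hypothetical helper, not part of any real tooling.

```python
from collections import defaultdict

def find_duplicates(listing):
    """Group listing positions by request ID.

    listing: iterable of (date, page, request_id) tuples in crawl order.
    Returns (unique_ids_in_first_seen_order, {request_id: [(date, page), ...]})
    where the dict only contains IDs seen more than once.
    """
    seen = defaultdict(list)
    for date, page, request_id in listing:
        seen[request_id].append((date, page))
    unique = list(seen)  # dicts keep insertion order, i.e. first-seen order
    dupes = {rid: locs for rid, locs in seen.items() if len(locs) > 1}
    return unique, dupes
```

Duplicates that span different dates, like DON-NAVY-2022-000829, are the entries where `len({date for date, _ in locs}) > 1`; deduplicating on the request ID alone avoids fetching the same request twice.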
-
h2ibot
JacksonChen666 edited URLTeam (+613, add go.enderman.ch URL shortener):
wiki.archiveteam.org/?diff=50889&oldid=50878
-
h2ibot
Contributor edited List of websites excluded from the Wayback Machine (-41, added Finnish-language 'perunamaa.net'; sorted…):
wiki.archiveteam.org/?diff=50890&oldid=50766
-
h2ibot
JustAnotherArchivist edited List of websites excluded from the Wayback Machine (+231, Add note about bot):
wiki.archiveteam.org/?diff=50891&oldid=50890
-
JAA
foiaonline.gov/foiaonline/action/pu…ber=EPA-R5-2013-010234&type=Request has so many records that the web interface doesn't even display the number.
-
JAA
The API returns `recordsTotal: 99999` but who knows whether that's true.
-
JAA
Seems unlikely. :-P
-
JAA
Or perhaps that's the limit of their system. Wouldn't even surprise me.
-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=50892&oldid=50891
-
» JAA wonders whether we should keep a list of websites *formerly* excluded from the WBM.
-
JAA
2023-09-24 14:58:54.582Z INFO request:EPA-R5-2013-010234:Request Got 1827 records
-
JAA
It did not have 99999 records.
-
JAA
lol, once you get near the end of the pagination, the API does return 1827.
-
JAA
I'm glad I didn't rely on that number at all in my code.
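One way to avoid relying on `recordsTotal`, sketched in Python: keep paging until the server returns a short page, and only compare the reported total with the actual count afterwards. `fetch_page(offset)` is hypothetical, and the `{"data": [...], "recordsTotal": ...}` response shape is an assumption (it resembles the DataTables server-side format); the real FOIAonline API may differ.

```python
def fetch_all(fetch_page, page_size):
    """Collect every record by paging until a short page, ignoring recordsTotal.

    fetch_page(offset) is hypothetical and returns a dict shaped like
    {"data": [...], "recordsTotal": int}.
    Returns (records, every recordsTotal value the server reported).
    """
    records, reported = [], []
    offset = 0
    while True:
        resp = fetch_page(offset)
        batch = resp["data"]
        reported.append(resp.get("recordsTotal"))
        records.extend(batch)
        if len(batch) < page_size:  # short or empty page: end of results
            break
        offset += len(batch)
    # The reported total is only used as a sanity check, never to stop paging.
    if reported[-1] != len(records):
        print(f"warning: server reported {reported[-1]} records, got {len(records)}")
    return records, reported
```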
-
jacksonchen666
.quit
-
imer
added an advanced usage section with ipv6 instructions to the running projects with docker page on the wiki which needs approval please :)
-
h2ibot
Imer edited Running Archive Team Projects with Docker (+8056, Added Advanced usage section w/ ipv6 guide):
wiki.archiveteam.org/?diff=50893&oldid=50373
-
h2ibot
JustAnotherArchivist changed the user rights of User:Imer
-
imer
oh, that's me!
-
imer
thanks JAA
-
JAA
*record scratch*
-
h2ibot
Imer edited Running Archive Team Projects with Docker (+2078, Advanced usage: Resource control via…):
wiki.archiveteam.org/?diff=50894&oldid=50893
-
imer
that's all for now, if someone wants to give that a read over for bad wording and such that'd be great
-
imer
the URL template thingy links to www.webcitation.org which has an ssl error as the cert is only valid for non-www btw, not sure where to fix that
-
JAA
They used to have a cert valid for both. Ugh...
-
JAA
So now lots of links across the web to their snapshots are broken.
-
JAA
<palpatine_ironic.png>
-
fireonlive
sheesh lol
-
fireonlive
i get that www. is outdated but at least redirect it if you used it before ;)
-
h2ibot
JustAnotherArchivist edited Template:Url (-4, WebCite messed up their TLS certs in early 2022…):
wiki.archiveteam.org/?diff=50895&oldid=50604
-
JAA
I wonder wtf this cert is about:
crt.sh/?id=9706125760
-
fireonlive
oh that reminds me i had a dream that crt.sh was shutting down and i was kinda sad
-
fireonlive
@_@
-
fireonlive
hmm amazon issued… wonder if it’s one of their load balancer thingies
-
fireonlive
kinda like cloudflare and their shared san certs
-
JAA
Fortunately, crt.sh is 'just' one mirror of the (aggregated) CT logs, but yeah.
-
fireonlive
mm, and it’s open source so i guess you could set up your own postgres server that outputs html if you really wanted to
-
JAA
Yeah. I'm sure it requires a decent degree of masochism though.
-
fireonlive
haha for sure
-
that_lurker
And if you check the cert for the domain medicine20.com its a wildcard cert for *.jmirx.org
-
JAA
I quickly checked what's in the WBM in terms of actual CT logs. There's a fair bit, and the ones I sampled come from
archive.org/details/certificate-transparency, which I understood to be web crawls seeded by CT data, not CT logs themselves. The number of views on those items also makes it unlikely that they're only CT logs. So I wonder how complete that is.
-
JAA
Also, the key resource for actual CT logs is
github.com/google/certificate-trans…cba07cac8/docs/google/known-logs.md (and RFC 6962 for the API definition).
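RFC 6962 defines `GET <log>/ct/v1/get-sth`, whose JSON response includes `tree_size`, and `GET <log>/ct/v1/get-entries?start=…&end=…` with 0-based, inclusive indices; logs may return fewer entries than requested. A minimal Python sketch of walking one log from the start, assuming `log_url` is a log's base URL from the known-logs list. There are no retries, rate limiting, or Merkle proof verification here.

```python
import json
import urllib.request

def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)

def iter_entries(log_url, batch=256, fetch=fetch_json, tree_size=None):
    """Yield (index, entry) for every entry in an RFC 6962 log.

    Each entry is the raw JSON object from get-entries (leaf_input, extra_data).
    """
    base = log_url.rstrip("/")
    if tree_size is None:
        tree_size = fetch(f"{base}/ct/v1/get-sth")["tree_size"]
    start = 0
    while start < tree_size:
        end = min(start + batch, tree_size) - 1  # `end` is inclusive
        entries = fetch(f"{base}/ct/v1/get-entries?start={start}&end={end}")["entries"]
        if not entries:
            raise RuntimeError(f"log returned no entries for start={start}")
        # Advance by what was actually returned, since logs may cap the batch.
        for entry in entries:
            yield start, entry
            start += 1
```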
-
that_lurker
Would streaming the cert transparency logs to a log file with daily rotation + upload be the best way to archive them?
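A minimal Python sketch of the daily-rotation part of that idea: append each CT entry as a JSON line to a gzip file named after the UTC date, and hand the finished file to a hook when the date changes. The class name, the file naming, and the `on_rotate` hook (where an upload could be queued) are all hypothetical; where the entries come from, for example polling `get-entries`, is left out.

```python
import datetime
import gzip
import json
import os

class DailyRotatingWriter:
    """Append JSON records to <prefix>-YYYY-MM-DD.jsonl.gz, one file per UTC day."""

    def __init__(self, directory, prefix="ct", on_rotate=None):
        self.directory = directory
        self.prefix = prefix
        self.on_rotate = on_rotate  # hypothetical hook, e.g. to queue an upload
        self._day = None
        self._path = None
        self._fh = None

    def write(self, record, now=None):
        # Pass timezone-aware datetimes; naive ones are treated as local time.
        now = now or datetime.datetime.now(datetime.timezone.utc)
        day = now.astimezone(datetime.timezone.utc).date()
        if day != self._day:
            self.close()  # finish the previous day's file, if any
            self._day = day
            name = f"{self.prefix}-{day.isoformat()}.jsonl.gz"
            self._path = os.path.join(self.directory, name)
            self._fh = gzip.open(self._path, "at", encoding="utf-8")
        self._fh.write(json.dumps(record) + "\n")

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None
            if self.on_rotate:
                self.on_rotate(self._path)
```

Opening in append mode means a restart on the same day adds a new gzip member to the existing file, which `gzip` readers handle transparently.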
-
Barto
fireonlive: there's an open pgsql db behind crt.sh, so if they ever shut down, it will be an easy job to dump maybe :-)
-
fireonlive
ah yes that too :3
-
fireonlive
:)
-
JAA
As long as they don't shut down without announcement, but that seems unlikely given how popular the site is.
-
fireonlive
they linked to something called “getindie.wiki” which could be an interesting source:
twitter.com/MinecraftWikiEN/status/1706004084497502439
-
fireonlive
oops right, nitter share box copies a twitter url lol
-
Barto
:-)
-
JAA
FOIAonline download is looking fine so far. 20% done. Projected size is only 1.8 TiB now, but that might change since I didn't randomise the list of requests and am therefore going through them in chronological order.
-
h2ibot
JustAnotherArchivist created FOIAonline (+555, Created page with "{{Infobox project | URL =…):
wiki.archiveteam.org/?title=FOIAonline
-
h2ibot
JustAnotherArchivist edited Deathwatch (-221, /* 2023 */ Link to FOIAonline page):
wiki.archiveteam.org/?diff=50899&oldid=50879
-
JAA
That's after about 18 hours, so should easily finish in time.
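The projection above is a linear extrapolation from progress so far; a minimal Python sketch, assuming a constant rate. At 20% done after about 18 hours, that gives roughly 90 hours in total, i.e. about 72 hours remaining. As noted above, the requests are processed in chronological order rather than randomly, so the rate (and the size projection, which works the same way) may drift as the download moves into newer requests.

```python
def linear_projection(done_fraction, so_far):
    """Project (total, remaining) assuming progress continues at a constant rate.

    Works for elapsed time as well as for bytes downloaded so far.
    """
    if not 0 < done_fraction <= 1:
        raise ValueError("done_fraction must be in (0, 1]")
    total = so_far / done_fraction
    return total, total - so_far
```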