00:06:14 -_-
00:27:31 From another 1% sample, I got a size estimate of 2.6 TiB. Shows how uncertain that is due to the long tail of requests with many/large records.
00:28:23 s/another/a/ I guess, the previous one was 1‰, not 1%.
00:52:21 PaulWise edited IRC/Logs (+181, add more, done copyleft.org ones): https://wiki.archiveteam.org/?diff=50885&oldid=50312
00:56:22 PaulWise edited Mailman2 (+246, add more, done copyleft.org ones): https://wiki.archiveteam.org/?diff=50886&oldid=50882
01:27:32 PaulWise edited IRC/Logs (+48, AT IRC logs done :)): https://wiki.archiveteam.org/?diff=50887&oldid=50885
02:48:08 My FOIAonline search bruteforcing got some duplicates. Wat.
02:49:23 I listed the requests by receival date for each day from 2000-01-01, then going through the pagination of the results.
02:49:50 These dupes are all over the place, oldest one is in 2016. Something is pretty broken there. lol
02:50:48 nice
02:51:33 Yep, can confirm, CBP-2016-043045 on 2023-05-26 shows up on page 10 and 11.
02:51:37 2016-05-26*
03:01:51 And there's one duplicate that appears on two different dates. Are you fucking kidding me?
03:01:59 DON-NAVY-2022-000829 appears on 2021-10-21 and 2022-03-01.
08:46:03 Exorcism uploaded File:WikiForge logo.png: https://wiki.archiveteam.org/?title=File%3AWikiForge%20logo.png
14:40:02 JacksonChen666 edited URLTeam (+613, add go.enderman.ch URL shortener): https://wiki.archiveteam.org/?diff=50889&oldid=50878
14:40:03 Contributor edited List of websites excluded from the Wayback Machine (-41, added Finnish-language 'perunamaa.net'; sorted…): https://wiki.archiveteam.org/?diff=50890&oldid=50766
14:44:03 JustAnotherArchivist edited List of websites excluded from the Wayback Machine (+231, Add note about bot): https://wiki.archiveteam.org/?diff=50891&oldid=50890
14:52:09 https://foiaonline.gov/foiaonline/action/public/submissionDetails?trackingNumber=EPA-R5-2013-010234&type=Request has so many records that the web interface doesn't even display the number.
14:52:56 The API returns `recordsTotal: 99999` but who knows whether that's true.
14:53:05 Seems unlikely. :-P
14:55:49 Or perhaps that's the limit of their system. Wouldn't even surprise me.
15:00:06 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50892&oldid=50891
15:01:07 * JAA wonders whether we should keep a list of websites *formerly* excluded from the WBM.
15:02:21 2023-09-24 14:58:54.582Z INFO request:EPA-R5-2013-010234:Request Got 1827 records
15:02:26 It did not have 99999 records.
15:06:08 lol, once you get near the end of the pagination, the API does return 1827.
15:06:28 I'm glad I didn't rely on that number at all in my code.
15:16:10 .quit
16:16:25 added an advanced usage section with ipv6 instructions to the running projects with docker page on the wiki which needs approval please :)
17:28:29 Imer edited Running Archive Team Projects with Docker (+8056, Added Advanced usage section w/ ipv6 guide): https://wiki.archiveteam.org/?diff=50893&oldid=50373
17:28:30 JustAnotherArchivist changed the user rights of User:Imer
17:28:39 oh, thats me!
17:28:43 thanks JAA
17:29:27 *record scratch*
17:47:33 Imer edited Running Archive Team Projects with Docker (+2078, Advanced usage: Resource control via…): https://wiki.archiveteam.org/?diff=50894&oldid=50893
17:49:37 thats all for now, if someone wants to give that a read over for bad wording and such that'd be great
17:54:18 the URL template thingy links to www.webcitation.org which has an ssl error as the cert is only valid for non-www btw, not sure where to fix that
18:01:31 They used to have a cert valid for both. Ugh...
18:02:13 So now lots of links across the web to their snapshots are broken.
18:02:25
18:02:35 sheesh lol
18:03:00 i get that www. is outdated but at least redirect it if you used it before ;)
18:03:36 JustAnotherArchivist edited Template:Url (-4, WebCite messed up their TLS certs in early 2022…): https://wiki.archiveteam.org/?diff=50895&oldid=50604
18:03:43 I wonder wtf this cert is about: https://crt.sh/?id=9706125760
18:04:11 oh that reminds me i had a dream that crt.sh was shutting down and i was kinda sad
18:04:22 @_@
18:04:58 hmm amazon issued… wonder if it’s one of their load balancer thingies
18:05:10 kinda like cloudflare and their shared san certs
18:05:53 Fortunately, crt.sh is 'just' one mirror of the (aggregated) CT logs, but yeah.
18:07:13 mm, and it’s open source so i guess you could set up your own postgres server that outputs html if you really wanted to
18:07:38 Yeah. I'm sure it requires a decent degree of masochism though.
18:08:01 haha for sure
18:21:24 And if you check the cert for the domain medicine20.com its a wildcard cert for *.jmirx.org
18:28:08 I quickly checked what's in the WBM in terms of actual CT logs. There's a fair bit, and the ones I sampled come from https://archive.org/details/certificate-transparency , which I understood to be web crawls seeded by CT data, not CT logs themselves. The number of views on those items also make it unlikely that they're only CT logs. So I wonder how complete that is.
18:29:07 Also, the key resource for actual CT logs is https://github.com/google/certificate-transparency-community-site/blob/d4b7663faa46b7c599d73e79e5efa10cba07cac8/docs/google/known-logs.md (and RFC 6962 for the API definition).
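Editor's sketch, not part of the conversation: the RFC 6962 API mentioned at 18:29:07 exposes a `get-entries` endpoint that takes an inclusive `start`/`end` index range, and a log may return fewer entries than were requested. A minimal way to walk a whole log under those rules could look like the following. `fetch_entries` is a hypothetical callable standing in for the actual HTTP request, `tree_size` is assumed to come from the log's `get-sth` response, and `batch_size` and `fake_fetch` are purely illustrative.

```python
from typing import Any, Callable, Iterator, List, Tuple

# Hypothetical stand-in for GET <log URL>/ct/v1/get-entries?start=<start>&end=<end>,
# returning the decoded "entries" list. Injected so the loop works without a network.
FetchEntries = Callable[[int, int], List[Any]]


def iter_log_entries(
    tree_size: int, fetch_entries: FetchEntries, batch_size: int = 256
) -> Iterator[Tuple[int, Any]]:
    """Yield (index, entry) for every entry in [0, tree_size)."""
    start = 0
    while start < tree_size:
        # 'end' is inclusive in RFC 6962, hence the -1.
        end = min(start + batch_size, tree_size) - 1
        entries = fetch_entries(start, end)
        if not entries:
            raise RuntimeError(f"log returned no entries for {start}-{end}")
        # The log may return fewer entries than requested,
        # so advance by what actually arrived, not by batch_size.
        for offset, entry in enumerate(entries):
            yield start + offset, entry
        start += len(entries)


def fake_fetch(start: int, end: int) -> List[str]:
    """Illustrative fake log with 10 leaves that caps responses at 3 entries."""
    log = [f"leaf-{i}" for i in range(10)]
    return log[start:min(end, start + 2) + 1]


all_entries = list(iter_log_entries(10, fake_fetch, batch_size=5))
```

Advancing by the number of entries received is what keeps the walk gap-free when the server truncates a batch.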
18:41:41 Exorcism edited WikiTeam (+0): https://wiki.archiveteam.org/?diff=50896&oldid=50638
18:41:42 Exorcism edited WikiTeam (+0): https://wiki.archiveteam.org/?diff=50897&oldid=50896
19:11:12 Would streaming the cert transparency logs to a log with daily rotation + upload be the best way to archive them
19:12:20 fireonlive: there's an open pgsql db behind crt.sh, so if they ever shut down, it will be an easy job to dump maybe :-)
19:15:29 ah yes that too :3
19:15:34 :)
19:15:51 As long as they don't shut down without announcement, but that seems unlikely given how popular the site is.
20:46:54 https://nitter.net/MinecraftWikiEN/status/1706004078206103965/
20:53:13 they linked to something called “getindie.wiki” which could be an interesting source: https://twitter.com/MinecraftWikiEN/status/1706004084497502439
20:53:13 nitter: https://nitter.net/MinecraftWikiEN/status/1706004084497502439
20:53:33 oops right, nitter share box copies a twitter url lol
20:56:07 :-)
21:53:57 FOIAonline download is looking fine so far. 20% done. Projected size is only 1.8 TiB now, but that might change since I didn't randomise the list of requests and am therefore going through them in chronological order.
21:56:18 JustAnotherArchivist created FOIAonline (+555, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=FOIAonline
21:56:19 JustAnotherArchivist edited Deathwatch (-221, /* 2023 */ Link to FOIAonline page): https://wiki.archiveteam.org/?diff=50899&oldid=50879
21:57:14 That's after about 18 hours, so should easily finish in time.
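Editor's sketch, not part of the conversation: the FOIAonline messages earlier in this log describe two pagination hazards, the same request showing up on more than one results page and a reported `recordsTotal` that does not match reality. One way to guard against both is to ignore the reported total, stop at the first empty page, and de-duplicate on the tracking number. This is not the code actually used for the download. `fetch_page`, the `trackingNumber` key, the empty-page stop condition, and the sample IDs are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical callable: returns the list of result dicts for one page number.
FetchPage = Callable[[int], List[Dict[str, str]]]


def iter_unique_requests(
    fetch_page: FetchPage, max_pages: int = 10_000
) -> Iterator[Dict[str, str]]:
    """Yield each request once, walking pages until an empty one is seen."""
    seen = set()
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            return  # an empty page marks the end; the reported total is never consulted
        for item in results:
            key = item["trackingNumber"]  # assumed field name
            if key in seen:
                continue  # duplicate across pages (or dates), skip it
            seen.add(key)
            yield item
    raise RuntimeError("max_pages reached without seeing an empty page")


# Illustrative data with made-up IDs: REQ-2 appears on both page 1 and page 2.
pages = {
    1: [{"trackingNumber": "REQ-1"}, {"trackingNumber": "REQ-2"}],
    2: [{"trackingNumber": "REQ-2"}, {"trackingNumber": "REQ-3"}],
}
unique_ids = [r["trackingNumber"] for r in iter_unique_requests(lambda p: pages.get(p, []))]
```

Sharing one `seen` set across all search dates would also catch a request that is listed under two different dates.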