00:03:48 Flashfire42 edited List of websites excluded from the Wayback Machine (+31): https://wiki.archiveteam.org/?diff=50285&oldid=50272
00:33:54 PaulWise edited Mailman2 (+70, more lists, deduplicate lists): https://wiki.archiveteam.org/?diff=50286&oldid=50214
01:00:58 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50287&oldid=50285
01:15:06 "VirtualBox 7.0.10 download links have disappeared" https://news.ycombinator.com/item?id=36841272
01:15:27 still on the download mirror though
01:18:00 https://download.virtualbox.org/virtualbox/ that'd be quite a chunk to grab
01:18:06 probably
01:18:12 hmm i guess some stuff is broken; page hasn't been modified according to the wiki history in 10 months
01:18:19 but a lot of stuff doesn't make sense on the site lol
01:18:29 e.g. changelog is blank too https://www.virtualbox.org/wiki/Changelog-7.0
01:19:26 "[Include(wikitestbuildsfile:changelog-7.0.wiki, text/x-trac-wiki)]]"
01:19:46 i guess their trac just isn't happy
01:20:49 via https://www.virtualbox.org/wiki/Changelog-7.0?action=diff&version=2, not sure how to view a page source otherwise
01:20:52 false alarm i guess :)
04:08:50 On the youtube 144p idea, for a while, yt-dlp has had a worstvideo setting and a bestaudio setting, which I used to use to make sure I at least had the video in some quality, but the audio was still perfectly useable.
Might be interesting if this idea gets tossed around a bit more
04:23:46 Donate a bunch to IA and suggest good channels to the current YouTube archival stuff
04:32:21 anyone have a few million burning a hole in their pockets
06:12:53 Flashfire42 edited List of websites excluded from the Wayback Machine (+24): https://wiki.archiveteam.org/?diff=50288&oldid=50287
07:00:08 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50289&oldid=50288
07:20:01 twitter rebranding as x will break a metric fuckton of links and embeds everywhere unless they still keep twitter.com and just redirect.
07:21:26 twitter project time?
07:21:37 or is it still locked down to accounts only
07:23:45 still locked down to accounts only
07:24:15 ugh
07:25:47 good thing is that they seem to only be doing a domain swap to x.com, so everything should maybe, hopefully, if the stars are aligned somewhat well, stay the same
07:31:15 I assume that they'll probably only use x.com for the frontend and keep twitter.com in the backend
07:31:35 (like how discord's cdn is still on discordapp.com and such)
07:31:45 i doubt they have the dev bandwidth to do a full domain swap
07:32:49 we shall hope
07:50:55 New here, hello.
07:53:14 Sennaton: Hello. You should also join #archiveteam-ot for somewhat off topic conversations
07:53:34 K, thanks.
08:56:03 I run an academic open large language model project https://hplt-project.org/ and am looking for more training data. We have 10 petabytes of spinning disks attached to high-performance compute and a deal with the Internet Archive for 7 petabytes of WARC, mainly WIDE*. While I appreciate that archivebot_go has publicly downloadable WARC, is it
08:56:04 possible to get access to Archive Team: URLs WARCs? For example https://archive.org/download/archiveteam_urls_20230720203029_3f55fb2a is not downloadable.
08:58:24 arkiver: i think this is your area ^
12:40:11 I sometimes need to parse WARCs to check what was missed by AB jobs, have been resorting to hacky shell so far but want to do something better
12:40:42 what libraries are recommended for WARC parsing? preferably with Python bindings
12:42:02 pabs, https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem are you aware of this page?
12:43:09 nope, thanks
12:46:02 hmm, lots of unmaintained stuff
12:48:28 warcat seems promising, even though only one author
12:49:31 warcio is acceptable for WARC parsing/reading, just don't ever write WARCs with it.
12:50:47 I've been working (on and off) on a new Python package with a more solid core, but it's not usable yet.
13:03:44 the reason I wanted this is to better automate what I did today: discovering open directory indexes/trees that were missed and/or whose contents were partially missed
13:03:52 anything like that exist yet?
15:52:50 Hello, anyone have the Forward DNS (FDNS) of Project Sonar saved?
18:58:33 Exorcism edited DokuWiki (+137): https://wiki.archiveteam.org/?diff=50290&oldid=49786
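
Editor's note on the 12:40 to 13:03 WARC thread: the check described at 13:03:44 (finding directory indexes that a crawl missed) could be sketched roughly as below. This is only an illustration, not an existing tool. It assumes warcio, which the log calls acceptable for reading, is installed for the WARC-reading step. The function names `iter_response_urls`, `directory_prefixes`, and `missing_directories` are hypothetical, and the heuristic (every directory level above a captured URL should itself have been captured, ignoring query strings) is an assumption about what "missed" means here.

```python
from urllib.parse import urlsplit, urlunsplit


def iter_response_urls(warc_path):
    """Yield the WARC-Target-URI of every response record in a WARC file."""
    # warcio is third-party; it is imported lazily so the pure-Python
    # helpers below can be used (and tested) without it installed.
    from warcio.archiveiterator import ArchiveIterator

    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                uri = record.rec_headers.get_header("WARC-Target-URI")
                if uri:
                    yield uri


def _strip(url):
    """Reduce a URL to scheme, host, and path (query and fragment dropped)."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, p.path or "/", "", ""))


def directory_prefixes(url):
    """Yield each directory-level URL at or above `url`, starting at the root."""
    p = urlsplit(url)
    # Everything before the last "/" is a directory component.
    segments = [s for s in p.path.split("/")[:-1] if s]
    prefix = "/"
    yield urlunsplit((p.scheme, p.netloc, prefix, "", ""))
    for segment in segments:
        prefix += segment + "/"
        yield urlunsplit((p.scheme, p.netloc, prefix, "", ""))


def missing_directories(urls):
    """Return sorted directory URLs implied by `urls` but not captured."""
    captured = {
        _strip(u) for u in urls if urlsplit(u).scheme in ("http", "https")
    }
    wanted = set()
    for url in captured:
        wanted.update(directory_prefixes(url))
    return sorted(wanted - captured)
```

With a local file, usage would look like `missing_directories(iter_response_urls("job.warc.gz"))`, where `job.warc.gz` is a placeholder path. Not every reported directory is necessarily an open index on the server, so the output is a list of candidates to re-check rather than a list of confirmed misses.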