-
h2ibotFlashfire42 edited List of websites excluded from the Wayback Machine (+31): wiki.archiveteam.org/?diff=50285&oldid=50272
-
h2ibotPaulWise edited Mailman2 (+70, more lists, deduplicate lists): wiki.archiveteam.org/?diff=50286&oldid=50214
-
h2ibotJAABot edited List of websites excluded from the Wayback Machine (+0): wiki.archiveteam.org/?diff=50287&oldid=50285
-
fireonlive"VirtualBox 7.0.10 download links have disappeared" news.ycombinator.com/item?id=36841272
-
fireonlivestill on the download mirror though
-
imerdownload.virtualbox.org/virtualbox that'd be quite a chunk to grab
-
imerprobably
-
fireonlivehmm i guess some stuff is broken; page hasn't been modified according to the wiki history in 10 months
-
fireonlivebut a lot of stuff doesn't make sense on the site lol
-
fireonlivee.g. changelog is blank too virtualbox.org/wiki/Changelog-7.0
-
fireonlive"[Include(wikitestbuildsfile:changelog-7.0.wiki, text/x-trac-wiki)]]"
-
fireonlivei guess their trac just isn't happy
-
fireonlivevia virtualbox.org/wiki/Changelog-7.0?action=diff&version=2, not sure how to view a page source otherwise
-
fireonlivefalse alarm i guess :)
-
vokunal|mOn the youtube 144p idea, for a while now yt-dlp has had a worstvideo setting and a bestaudio setting, which I used to use to make sure I at least had the video in some quality while the audio was still perfectly usable. Might be interesting if this idea gets tossed around a bit more
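A minimal sketch of the worstvideo plus bestaudio selection mentioned above, using yt-dlp's Python API. The helper names `lowres_options` and `download_lowres` and the output template are hypothetical; the `/worst` fallback is an added assumption for sites that only offer combined streams. Merging separate video and audio streams generally requires ffmpeg to be installed.

```python
def lowres_options(output_template="%(id)s.%(ext)s"):
    """Build yt-dlp options: smallest video stream plus best audio stream.

    The "/worst" part is a fallback (an assumption, not from the chat) for
    sources that only provide pre-merged formats.
    """
    return {
        "format": "worstvideo+bestaudio/worst",
        "outtmpl": output_template,
    }


def download_lowres(urls):
    """Download each URL with the low-resolution video / best audio selection."""
    import yt_dlp  # third-party package; imported lazily so the helper above works without it

    with yt_dlp.YoutubeDL(lowres_options()) as ydl:
        return ydl.download(list(urls))
```

Calling `download_lowres(["<video URL>"])` would fetch the lowest quality video track while keeping the audio at the best available quality.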
-
flashfire42|mDonate a bunch to IA and suggest good channels to the current YouTube archival stuff
-
fireonliveanyone have a few million burning a hole in pockets
-
h2ibotFlashfire42 edited List of websites excluded from the Wayback Machine (+24): wiki.archiveteam.org/?diff=50288&oldid=50287
-
h2ibotJAABot edited List of websites excluded from the Wayback Machine (+0): wiki.archiveteam.org/?diff=50289&oldid=50288
-
that_lurkertwitter rebranding as x will break a metric fuckton of links and embeds everywhere unless they still keep twitter.com and just redirect.
-
DigitalDragonstwitter project time?
-
DigitalDragonsor is it still locked down to accounts only
-
that_lurkerstill locked down to accounts only
-
DigitalDragonsugh
-
that_lurkergood thing is that they seem to only be doing a domain swap to x.com, so everything should maybe hopefully if the stars are aligned somewhat good stay the same
-
DigitalDragonsI assume that they'll probably only use x.com for the frontend and keep twitter.com in the backend
-
DigitalDragons(like how discord's cdn is still on discordapp.com and such)
-
DigitalDragonsi doubt they have the dev bandwidth to do a full domain swap
-
that_lurkerwe shall hope
-
SennatonNew here, hello.
-
that_lurkerSennaton: Hello. You should also join #archiveteam-ot for somewhat off topic conversations
-
SennatonK, thanks.
-
trainingdataI run an academic open large language model project hplt-project.org and am looking for more training data. We have 10 petabytes of spinning disks attached to high-performance compute and a deal with the Internet Archive for 7 petabytes of WARC, mainly WIDE*. While I appreciate that archivebot_go has publicly downloadable WARC, is it
-
trainingdatapossible to get access to Archive Team: URLs WARCs? For example archive.org/download/archiveteam_urls_20230720203029_3f55fb2a is not downloadable.
-
thubanarkiver: i think this is your area ^
-
pabsI sometimes need to parse WARCs to check what was missed by AB jobs, have been resorting to hacky shell so far but want to do something better
-
pabswhat libraries are recommended for WARC parsing? preferably with Python bindings
-
Maakuth|mpabs, wiki.archiveteam.org/index.php/The_WARC_Ecosystem are you aware of this page?
-
pabsnope, thanks
-
pabshmm, lots of unmaintained stuff
-
Maakuth|mwarcat seems promising, even though only one author
-
JAAwarcio is acceptable for WARC parsing/reading, just don't ever write WARCs with it.
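For the read-only use case described above, a rough sketch with warcio might look like the following. The function names `iter_responses` and `status_counts` are hypothetical, and the sketch only reads records; it does not write WARCs.

```python
from collections import Counter


def iter_responses(warc_path):
    """Yield (url, status) for every HTTP response record in a WARC file.

    Reading only; status is a string such as "200", or None when the
    record carries no parsed HTTP headers.
    """
    from warcio.archiveiterator import ArchiveIterator  # third-party, imported lazily

    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            status = record.http_headers.get_statuscode() if record.http_headers else None
            yield url, status


def status_counts(pairs):
    """Count how many responses were seen per HTTP status code."""
    return Counter(status for _url, status in pairs)
```

Something like `status_counts(iter_responses("job.warc.gz"))` (file name assumed) would give a quick overview of how many 200s, 404s and so on a job recorded.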
-
JAAI've been working (on and off) on a new Python package with a more solid core, but it's not usable yet.
-
pabsthe reason I wanted this is to better automate what I did today: discovering open directory indexes/trees that were missed and/or whose contents were partially missed
-
pabsanything like that exist yet?
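One possible shape for that check, as a standard-library sketch rather than an existing tool: given the HTML of a directory index and the set of URLs a job actually captured, list the entries that were linked but never fetched. The names `missed_entries` and `_HrefCollector` are hypothetical, and the rules for skipping sort links and parent links are assumptions based on typical Apache/nginx style indexes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def missed_entries(index_url, index_html, captured_urls):
    """Return directory entries linked from index_html that are not in captured_urls.

    index_url is expected to end with "/". Links outside the directory
    (such as "../") and query-string links (such as column sort links)
    are ignored.
    """
    parser = _HrefCollector()
    parser.feed(index_html)
    missed = []
    for href in parser.hrefs:
        absolute = urldefrag(urljoin(index_url, href)).url
        if absolute == index_url or not absolute.startswith(index_url):
            continue  # self link, parent directory, or another site
        if "?" in absolute:
            continue  # sort/order links, not real entries
        if absolute not in captured_urls and absolute not in missed:
            missed.append(absolute)
    return missed
```

The captured URL set could come from the response records of the job's WARCs, and any subdirectory reported as missed would itself be an index worth checking.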
-
randomHello, anyone have the Forward DNS (FDNS) of Project Sonar saved?
-
h2ibotExorcism edited DokuWiki (+137): wiki.archiveteam.org/?diff=50290&oldid=49786