02:11:48 Hello, I'm here for a suggestion.
02:18:04 fire away
02:18:24 BD25.eu was a rich Usenet index with the largest Bluray ISO catalog.
02:18:44 The site shut down, but someone made a 42-gig archive of just the NZBs.
02:18:58 These NZBs cover all the Bluray ISOs.
02:19:24 While you can consider the site "archived," everything on Usenet is prone to retention limits/DMCA, like the Blurays...
02:19:53 Would archiving these ISOs be a project of interest?
02:22:11 theoretically yes; in practice, archiveteam data goes to the Internet Archive, which is itself subject to DMCA and as such frowns on gross piracy.
02:23:48 (but got a link to the archive?)
02:23:55 Yeah
02:24:11 offhand, do you know roughly how big the dataset would be?
02:24:29 42 gigs, beware. All the Bluray NZBs are inside.
02:24:36 https://nzbstars.com/?page=getnzb&action=display&messageid=bEs4UlM1Z1dmawemuG1DxAmwi0v%40spot.net
02:24:58 ah, could also calculate that from the NZBs themselves
02:25:55 Macsteel: isn't that on cabal trackers?
02:25:57 also, nzbstars sucks
02:27:05 well, releases were "bd25", "bd50", "bd100", etc., the numbers implying the disc size(?) in ISO format. So anywhere from 25 to 100 gigs each.
02:27:31 yeah, I'm aware of the scrape, I bet the nzbs are out there
02:27:55 afaik, if you download a collection of nzbs on sabnzbd, it will download every nzb in the file
02:28:00 All NZBs are within that NZB. lol
02:28:09 yep, so sabnzbd will download everything it sees
02:28:26 I'm sure there are people who have downloaded everything; I use eweka personally
02:28:31 they have 14+ year retention
02:29:12 but remember that most of them probably have the par2s to repair them if articles go down
02:29:30 thuban: do you have a usenet setup?
02:29:50 The full hoard is petabytes for sure.
02:29:54 not at present
02:29:55 Needs a total size estimate, but yeah, very unlikely that IA would take this.
02:30:20 kk, I'm attempting to pull the nzb's contents
02:30:32 can dump those 43GB on IA, I suppose
02:30:49 Yeah, that sounds fine.
02:30:58 :)
03:10:08 fireonlive: I do
03:10:21 I'm not archiving that though; the 43GB is made up of nzbs
03:22:52 Do you get missing articles on eweka often? I know giganews is practically in bed with California.
04:57:27 -rss/#hackernews- Loss of nearly a full decade of information from early days of Chinese internet: https://chinamediaproject.org/2024/05/27/goldfish-memories/ https://news.ycombinator.com/item?id=40546920
05:05:24 that's true
05:28:18 That partially feels like an issue of the metadata used for date filtering not existing back then, and things not being smart enough to infer dates from page text (probably in addition to actual deletion)
05:36:21 Although the original author is not good at using search engines, the conclusion is still correct
05:47:06 Can be attributed to three reasons (I think): 1. extremely high bandwidth costs; 2. restrictions, censorship, fines and shutdown orders from 🫢; 3. competition from mobile apps
05:52:23 For example, Baidu Tieba (or Baidu Post), mentioned in the article, chose to delete all posts before 2017 due to increasingly strict censorship requirements (it is costly to re-review all old posts).
05:54:09 I can't speak to how much worse it is in China, but it's not like that's uncommon in the rest of the world.
05:54:46 It's also costly to maintain those old posts etc.
05:58:50 For reason 1: the general price of most CDNs is currently 200 RMB/TB (30 USD/TB).
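
The total-size estimate asked for above can be computed straight from the NZB files, as suggested at 02:24:58: each <segment> element in an NZB carries a "bytes" attribute, and summing them approximates the download size (yEnc overhead included, so the decoded ISOs are a few percent smaller). A minimal Python sketch, assuming the NZBs have been unpacked into a local nzbs/ directory (the path is illustrative):

import glob
import xml.etree.ElementTree as ET

total = 0
for path in glob.glob("nzbs/**/*.nzb", recursive=True):
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        continue  # skip malformed NZBs rather than aborting the whole tally
    for elem in root.iter():
        # match <segment> regardless of the NZB XML namespace
        if isinstance(elem.tag, str) and elem.tag.endswith("segment"):
            total += int(elem.get("bytes", 0))

print(f"approx. {total / 1e12:.2f} TB of articles across all NZBs")

Run against the full NZB collection, this would show whether the "petabytes" guess above holds.
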
06:10:36 Peking University launched the www.infomall.cn web archive project in 2002, but the project was stopped around 2010. (Peking University still keeps the data, about 300TB.)
06:20:33 steering: bad world 😶‍🌫️
06:43:14 Exorcism edited 抽屉新热榜 (-6): https://wiki.archiveteam.org/?diff=52313&oldid=52294
07:14:45 hopefully the world ends soon
14:25:08 Hi there, are there any plans to archive MixesDB? The website is shutting down at the end of this month: https://www.mixesdb.com/w/MixesDB:Shutdown (there are dumps at the bottom). The important part is the info about each mix, and possibly the audio as well?
14:37:45 Grabbing the wiki. Needed a bit of investigation, so doing it locally. Could maybe be good to run that on AB as well so it will end up in WB
14:43:30 I have got a warrior VM. I am not sure if there are audio files in the dumps; having a look now to see
14:46:01 Good on the site maintainer for providing dumps
14:49:08 GrooveKeeper: The audio is most likely on Soundcloud.
14:49:22 I'm trying to see if I can get JDownloader to grab the 9-part files
14:50:22 oh, they have multiple outside sources for the audio. Some are on Soundcloud, mixtube and YouTube. Most likely others too
14:53:56 that_lurker: there are some pages with audio directly on them, such as https://www.mixesdb.com/w/2006-08-15_-_Above_%26_Beyond,_Paul_Oakenfold_-_Trance_Around_The_World_126, but looking closely, they appear to be hosted on archive.org
15:25:18 this site might be quite easy to archive
15:28:54 Yeah, MediaWikis tend to be. I or someone else will run it in archivebot once the pending queue clears up, so the site will be in the Wayback Machine as well
15:29:22 of course, huge thanks also go to the maintainer for releasing that page with all the links and such
15:38:23 well, I have got a warrior running, and I notice it seems to often do Telegram. I am not sure if that's because it's the highest priority or because there is so much to archive?
15:38:33 thank you for the pointers.
15:39:07 a society without history is a society without a future
15:46:06 GrooveKeeper: Telegram needs tons of workers due to their rate limiting and there's a lot of work; that's why it's usually the auto-choice :)
15:54:21 ah, thank you
15:56:18 Hmm. The MixesDB site is down, it seems
18:44:15 I think MixesDB is being hoarded to death, which is why it seems to be showing a 403 Forbidden message
18:45:53 I'll check up on it every now and then and start the grab once it becomes stable again.
18:47:13 no worries. Are you grabbing the dump, or are you archiving the pages into the Wayback Machine?
18:48:22 the pages to WB, and also, if possible, the entire wiki with https://github.com/saveweb/wikiteam3 to the Internet Archive as well
18:48:24 I think a lot of people are grabbing the dump files; it's funny how a page becomes too complex to maintain or closes due to lack of use, and is then leeched to death as soon as they announce closure
18:49:13 Allowing the download of large files without rate limiting tends to do that
18:49:36 that_lurker: so that's saving the web pages onto archive.org?
18:50:37 if that could be added to a warrior with rate limiting, then it's something that can be run, and if new edits come in they can get backed up onto archive.org
18:51:17 GrooveKeeper: wikiteam3 is the one that saves the wiki to https://archive.org/details/wikiteam, but #archivebot is the one that grabs sites into the Wayback Machine
18:52:17 GrooveKeeper: Archivebot can easily handle that site. Warrior would be too many connections and would most likely DDoS the site.
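
On the remark above that MediaWiki sites tend to be easy to archive: MediaWiki installations expose a standard api.php endpoint that can enumerate every page title, which is what tools like wikiteam3 build on. A minimal sketch using the Python requests library; the api.php path below is an assumption and differs per wiki (often /w/api.php or /api.php):

import requests

API = "https://www.mixesdb.com/w/api.php"  # assumed path; adjust to the wiki's actual script path

def all_pages(api_url):
    # list=allpages walks every title; the API returns a "continue"
    # token that must be fed into the next request
    params = {"action": "query", "list": "allpages", "aplimit": "500", "format": "json"}
    while True:
        data = requests.get(api_url, params=params, timeout=30).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

for title in all_pages(API):
    print(title)

From such a title list, the wikitext and revision history of each page can then be exported, which is roughly what wikiteam3's dump generator automates before uploading to the Internet Archive.
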
18:54:10 ah, fair play. So instead of using Warrior, which I thought was the way websites are crawled and uploaded onto archive.org, you use something else, i.e. #archivebot, to save MixesDB, which is something one person can run?
18:56:12 Warrior projects are site-targeted projects. Archivebot is best explained (at least better than I can) here: https://wiki.archiveteam.org/index.php?title=ArchiveBot
18:58:01 wow, thank you, having a read.
18:58:32 There is a lot of information in that wiki
18:59:06 cheers
19:00:08 chouti_comments done!
19:08:20 now I know why people start homelabs: start with a single file server, then build a low-powered desktop just for running warrior and archivebot
19:14:11 Forgot to deploy temp warriors at the GPN in Karlsruhe. Perfect internet there (fiber to the table) and each device there gets a public IP (yes, you need to firewall your device yourself; the LAN there is a full part of the internet)
19:17:35 * that_lurker drools
19:35:40 It's a sister event of the well-known CCC congress. CCC in Germany = expect better internet than elsewhere in the country
19:37:21 I really need to attend CCC some year
19:37:42 They get a datacenter-grade network set up for a few days pretty quick (and a few years ago, when Twitch had a false-positive nipple detection on the Revision demoparty, they had replacement streams set up in 10 minutes (quicker than Twitch support for a featured event, without prior announcement))
19:37:57 Travel to Murican events costs too much, but it's cheap to go to Germany from Finland
19:38:52 https://vc.gg/blog/36c3-staff-assaulted-me-for-political-reasons.html
19:39:01 "nipple detection" would be a good band name
19:40:39 that_lurker: And the congress is not as commercial as defcon & friends, since the main organizer is a nonprofit (and that's not as easy in Germany as in the US)
20:35:36 arkiver: https://transfer.archivete.am/ErtBA/chouti_links.id.originalUrl.csv.zst
20:37:26 (standard csv format, commas and quotes escaped)
20:38:23 13623632 urls, have a good day :)
20:43:43 yzqzss++
20:43:44 -eggdrop- [karma] 'yzqzss' now has 5 karma!
22:18:16 'Standard' CSV, good one! :-)
22:22:25 any followup on bd25? (no bouncer)
22:23:40 I have the nzb of the nzbs; attempting to upload it to IA but ran into an issue
22:23:56 well, the contents of the nzb of the nzbs
22:24:01 cool
22:25:26 if you mean the ISOs, then it was said IA may reject them
22:27:12 thank you for the interest
22:27:35 and bandwidth!
22:34:34 :)
22:34:47 just the 7z.### and par2s so far
22:44:17 rar passwords were 0-999, an easy crack if there's no list, but don't waste time doing it
22:44:55 *001-999
22:50:46 ah, I haven't ventured deeper
22:51:29 do you mean the BD25.part01.rar BD25.part02.rar BD25.part03.rar?
22:51:43 there was "g8ted" for the initial .7z
23:00:42 no, that's after each nzb is fetched individually.
23:00:58 the film itself
23:04:13 ahh, they're in passworded rars?
23:04:35 is the password for them in the individual NZBs? or documented somewhere?
23:08:07 going back over my items and updating some metadata I left out...
23:08:08 zzz
23:08:15 🧹
23:09:58 it was in the index's search results
23:10:30 ahh, so potentially different every time?
23:10:39 is there a backup of the passwords anywhere?
23:10:49 001 to 999, consistently
23:16:41 I don't know about a backup if there isn't a list in there
23:24:29 oh,
23:24:43 I see what you mean: the passwords were always 3 digits, from 001 to 999
23:24:47 gotcha
23:24:49 Correct
23:24:52 :)
23:27:12 1. the big fat NZB with a gorillion NZBs (you are here)
23:27:20 2. each nzb downloads *.rar
23:27:25 3. each rar's pw is 001 to 999
23:30:47 gotcha
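
For completeness, the three-step structure above implies a per-release RAR password can be recovered by sweeping the 001-999 range (though, as noted at 22:44:17, it is not worth doing if a password list survives). A minimal Python sketch, assuming the unrar command-line tool is installed; "unrar t" tests the archive and exits 0 only when the supplied password is correct. The archive name is taken from the chat above as an example:

import subprocess

def find_password(rar_path):
    for n in range(1, 1000):
        pw = f"{n:03d}"  # passwords were reportedly always three digits, 001 to 999
        result = subprocess.run(
            ["unrar", "t", f"-p{pw}", rar_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:  # archive integrity test passed, so the password matched
            return pw
    return None

print(find_password("BD25.part01.rar"))  # example file name; adjust to the actual release
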