-
Macsteel
Hello, I'm here for a suggestion.
-
thuban
fire away
-
Macsteel
BD25.eu was a rich Usenet index with the largest Bluray ISO catalog.
-
Macsteel
The site shut down, but someone made a 42gig archive of just NZBs.
-
Macsteel
These NZBs are for all Bluray ISOs.
-
Macsteel
While you can consider the site "archived," everything on Usenet is prone to retention limits and DMCA takedowns, like the Blurays...
-
Macsteel
Would archiving these ISOs be a project of interest?
-
thuban
theoretically yes; in practice, archiveteam data goes to the internet archive, which is itself subject to dmca and as such frowns on gross piracy.
-
thuban
(but got a link to the archive?)
-
Macsteel
Yeah
-
fireonlive
offhand do you know roughly how big the dataset would be?
-
Macsteel
42 gigs beware. All Bluray NZBs inside.
-
Macsteel
-
fireonlive
ah, could also calculate that from the NZBs themselves
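A minimal sketch of that size calculation, assuming the inner NZBs have already been extracted to a local directory (the directory name and glob pattern are placeholders):

```python
# Rough total-size estimate: sum the <segment bytes="..."> values across all NZBs.
import glob
import xml.etree.ElementTree as ET

NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"  # standard NZB XML namespace

total = 0
for path in glob.glob("bd25-nzbs/**/*.nzb", recursive=True):  # placeholder directory
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        continue  # skip malformed files
    for seg in root.iter(NZB_NS + "segment"):
        total += int(seg.get("bytes", "0"))

print(f"~{total / 1e12:.1f} TB of articles referenced")
```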
-
Larsenv
Macsteel isn't that on cabal trackers?
-
Larsenv
also nzbstars sucks
-
Macsteel
well releases were "bd25", "bd50", "bd100", etc. Numbers implying the disc size(?) in ISO format. So anywhere from 25 to 100 gigs each.
-
Larsenv
yeah, I'm aware of the scrape, I bet the nzbs are out there
-
Larsenv
afaik if you download a collection of nzbs on sabnzbd it will download every nzb in the file
-
Macsteel
All NZBs are within that NZB. lol
-
Larsenv
yep, so sabnzbd will download everything it sees
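A sketch of one way to feed them all in, assuming SABnzbd's watched-folder feature is enabled (both directory paths are placeholders):

```python
# Copy every extracted NZB into SABnzbd's watched folder so it queues them itself.
import glob
import shutil
import time

WATCH_DIR = "/path/to/sabnzbd/watched"  # the folder configured under Config -> Folders

for path in glob.glob("bd25-nzbs/**/*.nzb", recursive=True):  # placeholder source dir
    shutil.copy(path, WATCH_DIR)
    time.sleep(1)  # avoid dumping thousands of jobs into the queue at once
```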
-
Larsenv
I'm sure there are people who have downloaded everything, I use eweka personally
-
Larsenv
they have 14+ year retention
-
Larsenv
but remember that most of them probably have the par2 to repair em if articles go down
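For reference, that repair step looks roughly like this, assuming the par2cmdline tool is installed (the filename is a placeholder):

```python
# Verify and, if possible, repair a download using its PAR2 recovery set.
import subprocess

result = subprocess.run(["par2", "repair", "SomeRelease.par2"])  # placeholder filename
if result.returncode == 0:
    print("set is complete (or was repaired)")
else:
    print("not enough recovery blocks to repair")
```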
-
fireonlive
thuban: do you have a usenet setup?
-
Macsteel
Full hoard is petabytes for sure.
-
thuban
not at present
-
JAA
Needs a total size estimate, but yeah, very unlikely that IA would take this.
-
fireonlive
kk i'm (attempting) to pull the nzb's contents
-
fireonlive
can dump those 43GB on IA i suppose
-
JAA
Yeah, that sounds fine.
-
fireonlive
:)
-
Larsenv
fireonlive I do
-
Larsenv
I'm not archiving that though, the 43gb is comprised of nzbs
-
Macsteel
Do you get missing articles on eweka often? I know giganews is practically in bed with california.
-
pabs
-rss/#hackernews- Loss of nearly a full decade of information from early days of Chinese internet:
chinamediaproject.org/2024/05/27/goldfish-memories news.ycombinator.com/item?id=40546920
-
yzqzss
that's true
-
pokechu22
That partially feels like an issue of the metadata used for date filtering not existing back then and things not being smart enough to infer based on page text (probably in addition to actual deletion)
-
yzqzss
Although the original author is not good at using search engines, the conclusion is still correct
-
yzqzss
Can be attributed to three reasons (I think): 1. extremely high bandwidth costs 2. restrictions, censorship, fines and shutdown orders from the authorities 3. competition from mobile apps
-
yzqzss
For example, Baidu Tieba (or Baidu Post) mentioned in the article chose to delete all posts before 2017 due to increasingly strict censorship requirements (it is costly to re-review all old posts).
-
steering
I can't speak to how much worse it is in China, but it's not like that's uncommon in the rest of the world.
-
steering
It's also costly to maintain those old posts etc.
-
yzqzss
For reason 1: The general price of most CDNs is currently 200 RMB/TB (30 USD/TB).
-
yzqzss
Peking University launched the www.infomall.cn web archive project in 2002, but the project was stopped around 2010. (Peking University still keeps these data, about 300TB.)
-
yzqzss
steering: bad world
-
h2ibot
-
fireonlive
hopefully the world ends soon
-
GrooveKeeper
Hi there, are there any plans to archive MixesDB? The website is shutting down at the end of this month:
mixesdb.com/w/MixesDB:Shutdown (there are dumps at the bottom). The important part is the info about each mix and possibly the audio as well.
-
that_lurker
Grabbing the wiki. Needed a bit of investigation so doing it locally. Could maybe be good to run that on AB as well so it will end up in WB
-
GrooveKeeper
i have got a warrior vm. i am not sure if there are audio files in the dumps, having a look now to see
-
that_lurker
Good on the site maintainer for providing dumps
-
that_lurker
GrooveKeeper: The audio is most likely on SoundCloud.
-
GrooveKeeper
im trying to see if i can get jdownloader to grab the 9 part files
-
that_lurker
oh they have multiple outside sources on the audios. Some are on soundcloud, mixtube and youtube. Most likely others too
-
GrooveKeeper
that_lurker there are some pages that have audio directly on them, such as
mixesdb.com/w/2006-08-15_-_Above_%2…nfold_-_Trance_Around_The_World_126 but looking closely, they appear to be hosted on archive.org
-
GrooveKeeper
this site might be quite easy to archive
-
that_lurker
Yeah Mediawikis tend to be. I or someone else will run it in archivebot once the pending queue clears up so the site will be in the wayback machine as well
-
that_lurker
of course huge thanks also go to the maintainer for releasing that page with all the links and such
-
GrooveKeeper
well i have got a warrior running, i do notice it seems to often do telegram. i am not sure if that's because it's the highest priority or due to so much to archive?
-
GrooveKeeper
thank you for the pointers.
-
GrooveKeeper
a society without history is a society without a future
-
imer
GrooveKeeper: telegram needs tons of workers due to their rate limiting and there's a lot of work, that's why it's usually the auto-choice :)
-
GrooveKeeper
ah thank you
-
that_lurker
Hmm. Mixesdb site is down it seems
-
GrooveKeeper
i think mixes db is being hoarded to death, which is why it seems to be showing a 403 forbidden message
-
that_lurker
I'll check up on it every now and then and start the grab once it becomes stable again.
-
GrooveKeeper
no worries. are you grabbing the dump or are you archiving the pages into wayback?
-
that_lurker
the page to wb and also if possible the entire wiki with
github.com/saveweb/wikiteam3 to Internet Archive as well
-
GrooveKeeper
i think a lot of people are grabbing the dump files. it's funny how a page that had become too complex to maintain or closes due to lack of use is then leeched to death as soon as they announce closure
-
that_lurker
Allowing the download of large files without rate limiting tends to do that
-
GrooveKeeper
that_lurker so that's saving the web pages onto archive.org?
-
GrooveKeeper
if that could be added to a warrior with rate limiting, then it's something that can be run, and if new edits come in, they can get backed up onto archive.org
-
that_lurker
GrooveKeeper: wikiteam3 is the one that saves the wiki to
archive.org/details/wikiteam but #archivebot is the one that grabs sites to the Wayback Machine
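A sketch of what such a wikiteam3 run might look like, assuming the wikiteam3dumpgenerator entry point and flags described in the saveweb/wikiteam3 README (the exact URL form may need to point at the wiki's api.php instead):

```python
# Dump the wiki (full XML histories plus images) locally; the result can then be
# uploaded to archive.org. Entry point and flags are assumptions from the
# wikiteam3 README, not verified against this particular site.
import subprocess

subprocess.run([
    "wikiteam3dumpgenerator",
    "https://www.mixesdb.com",  # may need to be the api.php URL instead
    "--xml",         # page histories
    "--images",      # media files
    "--delay", "1",  # be gentle with an already-struggling site
])
```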
-
that_lurker
GrooveKeeper: Archivebot can easily handle that site. Warrior would be too many connections and most likely ddos the site.
-
GrooveKeeper
ah fair play. so instead of using warrior, which i thought was the way websites are crawled and uploaded onto archive.org, you use something else, i.e. #archivebot, to save mixes db, which is something 1 person can run?
-
that_lurker
Warrior projects are site-targeted projects. Archivebot is best explained (at least better than I can) here:
wiki.archiveteam.org/index.php?title=ArchiveBot
-
GrooveKeeper
wow, thank you. having a read up now.
-
that_lurker
There is a lot of information in that wiki
-
GrooveKeeper
cheers
-
yzqzss
chouti_comments done !
-
GrooveKeeper
now i know why people start homelabs. start with a single file server, then build a lowered desktop just for running warrior and archivebot
-
masterx244|m
Forgot to deploy temp warriors at the GPN in Karlsruhe. Perfect internet there (fiber to the table) and each device there gets a public ip (yes, you need to firewall your device yourself, the LAN there is a full part of the internet)
-
» that_lurker drools
-
masterx244|m
It's a sister event of the well-known CCC congress. CCC in Germany = expect better internet than elsewhere in the country
-
that_lurker
I really need to attend ccc some year
-
masterx244|m
They get a datacenter-grade network setup for a few days pretty quick (and a few years ago when twitch had a false-positive nipple detection on the Revision demoparty they had the streams set up as a replacement in 10 minutes (they were quicker than twitch support for a featured event without prior announcement))
-
that_lurker
Travel to Murican events costs too much, but it's cheap to go to Germany from Finland
-
Macsteel
-
thuban
"nipple detection" would be a good band name
-
masterx244|m
that_lurker: And the congress is not as commercial as defcon & friends since the main organizer is a nonprofit (and that's not as easy in Germany as in the US)
-
yzqzss
-
yzqzss
(standard csv format, commas and quotes escaped)
-
yzqzss
13623632 urls, have a good day :)
-
fireonlive
yzqzss++
-
eggdrop
[karma] 'yzqzss' now has 5 karma!
-
JAA
'Standard' CSV, good one! :-)
-
Macsteel
followup on bd25? no bouncer
-
fireonlive
i have the nzb of the nzbs; attempting to upload it to IA but ran into an issue
-
fireonlive
well, the content of the nzb of the nzbs
-
Macsteel
cool
-
Macsteel
if you mean ISOs then it was said IA may reject
-
Macsteel
thank you for the interest
-
Macsteel
and bandwidth!
-
fireonlive
:)
-
fireonlive
just the 7z.### and par2s so far
-
Macsteel
rar pw's were 001-999, easy crack if there's no list but don't waste time doing it
-
fireonlive
ah i haven't ventured deeper
-
fireonlive
do you mean the BD25.part01.rar BD25.part02.rar BD25.part03.rar?
-
fireonlive
there was g8ted for the initial .7z
-
Macsteel
no that's after each nzb is fetched individually.
-
Macsteel
the film itself
-
fireonlive
ahh, they're in passworded rars?
-
fireonlive
is the password for them in the individual NZBs? or documented somewhere?
-
fireonlive
going back over my items and updating some metadata i left out...
-
fireonlive
zzz
-
fireonlive
π§Ή
-
Macsteel
it was in the index's search results
-
fireonlive
ahh, so potentially different every time?
-
fireonlive
is there a backup of the passwords anywhere?
-
Macsteel
001 to 999 consistently
-
Macsteel
I don't know about a backup if there aint a list in there
-
fireonlive
oh,
-
fireonlive
i see what you mean - they were always 3 digits from 001 to 999 for the passwords
-
fireonlive
gotcha
-
Macsteel
Correct
-
fireonlive
:)
-
Macsteel
1. the big fat NZB with a gorillion NZBs (you are here)
-
Macsteel
2. each nzb downloads *.rar
-
Macsteel
3. each rar's pw is 001 to 999
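A minimal sketch of that last step, assuming the unrar CLI is installed (the archive name is a placeholder; testing the whole archive per guess is slow but simple):

```python
# Walk the 001-999 password space against one downloaded RAR set.
import subprocess

def find_password(archive="SomeMovie.part01.rar"):  # placeholder name
    for n in range(1, 1000):
        pw = f"{n:03d}"
        # `t` tests the archive; a wrong password makes unrar exit non-zero.
        result = subprocess.run(
            ["unrar", "t", f"-p{pw}", archive],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return pw
    return None

print(find_password())
```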
-
fireonlive
gotcha