-
mgrandi
Ty
-
jodizzle
mgrandi: Thanks for the wiki advice, but it looks like there was actually a dump of the wiki recently:
archive.org/download/wiki-corebootorg.
-
jodizzle
Of course, I went to the trouble of setting up the wikiteam tools before I realized that
-
jodizzle
But I guess I have them for the future.
-
mgrandi
Cool, glad it worked out
-
wickedplayer494
hah I came across flashkit and thought "oh this would be something good for archivebot" and then I go and look at the viewer then dashboard and see that we're already on it
-
wickedplayer494
nice.
-
sol
Hello. Is anyone here archiving the NY Review of Books? All articles are open to (registered) non-subscribers until November 5th.
-
sol
It's a WordPress-based site. But I'm having difficulty getting wget and httrack to work with the cookies.txt I supply. I suspect they are both following the "logout" link, despite my trying to exclude it.
-
sol
And should this ^^^ discussion go here or in #archiveteam-ot?
-
jodizzle
It should go here, I think
-
jodizzle
sol: Don't have much time to look into this atm, but you may want to try grab-site instead:
github.com/archiveteam/grab-site
-
jodizzle
You can provide cookies, and also you get everything saved in WARCs
-
sol
Thanks. I'll take a look at grab-site today.
-
sol
So, doing `grab-site nybooks.com --wpull-args=--load-cookies=/home/sol/gs-venv/cookies.txt` should work?
-
sol
How do I check if grab-site is downloading the full articles available to registered users or only the partial articles available to unregistered users?
-
katocala
-
katocala
well...
-
Lord_Nightmare
I just made an observation about ripping certain CDs: some CDs are encoded in a special/weird format called "HDCD" which allows dynamic range and other trickery to effectively store 20-bit audio using a 16-bit cd, but part of the data encoded is stored in the CD subchannel data
-
phuzion
Lord_Nightmare: Are you talking about SACDs?
-
Lord_Nightmare
so a straight audio rip of the CD loses this data and the HDCD decoder, without that data, has to guess at the multiplier for the dynamic range
-
phuzion
Or is that something completely different?
-
Lord_Nightmare
no, HDCD, which is a bit of a hack
-
phuzion
Interesting.
-
Lord_Nightmare
-
phuzion
Yeah I'm reading up on it now
-
Lord_Nightmare
-
Lord_Nightmare
this means that ripping any HDCD cd *MUST INCLUDE* the subchannel data to be an accurate rip
-
phuzion
I'm vaguely familiar with SACDs, and actually quite familiar with redbook CDs, so this is new and interesting.
-
Lord_Nightmare
so it has to be ripped as bin/cue with subchannels enabled
-
Lord_Nightmare
which is different from pretty much ANY OTHER audio cd
-
phuzion
I wonder if EAC can rip it
-
Lord_Nightmare
now, there are some audio cds which have audio in the pregaps
-
Lord_Nightmare
which is technically against the spec
-
Lord_Nightmare
they're supposed to be 2 seconds of silence encoded as audio
-
Lord_Nightmare
a proper cd ripper should rip those too
-
Lord_Nightmare
does archiveteam have a wiki about ripping CD data?
-
Lord_Nightmare
this should be noted there
-
Lord_Nightmare
if possible
-
phuzion
Lord_Nightmare: It's probably not on the AT wiki, but it might be on fileformats
-
phuzion
-
Lord_Nightmare
the subchannel data on audio CDs uses the P and Q channels to hold the timecode data (as would be displayed on the LCD of a Discman etc) and should be a consistent format and timing from the end of the pregap, barring any offset craziness
-
JAA
EAC and whipper definitely cover pre-gaps.
-
Lord_Nightmare
but on these HDCDs, I guess the remaining 6 subchannels are used to store other data
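
As an illustration of the P/Q timecode layout described above, here is a minimal sketch that pulls the Q channel out of one sector's raw subcode and decodes its position fields. It assumes the "raw" interleaved layout, where each of the 96 subcode bytes carries one bit per channel (bit 7 = P, bit 6 = Q, bits 5..0 = R..W); some tools store the channels deinterleaved instead, in which case `extract_q_channel` does not apply. The function names are made up for this example, and nothing here is specific to HDCD.

```python
AUDIO_BYTES = 2352      # one CD audio sector
SUBCODE_BYTES = 96      # raw P-W subcode for that sector
RAW_SECTOR = AUDIO_BYTES + SUBCODE_BYTES   # 2448 bytes per sector in a rip with subchannels


def extract_q_channel(raw_subcode: bytes) -> bytes:
    """Collect bit 6 of each of the 96 interleaved bytes into a 12-byte Q frame."""
    if len(raw_subcode) != SUBCODE_BYTES:
        raise ValueError("expected 96 bytes of raw subcode")
    q = bytearray(12)
    for i, byte in enumerate(raw_subcode):
        if byte & 0x40:                      # bit 6 is the Q channel (assumed layout)
            q[i // 8] |= 0x80 >> (i % 8)     # Q bits arrive most significant bit first
    return bytes(q)


def bcd(value: int) -> int:
    """Decode one packed BCD byte, e.g. 0x45 -> 45."""
    return (value >> 4) * 10 + (value & 0x0F)


def decode_q_position(q: bytes) -> dict:
    """Decode a position (ADR = 1) Q frame; other ADR modes are rejected."""
    if q[0] & 0x0F != 1:
        raise ValueError("not a position (ADR = 1) Q frame")
    return {
        "control": q[0] >> 4,
        "track": bcd(q[1]),
        "index": bcd(q[2]),
        "relative": (bcd(q[3]), bcd(q[4]), bcd(q[5])),   # min, sec, frame within the track
        "absolute": (bcd(q[7]), bcd(q[8]), bcd(q[9])),   # min, sec, frame on the disc
    }
```

The last two bytes of the Q frame are a CRC, which this sketch does not check.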
-
Lord_Nightmare
how subchannels are actually stored is WEIRD AS HELL
-
Lord_Nightmare
byuu had a series of tweets, which they have since deleted, about how EFM (
en.wikipedia.org/wiki/Eight-to-fourteen_modulation ) works and how the 'noise-whitened' bits etc are stored as pits/lands on the cd surface
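
For readers who have not met EFM before, a small sketch of its two basic rules, as described in the linked Wikipedia article: in the channel bitstream, any two neighbouring 1s are separated by at least two and at most ten 0s, and a 1 marks a pit/land transition rather than a pit itself. The function names and the example bit strings are made up for illustration; this is not a full EFM encoder (no codeword table, no merging bits).

```python
def satisfies_efm_run_lengths(channel_bits: str) -> bool:
    """True if every pair of neighbouring 1s has between 2 and 10 zeros between them."""
    ones = [i for i, bit in enumerate(channel_bits) if bit == "1"]
    return all(2 <= (b - a - 1) <= 10 for a, b in zip(ones, ones[1:]))


def channel_bits_to_surface(channel_bits: str, start: str = "land") -> list[str]:
    """Each 1 flips between pit and land; each 0 continues the current state."""
    state = start
    surface = []
    for bit in channel_bits:
        if bit == "1":
            state = "pit" if state == "land" else "land"
        surface.append(state)
    return surface


print(satisfies_efm_run_lengths("01001000100000"))   # gaps of 2 and 3 zeros
print(channel_bits_to_surface("1001"))
```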
-
Lord_Nightmare
there was also a lot of discussion about this on the domesday duplicator discord
-
JAA
We need a KryoFlux-like thing for CDs. :-|
-
Lord_Nightmare
since EFM is also used on laserdiscs
-
Lord_Nightmare
...and the domesday guys observed that many laserdisc players have RF debug pickups from the laser head
-
JAA
I suppose a microscope should work in principle.
-
Lord_Nightmare
which is also used to read CDs
-
Lord_Nightmare
in the LD player
-
Lord_Nightmare
and the CDs can be raw dumped the same way LDs can
-
Lord_Nightmare
so yes there *IS* a kryoflux thing for CDs, though you may need to patch the laserdisc drive firmware to make it play the entire CD sequentially from lead-in/TOC to lead-out
-
Lord_Nightmare
then decode the resulting 300GB analog file
-
Lord_Nightmare
laserdiscs are analog/PWM data stored as pits/lands, while cds are entirely digital, so the domesday duplicator board, which is meant for reading LDs, is storing a whole hell of a lot of redundant/unnecessary data
-
Lord_Nightmare
later laserdiscs which have digital/stereo audio use EFM encoding for that, similar to CDs
-
JAA
Oof
-
balrog
someone needs to reverse-engineer MakeMKV's "libredrive" crap
-
balrog
there are a bunch of techniques that basically read a sector or a few, dump the drive cache, extract the pre-decoded EFM from it
-
balrog
rawdump/friidump for GC/Wii works like that
-
balrog
it is slow
-
balrog
it's not quite as low-level as using a domesday duplicator, but it's more than enough
-
Jean-Fred
Hello! Bumping the PSN topic − mgrandi, any chance you had a look? I tried getting you the URL lists you asked for (via the VGPC folks who crawled the site ages ago) − it’s possible, but I have not gotten it done so far :-( (also cc lennier1, who raised the topic a couple of days ago)
-
jodizzle
sol: I think the only way to be sure would be to examine the contents of the WARC. There are a variety of tools that you can use to help you with that.
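
One way to do that check, as a rough sketch using only the Python standard library: walk the WARC records and list the response URLs whose payload contains some text that only the truncated (non-subscriber) version of a page shows. The marker string, the file name, and the function names are hypothetical. The sketch also assumes the HTTP bodies were stored without Content-Encoding compression; if they were compressed, a WARC library such as warcio that decodes bodies would be needed instead.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return                        # end of file
        if not line.strip():
            continue                      # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break                     # end of the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        payload = stream.read(int(headers["content-length"]))
        yield headers, payload


def find_marker(stream, marker: bytes) -> list:
    """Return target URIs of response records whose payload contains `marker`."""
    return [
        headers.get("warc-target-uri")
        for headers, payload in iter_warc_records(stream)
        if headers.get("warc-type") == "response" and marker in payload
    ]


def open_warc(path: str):
    """Open a .warc or .warc.gz file for reading as bytes."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")
```

Usage would look something like `find_marker(open_warc("crawl.warc.gz"), b"some teaser-only text")`. An empty result for a marker that appears only on truncated pages suggests the cookies were honoured, though spot-checking a few pages by hand is still worthwhile.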
-
OrIdow6
-
OrIdow6
<tech234a> GitHub repo is gone but PyPi listing is still up
-
OrIdow6
"DMCA'd" meaning that they're basically trying to remove all public copies
-
OrIdow6
(First two from -ot)
-
tech234a
-
JAA
Oof
-
OrIdow6
Now to find all the issue trackers etc.
-
Wayward
updates don't work either
-
JAA
Issues will be gone.
-
JAA
Unless the repo is restored.
-
JAA
Same with PRs.
-
OrIdow6
Is #gitgud far enough along to have gotten it? Haven't been paying attention to that
-
OrIdow6
very much
-
Wayward
One thing I'm not clear on is how these "rolling cipher technical protection measures" that YouTube is said to employ and that youtube-dl is said to circumvent, the main basis of this DMCA claim, are even a thing. What makes youtube-dl any different from a web browser that pulls data from YouTube in order to display it or store it in memory, cache, or hard disk space?
-
JAA
Doesn't look like the GitHub project got it, unless it happened very recently.
-
Wayward
Every web browser MUST definitionally circumvent these measures
-
JAA
Wayward: Please keep that discussion in -ot.
-
JAA
Discussion about archival here, discussion about the notice and consequences and legal stuff and whatever in -ot.
-
OrIdow6
The second top-level comment in the HN thread has a link to a WBM copy and tells people to be rebels and download it that way; if that keeps up, would not be surprised if an AB warc gets darked or similar
-
OrIdow6
Nonetheless, not much to change on AT's part even if that's the case
-
maxfan8
Wdym by darked?
-
Ajay
I think darked is when it is hidden from the public, but still kept in the archive for future preservation
-
maxfan8
Ah nice, thanks
-
JAA
Good thing that Git is a DVCS, so every dev still has their local copy of nearly the entire repo.
-
maxfan8
Yeah I have a local clone of it somewhere
-
JAA
Same, I have a normal clone (i.e. no PRs) from end of September.
-
OrIdow6
Looks like a lot of the issues are still in Google Web Cache
-
OrIdow6
Presumably there will be many copies of the repo - I'm more concerned about the non-code
-
mgrandi
@Jean-Fred: not yet but tonight hopefully, I think if I login I can still see the ps3 stuff on the store which is good (
store.playstation.com/en-us/product…p&smcid=psapp%3Alink%20menu%3Astore )
-
JAA
OrIdow6: Yep, working on that.
-
mgrandi
youtube-dl got DMCA'd
-
mgrandi
-
OrIdow6
Look up
-
mgrandi
Oh yeah
-
mgrandi
People are saying it will probably be restored
-
mgrandi
Should check back to see if it gets put up again so we can pull the issues
-
JAA
Yep, just set up a monitor.
-
mgrandi
So you can't git clone from Wayback Machine copies of the repo? How did that work for the Mercurial project, just because HG has a weird wire format that we also scraped?
-
JAA
-
mgrandi
There is a Wayback Machine URL from the 18th, but you can't seem to git clone it; it got the tar archives at least.
-
JAA
-
icedice
-
icedice
-
tech234a
Yep just saw that mirror on HN
-
tech234a
Seems to have a commit from today
-
icedice
I have youtube-dlc btw, but only the binary
-
icedice
(youtube-dlc was a fork of youtube-dl that was more up to date on fixing pull requests)
-
JAA
So here's what I have: PyPI including wheels and tars (via AB), website (via AB, not much on there), all versions of the Debian package, bundle of the Debian package repo
-
JAA
Also a bundle of that Gitee mirror.
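
For anyone wanting to make a similar bundle from their own clone or from a mirror, a minimal sketch of the usual steps: mirror-clone the repository, pack every ref into a single bundle file, then verify it. The URL, directory, and function names are placeholders, and this is not necessarily how the bundles mentioned above were produced. Note that a bundle holds only Git data; issues and PR discussions are not part of it.

```python
import os
import subprocess


def bundle_commands(repo_url: str, workdir: str, name: str) -> list[list[str]]:
    """Build the git invocations for mirroring a repo and bundling all of its refs."""
    base = os.path.abspath(workdir)           # absolute, because `git -C` changes directory
    mirror = os.path.join(base, f"{name}.git")
    bundle = os.path.join(base, f"{name}.bundle")
    return [
        ["git", "clone", "--mirror", repo_url, mirror],          # all refs, no working tree
        ["git", "-C", mirror, "bundle", "create", bundle, "--all"],
        ["git", "-C", mirror, "bundle", "verify", bundle],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder URL; substitute a mirror that is actually reachable.
    for argv in bundle_commands("https://example.org/some/repo.git", "backups", "repo"):
        print(" ".join(argv))
```

The resulting `.bundle` file can later be cloned from directly with `git clone repo.bundle`.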
-
JAA
I ran the Debian package repo through #gitgud as well.
-
JAA
And as a precaution, same for Invidious.
-
JAA
Debian package versions are being downloaded by AB at the moment.
-
JAA
The Google cache attempt quickly failed due to rate limiting. I wanted to spread that across the different AB pipelines.
-
JAA
There are 54k URLs to be retrieved (27k each for issues and PRs), and it needs a high delay, so it'd take ~25 days on a single AB job...
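
The arithmetic behind that estimate, as a small sketch. The 40-second delay is not stated above; it is simply the value that makes 54k URLs come out at about 25 days on one job, so treat it as an assumption. Time spent on the fetches themselves is ignored.

```python
import math


def crawl_days(url_count: int, delay_seconds: float, parallel_jobs: int = 1) -> float:
    """Rough crawl duration in days when each job waits `delay_seconds` per URL."""
    urls_per_job = math.ceil(url_count / parallel_jobs)
    return urls_per_job * delay_seconds / 86400    # 86400 seconds per day


print(crawl_days(54_000, 40))        # single job
print(crawl_days(54_000, 40, 5))     # same list spread across five jobs
```

Splitting the list across several pipelines divides the wall-clock time accordingly, provided the rate limit is applied per source address rather than globally.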
-
JAA
Arch Linux package archive done via AB.
-
Ajay
maybe run newpipe through gitgud if that's not already done?
-
JAA
-
Barto
JAA: thanks for all that work
-
JAA
Ajay: Done and bundle grabbed.
-
tech234a
Does anyone know if notices are posted to the DMCA repo before the repos are made inaccessible?
-
tech234a
-
hexa-
-
tech234a
Huh so they’re going after contributors also
-
OrIdow6
hexa-: See the frantic discussion above
-
tech234a
The link mentions contributors are facing legal action
-
OrIdow6
Oh, didn't see the context param
-
OrIdow6
Sorry hexa-
-
icedice
-
icedice
^ not very smart move
-
JAA
icedice: You clearly didn't read what that is about. Also, unrelated.
-
icedice
I thought they were saying that it's reuploaded to Archive.org in the initial post
-
icedice
I might have been mistaken
-
JAA
It's not even a DMCA notice. It's an explanation why GitHub makes the notices public.
-
JAA
Or well, a quote about that.
-
icedice
-
Ajay
icedice: no, they are saying that those are alternative places the repository could be hosted
-
icedice
archiving to
-
icedice
Not from
-
Ajay
ooh, sorry I misread
-
Ajay
to migrate issues, I think you need to be an admin of the repository
-
Ajay
for gitea and gitlab
-
nico_32
i have youtube-dl repo from commit d65d89183f645a0e95910c3861491a75c26000eb // Thu Sep 24 07:36:38 2020 +0700
-
nico_32
so the whole history
-
nico_32
git is really nice for that
-
JAA
Funny, same commit I also have on a local clone. But also, there are full copies on GitHub and Gitee up to today.
-
nico_32
DMCA takedowns make me keep a local copy of everything
-
nico_32
mmorpg emulator, fanfiction, mashup
-
nico_32
everything
-
Ryz
Loot all the things, looooooooooot
-
nico_32
dozens of PB on a local NAS are cheap
-
nico_32
s/PB/TB/
-
nico_32
sorry
-
nico_32
would be nice to have a few PB :)