-
pabs
only a thousand?
-
pabs
$ wc -l todo/archive* | tail -n1
-
pabs
43585 total
-
pabs
includes stuff for SWH/codearchiver tho :)
-
project10
... wtf
-
fireonlive
wat
-
pabs
mostly proactive stuff, but doesn't count some things in a pad
-
pabs
and a few thousand unprocessed tabs and a few thousand unopened #hackernews/etc URLs
-
fireonlive
and now there's #infosec :D
-
JAA
lol
-
pabs
oh, and of course now I'm monitoring AB for code/forges/repos, so thousands of links there, plus I lost a bunch since it's just terminal output and OOMs kill that
-
h2ibot
JustAnotherArchivist edited Deathwatch (+0, /* 2024 */ Fix order):
wiki.archiveteam.org/?diff=51649&oldid=51645
-
Fusl
have some old dvds i found, would like to get those copied with the exact bitstream as on the dvd itself. `dvdbackup` fails to read some blocks and i'd like to do something like i can do with `ddrescue` where i can do multiple passes. does someone here have experience doing that kinda stuff?
-
Fusl
also unsure if here or #archiveteam-ot is a better place for this question
-
pokechu22
I personally use
github.com/SabreTools/MPF since that's what's used for redump.org, but that's more targeted at games (I've used it for some dvd-video magazine coverdiscs though)
-
JAA
My srad.jp qwarc run finished. I had previously mostly posted in #archivebot since that's where the discussion originally happened, so quick summary: I fetched /comment/ID, /submission/ID, and the 'parent' of comments. Some of the latter were also run through AB before the announcement that the service would live on (for now).
-
JAA
There were some errors, as expected on an aging, slow, partially broken site. By and large, it went okay though.
-
JAA
958 items had errors; if anyone wants to investigate those, let me know.
-
JAA
pabs: ^
-
flashfire42
Fusl Redumper
-
Fusl
oh, that looks more promising
-
JAA
Posts (stories, journals, etc.) can't directly be enumerated as far as I could tell, that's why I went via comments. But I suppose posts without comments are not as important, and we do have the recursive AB crawl.
-
pabs
JAA: I'll look at those errors
-
JAA
Should be mostly self-explanatory.
-
pabs
some of them work fine in browser at least, so maybe just !ao <
-
pabs
comment/1 still gives an error
-
pabs
btw, final announcement of srad.jp continuation makes it sound like the site will stay up *for now* but there will be no new stories
srad.jp/story/24/01/31/1253207
-
pabs
and indeed no new stories in feb so far
-
pabs
"the policy was suddenly changed and they decided to start recruiting new hosts without shutting down the servers."
-
pabs
JAA: for the comments, looks like most give 500 Internal Server Error, one 404, but a few 200 OK
-
pabs
(from AB)
-
fireonlive
no new stories... but maybe new bugfixes? :D
-
nicolas17
Fusl: if they are DRM'd DVD-Video, it's possible that some data required to decrypt the files is stored in a special area and not accessible to a plain dd
-
nicolas17
after all they had to prevent a plain bitwise copy from giving you a working DVD
-
nicolas17
JAA: I have a problem... try the "Announcement" link and download the PDF
opensource.samsung.com/uploadList?m…=Dolfin-Browser_v2.0_OpenSource.zip
-
nicolas17
it seems to be encrypted/DRM'd... and IA doesn't even let me upload it
-
fireonlive
00000000 3c 21 2d 2d 20 49 4e 43 4f 50 53 20 53 45 43 55 |<!-- INCOPS SECU|
-
fireonlive
00000010 2d 44 52 4d 20 2d 20 56 65 72 20 31 2e 30 20 2d |-DRM - Ver 1.0 -|
-
fireonlive
00000020 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 3e ac 12 02 00 24 |---------->....$|
-
fireonlive
o_O
-
fireonlive
for *checks notes* Dolfin-Browser_v2.0_OpenSource.zip's announcement pdf
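
A minimal sketch (not from the chat) of telling a real PDF apart from the wrapped file by its leading bytes. The `DRM_MAGIC` prefix is copied from the hex dump quoted here; that every file wrapped by this tool starts the same way is an assumption, and `classify_header` is a hypothetical helper name. Checking only the very start for `%PDF-` is also a simplification.

```python
PDF_MAGIC = b"%PDF-"
# Prefix as seen in the quoted hex dump; assumed (not confirmed) to be
# common to all files wrapped by this DRM tool.
DRM_MAGIC = b"<!-- INCOPS SECU-DRM"


def classify_header(data: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(DRM_MAGIC):
        return "incops-secu-drm"
    return "unknown"


# The first 32 bytes quoted in the chat, as hex.
sample = bytes.fromhex(
    "3c212d2d20494e434f50532053454355"
    "2d44524d202d2056657220312e30202d"
)
print(classify_header(sample))
```

Running something like this over a batch of downloads before upload would flag which files came back wrapped instead of as plain PDFs.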
-
nicolas17
in fact
-
nicolas17
2.2 is also DRM'd... with a *different tool*?!
-
fireonlive
"INCOPS SECU-DRM" seems to be mentioned on some... russian hard drive forums?
-
fireonlive
(or sections of russian forums)
-
Fusl
nicolas17: they're self-made dvds, no drm or anything, just plain old dvd-r
-
fireonlive
i think it's
en.fasoo.com/strategies/enterprise-drm (via fasoo mentioned on
wasm.in/threads/incops-secu-drm-ver-1-0.20731) iow they accidentally uploaded the encrypted version of the file
-
fireonlive
(or didn't disable their drm thing for that file)
-
fireonlive
weird lol
-
nicolas17
Fusl: basic use of ddrescue is "ddrescue /dev/cdrom image.iso image.map", the image.map file is optional but highly recommended, ddrescue will save state there (what was recovered, what failed and is pending retry, etc) letting you resume where it left off later
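
A rough sketch (not from the chat) of the multi-pass idea: a quick first pass, then a retry pass that reuses the same map file. `ddrescue_cmd` is a hypothetical helper that only builds the argument list. The flags `-b` (sector size), `-n` (skip the scraping phase) and `-r` (retry passes) are from the GNU ddrescue manual; using 2048 as the sector size for optical media is an assumption about the drive and disc.

```python
import shlex


def ddrescue_cmd(device, image, mapfile, retries=0, sector_size=None, no_scrape=False):
    """Build a ddrescue argument list (hypothetical helper, runs nothing)."""
    cmd = ["ddrescue"]
    if sector_size is not None:
        cmd += ["-b", str(sector_size)]  # sector size in bytes
    if no_scrape:
        cmd.append("-n")  # skip the slow scraping phase
    if retries:
        cmd += ["-r", str(retries)]  # extra passes over bad areas
    cmd += [device, image, mapfile]
    return cmd


# Pass 1: grab everything that reads easily.
first = ddrescue_cmd("/dev/cdrom", "image.iso", "image.map",
                     sector_size=2048, no_scrape=True)
# Pass 2: same image and map file, now retrying the failed areas.
retry = ddrescue_cmd("/dev/cdrom", "image.iso", "image.map",
                     sector_size=2048, retries=3)

print(shlex.join(first))
print(shlex.join(retry))
```

Because both invocations share `image.map`, the second one only revisits what the first could not read.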
-
h2ibot
Usernam edited List of websites excluded from the Wayback Machine/Partial exclusions (+63):
wiki.archiveteam.org/?diff=51650&oldid=51612
-
audrooku|m
Quora has always been hot liquid garbage
-
Fusl
nicolas17: i tried ddrescue, it keeps telling me `ddrescue: /dev/sr0: Unaligned read error. Is sector size correct?` and then makes the dvd drive sad, telling me there is no disk inserted anymore and refusing to eject the disk when i try to
-
Fusl
i was able to get ddrescue to work to dump all those dvds, then used dvdbackup to extract from the image file and then ffmpeg to convert them into mp4's
-
h2ibot
CreaZyp154 edited URLTeam/Warrior (-81, Removed gray background for is.gd and v.gd as…):
wiki.archiveteam.org/?diff=51651&oldid=51582
-
h2ibot
CreaZyp154 edited URLTeam (+154, play.st and playst.cc):
wiki.archiveteam.org/?diff=51652&oldid=51584
-
aninternettroll
hi, a bit offtopic, but does anyone know if on the wayback machine i can see a log of latest urls saved for a given domain? I saved a url today, but i forgot it
-
aninternettroll
and it's not on
web.archive.org/web/*/samarbeid.digdir.no* under URLs as far as i can tell
-
JAA
nicolas17: Ew, fun... Yeah, IA has some measures against that. Might be worth contacting Samsung about it via
opensource.samsung.com/requestInquiry after ensuring everything else is covered.
-
JAA
aninternettroll: I think the prefix search might use a different index, so maybe check again in a day or two.
-
aninternettroll
ok, thanks
-
that_lurker
o7
-
arkiver
nicolas17: on IA problems, please ping me as well
-
arkiver
what is the problem?
-
Darken
Could someone archive
sites.google.com/view/crwfam with archivebot for me, has no coverage at all
-
JAA
(Done)
-
arkiver
JAA: i believe AB processes (or used to process?) sitemaps
-
JAA
arkiver: Yes
-
arkiver
for the cinezen.hk job, it did not find the sitemaps under sitemap.xml it seems (it was not listed in robots.txt, but pretty obvious)
-
JAA
It tries /sitemap.xml and extracts the ones from /robots.txt, too.
-
arkiver
is the CDATA stuff not supported maybe?
cinezen.hk/sitemap.xml
-
JAA
That seems plausible.
-
arkiver
right
-
arkiver
do i !a < list with the URLs from the sitemaps manually to handle this?
-
JAA
Possibly, !a < has quirks. If none of the URLs have any additional path segment, it should be fine.
-
JAA
If you have a list, I can check before submission.
-
pokechu22
ArchiveBot didn't handle
ann-britt.se/sitemap.xml correctly due to whitespace before/after the loc elements (but the sub-sitemaps didn't have that issue so I could do an !a < list of those)
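
A small sketch (not ArchiveBot's actual code) of pulling URLs out of a sitemap so that whitespace around `loc` contents and CDATA wrapping both come out clean, e.g. for building an `!a <` list by hand. `extract_locs` is a hypothetical helper and the `example.com` URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text: str) -> list[str]:
    """Return all <loc> URLs from a sitemap or sitemap index, stripped."""
    root = ET.fromstring(xml_text)
    locs = []
    for el in root.iter(SITEMAP_NS + "loc"):
        url = (el.text or "").strip()  # drop newlines/indentation around the URL
        if url:
            locs.append(url)
    return locs


# Placeholder sitemap index: one entry padded with whitespace, one in CDATA.
sample = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>
    https://example.com/sitemap-a.xml
  </loc></sitemap>
  <sitemap><loc><![CDATA[https://example.com/sitemap-b.xml]]></loc></sitemap>
</sitemapindex>"""

for url in extract_locs(sample):
    print(url)
```

The XML parser hands back CDATA content as ordinary element text, so only the whitespace needs explicit handling.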
-
pokechu22
I'm pretty sure archivebot's fine with CDATA - the issue there is instead that
cinezen.hk/sitemap.xml links to
www.cinezen.hk/addl-sitemap.xml and www is considered offsite from non-www
-
pokechu22
so !a
www.cinezen.hk should work properly as then
www.cinezen.hk/sitemap.xml is used and that links to www etc
-
pokechu22
(
cinezen.hk also redirects to
www.cinezen.hk so !a
cinezen.hk wouldn't recurse more than 1 level either)
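
To illustrate the www vs. non-www point, here is a sketch under the assumption that "offsite" means an exact hostname mismatch with the seed URL. That is a simplification, not ArchiveBot's real scoping logic; `is_offsite` is a hypothetical helper and the `https://` scheme is assumed.

```python
from urllib.parse import urlsplit


def is_offsite(seed_url: str, link_url: str) -> bool:
    """True if the link's hostname differs from the seed's (exact match only)."""
    return urlsplit(seed_url).hostname != urlsplit(link_url).hostname


# Seeded on the bare domain, links on the www host look offsite.
print(is_offsite("https://cinezen.hk/", "https://www.cinezen.hk/addl-sitemap.xml"))
# Seeded on the www host, the same link is onsite.
print(is_offsite("https://www.cinezen.hk/", "https://www.cinezen.hk/addl-sitemap.xml"))
```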
-
arkiver
pokechu22: very interesting, thank you
-
arkiver
pokechu22: i put it in
-
arkiver
pokechu22: it's working! thank you :)
-
JAA
Oh, yeah, that makes sense. :-)
-
pokechu22
It's confusing because generally
cinezen.hk/sitemap.xml would also redirect to
www.cinezen.hk/sitemap.xml but in this case it didn't
-
JAA
The lack of redirection actually makes the issue a bit more obvious though than it would be otherwise.
-
h2ibot
Pokechu22 edited Jira (-34, /* Not yet archived */…):
wiki.archiveteam.org/?diff=51655&oldid=51654
-
nicolas17
arkiver: a pdf from opensource.samsung is DRM'd for some strange reason, and IA seems to check if .pdf files have valid format, so it's not letting me upload it
-
nicolas17
obviously that file is useless as-is
-
nicolas17
but if imgur gives us a corrupted .png, that doesn't stop us from preserving it, right? :P
-
h2ibot
Pedrosso edited Steam (+725, Added WIP Collapsible Archive Table for the…):
wiki.archiveteam.org/?diff=51656&oldid=51382
-
h2ibot
Pedrosso edited Steam (+0, /* Archives */ Changed GetDetails to QueryFiles):
wiki.archiveteam.org/?diff=51657&oldid=51656
-
h2ibot
DigitalDragon edited Current Projects (+147, add Vbox7):
wiki.archiveteam.org/?diff=51658&oldid=51458
-
pi31415
I found a site that i would like to archive, and it uses OpenSeadragon to present a ginormous zoomable image. Any wisdom on archiving that?
-
pokechu22
I wrote some horrible code that handles that when I was doing
chinesepainting.seattleartmuseum.org/OSCI I think
-
pi31415
The one i am looking at has Size Height="126976" Width="204800"
-
pokechu22
transfer.archivete.am/GOq6k/make_url_list.py - be warned that it's *really* bad code and you'll probably need to modify it to work (among other things it assumes the tile size is 256, while the sample I linked is 254 for some reason, and this also handles stuff other than add_accession_number)
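
For reference, a separate sketch (not the linked make_url_list.py) of how a tile URL list could be generated, assuming the site serves a standard Deep Zoom (DZI) pyramid with the usual `<name>_files/<level>/<col>_<row>.<ext>` layout. The base URL, file extension and tile size are assumptions; a site using a different OpenSeadragon tile source would need different logic.

```python
def dzi_levels(width, height, tile_size=256):
    """Yield (level, columns, rows) for each level of a Deep Zoom pyramid."""
    # Highest level holds the full-size image: ceil(log2(max(width, height))).
    max_level = (max(width, height) - 1).bit_length()
    for level in range(max_level + 1):
        scale = 1 << (max_level - level)
        level_w = -(-width // scale)   # ceiling division
        level_h = -(-height // scale)
        cols = -(-level_w // tile_size)
        rows = -(-level_h // tile_size)
        yield level, cols, rows


def tile_urls(base, width, height, tile_size=256, fmt="jpg"):
    """Yield every tile URL; base is the descriptor URL without its extension."""
    for level, cols, rows in dzi_levels(width, height, tile_size):
        for col in range(cols):
            for row in range(rows):
                yield f"{base}_files/{level}/{col}_{row}.{fmt}"


# Dimensions quoted in the chat; tile size 256 is an assumption.
levels = list(dzi_levels(204800, 126976))
print(levels[-1])  # full-resolution level
print(sum(cols * rows for _, cols, rows in levels), "tiles in total")
```

With these dimensions the full-resolution level alone is hundreds of thousands of tiles, so a plain directory listing, where one is exposed, is the easier route.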
-
pi31415
Thanks!
-
pi31415
this site has auth credentials in the .js code that makes requests to the data
-
pi31415
hum, looks like i can just get a directory listing of the tile files from http :-)
-
pokechu22
Yeah, that's probably easier :)
-
pi31415
looks like the auth credentials are for a FileMaker-backed REST API that looks up a location in the image keyed on meta-data
-
pi31415
guess i am SOL regarding that meta-data
-
h2ibot
Pedrosso edited Steam (+283026, /* Archives */ Added all other steam workshops…):
wiki.archiveteam.org/?diff=51660&oldid=51657
-
JAA
wat