-
h2ibotYetAnotherArchiver edited Zhihu (+1188, IP address appears in HTML): wiki.archiveteam.org/?diff=51166&oldid=50574
-
h2ibotDango360 edited Deathwatch (+139, updated v3rmillion; the old forums prior to…): wiki.archiveteam.org/?diff=51167&oldid=51160
-
mossssssim back besties
-
Pedrossowb
-
nicolas17gah
-
nicolas17JAA: found another clash, /pub/seamonkey/nightly/README-253.txt
-
JAAlol, fun
-
nicolas17anyway you can visualize disk usage per directory with:
-
nicolas17wget data.nicolas17.xyz/archive-mozilla.ncdu.zst; ncdu -f <(zstdcat archive-mozilla.ncdu.zst); then press "a" to show "apparent size" instead of physical size (which is zero, since I'm making sparse files)
-
nicolas17this is incomplete because my script crashed with the above readme :P
-
nicolas17I worry if this is gonna eat 15GB of RAM when it's complete
-
mosssssssigh, rootsweb is starting to kill off hosted sites. sites.rootsweb.com "Hosted websites will become read-only beginning in early 2024. At that time, all logins will be disabled, but hosted sites will remain on RootsWeb as static content. Website owners wishing to maintain their sites must migrate to a different hosting provider before
-
mossssss2024" only a matter of time before they delete, this is what ancestry (parent company) has done with a ton of websites
-
mossssssso frustrating
-
mossssssat least theres an index tho i think that some arent listed
-
JAAAdded that to Deathwatch even though they say it'll stay online. Ancestry has a terrible track record and a high body count.
-
h2ibotJustAnotherArchivist edited Deathwatch (+255, /* 2024 */ Add RootsWeb sites): wiki.archiveteam.org/?diff=51168&oldid=51167
-
JAAWell, of course they do, since tracking bodies is basically their business. :-)
-
mossssssLOL
-
mossssssjust the worst though - i know theres a ton of historically important stuff on there. i dont know if the index would be easy to archive tho
-
JAAWell, the index doesn't work at all without JS, so that isn't a great start. Not going to poke it further right now though.
-
mosssssssites.rootsweb.com/~rootswebsiteindex this one?
-
JAAI was just looking at the homepage, which says 'index' without anything further. That one looks a bit better, yeah.
-
mossssssyeah still kinda obnoxious tho unfortunately
-
JAAIt's a simple list with links at least, not some JS hellhole.
-
mossssssfair lol
-
mossssssill remind everyone again after it becomes read only to scoop up the last few links, i know of a few rootsweb sites that are still going strong
-
h2ibotBzc6p edited Indafotó (+133, /* Archiving */ Status update): wiki.archiveteam.org/?diff=51169&oldid=50468
-
PedrossoLooking through the logs it seems nobody has talked about urban dictionary here. Shouldn't there be an interest to archive such, considering its nature?
-
PedrossoIs it the sheer size, or simply a disinterest?
-
that_lurkerI "love" the way urbandict does their sitemap lounge.kuhaon.fun/folder/1ca69f327c965cb3/sitemap-https.xml.gz
-
Pedrossoas do I
-
PedrossoHowever, the browse tab seems very complete. Hence as long as AB can get outlinks from it, the entire site could be gotten through it. However, it is quite huge. This was pulled last year github.com/mattbierner/urban-dictionary-word-list/tree/master/data
-
PedrossoSo I don't think finding an appropriate sitemap would be any issue
-
PedrossoEven though theirs is, well, a little "funny"
-
PedrossoI'm just curious, is the reason it hasn't been sent to AB that it's too big?
-
that_lurkermost likely just no "reason" to grab as it's stable and not closing down anytime soon (knock on wood).
-
that_lurkers/reason/imminent reason
-
PedrossoHaha. But fair. I suppose then that an archive would only ever have a reason to be made once its in shutdown stages?
-
that_lurkerusually then is the time the site is brought to attention here and grabbed or a project is made. Proactive grabs are done all the time though, so I'm not saying nor do I have the power/knowledge to say it should not be grabbed now
-
Pedrossoalso fair. The only reason I could imagine not to grab it is its size. By my estimate(s) it could be quite a few TiB and quite a lot of pages
-
h2ibotWessel1512 edited Deathwatch (+122, /* 2023 */): wiki.archiveteam.org/?diff=51170&oldid=51168
-
JAAIt has definitely been discussed before, but not sure when or where.
-
nicolas17there's sites where maybe we can make grab scripts while not doing the actual grabbing (or only at very low speeds)
-
fireonlive-+rss- Rosalynn Carter has died: nbcnews.com/news/obituaries/rosalyn…er-former-first-lady-dies-rcna62862 news.ycombinator.com/item?id=38337395
-
that_lurkermight be a good idea to grab President Carter's websites as well as he is in hospice care still and these kinds of events don't help
-
h2ibotWickedplayer494 edited Current Projects (-155, DPReview lives on): wiki.archiveteam.org/?diff=51171&oldid=50947
-
h2ibotJAABot edited CurrentWarriorProject (+2): wiki.archiveteam.org/?diff=51172&oldid=51164
-
h2ibotWickedplayer494 edited Current Projects (+181, And I guess we're scooping up inactive Blogger…): wiki.archiveteam.org/?diff=51173&oldid=51171