01:01:19 YetAnotherArchiver edited Zhihu (+1188, IP address appears in HTML): https://wiki.archiveteam.org/?diff=51166&oldid=50574
01:01:20 Dango360 edited Deathwatch (+139, updated v3rmillion; the old forums prior to…): https://wiki.archiveteam.org/?diff=51167&oldid=51160
01:53:25 im back besties
01:53:57 wb
02:31:25 gah
02:31:33 JAA: found another clash, /pub/seamonkey/nightly/README-253.txt
02:31:43 https://archive.mozilla.org/pub/seamonkey/nightly/
02:33:50 lol, fun
02:39:37 anyway you can visualize disk usage per directory with:
02:40:31 wget https://data.nicolas17.xyz/archive-mozilla.ncdu.zst; ncdu -f <(zstdcat archive-mozilla.ncdu.zst); then press "a" to show "apparent size" instead of physical size (which is zero, since I'm making sparse files)
02:40:40 this is incomplete because my script crashed with the above readme :P
02:41:35 I worry if this is gonna eat 15GB of RAM when it's complete
04:01:41 sigh, rootsweb is starting to kill off hosted sites. https://sites.rootsweb.com/ "Hosted websites will become read-only beginning in early 2024. At that time, all logins will be disabled, but hosted sites will remain on RootsWeb as static content. Website owners wishing to maintain their sites must migrate to a different hosting provider before
04:01:42 2024" only a matter of time before they delete, this is what ancestry (parent company) has done with a ton of websites
04:02:02 so frustrating
04:04:59 at least theres an index tho i think that some arent listed
04:06:32 Added that to Deathwatch even though they say it'll stay online. Ancestry has a terrible track record and a high body count.
04:06:55 JustAnotherArchivist edited Deathwatch (+255, /* 2024 */ Add RootsWeb sites): https://wiki.archiveteam.org/?diff=51168&oldid=51167
04:08:09 Well, of course they do, since tracking bodies is basically their business. :-)
04:08:25 LOL
04:08:57 just the worst though - i know theres a ton of historically important stuff on there.
i dont know if the index would be easy to archive tho
04:09:58 Well, the index doesn't work at all without JS, so that isn't a great start. Not going to poke it further right now though.
04:10:47 https://sites.rootsweb.com/~rootswebsiteindex/ this one?
04:11:27 I was just looking at the homepage, which says 'index' without anything further. That one looks a bit better, yeah.
04:12:09 yeah still kinda obnoxious tho unfortunately
04:12:40 It's a simple list with links at least, not some JS hellhole.
04:12:51 fair lol
04:13:20 ill remind everyone again after it becomes read only to scoop up the last few links, i know of a few rootsweb sites that are still going strong
13:34:48 Bzc6p edited Indafotó (+133, /* Archiving */ Status update): https://wiki.archiveteam.org/?diff=51169&oldid=50468
13:46:28 Looking through the logs it seems nobody has talked about urban dictionary here. Shouldn't there be an interest to archive such, considering its nature?
13:47:59 Is it the sheer size, or simply a disinterest?
14:05:04 I "love" the way urbandict does their sitemap https://lounge.kuhaon.fun/folder/1ca69f327c965cb3/sitemap-https.xml.gz
14:05:10 https://www.urbandictionary.com/define.php?term=sitemap.xml
14:41:25 as do I
14:43:08 However, the browse tab seems very complete. Hence as long as AB can get outlinks from it, the entire site could be gotten through it. However, it is quite huge. This was pulled last year https://github.com/mattbierner/urban-dictionary-word-list/tree/master/data
14:47:04 So I don't think finding an appropriate sitemap would be any issue
14:47:13 Even though theirs is, well, a little "funny"
14:56:29 I'm just curious, is the reason it hasn't been sent to AB that it's too big?
15:25:57 most likely just no "reason" to grab as its stable and not closing down anytime soon (knock on wood).
15:26:35 s/reason/imminent reason
15:27:38 https://wiki.archiveteam.org/index.php/Alive..._OR_ARE_THEY
15:27:38 Haha. But fair.
I suppose then that an archive would only ever have a reason to be made once its in shutdown stages?
15:31:39 usually then is the time the site is brought to attention here and grabbed or a project is made. Proactive grabs are done all the time though, so I'm not saying nor do I have the power/knowledge to say it should not be grabbed now
15:33:27 also fair. The only reason I could imagine not to grab it is its size. By my estimate(s) it could be quite a few TiB and quite a lot of pages
15:44:14 Wessel1512 edited Deathwatch (+122, /* 2023 */): https://wiki.archiveteam.org/?diff=51170&oldid=51168
17:39:07 It has definitely been discussed before, but not sure when or where.
17:53:57 there's sites where maybe we can make grab scripts while not doing the actual grabbing (or only at very low speeds)
21:21:40 -+rss- Rosalynn Carter has died: https://www.nbcnews.com/news/obituaries/rosalynn-carter-former-first-lady-dies-rcna62862 https://news.ycombinator.com/item?id=38337395
22:05:36 might be a good idea to grab President Carters websites as well as he is in hospice care still and these kinds of events don't help
22:59:43 Wickedplayer494 edited Current Projects (-155, DPReview lives on): https://wiki.archiveteam.org/?diff=51171&oldid=50947
23:00:43 JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=51172&oldid=51164
23:02:43 Wickedplayer494 edited Current Projects (+181, And I guess we're scooping up inactive Blogger…): https://wiki.archiveteam.org/?diff=51173&oldid=51171