00:47:11 https://apnews.com/article/saudi-arabia-death-sentence-twitter-a2b5549806605d1d21f332ac4c36e43f
00:47:24 death sentence for twitter/youtube stuff
00:57:00 youtube project is weird, my VPS downloaded 33GB and uploaded 6GB
00:57:19 is it downloading videos and throwing them away based on some criteria instead of uploading?
00:58:35 I was under the impression it only scraped metadata, if itd grabbing the youtube webpage for example it can theow away all of the html and just upload a json zst
00:58:50 It's* throw*
00:59:10 → #down-the-tube
01:06:16 pabs: Hmm, can't find him, but it might all be in Arabic of course.
01:35:26 audrooku|m: it's certainly making 100-1000MB downloads
02:51:44 https://www.gamingonlinux.com/2023/08/dev-of-shadow-tactics-desperados-iii-shadow-gambit-the-cursed-crew-shutting-down/
03:12:14 05:29:51 PM -+rss:#hackernews- Keisan Casio is shutting down: https://keisan.casio.com/keisan/abolition.php https://news.ycombinator.com/item?id=37328669
03:15:35 https://news.ycombinator.com/item?id=37328669#37330950
03:15:45 "Looks like the Japanese forum is going offline too: https://keisan.casio.jp/keisan/user_forum/"
04:54:29 Larsenv: looks like the forum in question is https://discussions.apple.com/browse - that's probably worth saving, but we might want to wait a few days for the Gabon stuff to settle down (and the other urgent stuff too). It doesn't sound like they'll be immediately deleting the forum on October 1, just stopping employee posting on it
04:55:03 I don't recognize the forum software in use so it might be a bit of a mess for archivebot
04:55:56 probably custom?
04:58:47 What is Gabon
04:58:56 Do we actually know how big it is?
04:59:02 https://en.wikipedia.org/wiki/2023_Gabonese_coup_d%27%C3%A9tat
05:01:19 It's probably 20 years of history!
05:02:39 oh? what is?
05:04:15 https://discussions.apple.com/browse
05:07:10 damn it
05:07:40 but yeah, it's been around forever it feels
05:08:15 hmmmm
05:08:18 https://discussions.apple.com/browse?page=1&sortBy=latestActivityOldest
05:08:22 not encouraging...
05:08:34 latest activity: oldest is something from 5 months ago
05:09:00 did they swap software/turn on auto-prune at some point?
05:10:04 The only provided sitemap in https://discussions.apple.com/sitemap-index.xml is https://discussions.apple.com/sitemap-fca81a5d-1.xml for recent posts
05:10:49 and https://discussions.apple.com/browse?page=103&sortBy=dateCreatedNewest is the last page given that way...
05:11:04 same with https://discussions.apple.com/browse?page=103&sortBy=latestActivityNewest
05:11:54 oh ok, there's this from https://discussions.apple.com/thread/8261972 2018
05:11:58 so it exists somewhere
05:12:27 https://discussions.apple.com/thread/1000000 is from 2007
05:13:07 if I sort by "recently created", it only lets me get to page 103
05:13:32 which has posts from *one day ago* so the forum seems to get 2000 threads per day?
05:13:33 I’m sad that the 10000 or whatever post is not from someone ID hunting
05:14:56 anyway if threads have sequential IDs, why bother with /browse
05:15:06 yeah seems to be
05:15:07 https://discussions.apple.com/thread/200002
05:15:12 there's some threads that ask me for login
05:15:15 and some just 404
05:15:15 but
05:15:28 randomly playing with numbers
05:16:26 so many unanswered questions :p
05:16:29 Looking at the recent ones they don't seem to be fully sequential, in fact https://discussions.apple.com/thread/255097976 is listed before https://discussions.apple.com/thread/255097973 - but there might be some kind of spam queue that needs manual approval to escape or something?
05:16:53 there's also user profiles, https://discussions.apple.com/profile/OmegaOSX/participation
05:18:04 also 200001's reply links have 'answerid' params.. 987106022 for the first reply
05:18:28 but can be ignored i think
06:11:33 FireonLive edited Xuite (+4, not so "Xuite" news, Xuite is offline): https://wiki.archiveteam.org/?diff=50630&oldid=50629
06:12:33 Yts98 edited Xuite (+77, Xuite goes offline): https://wiki.archiveteam.org/?diff=50631&oldid=50630
06:13:33 FireonLive edited Current Projects (+30, move Xuite to done): https://wiki.archiveteam.org/?diff=50632&oldid=50617
06:14:56 auto project is Xuite still, dunno if there's a better choice
06:18:51 are any other projects currently active and feeding items? telegram maybe?
06:19:34 FireonLive edited Current Projects (+18, linkify YouTube's 'selected videos'): https://wiki.archiveteam.org/?diff=50633&oldid=50632
06:20:13 telegram is empty again
06:20:24 reddit once arkiver gets time to look at the image stuff I guess
06:21:36 imgur is going to have a temporary startup as well
06:22:00 but yeah reddit will need quite the backlog crunch
06:22:03 /tableflip I guess I will find something to feed into something then
06:22:24 do you want more youtube videos? Mediafire chews it up way too quickly and I have exhausted my telegram stuff
06:23:34 if they fit within scope i don't see the harm, but i assume we'll chew though them quite quickly
06:25:57 there's still some youtube reclaims
06:26:23 and I have a ton of free bandwidth left on my VPS so I'm getting my money's worth :P
06:33:57 :D
06:54:22 so youtube is the only one running? plus the dregs of gfycat?
06:55:04 telegram is also running, but empty currently
06:55:12 but thats about it yeah
08:00:56 JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=50634&oldid=50582
10:06:48 http://littlewitchacademia.jp/tv1st/ Requesting to queue this site for archival, if possible. Not necessarily urgent, but bandai namco deleted their page pertaining the anime a while back (https://web.archive.org/web/20180119174059/https://en.bandainamcoent.eu/little-witch-academia/little-witch-academia) so I hope the site as-is can be archived to
10:06:48 preserves its content.
10:49:03 JAA do you have the way to turn off the rearchive everything thing on telegram?
10:49:32 flashfire42: No, also wrong channel.
10:49:59 you and I both know things get buried in #telegrab I wanted to be sure it was seen
10:50:22 I read my pings. :-)
10:50:50 (And I expect others to do so, too, regardless of other noise.)
10:53:10 nstrom|m: I don't see an announcement on dedipath's site about the shutdown
10:58:23 Apparently just an email, and people on LET are reporting sender verification failures: https://lowendtalk.com/discussion/188358/dedipath-closure-of-business
11:01:40 was just about to post that over here. yeah I got the email from them because I have a server as well, can verify the contents of that post
11:11:44 if we need more work for warriors poke me when I am not working or sleeping I will toss some more into whatever you want
11:31:50 https://twitter.com/ActiveTK5929/status/1696840098862809583
11:31:50 nitter: https://nitter.net/ActiveTK5929/status/1696840098862809583
11:34:34 Dedipath confirmed via email to LES moderator that they are shutting down entirely today: https://lowendspirit.com/discussion/comment/148585/#Comment_148585
12:00:12 VirMach is using DediPath for a lot of their locations. I wonder what other providers either have their stuff colo'd there or are just reselling.
12:04:50 I *think* dedipath didn't own the colocation facilities, just had ASN (as35913) & hardware
12:04:53 I could be wrong on that though
12:06:15 I know there were definitely resellers of dedipath VPSes. .ethernetservers.com was reselling dedipath in NJ but recently switched providers there after a datacenter fire
12:08:28 they had space in 10 datacenters and their ASN announces a pretty big chunk of addresses so I'm sure there will be lots of affected customers in any case
12:09:02 (they = dedipath in above)
12:10:55 nstrom|m: They mention colo stuff in the email though...?
12:11:22 'In regards to our colocation customers if you are in the following locations please send a ticket to ...'
12:11:56 yeah I think the companies they tell colo customers to contact are the companies that actually own the datacenters
12:12:10 so it's probably something like "move your stuff off of the dedipath racks onto some other rack in the same facility if you want to stay"
12:12:31 if I had to guess
12:12:36 I'm not a colo customer there
12:13:29 Ah
12:13:56 DC's usually only like to sell in units of 1 or more racks
12:13:58 As in, entire racks
12:14:12 So there's an ecosystem of companies that basically sublease racks
12:14:24 They get a rack in a DC (or even an entire cage)
12:14:31 And then rent out individual RU with power and network
12:14:54 So if dedipath had colo customers, this is likely what they did
12:15:11 The DC would just drop power and maybe one or two fiber feeds into the rack
12:15:33 And then the company puts in a top of rack, a PDU and brings more ips
12:44:56 Oh yeah, as someone pointed out, LET still has a DediPath ad at the bottom. Beautiful.
14:14:23 "Summer Sale - 20% off select dedicated servers from just $36/m + save 20% on all VPS and web hosting! Click Here to Save Now!"
14:14:36 save because they will never deliver any service or bill you?
14:23:08 ""Summer Sale - 20% off select..." <- Is that a scam email or a genuine sale?
14:25:22 that's the banner at the top of dedipath.com
14:25:36 which is apparently going out of business today
14:25:44 according to the discussion above
15:04:08 44 TB of items with WARCs in the archiveteam collections have mediatype 'data' - this will make them unavailable in the Wayback Machine
15:04:13 i'm moving these to mediatype=web
15:04:46 after this i'll do a check to see if all WARCs have actually been derived - i already see quite some that have not been derived
15:43:56 now rederiving some 113 TB of items with WARCs that may not be correctly indexed yet
15:50:12 (that is 30k items)
15:50:37 Yay, more work for IA's overloaded systems. :-)
15:51:49 https://lounge.thetechrobo.ca/uploads/1900690ac0ae6650/image.png
15:51:57 or... more records in the Wayback Machine without having to upload new data :P
15:52:06 more completeness yay
15:52:30 But great to fix those. I hope they weren't intentionally marked with a different mediatype. Probably not many of those in the AT collection though.
15:53:11 most of these were 2014 or 2015 items from 'before AB'
15:53:16 usually uploaded by a single user
15:53:36 Ah, the dark ages.
15:53:49 from https://archive.org/details/archiveteam_earlywarcs
15:53:52 😰
15:54:02 they were initially uploaded as mediatype=data
15:54:26 (there were also some cases in which other items seemed to have been accidentally uploaded as mediatype=data)
15:54:38 but those are indeed pretty old, ~9 years or so
15:55:40 at this point in time, WARCs that should not go into the Wayback Machine are in https://archive.org/details/warczone
15:58:05 meanwhile we also still have https://archive.org/details/archiveteam-mobileme-hero , which contains 282 TB of WARCs... inside tar files :/ so also not in the Wayback Machine
15:58:30 basically that entire project is invisible to most users
15:58:53 hey my warcs are in the zone t_t
15:59:04 and i thought i was special! :P
15:59:21 but hmm interesting
15:59:56 “This massive collection represents one of the largest projects Archive Team may ever do: Over 272 terabytes…”
15:59:58 :)
16:00:11 lmao
16:00:30 urls project: 5.03PiB
16:02:15 I've uploaded WARCs in TARs before, precisely so they don't accidentally get derived and put in the WBM.
16:03:41 but i think we want the mobileme collection to be indexed?
16:04:35 Assuming there were no auth shenanigans involved, probably.
16:13:18 oh arkiver I see you found the rest of XS4ALL :P
16:20:29 is there a way to unpack those TAR files inside the IA, or do we need something to pull them down and reprocess them
16:30:14 HCross: might have to pull them down :/
16:30:47 there's more than just WARCs. it might be best to pull them down, just pull out the WARCs and put those WARCs back in together with tars containing leftovers
16:41:48 xs4all now that’s a name i haven’t heard in ages
16:45:19 Me and HCross made an effort to archive their ISP hosting
16:45:28 Was a bit of a pain
16:45:31 And completeness is ???
16:45:34 But it's something
16:48:40 it's a very good "something" :)
16:48:45 ?
16:48:52 It was me having a go at heritrix
16:49:06 how was the experience?
16:49:11 Good and bad?
16:49:22 It needed some code tweaks to do what I wanted it to do
16:49:33 And I ended up doing some cursed sharding
16:49:39 To make it go faster
16:49:57 Took a while to get my head around
16:50:02 So really not any worse or better than most tools
16:50:21 I ended up modifying some of the code around frontier management
16:50:54 Because I needed it to do some cursed things to deal with xs4all
16:52:06 One thing that was a pain was that at some point xs4all converted from xs4all.net/~user (or something similar) to user.xs4all.net (or something) (I forget the exact subdomains involved, but they went from ~user to subdomain)
16:52:10 And while the redirect existed
16:52:19 I had to hack a few checks out of the frontier code
16:52:34 Because I didn't want to grab many outlinks
16:52:55 But I needed it to go through to the ~user links on a different subdomain
16:52:59 And follow the redirects
16:54:26 It ended up working reasonably well
16:54:37 Also iframes did some cursed stuff I dont' remember
16:54:50 Are those patches/hacks available somewhere?
16:54:59 No, I don't even know if I still have them
16:55:04 Oof
16:55:10 I mean, it's not that hard
16:55:14 The source code is on github
16:55:20 Find the frontier code and just fuck around and find out
16:55:39 I will point out I never touched the actual retrieval and warc writing
16:55:51 I just fucked around with link discovery and frontier
16:56:08 Yeah, I just like this ideal world where the code for the archival is also freely available for anyone to figure out why the data is the way it is.
16:56:17 Yeah I get that
16:56:31 But genuinely, it was a copy paste hack job in a bunch of places
16:56:34 ahh :)
16:57:16 also, I can't actually share the seed url list
16:57:32 ah yes a .nl isp :3. probably saw it on irc way back when
20:00:16 0KepOnline edited VHS on YouTube (+253, Added Ukrainian VHS on YT): https://wiki.archiveteam.org/?diff=50635&oldid=37007
20:00:17 0KepOnline edited Local TV News (+2215, /* Europe */ Ukraine): https://wiki.archiveteam.org/?diff=50636&oldid=48641
20:02:21 Rexma edited List of websites excluded from the Wayback Machine/Partial exclusions/Twitter accounts (+38, add account): https://wiki.archiveteam.org/?diff=50637&oldid=49984
20:02:22 Rob Kam edited WikiTeam (+142, The MediaWiki comparison of wiki farms is more…): https://wiki.archiveteam.org/?diff=50638&oldid=50537
20:04:14 https://x.com/toadsanime/status/1697286315094511759?s=12
20:04:16 nitter: https://nitter.net/toadsanime/status/1697286315094511759
20:05:20 "Volition staff on Twitter are reporting that parent company Embracer has just closed the 30-year-old studio behind Saint's Row and Red Faction, with mass layoffs #VolitionJobs"
20:41:03 ran telegram at high concurrency... pop went the modem
20:52:22 :P
20:52:40 well at time you're getting the maximum out of your modem :)
21:11:33 fireonlive - that's so fucking sad
21:14:04 yeah :(
21:14:16 Even sadder that the Saints Row 2 PC patch IdolNinja was so passionate about and organized before he died of cancer very likely won't see the light of day either
21:16:49 goddamn
22:22:46 Vokunal edited ZOWA (+52, added source): https://wiki.archiveteam.org/?diff=50639&oldid=50612
23:42:40 It's almost as if one big company spending billions in debt buying up all the other publishers and developers isn't such a great thing for the stability of everyone involved 🤔
23:43:23 Just gave telegram a bit more work. Will queue more soon