01:10:48 Hello all, it seems like The China Project (https://thechinaproject.com/2023/11/06/some-sad-news/) is shutting down. Can someone please send this through archivebot?
01:14:14 hitchhitchhitch: Thanks, I've started a job for it.
01:21:27 Hello. Quick question about the Imgur Archive Project. Imgur abruptly deleted my account without warning and I lost all of my uploads. Will it be possible to identify a specific Imgur user's deleted files and albums once the archived data is ready to be publicly released?
01:24:27 Edel69: hmm not sure if public albums were archived
01:26:25 We also never archived the user profile pages (though we collected a list of them I think)
01:26:49 None of my uploads were public. I guess that makes this even worse then. I just can't believe they nuked my decade old account with no warning, and I'm apparently not the only one.
01:28:40 Edel69: no, other way around, public galleries weren't archived because they were at less risk of deletion
01:28:43 You might be able to recover a list from browser history?
01:30:05 so random uploads that *weren't* on public galleries (the publicly-seen side of imgur.com where mostly memes are shared) are more likely to be archived
01:30:13 problem is I'm not sure how to get them by username
01:30:19 It might in theory be possible to find albums, but since there is no index, you'd have to go through the full 650 TiB of data to find them, so it's infeasible.
01:30:31 Individual image pages don't contain the username.
01:30:31 yeah no easy way
01:30:39 JAA: are the warcs even public?
01:31:06 i think so
01:31:23 yeah they are
01:31:57 note that galleries and albums are different
01:32:14 nicolas17 I guess this means there's no hope then. Thanks for the clarification. pokechu22 Are you referring to a list of URLs?
01:32:43 URLs or image IDs
01:33:33 Pulling up an example from my browser history... which was never saved, I guess I need to mine that for more URLs... https://imgur.com/a/hZmgsE8 exists but https://imgur.com/gallery/hZmgsE8 doesn't, but on the other hand https://imgur.com/a/MSeaL6C is the same as
01:34:20 Yeah, albums and galleries and their relations are weird. I think we discussed that in detail in #imgone early in the project.
01:34:37 yeah
01:34:58 are you EricBowman86?
01:35:15 or was that just a random one from browser history, not your upload?
01:35:19 ah, but https://i.imgur.com/6WN7pub.png was saved, probably extracted from my IRC logs, so it's *only* the album that wasn't saved
01:35:33 https://imgur.com/a/MSeaL6C is a random one from the "most viral" section
01:35:51 my username on imgur was pokechu22 though I also uploaded a lot of stuff when not signed in
01:35:57 pokechu22 Probably wouldn't work out well because I had a lot of private albums and images. I doubt they're all in the browser history.
01:49:01 i lost some channels previously on the 9th of July apparently
01:49:07 just reconnected to them
01:58:12 hi all
01:58:28 do you know of any effort towards preserving twitter spaces? (audio rooms on the website now known as X)
02:00:11 nowadays those are completely behind a login wall right?
02:00:28 i remember at some point one could listen to them without login, but last time i checked one it was behind a login
02:00:32 arkiver: some metadata yes
02:00:58 arkiver: but not actual audio chunks IIRC
02:01:10 how do we get the URLs to the audio chunks though?
02:01:34 yeah
02:01:46 same question here
02:01:57 nicolas17: the live_video_stream API should be usable without login
02:01:58 also metadata is kind of important, we don't want a giant pile of unlabeled mp3s :)
02:02:21 as they seem to be based off periscope infra, a fair amount of code could be shared with the periscope grab
02:03:56 at first https://github.com/HoloArchivists/twspace-dl and now https://github.com/HitomaruKonpaku/twspace-crawler appear to be the state of the art
02:04:21 i'll check them out, did not have a very good look at twitter spaces yet
02:05:54 some may be officially unrecorded but if you get the m3u URL while they're live you can download them in full within 30 days
02:09:42 watching for live spaces via the avatar_content API would likely not be suited for warriors as it requires login AFAIK
02:11:45 searching for spaces via other means (either recorded or not) is otherwise notoriously difficult
02:12:27 would they even be in scope for AT? your call
02:38:42 Hello there! I'm trying to recover some missing videos off youtube that were titled "lounge edit". I recently found a website called YouTube Video Finder made by a "TheTechRobo". I have the URL for some of the deleted videos, but this website mentioned that there is a "#youtubearchive" here that has the video?
02:39:11 /join #youtubearchive
02:39:28 archive.org has some youtube saved too btw, join #down-the-tube for that
02:39:54 I did check archive first, however it just only has the page saved of some of the videos without the actual video saved
02:40:16 yeah, you'll want #youtubearchive
02:40:44 the command that J.AA sent should work
02:42:29 It did! Thank you for your website by the way, I came across it in a comment thread in the DataHoarder reddit and it's helped with recovering
02:49:26 jarfeh: awesome, glad to hear it! :-)
03:11:23 https://twitter.com/YahtzeeCroshaw/status/1721687212541280425
03:11:23 nitter: https://nitter.net/YahtzeeCroshaw/status/1721687212541280425
03:11:47 Might be good to backup the Zero Punctuation videos
03:12:50 nulldata: The entire channel is already running through #down-the-tube. :-)
03:13:07 Aw sweet- thanks :)
03:13:13 Ah*
03:14:31 Oh, there's a separate channel from the general Escapist one.
03:14:52 Or maybe that's unofficial.
03:15:16 I wonder if there's any Escapist videos exclusive to the site and not on the YT channel? A reply to the post says the entire video team left
12:40:23 Sharing here just in case: https://lemmy.sdf.org/post/7179616
12:59:44 (site is https://hikarinoakari.com/)
13:07:26 site itself looks ok for archivebot except for disqus (images are lazy-loaded but have in-source srcs)
13:12:09 music is behind an onsite landing page which base64s a link to an offsite landing page (either a login-walled forum or a link shortener) which links to a third-party host (mostly google drive/mega), so no chance of abing that
14:49:32 Link shorteners could probably be extracted by some warc-digesting and then crunched out
15:00:49 in theory, yeah
15:01:58 (would have to go through another round of ab since it requires you to click through rather than being a redirect--oddly enough, the site claims to have a captcha but just works with js disabled)
15:03:00 in practice, idk how valuable it would be given that we don't have tooling for those file hosts
15:25:26 the problems at IA 1 to two months ago have now been fully fixed
15:28:20 excellent! :D
15:58:44 arkiver: cool, let's resume bruteforcing imgur
15:58:48 (let's not :D)
16:01:03 :P
16:50:28 https://www-forbes-com.cdn.ampproject.org/v/s/www.forbes.com/sites/paultassi/2023/11/07/zero-punctuation-ends-as-the-escapist-faces-mass-resignations-after-eic-firing/amp/?amp_gsa=1&_js_v=a9&usqp=mq331AQGsAEggAID#amp_tf=From%20%251%24s&aoh=16993753722447&csi=0&referrer=https%3A%2F%2Fwww.google.com&share=https%3A%2F%2Fwww.forbes.com%2Fsites%2Fpaultassi%2F2023%2F11%2F07%2Fzero-punctuation-ends-as-the-escapist-faces-mas
16:50:29 s-resignations-after-eic-firing%2F
16:50:43 Oh my gosh I got bamboozled by the url length I'm sorry
16:51:47 https://www.forbes.com/sites/paultassi/2023/11/07/zero-punctuation-ends-as-the-escapist-faces-mass-resignations-after-eic-firing/amp/
16:54:33 Yes that
17:06:00 The download links on https://hikarinoakari.com/ are a mess. Some go to a link shortener with a captcha, some go to a Twitter account, etc.
17:06:12 The site itself is running through AB though.
17:11:43 Now that IA is ok, can we get mediaonfire unclogged? If nothing changed, it's still going into temp storage
17:12:15 vokunal|m: mediafire still going to temporary storage?
17:12:31 i see WARCs appearing on IA
17:12:34 seems like it is going to IA
17:12:44 and it doesn't seem to be clogged
17:14:05 ah cool
17:14:50 it must be flowing smooth then. I was thinking the out was clogged, but it must just be some funky items. They've been flowing nonstop, but slow
17:15:35 we also queue mediafire items discovered in #//
17:17:30 That makes sense why the claims wouldn't be going down that fast. I was confused when we had around 40k todo, and it drained into claims over a week or so, but didn't seem to be leaving claims
17:44:52 🥳
23:42:26 I'm new to hackint.org as well as to this chat. The wiki says this channel is supposed to be the right one to ask/inform about dying websites, is that accurate?
23:42:37 Pedrosso: Yes
23:49:46 Spore is a game that's been out since september 4th 2008, and support (by EA) has been declining. Sporepedia. There's no official shutdown date afaik, however I am anxious considering the company that's hosting them (EA), and how the company has already almost broken the game in itself with its own launcher. What I'm worried about saving is the
23:49:46 sporepedia (spore.com) A large and very old website with millions of users and creations (>10 million enumerated files with approximately the average filesize of 20kB) I've been using my own (bad) code to save only the creations, but it's inefficient and also leaves out all the forums, creators, comments, etc.
23:50:25 I don't know how much this community cares for archiving such stuff, as it's a niche thing. Mind enlightening me?
23:53:34 can ArchiveBot handle that spore.com ?
23:53:37 would be nice to have a copy yes
23:53:50 EA doesn't like us unfortunately :|
23:54:11 does that matter? :P
23:54:48 If they block or rate-limit us, it does.
23:54:49 Most of their websites timeout with archivebot, though that's mostly newer stuff (ea.com and like battlefront I think). Not sure if spore.com is also affected
23:54:53 I have been downloading these files for a long while and have had 0 apparent problems with rate-limiting, etc
23:55:08 or, not even timeout, instead it acts more like a tarpit if I recall correctly
23:55:10 Holy shit that site shared in the main channel is cancerous so many popups
23:55:41 (for reference the site shared there was https://hikarinoakari.com / https://imgur.com/jeJSEu6)
23:55:49 I'm getting an expired cert on spore.com. Nice.
23:56:09 the site is very much in a state of disrepair, hence why I'm concerned
23:56:17 I believe ScenarioPlanet sent me some spore-related stuff and that worked fine in the past, but it was a fairly small subset
23:56:53 http://www.spore.com/sporepedia is what I'm referring to specifically
23:57:23 The site does not work well without JS, so there's that.
23:57:50 Spore just timed out for me on the main domain there. spore.com
23:58:03 Yeah, took me a few tries as well to get there.
23:58:04 sporepedia loads fine maybe it needs teh www
23:58:30 needs the www indeed
23:58:37 We can certainly try to run it through ArchiveBot.