00:46:49 -+rss- Nintendo Network shutdown – The beginning of the end: https://pretendo.network/blog/12-23-23 https://news.ycombinator.com/item?id=38766570
00:46:57 we've maybe seen the nintendo stuff earlier yeah?
00:52:03 So... What services does Nintendo hold with that network?
00:58:44 friend was complaining that all the player-made/shared levels for the first game would be unavailable
00:58:59 unsure if there's a way to do anything with that sadly
01:03:32 Was there ever done a grab of Super Mario Maker games? What other sorts of things could be of interest of archival?
01:13:08 courses*
03:38:23 Assuming OneHallyu stays up, the topic retries should be done in a bit over 2 hours. I'll then run another similar thing for the remaining topics that are being done sequentially since that's so slow. Also one more topic failed with timeouts.
03:38:37 Some of these topics have well over 10k pages, pretty insane.
03:40:08 another forum bites the dust, forums are disappearing rapidly :(
03:45:46 everyone loves fucking discord these days
03:46:32 sadly the case, forums are moving to the free easy to use walled garden known as discord :/
03:47:38 wrote a discord archiver recently to archive discord servers into a database, hopefully their API doesn't change too much in the coming months
04:39:07 i was using DiscordChatExporter but sadly it doesn't quite support those 'new fangled' 'forum' channels (and threads in normal channels) yet
05:21:22 a discord archiver you say?
05:23:20 OwO
05:32:19 Yeah, DiscordChatExporter didn't really suit my needs (large scale distributed crawling with database store)
05:32:29 Decided to write up a basic crawler
05:32:53 Currently doesn't grab attachments, but planned as a feature soon
05:33:05 :)
05:35:14 sounds like you do some fun stuff
05:35:18 :p
05:39:25 More like preparing for Discord's inevitable demise :P
05:40:37 true true xP
05:44:14 How is the crawler you wrote different than DiscordChatExporter?
05:46:03 at the very least i would assume discord hates it more
05:46:05 :p
05:47:43 Python based (no .NET thankfully), dumps everything to database as fast as possible, distributed (can use multiple instances to allocate servers/channels to crawl with different accounts/IPs)
05:48:02 No attachment downloading right now (can backfill later)
05:48:13 (no .NET thank you so much)
05:48:23 what's so bad about .NET ?
05:48:28 it's not python ™
05:48:32 True
05:48:40 I saw .NET, I gagged so hard I ended up writing my own crawler
05:48:44 :D
05:48:55 any issues foreseen with discord requiring 'the parameters' soon?
05:49:04 for earlier grabs/crawls that might not have them
05:49:29 It's simple enough to rewrite in go or rust, but I don't really care as it's not performance intensive (all IO bound)
05:49:30 (and i guess they'll expire at some point on the attachment urls?)
05:50:10 if we really wanted developers here, we'd just need to make a few posts around the internet saying 'rust would never be able to be up to the task of ArchiveTeam's needs'
05:50:17 I believe you can regenerate the links with refresh tokens
05:50:20 and the RETF would descend hell on here
05:50:31 (rust evangelism task force) to prove us wrong
05:51:00 o.o
05:52:09 DiscordChatExporter is great for personal exporting for the average user, just didn't suit my needs. It's not a bad app for the casual archiver
05:52:56 I feel like this fits in #discard lol
05:53:30 I know Sanqui and TheTechRobo was working on Discard for MITM based crawling. I think that stalled
05:54:04 what does DiscordChatExporter do badly, other than not being able to handle the new features?
05:56:16 Mostly scalability
05:57:20 Not as easy to use 50 Discord accounts. Multiple accounts are needed due to server cap for each account (unless you leave/rejoin to swap in and out servers)
06:00:23 I see I see. I think I'd leave this to the non-casual archivers. Hah. How've you been using it so far?
06:01:46 i think if I read it correctly as well Terbium's can do 'follow-up's very easily
06:01:54 i.e. get new messages since last visit
06:02:23 (or maybe has it built in already to continuously do so)
06:02:29 Just started, so slowly scaling up (trying to find large lists of discord servers), and throw 10 accounts and IPs at them
06:03:14 :)
06:04:23 Does it go recursively too? As in if it finds an invite link does it try it and then go from there?
06:04:43 hm, but I suppose that doesn't work for servers which you have to manually figure out stuff like roles for viewing channels..
06:05:38 Nope, it's very dumb right now, just crawls the servers the account has access to
06:06:01 Yeah, the roles stuff causes lots of problems for me
06:06:17 Especially "Verify your phone number" and all that nonsense
06:06:41 ah yeah the phone number thing :/
06:07:55 Would be nice if we had direct access to their ScyllaDB clusters
06:09:58 I suppose a dumb bot will get lots of content still.
06:10:09 Especially since we have very long lists of servers
06:38:42 Terbium: Please do keep us up to date with this
06:41:07 *disappears from AT for another 12 months*
07:14:33 Ok, I should have all OneHallyu topic pages now, I think.
07:15:56 :D
07:19:45 are attachments to be attempted?
07:20:09 Do you have an example? I couldn't find any.
07:20:18 oh, i don't
07:20:25 oh! i meant media i guess
07:20:35 i think you said you skipped .. something
07:20:50 The only things I saw hosted on OneHallyu itself were avatars. But maybe I just didn't look in the right place.
07:21:03 ah ok :)
07:21:09 faulty memories!
07:21:21 I did this with qwarc. qwarc doesn't care about HTML. So no page requisite extraction or similar.
07:22:05 qwarc fetches a URL you give it and writes it to WARC. Basically everything else is left as an exercise to the user.
07:26:22 :)
07:26:33 Oh, two topics failed. One is a 'count to a million' forum game, the other just a random small discussion.
07:31:08 The former doesn't even have 5k pages, but it's extremely slow.
07:31:27 some inefficient pagination i guess
07:31:52 No, there are far larger topics that are faster.
07:32:00 Largest I saw had 18k pages.
07:32:07 damn
07:32:27 (I didn't systematically check though, so maybe that isn't even the largest one that exists.)
07:33:00 Anyway, it's getting grabbed now, whether the server likes it or not. :-)
07:33:10 👀
07:36:46 Ah, now the response time is actually decent.
07:36:54 https://transfer.archivete.am/inline/spNkP/explanation.png
07:38:13 :D
08:15:37 That topic is done as well now, and that should be everything that's accessible. (I saw a small number of 403s.)
08:16:00 src extraction is running but will take a little while.
08:18:36 outlinks going to #// ?
08:19:24 Possibly later. Just focusing on onsite stuff now since that'll vanish very soon.
08:19:28 got it
08:19:30 sounds good
08:33:47 Merry Christmas, maniacs
08:34:56 thanks sketchy
10:33:14 https://publicwww.com/ - a search engine for stuff in websites' source code.
10:36:46 ooh neat
10:40:19 paid beyond alexa rank 1mil, higher costs for further down the ranking
10:40:31 https://publicwww.com/prices.html
10:40:40 ouch, $499/month for all URLs
10:41:11 * pabs wonders how that compares to shodan
10:41:33 er 3mil not 1mil
10:42:09 and $49/month gets you all URLs, but only 100 searches/day up to 100K rows
10:42:31 hmm, I think the 1mil was without an account
11:26:07 JAA: for OneHallyu, did you save the user profiles? (You can do https://onehallyu.com/profile/1--/ https://onehallyu.com/profile/2--/ https://onehallyu.com/profile/3--/ etc and get redirected to the correct name. Looks like there's also different tabs on each profile page that need to be requested separately.)
12:35:05 qwertyasdfuiopghjkl: No, only topics.
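The profile enumeration suggested at 11:26:07 could be sketched roughly like this. It only builds the URL list following the `https://onehallyu.com/profile/N--/` pattern quoted in that message. The function name `profile_urls` and the ID range are hypothetical, and the highest valid member ID is not known from the log. The per-profile tabs are left out because their URLs aren't given above.

```python
from typing import Iterator

# URL pattern copied from the 11:26:07 message; per that message, the site
# redirects ".../profile/N--/" to the URL containing the member's name.
PROFILE_URL = "https://onehallyu.com/profile/{member_id}--/"


def profile_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one profile URL per member ID in [first_id, last_id].

    Hypothetical helper: the real upper bound for member IDs is unknown,
    so the caller has to guess or probe for it.
    """
    for member_id in range(first_id, last_id + 1):
        yield PROFILE_URL.format(member_id=member_id)


if __name__ == "__main__":
    # Print a small list, e.g. to feed into a tool that fetches given URLs.
    for url in profile_urls(1, 3):
        print(url)
```

A list like this would still need a fetcher that follows the redirects for the final profile pages to get saved.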
14:51:58 I started an AB job for the OneHallyu src values I managed to extract, but it looks like the site is dying now and returning HTTP 522 (Buttflare's code for connection timeout to the upstream server) for a lot of things.
14:52:31 So maybe they took the server online and only what remains in Buttflare's cache is still around.
14:52:35 offline*
15:07:56 Could someone grab the upcoming Finnish presidential election candidates websites. https://lounge.kuhaon.fun/folder/65908e5765e73d9f/FinnishPresidentialElectionCandidates.txt
15:10:55 More info https://en.wikipedia.org/wiki/2024_Finnish_presidential_election
15:54:44 that_lurker: #vooterbooter
15:55:05 I'll run them later if nobody beats me to it.
15:55:22 ooh did not know there was a channel for this
16:13:51 also thanks :-)