01:48:42 https://app.box.com/s/6b9wmjvr582c95uzma1136exumk6p989/folder/135953042066 we should probably keep an eye on this
01:51:26 flashfire42: Are those documents involved in the Apple lawsuit?
01:51:33 yes
01:51:53 came from https://www.ign.com/articles/epic-vs-apple-shows-the-courts-were-not-prepared-for-the-games-industrys-obsessive-secrecy
01:51:59 tried archivebot and it didnt work
01:59:55 Yup, Box is JS crap. I'm not aware of any good way to archive it.
01:59:58 Yeah, that's way too JS-heavy
02:00:07 There's a download button that compiles it into a ZIP
02:00:31 Obviously not going to work in the WBM
02:21:59 Reuters is now on the naughty list apparently: https://twitter.com/felixsalmon/status/1389649958895333380
02:33:57 in re giantbomb discussion: the same late-2020 acquisition (viacomcbs selling cnet media group to red ventures) put gamefaqs under new ownership; it has been suggested on the subreddit that the boards may be in danger
02:34:20 (op says "The website was bought by a new company recently and they have said that most of the people who visit the site these days just wanna look at the faqs and the message boards are not visited much anymore..." but doesn't provide a source, zero points)
02:35:32 eyeballing the front page suggests several million topics, let alone pages, so probably not an archivebot job, but maybe worth looking into
02:37:06 Would be pretty devastating to lose GameFAQs imo
02:37:11 A lot of gaming history there
02:41:34 Moved from doing everything manually to docker.
02:44:24 We can start a project for each, all it takes is one person to create a url list honestly
02:45:39 thuban: 80 million topic IDs, post IDs are approaching a billion.
02:46:54 Definitely no AB matter, obviously.
02:47:16 reuters losing it
02:47:31 JAA: nice, could do a tiny project
02:48:04 So on the reuters thing , they might have just been migrating stuff
02:48:06 https://twitter.com/felixsalmon/status/1389672394353258498
02:50:37 But is giant bomb owned by the same parent company as gamefaqs?
02:52:29 mgrandi: yes, they were both part of viacomcbs
02:55:55 JAA: ids look sequential, but (a) board urls appear to require a name slug and (b) while i suspect thread ids are unique sitewide, thread urls appear to require the board id/name
02:56:57 Yep, I found the same thing. Looking for a URL that just uses the topic ID, but haven't seen anything yet.
02:57:15 that and their 'one board for every single game, browse by system' would make enumeration doable but nontrivial
03:00:46 (oh, and the non-game boards have equally bespoke categorization and listing)
03:03:32 Do they require a name slug?
03:03:37 Those are usually optional
03:04:10 the boards? looks like, yes (unless there is a secret alternate url scheme)
03:04:25 Nope, slug is optional: https://gamefaqs.gamespot.com/boards/234547-/79440096
03:04:29 But you need the ID.
03:04:57 oh! i was fooled because i didn't leave the _trailing hyphen_
03:06:18 Cool
03:06:34 Although one might want the full slug since that's what people would want
03:06:45 Maybe doing a header only request for that url will return the full url?
03:07:39 nope
03:08:50 the thread and (more importantly) pagination links all include the slug, though, so it would be easily had
03:09:28 We're obviously not going to bruteforce 300k board IDs times 80M topic IDs.
03:09:58 no, of course not
03:11:14 But we could use it to bruteforce the boards without having to traverse their games list etc.
03:11:46 right right, and if necessary (seems likely) use them to spider topics
03:12:00 Hmm, https://gamefaqs.gamespot.com/boards/3- only shows topics from the past week.
03:14:46 aggressive pruning? some of the game boards definitely have very old posts
03:17:12 Smells like it, yeah. https://gamefaqs.gamespot.com/boards/3-poll-of-the-day/77020198 existed in late 2018: https://web.archive.org/web/20181007202729/https://gamefaqs.gamespot.com/boards/3-poll-of-the-day/77020198
03:21:05 Some boards have access restrictions: https://gamefaqs.gamespot.com/boards/306-gamefaqs-usa-atlantic
03:21:48 I'm not saying brute forcing , all of the board IDs should be discoverable
03:21:52 er, and by "very old" i mean 2008; gamefaqs appears to have had boards (of some form) as far back as 2000.
03:24:46 Wikipedia mentions that boards were shared between GameFAQs and GameSpot between 2004 and 2012.
03:25:04 I've only seen a couple posts from before 2012, so that seems related.
03:27:25 There are also topics that don't show up on the corresponding board: https://gamefaqs.gamespot.com/boards/11-sballin/62745571
03:27:36 They say they have boards per game
03:28:07 https://gamefaqs.gamespot.com/boards/533287-super-mario-sunshine?page=33 has posts from 2008
03:29:02 https://gamefaqs.gamespot.com/boards/198848-super-mario-64?page=61 maybe 2008 is as far back as they go?
03:30:23 https://gamefaqs.gamespot.com/boards/197341-final-fantasy-vii?page=1247 also 2008
03:31:04 arkiver, SketchTheCow: what was the question about RTHK? was someone searching for translation, general suggestions on what to focus on backing up or ...
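For reference, here is a minimal Python sketch of the GameFAQs URL patterns worked out above. It assumes they hold sitewide: the slug after the board ID is optional as long as the trailing hyphen stays, topic URLs still need the board ID, and board listings paginate with ?page=N. The helper names (board_url, topic_url, candidate_board_urls) and the max_board_id parameter are hypothetical. The 300k/80M figures are only the rough estimates quoted in the chat. They are used here to show why brute-forcing every board/topic pair is out of the question, while enumerating board IDs alone is not.

```python
from typing import Iterator, Optional

# Base path for all board and topic URLs, as seen in the links above.
BASE = "https://gamefaqs.gamespot.com/boards"


def board_url(board_id: int, slug: str = "", page: Optional[int] = None) -> str:
    """Board listing URL. The slug may be empty, but the trailing hyphen must stay."""
    url = f"{BASE}/{board_id}-{slug}"
    if page is not None:
        url += f"?page={page}"  # board listings paginate with ?page=N
    return url


def topic_url(board_id: int, topic_id: int, slug: str = "") -> str:
    """Topic URL. The board ID is required even if topic IDs are unique sitewide."""
    return f"{BASE}/{board_id}-{slug}/{topic_id}"


def candidate_board_urls(max_board_id: int) -> Iterator[str]:
    """Hypothetical: enumerate board IDs directly instead of walking the per-system game lists."""
    for board_id in range(1, max_board_id + 1):
        yield board_url(board_id)


# Why topics cannot be brute-forced: every (board, topic) pair would be one request.
BOARD_IDS = 300_000      # rough estimate from the chat
TOPIC_IDS = 80_000_000   # rough estimate from the chat
PAIRS = BOARD_IDS * TOPIC_IDS  # 2.4e13 candidate URLs
```

For example, topic_url(234547, 79440096) reproduces the slug-less link posted at 03:04:25. The full slug could then be picked up from the thread and pagination links on each fetched page.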
03:39:00 if the question is about who might be interested in an archive of the materials — maybe independent online media, haven't got a name but there was a museum that displayed items from the Umbrella movement
03:45:48 https://www.newschoolfreepress.com/2020/09/30/we-are-all-hongkongers-an-art-exhibit-that-recorded-a-revolution/
03:47:55 the LIHKG forum is frequented by local residents, they will probably be able to provide some names
03:54:00 local museums and libraries may have the connections but I'm guessing most won't take them anymore
04:00:49 thuban: I'm slowly wading through the list of Letter to Hong Kong, at least to grab the audio files, no need to look into that one in particular
04:06:40 back later, thanks archiveteam and anyone else helping with the RTHK things!
04:39:09 Later :)
05:05:38 I was going to write to MeriStation, but their contact form is broken (404 after submission). Welp.
05:49:46 Everything I could say is that you are doing a great job. You've done so much for archiving Internet sites. And Internet is so unexpectedly (for me) fragile... Have you heard of codepad.org that won't respond for several weeks? That is terrifying (in a way)...
06:04:02 the good news is that the 'streaming' version of _hong kong connection_ is in 720p (as opposed to the 'archive', which is 480p)
06:05:19 the bad news is that it will therefore take one zillion years to download
06:05:50 @guest00014: what about codepad?
06:08:35 thuban: haha ... whichever you think is the better approach
06:11:59 @mcgrandi codepad.org used to be an online code interpreter (you know, like jsfiddle, but for several languages), created by Steven Hazel back in 2008. People used it to save code under user accounts, etc. It seems like it .. just disappeared in March 2021, without notice...
06:17:10 arkiver: each 'Hong Kong Connection' episode comes in a 720p 'vod' version hosted as a playlist with segments, and a 480p 'archive' version hosted as a single video. i plan to download the higher-quality versions to upload to ia, but which do we want for the wbm? both?
06:20:15 (whoops, actually there are several segmented versions for each ep, at different qualities; i've just only been paying attention to the best)
06:24:16 And since it had no CORS, it used to be a cosy place to store js-code and then embed its raw-code to use as part of bookmarklets etc (e.g. pastebin is paid-only for disable CORS, hastebin uses CORS etc). A bit terrifying (and maybe unexpected) is that there is no info what has really happened. But it seems like it can happen to any site...
09:09:55 Was off for the night. Only linked that apple vs epic box.com link since i got a shitty upload at my end and no quick way to bounce the pdfs over my server for faster upload
10:11:57 masterX244: "Only linked"? I don't understand what it was you did
11:08:08 referring to a link i posted earlier in this chat
11:08:28 was regarding to a comment on that it doesnt work with the WBM
11:09:45 I got really slow upload ==>600MB takes a while for upload. Linked that folder here yesterday so others are aware of it
12:33:21 I am getting this error in warrior Retrying after 60 seconds...
12:33:21 exit code 5 for Item y0iGzQ6W
12:33:33 max connections (-1) reached??? why -1
12:35:18 For what project?
12:37:22 Pastebin
12:38:56 also this
12:38:57 Retrying after 60 seconds...
12:38:58 RsyncUpload for Item gGR6rR0w
12:39:21 NVM thats the same thing
12:40:24 EggplantN: your box?
12:40:37 I switched to reddit now but it wont switch because its stuck on this error
12:40:47 pastebin?
12:40:47 ye
12:52:41 reddit works fine for me
13:58:30 Does anyone know how 130k items got added to the Yahoo Answers archive between today and yesterday?
13:59:39 kid urls were still responding for a few hours, so they tried scraping them
14:00:08 A question for #noanswers
14:01:45 I was inexplicably banned from there despite personally creating over 3% of the Yahoo Answers archive, recruiting many people to the project, and taking time to answer questions in the channel =\
14:06:12 I think maybe you were annoyed that when someone asked how much we had archived I joked that it was 69.420%?
14:08:34 Please understand I'm a volunteer, I gave up my time for this, I spent my own money on spinning up a lot of VMs, and it took away from my obligations and relationships to devote my attention to this project. I just ask to be treated with basic respect and decency.
14:26:43 checking..
15:24:12 nyany: looks like framasoft is already on deathwatch but maybe a request could be made to queue the pastebin earlier?
15:36:42 Looks like you'd have to provide the decryption key as part of the URL in order to view anything on framabin or you'll just see an error message
15:38:11 Although the encrypted ciphertext is always downloaded even if the key is missing, so that could be saved
15:38:36 Although saving a bunch of encrypted messages without key doesn't exactly sound very useful...
15:38:41 keys*
15:46:21 nyany: apparently they were set to expire after a week by default
15:49:14 nuroten / hook54321 ah i should have checked dw first
15:49:24 i just happened to stumble upon it lol
15:54:08 yeah, they've been slowly winding down their hosted services for some time, a few at a time, so maybe it's not a bad idea to ask if backups haven't already yet been scheduled
16:03:20 the closing schedule: https://alt.framasoft.org/en/
16:10:47 "We refuse to become the « default » solution and to monopolize your uses and attention" - What a creative excuse for deleting their user's data
16:10:48 nyany: any public data on framabin?
16:15:47 they are a small non-profit that probably wanted to raise awareness about open source, let people try out the apps. the sunset is happening slowly at least, not disappearing overnight
16:17:00 they don't really have a business model to sustain the hosting, it was done through crowdfunding basically
16:24:08 the initial announcement was back in September 2019, most of the stuff is still online but read-only
16:24:29 it might be worth reaching out to them directly?
17:01:52 I don't know if that's possible because it's an encrypted pastebin
17:02:27 arkiver: from a quick glance no, but crawls contain links back to their service
17:19:30 Has anyone looked at archiving the websites / social media from the UK elections tomorrow?
17:19:33 ("UK elections" => actually parliament elections for Scotland / Wales, and council elections for England)
17:19:53 I can get a lot of websites / twitters / facebooks / etc... from democracy club
17:20:36 But I recall from a month or so ago that there were issues archiving from facebook / instagram due to rate limiting - anyone know if this is still the case (JAA?)
17:23:05 betamax: That is still the case, and in fact Facebook has become even worse recently.
17:23:26 Is twitter still OK? (with the latest version of snscrape?)
17:23:49 Yeah
17:24:15 OK, I'll focus on candidate + party websites and twitter first
18:05:46 MeriStation is gone, by the way.
18:06:26 :(
18:34:51 so, uh... liveleak is not actually gone. the front page and 'browse' are redirecting, but channel pages and individual video items are still there. probably not for long. are we going to get on this?
18:40:40 *breathes in heavily* We better get a load of LiveLeak as /much/ as possible before it completely BOMBS itself!
18:40:54 JAA, arkiver, etc ^
18:42:12 very little xhr on video pages (video urls in html as