06:44:46 https://twitter.com/AdamRackis/status/1671923984835788803?t=7YyAlkFaB0iS7N05CQrNCQ&s=19
06:45:19 Not sure if anyone's seen this, think there's a case to be made that css-tricks is at risk
07:50:51 Sure looks like it.
08:30:36 Anyone already looked at Wysp (August 1)? Thinking of taking something far away, so I stop with these last-minute projects
13:37:20 OrIdow6: not yet here! go ahead :)
13:37:25 more than a month left!
14:30:14 Has anyone archived Game Atsumaru? If not, I suggest archiving with ArchiveBot first, and I'll inspect how to find game assets in a few days.
15:25:50 yts98: it's actually not that difficult. the actual 'game' seems to be some HTML with references to the assets
15:27:23 i see some games though that heavily use js to generate asset URLs
15:31:49 there's in-game API https://atsumaru.github.io/api-references/ related to comments and score boards,
15:33:18 and the version number 1147 of https://resource.game.nicovideo.jp/games/gm9482/1147/index.html should be parsed from #stateSerialized of https://game.nicovideo.jp/atsumaru/games/gm9482
15:35:42 parsing 29754 games and searching for the string "RPGAtsumaru" would be easy without the warrior
15:40:28 AK: fwiw, anything DigitalOcean touches content-wise is always at risk. they have a long history of shoddy content marketing practices, pretty much from the first moment they started doing their 'guides' thing
15:41:08 they talk a big game, but it's been obvious from the poor quality of their content (and the refusal to issue corrections) that they don't actually care about it in any way other than to draw in more customers by looking hip and helpful
15:54:56 i think it’s one of those like seo boosting things, get 30 articles out about roughly the same topic changing things slightly so they appear more in search results and are more in people’s minds / call to action to register for digitalocean.
iirc they allow anyone to write for them and pay out a few bucks per article
15:56:30 oh lordy imagine a site stuffed with chatgpt instructionals
17:03:57 I believe we already archived css-tricks.com and some other things a few months ago when DO was doing layoffs.
17:35:07 re the announcement in archiveteam: clearly too large for AB
17:36:36 ...is this frequency of shutdowns normal?
17:38:57 They have a URL shortener at sk.mu, but the codes are far too long to be bruteforceable. E.g. http://sk.mu/a90soh3wEbti for one of the most recent posts.
17:39:53 here's deathwatch, some years look busier than others: https://wiki.archiveteam.org/index.php/Deathwatch
17:40:01 but i imagine there's stuff that didn't make it cc nicolas17
17:40:53 Yeah, stuff lands on Deathwatch if someone adds it, and I think some of the smaller things used to not be added as often.
17:45:54 Ok, blogs can be enumerated via e.g. https://www.skyrock.com/common/r/skynautes/card/75598933
17:46:29 Blog IDs go to around 124 million, so that's fun enough.
17:47:57 Skyblog (https://www.skyrock.com/blog/) announced that they will shut down on the 21st of August.
17:47:57 It was a pre-Facebook social media site, especially popular in France.
17:47:57 Has it been archived? Would it be archivable?
17:47:57 Can it be added to the Deathwatch page please?
17:48:13 See above, we're already discussing it.
17:48:22 And please do add it to Deathwatch. It's a wiki. :-)
17:49:57 Server seems very stable, I'm already getting timeouts after just clicking around a bit.
17:51:14 It has been said that anonymized data will be saved to the INA and BNF (French authorities that mainly archive TV and radio broadcasts and books, respectively)
17:54:23 They offer ways to save blogs, but using third-party tools (e.g. Cyotek WebCopy, A1 Website Download, HTTrack are methods on the official page)
17:57:03 Exorcism|m: re https://www.wysp.ws/, it seems like archivebot won't work well with it due to javascript.
The "new" tab on the front page uses https://www.wysp.ws/timeline/load/?tlid=wysp-main&start=-1&rg=32&nb_col=3&order=antichronological&term_string=newest (and that progresses onwards). I also tried
17:57:05 https://www.wysp.ws/timeline/load/?tlid=wysp-main&start=-1&rg=32&nb_col=3&order=chronological&term_string=oldest and that seems like the oldest post it gives is https://www.wysp.ws/post/866261001/ which isn't the oldest (the "hall of fame" tab gives https://www.wysp.ws/post/8492023/ from 2013, while that post is 2017). IDs don't seem to be incremental so I'm not sure how to
17:57:07 go about saving everything.
18:00:54 do we have a source on the august 21 date for skyrock? i don't see it in the linked announcement
18:02:28 it's given in the news article, but no mention of where they got that information
18:04:42 LeGoupil, albertlarsan68: ^ any info?
18:06:26 thuban: on https://www.skyrock.com/blog/ it's in small on the banner next to ICI T LIBRE
18:07:14 LeGoupil: so it is, thank you
18:09:01 Switchnode edited Deathwatch (+235, /* 2023 */ add skyrock): https://wiki.archiveteam.org/?diff=49997&oldid=49993
18:11:01 Switchnode edited Deathwatch (+8, /* 2023 */): https://wiki.archiveteam.org/?diff=49998&oldid=49997
18:51:36 Official english statement: https://the-skyrock-team.skyrock.com/
18:52:29 Shared link: http://sk.mu/a3XJM6hYvHNo
18:53:05 Unshortened link: https://the-skyrock-team.skyrock.com/3356796874-posted-on-2023-06-22.html
18:54:09 Switchnode edited Deathwatch (+92, /* 2023 */ add english-language skyrock…): https://wiki.archiveteam.org/?diff=49999&oldid=49998
19:03:26 What should the IRC stream be named?
19:04:08 I can propose #downblog, #thunderblog
19:08:40 I quite like the second one, being a reference to French history (kinda): it is known (in France) that the Gaulois (IDK the English for that) were supposedly afraid of the sky falling on their head, this being the thunder.
19:08:56 with the new emoji policy arkiver +1'd i recommend #⛈️📝
19:08:57 :P
19:10:55 s/the new/my proposed/
19:24:17 It seems like each blog (subdomain in ".skyrock.com") has a sitemap.xml, so maybe this would be a warrior project?
19:24:32 ooh that's useful
19:25:30 depending on average size of a blog it might blow up on VM warriors
19:27:44 hmmm. sitemap links could be reported back to the trackers I guess? or pre-scraped?
19:28:39 It seems like the "canonical" URL is composed of an ID (example URL: https://lequipe-skyrock.skyrock.com/3356709252-Comment-sauvegarder-ton-blog.html) that seems to be sequential and unique across the network. However, I'm not sure of its usefulness.
19:31:47 They have an api https://www.skyrock.com/developer/documentation/api/
19:32:02 The ID is the only part that identifies a post within a blog, and is the only needed part to find the post. If the name is incorrect or absent, it will redirect to the correct url.
19:34:04 gah, we still need to know which blog it is on, can't buzz out the blog where the article lives on via a redirect
19:34:27 #bowlofpetunias
19:34:33 Seems like it
19:34:47 JAA Why?
19:35:08 Just my channel name suggestion.
19:36:59 since you didn't write "proposed" it looked like a channel announcement
19:38:27 And what would be the pun, as required by the rules?
19:40:47 #25732e07-b4e7-42c0-ad8a-a9bad8716b9b
19:40:49 :3
19:43:15 albertlarsan68: Looks like someone needs to listen to/read/watch The Hitchhiker's Guide To The Galaxy.
19:43:24 :-)
19:45:00 Something that is not contained in my French culture...
19:45:02 or; if you prefer; UUIDv7: #06495f63-2f1c-74f3-8000-0efab13ab36e
19:45:03 :D
19:45:16 but nah lmao
19:45:27 imagine trying to manage a channel list full of those
19:46:40 It seems like the API can't help much with basic post retrieval/discovery; the sitemap.xml file contains what we need.
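[Editor's sketch] The sitemap-based discovery discussed above could look roughly like the following. This is only a minimal illustration, assuming each blog's sitemap.xml follows the standard sitemaps.org schema (not verified against the live site) and that post URLs have the `<blog>.skyrock.com/<id>-<slug>.html` shape seen in the examples in this log. The function name `extract_posts` and the sample XML are hypothetical, not taken from any Archive Team code.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: sitemaps use the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: post URLs look like https://<blog>.skyrock.com/<numeric id>-<slug>.html
POST_URL = re.compile(r"^https?://([A-Za-z0-9-]+)\.skyrock\.com/(\d+)-[^/]*\.html$")


def extract_posts(sitemap_xml: str) -> list[tuple[str, int]]:
    """Return (blog_name, post_id) pairs for every post URL in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    posts = []
    for loc in root.iterfind(".//sm:loc", SITEMAP_NS):
        match = POST_URL.match((loc.text or "").strip())
        if match:  # skip non-post entries such as the blog front page
            posts.append((match.group(1), int(match.group(2))))
    return posts


if __name__ == "__main__":
    # Hand-written sample, not a real response from the site.
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://lequipe-skyrock.skyrock.com/</loc></url>
      <url><loc>https://lequipe-skyrock.skyrock.com/3356709252-Comment-sauvegarder-ton-blog.html</loc></url>
    </urlset>"""
    for blog, post_id in extract_posts(sample):
        print(blog, post_id)
```

Keeping the blog name next to the post ID matters because, as noted at 19:34:04, the ID alone does not reveal which blog a post lives on.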
19:48:33 probably a 2-stage project needed, buzzing out the posts/articles via sitemap hunting and then the core retrieval
19:50:50 Maybe get a first pool, then grow it via backfeed?
19:52:02 There is also an atom feed: example https://lequipe-skyrock.skyrock.com/atom.xml
19:54:29 or having someone create an account and maybe subscribe to everyone discovered, then maybe (not verified) there can be a feed/api callback (webhook type) to stay up to date on new posts?
19:59:53 Also, I have tried to create a wiki page for Skyblog (my user account is roughly the same as on IRC).
20:03:31 let's see
20:03:35 what is this
20:04:23 french blogging site, really old. see on #archiveteam
20:04:36 yep
20:04:43 we have a channel! #bowlofpetunias thanks JAA
20:09:01 What would be the strategy?
22:09:20 JAA: is the ia repo not receptive to fixes then? seems like there's a lot of issues with it
22:09:28 or just too much work fixing it?
22:09:53 seeing as you implemented an entirely new uploader
22:10:21 imer: Jake is receptive to fixes, and I've fixed a bunch of things myself.
22:10:43 I implemented ia-upload-stream separately because adding multi-part uploads and parallelism to ia would've been a pain.
22:10:51 (just to be extra clear, different Jake :)
22:11:09 Oh right :-)
22:11:31 aw darn was just about to PM you with my list of wants :3
22:12:31 After I implemented multi-part uploads, I realised they were ... not exactly great on IA's side of things. There's a bunch of copying parts around, which adds significant overhead.
22:12:42 So I ended up adding the single-part uploading as well.
22:13:40 alright, makes sense. might look at the ia stuff then, if it's easy enough to sort out I might
22:15:19 imer: source seems to be here: https://github.com/jjjake/internetarchive
22:15:27 yup yup
22:15:36 :)
22:15:59 oh there's a whole how to contribute page!
22:16:02 just asking since the stuff I ran into seemed to have been known for a while, never know if a project is on life support or something
22:16:09 i didn't have to like s/download/details/ and poke around lol
22:16:16 🤦
22:17:17 i do like that official releases are on the archive itself =]
22:25:13 a lot is happening in Russia, and most of it is happening through Telegram. we archive Telegram. if you haven't done so, please join and run the telegram project #telegrab
22:30:09 pokechu22: See above, I'll probably write a warrior project or similar for wysp.ws
22:43:12 Am running a telegrab container, but the channel was low signal-to-noise. Perhaps it's time to rejoin..
22:45:59 arkiver: do telegram and imgur use the same targets? I could slow down the bruteforce queueing if we need target capacity
23:00:30 nicolas17: if needed i'll slow down the imgur project
23:00:40 okay
23:01:01 I'm now enqueueing bruteforce lists when todo reaches a threshold, rather than every N minutes
23:01:23 so if you reduce the rate limit, it will adapt to that
23:02:45 i love the jank → fancier route things always take over time
23:02:53 even if there is still jank
23:03:25 is that what having children is like?
23:03:27 lol
23:12:28 any youtube videos or channels related to what is happening in Russia now can be archived at #down-the-tube
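[Editor's sketch] The threshold-based enqueueing mentioned at 23:01:01 can be illustrated as below. This is not the actual queueing script; `refill_if_low`, the in-memory deque, and the batch iterator are hypothetical stand-ins for whatever the tracker really uses. The point is only that refilling is driven by how much work is left rather than by a timer, so a lower rate limit automatically means fewer refills.

```python
from collections import deque
from typing import Deque, Iterator, List


def refill_if_low(todo: Deque[str], batches: Iterator[List[str]], threshold: int) -> bool:
    """Enqueue the next batch only if the todo queue has dropped below `threshold`.

    Returns True when a batch was added, False otherwise.
    """
    if len(todo) >= threshold:
        return False  # still enough work queued; nothing to do
    batch = next(batches, None)
    if batch is None:
        return False  # no more batches to enqueue
    todo.extend(batch)
    return True


if __name__ == "__main__":
    todo: Deque[str] = deque(["a", "b", "c"])
    batches = iter([["d", "e"], ["f"]])
    print(refill_if_low(todo, batches, threshold=2))  # queue is above threshold
    todo.popleft()
    todo.popleft()
    print(refill_if_low(todo, batches, threshold=2))  # queue is now below threshold
    print(list(todo))
```

A caller would run this check in a loop (or after each claimed item) instead of sleeping for a fixed N minutes between enqueues.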