01:41:39 Pedrosso edited Steam (+1237, Added the steam workshop as its own project as…): https://wiki.archiveteam.org/?diff=51378&oldid=51377
01:41:40 Pedrosso uploaded File:Steam workshop 2023-12-18.png (The main page of…): https://wiki.archiveteam.org/?title=File%3ASteam%20workshop%202023-12-18.png
01:41:41 Pedrosso uploaded File:Steam Workshop v3 2023.png (Steam Workshop Banner Image): https://wiki.archiveteam.org/?title=File%3ASteam%20Workshop%20v3%202023.png
01:47:40 JustAnotherArchivist edited Steam (+131, Move new section to the end of the page;…): https://wiki.archiveteam.org/?diff=51381&oldid=51378
01:49:45 Pedrosso: Streams crossed there, your new edit has resulted in a conflict and I had to reject it.
01:50:04 ok
01:51:41 JustAnotherArchivist edited Steam (+3, Fix Workshop image): https://wiki.archiveteam.org/?diff=51382&oldid=51381
01:51:50 whops
01:52:00 Welp :-)
01:52:30 > Your edit was ignored because no change was made to the text.
01:52:52 Haha
01:53:00 Wonderful
01:55:28 I'd like to get rid of the 'How can I help?' sections scattered all over the wiki at some point. They're only useful while a DPoS project is active.
01:55:42 Which is to say, they're all noise at this point.
01:56:30 I mentioned that too. I was considering making them collapsible but that didn't work due to the section headers
02:01:06 Maybe we should include a message with similar contents on all {{in progress}} DPoS projects. That should even be possible automatically from the infobox template.
02:01:37 That sounds efficient
02:02:26 On the note of the steam workshop, I have coded a program ready to download all portal 2 workshop creations as long as I know a good way to upload it (not messing up the metadata, using a "nice" format, a "steam workshop" collection if needed)
02:03:43 Well, since the Workshop is a web interface, it should be WARC and go into the WBM.
02:34:38 How exactly? As I mentioned in the wiki the download isn't directly off of the pages. On a steam workshop page you can only "subscribe" to an item which does stuff with the app. The script I made uses the API to grab it directly. (if you know all that already): How could that be put into a WARC and into the WBM?
02:47:49 is the API based on HTTP GET or POST?
02:48:13 if GET, then AB can be fed a list of API URLs to download
02:53:29 what if GET but auth cookies? :P
02:53:35 we need an alignment chart
02:53:50 where plain GET with nice URLs would be lawful good
03:15:33 Huh, is that new? I seem to remember downloading files directly from there. This would've been a few years ago though.
03:19:44 GET, no cookies
03:19:57 yes, you're right. A list of URLs would work
03:21:00 actually, disregard that I got confused. Pull
03:21:06 Post*
03:21:22 GET was the one for the comments
03:26:01 ok so, the API for getting items is POST and it's https://api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1/ with publishedfileids[0]=ID_HERE and itemcount=1 It can be used for more items
03:26:55 Ok, and then the actual download is a simple GET.
03:27:15 That can be archived into WARC in a way that would kind of work in the WBM, I think.
03:27:58 > Ok, and then the actual download is a simple GET.
03:27:58 It is? Can you give an example if I say the ID is 3058373765
03:28:17 https://steamusercontent-a.akamaihd.net/ugc/2117314083157632215/B7FF5C4548936111546D0F348FECE251F8F4A1E7/
03:28:26 awesome
03:29:20 The server ignores the query string, so that can be (ab)used to retain the file ID context into the WBM.
03:31:43 Can't find an example of a workshop page with a download link from years ago, so I guess I misremembered. Huh.
03:32:38 here's a (hopefully extensive) list of portal steam workshop ids https://transfer.archivete.am/VK5SG/steamids.txt.zst
03:33:31 It's certainly extensive, more interesting would be whether it's exhaustive. :-)
03:33:43 I'm guessing there's an API for that as well?
03:34:27 I meant exhaustive, thanks. I'm not aware of any API for that so I had the code go through the normal search pages
03:34:47 Hmm, maybe IPublishedFileService/QueryFiles.
03:34:51 i seem to remember https://steamdb.info/ being a thing but looks like it's third party
03:34:59 * fireonlive back to lurk mode
03:36:11 I think you're right. When I did that pull I didn't have a steam API key
03:36:22 Yeah, that requires an access key. :-/
03:37:27 3 billion-ish IDs is perfectly feasible, especially since you can request multiple IDs per request.
03:39:01 Assuming Valve lets it happen, that is.
03:39:55 We'd want requests like `curl --data 'itemcount=1&publishedfileids%5B0%5D=3058373765' 'https://api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1/?itemcount=1&publishedfileids%5B0%5D=3058373765'` into WARC.
03:40:18 This still allows looking up file IDs in the WBM by also including it in the URL.
03:41:44 oh clever
03:42:07 We did the same thing for one of the YouTube projects.
03:42:52 Very nice
03:53:55 I thought AB could only do URLs
03:56:00 Who said anything about AB?
03:59:33 Oh?
03:59:59 qwarc qwarc 🦆
04:00:03 hehe
04:04:00 :-)
04:04:34 my only regret is I lack a nice progress bar to stare at
04:17:20 Pedrosso: https://dl.fireon.live/irc/1035455b3b1f59a3/please-wait.gif
04:22:06 Pedrosso: https://twitter.com/neilsardesai/status/1399037054957326339
04:22:06 nitter: https://nitter.net/neilsardesai/status/1399037054957326339
04:47:48 https://9to5mac.com/2023/12/18/apple-halting-apple-watch-series-9-and-apple-watch-ultra-2-sales/
04:49:30 If I'm seeing this correctly, slider.kz is a VK and Last.FM index. VK for the audio, and Last.FM for similar artist recommendations.
04:52:26 The search endpoint is plainly called vk_auth.php, and the audio URLs are on VK's CDN. The similar artists are less obvious, but the endpoint returns Last.fm's image URLs (which aren't displayed anywhere).
04:54:44 So probably virtually no unique data.
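As an aside on the Steam Workshop request proposed at 03:39:55: a minimal sketch of how the POST body and the matching URL could be built, assuming the form fields are exactly `itemcount` and `publishedfileids[N]` as stated in the log. The function name `build_details_request` is hypothetical, and nothing here sends a request or writes WARC; it only shows the trick of repeating the body in the query string so the file ID stays visible in the URL.

```python
from urllib.parse import urlencode

# Endpoint as given in the log at 03:26:01.
API_URL = "https://api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1/"


def build_details_request(file_ids):
    """Return (url, body) for a POST covering one or more Workshop file IDs.

    Hypothetical helper: the body is duplicated into the query string so the
    IDs remain part of the archived URL, as suggested in the log.
    """
    params = [("itemcount", str(len(file_ids)))]
    params += [(f"publishedfileids[{i}]", str(fid)) for i, fid in enumerate(file_ids)]
    body = urlencode(params)  # percent-encodes the square brackets
    return f"{API_URL}?{body}", body


url, body = build_details_request([3058373765])
print(body)
print(url)
```

For the single ID used in the log, this reproduces the body and URL of the curl command quoted above. Batching several IDs per request only changes `itemcount` and adds more indexed fields.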
04:56:07 It's already well past its deadline, by the way; it was supposed to go down at the end of November.
04:56:39 Cf. https://nitter.net/x_slider/status/1720341321062228252
05:12:35 interesting service
05:45:21 fireonlive yes
09:21:47 So, how's the qwarc-ing coming along?
15:30:24 Pedrosso: It isn't because both I and my machines are busy with too many other things currently.
17:41:26 so, i will be attending an event between christmas and new years where i’ll have more bandwidth than i could possibly use for a span of 4 days. the uplink should be mostly clean (except for some incident response) and temporary (so torrenting will be fine). i don’t have big hardware lying around, but i could take a few raspberry-pi-like devices with me. what's the most useful thing i could let them do archiving-wise?
17:51:28 obs use more bandwidth ;-)
17:52:14 obs?
17:52:22 I think I know which event that is. :-)
17:52:37 JAA don't tell me you're going as well :D
17:52:38 obviously
17:52:43 murb, ah
17:53:05 i'll be there.
17:54:20 c3manu: Sadly no. :-(
17:54:22 murb: ah, no wonder you know the NOC’s slogan then ;)
17:54:46 JAA: bummer :/
17:55:51 Maybe next year. :-)
17:57:41 nice, i’d like to say hi in person one day (if you'd be up for that)
18:01:05 but still, is there a project that would make sense setting up on such small hardware? or does it defeat the purpose of many "smaller" participants to not get blocked entirely?
18:37:50 pokechu22 well that sucks :( It will remain read only, but I doubt it will stay that way for too long
18:38:49 Yeah. Depending on how fast the site is/if it blocks people going too fast, qwarc might be usable, but I don't know too much about how that works
18:40:24 Pokechu22 edited Deathwatch (+140, /* 2023 */ https://wizaz.pl/forum/ read-only…): https://wiki.archiveteam.org/?diff=51383&oldid=51367
18:41:24 Pokechu22 edited Deathwatch (+74, /* 2023 */): https://wiki.archiveteam.org/?diff=51384&oldid=51383
18:42:30 I didn't even realize it's that big. I guess it's a lost cause then. Thanks for help
18:45:58 It's a little bit simpler since each individual post doesn't need to be saved, only every page in a thread (e.g. only https://wizaz.pl/forum/showthread.php?t=1283338 https://wizaz.pl/forum/showthread.php?t=1283338&page=2 https://wizaz.pl/forum/showthread.php?t=1283338&page=3 and not https://wizaz.pl/forum/showpost.php?p=88917750&postcount=1
18:46:01 https://wizaz.pl/forum/showpost.php?p=88917912&postcount=2 https://wizaz.pl/forum/showpost.php?p=88919041&postcount=3 ... https://wizaz.pl/forum/showpost.php?p=89694400&postcount=71) - those post links have the same info as the thread pages
18:47:30 716134 threads, but there are probably lots of threads with at least 2 pages... I'd estimate between 1 and 3 million total pages that need to be saved, which is a lot but isn't impossible (it just wouldn't work well for archivebot)
19:35:00 Yeah, definitely feasible with qwarc if they have a decently generous rate limit.
21:35:18 mittensquads twitter if anyone wants to archive it https://twitter.com/mittensquad
21:35:19 nitter: https://nitter.net/mittensquad
21:39:44 AB job started.
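
Following up on the wizaz.pl thread-page scheme described at 18:45:58, here is a rough sketch of how the list of page URLs for one thread could be generated. It assumes the URL pattern shown in the log (first page without a `page` parameter, later pages with `&page=N`). The function name `thread_page_urls` is hypothetical, and the page count is an input here; in practice it would have to be discovered by fetching the first page of each thread.

```python
BASE = "https://wizaz.pl/forum/showthread.php"


def thread_page_urls(thread_id, page_count):
    """Return the URLs for every page of a thread, skipping per-post links.

    Hypothetical helper: page 1 uses the bare thread URL, pages 2 and up
    append &page=N, matching the examples quoted in the log.
    """
    urls = [f"{BASE}?t={thread_id}"]
    urls += [f"{BASE}?t={thread_id}&page={n}" for n in range(2, page_count + 1)]
    return urls


for url in thread_page_urls(1283338, 3):
    print(url)
```

With 716134 threads, the 1 to 3 million page estimate from the log works out to roughly 1.4 to 4.2 pages per thread on average.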