-
h2ibot
Pedrosso edited Steam (+1237, Added the steam workshop as its own project as…):
wiki.archiveteam.org/?diff=51378&oldid=51377
-
h2ibot
JustAnotherArchivist edited Steam (+131, Move new section to the end of the page;…):
wiki.archiveteam.org/?diff=51381&oldid=51378
-
JAA
Pedrosso: Streams crossed there, your new edit has resulted in a conflict and I had to reject it.
-
Pedrosso
ok
-
h2ibot
JustAnotherArchivist edited Steam (+3, Fix Workshop image):
wiki.archiveteam.org/?diff=51382&oldid=51381
-
Pedrosso
whoops
-
JAA
Welp :-)
-
JAA
> Your edit was ignored because no change was made to the text.
-
Pedrosso
Haha
-
Pedrosso
Wonderful
-
JAA
I'd like to get rid of the 'How can I help?' sections scattered all over the wiki at some point. They're only useful while a DPoS project is active.
-
JAA
Which is to say, they're all noise at this point.
-
Pedrosso
I mentioned that too. I was considering making them collapsible but that didn't work due to the section headers
-
JAA
Maybe we should include a message with similar contents on all {{in progress}} DPoS projects. That should even be possible automatically from the infobox template.
-
Pedrosso
That sounds efficient
-
Pedrosso
On the note of the Steam Workshop, I have coded a program ready to download all Portal 2 workshop creations, as long as I know a good way to upload them (not messing up the metadata, using a "nice" format, a "steam workshop" collection if needed)
-
JAA
Well, since the Workshop is a web interface, it should be WARC and go into the WBM.
-
Pedrosso
How exactly? As I mentioned in the wiki the download isn't directly off of the pages. On a steam workshop page you can only "subscribe" to an item which does stuff with the app. The script I made uses the API to grab it directly. (if you know all that already): How could that be put into a WARC and into the WBM?
-
pabs
is the API based on HTTP GET or POST?
-
pabs
if GET, then AB can be fed a list of API URLs to download
-
nicolas17
what if GET but auth cookies? :P
-
nicolas17
we need an alignment chart
-
nicolas17
where plain GET with nice URLs would be lawful good
-
JAA
Huh, is that new? I seem to remember downloading files directly from there. This would've been a few years ago though.
-
Pedrosso
GET, no cookies
-
Pedrosso
yes, you're right. A list of URLs would work
-
Pedrosso
actually, disregard that I got confused. Pull
-
Pedrosso
Post*
-
Pedrosso
GET was the one for the comments
-
Pedrosso
ok so, the API for getting items is POST and it's api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1 with publishedfileids[0]=ID_HERE and itemcount=1. It can be used for more items
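
The form fields described above can be sketched in Python. This is only an illustration: the function name `build_details_form` is hypothetical, and the indexed field pattern for several IDs (`publishedfileids[0]`, `publishedfileids[1]`, ...) is an assumption extrapolated from the single-ID example in the message.

```python
from urllib.parse import urlencode


def build_details_form(file_ids):
    """Build the URL-encoded POST body for one or more published file IDs.

    Assumption: extra IDs use increasing indices, with itemcount set to
    the number of IDs. Only the single-ID form appears in the chat.
    """
    fields = {"itemcount": len(file_ids)}
    for index, file_id in enumerate(file_ids):
        fields[f"publishedfileids[{index}]"] = file_id
    # urlencode percent-encodes the square brackets as %5B and %5D.
    return urlencode(fields)


print(build_details_form([3058373765]))
```

The resulting string is what would be sent as the body of the POST request to the endpoint named in the message.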
-
JAA
Ok, and then the actual download is a simple GET.
-
JAA
That can be archived into WARC in a way that would kind of work in the WBM, I think.
-
Pedrosso
> Ok, and then the actual download is a simple GET.
-
Pedrosso
It is? Can you give an example if I say the ID is 3058373765
-
JAA
-
Pedrosso
awesome
-
JAA
The server ignores the query string, so that can be (ab)used to retain the file ID context into the WBM.
-
JAA
Can't find an example of a workshop page with a download link from years ago, so I guess I misremembered. Huh.
-
Pedrosso
here's a (hopefully extensive) list of portal steam workshop ids
transfer.archivete.am/VK5SG/steamids.txt.zst
-
JAA
It's certainly extensive, more interesting would be whether it's exhaustive. :-)
-
JAA
I'm guessing there's an API for that as well?
-
Pedrosso
I meant exhaustive, thanks. I'm not aware of any API for that so I had the code go through the normal search pages
-
JAA
Hmm, maybe IPublishedFileService/QueryFiles.
-
fireonlive
i seem to remember steamdb.info being a thing but looks like it's third party
-
» fireonlive back to lurk mode
-
Pedrosso
I think you're right. When I did that pull I didn't have a steam API key
-
JAA
Yeah, that requires an access key. :-/
-
JAA
3 billion-ish IDs is perfectly feasible, especially since you can request multiple IDs per request.
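
A rough sketch of what requesting multiple IDs per request means for the total request count. The batch size of 100 is purely a hypothetical value for illustration; the chat does not state how many IDs the API accepts per request.

```python
from itertools import islice


def batches(ids, size):
    """Yield consecutive lists of at most `size` IDs from any iterable."""
    iterator = iter(ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def request_count(total_ids, size):
    """Number of requests needed to cover total_ids (ceiling division)."""
    return -(-total_ids // size)


# Hypothetical batch size of 100 IDs per request.
print(request_count(3_000_000_000, 100))
print(list(batches(range(1, 8), 3)))
```

With that assumed batch size, 3 billion IDs would take 30 million requests instead of 3 billion.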
-
JAA
Assuming Valve lets it happen, that is.
-
JAA
We'd want requests like `curl --data 'itemcount=1&publishedfileids%5B0%5D=3058373765' 'api.steampowered.com/ISteamRemoteSt…&publishedfileids%5B0%5D=3058373765'` into WARC.
-
JAA
This still allows looking up file IDs in the WBM by also including it in the URL.
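
A minimal sketch of the idea of repeating the POST fields in the URL so that each archived request has a distinct, searchable URL. The function name `archivable_request` is hypothetical, and the exact URL layout (the `https://` scheme, no trailing slash before `?`) is an assumption, since the URL in the message above is truncated. It also assumes the API tolerates the extra query string.

```python
from urllib.parse import urlencode

# Endpoint as given earlier in the conversation; the scheme is assumed.
ENDPOINT = "https://api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1"


def archivable_request(file_id):
    """Return (url, body) for a POST whose URL also carries the file ID.

    The body holds the real parameters; the query string duplicates them
    so the file ID stays visible in the archived URL.
    """
    body = urlencode({"itemcount": 1, "publishedfileids[0]": file_id})
    return f"{ENDPOINT}?{body}", body


url, body = archivable_request(3058373765)
print(url)
print(body)
```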
-
nicolas17
oh clever
-
JAA
We did the same thing for one of the YouTube projects.
-
Pedrosso
Very nice
-
Pedrosso
I thought AB could only do URLs
-
JAA
Who said anything about AB?
-
Pedrosso
Oh?
-
fireonlive
qwarc qwarc 🦆
-
Pedrosso
hehe
-
JAA
:-)
-
Pedrosso
my only regret is I lack a nice progress bar to stare at
-
JAA
If I'm seeing this correctly, slider.kz is a VK and Last.FM index. VK for the audio, and Last.FM for similar artist recommendations.
-
JAA
The search endpoint is plainly called vk_auth.php, and the audio URLs are on VK's CDN. The similar artists are less obvious, but the endpoint returns Last.fm's image URLs (which aren't displayed anywhere).
-
JAA
So probably virtually no unique data.
-
JAA
It's already well past its deadline, by the way; it was supposed to go down at the end of November.
-
fireonlive
interesting service
-
Pedrosso
fireonlive yes
-
Pedrosso
So, how's the qwarc-ing coming along?
-
JAA
Pedrosso: It isn't because both I and my machines are busy with too many other things currently.
-
c3manu
so, i will be attending an event between christmas and new years where i’ll have more bandwidth than i could possibly use for a span of 4 days. the uplink should be mostly clean (except for some incident response) and temporary (so torrenting will be fine). i don’t have big hardware lying around, but i could take a few raspberry-pi-like devices with me. what's the most useful thing i could let them do archiving-wise?
-
murb
obs use more bandwidth ;-)
-
c3manu
obs?
-
JAA
I think I know which event that is. :-)
-
c3manu
JAA don't tell me you're going as well :D
-
murb
obviously
-
c3manu
murb, ah
-
murb
i'll be there.
-
JAA
c3manu: Sadly no. :-(
-
c3manu
murb: ah, no wonder you know the NOC’s slogan then ;)
-
c3manu
JAA: bummer :/
-
JAA
Maybe next year. :-)
-
c3manu
nice, i’d like to say hi in person one day (if you'd be up for that)
-
c3manu
but still, is there a project that would make sense setting up on such small hardware? or does it defeat the purpose of many "smaller" participants to not get blocked entirely?
-
burak321
pokechu22 well that sucks :( It will remain read only, but I doubt it will stay that way for too long
-
pokechu22
Yeah. Depending on how fast the site is/if it blocks people going too fast, qwarc might be usable, but I don't know too much about how that works
-
h2ibot
Pokechu22 edited Deathwatch (+140, /* 2023 */ wizaz.pl/forum read-only…):
wiki.archiveteam.org/?diff=51383&oldid=51367
-
h2ibot
Pokechu22 edited Deathwatch (+74, /* 2023 */):
wiki.archiveteam.org/?diff=51384&oldid=51383
-
burak321
I didn't even realize it's that big. I guess it's a lost cause then. Thanks for the help
-
pokechu22
716134 threads, but there are probably lots of threads with at least 2 pages... I'd estimate between 1 and 3 million total pages that need to be saved, which is a lot but isn't impossible (it just wouldn't work well for archivebot)
-
JAA
Yeah, definitely feasible with qwarc if they have a decently generous rate limit.
-
DogsRNice
mittensquad's Twitter, if anyone wants to archive it
twitter.com/mittensquad
-
JAA
AB job started.