-
flashfire42
-
OrIdow6
flashfire42: Are those documents involved in the Apple lawsuit?
-
flashfire42
yes
-
flashfire42
-
flashfire42
tried archivebot and it didn't work
-
JAA
Yup, Box is JS crap. I'm not aware of any good way to archive it.
-
OrIdow6
Yeah, that's way too JS-heavy
-
OrIdow6
There's a download button that compiles it into a ZIP
-
OrIdow6
Obviously not going to work in the WBM
-
mgrandi
-
thuban
in re giantbomb discussion: the same late-2020 acquisition (viacomcbs selling cnet media group to red ventures) put gamefaqs under new ownership; it has been suggested on the subreddit that the boards may be in danger
-
thuban
(op says "The website was bought by a new company recently and they have said that most of the people who visit the site these days just wanna look at the faqs and the message boards are not visited much anymore..." but doesn't provide a source, zero points)
-
thuban
eyeballing the front page suggests several million topics, let alone pages, so probably not an archivebot job, but maybe worth looking into
-
Jack_Thompson
Would be pretty devastating to lose GameFAQs imo
-
Jack_Thompson
A lot of gaming history there
-
Krownest
Moved from doing everything manually to docker.
-
mgrandi
We can start a project for each, all it takes is one person to create a url list honestly
-
JAA
thuban: 80 million topic IDs, post IDs are approaching a billion.
-
JAA
Definitely no AB matter, obviously.
-
arkiver
reuters losing it
-
arkiver
JAA: nice, could do a tiny project
-
mgrandi
So on the reuters thing, they might have just been migrating stuff
-
mgrandi
-
mgrandi
But is giant bomb owned by the same parent company as gamefaqs?
-
thuban
mgrandi: yes, they were both part of viacomcbs
-
thuban
JAA: ids look sequential, but (a) board urls appear to require a name slug and (b) while i suspect thread ids are unique sitewide, thread urls appear to require the board id/name
-
JAA
Yep, I found the same thing. Looking for a URL that just uses the topic ID, but haven't seen anything yet.
-
thuban
that and their 'one board for every single game, browse by system' would make enumeration doable but nontrivial
-
thuban
(oh, and the non-game boards have equally bespoke categorization and listing)
-
mgrandi
Do they require a name slug?
-
mgrandi
Those are usually optional
-
thuban
the boards? looks like, yes (unless there is a secret alternate url scheme)
-
JAA
-
JAA
But you need the ID.
-
thuban
oh! i was fooled because i didn't leave the _trailing hyphen_
-
mgrandi
Cool
-
mgrandi
Although one might want the full slug since that's what people would want
-
mgrandi
Maybe doing a header only request for that url will return the full url?
-
thuban
nope
-
thuban
the thread and (more importantly) pagination links all include the slug, though, so it would be easily had
-
JAA
We're obviously not going to bruteforce 300k board IDs times 80M topic IDs.
-
thuban
no, of course not
-
JAA
But we could use it to bruteforce the boards without having to traverse their games list etc.
-
thuban
right right, and if necessary (seems likely) use them to spider topics
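(A minimal sketch of the board enumeration discussed above, not anything the channel actually ran. It assumes the short form with a trailing hyphen, as in gamefaqs.gamespot.com/boards/3-, resolves for every board ID; the `https://` scheme, the ID range, and the helper names `board_urls` and `parse_board_link` are illustrative assumptions. The slug is recovered from links found on the page, since the thread and pagination links include it.)

```python
import re
from typing import Iterator, Optional, Tuple

# Assumption: https and this host, based on the URLs quoted in the chat.
BASE = "https://gamefaqs.gamespot.com/boards"

# Matches ".../boards/<id>-<slug>" and captures the numeric ID and the slug.
BOARD_LINK = re.compile(r"/boards/(\d+)-([^/?#]*)")


def board_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield slug-less board URLs; note the trailing hyphen after the ID."""
    for board_id in range(first_id, last_id + 1):
        yield f"{BASE}/{board_id}-"


def parse_board_link(href: str) -> Optional[Tuple[int, str]]:
    """Return (board_id, slug) from a board or topic link, or None if no match."""
    match = BOARD_LINK.search(href)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    # Hypothetical small range, just to show the shape of the URL list.
    for url in board_urls(1, 5):
        print(url)
    print(parse_board_link("gamefaqs.gamespot.com/boards/11-sballin/62745571"))
```

(The output of `board_urls` is the kind of plain URL list mentioned earlier; the full slug for each board would come from running `parse_board_link` over the links in the fetched pages.)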
-
JAA
Hmm,
gamefaqs.gamespot.com/boards/3- only shows topics from the past week.
-
thuban
aggressive pruning? some of the game boards definitely have very old posts
-
JAA
-
JAA
-
mgrandi
I'm not saying brute forcing, all of the board IDs should be discoverable
-
thuban
er, and by "very old" i mean 2008; gamefaqs appears to have had boards (of some form) as far back as 2000.
-
JAA
Wikipedia mentions that boards were shared between GameFAQs and GameSpot between 2004 and 2012.
-
JAA
I've only seen a couple posts from before 2012, so that seems related.
-
JAA
There are also topics that don't show up on the corresponding board:
gamefaqs.gamespot.com/boards/11-sballin/62745571
-
mgrandi
They say they have boards per game
-
mgrandi
-
mgrandi
-
mgrandi
-
nuroten
arkiver, SketchTheCow: what was the question about RTHK? was someone searching for translation, general suggestions on what to focus on backing up or ...
-
nuroten
if the question is about who might be interested in an archive of the materials — maybe independent online media, haven't got a name but there was a museum that displayed items from the Umbrella movement
-
nuroten
-
nuroten
the LIHKG forum is frequented by local residents, they will probably be able to provide some names
-
nuroten
local museums and libraries may have the connections but I'm guessing most won't take them anymore
-
nuroten
thuban: I'm slowly wading through the list of Letter to Hong Kong, at least to grab the audio files, no need to look into that one in particular
-
nuroten
back later, thanks archiveteam and anyone else helping with the RTHK things!
-
mgrandi
Later :)
-
JAA
I was going to write to MeriStation, but their contact form is broken (404 after submission). Welp.
-
guest00014
All I can say is that you are doing a great job. You've done so much for archiving Internet sites. And the Internet is so unexpectedly (for me) fragile... Have you heard that codepad.org hasn't responded for several weeks? That is terrifying (in a way)...
-
thuban
the good news is that the 'streaming' version of _hong kong connection_ is in 720p (as opposed to the 'archive', which is 480p)
-
thuban
the bad news is that it will therefore take one zillion years to download
-
mgrandi
@guest00014: what about codepad?
-
nuroten
thuban: haha ... whichever you think is the better approach
-
guest00014
@mgrandi codepad.org used to be an online code interpreter (you know, like jsfiddle, but for several languages), created by Steven Hazel back in 2008. People used it to save code under user accounts, etc. It seems like it .. just disappeared in March 2021, without notice...
-
thuban
arkiver: each 'Hong Kong Connection' episode comes in a 720p 'vod' version hosted as a playlist with segments, and a 480p 'archive' version hosted as a single video. i plan to download the higher-quality versions to upload to ia, but which do we want for the wbm? both?
-
thuban
(whoops, actually there are several segmented versions for each ep, at different qualities; i've just only been paying attention to the best)
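(A rough sketch of picking the best of several segmented versions. It assumes the 'vod' playlists are HLS with a master playlist listing the variants via `#EXT-X-STREAM-INF`; the chat does not confirm the format, and the function name `best_variant` and the sample playlist in the usage part are made up for illustration.)

```python
import re
from typing import Optional

RESOLUTION = re.compile(r"RESOLUTION=(\d+)x(\d+)")


def best_variant(master_playlist: str) -> Optional[str]:
    """Return the URI of the variant with the greatest vertical resolution.

    Variants without a RESOLUTION attribute are treated as height 0.
    Returns None if the playlist lists no variants.
    """
    lines = [ln.strip() for ln in master_playlist.splitlines() if ln.strip()]
    best_uri: Optional[str] = None
    best_height = -1
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # In HLS the variant URI is the line following its STREAM-INF tag.
        if i + 1 >= len(lines) or lines[i + 1].startswith("#"):
            continue
        match = RESOLUTION.search(line)
        height = int(match.group(2)) if match else 0
        if height > best_height:
            best_height = height
            best_uri = lines[i + 1]
    return best_uri


if __name__ == "__main__":
    # Invented sample playlist; real ones will differ.
    sample = (
        "#EXTM3U\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480\n"
        "low/index.m3u8\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720\n"
        "high/index.m3u8\n"
    )
    print(best_variant(sample))
```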
-
guest00014
And since it had no CORS, it used to be a cosy place to store JS code and then embed the raw code as part of bookmarklets etc. (e.g. pastebin is paid-only for disabling CORS, hastebin uses CORS, etc.). A bit terrifying (and maybe unexpected) is that there is no info on what really happened. But it seems like it can happen to any site...
-
masterX244
Was off for the night. Only linked that apple vs epic box.com link since i got a shitty upload at my end and no quick way to bounce the pdfs over my server for faster upload
-
OrIdow6
masterX244: "Only linked"? I don't understand what it was you did
-
masterX244
referring to a link i posted earlier in this chat
-
masterX244
it was regarding a comment that it doesn't work with the WBM
-
masterX244
I got really slow upload ==>600MB takes a while for upload. Linked that folder here yesterday so others are aware of it
-
Peca21
I am getting this error in warrior Retrying after 60 seconds...
-
Peca21
exit code 5 for Item y0iGzQ6W
-
Peca21
max connections (-1) reached??? why -1
-
rewby
For what project?
-
Peca21
Pastebin
-
Peca21
also this
-
Peca21
Retrying after 60 seconds...
-
Peca21
RsyncUpload for Item gGR6rR0w
-
Peca21
NVM thats the same thing
-
Kaz
EggplantN: your box?
-
Peca21
I switched to reddit now but it won't switch because it's stuck on this error
-
EggplantN
pastebin?
-
EggplantN
ye
-
Peca21
reddit works fine for me
-
yarrow
Does anyone know how 130k items got added to the Yahoo Answers archive between today and yesterday?
-
rewby
kid urls were still responding for a few hours, so they tried scraping them
-
EggplantN
A question for #noanswers
-
yarrow
I was inexplicably banned from there despite personally creating over 3% of the Yahoo Answers archive, recruiting many people to the project, and taking time to answer questions in the channel =\
-
yarrow
I think maybe you were annoyed that when someone asked how much we had archived I joked that it was 69.420%?
-
yarrow
Please understand I'm a volunteer, I gave up my time for this, I spent my own money on spinning up a lot of VMs, and it took away from my obligations and relationships to devote my attention to this project. I just ask to be treated with basic respect and decency.
-
Kaz
checking..
-
nuroten
nyany: looks like framasoft is already on deathwatch but maybe a request could be made to queue the pastebin earlier?
-
ThreeHeadedMonkey
Looks like you'd have to provide the decryption key as part of the URL in order to view anything on framabin or you'll just see an error message
-
ThreeHeadedMonkey
Although the encrypted ciphertext is always downloaded even if the key is missing, so that could be saved
-
ThreeHeadedMonkey
Although saving a bunch of encrypted messages without key doesn't exactly sound very useful...
-
ThreeHeadedMonkey
keys*
-
hook54321
nyany: apparently they were set to expire after a week by default
-
nyany
nuroten / hook54321 ah i should have checked dw first
-
nyany
i just happened to stumble upon it lol
-
nuroten
yeah, they've been slowly winding down their hosted services for some time, a few at a time, so maybe it's not a bad idea to ask if backups haven't already been scheduled
-
nuroten
the closing schedule:
alt.framasoft.org/en
-
ThreeHeadedMonkey
"We refuse to become the « default » solution and to monopolize your uses and attention" - What a creative excuse for deleting their user's data
-
arkiver
nyany: any public data on framabin?
-
nuroten
they are a small non-profit that probably wanted to raise awareness about open source, let people try out the apps. the sunset is happening slowly at least, not disappearing overnight
-
nuroten
they don't really have a business model to sustain the hosting, it was done through crowdfunding basically
-
nuroten
the initial announcement was back in September 2019, most of the stuff is still online but read-only
-
lunik1
it might be worth reaching out to them directly?
-
nyany
I don't know if that's possible because it's an encrypted pastebin
-
nyany
arkiver: from a quick glance no, but crawls contain links back to their service
-
betamax
Has anyone looked at archiving the websites / social media from the UK elections tomorrow?
-
betamax
("UK elections" => actually parliament elections for Scotland / Wales, and council elections for England)
-
betamax
I can get a lot of websites / twitters / facebooks / etc... from democracy club
-
betamax
But I recall from a month or so ago that there were issues archiving from facebook / instagram due to rate limiting - anyone know if this is still the case (JAA?)
-
JAA
betamax: That is still the case, and in fact Facebook has become even worse recently.
-
betamax
Is twitter still OK? (with the latest version of snscrape?)
-
JAA
Yeah
-
betamax
OK, I'll focus on candidate + party websites and twitter first
-
JAA
MeriStation is gone, by the way.
-
Jake
:(
-
thuban
so, uh... liveleak is not actually gone. the front page and 'browse' are redirecting, but channel pages and individual video items are still there. probably not for long. are we going to get on this?
-
Ryz
*breathes in heavily* We better get a load of LiveLeak as /much/ as possible before it completely BOMBS itself!
-
Ryz
JAA, arkiver, etc ^
-
thuban
very little xhr on video pages (video urls in html as <video> <source>, related item links in html wrapped in some js); channels are less nice
-
Ryz
Provide as much information as possible on how the content can be accessed
-
thuban
-
thuban
-
Jake
is there an easy way to discover channels?
-
thuban
-
Ryz
New IRC channel suggestion: liveleaked ?
-
thuban
deadleak, surely
-
Ryz
Waiting for JAA for approval
-
Ryz
livedry?
-
thuban
^ whoops, my mistake: channel and search pages both include results in the html, no api finagling needed
-
holbrooke
"live die repeat" -> "live-die-liveleak"
-
JAA
++deadleak
-
Ryz
Waiting for the channel to openm
-
Ryz
*open
-
Jake
deadleak is my favorite of the bunch
-
» Doranwen likes deadleak, even though she can't really help with this one
-
JAA
Created
-
Ryz
My contribution is saying obvious and crappy channel names and hope others come up with a more creative name :p
-
Ryz
#deadleak
-
arkiver
#liveleaked
-
arkiver
^ thats the channel
-
arkiver
use #deadleak
-
Terbium
is there any existing tool to dedupe a bunch of WARC files by digest/hash? I have a bunch of WARCs with dupe record payloads.
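(No tool is named in the chat, so this is only a sketch of the underlying idea, not an existing utility. It assumes the `WARC-Payload-Digest` header value has already been read for each record by whatever WARC reader is in use; `plan_dedupe` and the record IDs in the usage part are hypothetical. The first record seen for a digest is kept, and later ones are mapped to it, which is the information a `revisit` record would need.)

```python
from typing import Dict, Iterable, List, Tuple


def plan_dedupe(
    records: Iterable[Tuple[str, str]]
) -> Tuple[List[str], Dict[str, str]]:
    """Plan digest-based deduplication.

    `records` yields (record_id, payload_digest) pairs in file order.
    Returns (keep, duplicates): `keep` lists the IDs whose payload is seen
    for the first time; `duplicates` maps each later ID to the kept ID
    that carries the same payload digest.
    """
    first_seen: Dict[str, str] = {}   # digest -> ID of first record with it
    keep: List[str] = []
    duplicates: Dict[str, str] = {}
    for record_id, digest in records:
        if digest in first_seen:
            duplicates[record_id] = first_seen[digest]
        else:
            first_seen[digest] = record_id
            keep.append(record_id)
    return keep, duplicates


if __name__ == "__main__":
    # Invented IDs and digests, only to show the shape of the result.
    sample = [
        ("rec-1", "sha1:AAAA"),
        ("rec-2", "sha1:BBBB"),
        ("rec-3", "sha1:AAAA"),
    ]
    keep, duplicates = plan_dedupe(sample)
    print(keep)
    print(duplicates)
```

(Matching on digest alone trusts the digests recorded in the files; a more careful pass might also compare the target URI or the payload length before treating two records as duplicates.)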
-
JAA
If anyone here knows/understands Bulgarian and could tell me what a guy in a short 35-second video is roughly saying, please get in touch.
-
JAA
Yes, this is archival-related. :-)
-
gazorpazorp
I'm Bulgarian
-
nyany
JAA: ^
-
JAA
Wonderful!
-
gazorpazorp
I'll translate, just send something to translate :)
-
JAA
See PM
-
crispyalice2
Any word on when the parler stuff is going to be made public?
-
mgrandi
No idea how the upload is going
-
crispyalice2
there's a collection on archive.org with a bunch of items but it's private
-
mgrandi
I'm not sure if that is from the project, it might be marked as private due to the weird legal issues around it, would have to ask