-
hitchhitchhitch
Hello all, it seems like The China Project (thechinaproject.com/2023/11/06/some-sad-news) is shutting down. Can someone please send this through archivebot?
-
JAA
hitchhitchhitch: Thanks, I've started a job for it.
-
Edel69
Hello. Quick question about the Imgur Archive Project. Imgur abruptly deleted my account without warning and I lost all of my uploads. Will it be possible to identify a specific Imgur user's deleted files and albums once the archived data is ready to be publicly released?
-
nicolas17
Edel69: hmm not sure if public albums were archived
-
pokechu22
We also never archived the user profile pages (though we collected a list of them I think)
-
Edel69
None of my uploads were public. I guess that makes this even worse then. I just can't believe they nuked my decade-old account with no warning, and I'm apparently not the only one.
-
nicolas17
Edel69: no, other way around, public galleries weren't archived because they were at less risk of deletion
-
pokechu22
You might be able to recover a list from browser history?
-
nicolas17
so random uploads that *weren't* on public galleries (the publicly-seen side of imgur.com where mostly memes are shared) are more likely to be archived
-
nicolas17
problem is I'm not sure how to get them by username
-
JAA
It might in theory be possible to find albums, but since there is no index, you'd have to go through the full 650 TiB of data to find them, so it's infeasible.
-
JAA
Individual image pages don't contain the username.
-
arkiver
yeah no easy way
-
nicolas17
JAA: are the warcs even public?
-
arkiver
i think so
-
arkiver
yeah they are
-
pokechu22
note that galleries and albums are different
-
Edel69
nicolas17 I guess this means there's no hope then. Thanks for the clarification. pokechu22 Are you referring to a list of URLs?
-
nicolas17
URLs or image IDs
-
pokechu22
Pulling up an example from my browser history... which was never saved. I guess I need to mine that for more URLs...
imgur.com/a/hZmgsE8 exists but
imgur.com/gallery/hZmgsE8 doesn't, but on the other hand
imgur.com/a/MSeaL6C is the same as
-
JAA
Yeah, albums and galleries and their relations are weird. I think we discussed that in detail in #imgone early in the project.
-
arkiver
yeah
-
nicolas17
are you EricBowman86?
-
nicolas17
or was that just a random one from browser history, not your upload?
-
pokechu22
ah, but
i.imgur.com/6WN7pub.png was saved, probably extracted from my IRC logs, so it's *only* the album that wasn't saved
-
pokechu22
imgur.com/a/MSeaL6C is a random one from the "most viral" section
-
pokechu22
my username on imgur was pokechu22 though I also uploaded a lot of stuff when not signed in
-
Edel69
pokechu22 Probably wouldn't work out well because I had a lot of private albums and images. I doubt they're all in the browser history.
-
arkiver
i lost some channels previously on the 9th of July apparently
-
arkiver
just reconnected to them
-
unvariedexcuse
hi all
-
unvariedexcuse
do you know of any effort towards preserving twitter spaces? (audio rooms on the website now known as X)
-
arkiver
nowadays those are completely behind a login wall right?
-
arkiver
i remember at some point one could listen to them without login, but last time i checked one it was behind a login
-
unvariedexcuse
arkiver: some metadata yes
-
unvariedexcuse
arkiver: but not actual audio chunks IIRC
-
nicolas17
how do we get the URLs to the audio chunks though?
-
arkiver
yeah
-
arkiver
same question here
-
unvariedexcuse
nicolas17: the live_video_stream API should be usable without login
-
nicolas17
also metadata is kind of important, we don't want a giant pile of unlabeled mp3s :)
-
unvariedexcuse
as they seem to be based on Periscope infra, a fair amount of code could be shared with the Periscope grab
-
arkiver
i'll check them out, did not have a very good look at twitter spaces yet
-
unvariedexcuse
some may be officially unrecorded but if you get the m3u URL while they're live you can download them in full within 30 days
-
unvariedexcuse
watching for live spaces via the avatar_content API would likely not be suited for warriors as it requires login AFAIK
-
unvariedexcuse
searching for spaces via other means (either recorded or not) is otherwise notoriously difficult
-
unvariedexcuse
would they even be in scope for AT? your call
-
jarfeh
Hello there! I'm trying to recover some missing videos off youtube that were titled "lounge edit". I recently found a website called YouTube Video Finder made by a "TheTechRobo". I have the URL for some of the deleted videos, but this website mentioned that there is a "#youtubearchive" here that has the video?
-
JAA
/join #youtubearchive
-
pabs
archive.org has some youtube saved too btw, join #down-the-tube for that
-
jarfeh
I did check archive.org first, however it only has the pages of some of the videos saved, without the actual videos
-
TheTechRobo
yeah, you'll want #youtubearchive
-
TheTechRobo
the command that J.AA sent should work
-
jarfeh
It did! Thank you for your website by the way, I came across it in a comment thread in the DataHoarder reddit and it's helped with recovering
-
TheTechRobo
jarfeh: awesome, glad to hear it! :-)
-
nulldata
Might be good to back up the Zero Punctuation videos
-
JAA
nulldata: The entire channel is already running through #down-the-tube. :-)
-
nulldata
Aw sweet- thanks :)
-
nulldata
Ah*
-
JAA
Oh, there's a separate channel from the general Escapist one.
-
JAA
Or maybe that's unofficial.
-
nulldata
I wonder if there are any Escapist videos exclusive to the site and not on the YT channel? A reply to the post says the entire video team left
-
magmaus3
Sharing here just in case:
lemmy.sdf.org/post/7179616
-
thuban
site itself looks ok for archivebot except for disqus (images are lazy-loaded but have in-source srcs)
-
thuban
music is behind an onsite landing page which base64s a link to an offsite landing page (either a login-walled forum or a link shortener) which links to a third-party host (mostly google drive/mega), so no chance of abing that
-
masterX244
Link shorteners could probably be extracted by some warc-digesting and then crunched out
-
thuban
in theory, yeah
-
thuban
(would have to go through another round of ab since it requires you to click through rather than being a redirect--oddly enough, the site claims to have a captcha but just works with js disabled)
-
thuban
in practice, idk how valuable it would be given that we don't have tooling for those file hosts
-
arkiver
the problems at IA from one to two months ago have now been fully fixed
-
thuban
excellent! :D
-
nicolas17
arkiver: cool, let's resume bruteforcing imgur
-
nicolas17
(let's not :D)
-
arkiver
:P
-
mgrandi
s-resignations-after-eic-firing%2F
-
mgrandi
Oh my gosh I got bamboozled by the url length I'm sorry
-
JAA
-
mgrandi
Yes that
-
JAA
The download links on hikarinoakari.com are a mess. Some go to a link shortener with a captcha, some go to a Twitter account, etc.
-
JAA
The site itself is running through AB though.
-
vokunal|m
Now that IA is ok, can we get mediaonfire unclogged? If nothing changed, it's still going into temp storage
-
arkiver
vokunal|m: mediafire still going to temporary storage?
-
arkiver
i see WARCs appearing on IA
-
arkiver
seems like it is going to IA
-
arkiver
and it doesn't seem to be clogged
-
vokunal|m
ah cool
-
vokunal|m
it must be flowing smoothly then. I was thinking the output was clogged, but it must just be some funky items. They've been flowing nonstop, but slowly
-
arkiver
we also queue mediafire items discovered in #//
-
vokunal|m
That explains why the claims wouldn't be going down that fast. I was confused when we had around 40k todo, and it drained into claims over a week or so, but didn't seem to be leaving claims
-
fireonlive
🥳
-
Pedrosso
I'm new to hackint.org as well as to this chat. The wiki says this channel is supposed to be the right one to ask/inform about dying websites, is that accurate?
-
pokechu22
Pedrosso: Yes
-
Pedrosso
Spore is a game that's been out since September 4th, 2008, and support (by EA) has been declining. There's no official shutdown date afaik; however, I am anxious considering the company that's hosting them (EA), and how the company has already almost broken the game itself with its own launcher. What I'm worried about saving is the
-
Pedrosso
Sporepedia (spore.com): a large and very old website with millions of users and creations (>10 million enumerated files, with an average file size of approximately 20 kB). I've been using my own (bad) code to save only the creations, but it's inefficient and also leaves out all the forums, creators, comments, etc.
-
Pedrosso
I don't know how much this community cares about archiving such stuff, as it's a niche thing. Mind enlightening me?
-
arkiver
can ArchiveBot handle spore.com?
-
arkiver
would be nice to have a copy yes
-
pokechu22
EA doesn't like us unfortunately :|
-
arkiver
does that matter? :P
-
JAA
If they block or rate-limit us, it does.
-
pokechu22
Most of their websites time out with archivebot, though that's mostly newer stuff (ea.com and, like, battlefront I think). Not sure if spore.com is also affected
-
Pedrosso
I have been downloading these files for a long while and have had 0 apparent problems with rate-limiting, etc
-
pokechu22
or, rather than timing out, it acts more like a tarpit if I recall correctly
-
Flashfire42
Holy shit, that site shared in the main channel is cancerous, so many popups
-
pokechu22
(for reference, the site shared there was hikarinoakari.com / imgur.com/jeJSEu6)
-
JAA
I'm getting an expired cert on spore.com. Nice.
-
Pedrosso
the site is very much in a state of disrepair, hence why I'm concerned
-
pokechu22
I believe ScenarioPlanet sent me some spore-related stuff and that worked fine in the past, but it was a fairly small subset
-
Pedrosso
spore.com/sporepedia is what I'm referring to specifically
-
JAA
The site does not work well without JS, so there's that.
-
Flashfire42
Spore just timed out for me on the main domain there. spore.com
-
JAA
Yeah, took me a few tries as well to get there.
-
Flashfire42
sporepedia loads fine, maybe it needs the www
-
arkiver
needs the www indeed
-
JAA
We can certainly try to run it through ArchiveBot.