-
schwarzkatz
any progress on uploadir yet? :D
-
anarcat
JAA: thanks
-
fishingforsoup_
Can all of this be archived?
-
fishingforsoup_
I never followed the group, but it might be hard to find years down the line.
-
Ryz
So, out of pure curiosity (and boredom as I archive a bunch of Pastebin URLs via WBM/SPN), I checked out Fiverr and searched the word 'archive'; apparently there are people who offer services to restore websites from the Wayback Machine back into WordPress Oo;
-
Ryz
Or just another website
-
OrIdow6
I did look into that a while ago and as far as I could tell that was used a lot (though not exclusively) by spammers who would buy expiring domains, restore the content from the WBM, throw some ads onto them, and put them online
-
monika
do they even bother stripping out the timings+copyright info in the html? 😂
-
qwertyasdfuiopghjkl
Some relevant-seeming info in that thread: The forums will go read-only on 2023-01-01 and be fully deleted at some time "in the first quarter [of 2023]". (
forums.furaffinity.net/threads/foru…ng-soon.1682702/page-5#post-7378226 ,
-
qwertyasdfuiopghjkl
might be an option.
-
dhrrr
Are there any plans for Twitter? While an exhaustive archive would be infeasible, can we at least archive popular tweets (for example, those with at least a certain number of likes)? snscrape allows filtering by like count
-
ivan
Twitter users can be submitted in #archivebot
-
ivan
there's a lot of Twitter in WBM
-
dhrrr
Yes, but Twitter also has many "main characters of the day" and isolated popular tweets: people who are not notable and who won't be sent to #archivebot.
-
ivan
if you have methods to make a good list of users, there's a way to submit a lot of users
-
ivan
you can, for example, scrape follows and analyze graphs of follow relationships
-
dhrrr
I'm aware of that. I'm just surprised that there isn't an ArchiveTeam project for dealing with popular tweets regardless of who posted them
-
OrIdow6
monika: I suspect but am not sure that this is partially responsible for the proliferation of "welcome to the US petabox" (Google it in quotes) pages that A oede noticed a while ago
-
OrIdow6
For a more modern example, here's a site that seems to be doing this
germanyweek.org - notice the links about online casinos at the bottom - and here
germanyweek.org/program is a page where they've messed up on the crawling and copied a WBM error page
-
OrIdow6
And I am being deliberate with putting this in bs instead of ot, I do think this is relevant to resource allocation on smaller sites and thought about doing some kind of writeup about it eventually
-
OrIdow6
Sounds like a good idea qwertyasdfuiopghjkl
-
OrIdow6
Anyone want to do it?
-
OrIdow6
This group seems to be full of furries so I don't think we should have difficulty establishing rapport at that end
-
mgrandi
I'll pm the mod, see if I can start the conversation
-
OrIdow6
Ok
-
mgrandi
I have also experimented with archiving the main FA content. They don't have any rate limiting or anything that I can see; it is behind Cloudflare though
-
OrIdow6
We ran a project for that half a decade ago or so I believe
-
JAA
Oh wow, that project used wpull, interesting.
-
OrIdow6
Sounds like Python version misery to me
-
mgrandi
Yeah, I used wpull for that as well, on my own infrastructure setup. It's pretty easy once you have the Cloudflare key; I need to set up some ignore rules and then rerun it to get the media
-
OrIdow6
Depending on how the forum thing goes, it may raise the possibility of doing that without risking antagonizing them
-
schwarzkatz
always cool to see site owners wanting to help archive their sites. I have a good feeling about this FA project.
-
arkiver
qwertyasdfuiopghjkl: please add it to deathwatch