-
jamesp
I posted in #archiveteam that @joinmastodon was just banned from Twitter
-
jamesp
It seems like any twitter account that mentions mastodon could get banned and we might have to start archiving some notable ones at this point
-
phuzion
chfoo_: you around to help possibly troubleshoot some seesaw issues?
-
ivan
jamesp: just saw someone get suspended for posting a mastodon handle to elonjet with screenshots of the suspension
-
benjins
I've seen reports of a lot of accounts being suspended, not sure what for: atrupar igd_news drewharwell RMac18 donie MattBinder
-
ivan
let's see if I get deleted for posting elonjet.net
-
ivan
banned search terms returning 0 results I guess:
-
» pabs sends that person to snscrape
-
anarcat
what a garbage fire
-
benjins
W7VOA now as well
-
decay
how's the twitter data collection going? are there alternative data streams that're being used other than the trickle I'm seeing from the twitterstream collection?
-
ivan
users can be submitted in #archivebot
-
ivan
there's stuff that goes into WBM but without WARCs available
-
Arcorann_
I hear Elon shut down Twitter Spaces after he found one with a bunch of people he suspended
-
Arcorann_
(probably better in -ot)
-
Jackster
Trying to back up some old game mods and maps from a publicly accessible Apache server. One can browse the files via the Apache-generated directory listing. I am looking for a tool to help me back up all the files on these sites. Any suggestions?
-
ivan
Jackster: wget, grab-site, HTTrack
-
JAA
Or tell us the URL, and we can archive it into the Wayback Machine.
-
Jackster
I have tried wget and HTTrack; both result in issues. The config files in the mods come out unformatted: the text is all on one line. The large files sometimes come down as .html files instead of their original file types.
-
Jackster
I will give grab-site a go
-
monika
highly recommend opendirectorydownloader for this kind of stuff
-
monika
you can feed the generated links to wget/aria2c/whatever
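The approach monika describes (list every file in the open directory first, then hand that list to a downloader) can be sketched with Python's standard library. This is a minimal sketch, not what opendirectorydownloader actually does: it assumes Apache's default autoindex HTML, where each entry is a plain `<a href>` link, subdirectories end in `/`, and the column-sort links start with `?`. The names `LinkParser`, `split_listing` and `crawl` are hypothetical.

```python
# Minimal sketch (stdlib only): walk an Apache-style directory listing and
# collect the URLs of every file below a starting directory.
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_listing(html, base_url):
    """Return (file_urls, subdir_urls) linked from one listing page."""
    parser = LinkParser()
    parser.feed(html)
    files, dirs = [], []
    for href in parser.hrefs:
        if href.startswith(("?", "#")):
            continue  # column-sort links such as "?C=N;O=D", anchors
        url = urljoin(base_url, href)
        # Keep only links strictly below the current directory; this also
        # drops "Parent Directory" and off-site links.
        if not url.startswith(base_url) or url == base_url:
            continue
        (dirs if url.endswith("/") else files).append(url)
    return files, dirs


def crawl(root_url):
    """Breadth-first walk of the listing tree; returns every file URL."""
    if not root_url.endswith("/"):
        root_url += "/"
    queue, seen, files = deque([root_url]), {root_url}, []
    while queue:
        page = queue.popleft()
        with urlopen(page) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
        page_files, subdirs = split_listing(html, page)
        files.extend(page_files)
        for d in subdirs:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return files


if __name__ == "__main__" and len(sys.argv) == 2:
    for file_url in crawl(sys.argv[1]):
        print(file_url)
```

Saved under a hypothetical name such as `list_files.py`, running `python list_files.py http://example.com/mods/ > urls.txt` writes one URL per line; `wget -x -i urls.txt` (recreating the host/path directories) or `aria2c -i urls.txt` can then fetch them. Separating listing from downloading means the file list can be checked or deduplicated before anything large is pulled.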
-
monika
grab-site is horrendously complicated
-
JAA
grab-site horrendously complicated? I guess you haven't tried to use Heritrix then.
-
Jackster
I will give that a go then first!
-
JAA
But grab-site produces WARCs, not flat files, so that might not be what you want.
-
JAA
If you don't mind sharing it, I'd still be interested in the URL even if you find a way that suits you. Open directories for old games don't last forever.
-
Jackster
Can one not extract that into normal files?
-
JAA
You can, but it's a bit of a pain.
-
monika
you can but it requires Effort™
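On extracting WARCs back into normal files: a WARC is a sequence of records, each made of `WARC/1.0` header lines, a blank line, exactly `Content-Length` bytes of payload, and a trailing pair of CRLFs; for `response` records the payload is the raw HTTP response. The stdlib-only sketch below shows the idea and why it takes some Effort™. It is a simplified illustration, not a full WARC implementation: it assumes well-formed records, keeps only HTTP 200 responses, skips chunked bodies, does not undo `Content-Encoding`, and ignores query strings (so URLs differing only in their query overwrite each other). All function names are hypothetical; a dedicated WARC library is the safer choice for real work.

```python
# Minimal, simplified sketch (stdlib only): unpack HTTP 200 responses
# from a WARC into plain files laid out as out_dir/host/path.
import gzip
import os
import sys
from urllib.parse import unquote, urlsplit


def iter_records(stream):
    """Yield (warc_headers, block) for each record in a binary WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # the blank lines that terminate each record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record header: {line!r}")
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:
                break  # blank line: end of the WARC header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The block is exactly Content-Length bytes long.
        yield headers, stream.read(int(headers["content-length"]))


def split_http(block):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = block.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip().lower()
    return status, headers, body


def url_to_path(url, out_dir):
    """Map a URL to out_dir/host/path, refusing paths that escape out_dir."""
    parts = urlsplit(url)
    rel = unquote(parts.path).lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"
    root = os.path.abspath(out_dir)
    path = os.path.abspath(os.path.join(root, parts.netloc, rel))
    if not path.startswith(root + os.sep):
        raise ValueError(f"unsafe path for {url!r}")
    return path


def extract_stream(stream, out_dir):
    """Write each usable response body to disk; return the paths written."""
    written = []
    for headers, block in iter_records(stream):
        if headers.get("warc-type") != "response":
            continue  # request, metadata, revisit, ... records
        if not headers.get("content-type", "").startswith("application/http"):
            continue  # e.g. DNS lookups stored as response records
        status, http_headers, body = split_http(block)
        if status != 200 or "chunked" in http_headers.get("transfer-encoding", ""):
            continue  # not handled by this sketch
        url = headers["warc-target-uri"].strip("<>")
        path = url_to_path(url, out_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(body)
        written.append(path)
    return written


def extract(warc_path, out_dir):
    """Handle .warc and .warc.gz (gzip.open reads multi-member files)."""
    opener = gzip.open if warc_path.endswith(".gz") else open
    with opener(warc_path, "rb") as f:
        return extract_stream(f, out_dir)


if __name__ == "__main__" and len(sys.argv) == 3:
    for written_path in extract(sys.argv[1], sys.argv[2]):
        print(written_path)
```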
-
monika
aa
-
Jackster
I have a few dozen to go through. Ideally I am going to combine it all into a single archive instead of having multiple copies of the same files
-
monika
17.65 GB 1662 files
-
JAA
Call of Duty 4 is old? Now I feel old. lol :-)
-
JAA
Looks like at least quite a few of these are on Mod DB, FWIW.
-
Jackster
It is dying out. Trying to archive a few maps and mods that I have an interest in but might as well go fully in
-
Jackster
Most are on there. But that is more effort xD
-
JAA
Yeah, very true.
-
CreaZyp154
Seeing everything happening on Twitter, I think we should archive all new tweets like we do with Reddit
-
monika
that's just too much volume for archiveteam to handle
-
TheTechRobo
^
-
monika
not to mention the stupid rate limit/guest token crap
-
TheTechRobo
existing accounts/hashtags can still be run through in #archivebot
-
CreaZyp154
But how is Reddit OK then?
-
JAA
Reddit is a *LOT* smaller than Twitter.
-
JAA
I doubt there are any recent statistics, but prior to you-know-what, Twitter had something like 500 million tweets a day.
-
CreaZyp154
Oh yeah. I forgot about that
-
JAA
Reddit only just reached 2 billion posts very recently after 17 years of operation.
-
JAA
So just a couple orders of magnitude between the two.
-
monika
just checked the latest pushshift dumps from october, there were 35.6 million posts and 237.3 million comments in that month alone
-
JAA
Sounds reasonable. So two months of Reddit are equivalent to one day of Twitter in terms of message count. Close to, albeit not quite, two orders of magnitude.
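JAA's comparison can be checked with a few lines of arithmetic. This is a back-of-envelope sketch that uses only the figures quoted in the conversation (JAA's rough ~500 million tweets/day and monika's Pushshift totals for October), assuming a 31-day month; it measures nothing itself.

```python
# Back-of-envelope check using only the figures quoted in the chat.
import math

TWEETS_PER_DAY = 500e6          # JAA's rough figure
REDDIT_POSTS_OCT = 35.6e6       # Pushshift dump, October
REDDIT_COMMENTS_OCT = 237.3e6   # Pushshift dump, October
DAYS_IN_OCTOBER = 31

reddit_per_month = REDDIT_POSTS_OCT + REDDIT_COMMENTS_OCT   # 272.9 million
reddit_per_day = reddit_per_month / DAYS_IN_OCTOBER          # ~8.8 million
ratio = TWEETS_PER_DAY / reddit_per_day                      # ~57x
months_per_twitter_day = TWEETS_PER_DAY / reddit_per_month   # ~1.8 months

print(f"Reddit: {reddit_per_month / 1e6:.1f}M items/month, {reddit_per_day / 1e6:.1f}M/day")
print(f"Twitter/Reddit daily ratio: {ratio:.0f}x ({math.log10(ratio):.2f} orders of magnitude)")
print(f"One Twitter day ~ {months_per_twitter_day:.1f} Reddit months")
```

A ratio of about 57x is roughly 1.75 orders of magnitude, which matches "close to, albeit not quite, two".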
-
Jackster
monika: That is working well now with wget. Config files are also coming through clean
-
monika
glad to hear
-
Jackster
I'd love to properly archive all the maps and mods and documentation though for public access
-
Jackster
A lot of the programming wikis and forums are long gone. Some are on the Wayback Machine, but it's hit and miss
-
Ram
Hey, I have about 1500 websites for Ontario municipalities and civicweb (a government document portal). Can someone with creds for archivebot run them?
-
Mateon1
Just found out about this:
tomlehrersongs.com "NOTICE: THIS WEBSITE WILL BE SHUT DOWN AT SOME DATE IN THE NOT TOO DISTANT FUTURE, SO IF YOU WANT TO DOWNLOAD ANYTHING, DON’T WAIT TOO LONG."
-
JAA
Huh, they changed it.
-
JAA
It's been on Deathwatch for a while.
-
JAA
But it was originally announced to go down at the end of 2024.
-
Mateon1
Huh, I scanned my logs and didn't see any mentions of this except for a single archivebot job in Feb 2021
-
JAA
May*, and the previous one in October 2020 is when it was first mentioned.
-
Mateon1
May? I only see the message on 2021-02-06T23:21:29.000Z in this channel
-
JAA
That's when someone linked it here, yeah, but the AB jobs were in Oct 2020 and May 2021.
-
Mateon1
Ah.
-
Mateon1
Is there some place where you can check if something has been run through archivebot?
-
JAA
The viewer
archive.fart.website/archivebot/viewer, although it's not entirely reliable.
-
Mateon1
Ah, I see
-
JAA
anarcat: Yes, already archived two years ago when they originally announced the shutdown for the end of 2024. :-)