00:49:23 I posted in #archiveteam that @joinmastodon was just banned from Twitter
00:49:53 It seems like any twitter account that mentions mastodon could get banned and we might have to start archiving some notable ones at this point
01:00:34 chfoo_: you around to help possibly troubleshoot some seesaw issues?
01:01:23 jamesp: just saw someone get suspended for posting a mastodon handle to elonjet with screenshots of the suspension
01:01:30 https://twitter.com/micahflee/status/1603548208063385600
01:03:27 https://web.archive.org/web/20221216003152/https://twitter.com/micahflee/status/1603548208063385600
01:06:11 I've seen reports of a lot of accounts being suspended, not sure what for: atrupar igd_news drewharwell RMac18 donie MattBinder
01:09:11 let's see if I get deleted for posting elonjet.net
01:12:18 banned search terms returning 0 results I guess:
01:12:29 https://twitter.com/search?q=url%3Agrndcntrl.net&src=spelling_expansion_revert_click
01:12:31 https://twitter.com/search?q=url%3Aelonjet.net&src=typed_query&f=live
01:18:32 https://twitter.com/MikaelThalen/status/1603558634981842945
01:20:12 * pabs sends that person to snscrape
01:39:30 more suspended: https://twitter.com/igd_news https://twitter.com/joinmastodon http://twitter.com/drewharwell https://twitter.com/donie https://twitter.com/RMac18
02:09:36 what a garbage fire
02:20:02 this post mentions more suspended folks https://infosec.exchange/@micahflee/109520648205436407
02:38:41 W7VOA now as well
02:50:04 how's the twitter data collection going? are there alternative data streams that're being used other than the trickle I'm seeing from the twitterstream collection?
03:34:16 users can be submitted in #archivebot
03:34:33 there's stuff that goes into WBM but without WARCs available
05:21:34 more banned folks https://mastodon.social/@danluu/109521316258129814 https://news.ycombinator.com/item?id=34010112
10:17:44 I hear Elon shut down Twitter Spaces after he found one with a bunch of people he suspended
10:18:07 (probably better in -ot)
12:31:20 Trying to back up some old game mods and maps from a publicly accessible Apache server. One can browse the files via the Apache-generated file browser. I am looking for a tool to help me back up all the files on these sites. Any suggestions?
12:32:00 Jackster: wget, grab-site, HTTrack
12:33:33 Or tell us the URL, and we can archive it into the Wayback Machine.
12:33:56 I have tried wget and HTTrack, both result in issues. The config files with the mods are not formatted, as in the text is all on one line. The large files sometimes come down as .html files instead of their original file types
12:35:53 I will give grab-site a go
12:46:19 highly recommend opendirectorydownloader for this kind of stuff
12:46:49 you can feed the generated links to wget/aria2c/whatever
12:47:16 grab-site is horrendously complicated
12:47:40 https://github.com/KoalaBear84/OpenDirectoryDownloader
12:50:15 grab-site horrendously complicated? I guess you haven't tried to use Heritrix then.
12:50:33 I will give that a go then first!
12:50:43 But grab-site produces WARCs, not flat files, so that might not be what you want.
12:51:42 If you don't mind sharing it, I'd still be interested in the URL even if you find a way that suits you. Open directories for old games don't last forever.
12:51:59 Can one not extract that into normal files?
12:52:11 You can, but it's a bit of a pain.
12:52:20 you can but it requires Effort™
12:52:22 aa
12:52:40 Here is one example http://f4hmod.site.nfoservers.com/server/
12:53:11 I have a few dozen to go through. Ideally I am going to combine it all into a single archive instead of having multiple copies of the same files
12:53:48 17.65 GB 1662 files
12:55:50 Call of Duty 4 is old? Now I feel old. lol :-)
12:57:26 Looks like at least quite a few of these are on Mod DB, FWIW.
12:57:45 It is dying out. Trying to archive a few maps and mods that I have an interest in but might as well go fully in
12:58:05 Most are on there. But that is more effort xD
12:58:20 Yeah, very true.
13:05:32 Seeing everything happening on Twitter I think we should archive all new tweets like with Reddit
13:08:54 that's just too much volume for archiveteam to handle
13:08:59 ^
13:09:05 not to mention the stupid rate limit/guest token crap
13:09:10 existing accounts/hashtags can still be run through in #archivebot
13:09:42 But how is Reddit ok then?
13:09:55 Reddit is a *LOT* smaller than Twitter.
13:10:33 I doubt there are any recent statistics, but prior to you-know-what, Twitter had something like 500 million tweets a day.
13:10:41 Oh yeah. I forgot about that
13:11:11 Reddit only just reached 2 billion posts very recently after 17 years of operation.
13:11:33 So just a couple orders of magnitude between the two.
13:28:38 just checked the latest pushshift dumps from october, there were 35.6 million posts and 237.3 million comments in that month alone
13:31:17 Sounds reasonable. So two months of Reddit are equivalent to one day of Twitter in terms of message count. Close to albeit not quite two orders of magnitude.
13:33:07 monika: That is working well now with wget. Config files are also coming through clean
13:33:18 glad to hear
13:41:10 I'd love to properly archive all the maps and mods and documentation though for public access
13:41:38 A lot of the programming wikis and forums are long gone. Some on wayback but it is hit and miss
18:02:58 Hey, I have about 1500 websites for Ontario municipalities and civicweb (a government document portal). Can someone with creds for archivebot run them?
19:55:37 Just found out about this: https://tomlehrersongs.com/ "NOTICE: THIS WEBSITE WILL BE SHUT DOWN AT SOME DATE IN THE NOT TOO DISTANT FUTURE, SO IF YOU WANT TO DOWNLOAD ANYTHING, DON’T WAIT TOO LONG."
19:57:25 Huh, they changed it.
19:57:32 It's been on Deathwatch for a while.
19:57:48 But it was originally announced to go down at the end of 2024.
19:58:34 Huh, I scanned my logs and didn't see any mentions of this except for a single archivebot job in Feb 2021
20:01:33 May*, and the previous one in October 2020 is when it was first mentioned.
20:02:14 May? I only see the message on 2021-02-06T23:21:29.000Z in this channel
20:03:00 That's when someone linked it here, yeah, but the AB jobs were in Oct 2020 and May 2021.
20:03:16 Ah.
20:03:39 Is there some place where you can check if something has been run through archivebot?
20:04:03 The viewer http://archive.fart.website/archivebot/viewer/ although it's not entirely reliable.
20:05:06 Ah, I see
22:07:08 https://tomlehrersongs.com/ is on the radar?
22:22:13 anarcat: Yes, already archived two years ago when they originally announced the shutdown for the end of 2024. :-)
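
Editor's note on the 13:10 to 13:31 volume comparison: the rough arithmetic quoted there can be checked with a short sketch. The three input figures are the ones stated in the log (about 500 million tweets a day, 35.6 million Reddit posts and 237.3 million comments in October); treating a "message" as posts plus comments, taking October as 31 days, and the function name `compare_volumes` are assumptions made only for this illustration.

```python
import math

# Figures as quoted in the log; all of them are approximate.
TWEETS_PER_DAY = 500e6          # "something like 500 million tweets a day"
REDDIT_POSTS_OCT = 35.6e6       # pushshift dump for October
REDDIT_COMMENTS_OCT = 237.3e6   # pushshift dump for October
DAYS_IN_OCTOBER = 31            # assumption: the dump covers the full month


def compare_volumes():
    """Compare daily Twitter volume with Reddit posts plus comments."""
    reddit_per_month = REDDIT_POSTS_OCT + REDDIT_COMMENTS_OCT
    reddit_per_day = reddit_per_month / DAYS_IN_OCTOBER
    daily_ratio = TWEETS_PER_DAY / reddit_per_day
    return {
        # How many months of Reddit add up to one day of Twitter.
        "reddit_months_per_twitter_day": TWEETS_PER_DAY / reddit_per_month,
        # Twitter messages per Reddit message, on a per-day basis.
        "daily_ratio": daily_ratio,
        # The same ratio expressed in orders of magnitude (log base 10).
        "orders_of_magnitude": math.log10(daily_ratio),
    }


if __name__ == "__main__":
    for name, value in compare_volumes().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, one day of Twitter comes out at a little under two months of Reddit, and the per-day ratio sits between one and two orders of magnitude, which matches the "close to albeit not quite two" remark.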