00:00:17 Would it eventually grab all them or not tho?
00:02:21 Finland also redirects to .de, wut?
00:02:36 Maybe it detects Finnish Hetzner IPs as German.
00:02:43 In Estonia, got redirected to https://www.southparkstudios.com/forum/index.php
00:02:49 UK: https://www.southparkstudios.co.uk/forum/index.php
00:02:59 US gives https://southpark.cc.com/forum/index.php as-is
00:03:03 NL: https://www.southparkstudios.com/forum/index.php
00:03:19 who thought this was a good idea /o\
00:03:58 NZ: https://www.southparkstudios.com/forum/index.php
00:04:15 AR: https://www.southpark.lat/forum/index.php
00:04:38 yet the forum contents seem generic/international?
00:05:36 Yeah, same content everywhere it seems.
00:05:37 DigitalOcean NYC: no redirect
00:06:05 Maggie's hand-saving the fic but wanted to pass it on. Thanks for looking at it!
00:06:09 Also, given the most recent posts, they seem to have given up completely on fighting spam.
00:10:03 JAA: https://twitter.com/ryanqnorth/status/1433861047404961825
00:10:03 nitter: https://nitter.net/ryanqnorth/status/1433861047404961825
00:12:19 Yup
00:33:10 DigitalDragons: ahh ok, so no action needed :)
00:34:46 oh wow, the southpark forum is dying :o
00:34:54 >Join Our Discord
00:34:57 fucking kill me
00:35:08 canada: https://www.southparkstudios.com/forum/index.php
00:35:18 oh JAA covered that, ignore me
00:40:16 Is it possible to save it through AB? oo;
00:40:42 Under one of the other domains, yes. Not under southpark.cc.com because we don't currently have a pipeline in the US.
00:41:00 Well, actually, no, because AB is much too slow to archive it in time, but you get the idea.
00:45:24 I've started qwarc from a machine in the US, and it should take around 10 hours.
00:46:05 Grabbing only the topic pages, as usual.
00:48:15 thanks JAA :)
01:00:29 Maybe longer, site is still getting slower. :-|
01:00:42 :(
01:36:43 I guess maybe I should've gone with sequential topic IDs on this one rather than random order since most of the recent topics are probably spam.
01:36:52 Too late to fix that now though.
02:08:24 :C
02:50:29 The Hobbes OS/2 archive is going down forever in April. https://hobbes.nmsu.edu/
03:14:47 LinkTree acquired Koji, which is shutting down on the 31st https://www.prnewswire.com/news-releases/linktree-acquires-koji-302015100.html
03:17:55 Hello! I don't have permissions in the archivebot channel so am dropping in here to see if I can get guidance/assistance :)
03:20:40 I'm aiming to archive the site Room Escape Artist (http://roomescapeartist.com), it has about 5,000 articles/pages. It's part of a larger effort I'm organizing to archive pages and materials related to the genre of immersive art. In this case, REA is the sole documentation for a lot of these experiences, many of which have since disappeared
03:21:57 Tech234a edited Deathwatch (+177, /* 2024 */ Koji): https://wiki.archiveteam.org/?diff=51479&oldid=51478
03:23:39 I've been in contact with the site owner and they're willing to add code to the site if that's necessary for archiving purposes
03:26:43 that looks like a straightforward wordpress blog, no weird javascript stuff
03:27:40 JAA: we can probably throw the homepage into archivebot and let it crawl
03:39:16 Yep, started.
04:15:41 Rad, thank you!
04:22:42 Laura-CFIA: a friend *makes* escape rooms but I don't feel like I could write a review with this quality
04:25:19 nicolas17 Yeah, haha! They're the best around, been doing it since escape rooms started around 2014 (so 10 years now, kind of amazing)
04:27:42 I have a general question, also... there are several other sites I'd love to add to the archive eventually, is it easiest to just come in here and make the request? I'm semi-comfortable with IRC commands but I don't want to mess anything up
04:28:20 yeah
04:28:34 for some specific websites we have specific channels and specialized tooling
04:29:06 anyone can go to #imgone and run "!a https://i.imgur.com/0NjLWyR.jpg" to archive something from imgur
04:29:09 Great, thanks! Is there any kind of guideline on how often a site should be crawled? For example with REA and some of these other ones, they're posting multiple articles a day
04:29:13 Ooh, interesting!
04:29:53 and it won't just get that one URL, it will extract the image ID 0njLWyR and get the webpage, image, and some other stuff
04:31:02 #archivebot for generic archival is restricted, only users with +o or +v permissions can add stuff, but just ask and someone can add it for you or tell you why not
04:31:56 and if you're going to stick around you should probably get a real IRC client instead of using the webchat ;)
04:38:15 Hahah, I can do that :) Thank you!
04:51:11 Though !tell bot kinda makes webchat more useful than it used to be
04:52:57 eggdrop++
04:52:57 -eggdrop- [karma] 'eggdrop' now has 6 karma!
04:55:59 :D
09:50:47 nicolas17: just saw https://hackint.logs.kiska.pw/archiveteam-bs/20240103#c399918 - always feel free to get this data and upload it
09:50:52 especially in these interesting cases
14:42:42 My qwarc grab of southpark.cc.com finished some hours ago and seems to have successfully grabbed almost everything. There are a couple ancient broken topics that return an error page, but otherwise, I didn't see any significant problems.
14:49:16 182775 topics, 218971 topic pages retrieved according to my log. That's about 10k short of the counter on the homepage, but that's not unexpected. There were some login-required topics, though I haven't looked into whether there are areas of the forums accessible by anyone with an account.
14:51:28 I'm also running an update thingy that will continue to grab new posts every few minutes until the site goes down. Although there's little of value there; it's all spam.
15:03:21 797732 posts in those topic pages vs ~870k per the homepage.
16:21:39 ^_^
16:22:04 > Google is shutting down websites built with Google Business Profiles in March 2024. (via #archiveteam) sheesh lol
16:22:47 Nulldata edited Deathwatch (+274, /* 2024 */ Added Google Business Profile Websites): https://wiki.archiveteam.org/?diff=51480&oldid=51479
16:23:15 i’ve seen business.site in use but not negocio.site before, must be a regional thing
16:23:55 Death eventually comes for ~~all of us~~ every Google property.
16:25:47 true!
16:57:56 YetAnotherArchiver edited The WARC Ecosystem (+751, Create a wikitable for deprecated tools): https://wiki.archiveteam.org/?diff=51481&oldid=51454
16:57:57 Ufarwisan edited Discord (+131, update): https://wiki.archiveteam.org/?diff=51482&oldid=50931
16:57:58 Ufarwisan edited Pastebin (-346, the wayback machine has begun to ignore the…): https://wiki.archiveteam.org/?diff=51483&oldid=51460
16:57:59 Ufarwisan edited Matrix (+92, /* Archival tools */): https://wiki.archiveteam.org/?diff=51484&oldid=46312
16:58:00 RealPerson edited List of website hosts (+42, added https://www.000webhost.com/): https://wiki.archiveteam.org/?diff=51485&oldid=51453
17:13:41 ahh... 000webhost....
17:18:11 arkiver: that iOS beta has been archived via archivebot
17:18:24 it took really long
17:21:17 I know someone who has an archive of ~all iOS builds (including some that Apple has since deleted), it's like 50TB...
17:31:51 arkiver: I'd like some advice on the samsung open source thing, but we seem to have non-overlapping activity times on IRC :P
18:07:48 (or your pings are still broken)
18:58:35 https://www.icc-cpi.int/streaming-all-displays - streams of ICC, I guess these videos can be saved as radio recordings are saved by the IA
19:18:00 The spam on the South Park forums seems to have started on 2023-06-14 or so. Initially only a few topics daily.
19:18:22 I'm beginning to get a shutdown message randomly.
19:24:59 Now it's solidly the shutdown message.
19:25:50 There were about 96k topic IDs before the spam began, and there were about 265k topic IDs just before the shutdown.
19:26:17 So just over a third of all topics are not spam...
19:29:00 finding spam on the internet is like finding hay in a haystack
19:31:03 Usually, it gets deleted though. They clearly didn't give a shit for half a year, then decided to shut the forums down instead.
19:31:42 bet they laid off the moderators
19:35:35 >Howdy Ho, South Park fans! The South Park Forums might be closed, but fear not, our bond’s as solid as Cartman’s love for Cheesy Poofs! Join us (@SouthPark) on our social channels for news, updates and more.
19:35:38 lol
19:35:59 hot take, moderating a forum is easier than moderating a discord
19:36:07 i like how the <title> of that page is "Social Media Layout"
19:37:02 <fireonlive> i'd believe that, a little less 'real-time' perhaps?
19:37:55 <fireonlive> https://images.paramount.tech/path/mgid:file:gsp:entertainment-assets:/sps/shared/forum/BoysWaving-800px.png < interesting url for the bye image
19:38:03 <fireonlive> mgid, etc
19:40:41 <fireonlive> (anyone know what that is?)
19:43:00 <JAA> Interesting, avatars still work but there's no geo-IP redirect on those URLs.
19:43:31 <fireonlive> oh huh
19:49:39 <JAA> Oh, there were attachments.
19:50:53 <JAA> Those pages still work, but the attachments seem to be gone. Maybe that's a relic from the ancient times.
19:51:32 <JAA> > <p>The selected attachment does not exist anymore.</p>
19:51:44 <JAA> E.g. https://southpark.cc.com/forum/download/file.php?id=1358 (highest ID)
20:05:29 <JAA> A lot of the avatars are 404s, actually. Either they're deleting them right now, or they were already broken, can't tell.
20:05:36 <JAA> I'm grabbing whatever's left though.
20:19:28 <fireonlive> :)
20:34:44 <thuban> nice work!
20:37:14 <fireonlive> JAA++
20:37:15 -eggdrop- [karma] 'JAA' now has 11 karma!
20:37:32 <fireonlive> Paramount--
20:37:32 -eggdrop- [karma] 'Paramount' now has -1 karma!
20:39:08 <JAA> :-)
20:45:27 <JAA> 5.00G/5.00G [01:22<00:00, 65.2MiB/s]
20:45:32 <JAA> Nice upload speed :-)
20:49:16 <fireonlive> =]
21:16:28 <JAA> Turns out that those attachments I saw were only introduced in 2022: https://web.archive.org/web/20240106091745/https://southpark.cc.com/forum/viewtopic.php?f=2&t=94997
21:16:52 <fireonlive> ahh
21:18:01 <JAA> There are three attachments in the WBM, all captured about a year ago: https://web.archive.org/web/*/https://southpark.cc.com/forum/download/file.php*
21:19:07 <JAA> So it broke sometime in the past 11 months or so, I guess.
21:19:23 <JAA> Or I was just too slow today.
22:07:38 <nulldata> fireonlive - RE the URL, it's probably from the DAM Paramount is using - likely https://www.opentext.com/products/media-management
22:08:18 <fireonlive> ohh interesting
22:09:04 <fireonlive> custom URI schemes for everything is neat :)
22:18:29 <fireonlive> found this weird non-redirecting subdomain but seems the same story for files: https://forums.southpark.cc.com/forum/download/file.php?id=34
22:19:05 <fireonlive> the api.php actually exposes a 'direct from mediawiki' error instead of 'covering it up' with a generic page https://forums.southpark.cc.com/w/api.php
22:19:18 <fireonlive> (also, seemingly no geo-redirect)
22:20:36 <fireonlive> see also: https://forums.southpark.cc.com/wiki/Special:RecentChanges vs https://southpark.cc.com/wiki/Special:RecentChanges
22:31:56 <JAA> Yeah, I saw that subdomain earlier (wiki page creation with all the various forum URLs soon). The avatars are also served from that domain.
22:32:18 <JAA> Interesting that it serves the wiki, too.
22:33:02 <fireonlive> ahh
22:33:23 <fireonlive> indeed hm
22:38:45 <DogsRNice_> https://twitter.com/JoeyCheerio/status/1745143832881098845
22:38:45 <eggdrop> nitter: https://nitter.net/JoeyCheerio/status/1745143832881098845
22:38:47 <DogsRNice_> https://twitter.com/JoeyCheerio/status/1745150271230038228
22:40:08 <fireonlive> sheesh, what's with the DMCA takedowns on REing lately lol
22:40:36 <DogsRNice_> especally with valve, they dont do that often
22:41:33 <DogsRNice_> did anyone archive the portal 64 rom patcher?
22:44:36 <pedatic-darwin> greetings
22:44:47 <pedatic-darwin> i was forwarded here by https://findyoutubevideo.thetechrobo.ca/
22:45:08 <pedatic-darwin> how do i go about requesting a deleted youtube video
22:45:16 <TheTechRobo> you are probably looking for #youtubearchive, not #archiveteam-bs
22:45:18 <TheTechRobo> /join #youtubearchive
22:45:24 <pedatic-darwin> thank you, my mistake
23:12:07 <h2ibot> JustAnotherArchivist created South Park Forums (+2096, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=South%20Park%20Forums
23:13:07 <h2ibot> JustAnotherArchivist edited Deathwatch (+36, /* 2024 */ Add South Park Forums): https://wiki.archiveteam.org/?diff=51487&oldid=51480