-
flashfire42|m
Would it eventually grab all of them or not tho?
-
JAA
Finland also redirects to .de, wut?
-
JAA
Maybe it detects Finnish Hetzner IPs as German.
-
qwertyasdfuiopghjkl
-
JAA
-
pokechu22
-
JAA
-
nicolas17
who thought this was a good idea /o\
-
JAA
-
nicolas17
-
nicolas17
yet the forum contents seem generic/international?
-
JAA
Yeah, same content everywhere it seems.
-
nicolas17
DigitalOcean NYC: no redirect
-
Doranwen
Maggie's hand-saving the fic but wanted to pass it on. Thanks for looking at it!
-
JAA
Also, given the most recent posts, they seem to have given up completely on fighting spam.
-
nicolas17
-
eggdrop
-
JAA
Yup
-
fireonlive
DigitalDragons: ahh ok, so no action needed :)
-
fireonlive
oh wow, the southpark forum is dying :o
-
fireonlive
>Join Our Discord
-
fireonlive
fucking kill me
-
fireonlive
-
fireonlive
oh JAA covered that, ignore me
-
Ryz
Is it possible to save it through AB? oo;
-
JAA
Under one of the other domains, yes. Not under southpark.cc.com because we don't currently have a pipeline in the US.
-
JAA
Well, actually, no, because AB is much too slow to archive it in time, but you get the idea.
-
JAA
I've started qwarc from a machine in the US, and it should take around 10 hours.
-
JAA
Grabbing only the topic pages, as usual.
-
fireonlive
thanks JAA :)
-
JAA
Maybe longer, site is still getting slower. :-|
-
fireonlive
:(
-
JAA
I guess maybe I should've gone with sequential topic IDs on this one rather than random order since most of the recent topics are probably spam.
-
JAA
Too late to fix that now though.
-
Ryz
:C
-
test
The Hobbes OS/2 archive is going down forever in April.
hobbes.nmsu.edu
-
tech234a
-
Laura-CFIA
Hello! I don't have permissions in the archivebot channel so am dropping in here to see if I can get guidance/assistance :)
-
Laura-CFIA
I'm aiming to archive the site Room Escape Artist (roomescapeartist.com), it has about 5,000 articles/pages. It's part of a larger effort I'm organizing to archive pages and materials related to the genre of immersive art. In this case, REA is the sole documentation for a lot of these experiences, many of which have since disappeared
-
h2ibot
Tech234a edited Deathwatch (+177, /* 2024 */ Koji):
wiki.archiveteam.org/?diff=51479&oldid=51478
-
Laura-CFIA
I've been in contact with the site owner and they're willing to add code to the site if that's necessary for archiving purposes
-
nicolas17
that looks like a straightforward wordpress blog, no weird javascript stuff
-
nicolas17
JAA: we can probably throw the homepage into archivebot and let it crawl
-
JAA
Yep, started.
-
Laura-CFIA
Rad, thank you!
-
nicolas17
Laura-CFIA: a friend *makes* escape rooms but I don't feel like I could write a review with this quality
-
Laura-CFIA
nicolas17 Yeah, haha! They're the best around, been doing it since escape rooms started around 2014 (so 10 years now, kind of amazing)
-
Laura-CFIA
I have a general question, also... there are several other sites I'd love to add to the archive eventually, is it easiest to just come in here and make the request? I'm semi-comfortable with IRC commands but I don't want to mess anything up
-
nicolas17
yeah
-
nicolas17
for some specific websites we have specific channels and specialized tooling
-
nicolas17
anyone can go to #imgone and run "!a i.imgur.com/0NjLWyR.jpg" to archive something from imgur
-
Laura-CFIA
Great, thanks! Is there any kind of guideline on how often a site should be crawled? For example with REA and some of these other ones, they're posting multiple articles a day
-
Laura-CFIA
Ooh, interesting!
-
nicolas17
and it won't just get that one URL, it will extract the image ID 0NjLWyR and get the webpage, image, and some other stuff
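
A minimal sketch of the ID-extraction step nicolas17 describes, assuming the ID is simply the file name without its extension. The helper name `imgur_id_from_url` is hypothetical and this is not the actual #imgone tooling, just an illustration using the URL from the example above.

```python
import posixpath
from urllib.parse import urlsplit


def imgur_id_from_url(url: str) -> str:
    """Return the image ID from a direct imgur image URL (hypothetical helper)."""
    # The URL in the log has no scheme, so add one to let urlsplit find the path.
    if "://" not in url:
        url = "https://" + url
    path = urlsplit(url).path              # e.g. "/0NjLWyR.jpg"
    filename = posixpath.basename(path)    # e.g. "0NjLWyR.jpg"
    image_id, _extension = posixpath.splitext(filename)
    return image_id


image_id = imgur_id_from_url("i.imgur.com/0NjLWyR.jpg")
print(image_id)
```

From that ID, a tool could then queue the related webpage and image URLs; which URLs those are is up to the real project code and is not shown here.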
-
nicolas17
#archivebot for generic archival is restricted, only users with +o or +v permissions can add stuff, but just ask and someone can add it for you or tell you why not
-
nicolas17
and if you're going to stick around you should probably get a real IRC client instead of using the webchat ;)
-
Laura-CFIA
Hahah, I can do that :) Thank you!
-
OrIdow6^2
Though !tell bot kinda makes webchat more useful than it used to be
-
project10
eggdrop++
-
eggdrop
[karma] 'eggdrop' now has 6 karma!
-
fireonlive
:D
-
arkiver
nicolas17: just saw hackint.logs.kiska.pw/archiveteam-bs/20240103#c399918 - always feel free to get this data and upload it
-
arkiver
especially in these interesting cases
-
JAA
My qwarc grab of southpark.cc.com finished some hours ago and seems to have successfully grabbed almost everything. There are a couple ancient broken topics that return an error page, but otherwise, I didn't see any significant problems.
-
JAA
182775 topics, 218971 topic pages retrieved according to my log. That's about 10k short of the counter on the homepage, but that's not unexpected. There were some login-required topics, though I haven't looked into whether there are areas of the forums accessible by anyone with an account.
-
JAA
I'm also running an update thingy that will continue to grab new posts every few minutes until the site goes down. Although there's little of value there; it's all spam.
-
JAA
797732 posts in those topic pages vs ~870k per the homepage.
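
A rough check of how the grabbed counts compare with the homepage counter, using only the figures JAA quotes. The homepage post count is taken as exactly 870,000 for the sake of the arithmetic (JAA says "~870k"), so the results are approximate.

```python
# Figures from JAA's log of the qwarc grab.
topics_grabbed = 182_775
topic_pages_grabbed = 218_971
posts_grabbed = 797_732

# Assumption: "~870k" on the homepage is treated as exactly 870,000.
posts_on_homepage = 870_000

shortfall = posts_on_homepage - posts_grabbed
coverage = posts_grabbed / posts_on_homepage
pages_per_topic = topic_pages_grabbed / topics_grabbed

print(f"posts missing vs. homepage counter: {shortfall}")
print(f"share of posts retrieved: {coverage:.1%}")
print(f"average topic pages per topic: {pages_per_topic:.2f}")
```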
-
fireonlive
^_^
-
fireonlive
> Google is shutting down websites built with Google Business Profiles in March 2024. (via #archiveteam) sheesh lol
-
h2ibot
Nulldata edited Deathwatch (+274, /* 2024 */ Added Google Business Profile Websites):
wiki.archiveteam.org/?diff=51480&oldid=51479
-
fireonlive
i’ve seen business.site in use but not negocio.site before, must be a regional thing
-
nulldata
Death eventually comes for ~~all of us~~ every Google property.
-
fireonlive
true!
-
h2ibot
YetAnotherArchiver edited The WARC Ecosystem (+751, Create a wikitable for deprecated tools):
wiki.archiveteam.org/?diff=51481&oldid=51454
-
h2ibot
-
h2ibot
Ufarwisan edited Pastebin (-346, the wayback machine has begun to ignore the…):
wiki.archiveteam.org/?diff=51483&oldid=51460
-
h2ibot
Ufarwisan edited Matrix (+92, /* Archival tools */):
wiki.archiveteam.org/?diff=51484&oldid=46312
-
h2ibot
-
fireonlive
ahh... 000webhost....
-
nicolas17
arkiver: that iOS beta has been archived via archivebot
-
nicolas17
it took really long
-
nicolas17
I know someone who has an archive of ~all iOS builds (including some that Apple has since deleted), it's like 50TB...
-
nicolas17
arkiver: I'd like some advice on the samsung open source thing, but we seem to have non-overlapping activity times on IRC :P
-
nicolas17
(or your pings are still broken)
-
Gooshka
icc-cpi.int/streaming-all-displays - streams of ICC, I guess these videos can be saved as radio recordings are saved by the IA
-
JAA
The spam on the South Park forums seems to have started on 2023-06-14 or so. Initially only a few topics daily.
-
JAA
I'm beginning to get a shutdown message randomly.
-
JAA
Now it's solidly the shutdown message.
-
JAA
There were about 96k topic IDs before the spam began, and there were about 265k topic IDs just before the shutdown.
-
JAA
So just over a third of all topics are not spam...
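
The "just over a third" figure can be checked with a quick sketch. It assumes, as JAA's reasoning does, that topic IDs are roughly contiguous and that essentially every topic created after the spam began is spam; both counts are the rounded values from the log.

```python
# Rounded topic ID counts from the log.
topic_ids_before_spam = 96_000
topic_ids_at_shutdown = 265_000

# Assumption: everything after the spam started counts as spam.
non_spam_fraction = topic_ids_before_spam / topic_ids_at_shutdown
spam_topic_ids = topic_ids_at_shutdown - topic_ids_before_spam

print(f"non-spam share of topics: {non_spam_fraction:.1%}")
print(f"topic IDs created during the spam period: {spam_topic_ids}")
```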
-
nicolas17
finding spam on the internet is like finding hay in a haystack
-
JAA
Usually, it gets deleted though. They clearly didn't give a shit for half a year, then decided to shut the forums down instead.
-
nicolas17
bet they laid off the moderators
-
fireonlive
>Howdy Ho, South Park fans! The South Park Forums might be closed, but fear not, our bond’s as solid as Cartman’s love for Cheesy Poofs! Join us (@SouthPark) on our social channels for news, updates and more.
-
fireonlive
lol
-
nicolas17
hot take, moderating a forum is easier than moderating a discord
-
fireonlive
i like how the <title> of that page is "Social Media Layout"
-
fireonlive
i'd believe that, a little less 'real-time' perhaps?
-
fireonlive
-
fireonlive
mgid, etc
-
fireonlive
(anyone know what that is?)
-
JAA
Interesting, avatars still work but there's no geo-IP redirect on those URLs.
-
fireonlive
oh huh
-
JAA
Oh, there were attachments.
-
JAA
Those pages still work, but the attachments seem to be gone. Maybe that's a relic from the ancient times.
-
JAA
> <p>The selected attachment does not exist anymore.</p>
-
JAA
-
JAA
A lot of the avatars are 404s, actually. Either they're deleting them right now, or they were already broken, can't tell.
-
JAA
I'm grabbing whatever's left though.
-
fireonlive
:)
-
thuban
nice work!
-
fireonlive
JAA++
-
eggdrop
[karma] 'JAA' now has 11 karma!
-
fireonlive
Paramount--
-
eggdrop
[karma] 'Paramount' now has -1 karma!
-
JAA
:-)
-
JAA
5.00G/5.00G [01:22<00:00, 65.2MiB/s]
-
JAA
Nice upload speed :-)
-
fireonlive
=]
-
JAA
-
fireonlive
ahh
-
JAA
There are three attachments in the WBM, all captured about a year ago:
web.archive.org/web/*/https://south…ark.cc.com/forum/download/file.php*
-
JAA
So it broke sometime in the past 11 months or so, I guess.
-
JAA
Or I was just too slow today.
-
nulldata
fireonlive - RE the URL, it's probably from the DAM Paramount is using - likely opentext.com/products/media-management
-
fireonlive
ohh interesting
-
fireonlive
custom URI schemes for everything is neat :)
-
fireonlive
found this weird non-redirecting subdomain but seems the same story for files:
forums.southpark.cc.com/forum/download/file.php?id=34
-
fireonlive
the api.php actually exposes a 'direct from mediawiki' error instead of 'covering it up' with a generic page
forums.southpark.cc.com/w/api.php
-
fireonlive
(also, seemingly no geo-redirect)
-
fireonlive
-
JAA
Yeah, I saw that subdomain earlier (wiki page creation with all the various forum URLs soon). The avatars are also served from that domain.
-
JAA
Interesting that it serves the wiki, too.
-
fireonlive
ahh
-
fireonlive
indeed hm
-
DogsRNice_
-
eggdrop
-
DogsRNice_
-
fireonlive
sheesh, what's with the DMCA takedowns on REing lately lol
-
DogsRNice_
especially with valve, they don't do that often
-
DogsRNice_
did anyone archive the portal 64 rom patcher?
-
pedatic-darwin
greetings
-
pedatic-darwin
-
pedatic-darwin
how do i go about requesting a deleted youtube video
-
TheTechRobo
you are probably looking for #youtubearchive, not #archiveteam-bs
-
TheTechRobo
/join #youtubearchive
-
pedatic-darwin
thank you, my mistake
-
h2ibot
JustAnotherArchivist created South Park Forums (+2096, Created page with "{{Infobox project | URL =…):
wiki.archiveteam.org/?title=South%20Park%20Forums
-
h2ibot
JustAnotherArchivist edited Deathwatch (+36, /* 2024 */ Add South Park Forums):
wiki.archiveteam.org/?diff=51487&oldid=51480