-
JC|m
!help
-
JC|m
is this the room for archive bot?
-
JC|m
!archive
-
JC|m
-
pabs
JC|m: the bot is in #archivebot, but only voiced/ops have access.
-
pabs
github.com is also way too big to be archived using archivebot
-
pabs
github.com is also not the right way to archive individual GitHub repos/orgs, we have #gitgud for doing that
-
pabs
if you have any other sites to save via ArchiveBot, state the URLs here and the reasons for archiving them
-
DogsRNice_
computer, please archive the entire internet
-
h2ibot
PaulWise edited Mailman2 (+964, add more, one done):
wiki.archiveteam.org/?diff=50948&oldid=50886
-
JC|m
<pabs> "JC: the bot is in #archivebot..." <- Is there a difference between archiving here or directly from the website?
-
JC|m
-
imer
Two different systems, yes. archiveteam is not the internet archive
-
JC|m
-
JC|m
I was looking into archiving this site.
-
JC|m
imer:
-
JC|m
pabs:
-
pabs
JC|m: what is the reason for archiving subscene?
-
JC|m
They have a big directory of subtitles for movies and TV shows.
-
JC|m
The website has had some downtime recently.
-
pabs
hmm, lots of filenames indicating pirate sites.
-
JC|m
I wanted it to be archived in case anything happened
-
» pabs wonders what the policy is for that sort of stuff
-
JC|m
The subtitles are not pirated.
-
JC|m
A lot of famous streaming sites upload their subs to subscene
-
pabs
streaming sites like Disney+ or?
-
JC|m
nah
-
pabs
Netflix?
-
JC|m
30Nama
-
JC|m
filimo
-
JC|m
namava
-
JC|m
And independent people who translate subtitles
-
pabs
-
pabs
so time for an update I guess
-
pabs
interesting, Google deleted one of its results for this site because of
lumendatabase.org/notices/18635190
-
project10
overzealous court. SRT files (the actual subtitles) are just timestamps and text.
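For readers unfamiliar with the format mentioned above: a SubRip (.srt) file is a sequence of cues, each made of an index line, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more text lines, with a blank line between cues. The following is a minimal sketch of reading that structure; the sample text and the `parse_srt` helper are made up for illustration and are not taken from the site or from any tool named in this log.

```python
import re
from datetime import timedelta

# Hypothetical sample data, written only for this illustration.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
First line of dialogue.

2
00:00:05,500 --> 00:00:07,250
Second line of dialogue.
"""

# Timing line: start and end as hours:minutes:seconds,milliseconds.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.match(lines[1])
        if match is None:
            continue  # skip blocks without a valid timing line
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((start, end, "\n".join(lines[2:])))
    return cues


cues = parse_srt(SAMPLE)
```

Each cue carries only a time range and the text to display, which is the point being made in the message above.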
-
pabs
presumably the dialog/script is copyrighted though :)
-
pabs
and the movie cover/posters
-
pabs
anyway, running now
-
pabs
JC|m: see
archivebot.com if you want to follow the job
-
pabs
oh, it got a 403 error
-
pabs
JC|m: no dice, all my attempts got 403 errors for the front page
-
pabs
maybe something for qwarc from JAA
-
JC|m
pabs: is it the robots.txt?
-
pabs
no, AB ignores robots.txt, the first request AB sent (for the front page) got a 403 error
-
pabs
same goes for the subdomains, including the forum
-
JC|m
You guys also archive forums, right?
-
sepro
I archived all of subscene last year. Not in the WARC format, though, just an archive of all the subtitle files.
-
sepro
The main problem was URL discovery, as there is no easy way to get a list of all shows. I also had some problems with Cloudflare.
-
JC|m
Is v3.2 the latest version of the Warrior?
-
JC|m
from 2021
-
project10
-
project10
if you're familiar with Docker, you can also run a container and get the latest that way. I think the VM images would keep themselves up to date, but I've never used them
-
project10
there is a dedicated #warrior channel if you need support for either option
-
pabs
JC|m: yeah, we often safe forums
-
pabs
er save forums
-
JC|m
-
JC|m
-
JC|m
Some forums that you may want to archive
-
Peroniko
I tried to archive forum.cdm.me before, but there were many ignores that I had to apply
-
Peroniko
All the pages with this were behind the login screen: ^(?:(?!private\.php\?|register\.php\?|sendmessage\.php\?|itrader_feedback\.php\?|newreply\.php\?|usercp\.php\?|subscription\.php\?).)*$
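The pattern quoted above is a "tempered" negative lookahead: it matches a whole string only when none of the listed script names (each followed by `?`) starts at any position in it. Below is a minimal sketch of that behaviour using Python's `re` module. The `example.invalid` URLs and the `contains_listed_script` helper are hypothetical, and the sketch makes no claim about how ArchiveBot itself applies ignore patterns.

```python
import re

# The pattern exactly as quoted in the message above, split over
# several string literals for readability.
PATTERN = re.compile(
    r"^(?:(?!private\.php\?|register\.php\?|sendmessage\.php\?|"
    r"itrader_feedback\.php\?|newreply\.php\?|usercp\.php\?|"
    r"subscription\.php\?).)*$"
)


def contains_listed_script(url: str) -> bool:
    """True if the URL contains one of the listed 'name.php?' fragments.

    The pattern matches only strings that are free of those fragments,
    so a failed match means one of them is present.
    """
    return PATTERN.match(url) is None


# Hypothetical URLs, for illustration only.
thread_url = "https://forum.example.invalid/showthread.php?t=1"
private_url = "https://forum.example.invalid/private.php?do=newpm"
reply_url = "https://forum.example.invalid/newreply.php?p=42"

results = {
    url: contains_listed_script(url)
    for url in (thread_url, private_url, reply_url)
}
```

In this sketch the thread URL matches the pattern, while the `private.php?` and `newreply.php?` URLs do not, because the lookahead fails at the position where the listed name begins.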
-
DigitalDragons
seems like a very aggressive Cloudflare configuration on subscene
-
Exorcism
-
eggdrop
-
pabs
hmm, I feel like the AB websocket is not passing on all URL requests/responses
-
JAA
pabs: There's a known bug near the end of jobs, where the last couple lines might get swallowed. Other than that, it should only drop messages when it can't keep up. It currently looks like there are two clients that are too slow and get messages dropped regularly.
-
pabs
hhmm, I have two non-browser clients attached, that must be me
-
pabs
JAA: an example: I just redid upload.systems, but the job doesn't show up at all in the browser, id y6rqls8zrwd2r7hc1fa2znok
-
justhere66
hi
-
justhere66
could I have this whole website archived on the internet archive? It's a very small website.
-
justhere66
-
justhere66
-
nulldata
PinstripedProspects.com, a blog covering the New York Yankees minor league system, has announced it's shutting down in 2 weeks.
pinstripedprospects.com/pinstriped-…ospects-website-shutting-down-65038
-
h2ibot
PaulWise edited Software Heritage (+424, add some more info and related projects):
wiki.archiveteam.org/?diff=50949&oldid=28671
-
h2ibot
TheTechRobo edited URLTeam (-564, Improve tiny.cc entry):
wiki.archiveteam.org/?diff=50950&oldid=50889
-
h2ibot
TheTechRobo edited URLTeam (+19, Add another t.ly link):
wiki.archiveteam.org/?diff=50951&oldid=50950
-
h2ibot
TheTechRobo edited URLTeam (+44, T.ly is non-incremental):
wiki.archiveteam.org/?diff=50952&oldid=50951
-
h2ibot
PaulWise created Trac (+2072, create Trac project page):
wiki.archiveteam.org/?title=Trac
-
h2ibot
PaulWise edited Bugzilla (+0, redhat bugzilla crashed):
wiki.archiveteam.org/?diff=50954&oldid=50756
-
h2ibot
-
h2ibot
PaulWise edited GitHub (+19, Category:Code):
wiki.archiveteam.org/?diff=50956&oldid=50737
-
h2ibot
PaulWise edited IRC/Logs (+31, wordpress logs):
wiki.archiveteam.org/?diff=50957&oldid=50917
-
audrooku|m
what's the forecast for AT spinning back up? is it a storage or cpu/net saturation issue on IA's end?
-
imer
audrooku|m: "soon" I believe. #shreddit is going straight to IA and #// is due to restart (in some capacity) as well I think?
-
imer
from #archiveteam "<ark_iver>: The problems at IA that prevented us from uploading large amounts of data are getting better. We will now start uploading (part of) the offloaded data to IA, and probably resume projects after. The situation is not completely 'back to normal' yet, but will likely be in about a month."
-
audrooku|m
that's good to hear, thanks for answering my question as I missed that message
-
imer
no worries
-
JAA
One week after the supposed shutdown, the Canucks forum is still going. I'm running a continuous thing that fetches new posts as they're being made until it does shut down.
-
JAA
Also, for the record, the community-chosen successor seems to be
canucksfanforum.com (which somehow already has 35k posts since mid-Sept).
-
pokechu22
JAA: did you see mountainbladder's message about TaleWorlds whitelisting AB pipeline IPs?
-
pokechu22
it was a few days ago I think
-
JAA
pokechu22: Yes, I replied as well, just didn't have time to act on it yet.
-
fireonlive
blast from the past: Archiverse – Archive Team's dump of Nintendo's Miiverse (2012–2017):
archiverse.guide :)