-
h2ibot
Arkiver edited YouTube (+86, Clarification on scope and newsworthy channels.):
wiki.archiveteam.org/?diff=51140&oldid=51133
-
arkiver
let
-
arkiver
let's use #frogger for blogger
-
h2ibot
JustAnotherArchivist edited Blogger (-45, EFnet #frogger is dead, long live hackint #frogger):
wiki.archiveteam.org/?diff=51142&oldid=51128
-
Pedrosso
Are there any plans to restart the FurAffinity archival project, given the large amount of new content since the 2015 archive?
-
JAA
TIL there was a 2015 project.
-
» Pedrosso looks up if it was 2012
-
JAA
No, I mean, I had no idea we ever did anything with it.
-
Pedrosso
Ahhh
-
Pedrosso
It was 2015 btw
-
Pedrosso
I'm gonna assume that the answer to my question then is a hard no, haha
-
JAA
There was definitely no project since 2017. So probably correct.
-
JAA
I'm not sure it'll happen soon (with the Google stuff and lots of end-of-year shutdowns still to be announced as usual), but I'm all for it.
-
Pedrosso
I see, the enumeration is quite nice.
-
Pedrosso
Also, so that's why things are getting so busy now
-
h2ibot
PaulWise edited Mailman2 (+3016, more lists done and to do):
wiki.archiveteam.org/?diff=51144&oldid=50963
-
pabs
-rss/#hackernews- RIP Google Groups Dejanews.com Archive:
dejanews.com news.ycombinator.com/item?id=38238796
-
h2ibot
PaulWise edited Mailman2 (-12, distorted done, claws mail in progress):
wiki.archiveteam.org/?diff=51145&oldid=51144
-
h2ibot
JustAnotherArchivist edited YouTube (+627, Datetimeify):
wiki.archiveteam.org/?diff=51146&oldid=51140
-
fireonlive
<3
-
JAA
:-)
-
piennu66
Composer of a major Japanese indie unit (
ja.wikipedia.org/wiki/ツユ) unilaterally declares that they are extremely done with everything:
nitter.net/Pusu_kun. Are Twitter accounts still archivable? (if yes, then someone please feed it to the archiver; I won't wait around to see the answer.)
-
piennu66
(god, are all those join/leave messages spamming the channel? I'm so sorry, Pidgin won't open a chat window)
-
yperion
Hi, I got an 'Abmahnung' (written warning from a lawyer) for one of the IPs I use for my ArchiveTeam Warriors (on auto-mode). They accuse me of sharing copyrighted porn via BitTorrent. Is there any way that one of the ATWarrior projects could have used BitTorrent?
-
yperion
(The IPs I use are geolocated in Germany)
-
fireonlive
99.9% sure we do nothing with BitTorrent, but feel free to stick around for an official answer
-
fireonlive
cc arkiver
-
yperion
99.9% is probably enough for me :) I did figure that there would most probably be no P2P in ATWarrior projects
-
JAA
Correct, there is not.
-
betamax
How does AB handle PHP links with GET parameters? Will it just treat them as normal URLs?
-
betamax
(want to put this website in:
telford-electronics.co.uk/index.php - the owner sadly passed away earlier this year)
-
betamax
but the site is basically all done via PHP GET parameters
-
JAA
betamax: Query strings are kept as is for the most part. Some session ID parameters get removed to avoid looping.
-
JAA
As long as they're links, not <form method="GET">, it should work fine.
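
To illustrate the idea of keeping query strings intact while dropping session identifiers, here is a minimal Python sketch. The parameter names in `SESSION_PARAMS` and the function `strip_session_params` are hypothetical; this is not ArchiveBot's actual code or list, just one way such a filter could work.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical session parameter names, chosen for illustration only.
# ArchiveBot's real list is not given in this log.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def strip_session_params(url: str) -> str:
    """Return url with session-like query parameters removed.

    All other query parameters are kept byte-for-byte and in order,
    so the rest of the URL is not re-encoded or normalized.
    """
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Compare only the parameter name (text before the first "=").
        if pair.split("=", 1)[0].lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


if __name__ == "__main__":
    print(strip_session_params(
        "https://example.com/index.php?page=2&PHPSESSID=abc123"
    ))
```

Without a step like this, a crawler that receives a fresh session ID on every response would see each link as a new URL and could loop indefinitely.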
-
betamax
great, thanks
-
meta
Hello. I found a German site that might shut down due to legal issues. Is that something someone here can help me with?
-
meta
the site is relatively small so if I understood it correctly I should ask in the ArchiveBot chat?
-
meta
Sadly, I do not know how close a site needs to be to shutdown before it becomes an ArchiveTeam thing
-
meta
They provide rulings from German courts for their users
-
JAA
meta: I tried to archive openJur with ArchiveBot a few weeks ago. It got banned fairly quickly. Also, it's quite large, actually.
-
meta
is there another way to save it?
-
JAA
Just checked, the IP is still banned.
-
JAA
<p class="text-justify">Unfortunately, you cannot access openJur from this IP address. This is the result of abusive access, and thus violations of our terms of use, from the IP range you are using or from your internet provider's network. If you have questions about the block, please contact abuse⊙od</p>
-
meta
still banned
-
JAA
Their terms are pretty strict about not allowing anything other than personal usage, search indexing, and metadata collection, possibly for legal reasons. So asking them for an exception might not be useful.
-
meta
so they dug themselves in
-
meta
but archival is actually aligned with their purpose
-
meta
so asking might actually get something useful
-
JAA
Perhaps
-
JAA
That ban is about a month old, by the way.
-
meta
I think the strict restrictions are more about avoiding commercial use.
-
meta
Also, if your address is fixed, they might just think you wanted to DoS them.
-
JAA
The request rate was nowhere near what would be needed for a DoS though.
-
JAA
And the crawl had an unambiguous user agent identifying it as 'ArchiveTeam ArchiveBot/...'.
-
meta
So, they just check the number of requests from the IP?
-
meta
I do not think a human made that decision.
-
meta
Well, if you already tried, there is nothing more that can be done. Thank you for your time and effort.
-
meta
I must go now. Thanks for speaking with me.
-
JAA
I'll try again. It's a very important resource.
-
JAA
Just not sure how yet. :-)
-
meta
I wish you good luck. Perhaps I will one day lend my warrior to the effort. I must really go now, so once again, thank you.
-
JAA
I'm not sure whether the ban was automated or manual. If it was automated, the limit seemed quite arbitrary. But the time (17:12 UTC, so 19:12 CEST) makes a manual ban somewhat unlikely, too.
-
Pedrosso
^ Also up to lend a warrior
-
Pedrosso
I talked about this earlier; now that the
sporepedia2.foroactivo.com archive is done I'm wondering about how to get the failed imgur saves from the logs to send to #imgone . But I don't know how to get the logs (assuming the logs are saved)
-
JAA
Pedrosso: The log gets stored in the job's *-meta.warc.gz file.
-
Pedrosso
Ahhh, good. Is it directly under the zip or do I need to interpret some .warc shenanigans?
-
JAA
It's a WARC file. But you can just `zstdgrep` it or similar.
-
JAA
(The zstd binaries support reading gzipped data, and they're much faster in my experience, likely due to different buffering.)
-
Pedrosso
ah, alright.
-
JAA
Also, gzip has nothing to do with ZIP.
-
Pedrosso
Yea no I got that, just don't have any other terminology for it
-
JAA
The important thing is that it's just a plain record. There's no HTTP nonsense like chunked transfer encoding on top.
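
For anyone without `zstdgrep` at hand, the same search can be sketched in Python. This assumes only what is said above: the log sits as plain text inside the gzipped `*-meta.warc.gz` file. The function names, the file name in the usage comment, and the imgur URL pattern are illustrative assumptions. The log's line format is not specified here, so the sketch matches any line containing an imgur URL rather than trying to detect failures.

```python
import gzip
import re
from typing import Iterator, List

# Assumed pattern for imgur URLs (imgur.com and subdomains such as i.imgur.com).
IMGUR_URL = re.compile(
    rb"https?://(?:[a-z0-9-]+\.)?imgur\.com/[^\s\"'<>]+", re.IGNORECASE
)


def grep_meta_warc(path: str, pattern: re.Pattern = IMGUR_URL) -> Iterator[bytes]:
    """Yield every line of a gzipped file that matches pattern.

    gzip.open reads concatenated gzip members transparently, which is
    how .warc.gz files are usually laid out (one member per record).
    """
    with gzip.open(path, "rb") as f:
        for line in f:
            if pattern.search(line):
                yield line.rstrip(b"\r\n")


def unique_urls(path: str) -> List[str]:
    """Collect distinct imgur URLs from matching lines, in first-seen order."""
    seen = {}
    for line in grep_meta_warc(path):
        for url in IMGUR_URL.findall(line):
            seen.setdefault(url.decode("utf-8", "replace"), None)
    return list(seen)


# Usage (hypothetical file name):
#   for url in unique_urls("job-meta.warc.gz"):
#       print(url)
```

The matching lines still need a manual look to tell failed fetches from successful ones before anything is passed on to #imgone.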
-
Pedrosso
Thank you
-
JAA
Grabbing He-Man.org with qwarc, thread pages should be done in 5-ish hours. Hopefully, it doesn't go down at midnight UTC.
-
thuban
that already went through archivebot, right? was there a problem with that job or is this just belt-and-suspenders?
-
JAA
It's running in AB, but it won't finish anytime soon.
-
JAA
Looks like the AB job actually got more or less as much as it could. It discovered around 94k threads. Homepage says almost 132k. Some threads require an account, possibly one with access permission, too.
-
thuban
ah. well, thanks for calling in the cavalry
-
JAA
:-)
-
nulldata
youtube.com/watch?v=QItBdql_8FI <- The Completionist and his charity and livestream event are accused of not donating funds received. Might be good to back up everything. Looks like someone already started an AB job for the Open Hand site specifically.
-
nicolas17
videos already added in #down-the-tube
-
JAA
Oh yeah, forum attachments on He-Man.org are all behind a login wall, and registration is closed, too.
-
nicolas17
(I didn't try it, but 9 years old + 14% success rate = probably banned)
-
JAA
Surprise, surprise, it does not.
-
JAA
(work)