-
pabs
fireonlive: add to Deathwatch?
-
h2ibot
PaulWise edited Site exploration (+104, add bing-scrape link (by JAA)):
wiki.archiveteam.org/?diff=50868&oldid=50594
-
h2ibot
Yts98 edited ZOWA (+50, Mark as in progress):
wiki.archiveteam.org/?diff=50869&oldid=50844
-
h2ibot
Yts98 edited Current Projects (+0, move ZOWA to current):
wiki.archiveteam.org/?diff=50870&oldid=50777
-
h2ibot
PaulWise created ArchiveBot/Ignore/NonSequentialIntegers (+5986, add a way to ignore non-sequential integer…):
wiki.archiveteam.org/?title=ArchiveBot/Ignore/NonSequentialIntegers
-
h2ibot
PaulWise edited MoinMoin (+157, link to the non-sequential integer ignores page):
wiki.archiveteam.org/?diff=50872&oldid=50783
-
h2ibot
JustAnotherArchivist edited Miraheze (+29, Datetimeify):
wiki.archiveteam.org/?diff=50873&oldid=50859
-
ssss
How many warriors/containers should i run if i have a 50/10 Mbit/s connection?
-
imer
ssss: usually you're limited by the sites we're archiving, so it depends on that and what project you're running, some projects are more bandwidth hungry than others
-
qwertyasdfuiopghjkl
I disagree with the recent edits to
wiki.archiveteam.org/index.php/Template:Rescued making it show only the earliest year of archival and hiding the years of later archival projects behind a "and more" link (that also just scrolls down to the list of categories, which seems a bit unintuitive). (see
-
qwertyasdfuiopghjkl
wiki.archiveteam.org/index.php/Memory_of_Mankind for an example) The date of the most recent archival would probably be more useful to show than the earliest one, and the infobox has enough room to show *all* the years, so collapsing the list isn't needed. (also, having a list of years without context is a bit ambiguous and this doesn't
-
qwertyasdfuiopghjkl
seem like something that really needs to be done in that/a template anyway?)
-
JAA
No disagreement here. VoynichCR isn't currently here.
-
ssss
@imer i just go with "ArchiveTeam's Choice". Is it on the order of 1-5 or 10-20 containers?
-
imer
ssss: if you're running each warrior with 6 concurrency (or whatever it's called in the ui), probably 1-2, might even have to go less than 6 for some projects (none I can think of at the moment, the upcoming orange one had pretty strict limits iirc)
-
ssss
nice to know, thanks
-
imer
just gut feeling of course, might have to tweak as things go :)
-
nstrom|m
generally the limit is not bandwidth but that the site being archived limits to a certain number of connections per IP address before blocking/throttling
-
nstrom|m
and that depends on the individual site/archiving project
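A rough way to reason about the numbers imer and nstrom|m mention: the load a site sees from one IP address is roughly the number of warriors times the concurrency of each. The sketch below is only that arithmetic. The per-IP limit is a hypothetical figure (real limits vary by site and project and are usually not published), and it assumes one connection per concurrent item, which is a simplification.

```python
def total_connections(warriors: int, concurrency: int) -> int:
    """Approximate simultaneous items worked on from one IP address."""
    return warriors * concurrency


def max_warriors(per_ip_limit: int, concurrency: int) -> int:
    """Largest warrior count that stays at or below a (hypothetical) per-IP limit."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return per_ip_limit // concurrency
```

For example, two warriors at concurrency 6 give `total_connections(2, 6) == 12`. If a site tolerated only 5 connections per IP, `max_warriors(5, 6)` would be 0, which matches imer's remark that some projects need a concurrency below 6.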
-
h2ibot
VoynichCr edited Template:Rescued (-104, error):
wiki.archiveteam.org/?diff=50874&oldid=50867
-
h2ibot
VoynichCr edited Memory of Mankind (+5, 2023):
wiki.archiveteam.org/?diff=50875&oldid=50858
-
VoynichCR
hi
-
SketchCow
I see someone referred to me.
-
SketchCow
I .... am slow to respond here.
-
SketchCow
Just use jscott⊙ao
-
SketchCow
Or jesuschristmorearchiveteamcrap⊙tc
-
SketchCow
Both work equally.
-
Peroniko
Second one is much better sounding
-
fireonlive
🤨
-
anarcat
journalmetro.com is going bankrupt
-
anarcat
i'm going to try to salvage some of it
-
anarcat
oh dear
-
JAA
Did they wipe content, or are the old sitemaps just broken.
-
JAA
s/\./?/
-
anarcat
i don't know
-
anarcat
that's partly why i was thinking of skipping those
-
JAA
Right. Well, it went through them all by now, and it's good to have a record of this.
-
anarcat
yeah
-
project10
hm, what is "Publi-sac"?
-
JAA
VoynichCR:
-
JAA
14:08:56 < qwertyasdfuiopghjkl> I disagree with the recent edits to
wiki.archiveteam.org/index.php/Template:Rescued making it show only the earliest year of archival and hiding the years of later archival projects behind a "and more" link (that also just scrolls down to the list of categories, which seems a bit unintuitive). (see
-
JAA
14:08:56 < qwertyasdfuiopghjkl>
wiki.archiveteam.org/index.php/Memory_of_Mankind for an example) The date of the most recent archival would probably be more useful to show than the earliest one, and the infobox has enough room to show *all* the years, so collapsing the list isn't needed. (also, having a list of years without context is a bit ambiguous and this doesn't
-
JAA
14:08:57 < qwertyasdfuiopghjkl> seem like something that really needs to be done in that/a template anyway?)
-
project10
oic, publisac = junk mail bomb
-
anarcat
yep
-
project10
"Je suis fier de notre longue histoire (+90 ans dans certains marchés)" ("I am proud of our long history (90+ years in some markets)") -- damn, that's a shame
-
JAA
So the Canucks forums... They have fairly tight rate limits and return AWS WAF captchas (HTTP 405) if you exceed them. There are no AAAA records anywhere on the forum.canucks.com. → canucks.ipsdns.com. → vancouver.nhl.invisionmanaged.net. CNAME chain, but it is in fact reachable over IPv6 since Invision's managed hosting supports IPv6.
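A minimal sketch of how the "no AAAA records" part could be checked with Python's standard library. `socket.getaddrinfo` follows CNAME chains through the system resolver, so an empty result for `AF_INET6` means no IPv6 address was found at the end of the chain. The function name and the injectable `resolve` parameter are illustrative assumptions. This only tests DNS; it says nothing about whether the host answers on an IPv6 address learned some other way.

```python
import socket


def ipv6_addresses(host, port=443, resolve=socket.getaddrinfo):
    """Return the sorted IPv6 addresses that `resolve` reports for `host`.

    An empty list means the lookup produced no AAAA-derived addresses.
    """
    try:
        infos = resolve(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed for this address family (e.g. no AAAA records).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv6, sockaddr is (address, port, flowinfo, scope_id).
    return sorted({info[4][0] for info in infos})
```

Calling `ipv6_addresses("forum.canucks.com")` would perform a live lookup; the result depends on the resolver and on DNS at the time it is run.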
-
JAA
Looks like a lot of topics in those forums got wiped at some point and return a 'There are no posts to show' error now.
-
nonplussed
anyone know what's going on with archive.today?
-
fireonlive
seems fine here; be sure you're not using 1.1.1.1 and family for your DNS resolver
-
nonplussed
i just use my ISP's default. would that also affect downforeveryoneorjustme.com? that site also says archive.today is down
-
JAA
archive.{today,ph} seems fine.
-
JAA
No (and also that site is awful).
-
nonplussed
you have a better one? or one that says it's up?
-
JAA
I tend to do my own checks, so no, don't know a better one.
-
nonplussed
what dns do you use, or recommend?
-
JAA
I run my own recursive resolver.
-
JAA
We use Quad9 for our projects. Specifically 9.9.9.10 and its other IPs.
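To see which resolver a Unix-like machine is configured with, one option is to read the `nameserver` lines of `/etc/resolv.conf`. The sketch below parses that file's text and flags the IPv4 addresses of Cloudflare's 1.1.1.1 service and its filtered variants ("1.1.1.1 and family" above). The helper names are illustrative. On systems that use a local stub resolver, the file shows only the stub address (such as 127.0.0.53), so a negative result is not conclusive.

```python
# IPv4 addresses of Cloudflare's public resolvers.
CLOUDFLARE_RESOLVERS = {
    "1.1.1.1", "1.0.0.1",  # unfiltered
    "1.1.1.2", "1.0.0.2",  # malware blocking
    "1.1.1.3", "1.0.0.3",  # malware and adult-content blocking
}


def nameservers(resolv_conf_text):
    """Extract the addresses on `nameserver` lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_cloudflare(resolv_conf_text):
    """True if any configured nameserver is a Cloudflare public resolver."""
    return any(s in CLOUDFLARE_RESOLVERS for s in nameservers(resolv_conf_text))
```

Typical use would be `uses_cloudflare(open("/etc/resolv.conf").read())`. Switching to the Quad9 address JAA mentions would mean a line such as `nameserver 9.9.9.10` in that file, or the equivalent setting in the router or network manager.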
-
nonplussed
hmm, maybe i'll change to that then
-
nonplussed
seems odd to change though because of one nonworking site
-
JAA
I'm running a qwarc retrieval of the Canucks forum topic pages. I randomly get banned every few minutes with no clear patterns, but it's working well otherwise and should finish in time.
-
JAA
NB, I'm running this *slower* than AB, which isn't getting banned, so no idea...
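One generic way to cope with intermittent bans like these is to treat the HTTP 405 captcha response as a signal to pause and retry. The sketch below is not qwarc's API and not JAA's actual setup. `fetch` is a hypothetical callable returning `(status, body)`, and the delays and attempt count are placeholder values.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), backing off while the status is 405.

    The delay doubles after each 405. The last response is returned even
    if it is still a 405 once max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 405:
            return status, body
        if attempt < max_attempts:
            sleep(delay)  # wait out the presumed ban before retrying
            delay *= 2
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Since the bans show no clear pattern, fixed or doubling delays are a guess and would need tuning against the actual site.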
-
nonplussed
hey JAA, just curious, why did you say downforeveryoneorjustme.com is awful? have never had issues with it myself
-
fireonlive
personally, I find that every checker other than a direct 'can i access $site' test is user-reported now, which produces a lot of false positives
-
fireonlive
e.g. can't reach facebook? comcast is down!
-
nonplussed
that site says it does it by "performing a server check from our servers"; i don't see anything about it being user-reported
-
fireonlive
downforeveryoneorjustme.com/crunchyroll for example: "A problem with Crunchyroll has been detected based on visitor reports"
-
fireonlive
but perhaps I was thinking about another site, as this UI looks a bit different
-
nonplussed
oh, weird, i guess it gives user reports when it's a major website. but for most websites i've checked, it's always done a live test
-
JAA
nonplussed: Apart from these user report inaccuracies, I see four different tracking/ads services' scripts, and the site uses JS for absolutely no reason. No thanks.
-
fireonlive
also paypal has waaaaaaaaaaay too many domains. please stop.
-
JAA
Neat
-
JAA
Direct link for when Twitter dies:
github.com/duckduckgo/tracker-radar 'Data set of top third party web domains with rich metadata about them'
-
fireonlive
ah right.. RIP twitter :(
-
fireonlive
i wonder what 'DC' is in 'IRC2'
-
fireonlive
microsoft access :(
-
JAA
And they're throwing it into Google Data Studio, which will surely live on forever...
-
nicolas17
someone has to help her get an iso :/
-
JAA
Yeah, that should be the first priority.
-
flashfire42
It's a pity there isn't a way to see how much space is available on targets or how many connections are being attempted at once to see how far down the queue I may be
-
JAA
There is no queue.
-
fireonlive
i don't think the latter is possible
-
fireonlive
it's a lovely thundering herd problem :3
-
fireonlive
or something.
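With no queue, every uploader simply retries until a target accepts the connection, which is where the thundering-herd comparison comes from. A common client-side mitigation is exponential backoff with random jitter, so that retries from many clients spread out instead of arriving together. The sketch below shows only the delay calculation. The `base` and `cap` values are assumptions and are not taken from the Warrior or any Archive Team code.

```python
import random


def jittered_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Picks uniformly between 0 and an exponentially growing ceiling
    ("full jitter"), with the ceiling limited to `cap` seconds.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0.0, ceiling)
```

A caller would sleep for `jittered_delay(n)` seconds after the n-th failed attempt. Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible.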
-
Rootliam
I saw in the irc log jason scott said I should contact him at "jscott⊙ao".... what exactly is ⊙ao
-
fireonlive
oh maybe your client messed it up?
-
fireonlive
jscott⊙ao
-
fireonlive
oh, maybe the logs site did
-
qwertyasdfuiopghjkl
-
fireonlive
the second email is....
-
fireonlive
not worth repeating
-
fireonlive
jason⊙tc exists on that domain, though.
-
fireonlive
ah yeah, or use qwertyasdfuiopghjkl's link