01:29:51 fireonlive: add to Deathwatch?
02:35:38 PaulWise edited Site exploration (+104, add bing-scrape link (by JAA)): https://wiki.archiveteam.org/?diff=50868&oldid=50594
04:15:53 Yts98 edited ZOWA (+50, Mark as in progress): https://wiki.archiveteam.org/?diff=50869&oldid=50844
04:36:56 Yts98 edited Current Projects (+0, move ZOWA to current): https://wiki.archiveteam.org/?diff=50870&oldid=50777
05:16:02 PaulWise created ArchiveBot/Ignore/NonSequentialIntegers (+5986, add a way to ignore non-sequential integer…): https://wiki.archiveteam.org/?title=ArchiveBot/Ignore/NonSequentialIntegers
05:21:03 PaulWise edited MoinMoin (+157, link to the non-sequential integer ignores page): https://wiki.archiveteam.org/?diff=50872&oldid=50783
13:58:37 JustAnotherArchivist edited Miraheze (+29, Datetimeify): https://wiki.archiveteam.org/?diff=50873&oldid=50859
14:06:22 How many warriors/containers should i run if i have a connection of 50mbits/10mbits?
14:07:30 ssss: usually you're limited by the sites we're archiving, so it depends on that and what project you're running, some projects are more bandwidth hungry than others
14:08:56 I disagree with the recent edits to https://wiki.archiveteam.org/index.php/Template:Rescued making it show only the earliest year of archival and hiding the years of later archival projects behind a "and more" link (that also just scrolls down to the list of categories, which seems a bit unintuitive). (see
14:08:56 https://wiki.archiveteam.org/index.php/Memory_of_Mankind for an example) The date of the most recent archival would probably be more useful to show than the earliest one, and the infobox has enough room to show *all* the years, so collapsing the list isn't needed. (also, having a list of years without context is a bit ambiguous and this doesn't
14:08:57 seem like something that really needs to be done in that/a template anyway?)
14:09:48 No disagreement here. VoynichCR isn't currently here.
14:10:10 @imer i just go with "archiveteams choice". Is it in the magnitude of 1-5 or 10-20 containers?
14:12:04 ssss: if you're running each warrior with 6 concurrency (or whatever it's called in the ui), probably 1-2, might even have to go less than 6 for some projects (none I can think of at the moment, the upcoming orange one had pretty strict limits iirc)
14:12:48 nice to know, thanks
14:12:51 just gut feeling of course, might have to tweak as things go :)
14:15:41 generally the limit is not bandwidth but that the site being archived limits to a certain number of connections per IP address before blocking/throttling
14:16:17 and that depends on the individual site/archiving project
14:31:43 VoynichCr edited Template:Rescued (-104, error): https://wiki.archiveteam.org/?diff=50874&oldid=50867
14:36:44 VoynichCr edited Memory of Mankind (+5, 2023): https://wiki.archiveteam.org/?diff=50875&oldid=50858
14:37:43 hi
14:39:36 I see someone referred to me.
14:39:42 I .... am slow to respond here.
14:39:46 Just use jscott⊙ao
14:39:55 Or jesuschristmorearchiveteamcrap⊙tc
14:39:58 Both work equally.
14:42:04 Second one is much better sounding
15:34:17 🤨
15:40:35 https://journalmetro.com/ is going bankrupt
15:40:50 i'm going to try to salvage some of it
15:45:46 oh dear
15:45:54 https://journalmetro.com/sitemap.xml?yyyy=2015&mm=09&dd=26 looks pretty bad
15:48:45 Did they wipe content, or are the old sitemaps just broken.
15:48:51 s/\./?/
15:56:05 i don't know
15:56:19 that's partly why i was thinking of skipping those
15:57:04 Right. Well, it went through them all by now, and it's good to have a record of this.
15:58:14 yeah
16:04:03 hm, what is "Publi-sac"?
16:04:24 VoynichCR:
16:04:24 14:08:56 < qwertyasdfuiopghjkl> I disagree with the recent edits to https://wiki.archiveteam.org/index.php/Template:Rescued making it show only the earliest year of archival and hiding the years of later archival projects behind a "and more" link (that also just scrolls down to the list of categories, which seems a bit unintuitive). (see
16:04:29 14:08:56 < qwertyasdfuiopghjkl> https://wiki.archiveteam.org/index.php/Memory_of_Mankind for an example) The date of the most recent archival would probably be more useful to show than the earliest one, and the infobox has enough room to show *all* the years, so collapsing the list isn't needed. (also, having a list of years without context is a bit ambiguous and this doesn't
16:04:35 14:08:57 < qwertyasdfuiopghjkl> seem like something that really needs to be done in that/a template anyway?)
16:06:36 oic, publisac = junk mail bomb
16:07:04 yep
16:08:55 "Je suis fier de notre longue histoire (+90 ans dans certains marchés)" [I am proud of our long history (90+ years in some markets)] -- damn, that's a shame
16:13:31 So the Canucks forums... They have fairly tight rate limits and return AWS WAF captchas (HTTP 405) if you exceed them. There's no AAAA records on the forum.canucks.com. → canucks.ipsdns.com. → vancouver.nhl.invisionmanaged.net. CNAME chain, but it is in fact reachable over IPv6 since Invision's managed hosting supports IPv6.
17:23:18 Looks like a lot of topics in those forums got wiped at some point and return a 'There are no posts to show' error now.
17:23:41 See e.g. https://forum.canucks.com/forum/2-general-hockey-discussion/page/1018/
18:02:46 anyone know what's going on with archive.today?
18:04:09 seems fine here; be sure you're not using and family for your DNS resolver
18:05:27 i just use my ISP's default. would that also affect downforeveryoneorjustme.com? that site also says archive.today is down
18:05:47 archive.{today,ph} seems fine.
18:07:55 okay and does this say it's up for you?
https://downforeveryoneorjustme.com/archive.today
18:10:06 No (and also that site is awful).
18:11:24 you have a better one? or one that says it's up?
18:13:08 I tend to do my own checks, so no, don't know a better one.
18:13:30 what dns do you use, or recommend?
18:13:38 I run my own recursive resolver.
18:14:07 We use Quad9 for our projects. Specifically and its other IPs.
18:16:08 hmm, maybe i'll change to that then
18:16:57 seems odd to change though because of one nonworking site
19:48:38 I'm running a qwarc retrieval of the Canucks forum topic pages. I randomly get banned every few minutes with no clear patterns, but it's working well otherwise and should finish in time.
19:50:24 NB, I'm running this *slower* than AB, which isn't getting banned, so no idea...
21:38:19 hey JAA, just curious, why did you say downforeveryoneorjustme.com is awful? have never had issues with it myself
21:40:00 personally, everything not directly 'can i access $site test' is all user-reported now which has a lot of false positives
21:40:07 e.g. can't reach facebook? comcast is down!
21:41:46 that site says it does it by "performing a server check from our servers"; i don't see anything about it being user-reported
21:42:40 https://downforeveryoneorjustme.com/crunchyroll for example: "A problem with Crunchyroll has been detected based on visitor reports"
21:42:50 but perhaps I was thinking about another site, as this UI looks a bit different
21:49:19 oh, weird, i guess it gives user reports when it's a major website. but for most websites i've checked, it's always done a live test
22:31:21 nonplussed: Apart from these user report inaccuracies, I see four different tracking/ads services' scripts, and the site uses JS for absolutely no reason. No thanks.
22:39:12 could be useful for us: https://twitter.com/iustinBB/status/1703785504670445780
22:39:12 nitter: https://nitter.net/iustinBB/status/1703785504670445780
22:39:23 also paypal has waaaaaaaaaaay too many domains. please stop.
22:45:04 Neat
22:46:21 Direct link for when Twitter dies: https://github.com/duckduckgo/tracker-radar 'Data set of top third party web domains with rich metadata about them'
22:47:03 ah right.. RIP twitter :(
22:52:39 i wonder what 'DC' is in 'IRC2'
23:05:56 https://twitter.com/mtruslowstorey/status/1703781076689121556
23:05:57 nitter: https://nitter.net/mtruslowstorey/status/1703781076689121556
23:08:06 microsoft access :(
23:09:29 And they're throwing it into Google Data Studio, which will surely live on forever...
23:09:52 someone has to help her get an iso :/
23:10:08 Yeah, that should be the first priority.
23:12:20 It's a pity there isn't a way to see how much space is available on targets or how many connections are being attempted at once to see how far down the queue I may be
23:12:38 There is no queue.
23:12:40 i don't think the latter is possible
23:14:37 it's a lovely thundering herd problem :3
23:14:48 or something.
23:52:12 I saw in the irc log jason scott said I should contact him at "jscott⊙ao".... what exactly is ⊙ao
23:54:06 oh maybe your client messed it up?
23:54:37 jscott⊙ao
23:54:46 oh, maybe the logs site did
23:54:58 https://hackint.logs.kiska.pw/archiveteam-bs/20230918#c381184 doesn't show the whole email address, try https://irclogs.archivete.am/archiveteam-bs/2023-09-18#l689f4802
23:55:23 the second email is....
23:55:26 not worth repeating
23:56:18 jason⊙tc exists on that domain, though.
23:57:07 ah yeah, or use qwertyasdfuiopghjkl's link