03:55:41 this person regularly deletes older blog posts, would it be appropriate to use this project to save their new posts? http://blog.sesse.net/
03:56:01 currently I'm just doing AB !ao but it would be nice to automate it
08:49:38 arkiver will be able to correct me, but I guess a monthly queue of the homepage with a depth of 1 should mean we grab each new article
10:23:12 Got abuse from theanarchistlibrary.org
10:23:19 We suspect this may be from a botnet, as we have been hit by a number of requests from this the IP above as well as several others. It is a resource exhaustion attack against the webapp running at theanarchistlibrary.org resolvable to 216.252.162.140.
10:25:36 Seeing quite a few status code 0 from a couple of domains: https://share.aktheknight.co.uk/riJe0/NApuloJa91.png/raw
10:28:07 itunese... doesn't exist anymore, but it's still seen quite a lot. digital-forum.it seems to have banned Hetzner, esug.org is just some 302 redirect always
10:28:13 *itunesu
10:28:49 arkiver maybe something to exclude ^
15:50:03 digital-forum.it seems fine on all my non hetz workers
15:51:31 I feel like we're looping on silky-europe.com due to session ids in url query strings but can't find any actual evidence of that. just seeing a lot of it in my logs
15:51:34 example url https://www.silky-europe.com/arborist/fixed-saw?SID=8179f580143d7c2b60b8c9cd3d8e1448
15:52:48 (also .nl version, e.g. https://www.silky-europe.nl/boomverzorger/vouwzagen?SID=08e9309aa89bf082ea7245d89d8635b5 )
16:06:21 yeah
17:14:55 getting my brush
17:22:13 putting the trash in our little /dev/null corner
17:33:03 pabs: AK is correct.
you can make a PR on urls-sources
17:33:15 or else i'll add it later
17:40:10 so this seems to be a bunch of spam sites coming from China aimed at Thailand
17:43:27 we have indeed been getting quite a bit of anarchistlibrary
17:44:08 partially filtering it
17:45:43 nstrom|m: indeed SID was not one we checked for, only sid
17:45:54 fixing the forum URLs too
17:48:36 cleaning done
17:48:43 forcing new version
17:49:16 new minimum version is set
17:49:25 moving todo:backfeed to todo:secondary
18:26:56 danke
18:32:20 Thanks
18:52:44 ooh, we're actually making progress now. many thanks :)
20:00:20 still seem to be having a problem
20:00:40 different one though
20:19:47 but where do those bad URLs come from...
20:28:40 arkiver: which ones specifically? can dig through my logs if you want
20:32:44 these odd .de sites (without ending /)
20:32:49 trailing /, i should say
20:41:45 one of them actually loaded once for me but otherwise nearly all seem to be broken
20:42:17 yeah
20:42:20 the problem is where they come from
20:42:24 i don't see them being queued
20:42:47 if you see them being queued somewhere, please give me a bit of log (including that URL, and the parent URL mentioned a little earlier)
20:43:27 roger that
20:45:50 2023-11-08T20:35:12.825451117Z Queuing for parent URL https://bpgqt.petrography.de/app-ads.txt.
20:45:51 2023-11-08T20:35:12.825602179Z Queuing URL https://6pi8.stoutly.de.
20:46:05 nice nice
20:46:09 very nice thanks nstrom|m
20:46:17 2023-11-08T20:37:24.035748601Z Queuing for parent URL https://cbxrt.sociably.de/app-ads.txt.
20:46:17 2023-11-08T20:37:24.036022264Z Queuing URL https://78k7y.selflimited.de.
20:46:17 2023-11-08T20:37:24.036036848Z Queuing for parent URL https://68q3z.theorize.de/ads.txt.
20:46:17 2023-11-08T20:37:24.036042207Z Queuing URL https://9udw2.peripatetic.de.
20:46:24 hope that helps
20:46:27 greatly!
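
[Editor's note] The SID/sid exchange above comes down to matching a session-ID query parameter without regard to case. A minimal sketch of that idea follows; it is not the project's actual filter, which is not shown in the log. The function name `strip_session_params` and the set `SESSION_PARAMS` are hypothetical, and only `sid` is listed because it is the only parameter name the log mentions.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of session-ID parameter names, stored in lower case.
# The log only mentions "sid"/"SID"; anything else would be a guess.
SESSION_PARAMS = {"sid"}


def strip_session_params(url: str) -> str:
    """Return url with session-ID query parameters removed, ignoring case."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        # Compare in lower case so "SID", "Sid" and "sid" all match.
        if key.lower() not in SESSION_PARAMS
    ]
    # urlencode may re-encode the remaining values slightly differently
    # from the original query string (for example spaces become "+").
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_session_params(
        "https://www.silky-europe.com/arborist/fixed-saw"
        "?SID=8179f580143d7c2b60b8c9cd3d8e1448"
    ))
```

With this normalisation, the two silky-europe example URLs quoted earlier would collapse to their SID-free forms, so differing session IDs would no longer look like distinct pages.
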
20:46:30 i see the problem now
20:47:26 project paused a bit
20:53:05 interesting way of doing spam
20:53:52 i'll disable this queuing for now
20:55:56 they could do something similar with trust.txt but leaving logic for that in for now
20:57:25 an update is in
20:57:46 restarted
21:01:29 looking pretty good
21:01:31 thanks nstrom|m :)
21:04:46 it's not blowing up :)
21:07:08 or maybe it is
21:09:03 https://a1oel.spidery.de/.well-known/security.txt, https://2n4w8.epithelial.de/.well-known/nodeinfo, https://2h9lu.predestine.de/.well-known/dnt-policy.txt
21:09:06 saw these
21:09:18 maybe it's all or most of the special urls
21:09:18 yeah but not sure if it's coming through those
21:10:35 Queuing for parent URL https://44rw5.discriminate.de/.well-known/ai-plugin.json.
21:10:35 Queuing URL custom:random=202311&url=https%3a%2f%2f4i792%2emeagre%2ede%2f%2ewell%2dknown%2fsecurity%2etxt.
21:11:12 And then quite a list of them.
21:11:28 blegh :/
21:11:40 AK's logs are helpful again. :-)
21:12:43 actually those are 301s
21:13:13 right
21:13:17 well that's an annoying loop
21:13:30 # curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0)100101 Firefox/119.0" https://5otqf.reprieve.de
21:13:30 FUCK YOU!
21:13:31 :D
21:20:07 updates again
21:20:23 DLoader: that real?
21:20:34 * arkiver check
21:21:09 no :P
21:24:14 I still get that, but not on my Hetzner servers
21:24:26 so dunno what they doing
21:36:46 Queuing for parent URL https://9b953.resemblance.de/.well-known/gpc.json.
21:36:46 Queuing URL custom:random=202311&url=https%3a%2f%2f1mgu%2epainstaking%2ede%2fsitemap%2exml.
21:36:54 still seeing this arkiver
21:59:47 hmm
22:05:09 DLoader: are you sure that second line belonged with that first line?
22:06:14 arkiver: Still seeing similar things in AK's logs.
22:06:40 And I think those are set to 1 concurrency to avoid log mixing etc.
22:07:30 yeah i see them too now
22:07:37 (couldn't check earlier)
22:08:21 just uploaded some more :D https://transfer.archivete.am/inline/94ijC/dedede
22:08:32 i see the problem
22:11:28 another fix pushed
22:11:32 this should really fix it now
22:11:41 will leave the queue as is, should go down fast
22:31:55 alright not looking bad
22:32:08 will move :backfeed to :secondary
22:39:16 looks good!
23:28:11 arkiver: posted to https://github.com/ArchiveTeam/urls-sources/pull/27
23:28:59 arkiver: and I have one other PR open: https://github.com/ArchiveTeam/urls-sources/pull/22
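
[Editor's note] The `custom:random=...&url=...` items quoted in the log are hard to read because the target URL is percent-encoded. The sketch below shows one way to decode such a line when digging through worker logs. It assumes, based only on the two examples in the log, that an item is the literal prefix `custom:` followed by a URL-encoded query string; the helper name `decode_custom_item` is hypothetical and this is not the project's own parser.

```python
from urllib.parse import parse_qs


def decode_custom_item(item: str) -> dict:
    """Split a 'custom:' item into its fields, percent-decoding the values.

    Assumption: the item is 'custom:' plus a query string, as in the two
    examples in the log. Other item shapes are not handled.
    """
    prefix = "custom:"
    if not item.startswith(prefix):
        raise ValueError("not a custom: item")
    fields = parse_qs(item[len(prefix):])
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in fields.items()}


if __name__ == "__main__":
    # Item copied from the log, without the sentence-ending full stop.
    item = (
        "custom:random=202311&url=https%3a%2f%2f4i792%2emeagre%2ede"
        "%2f%2ewell%2dknown%2fsecurity%2etxt"
    )
    for key, value in decode_custom_item(item).items():
        print(key, value)
```

Decoded this way, the item from 21:10:35 points at a `/.well-known/security.txt` path on yet another random `.de` subdomain, which matches the cross-domain queuing pattern discussed in the channel.
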