08:06:37 Looks like I got the abuse report for the /.well-known/ urls too at some point
08:07:01 Oops, just had to reply to the "deadline has expired" email asking Hetzner to please not kill my ips haha
08:18:43 weird people/scripts are so uppity about those
08:32:31 can we reply with “hi you’re giving off ‘i just got my first cpanel account and these “access logs” things look scary and confusing’ vibes so ima need you to tone that down juuuust a tad”?
09:18:55 also got abuse mail from hetzner about a hetzner customer getting 8000 requests in 2 hours :D
09:19:11 I think that was the first one that wasn't about hitting sinkholes
09:19:37 that is 1 request a second approx?
12:58:05 another one "The owner of the domain www.listvote.com has informed us of "Thousands and thousands of senseless requests. I'm sending 403."."
13:00:56 <[42]> got the same with hetzner ^
13:01:48 I wonder what they mean by 'senseless'. Are they calling their own pages useless? :P
13:03:51 <[42]> it appears to be a bulk abuse report to hetzner for all requests coming from hetzner ips
13:04:01 <[42]> and hetzner having filtered that by customer
13:08:33 Got the same, urgh
13:11:19 Not seeing any listvote in the logs at the mo, and the requests are all from the 16th. So we might be clear now
13:16:40 listvote.com like https://irc.project10.net/uploads/f24f025b193afc16/baby.png
15:16:08 Ah yep, same here
16:51:44 pushed an update for better URL extraction from PDFs
16:51:59 now supporting ' dot ' and ' (dot) ' and ' [dot] '
16:52:08 some documents show some URLs as
16:52:20 http://example dot org/something
16:52:38 does anyone here know of other ways URLs may be made 'unclickable' in documents?
16:53:27 Some people wrote pretty elaborate things for Imgur IIRC.
16:55:57 hmm i had some in my queuing tool as well yeah
16:59:07 Possible bad idea incoming: would there be a way for us to capture strings that had a www. or http:// but didn't end up matching the existing url extraction?
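The de-obfuscation described at 16:51 to 16:52, plus AK's idea of logging strings that look like URLs but don't match, could be sketched roughly as below. This is an illustration only, not the actual worker code: the regexes, the case-insensitive matching, the trailing-punctuation trimming and the names `deobfuscate` and `extract_urls` are all assumptions. It only handles the three spellings mentioned in the chat (' dot ', ' (dot) ', ' [dot] ', each surrounded by whitespace).

```python
import re

# The three obfuscated separators mentioned in the chat, with surrounding
# whitespace. Case-insensitive matching is an assumption for this sketch.
OBFUSCATED_DOT = re.compile(r"\s+(?:dot|\(dot\)|\[dot\])\s+", re.IGNORECASE)

# Deliberately simple URL pattern: an http(s):// or www. prefix followed by
# characters that are not whitespace, angle brackets, quotes, ')' or ']'.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)

# Hints that a chunk of text was probably meant to contain a URL.
HINT_RE = re.compile(r"https?:|www\.", re.IGNORECASE)


def deobfuscate(text):
    """Turn 'example dot org' style spellings back into 'example.org'."""
    return OBFUSCATED_DOT.sub(".", text)


def extract_urls(text):
    """Return (urls, near_misses) for one chunk of text, e.g. one PDF line.

    near_misses contains the chunk itself if it has an http/www. hint but
    no URL could be extracted, i.e. a candidate for a manual review log.
    """
    cleaned = deobfuscate(text)
    # Trim sentence punctuation that the simple pattern would swallow.
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(cleaned)]
    near_misses = [] if urls or not HINT_RE.search(text) else [text]
    return urls, near_misses
```

For example, `extract_urls("see http://example dot org/something for details")` gives `["http://example.org/something"]` and no near misses. As pointed out later in the chat, a log like this only catches text carrying an obvious hint such as http or www.; text without one is never considered a URL in the first place.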
16:59:26 If they could be recorded somewhere we might be able to keep improving the url extraction by seeing what was missed from it
17:00:09 AK: like do you have an example?
17:00:15 https://lounge.kuhaon.fun/folder/14fab20997ac361b/AbuseNormalMailDDoSattackonwww.listvote.com.txt
17:00:22 "The owner of the domain www.listvote.com has informed us of "Thousands and thousands of senseless requests. I'm sending 403."."
17:00:27 heh
17:01:17 oh seems like everyone else got one too :P
17:01:28 Say this was spotted in a file: "http://example dot org/something". The code might go "Ooh this has a http, but we can't match it using the url extraction". It then queues it off to some file somewhere that we can then review later to either: 1. Manually work out the url and feed it back in 2. Work out a regex/pattern/extraction method that would have caught the url and then that can be fed back into the url extraction in the workers
17:01:46 Basically crowdsourcing the url extraction based off of what wasn't caught by it
17:04:57 AK: i'm not sure.
17:05:09 currently we already queue everything in a PDF that we think might be a URL
17:05:21 so anything that is not queued was never even considered to possibly be a URL
17:05:41 which would mean that implementing that idea would mean saving all text somewhere from which no URL was extracted, and that could get big
17:05:50 good point
17:08:32 I was sort of thinking for the items that fit in the "We think this might be a url but we can't quite get a 100% right url from it", but if they get queued then that'll sort them out anyway
17:09:14 if we know of something like that we'll get a URL out of it in some way and queue it
17:09:39 we have two categories for pieces of text here:
17:09:44 <[42]> haven't had abuse mails from hetzner in a while until now, but do you have a sort of template response for that?
17:09:51 1. this is probably a URL! let's queue it just in case
17:10:02 2. according to how we see things now, this is definitely not a URL
17:10:45 <[42]> maybe something to include contact info for throttling or excluding their domain (if that's even done)?
17:10:46 sometimes category 2 is wrong, but if we had more code in place to determine that something in category 2 is likely a URL but we're not sure, that would move to category 1
17:13:57 Yeah. Writing statements is always fun :P
17:14:08 i took out listvote.com for now. it's indeed a loop due to their PDFs
17:19:14 <[42]> loops as in queueing the same content multiple times?
17:21:07 no but we pay special attention to queuing all PDFs we come across, and all URLs found in each PDF are queued, etc., so that can create loops
17:21:10 it's rare though
17:27:12 "A statement has been successfully entered for this issue. It will be checked by a staff member and afterwards the relevant ticket will be closed."
17:27:16 https://lounge.kuhaon.fun/folder/a6b8d8bfb74f979d/fanning-self-over-it.gif
17:29:28 <[42]> https://gist.github.com/Nothing4You/51d1d89ca37ecab444c67cc8bd32dfa5 does this seem reasonable? if so, it could also serve as a template for others
17:31:16 [42]: i would leave out the "while this issue is not addressed"
17:31:35 excluded is kind of 'issue fixed' right
17:31:36 ?
17:31:42 you could even leave out the PDF explanation
17:31:51 just mention we won't be making more requests to listvote.com
17:32:09 <[42]> ok
17:33:35 <[42]> updated the gist
17:35:00 looks good :)
17:35:04 [42]: ^
17:35:16 make it a rare issue
17:35:46 yeah could add that
17:35:47 <[42]> updated
17:35:56 yeah makes it sound better :)
17:36:01 yes
17:37:44 <[42]> DLoader, Craigle: if you still need to reply to hetzner, see above
17:47:25 [42] Thanks. I sent one earlier. My usual boilerplate about Archiveteam, no abuse intended, etc. Noted that these weren't "useless requests" but also that the site was removed and would not be accessed again
17:47:49 It generally covers all the bases with them. If not, I'll deal with whatever they come back with
17:52:16 "Oopsie whoopsie my server did an oopsie UwU I wiww make suwe i-it does nyot h-happen again" That would either make them never answer or you would get insta banned :P
17:52:31 love it
18:08:02 https://youtu.be/fFtc8zeI6F8
18:08:13 send this :3
18:12:13 https://lounge.kuhaon.fun/folder/a8a501f13cb033c8/8355uw.jpg
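The listvote.com loop discussed at 17:14 to 17:21 comes from queuing every PDF and every URL found in each PDF; a site whose PDFs keep pointing at further new URLs can presumably keep that going indefinitely. Excluding the domain is what was done here. A hop limit on PDF-discovered URLs is one generic way to bound such chains, and the sketch below combines both ideas. `EXCLUDED_DOMAINS`, `MAX_PDF_HOPS`, `crawl` and the `extract_urls` callback are hypothetical names for illustration; this is not how the ArchiveTeam workers are known to work.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical settings for this sketch, not real worker configuration.
EXCLUDED_DOMAINS = {"listvote.com"}
MAX_PDF_HOPS = 3


def is_excluded(url):
    """True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def crawl(start, extract_urls):
    """Breadth-first crawl from `start`, returning URLs in fetch order.

    `extract_urls(url)` is a hypothetical callback returning the URLs found
    in a document. URLs found inside a PDF get hop count + 1 (other pages
    pass their count on unchanged), so a chain of PDFs that keeps linking
    to fresh URLs stops after MAX_PDF_HOPS. The `seen` set only prevents
    fetching the exact same URL twice.
    """
    seen = set()
    queue = deque([(start, 0)])
    fetched = []
    while queue:
        url, hops = queue.popleft()
        if url in seen or is_excluded(url) or hops > MAX_PDF_HOPS:
            continue
        seen.add(url)
        fetched.append(url)
        is_pdf = urlsplit(url).path.lower().endswith(".pdf")
        for found in extract_urls(url):
            queue.append((found, hops + 1 if is_pdf else hops))
    return fetched
```

With a callback where every `docN.pdf` links to `doc(N+1).pdf`, the crawl stops after `doc3.pdf` instead of running forever, and anything on listvote.com is skipped outright. The `seen` set alone would not help here, since each PDF in such a chain has a new URL.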