-
AK
Looks like I got the abuse report for the /.well-known/ urls too at some point
-
AK
Oops, just had to reply to the "deadline has expired" email asking Hetzner to please not kill my ips haha
-
fireonlive
weird people/scripts are so uppity about those
-
fireonlive
can we reply with “hi you’re giving off ‘i just got my first cpanel account and these “access logs” things look scary and confusing’ vibes so ima need you to tone that down juuuust a tad”?
-
DLoader
also got abuse mail from hetzner about a hetzner customer getting 8000 requests in 2 hours :D
-
DLoader
I think that was first that wasn't about hitting sinkholes
-
katia
that is 1 request a second approx?
-
DLoader
another one "The owner of the domain www.listvote.com has informed us of "Thousands and thousands of senseless requests. I'm sending 403."."
-
[42]
got the same with hetzner ^
-
TheTechRobo
I wonder what they mean by 'senseless'. Are they calling their own pages useless? :P
-
[42]
it appears to be a bulk abuse report to hetzner for all requests coming from hetzner ips
-
[42]
and hetzner having filtered that by customer
-
AK
Got the same, urgh
-
AK
Not seeing any listvote in the logs at the mo, and the requests are all from the 16th. So we might be clear now
-
Craigle
Ah yep, same here
-
arkiver
pushed an update for better URL extraction from PDFs
-
arkiver
now supporting ' dot ' and ' (dot) ' and ' [dot] '
-
arkiver
some documents show some URLs as
-
arkiver
example dot org/something
-
arkiver
does anyone here know of other ways URLs may be made 'unclickable' in documents?
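A minimal sketch of the ' dot ' / ' (dot) ' / ' [dot] ' refanging described above (the function name and exact patterns are illustrative, not the project's actual code; other "unclickable" styles seen in the wild include hxxp:// and bracketed dots like example[.]org):

```python
import re

# Spelled-out dot separators: ' dot ', ' (dot) ', ' [dot] '.
# Note: the bare 'dot' case can false-positive on ordinary prose
# ("the dot was red"), so real extraction needs more context checks.
DOT_VARIANTS = re.compile(r"\s*(?:\(dot\)|\[dot\]|\bdot\b)\s*", re.IGNORECASE)

def refang(text: str) -> str:
    """Rewrite spelled-out dots back into literal dots."""
    return DOT_VARIANTS.sub(".", text)
```

For example, `refang("example dot org/something")` gives `"example.org/something"`.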
-
JAA
Some people wrote pretty elaborate things for Imgur IIRC.
-
arkiver
hmm i had some in my queuing tool as well yeah
-
AK
Possible bad idea incoming, would there be a way for us to capture strings that had a www. or http:// but didn't end up matching the existing url extraction?
-
AK
If they could be recorded somewhere we might be able to keep improving the url extraction by seeing what was missed from it
-
arkiver
AK: like do you have an example?
-
that_lurker
"The owner of the domain www.listvote.com has informed us of "Thousands and thousands of senseless requests. I'm sending 403."."
-
that_lurker
heh
-
that_lurker
oh seems like everyone else got one too :P
-
AK
Say this was spotted in a file: "example dot org/something". The code might go "Ooh this has a http, but we can't match it using the url extraction". It then queues it off to some file somewhere that we can then review later to either: 1. Manually work out the url and feed back in 2. Work out a regex/pattern/extraction method that would have
-
AK
caught the url and then that can be fed back into the url extraction in the workers
-
AK
Basically crowd sourcing the url extraction based off of what wasn't caught by it
-
arkiver
AK: i'm not sure.
-
arkiver
currently we already queue everything that we think might be a URL in a PDF
-
arkiver
so anything that is not queued was never even considered to possibly be a URL
-
arkiver
so implementing that idea would mean saving all text from which no URL was extracted somewhere, and that could get big
-
AK
good point
-
AK
I was sort of thinking for the items that fit in the "We think this might be a url but we can't quite get a 100% right url from it", but if they get queued then that'll sort them out anyway
-
arkiver
if we know of something like that we'll get a URL out of it in some way and queue it
-
arkiver
we have two categories for pieces of text here:
-
[42]
haven't had abuse mails from hetzner in a while until now, but do you have a sort of template response for that?
-
arkiver
1. this is probably a URL! let's queue it just in case
-
arkiver
2. according to how we see things now, this is definitely not a URL
-
[42]
maybe something to include contact info for throttling or excluding their domain (if that's even done)?
-
arkiver
sometimes category 2 is wrong, but if we had more code in place to determine that something in category 2 is likely a URL (even if we're not sure), it would move to category 1
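The two-category split described above could be sketched like this (the regexes, thresholds, and function name are hypothetical heuristics for illustration, not the project's actual extractor):

```python
import re

# Category 1: probably a URL, queue it just in case.
# Obvious URLs with a scheme or www. prefix:
DEFINITE_URL = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Likelier-than-not candidates: word + (literal dot OR spelled-out dot) + TLD-ish tail.
LIKELY_URL = re.compile(
    r"\b[a-z0-9-]+(?:\s+(?:\(dot\)|\[dot\]|dot)\s+|\.)[a-z]{2,}(?:/\S*)?",
    re.IGNORECASE,
)

def categorize(text: str) -> int:
    """1 = probably a URL, queue it just in case; 2 = not considered a URL."""
    if DEFINITE_URL.search(text) or LIKELY_URL.search(text):
        return 1
    return 2
```

Improving extraction then means widening `LIKELY_URL` so text that used to land in category 2 moves into category 1.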
-
that_lurker
Yeah. Writing statements is always fun :P
-
arkiver
i took out listvote.com for now. it's indeed a loop due to their PDFs
-
[42]
loops as in queueing the same content multiple times?
-
arkiver
no but we pay special attention to queuing all PDFs we come across, and all URLs found in each PDF are queued, etc., so that can create loops
-
arkiver
it's rare though
-
that_lurker
"A statement has been successfully entered for this issue. It will be checked by a staff member and afterwards the relevant ticket will be closed."
-
[42]
gist.github.com/Nothing4You/51d1d89ca37ecab444c67cc8bd32dfa5 does this seem reasonable? if so, it could also serve as template for others
-
arkiver
[42]: i would leave out the "while this issue is not addressed"
-
arkiver
excluded is kind of 'issue fixed' right
-
arkiver
?
-
arkiver
you could even leave out the PDF explanation
-
arkiver
just mention we won't be making more requests to listvote.com
-
[42]
ok
-
[42]
updated the gist
-
arkiver
looks good :)
-
arkiver
[42]: ^
-
that_lurker
make it a rare issue
-
arkiver
yeah could add that
-
[42]
updated
-
that_lurker
yeah makes it sound better :)
-
arkiver
yes
-
[42]
DLoader, Craigle: if you still need to reply to hetzner, see above
-
Craigle
[42] Thanks. I sent one earlier. My usual boilerplate about Archiveteam, no abuse intended, etc. Noted that these weren't "useless requests" but also that the site was removed and would not be accessed again
-
Craigle
It generally covers all the bases with them. If not, I'll deal with whatever they come back with
-
that_lurker
"Oopsie whoopsie my server did an oopsie UwU I wiww make suwe i-it does nyot h-happen again" That would either make them never answer or you would get insta banned :P
-
fireonlive
love it
-
fireonlive
send this :3