-
imer
hope this stops soon -_-
-
JAA
Ack, and same
-
imer
just spot-checking some =0 URLs, lots of WordPress blogs must be using BitNinja's blacklist :(
-
JAA
BitNinja--
-
eggdrop
[karma] 'BitNinja' now has -1 karma!
-
imer
hopefully retries with a cleaner ip get those
-
fireonlive
bitninja can suck my ass barf
-
arkiver
JAA: i have ways to handle those
-
arkiver
JAA: but which ones were being filtered out now to 'fix' this?
-
arkiver
JAA: let's let the old onions just fail
-
arkiver
actually don't some of them still work?
-
arkiver
JAA: in case of these kind of sites, can we please not add filters but just pause the project until i can look into it?
-
arkiver
i really don't want to add unnecessary filters, or else we need a good overview of which ones should be removed again
-
JAA
arkiver: All the problematic domains mentioned above. These, I think: jsd686.com ws.ogutsan.com www.pspalls.com nijihypogyhozymy.anvgames.com anvgames.com jyqisajojawopy.anvgames.com tonaku.com
-
JAA
And sure, will do in the future.
-
JAA
Regarding v2 onions, I'd be surprised if you could still establish a circuit to them. The vast, vast majority of relays should run a new enough version of Tor by now that they don't support v2 anymore. I do wonder whether we could run our own relays with an older version of Tor and establish a custom circuit through them...
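v2 and v3 onion addresses have different formats, so they are easy to tell apart before queueing. The sketch below is a minimal Python illustration, not anything the project is known to run. `onion_version`, `encode_v3` and `v3_checksum` are hypothetical helper names. The address formats come from Tor's specifications. A v2 address is 16 base32 characters: the first 80 bits of a SHA-1 hash of the service's RSA key, with no checksum. A v3 address is 56 base32 characters encoding a 32-byte ed25519 public key, a 2-byte SHA3-256 checksum and a version byte of 0x03. This only checks syntax; it cannot tell whether a v2 service is still reachable through the network.

```python
import base64
import hashlib

V3_VERSION = b"\x03"
ONION_SUFFIX = ".onion"


def v3_checksum(pubkey: bytes) -> bytes:
    """First two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION)."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + V3_VERSION).digest()[:2]


def encode_v3(pubkey: bytes) -> str:
    """Build a v3 hostname from a 32-byte ed25519 public key."""
    raw = pubkey + v3_checksum(pubkey) + V3_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ONION_SUFFIX


def onion_version(host: str) -> int:
    """Return 2 or 3 for a syntactically valid onion hostname, else 0."""
    host = host.lower().rstrip(".")
    if not host.endswith(ONION_SUFFIX):
        return 0
    # Keep only the label directly before ".onion" (drops any subdomains).
    label = host[: -len(ONION_SUFFIX)].rsplit(".", 1)[-1]
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error (bad alphabet/padding) subclasses ValueError
        return 0
    if len(raw) == 10:
        # 16 chars: legacy v2 address; there is no checksum to verify.
        return 2
    if len(raw) == 35:
        # 56 chars: v3 address = PUBKEY (32) | CHECKSUM (2) | VERSION (1).
        pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
        if version == V3_VERSION and checksum == v3_checksum(pubkey):
            return 3
    return 0
```

A check like `onion_version(host) == 2` could, for example, tag legacy onions so they are allowed to fail (or are retried separately) rather than being filtered out blindly.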
-
JAA
Oh, forgot dadanja.com in the list above.
-
arkiver
JAA: but that means there's possibly always a tiny part that is not updated - so we should support that
-
arkiver
i can't test well with these being filtered out
-
arkiver
JAA: i'll remove the patterns for the websites you listed. will probably let it run for a bit so we get a nice sample, and then i can work on blocking it out
-
arkiver
k-hachiken.com too i think
-
JAA
Tor has nice metrics, but it appears that they're currently broken. Welp.
-
JAA
metrics.torproject.org/versions.html should have a graph of Tor versions in the network.
-
arkiver
i'll be off for 45 minutes or so
-
arkiver
well it's pretty awesome we're getting tor now :)
-
BornOn420
You can add nakedcollegegirlssex.com to the problematic domains with PHP spam
-
arkiver
yeah i noticed that one
-
imer
arkiver/JAA: spam is back at a medium level if you want to hit the pause button
30=200 nakedcollegegirlssex.com/2fJaqH.php?FXcH.xml
36=200 jsd686.com/cA1ij.php?v3DH.xml
-
imer
unless arkiver needs it "blown up"
-
imer
in which case give it an hour tops :p
-
datechnoman
Spam is pretty consistent :(
-
datechnoman
Might spin down for a bit to save some cash
-
datechnoman
Target is struggling atm anyway
-
arkiver
imer: i'm back, a bit of blow up is good
-
imer
welcome back :)
-
arkiver
:)
-
imer
very curious what solution you'll come up with
-
arkiver
huh
-
arkiver
140k items in backfeed is not what i would call 'blowing up'
-
imer
slow targets have been slowing it down :p
-
imer
ah, so you can run multiple patterns, and if all match it gets thrown out?
-
arkiver
well
-
arkiver
so that table contains a URL as key, and for each key it contains a list of patterns.
-
arkiver
if the URL we get matches the key, it will check if each of the URLs is discovered on the web page. if yes, it will declare the web page spammy and not queue any new URLs from it
-
imer
oh, I see. that's smart
-
arkiver
yeah it's the next "line of defense" after simple filtering of URLs
-
arkiver
and to prevent having to read and search the entire web page (which is costly for CPU), we use URLs discovered by Wget-AT and match against those
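The check described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the project's actual code, which is not shown in this log. The rule table `SPAM_RULES`, the helper names and the example rule are hypothetical, and the patterns are treated as regular expressions. The example rule is loosely modeled on the spam URL shape seen in this log (`/<random>.php?<random>.xml`). Matching runs against the list of URLs Wget-AT already extracted, not the raw page body.

```python
import re
from typing import Iterable, List

# Hypothetical rule table: a pattern for the URL of the page being processed,
# mapped to patterns that must ALL match at least one URL discovered on it.
# The rule below is made up for illustration.
SPAM_RULES = {
    r"/[A-Za-z0-9]+\.php\?[A-Za-z0-9]+\.xml$": [
        r"/[A-Za-z0-9]+\.php\?[A-Za-z0-9]+\.xml$",  # links to more pages of the same shape
        r"^https?://([^/]+\.)?casino\.example/",   # plus a link to a (hypothetical) spam domain
    ],
}

COMPILED_RULES = [
    (re.compile(key), [re.compile(p) for p in patterns])
    for key, patterns in SPAM_RULES.items()
]


def is_spammy(page_url: str, discovered: Iterable[str]) -> bool:
    """True if some rule's key matches page_url and every one of its
    patterns matches at least one discovered URL."""
    outlinks = list(discovered)
    for key, patterns in COMPILED_RULES:
        if key.search(page_url) and all(
            any(p.search(u) for u in outlinks) for p in patterns
        ):
            return True
    return False


def urls_to_queue(page_url: str, discovered: Iterable[str]) -> List[str]:
    """Queue nothing discovered on pages that trip a spam rule."""
    outlinks = list(discovered)
    return [] if is_spammy(page_url, outlinks) else outlinks
```

Requiring every pattern of a rule to match keeps a single common outlink from flagging an innocent page. Because only the already-extracted outlink list is scanned, the cost grows with the number of links rather than the size of the HTML.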
-
imer
good stuff
-
datechnoman
Thats really smart!
-
datechnoman
I like it. Will be very effective
-
datechnoman
When will that roll out?
-
JAA
datechnoman: This has been in use for at least almost a year. Just needs another rule to handle this spam trap, I guess.
-
datechnoman
ohhh gotcha lol... silly me
-
datechnoman
Re-read the line and I misread it >.<
-
JAA
A bit over a year, in fact, first introduced on 2023-03-21 and then refactored into the current table structure a couple days later.
-
arkiver
rewby: can we decrease the megaWARC size for urls-tor to 1 GB? so they are uploaded more frequently
-
arkiver
i still need to add to the megaWARC factory that it pumps out a megaWARC at least once a day
-
arkiver
maybe that should be done actually instead of the 1 GB limit
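The two options, a smaller size limit and a guaranteed daily megaWARC, are not mutually exclusive. A packer can flush on whichever comes first. The sketch below is a hypothetical Python illustration of such a size-or-age trigger. `PendingMegaWarc`, `should_flush` and the default thresholds are assumptions, not the megaWARC factory's real interface.

```python
from dataclasses import dataclass

GB = 1_000_000_000     # "1 GB" from the chat, taken here as decimal bytes
DAY = 24 * 60 * 60     # seconds


@dataclass
class PendingMegaWarc:
    """Hypothetical state of a megaWARC that is still being filled."""
    started_at: float      # Unix time when the first WARC was added
    size_bytes: int = 0    # bytes packed so far


def should_flush(pack: PendingMegaWarc, now: float,
                 max_bytes: int = GB, max_age_s: float = DAY) -> bool:
    """Close and upload once the pack is big enough OR old enough."""
    if pack.size_bytes == 0:
        return False  # nothing to upload yet
    return pack.size_bytes >= max_bytes or now - pack.started_at >= max_age_s
```

With both triggers, a busy project still flushes on size, while a slow-filling one would get a megaWARC uploaded at least once a day.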
-
arkiver
well so nothing seems to be exploding after i removed those patterns from the filters list
-
arkiver
todo:backfeed stays pretty low
-
imer
arkiver: probably due to target issues? still seeing those two on the top at ~20%ish of requests
-
imer
seems like a waste of resources tbh