-
nstrom|m
probably need arkiver for a proper fix, not sure if it could be blocked w/ regex filter w/o risk of collateral damage
-
datechnoman
Good pickup. I was wondering why the queue suddenly started blowing out
-
datechnoman
Maybe JAA has an idea if a temp fix can be put in place in the meantime
-
JAA
Not sure, it's quite messy.
-
JAA
There's at least three domain labels, and the rest of the URL is /books/[a-z]+/[0-9]+\.html$, but that doesn't seem particularly restrictive.
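A rough Python sketch of the filter JAA describes shows why it is risky. It assumes that "domain labels" means dot-separated hostname parts, and every hostname in it is a made-up example. Any legitimate site with a three-label host (such as www.example.org) and a /books/<category>/<id>.html layout would match as well, which is the collateral-damage concern.

```python
from urllib.parse import urlsplit
import re

# Path shape quoted in the chat: /books/<lowercase letters>/<digits>.html
BOOKS_PATH = re.compile(r"^/books/[a-z]+/[0-9]+\.html$")

def looks_like_books_loop(url: str) -> bool:
    """Return True if url has the loose /books/ loop shape.

    Assumption: "at least three domain labels" means the hostname
    splits into three or more non-empty dot-separated parts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    return len(labels) >= 3 and BOOKS_PATH.match(parts.path) is not None

# A spam-looking URL and an ordinary-looking one both match:
print(looks_like_books_loop("http://abc.xyz.example.com/books/novel/12345.html"))
print(looks_like_books_loop("https://www.example.org/books/history/42.html"))
```

Both calls print True, so a regex like this on its own cannot distinguish the loop from ordinary sites.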
-
datechnoman
All good. Wasn't sure if there was some kind of magic
-
h2ibot
datechnoman: Deduplicating and queuing 103977 items. (fq9zfPaK)
-
h2ibot
datechnoman: Deduplicated and queued 103977 items. (fq9zfPaK)
-
monoxane
putting 50x20 on urls to help unfuck the mess I've made :D
-
datechnoman
monoxane <3
-
datechnoman
Thanks! :D
-
datechnoman
JAA arkiver - Could I please hassle either of you to filter out (somehow) the annoying loop for baongoc.vn?
transfer.archivete.am/B37cB/baongoc.vn_loop_urls.txt
-
datechnoman
This has been going on for weeks and isn't huge, but it's constant and I'm seeing 2000+ hits across workers every 5 mins, so it's worth cleaning up if simple enough
-
datechnoman
No rush but greatly appreciate any tweak :)
-
nstrom|m
yeah those are weird, I don't understand how they keep returning 200s even with all that junk added after the filename
-
nstrom|m
looks like it's actually dynamically generating pdf content each time based on the url
-
datechnoman
Exactly, and that is why we keep archiving the same PDF as the url is dynamically being generated
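If the junk is appended as extra path segments after the filename (an assumption; the actual baongoc.vn URL shapes are only in the linked list), a crawler-side heuristic could flag such URLs and collapse them back to the filename, so the same PDF is not fetched over and over. A minimal Python sketch; FILE_EXTENSIONS, has_junk_after_filename and strip_after_filename are hypothetical names, not part of any existing tooling:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of extensions treated as "this segment is a file"
FILE_EXTENSIONS = (".pdf", ".html", ".htm", ".php", ".aspx")

def has_junk_after_filename(url: str) -> bool:
    """True if a filename-like path segment is followed by more segments."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Only segments that are not last can have something appended after them.
    return any(s.lower().endswith(FILE_EXTENSIONS) for s in segments[:-1])

def strip_after_filename(url: str) -> str:
    """Cut the path right after the first filename-like segment.

    Note: this also drops any query string and fragment.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        if segment.lower().endswith(FILE_EXTENSIONS):
            path = "/".join(segments[: i + 1])
            return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return url

print(strip_after_filename("https://www.example.com/docs/report.pdf/a/b/c"))
```

The usage line prints https://www.example.com/docs/report.pdf. Collapsing to a canonical URL before deduplication is one way to stop each new junk suffix from counting as a new item, though it would misfire on sites that legitimately serve paths below a filename.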
-
datechnoman
We have been smothering their website for weeks and I'm surprised they haven't started blocking our IPs lol
-
arkiver
datechnoman: yes i will check it in an hour
-
datechnoman
Thanks mate! No rush :)
-
datechnoman
I've cranked it right up, so I've been combing over the logs to look at loops and stuff
-
datechnoman
I'm about to head off to bed, so I'll catch you around!
-
datechnoman
100k urls per min my Grafana reckons. Not bad at all :D
-
monoxane
not bad good size
-
monoxane
I cannot beat that :P
-
datechnoman
You're doing just fine!
-
datechnoman
I have 3000 concurrent connects going lol
-
datechnoman
Connections/concurrency
-
datechnoman
Can finally destroy the backlog now that mildom is slowing down and not hogging IA ingest
-
knecht
Very nice to see todo going down slowly
-
arkiver
nstrom|m: hmm yeah that is an annoying one
-
arkiver
JAA: you did nothing around the /books/ loop right?
-
arkiver
95% of the time the annoying ones are some Chinese site
-
knecht
maybe you have some time to peek into the PRs while you're in the code 👀
-
arkiver
knecht: yeah that too
-
arkiver
knecht: i do see cloudflare on xrel.to, which is a problem
-
knecht
a problem with the frequency? i suppose it could be turned down a good notch
-
arkiver
knecht: merged and left a comment
-
arkiver
no problem with frequency
-
knecht
great! thank you very much
-
arkiver
:)
-
arkiver
those /books/ URLs just give me a 404 now :/
-
arkiver
JAA: nstrom|m ^ FYI
-
nstrom|m
most seem to be 404, a few are still working
-
nstrom|m
just saw that one fly by
-
arkiver
yay also not 404 here
-
arkiver
thanks!
-
arkiver
ah i see /template/default/moban in there
-
nstrom|m
yep definitely looked like a familiar MO
-
JAA
arkiver: Yeah, I didn't touch them.
-
arkiver
very helpful thanks nstrom|m
-
arkiver
nstrom|m: do you perhaps have one more for me?
-
arkiver
testing a solution
-
nstrom|m
checking
-
arkiver
datechnoman: the baongoc.vn loop is gone
-
arkiver
thanks nstrom|m
-
nstrom|m
thank you!
-
arkiver
AK: is it correct you have nothing running anymore for URLs on hel1?
-
arkiver
nstrom|m: do you perhaps have a longer log as well?
-
arkiver
they are ignored now, but i didn't find out yet where the URLs come from
-
nstrom|m
sent via pm
-
nstrom|m
I have pretty short log retention though so might not be that useful
-
arkiver
yeah let's see
-
arkiver
couldn't find the source well
-
arkiver
haha they blocked my IP
-
arkiver
a fix is in
-
arkiver
i'm not sure this fixes the source...
-
arkiver
we'll see
-
arkiver
i'll move URLs to secondary tomorrow
-
arkiver
i'm off now
-
arkiver
loops will stand out when moving items to secondary, as they are queued and enlarged in :todo:backfeed, so i want to be around when that is being done
-
datechnoman
monoxane looks like code update didn't roll out to your workers FYI