-
datechnomanarkiver JAA - Sorry for pinging you both again, but these 2 websites have been killing us for the past 2 days and slowing things right down. I was averaging 105k URLs per minute before we were hitting them and I'm down to 40k per min. We are hitting them multiple thousands of times per min, which wouldn't be good for their websites either. I've added
-
datechnomansome of the log outputs for the past minute from my workers. Can provide more if required. transfer.archivete.am/T20Qb/baongoc_vn_http_0.txt
-
eggdropinline (for browser viewing): transfer.archivete.am/inline/T20Qb/baongoc_vn_http_0.txt
-
datechnomanWe have gone backwards on progress by 20 million over the last 24-48 hours :(
-
datechnomanMost of my workers are only at 20%-30% consumption as they are querying staging.emediava.org URLs and it's taking multiple seconds for a reply
-
» JAA has yoten ^https?://staging%.emediava%.org/.*%%23filter%d+$
-
JAAAnd ^https?://baongoc%.vn:443/news/pdf/nhung%-chinh%-sach%-moi%-co%-hieu%-luc%-tu%-thang%-10%-2023%-3406%.pdfth%%C6%%B0%%E1%%BB%%9FngbXebXebbXebXebanbba
-
JAAThat seems to cover what I'm seeing in your logs and in the queue.
-
JAAdatechnoman: ^
-
JAAFilter rate 42%, oof
-
JAAThat's mostly from the staging.emediava.org one, I think.
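[Note] The filter patterns above appear to use Lua-style pattern syntax, where `%` is the escape character (`%.` is a literal dot, `%d` a digit, `%%` a literal percent sign). As a rough sketch of what the first pattern matches, here is a Python regex translation. The function name `is_filtered` and the sample URLs are hypothetical, and this is not the tracker's actual filtering code.

```python
import re

# Hypothetical Python translation of the Lua-style pattern
#   ^https?://staging%.emediava%.org/.*%%23filter%d+$
# "%%23" in the Lua-style pattern is the literal text "%23" (a percent-encoded "#").
STAGING_FILTER = re.compile(r"^https?://staging\.emediava\.org/.*%23filter\d+$")


def is_filtered(url: str) -> bool:
    """Return True if the URL would be dropped by the translated pattern."""
    return STAGING_FILTER.search(url) is not None


if __name__ == "__main__":
    # Made-up example URLs, for illustration only.
    samples = [
        "https://staging.emediava.org/some/page/%23filter962981974",
        "https://staging.emediava.org/some/page",
        "https://example.org/some/page/%23filter962981974",
    ]
    for url in samples:
        print(is_filtered(url), url)
```

Only URLs on staging.emediava.org that end in `%23filter` followed by digits are matched; ordinary pages on the same host pass through.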
-
datechnomanJAA - Thank you sooooooooo much
-
datechnomanI'm back up to 90k per min already
-
datechnomanstaging.emediava.org was killing us....
-
datechnomanInstantly back up to 1gbit to the target again <3
-
JAA:-)
-
JAAI have no idea what that thing was supposed to be. It's coming from an SVG CSS `filter: url(%23filter962981974);` with a random number, it seems.
-
JAAThere's a corresponding <filter>, too.
-
JAAWhy they'd use a random number for that beats me.
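[Note] One plausible reading of why this floods the queue (an assumption, not something confirmed in the channel): because the `#` is percent-encoded as `%23`, a link extractor would treat `%23filter962981974` as a relative path rather than a fragment, so every random number yields a brand-new URL. A minimal sketch with Python's standard `urllib.parse`, using a made-up base page:

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical page on which the CSS `filter: url(...)` reference was found.
BASE = "https://staging.emediava.org/some/page"

# With a percent-encoded "#", the reference resolves as a relative path,
# which gives a distinct URL for every random number.
encoded = urljoin(BASE, "%23filter962981974")

# With a real "#", the reference is only a fragment of the same page,
# and stripping the fragment leaves the page URL itself.
plain = urljoin(BASE, "#filter962981974")
plain_without_fragment = urldefrag(plain).url

if __name__ == "__main__":
    print(encoded)
    print(plain)
    print(plain_without_fragment)
```

Under that reading, each fetched page would hand the crawler a fresh, never-seen URL on the same slow host, which fits the pattern-based filter being the fix.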
-
datechnomanI honestly can't make sense of it either. With that being said, you're much smarter than me :P
-
datechnomanI was struggling to get the HTTP=200 half the time
-
arkiverthanks for taking that one out JAA
-
datechnomanEverything is back to chugging along for the last few hours. Nearly caught up to where we were a few days ago :)
-
datechnomanNeed to be as efficient as we can to smash this backlog down
-
datechnoman
-
h2ibotdatechnoman: Registering N1lsbPbS for '!a transfer.archivete.am/aZtIn/pubmed_doi_identifiers.txt'
-
h2ibotdatechnoman: Deduplicating and queuing 53490 items. (N1lsbPbS)
-
h2ibotdatechnoman: Deduplicated and queued 53490 items. (N1lsbPbS)
-
JAA
-
h2ibotJAA: Registering 7mK34ZVK for '!a transfer.archivete.am/iPJBu/www.ana…f-20240901-213047-bvqa8-offsite.txt'
-
h2ibotJAA: Skipped 54 invalid URLs: transfer.archivete.am/pOQVJ/www.ana…3047-bvqa8-offsite.txt.bad-urls.txt (7mK34ZVK)
-
h2ibotJAA: Skipped 1 very long URLs: transfer.archivete.am/dWkpO/www.ana…13047-bvqa8-offsite.txt.skipped.txt (7mK34ZVK)
-
h2ibotJAA: Deduplicating and queuing 68931 items. (7mK34ZVK)
-
h2ibotJAA: Deduplicated and queued 68931 items. (7mK34ZVK)
-
BartoJAA: :-)
-
JAA:-)
-
katia:-)
-
katiaNice noses, everyone!
-
JAA:-)
-
datechnomanYou know around here we don't share
-
datechnomanWe party hard
-
Barto(-: