03:11:19 arkiver JAA - Sorry for pinging you both again but these 2 websites have been killing us for the past 2 days and slowing things right down. I was averaging 105k urls per minute before we were hitting them and im down to 40k per min. We are hitting them multiple thousands of times per min which wouldnt be good for their websites either. I've added
03:11:19 some of the log outputs for the paste minute from my workers. Can provide more if required. https://transfer.archivete.am/T20Qb/baongoc_vn_http_0.txt
03:11:19 https://transfer.archivete.am/EReco/baongoc_vn_http_200.txt
03:11:19 https://transfer.archivete.am/zClUJ/staging_emediava_org_logs_http_0.txt
03:11:19 https://transfer.archivete.am/FBuj1/staging_emediava_org_logs_http_200.txt
03:11:20 inline (for browser viewing): https://transfer.archivete.am/inline/T20Qb/baongoc_vn_http_0.txt
03:12:05 We have gone backwards on progress by 20million over the last 24-48 hours :(
03:13:31 Most of my workers are only at 20%-30% consumption as they are querying https://staging.emediava.org URL's and its taking multiple seconds for a reply
03:15:00 * JAA has yoten ^https?://staging%.emediava%.org/.*%%23filter%d+$
03:16:41 And ^https?://baongoc%.vn:443/news/pdf/nhung%-chinh%-sach%-moi%-co%-hieu%-luc%-tu%-thang%-10%-2023%-3406%.pdfth%%C6%%B0%%E1%%BB%%9FngbXebXebbXebXebanbba
03:17:34 That seems to cover what I'm seeing in your logs and in the queue.
03:17:36 datechnoman: ^
03:18:01 Filter rate 42%, oof
03:18:16 That's mostly from the staging.emediava.org one, I think.
03:41:24 JAA - Thank you sooooooooo much
03:43:07 Im back up to 90k per min already
03:43:21 staging.emediava.org was killing us....
03:43:53 Instantly back up to 1gbit to the target again <3
03:44:05 :-)
03:45:24 I have no idea what that thing was supposed to be. It's coming from an SVG CSS `filter: url(%23filter962981974);` with a random number, it seems.
03:45:39 There's a corresponding , too.
03:45:51 Why they'd use a random number for that beats me.
03:51:23 I honestly cant make sense of it either. With that being said your much smarter than me :P
03:51:44 I was struggling to get the HTTP=200 half the time
06:51:34 thanks for taking that one out JAA
07:27:07 Everything is back to chugging along for the last few hours. Nearly caught up to where we were a few days ago :)
07:27:18 Need to be as efficient as we can to smash this backlog down
10:02:23 !a https://transfer.archivete.am/aZtIn/pubmed_doi_identifiers.txt
10:02:24 datechnoman: Registering N1lsbPbS for '!a https://transfer.archivete.am/aZtIn/pubmed_doi_identifiers.txt'
10:02:30 datechnoman: Deduplicating and queuing 53490 items. (N1lsbPbS)
10:02:35 datechnoman: Deduplicated and queued 53490 items. (N1lsbPbS)
20:02:20 !a https://transfer.archivete.am/iPJBu/www.anandtech.com-inf-20240901-213047-bvqa8-offsite.txt
20:02:20 JAA: Registering 7mK34ZVK for '!a https://transfer.archivete.am/iPJBu/www.anandtech.com-inf-20240901-213047-bvqa8-offsite.txt'
20:02:30 JAA: Skipped 54 invalid URLs: https://transfer.archivete.am/pOQVJ/www.anandtech.com-inf-20240901-213047-bvqa8-offsite.txt.bad-urls.txt (7mK34ZVK)
20:02:31 JAA: Skipped 1 very long URLs: https://transfer.archivete.am/dWkpO/www.anandtech.com-inf-20240901-213047-bvqa8-offsite.txt.skipped.txt (7mK34ZVK)
20:02:32 JAA: Deduplicating and queuing 68931 items. (7mK34ZVK)
20:02:35 JAA: Deduplicated and queued 68931 items. (7mK34ZVK)
20:03:21 JAA: :-)
20:03:52 :-)
20:05:41 :-)
20:05:50 Nice noses, everyone!
20:05:59 :-)
21:06:01 You know around here we don't share
21:06:08 We party hard
21:12:17 (-: