00:40:23 lol
08:29:54 Queuing bot shutting down.
08:30:06 Queuing bot started.
08:30:07 datechnoman: Restarting unfinished job isAJoDlg for '!a https://transfer.archivete.am/gsq2b/unique_pdfs_output.txt'.
08:30:08 datechnoman: Restarting unfinished job cmKPUONH for '!a https://transfer.archivete.am/HTbgw/filtered_.pdf_output.txt'.
08:30:09 datechnoman: Restarting unfinished job cYpn43AW for '!a https://transfer.archivete.am/nIY9O/filtered_pdf_files_unique.txt'.
08:30:10 datechnoman: Restarting unfinished job YExqnYZo for '!a https://transfer.archivete.am/y01za/filtered_pdf_files_unique.txt'.
08:30:18 datechnoman: Skipped 50 invalid URLs: https://transfer.archivete.am/JAMvl/filtered_pdf_files_unique.txt.bad-urls.txt (YExqnYZo)
08:30:19 datechnoman: Deduplicating and queuing 494563 items. (YExqnYZo)
08:33:16 datechnoman: Skipped 199 invalid URLs: https://transfer.archivete.am/nXRHj/filtered_pdf_files_unique.txt.bad-urls.txt (cYpn43AW)
08:33:22 datechnoman: Skipped 32 very long URLs: https://transfer.archivete.am/i2SGt/filtered_pdf_files_unique.txt.skipped.txt (cYpn43AW)
08:33:23 datechnoman: Deduplicating and queuing 5516002 items. (cYpn43AW)
08:39:49 datechnoman: Skipped 4203 invalid URLs: https://transfer.archivete.am/RfldX/filtered_.pdf_output.txt.bad-urls.txt (cmKPUONH)
08:39:50 datechnoman: Fixed 1 unprintable URLs: https://transfer.archivete.am/ENqon/filtered_.pdf_output.txt.not-printable.txt (cmKPUONH)
08:39:52 datechnoman: Skipped 1 very long URLs: https://transfer.archivete.am/V2og1/filtered_.pdf_output.txt.skipped.txt (cmKPUONH)
08:39:53 datechnoman: Deduplicating and queuing 9401326 items. (cmKPUONH)
08:41:18 datechnoman: Skipped 4203 invalid URLs: https://transfer.archivete.am/10oMpU/unique_pdfs_output.txt.bad-urls.txt (isAJoDlg)
08:41:19 datechnoman: Fixed 1 unprintable URLs: https://transfer.archivete.am/C3jhU/unique_pdfs_output.txt.not-printable.txt (isAJoDlg)
08:41:20 datechnoman: Skipped 1 very long URLs: https://transfer.archivete.am/MxkEz/unique_pdfs_output.txt.skipped.txt (isAJoDlg)
08:41:21 datechnoman: Deduplicating and queuing 9401326 items. (isAJoDlg)
09:08:30 JAA: yeah something like this one is fine to filter, it's likely a loop very specific for this site alone
09:30:44 filtering out and restarted!
11:34:48 arkiver: while we're talking about sources, did you ever put in those pubmed feeds? i know you were thinking about setting up a recrawl system for the delayed-open-access journals, but i see no reason that can't be added later
12:00:50 thuban: no, those are not in at the moment
12:00:56 we should first get through the current backlog
12:01:27 ah, k
12:42:15 Shouldnt take too long and we will be back to up-to-date :)
13:02:29 we've been on like 3-7day eta for the past month it feels like haha
13:06:05 arkiver: this one is still around https://transfer.archivete.am/Jgl23/kep.adatbazisokonline.hu.log although low-ish volume
13:06:06 inline (for browser viewing): https://transfer.archivete.am/inline/Jgl23/kep.adatbazisokonline.hu.log
13:06:48 still like 50/s extrapolated to the whole project
13:19:08 up to 180/s now
13:27:36 I added more capacity from my side to assist with the destruction of that queue
13:28:09 imer: it seems like legit images
13:28:17 i will try to see where they come from
13:28:26 downloading a log from AK to check
13:40:04 under 1 day to clear todo at current rate of speed
13:40:06 keep it up folks
13:45:47 never mind
14:09:43 its just endless redirects though, not seeing a single 200 unless im missing something
14:47:44 ui, my runner passed 1.04TB in data scraped
14:48:08 btw, I asked the hoster; and so far no legal trouble with that machine
14:48:10 so sad
18:18:13 xkey: ill be sure to queue some illegal sites just for you
19:54:38 please don't lol
19:54:47 I'm in Hetzners' good graces, I don't need a reason to not be
20:46:40 https://share.aktheknight.co.uk/riJe0/qAqOqIbi58.png/raw Question for arkiver, am I reading this right the theguardian.com is disabled because it's a dupe, but then we appear to be duplicated with a non root url of the domain? 🤔 Or is it duplicated somewhere else?
20:48:50 nyany: ok i won't just for you :3
20:49:58 AK: Deduplication is across all files. It appears uncommented in 900_einpresswire_com.txt.
20:50:13 🤦‍♂️ Ahh amazing, thank you
22:06:14 We must've hit those MP4's again
22:06:46 https://usercontent.irccloud-cdn.com/file/aNyt5F8s/image.png
22:09:38 nethogs mate :P
22:11:16 Barto: That's not even all my capacity lol
22:11:35 I have two servers that cannot work on URLs due to the previously discussed quad9 issue
22:12:49 yeah, ran #// at home, isp didnt like it
22:14:09 I could really piss my ISP off (I have a 3Gbps line) but I'd probably run into a threading issue before I saturate my line completely
22:14:21 and yes I've already tried
22:17:21 eh eh eh, i've almost fiber internet, except the damn last 10 meters are a pain in the butt because it's an old place and nobody knows what kind of zigzags was done for the current copper line.
22:17:53 and now way we're not going to punch holes in the neighbor's apartment to figure that out.
22:18:19 god bless last mile
22:18:34 FttN?
22:19:38 My ISP had no issues with the traffic, they just don't like the abuse reports :(
22:20:07 it's already in the street, it's already in the house phone line entrance box, you've got to dispatch to the 4 apartments
22:20:17 My ISP doesn't know what an abuse report is tbh
22:20:22 They're not the brightest bunch of bulbs
22:20:50 AK: exactly, laws here (ianal) kind of ask them to 'act' even if it's bogus.
22:21:24 so yeah, block traffic and put a captcha in the router, and i dont know how far it can go, dont wanna know
22:23:19 fireonlive: lovely :}
22:23:42 Partner and I both wfh, so any interruption to our internet would kinda be a problem unfortunately. Shame really as I was having fun downloading so much 😆
22:24:49 I put that to the test the other day actually
22:25:11 I spun up a few URLs workers on my best available laptop at home with an ethernet port while I was working
22:25:27 Spoiler: It did affect me slightly, but likely because I had too many threads
22:26:32 nyany: 3/3?
22:26:41 yea
22:26:43 :(
22:26:45 hate you
22:26:50 lol
22:26:55 as far as I know it's symmetrical
22:27:25 i got like
22:27:32 200/10? or something
22:28:50 ah the good old days
22:34:25 me waits for 8/8
22:34:26 docsis pls
22:34:39 when can i get docsis 4 pls
22:34:40 sir