04:23:10 !a https://transfer.archivete.am/EuSbm/urls_batch_0005_000.txt
04:23:10 datechnoman: Registering mK31PjiL for '!a https://transfer.archivete.am/EuSbm/urls_batch_0005_000.txt'
05:05:46 datechnoman: Skipped 3174 invalid URLs: https://transfer.archivete.am/grlnj/urls_batch_0005_000.txt.bad-urls.txt (mK31PjiL)
05:05:47 datechnoman: Skipped 60 very long URLs: https://transfer.archivete.am/YxX11/urls_batch_0005_000.txt.skipped.txt (mK31PjiL)
05:05:48 datechnoman: Deduplicating and queuing 24996765 items. (mK31PjiL)
05:33:23 datechnoman: Deduplicated and queued 24996767 items. (mK31PjiL)
08:26:28 !a https://transfer.archivete.am/cGSwj/urls_batch_0005_001.txt
08:26:30 datechnoman: Registering kBQDlKvp for '!a https://transfer.archivete.am/cGSwj/urls_batch_0005_001.txt'
08:28:27 datechnoman: Name your batches "makenyanyunhappy" :D
08:57:00 imer_ does that for the pdf's. I'm running url batches and nyany_ isn't running any workers lol
09:25:39 datechnoman: Skipped 3559 invalid URLs: https://transfer.archivete.am/10lQ4f/urls_batch_0005_001.txt.bad-urls.txt (kBQDlKvp)
09:25:42 datechnoman: Skipped 78 very long URLs: https://transfer.archivete.am/13dFTa/urls_batch_0005_001.txt.skipped.txt (kBQDlKvp)
09:25:44 datechnoman: Deduplicating and queuing 24996363 items. (kBQDlKvp)
09:27:44 !a https://transfer.archivete.am/fHg6P/urls_batch_0005_002.txt
09:27:49 datechnoman: Registering OXFwyI8w for '!a https://transfer.archivete.am/fHg6P/urls_batch_0005_002.txt'
09:40:21 datechnoman: Skipped 587 invalid URLs: https://transfer.archivete.am/3pvJl/urls_batch_0005_002.txt.bad-urls.txt (OXFwyI8w)
09:40:22 datechnoman: Skipped 9 very long URLs: https://transfer.archivete.am/G1SLg/urls_batch_0005_002.txt.skipped.txt (OXFwyI8w)
09:40:24 datechnoman: Deduplicating and queuing 4667052 items. (OXFwyI8w)
09:45:57 datechnoman: Deduplicated and queued 4667054 items. (OXFwyI8w)
10:01:15 datechnoman: Deduplicated and queued 24996368 items. (kBQDlKvp)
16:35:07 Something occurred to make the graph less spiky https://server8.kiska.pw/uploads/d4ea36fa7ace3dd5/image.png
20:54:39 yeah my workers that were unhappy got happy. I think it was something networking related tbh
21:05:30 sudo rebootall
22:20:29 rewby|backup_ cleared up temporary upload files that were filling up space on optane10, We then were very spikey due to larger PDF files which I have "watered down" with some edu domain urls
22:20:43 This has gotten us back to a nice balance between speed and size of items and back to normal
22:32:56 Now its just a matter of pushing through all of the sitemaps for the next week or so, then we can get back down to business smashing out educational sites and PDF doc's
22:33:19 Also will have to start with gov domains later on once all the edu domains batches are done (still got a stack to go through)