<h2ibot> datechnoman: Registering mK31PjiL for '!a transfer.archivete.am/EuSbm/urls_batch_0005_000.txt'
<h2ibot> datechnoman: Skipped 3174 invalid URLs: transfer.archivete.am/grlnj/urls_batch_0005_000.txt.bad-urls.txt (mK31PjiL)
<h2ibot> datechnoman: Skipped 60 very long URLs: transfer.archivete.am/YxX11/urls_batch_0005_000.txt.skipped.txt (mK31PjiL)
<h2ibot> datechnoman: Deduplicating and queuing 24996765 items. (mK31PjiL)
<h2ibot> datechnoman: Deduplicated and queued 24996767 items. (mK31PjiL)
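The h2ibot lines above describe a batch intake flow: drop invalid URLs, drop very long URLs, then deduplicate and queue the rest. A minimal sketch of that flow, assuming a loose http(s) validity check and a 2048-character length cutoff (the bot's actual rules and limits are not stated; all names here are hypothetical):

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048  # assumed cutoff; the real bot's limit is not stated


def is_valid_url(url: str) -> bool:
    """Loose validity check: http(s) scheme and a host present."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def partition_batch(urls):
    """Split a raw batch into (queueable, invalid, too_long) lists,
    deduplicating the queueable URLs in input order."""
    queueable, invalid, too_long = [], [], []
    seen = set()
    for url in urls:
        url = url.strip()
        if not is_valid_url(url):
            invalid.append(url)
        elif len(url) > MAX_URL_LENGTH:
            too_long.append(url)
        elif url not in seen:
            seen.add(url)
            queueable.append(url)
    return queueable, invalid, too_long


batch = [
    "https://example.edu/a",
    "https://example.edu/a",          # duplicate, deduplicated away
    "not a url",                      # invalid
    "https://example.edu/" + "x" * 3000,  # over the length cutoff
]
queue, bad, skipped = partition_batch(batch)
print(len(queue), len(bad), len(skipped))  # 1 1 1
```

In the transcript the invalid and too-long URLs are written out to `.bad-urls.txt` and `.skipped.txt` files rather than discarded, so nothing is silently lost.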
<h2ibot> datechnoman: Registering kBQDlKvp for '!a transfer.archivete.am/cGSwj/urls_batch_0005_001.txt'
<kiska> datechnoman: Name your batches "makenyanyunhappy" :D
<datechnoman> imer_ does that for the PDFs. I'm running URL batches and nyany_ isn't running any workers lol
<h2ibot> datechnoman: Skipped 3559 invalid URLs: transfer.archivete.am/10lQ4f/urls_batch_0005_001.txt.bad-urls.txt (kBQDlKvp)
<h2ibot> datechnoman: Skipped 78 very long URLs: transfer.archivete.am/13dFTa/urls_batch_0005_001.txt.skipped.txt (kBQDlKvp)
<h2ibot> datechnoman: Deduplicating and queuing 24996363 items. (kBQDlKvp)
<h2ibot> datechnoman: Registering OXFwyI8w for '!a transfer.archivete.am/fHg6P/urls_batch_0005_002.txt'
<h2ibot> datechnoman: Skipped 587 invalid URLs: transfer.archivete.am/3pvJl/urls_batch_0005_002.txt.bad-urls.txt (OXFwyI8w)
<h2ibot> datechnoman: Skipped 9 very long URLs: transfer.archivete.am/G1SLg/urls_batch_0005_002.txt.skipped.txt (OXFwyI8w)
<h2ibot> datechnoman: Deduplicating and queuing 4667052 items. (OXFwyI8w)
<h2ibot> datechnoman: Deduplicated and queued 4667054 items. (OXFwyI8w)
<h2ibot> datechnoman: Deduplicated and queued 24996368 items. (kBQDlKvp)
<kiska> Something occurred to make the graph less spiky: server8.kiska.pw/uploads/d4ea36fa7ace3dd5/image.png
<nstrom|m> yeah my workers that were unhappy got happy. I think it was something networking related tbh
<fireonlive> sudo rebootall
<datechnoman> rewby|backup_ cleared up temporary upload files that were filling up space on optane10. We then got very spiky due to larger PDF files, which I have "watered down" with some edu domain URLs
<datechnoman> This has gotten us back to a nice balance between speed and item size, and back to normal
<datechnoman> Now it's just a matter of pushing through all of the sitemaps for the next week or so, then we can get back down to business smashing out educational sites and PDF docs
<datechnoman> Also will have to start on gov domains later on, once all the edu domain batches are done (still got a stack to go through)