-
AKOh, that explains the CPU usage being so low, targets can't handle us
-
datechnomanrewby - with imgur slowing right down, can we give #// and telegram a target bump?
-
arkiverthat would be good yes if possible
-
datechnomanBeen chugging for the past few days :( but can understand as imgur is/was the priority
-
arkivernyuuzyou: I have only now queued that list of domains you gave a few weeks ago, sorry for the delay
-
arkiverdo you happen to have other lists already in the meantime?
-
arkiverit's queued under todo:secondary
-
arkiverwas only just below 600k items
-
arkiveractually we might want to get these up to a depth of like 2
-
rewbyI'll try to have a poke at it
-
datechnomanThanks rewby! :)
-
rewbydatechnoman: How's it looking now?
-
rewbyoptane9 is chugging at 3gbps constantly
-
rewbyI'm adding buneary to see if it can help
-
arkiverwe're going through the last bits of backlog here, and then incoming data should drop again
-
datechnomanHaving optane and buneary should be plenty. Thanks so much!
-
datechnomanNot seeing the spikes and drops anymore. Can throw my normal worker count back at it
-
arkiverwooh :)
-
datechnomanCan't wait to start queuing things again :D
-
datechnomanBeen waiting for the backlog of old urls to clear out before adding more new stuff
-
arkiverwe now archive robots.txt, sitemaps, favicon, etc.
-
arkiver(monthly)
-
arkiverbut there are also things like ads.txt and security.txt. should we start archiving these as well monthly? en.wikipedia.org/wiki/Ads.txt en.wikipedia.org/wiki/Security.txt
-
arkiverif anyone has ideas, please let me know!
-
JakeThere's also the new `.well-known/ai-plugin.json` for OpenAI plugins
-
arkiveryeah
-
arkiverwe might start archiving these for all domains
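
As a rough illustration of the per-domain files discussed above, here is a minimal sketch that builds the candidate URLs for one domain. The helper name `per_domain_urls` and the list `PER_DOMAIN_PATHS` are hypothetical, and `/sitemap.xml` and `/favicon.ico` are only the conventional default locations (assumed here). This is not the project's actual queuing code.

```python
from urllib.parse import urlunsplit

# Paths for the files mentioned in the discussion. /sitemap.xml and
# /favicon.ico are conventional defaults (an assumption); security.txt
# and ai-plugin.json live under /.well-known/.
PER_DOMAIN_PATHS = [
    "/robots.txt",
    "/sitemap.xml",
    "/favicon.ico",
    "/ads.txt",
    "/.well-known/security.txt",
    "/.well-known/ai-plugin.json",
]


def per_domain_urls(domain, scheme="https"):
    """Return the candidate per-domain URLs for one domain (hypothetical helper)."""
    # Normalise the host: trim whitespace, lowercase, drop a trailing dot.
    host = domain.strip().lower().rstrip(".")
    return [urlunsplit((scheme, host, path, "", "")) for path in PER_DOMAIN_PATHS]
```

Calling `per_domain_urls("example.com")` gives one URL per path, which could then be queued on whatever schedule is chosen (monthly, per the messages above).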
-
JAASounds great!
-
h2ibotdatechnoman: Skipped 5091 invalid URLs: transfer.archivete.am/VSKH7/sitemap…s_march_april_2023.txt.bad-urls.txt (for 'transfer.archivete.am/R9exx/sitemap_urls_march_april_2023.txt')
-
h2ibotdatechnoman: Deduplicating and queuing 244308 items. (for 'transfer.archivete.am/R9exx/sitemap_urls_march_april_2023.txt')
-
h2ibotdatechnoman: Deduplicated and queued 244308 items. (for 'transfer.archivete.am/R9exx/sitemap_urls_march_april_2023.txt')
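
The skip-invalid, deduplicate, then queue flow reported by the bot can be sketched roughly as below. This is only an assumption about the general technique, not h2ibot's real implementation; the function name `split_and_dedupe` and the validity rule (http/https scheme plus a hostname) are hypothetical.

```python
from urllib.parse import urlsplit


def split_and_dedupe(lines):
    """Split raw lines into (unique valid URLs, invalid entries).

    Validity rule is an assumption for illustration: the URL must parse,
    use http or https, and have a hostname.
    """
    valid, invalid, seen = [], [], set()
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # ignore blank lines entirely
        try:
            parts = urlsplit(url)
            ok = parts.scheme in ("http", "https") and bool(parts.hostname)
        except ValueError:
            ok = False  # e.g. malformed IPv6 literal
        if not ok:
            invalid.append(url)
        elif url not in seen:
            seen.add(url)  # keep first occurrence, preserve input order
            valid.append(url)
    return valid, invalid
```

In this sketch the invalid entries would be written out to something like the `.bad-urls.txt` file, and only the deduplicated valid list would be queued.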
-
datechnomanJuicy sitemaps pulled from various crawls :D