03:02:24 Here's all the cdn urls from the last dump I did for discard. Could someone put this in? https://transfer.archivete.am/72nEc/DISCORD-2024-1-16.txt
03:02:24 inline (for browser viewing): https://transfer.archivete.am/inline/72nEc/DISCORD-2024-1-16.txt
03:02:50 Also, can i request voice for here?
03:04:09 !a https://transfer.archivete.am/mN4p2/openstreetmap-planet-website-tag-urls.txt
03:04:10 pabs: Invalid privileges, need one of ('@', '+').
03:04:14 arkiver: ^
03:12:15 Vokun: miiiight be better for #archivebot if we 100% want to ensure success
03:14:13 Vokun: threw in #archivebot
03:16:36 seeing some 404s, but mostly 200s
03:16:45 i.e. https://cdn.discordapp.com/attachments/1176761037681328128/1195295488128327680/image.png
03:17:14 !a https://transfer.archivete.am/mN4p2/openstreetmap-planet-website-tag-urls.txt
03:17:15 fireonlive: Registering qYRE07vJ for '!a https://transfer.archivete.am/mN4p2/openstreetmap-planet-website-tag-urls.txt'
03:17:19 pabs: queued
03:17:33 thanks
03:17:45 :)
03:18:12 fireonlive: Skipped 197 invalid URLs: https://transfer.archivete.am/XTo2n/openstreetmap-planet-website-tag-urls.txt.bad-urls.txt (qYRE07vJ)
03:18:13 fireonlive: Fixed 44 unprintable URLs: https://transfer.archivete.am/14hqve/openstreetmap-planet-website-tag-urls.txt.not-printable.txt (qYRE07vJ)
03:18:14 fireonlive: Deduplicating and queuing 2401732 items. (qYRE07vJ)
03:19:57 fireonlive: Deduplicated and queued 2401732 items. (qYRE07vJ)
07:49:55 thanks a lot pabs
07:49:58 also you should get voice here
07:49:59 let me see
07:50:11 how do i make that persistent again
07:50:19 this is my first submission here, so maybe that isn't a good idea yet
07:50:25 i trust you
07:50:38 not sure I do :)
07:50:54 you considered possible DDoS - which is basically the most important thing to do
07:51:31 ok. was still unsure even after throwing it in
07:51:44 next to that is the source of URLs - the URLs still end up permanently in a bloom filter, so the URLs need to be sourced from some interesting source - not just 10 billion random URLs collected 15 years ago of which 90% doesn't exist
07:52:13 or well, not definitely saying no to that, but you get the point
07:52:22 it's something to take into consideration
07:53:14 there we go i think
07:55:53 pabs: this is giving very nice data :)
08:52:18 thanks
08:53:33 TBH I'm not sure OSM website data is great, I expect there are a lot of dead websites, and bogus websites
08:53:50 the next yak to shave is fixing them in the OSM database :)
10:47:06 !a https://transfer.archivete.am/2d3BV/filtered_cdn.discordapp.com.txt
10:47:07 datechnoman: Registering Q6gvdzfF for '!a https://transfer.archivete.am/2d3BV/filtered_cdn.discordapp.com.txt'
10:54:45 datechnoman: Skipped 6 invalid URLs: https://transfer.archivete.am/15ZEgn/filtered_cdn.discordapp.com.txt.bad-urls.txt (Q6gvdzfF)
10:54:47 datechnoman: Fixed 5 unprintable URLs: https://transfer.archivete.am/RdX41/filtered_cdn.discordapp.com.txt.not-printable.txt (Q6gvdzfF)
10:54:48 datechnoman: Deduplicating and queuing 1562268 items. (Q6gvdzfF)
10:58:59 datechnoman: Deduplicated and queued 1562268 items. (Q6gvdzfF)
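
Note on the 07:51:44 message: the point that queued URLs "end up permanently in a bloom filter" can be illustrated with a generic Bloom filter. The log does not show how the project's filter is actually implemented, so the following is only a minimal sketch under assumptions: the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the sizes used are all hypothetical. What it shows is that a plain Bloom filter has no remove operation, so every URL added (dead or not) keeps occupying bits and raises the false-positive rate for later lookups.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hypothetical, minimal Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions from one SHA-256 digest via double hashing
        # (an assumption for this sketch, not the project's scheme).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Bits are only ever set; there is deliberately no remove().
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def fill_ratio(self) -> float:
        # Fraction of bits set; a higher value means more false positives.
        set_bits = sum(bin(byte).count("1") for byte in self.bits)
        return set_bits / self.num_bits
```

Usage would look like `seen = BloomFilter(num_bits=1024, num_hashes=3)`, then `seen.add(url)` and `url in seen`. Because `fill_ratio()` can only grow, a list full of long-dead URLs uses up filter capacity for good, which is the concern raised about the source of the URLs.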