-
VokunHere's all the cdn urls from the last dump I did for Discord. Could someone put this in? transfer.archivete.am/72nEc/DISCORD-2024-1-16.txt
-
eggdropinline (for browser viewing): transfer.archivete.am/inline/72nEc/DISCORD-2024-1-16.txt
-
VokunAlso, can i request voice for here?
-
pabs
-
h2ibotpabs: Invalid privileges, need one of ('@', '+').
-
pabsarkiver: ^
-
fireonliveVokun: miiiight be better for #archivebot if we 100% want to ensure success
-
fireonliveVokun: threw in #archivebot
-
fireonliveseeing some 404s, but mostly 200s
-
fireonlive
-
fireonlive
-
h2ibotfireonlive: Registering qYRE07vJ for '!a transfer.archivete.am/mN4p2/openstreetmap-planet-website-tag-urls.txt'
-
fireonlivepabs: queued
-
pabsthanks
-
fireonlive:)
-
h2ibotfireonlive: Skipped 197 invalid URLs: transfer.archivete.am/XTo2n/openstr…t-website-tag-urls.txt.bad-urls.txt (qYRE07vJ)
-
h2ibotfireonlive: Fixed 44 unprintable URLs: transfer.archivete.am/14hqve/openst…site-tag-urls.txt.not-printable.txt (qYRE07vJ)
-
h2ibotfireonlive: Deduplicating and queuing 2401732 items. (qYRE07vJ)
-
h2ibotfireonlive: Deduplicated and queued 2401732 items. (qYRE07vJ)
-
arkiverthanks a lot pabs
-
arkiveralso you should get voice here
-
arkiverlet me see
-
arkiverhow do i make that persistent again
-
pabsthis is my first submission here, so maybe that isn't a good idea yet
-
arkiveri trust you
-
pabsnot sure I do :)
-
arkiveryou considered possible DDoS - which is basically the most important thing to do
-
pabsok. was still unsure even after throwing it in
-
arkivernext to that is the source of URLs - the URLs still end up permanently in a bloom filter, so the URLs need to be sourced from some interesting source - not just 10 billion random URLs collected 15 years ago of which 90% doesn't exist
-
arkiveror well, not definitely saying no to that, but you get the point
-
arkiverit's something to take into consideration
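
To illustrate arkiver's point about URLs ending up permanently in a bloom filter, here is a minimal sketch of how such a filter might deduplicate a queue. It is not the project's actual implementation: the class `BloomFilter`, the helper `queue_new_urls`, and the size and hash-count parameters are all hypothetical, chosen only for illustration. The point it shows is that a plain Bloom filter has no delete operation, so every submitted URL occupies bits forever, whether or not the URL still exists.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: items can be added and tested, never removed."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        # size_bits and num_hashes are illustrative assumptions, not real settings.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably seen" (false positives are possible);
        # False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def queue_new_urls(urls, seen: BloomFilter):
    """Return the URLs not already (probably) seen, marking each as seen."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Under these assumptions, submitting the same URL twice queues it only once, and a list full of long-dead URLs still consumes filter capacity, which is why the source of a URL list matters.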
-
arkiverthere we go i think
-
arkiverpabs: this is giving very nice data :)
-
pabsthanks
-
pabsTBH I'm not sure OSM website data is great, I expect there are a lot of dead websites, and bogus websites
-
pabsthe next yak to shave is fixing them in the OSM database :)
-
datechnoman
-
h2ibotdatechnoman: Registering Q6gvdzfF for '!a transfer.archivete.am/2d3BV/filtered_cdn.discordapp.com.txt'
-
h2ibotdatechnoman: Skipped 6 invalid URLs: transfer.archivete.am/15ZEgn/filter…cdn.discordapp.com.txt.bad-urls.txt (Q6gvdzfF)
-
h2ibotdatechnoman: Fixed 5 unprintable URLs: transfer.archivete.am/RdX41/filtere…iscordapp.com.txt.not-printable.txt (Q6gvdzfF)
-
h2ibotdatechnoman: Deduplicating and queuing 1562268 items. (Q6gvdzfF)
-
h2ibotdatechnoman: Deduplicated and queued 1562268 items. (Q6gvdzfF)