-
fuzzy8021
datechnoman: throw some concurrency at telegram with those boxes too ;)
-
datechnoman
I actually should. Totally forgot to throw the container in the config... have urlteams in there. Used to have it in the config but updated and forgot to re-add. Thanks for the hint, fuzzy8021!
-
nyany
just for you fuzzy
-
nyany
@ fuzzy8021
-
fuzzy8021
thanks :)
-
datechnoman
Will get back on track once robloxd spins down
-
fuzzy8021
indeed, but it's been a ton of effort to get under 210 million to-do items... hurts to see it back at 217
-
datechnoman
Yeah you've been doing great work. Also doesn't help we are running full speed here and discovering new channels and posts at a higher rate
-
thuban
arkiver: here is a list of ~3k news sites from around the world, collected from bbc media guides:
transfer.archivete.am/HS1cJ/bbc_mediaguides.tsv
-
thuban
forgive me for not preparing this as a pr to urls-sources; i'm not entirely certain about parameters, whether intervals should vary by category, etc
-
thuban
also, it isn't on the list but i'd like to recommend aljazeera.net--we currently get only english-language al jazeera
-
thuban
^ oh, probably needs a bit of cleanup too--a handful of 'sites' are just facebook/twitter
-
thuban
augh, i'm sorry, that was the wrong file
-
TheTechRobo
I have 4 million URLs from #burnthetwitch. Not sure how many have already been processed. Not sure if I should submit them now given the size of the queue so here they are:
transfer.archivete.am/O83qW/output.zst
-
TheTechRobo
("have already been processed" = run through in previous lists)
-
arkiver
hi all. i know this is still going up in number of URLs, but i want to let it play out for a little longer and see where we get to
-
arkiver
i think this all is largely a very big "initial bump" with the recent changes that went in
-
arkiver
thuban: nice! i'll put it in as is, but might later filter it a little more (perhaps taking the Twitter stuff out)
-
arkiver
TheTechRobo: we'll queue them later unless they're urgent
-
arkiver
thuban: they're in the repo now! they'll soon be queued as well
-
TheTechRobo
arkiver: Sounds good
-
datechnoman
arkiver - looks like the growing queue is starting to curve down now. We are gaining :D
-
datechnoman
"Jun for fun", I've been spot-checking successfully archived URLs from sources such as Google Scholar, plus the outlinks we then process, PDF links onward, etc. Turns out a lot of the content has never been archived, so it's great that we are filling in a lot of gaps for research and science etc :)
-
datechnoman
Even some of the government PDFs etc have never been archived in WBM
-
datechnoman
So we are doing some great work here team
-
fireonlive
:D
-
fireonlive
awesome sauce
-
datechnoman
Quality data that should be on the WBM / archive.org you know :D
-
fireonlive
hmmm
-
datechnoman
fireonlive - there is someone at work that says "awesome sauce" lol. Didn't think anyone else said it xD
-
fireonlive
xD
-
fireonlive
inb4 it's actually me >_>
-
fireonlive
:P
-
datechnoman
Well you are Aussie
-
datechnoman
So you're much closer to me than everyone else
-
» kiska queues 100B urls for datechnoman
-
kiska
:p
-
datechnoman
kiska and here I was thinking you'd be spinning up all your Hetzner instances for fun
-
kiska
Nope!
-
kiska
I don't have money :D
-
fireonlive
datechnoman: am canadian :3
-
datechnoman
fireonlive - I swear you were Aussie lol. My bad! >.<
-
fireonlive
otherwise i'd be saying hello to lovely kiska at least once :p
-
fireonlive
haha all good
-
datechnoman
kiska - I wish I had money... I just spend it on AT lol
-
kiska
:D
-
fireonlive
i need more money too
-
fireonlive
uwu
-
fireonlive
idk where this came from obv
-
fireonlive
😳
-
datechnoman
Offt fair call. You win...
-
fireonlive
it's a sad contest haha
-
kiska
What do you mean "credit utilisation"
-
fireonlive
credit carried over month to month, not paid off
-
fireonlive
% of all credit used
-
kiska
Oh I see
-
fireonlive
credit bureaus apparently want that to be a max of 80% hm
-
thuban
arkiver: cool!
-
thuban
possibly silly question: where? i poked around the repo but the relevant commit only seems to have marked some items in other lists as duplicates. (forgotten `add`?)
-
thuban
also, i see that you have list-generation scripts in there as well--want the scraper i wrote?