-
fireonlive
lol wow
-
fireonlive
'how about you try opening the fuckin' thing?'
-
h2ibot
Nicolas17v2 edited Taringa! (+112, Add links to the DPoS project):
wiki.archiveteam.org/?diff=51902&oldid=51871
-
nicolas17
imer: did you independently confirm the zip is broken?
-
nulldata
Hmm svgs uploads aren't allowed on the wiki
-
nulldata
svg*
-
h2ibot
Nulldata edited Taringa! (+64, Added logo and screenshot):
wiki.archiveteam.org/?diff=51905&oldid=51902
-
fireonlive
yeah sadly not :(
-
fireonlive
i think JAA checked and it's not something they can change w/o editing the .php config
-
fireonlive
for mediawiki that is
-
HP_Archivist
I see that vetusware.com was grabbed in 2022. Is it worth doing a recursive crawl?
-
HP_Archivist
And another, not sure if you've seen these previously
wiki.videolan.org
-
imer
nicolas17: yes, sent them the error message this time around
-
Hallfiry
I just remembered Tennessee Bill's Old Time Radio (a website I used to visit around 2010) and was wondering if you guys know it and maybe have gotten in touch with the author to back up his stuff. The site is long gone, but maybe he's still around and has the data. It had hundreds of gigabytes of radio recordings, wartime posters etc.
-
pabs
2010 era is a long time ago...
-
nicolas17
5 more files up in samsung-grab; I tried downloading one of them myself and it failed after 2 hours :/
-
nicolas17
(sheesh that's a lot of thelounge users)
-
k
lol
-
fireonlive
there's also like one request from hackint's matrix bridge too
-
fireonlive
<_<
-
myself
I don't _think_ my thelounge instance does that; I'm not seeing previews.
-
myself
But, this time the search result returned _two_ files: one 12-byte .txt, and one 1.5GB .zip, I presume we're to ignore the .txt?
-
TheTechRobo
I thought TL pings the servers even if link previews are disabled
-
TheTechRobo
Could be wrong
-
nicolas17
myself: yes
-
eightthree
is python (and maybe lua) fast enough for fetching? Would some projects download faster if a rust -grab or -items or -discovery tool were made? Or are most projects rate-limited anyway, so extra speed wouldn't be of much benefit?
-
nicolas17
eightthree: I have never seen CPU be the bottleneck
-
nicolas17
your bandwidth, website's bandwidth, website's limit of requests per IP, target capacity (when IA can't ingest the data fast enough), etc etc before you hit CPU
-
nicolas17
myself: I even had to put that item in the queue manually, because my enqueue.py script skips both "item has >1 files" and "item is not in Mobile Phone category" :P
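[editor's sketch] The two skip rules nicolas17 describes could look roughly like the following. This is not the actual enqueue.py, which is not shown in the log: the function name `should_enqueue` and the item fields `files` and `category` are assumptions made for illustration.

```python
def should_enqueue(item: dict) -> bool:
    """Return True if an item passes both (hypothetical) skip rules."""
    # Rule 1: skip items that carry more than one file.
    if len(item.get("files", [])) > 1:
        return False
    # Rule 2: skip items outside the "Mobile Phone" category.
    if item.get("category") != "Mobile Phone":
        return False
    return True


# An item like the one discussed (a .txt plus a .zip) fails rule 1,
# which is why it would have to be queued by hand.
two_file_item = {"files": ["note.txt", "firmware.zip"], "category": "Mobile Phone"}
single_file_item = {"files": ["firmware.zip"], "category": "Mobile Phone"}
```

Under these assumptions, `should_enqueue(two_file_item)` is false and `should_enqueue(single_file_item)` is true.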
-
TheTechRobo
nicolas17: CPU is often the bottleneck, but it's usually fixable
-
TheTechRobo
e.g. Telegram used to eat up a ton of CPU while requesting discussion data
-
nicolas17
yeah, CPU has been the bottleneck in some specific cases, but it was usually problems that can be solved without rewriting the entire stack in another language :P
-
nicolas17
such as parsing JSON in pure Lua, when more efficient Lua extensions already exist for that
-
JAA
The Python part is just for orchestration and not very relevant for performance.
-
JAA
Even if we wanted to replace Lua, it wouldn't be an easy thing to do since it'd likely mean replacing wget-at as well. But yeah, we have yet to run into a situation where Lua is the bottleneck, I believe.
-
nicolas17
another possible thing to try before rewriting the world would be luajit
-
JAA
I believe we've been using luajit for some time.
-
nicolas17
oh cool
-
h2ibot
Nicolas17v2 edited Taringa! (-1, Update status fields in project infobox):
wiki.archiveteam.org/?diff=51908&oldid=51905
-
TheTechRobo
oh, URLs is another project that's sometimes CPU-bound but that can be fixed without rewriting in another language
-
TheTechRobo
Even if the actual code can't be optimised further, some of the code can probably be made as a Lua extension
-
TheTechRobo
Or, worst-case scenario, some of the parsing could be done by a subprocess
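[editor's sketch] The "do the parsing in a subprocess" idea can be illustrated in general terms as below. This is not how the URLs project works; the worker script, the function name `extract_urls_in_subprocess` and the simple URL regex are all assumptions, and the example is in Python rather than the Lua used by the grab scripts.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads raw text on stdin, writes found URLs as JSON.
WORKER_SOURCE = r"""
import json, re, sys
text = sys.stdin.read()
urls = re.findall(r"https?://[^\s<>]+", text)
json.dump(urls, sys.stdout)
"""


def extract_urls_in_subprocess(text: str) -> list:
    """Run the parsing step in a separate process and collect its result."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=text,            # sent to the worker's stdin
        capture_output=True,   # collect the worker's stdout and stderr
        text=True,
        check=True,            # raise if the worker exits non-zero
    )
    return json.loads(result.stdout)
```

Starting a process per document has its own cost, so a real setup would more likely keep one long-lived worker and stream work to it; the sketch only shows how the parsing is separated from the caller.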
-
imer
yeah, i wouldnt mind urls being less cpu hungry :D
-
imer
> load average: 419.08, 402.21, 371.56
-
nicolas17
myself: is that download still going or did it fail?
-
fireonlive
imer: those PDFs lol
-
h2ibot
JustAnotherArchivist edited Current Projects (+0, Move Taringa! to running):
wiki.archiveteam.org/?diff=51909&oldid=51869
-
h2ibot
Slukiceng edited Discourse (+68, /* Active Discourses */):
wiki.archiveteam.org/?diff=51910&oldid=51900
-
imer
fireonlive: no pdfs currently, well, at least not the big list
-
fireonlive
ahh
-
fireonlive
just ambient load :c
-
imer
wouldn't even know where to begin profiling though, nvm having it running in docker
-
imer
we talked about pattern count the other day in #// not sure if that's a significant contributor though
-
fireonlive
oh hmm we did just offload a bunch of stuff from the tracker eh
-
imer
load's been high before that though
-
imer
so not a sudden change
-
fireonlive
ah ok
-
fireonlive
i haven’t run urls for a little bit sadly
-
myself
nicolas17: oh it finished, I'm just bad without reminders :) comin' at ya!
-
JAA
!remind myself 5s We have reminders at home!
-
eggdrop
[remind] ok, i'll remind myself at 2024-03-19T23:09:43Z
-
eggdrop
[remind] myself: We have reminders at home!
-
fireonlive
!remindme 0s :3
-
eggdrop
[remind] ok, i'll remind you at 2024-03-19T23:31:30Z
-
eggdrop
[remind] fireonlive: :3
-
fireonlive
eggdrop: help