00:06:27 lol wow
00:06:52 'how about you try opening the fuckin' thing?'
02:14:36 Nicolas17v2 edited Taringa! (+112, Add links to the DPoS project): https://wiki.archiveteam.org/?diff=51902&oldid=51871
02:17:28 imer: did you independently confirm the zip is broken?
02:57:42 Nulldata uploaded File:Taringa-FrontPage.png: https://wiki.archiveteam.org/?title=File%3ATaringa-FrontPage.png
02:58:02 Hmm svgs uploads aren't allowed on the wiki
02:58:13 svg*
03:02:43 Nulldata uploaded File:Taringa-logo-500px.png: https://wiki.archiveteam.org/?title=File%3ATaringa-logo-500px.png
03:04:43 Nulldata edited Taringa! (+64, Added logo and screenshot): https://wiki.archiveteam.org/?diff=51905&oldid=51902
03:21:45 yeah sadly not :(
03:22:05 i think JAA checked and it's not something they can change w/o editing the .php config
03:22:52 for mediawiki that is
04:09:21 I see that https://vetusware.com/ was grabbed in 2022. Is it worth doing a recursive crawl?
04:09:45 fireonlive: Wiki - https://wiki.panotools.org/Main_Page
04:12:16 And another, not sure if you've seen these previously https://wiki.videolan.org/
06:54:06 https://tracker.archiveteam.org/ is sleeping
08:46:13 nicolas17: yes, sent them the error message this time around
09:42:16 I just remembered Tennessee Bill's Old Time Radio (a website I used to visit around 2010) and was wondering if you guys know it and maybe have gotten in touch with the author to back up his stuff. The site is long gone, but maybe he's still around and has the data. It had hundreds of gigabytes of radio recordings, wartime posters etc.
09:42:16 https://web.archive.org/web/20101129153101/http://tennesseebillsotr.com
09:58:17 looks like we didn't grab it https://archive.fart.website/archivebot/viewer/?q=tennesseebillsotr.com
10:01:56 2010 era is a long time ago...
13:34:49 Arkiver uploaded File:Taringa icon.webp: https://wiki.archiveteam.org/?title=File%3ATaringa%20icon.webp
13:37:49 Arkiver uploaded File:Taringa icon.png: https://wiki.archiveteam.org/?title=File%3ATaringa%20icon.png
20:17:57 5 more files up in samsung-grab; I tried downloading one of them myself and it failed after 2 hours :/
20:19:06 https://data.nicolas17.xyz/samsung-grab/
20:19:21 (sheesh that's a lot of thelounge users)
20:20:56 lol
20:28:53 there's also like one request from hackint's matrix bridge too
20:28:53 <_<
20:33:00 I don't _think_ my thelounge instance does that; I'm not seeing previews.
20:33:28 But, this time the search result returned _two_ files: one 12-byte .txt, and one 1.5GB .zip, I presume we're to ignore the .txt?
20:37:48 I thought TL pings the servers even if link previews are disabled
20:37:50 Could be wrong
20:58:20 myself: yes
20:58:41 is python (and maybe lua) fast enough for fetching? Would some projects download faster if a rust -grab or -items or -discovery tool be made? Or are most projects rate-limiting anyways and extra speed would be of much benefit?
20:58:55 eightthree: I have never seen CPU be the bottleneck
21:00:48 your bandwidth, website's bandwidth, website's limit of requests per IP, target capacity (when IA can't ingest the data fast enough), etc etc before you hit CPU
21:03:12 myself: I even had to put that item in the queue manually, because my enqueue.py script skips both "item has >1 files" and "item is not in Mobile Phone category" :P
21:42:22 nicolas17: CPU is often the bottleneck, but it's usually fixable
21:42:52 e.g. Telegram used to eat up a ton of CPU while requesting discussion data
21:42:58 yeah, CPU has been the bottleneck in some specific cases, but it was usually problems that can be solved without rewriting the entire stack in another language :P
21:43:55 such as parsing JSON in pure Lua, when more efficient Lua extensions already exist for that
21:44:19 The Python part is just for orchestration and not very relevant for performance.
21:46:50 Even if we wanted to replace Lua, it wouldn't be an easy thing to do since it'd likely mean replacing wget-at as well. But yeah, we have yet to run into a situation where Lua is the bottleneck, I believe.
21:48:28 another possible thing to try before rewriting the world would be luajit
21:52:17 I believe we've been using luajit for some time.
21:53:22 Yes: https://github.com/ArchiveTeam/wget-lua/commit/da43582bfda92c9f5848f7b1fc15edf78d9e1b41
21:55:03 oh cool
22:04:25 Nicolas17v2 edited Taringa! (-1, Update status fields in project infobox): https://wiki.archiveteam.org/?diff=51908&oldid=51905
22:20:26 oh, URLs is another project that's sometimes CPU-bound but that can be fixed without rewriting in another language
22:20:44 Even if the actual code can't be optimised further, some of the code can probably be made as a Lua extension
22:20:58 Or, worst-case scenario, some of the parsing could be done by a subprocess
22:28:27 yeah, i wouldnt mind urls being less cpu hungry :D
22:28:40 > load average: 419.08, 402.21, 371.56
22:29:08 myself: is that download still going or did it fail?
22:30:21 imer: those PDFs lol
22:30:30 JustAnotherArchivist edited Current Projects (+0, Move Taringa! to running): https://wiki.archiveteam.org/?diff=51909&oldid=51869
22:30:31 Slukiceng edited Discourse (+68, /* Active Discourses */): https://wiki.archiveteam.org/?diff=51910&oldid=51900
22:30:46 fireonlive: no pdfs currently, well, at least not the big list
22:30:58 ahh
22:31:38 just ambient load :c
22:31:41 wouldn't even know where to begin profiling though, nvm having it running in docker
22:32:01 we talked about pattern count the other day in #// not sure if that's a significant contributor though
22:37:17 oh hmm we did just offload a bunch of stuff from the tracker eh
22:37:38 loads been high before that though
22:37:43 so not a sudden change
22:38:05 ah ok
22:38:14 i haven’t run urls for a little bit sadly
23:03:24 nicolas17: oh it finished, I'm just bad without reminders :) comin' at ya!
23:09:38 !remind myself 5s We have reminders at home!
23:09:39 -eggdrop- [remind] ok, i'll remind myself at 2024-03-19T23:09:43Z
23:09:43 [remind] myself: We have reminders at home!
23:31:30 !remindme 0s :3
23:31:31 -eggdrop- [remind] ok, i'll remind you at 2024-03-19T23:31:30Z
23:31:32 [remind] fireonlive: :3
23:32:06 eggdrop: help
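The 21:03:12 message describes two skip rules in an enqueue.py script: skip an item that has more than one file, and skip an item that is not in the Mobile Phone category. The real script is not shown in the log, so the following is only a minimal Python sketch of that filter. The Item type, its field names, and the sample identifiers and file names are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical shape of a search result; the field names are assumptions,
    # not the actual data model used by enqueue.py.
    identifier: str
    category: str
    files: list = field(default_factory=list)


def should_enqueue(item: Item) -> bool:
    """Apply the two skip rules described in the log (assumed logic)."""
    if len(item.files) > 1:
        return False  # skip: item has more than one file
    if item.category != "Mobile Phone":
        return False  # skip: item is not in the Mobile Phone category
    return True


def select_items(items):
    """Return only the items that pass both rules, preserving order."""
    return [item for item in items if should_enqueue(item)]


# Hypothetical sample data, for illustration only.
items = [
    Item("a", "Mobile Phone", ["firmware.zip"]),
    Item("b", "Mobile Phone", ["readme.txt", "firmware.zip"]),  # two files
    Item("c", "Tablet", ["firmware.zip"]),  # wrong category
]
selected = select_items(items)
print([item.identifier for item in selected])
```

Under these rules, an item like the one described at 20:33:28 (a 12-byte .txt plus a 1.5GB .zip) fails the first check. That is consistent with it having to be queued by hand.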