00:21:30 ok I whipped up a script to do that now
00:21:40 I'm definitely getting a speedup but not *that* good
00:22:05 went through 24GiB of the tar file in 6 minutes
00:22:38 which would be 71MiB/s if I was downloading the full thing, which seems impossible to get from archive.org :p
00:22:52 but... it could be better
00:35:17 there we go, I almost halved the number of requests needed, 100MiB/s equivalent now :D
00:43:31 "exception: connection aborted" nooo, time to add retries
01:30:39 I should have done this script earlier lol
02:07:21 And now my program downloads 256kb at once and caches it, is this officially a race now or :P
02:08:04 I'm benchmarking :o
02:17:59 I tried readahead of 128KB, 256KB, and 512KB, the speed difference was completely lost in the noise
02:28:18 > Cult of the Lamb dev says it will delete the game on January 1
02:28:28 ... due to the Unity changes
02:28:31 :\
02:28:34 https://www.pcgamesn.com/cult-of-the-lamb/deleted
02:30:24 I was going to suggest 'discord' if we wanted to create a channel, but...
02:31:10 * fireonlive blinks
02:31:58 hm they posted this after? https://twitter.com/cultofthelamb/status/1702091821273461176
02:31:59 nitter: https://nitter.net/cultofthelamb/status/1702091821273461176
02:33:09 the article's source was also two posts from their shitpost-social-media-account so ¯\_(ツ)_/¯
02:34:31 Ah indeed :-)
02:41:07 IA download speeds are way too variable to test this properly
02:41:30 we need to get nicolas17 a 10Gig interconnect to IA
02:41:31 suddenly dropped to 800KiB/s *despite* skipping chunks
02:43:16 I think optane9 has a better part of 10g to IA, it's on a network with peering to IA at the SFMIX. Maybe run it there? :D
03:48:25 this is all over the place...
03:48:46 64KB: 32s 35s 43s
03:48:53 256KB: 11s 17s 18s
03:49:02 1024KB: 14s 25s 50s
04:17:22 PaulWise created MoinMoin (+2087, create MoinMoin project page): https://wiki.archiveteam.org/?title=MoinMoin
04:38:26 PaulWise edited MoinMoin (+5058, add more moinmoin wikis from google/bing): https://wiki.archiveteam.org/?diff=50769&oldid=50768
04:41:27 PaulWise edited MoinMoin (+145, another strategem): https://wiki.archiveteam.org/?diff=50770&oldid=50769
04:51:28 PaulWise edited MoinMoin (+3477, more, sorted): https://wiki.archiveteam.org/?diff=50771&oldid=50770
05:11:08 43.1GiB tar file indexed in 3m22s :D
05:13:31 how the hell
05:14:06 I had others, especially those with few videos and mostly html pages, taking longer than just downloading the entire tar
05:15:01 so it depends on the tar content *and* on the speed of the particular IA server I hit
05:15:13 especially latency more than throughput...
05:15:33 PaulWise edited MoinMoin (+5010, more, sorted): https://wiki.archiveteam.org/?diff=50772&oldid=50771
05:15:57 just finished a big one, 255GiB in 49m30s
05:16:19 I have another of a similar size with an ETA of 4 hours -.-
05:34:03 anyone got any scripts/something to automate (browser-based?) searching using Bing?
05:38:31 flashfire42?
05:38:38 or do you manually rawdog that
05:40:01 phrasing
05:40:55 :3
05:57:23 indexing 8 tar files at the same time, to do them at this speed while downloading the whole .tar I would need to download from IA at a total speed of 433 MB/s >:3
06:20:32 https://livingcomputers.org/Closure.aspx
06:22:01 pabs: it seems they closed in 2020, but it sucks that the announcement doesn't have a date
06:22:48 website got an AB in 2020, no subdomains though, inc the wiki
06:38:32 started some jobs
07:08:06 literally how
07:08:47 my program is taking 20 seconds just to get 5MB into the file with 256 or 512 KB downloaded at a time, and it's written in C ._.
07:09:33 and that's also with downloading html files turned off
07:33:16 taaffeite: What kind of errors were you getting?
07:33:51 Is it running at all or is it just a problem with the page you're trying to download
07:41:06 I'm receiving several warnings and errors: EBADENGINE is an unsupported engine, npm ERR! path /usr/local/lib/node_modules/mwoffliner/node_modules/sharp command failed, Installation error: Expected Node.js version >=14.15.0 but found 12.22.9.
07:41:34 So an outdated Node.js version?
07:43:35 Yeah that's likely the issue
07:43:47 How did you install it?
07:44:20 Usually the repo in your distribution is outdated
07:46:20 I followed the instructions on the GitHub page. Using the latest version of Linux Mint. 'npm i -g mwoffliner'
07:46:29 I mean how did you install Node?
07:50:10 Perhaps I didn't actually. I just installed the redis-server.
07:50:25 I see
07:50:45 I downloaded the Node.js binary from their site, but couldn't install that.
07:51:47 Well you should try to install Node then
07:51:49 sudo apt install nodejs
07:52:05 Then run nodejs -v and see what version it gave you
07:53:08 'nodejs is already the newest version (12.22.9~dfsg-1ubuntu3)'
07:53:43 The apt package is out of date?
07:53:59 Oh okay, yes
07:54:49 I haven't tried this myself but apparently there is a Node package that will update it for you
07:55:14 npm install -g n
07:55:31 Then run:
07:55:33 n stable
07:55:37 And it should update
07:56:26 If that doesn't work you'll have to reinstall with a newer version
07:57:13 Okay that worked. It's now v18.17.1. I'll try running mwoffliner again.
07:57:54 Nice
07:58:07 Give it a try
08:01:22 A wall of deprecation and unmaintained warnings, then 'added 1191 packages in 2m', '117 packages are looking for funding'.
08:01:40 That's normal for npm :P
08:02:20 If you just see that when installing packages then it's likely fine
08:02:43 I'm getting a help page from mwoffliner so I think we're good.
08:02:57 Sweet
08:03:45 I was told by someone working on the ZIM project that this script is undergoing maintenance and might not work until it's been repaired sometime in the next several months. But I'll give it a go. Thanks for the help.
08:04:41 You're welcome :)
09:09:09 Hmm anyone thought about archiving parts of the Unity forums? This is 128 pages of comments to the new pricing changes that should probably be saved (in case they delete it like they did their github) https://forum.unity.com/threads/unity-plan-pricing-and-packaging-updates.1482750/page-128
10:26:38 PaulWise edited Mailman2 (+1125, add new lists, move not done lists to the right…): https://wiki.archiveteam.org/?diff=50773&oldid=50767
12:34:02 https://investors.unity.com/news/news-details/2022/Unity-Announces-Merger-Agreement-with-ironSource/default.aspx
13:26:51 pabs: little-things/bing-scrape, though I haven't used it in some time.
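A minimal sketch of the range-request tar indexing discussed above between 00:21 and 07:09. The actual script isn't shown in the log and its language isn't stated, so this is only an illustration in Python: fetch just the 512-byte tar headers with HTTP Range requests, skip past each member's data, keep a read-ahead cache (256 KiB here, matching the benchmarked sweet spot), and retry dropped connections. The item URL is a placeholder, and PAX/GNU long-name headers and base-256 size fields are not handled.

```python
import time
import requests

ITEM_URL = "https://archive.org/download/EXAMPLE_ITEM/example.tar"  # placeholder URL
READAHEAD = 256 * 1024  # 256 KiB read-ahead, the size that benchmarked well above
BLOCK = 512             # tar headers and padding use 512-byte blocks


def fetch_range(session, url, start, length, retries=5):
    """GET `length` bytes starting at `start`, retrying on dropped connections."""
    for attempt in range(retries):
        try:
            resp = session.get(
                url,
                headers={"Range": f"bytes={start}-{start + length - 1}"},
                timeout=60,
            )
            resp.raise_for_status()
            return resp.content
        except requests.RequestException:
            time.sleep(2 ** attempt)  # back off, then retry
    raise RuntimeError(f"giving up on bytes {start}-{start + length - 1}")


def index_tar(url):
    """Yield (name, data_offset, size) for each member without fetching its data."""
    session = requests.Session()
    # Assumes the server reports Content-Length on a HEAD request.
    total = int(session.head(url, allow_redirects=True).headers["Content-Length"])
    cache_start, cache = 0, b""
    pos = 0
    while pos + BLOCK <= total:
        # Serve the header from the read-ahead cache if possible, else refill it.
        if not (cache_start <= pos and pos + BLOCK <= cache_start + len(cache)):
            cache_start = pos
            cache = fetch_range(session, url, pos, min(READAHEAD, total - pos))
        header = cache[pos - cache_start:pos - cache_start + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks end of archive
            break
        name = header[0:100].rstrip(b"\0").decode("utf-8", "replace")
        size = int(header[124:136].rstrip(b" \0") or b"0", 8)  # size field is octal
        yield name, pos + BLOCK, size
        # Jump straight past the member's data, rounded up to the next 512-byte block.
        pos += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
```

Each yielded (name, data_offset, size) can later be fetched with its own Range request, which is how a 43.1 GiB tar can be indexed in a few minutes: only the headers, plus whatever the read-ahead drags in, ever cross the wire, so latency to the particular IA server matters more than raw throughput.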
15:29:44 joepie91|m and others regarding interest on Unity, need to vocariously start finding Unity stuff :C
17:03:20 You guys are going to start archiving Unity games?
17:05:15 Exorcism: WordPress works well with simple recursive crawling like AB. Are you proposing a large-scale project?
17:07:30 JAA: let's say yzqzss launched its own project https://github.com/saveweb/wordpress-rss-archiver then I don't know if you really want to, that's why I'm asking 👀
17:09:14 The README sounds like this is a 'run this to continuously submit new posts to SPN' thing...?
17:09:34 Isn't there also some kind of wordpress push notification system that's tied into IA?
17:10:05 related to https://developer.wordpress.com/docs/firehose/ (though I think it includes stuff not hosted by wordpress.com)?
17:10:25 The README claims you have to pay for that.
17:10:58 Right, but I think IA does?
17:11:23 https://archive.org/details/NO404-WP?tab=about
17:12:03 (not to be confused with https://archive.org/details/NO404-WKP?tab=about)
17:15:06 Ah nice, I've long wanted to look into leveraging Jetpack for that.
17:15:07 "The README sounds like this is a..." <- yep, that's it :p
17:17:37 You might want to add https://mangadex.com/ to the list if you're going to be archiving WordPress sites
17:18:16 It's a scanlation group site hosting service run by MangaDex and it uses WordPress
17:20:08 (mangadex.org is the domain used by MangaDex's manga reader)
17:26:48 👌🏻
17:28:33 A continuous thing for select blogs would fit into #//. Duplicating the IA's project, i.e. doing that for all blogs with Jetpack, probably makes little sense, assuming they're achieving decent coverage there.
17:29:14 And as mentioned, one-off archival works very well with AB.
17:35:58 Oh, also, arkiver - what data would you need for a DPoS project for orange? I can build a list of pages that are known to exist (e.g. website front pages, possibly deeper ones too) based on the AB jobs, but I'm not sure what else is needed
18:55:45 pokechu22: we need all the links you know about
18:56:15 Exorcism: what is this about?
18:56:48 I've got 2GB of assorted links (some dead but existed in the past via CDX data, some alive, some already saved via AB); I can try to organize that into something actually usable
18:57:38 one other thing is that there are several kinds of links that will need to be remapped into other links because sites link to older domains that no longer work, but I imagine that's pretty easy to do with a script
18:58:18 pokechu22: can you gz or zst the list up and post it?
18:58:22 on transfer.archivete.am
18:58:31 let's do a channel for orange!
18:58:45 any ideas for an orange channel? :)
18:59:09 #webroasting already exists, not sure if we need a dedicated one
18:59:19 ah
18:59:22 alright we'll use that
19:00:07 Exorcism: for wordpress, could you just use #archivebot , and for regularly getting a set of wordpress RSS feeds we could (as JAA suggests) just use #// indeed
19:00:41 I'm not in favor of Archive Team using SPN on a large scale, SPN is not made for that
19:01:15 honestly behind the scenes, SPN is quite busy and regularly has too much to do, so queuing complete wordpress blogs through is maybe not the best way
19:01:27 (plus indeed IA already does something with wordpress)
19:08:06 archivebot best bot :)
19:08:38 :)
19:14:19 GroupNebula563 uploaded File:Outdated-warrior-error.png: https://wiki.archiveteam.org/?title=File%3AOutdated-warrior-error.png
19:15:19 Arkiver edited CNET Forums (-33, Reverted edits by…): https://wiki.archiveteam.org/?diff=50776&oldid=50445
19:15:30 👍🏻
19:16:25 i reverted a sneaky spammy edit ^
19:16:43 huh, odd
19:16:51 should block that user I suppose
19:17:07 yeah i marked them as spammer
19:17:14 ah :)
19:17:18 they tried to get in another edit (it was in the mod queue)
19:17:23 ahh
19:17:30 gotta love spammers...
19:17:38 it was in the CNET announcement, which went like "blabla... Thanks, CNET team"
19:17:51 and they added "BLABLA... Thanks [spam link], CNET team"
19:17:54 sneaky
19:17:55 :P
19:17:59 indeed :3
19:18:02 they were caught though
19:19:15 Exorcism: or do you have different thoughts about that?
19:23:38 plcp - you should probably join #webroasting
19:26:01 (logs at https://irclogs.archivete.am/webroasting/2023-09-14 for recent stuff)
19:29:02 "Exorcism: or do you have..." <- not really, I just prefer to use wordpress archiver, that's it haha
19:30:41 right
22:30:02 https://torrentfreak.com/ace-takes-aim-at-zoro-to-successor-aniwatch-to-230912/
22:30:33 "ACE Takes Aim at Zoro.to Successor Aniwatch.to" "Below is a list of all domains targeted by MPA/ACE in a recent DMCA subpoena wave"
23:19:30 Moved from #archiveteam-ot: Any idea how feasible it is to archive rateyourmusic.com, considering that they seem to block Wayback Machine IPs, probably because of the amount of traffic. They are a great place for music discovery, and their forum is around 20 years old. Most of the pages are unarchived, and many of those that are just display a block notice because of unusual activity (ex. https://web.archive.org/web/20230909224447/https://rateyourmusic.com
23:19:31 /~Fooftilly). Their image CDN isn't blocked though.
23:32:32 Well with 2 new trackers coming up I may switch to AT Choice when I head to work today
23:35:14 Which new ones are coming up?
23:41:04 Peroniko: #zowch and one under #webroasting for orange
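For reference, the wordpress-rss-archiver project mentioned at 17:07 was summarised as a 'run this to continuously submit new posts to SPN' tool. Below is a minimal sketch of that idea, not the actual saveweb code: it assumes a standard RSS 2.0 feed at /feed/ and that Save Page Now accepts a plain GET to https://web.archive.org/save/<url>; the blog URL and polling interval are placeholders. As noted at 19:00:41, SPN is not built for bulk use, so something like this only makes sense for a handful of blogs.

```python
import time
import xml.etree.ElementTree as ET

import requests

FEED_URL = "https://example.wordpress.com/feed/"  # placeholder blog feed
POLL_INTERVAL = 15 * 60                           # seconds between feed checks
seen = set()

while True:
    feed = requests.get(FEED_URL, timeout=60)
    feed.raise_for_status()
    root = ET.fromstring(feed.content)
    for item in root.iter("item"):                # RSS 2.0 <item> entries
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            try:
                # Ask Save Page Now to capture the post.
                requests.get(f"https://web.archive.org/save/{link}", timeout=120).raise_for_status()
                seen.add(link)
            except requests.RequestException:
                pass  # leave it out of `seen` so the next poll retries it
    time.sleep(POLL_INTERVAL)
```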