00:14:09 The wiki states for the docker "--concurrent 1: Process 1 item at a time per container. Although this varies for each project, the maximum recommended value is 5, and the maximum allowed value is 20. Leave this at 1, or check with us on IRC if you are unsure". I don't really understand this. Does it vary between projects or is it always worse to
00:14:09 have more or something?
00:16:08 If you set it too high, the target site might ban you. Ideally, that just means you no longer do meaningful work. In bad cases, it can pollute the archive and lead to missed data if we don't catch it.
00:16:51 It varies between projects
00:17:05 So the ideal value is whatever lets you grab as much as possible without getting banned. Rate limits vary by target site, so ideal concurrency varies by project.
00:20:37 Is there anywhere to check the recommended values, or is asking in their respective chats the most efficient?
00:21:14 We (often|usually|sometimes|occasionally) put it in the channel topic.
00:25:05 I see, thanks
00:27:12 With docker, is there any reason to not run multiple different projects at once?
00:27:56 Nope, many people do exactly that. :-)
00:29:38 (y) awesome. I (at least temporarily) want to maximize what I can do is all
00:31:19 Oh, as for when it comes to spore, I have found a tool that may assist with finding the items and thus not checking through tons of empty urls however I don't know any javascript. I'll just post the tool here if anyone wants to check it out: https://github.com/Spore-Community/SporeTools/blob/main/SporeDwrApiClient.ts
00:31:43 Basically, there are three types of people running workers: a) casual users that just set up the warrior in auto mode once and forget about it; b) powerusers that run multiple projects in parallel to get the most out of existing machinery; c) insane people with large clusters.
00:32:12 (Exceptions that don't fall into these categories confirm the rule.)
00:32:59 (I've no clue what that saying "exceptions confirm the rule" is supposed to mean but okay)
00:33:23 idk what a cluster is, is it someone getting like a "cloud" service to do it for them?
00:34:16 (also someone named idk just got tagged randomly lol)
00:35:09 'Exceptions confirm the rule' is a lighthearted way of saying there are counterexamples to a rule but they're few and far between, so the rule's still valid.
00:36:48 Cluster ~ significant amount of distributed computing resources. 'Cloud' stuff can be used for that, yeah, but it's not a requirement.
00:37:42 It also implies some degree of coordination across the resources, e.g. orchestration.
00:40:19 Not sure https://github.com/Spore-Community/SporeTools is of much use for reducing the 1.1 billion number. It's an implementation of Spore's API, but I don't see anything for checking many asset IDs at once.
00:59:40 omegle has shut down https://www.omegle.com/ (https://news.ycombinator.com/item?id=38199355)
01:01:10 Damn
01:02:49 yeah :c
01:09:47 JAA: What it would do is considerably lower the amount of empty URLs checked, I'd believe. I've been told that all assets use the same IDs but different formats so there will be many, many gaps. Considering the server's stability it may be safer to do something slower with a list in mind. However I've no clue how to get a list of items out of that
01:11:19 Pedrosso: If we make the same number of requests, it probably doesn't matter much. And API requests likely take more resources on the Spore server side than service static files.
01:11:39 s/service/serving/
01:12:16 FireonLive edited Deathwatch (+307, add Omegle): https://wiki.archiveteam.org/?diff=51110&oldid=51097
01:13:00 I think I understand. I'll get to checking out the qwarc software then. (Hopefully I can understand how to use it lol)
01:42:21 logs.omegle.com URLs collected from anywhere would be interesting. I already ran the ones from Reddit through AB.
02:23:30 I can't find any documentation for Qwarc, I hope I'm not expected to be big brained enough to figure it all out on my own
02:31:32 they seem to have one indexed (on google) redirect: http://waw1.omegle.com/redir/gj2016
02:31:55 to some youtube video?
02:33:09 log.omegle.com exists too and is indexed; though seems to serve the same content
02:34:58 NSFW: they seem to be running a whitelabel version of chaturbate too: https://lady.omegle.com/ (though the fact it says 'Whitelabel powered by Chaturbate.com' isn't very whitelabel.. but lol)
02:35:15 i assume/but didn't check it's just chaturbate with a different logo
02:35:18 Pedrosso: I mentioned yesterday that there is no documentation.
02:35:57 fireonlive: Yeah, I noticed the same about lady.omegle.com. It's already running through AB to get a sample.
02:36:02 ah :)
02:36:34 About omegle, nice that you've got something to run on
02:36:37 noticed https://chatserv.omegle.com as well; which redirects to omegle.. but if you append a ?from= parameter you get omegle?
02:36:49 https://chatserv.omegle.com/?from=archive.org
02:37:01 indexed url was https://chatserv.omegle.com/?from=www.xiaodiaomao.com
02:37:15 Or even just an empty param works.
02:37:44 ah! :)
02:37:54 trying to use it just gives an error to reload though
02:38:58 antinudeservers: ["waw1.omegle.com", "waw2.omegle.com", "waw3.omegle.com", "waw4.omegle.com"]
02:38:59 haha
02:39:19 (in the start json response)
02:40:37 Pedrosso: I can't really recommend trying to use qwarc currently. It works, and it's very powerful, but the lack of documentation just makes it a non-starter unless you enjoy reading through my code and figuring out all the quirks yourself.
02:42:22 it links to (NSFW) https://cameglelive.com/ (well a sub page of that) if you want an adult site instead, but not sure of the relation to omegle itself (and it doesn't seem to quickly say, it seems to be different than chaturbate)
02:42:38 i clicked men and now see a man jackin' it live so i guess it's unaffected for now
02:43:15 JAA: Every programmer is a masochist, hah. I don't know of any other resources than just the code that was mentioned, and the one I wrote sure is slow
02:46:01 ah ok, it's a whitelabel version of https://www.streamate.com/ it seems
02:47:36 (omegle itself is Omegle.com LLC)
03:09:57 RIP omegle
03:14:37 * fireonlive pours one out
03:16:43 Petchea edited Tumblr (+430, /* History */): https://wiki.archiveteam.org/?diff=51111&oldid=51050
03:18:43 0KepOnline edited Spore (+633, Add tools (Spore PNG Downloader & sporeget) and…): https://wiki.archiveteam.org/?diff=51112&oldid=51106
03:26:45 Petchea edited Tumblr (-47, /* History */ reblog with more context): https://wiki.archiveteam.org/?diff=51113&oldid=51111
03:27:45 JustAnotherArchivist edited Deathwatch (+4, Fix order): https://wiki.archiveteam.org/?diff=51114&oldid=51110
03:28:45 am i... dumb?
03:29:44 i swear i stared at that for a minute
03:29:45 lol
04:50:14 Steve Wozniak hospitalized, possible stroke: https://www.cnbc.com/2023/11/08/apple-co-founder-hospitalized-in-mexico-due-to-possible-stroke-local-media-reports.html
04:50:37 ..and not the fun kind :(
06:57:00 https://x.com/dexerto/status/1722403634078457878?s=12
06:57:00 nitter: https://nitter.net/dexerto/status/1722403634078457878
08:16:45 Omegle has some interesting subdomains like lady.omegle.com that you should not open at work like i did
08:35:33 LMAOOOO
15:48:33 that_lurker: Yeah, discussed above, it's a whitelabel version of Chaturbate.
16:00:57 tumblr reportedly going to skeleton crew status: https://spaceoperajay.tumblr.com/post/733460173913489408/screenshot-20231006-005401-2-1-hosted-at-imgbb
16:02:36 (best source i could find, sorry)
16:20:35 Megame edited Deathwatch (+167, /* 2023 */ https://apo.org.au/ - 15 Dec): https://wiki.archiveteam.org/?diff=51115&oldid=51114
16:25:02 jezebel shutting down: https://variety.com/2023/digital/news/jezebel-shutting-down-go-media-layoffs-1235785877/
16:30:38 no on-site announcement yet
16:30:48 although a lot of pages depend on js (even the articles, which have 'continue reading' buttons), i _think_ that the relevant data is in-source and playback from archivebot captures would work
16:37:35 (except comments, which are an ugly-looking in-house js thing)
19:09:07 better source for tumblr: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/
19:28:18 Manu edited Political parties/Germany/Hamburg (+4138): https://wiki.archiveteam.org/?diff=51116&oldid=51108
20:36:18 Would https://sporepedia2.foroactivo.com/ be considered small enough a forum for ArchiveBot to deal with? If so, I've got a small list of similar forums that don't seem to have been archived
20:51:27 we got a filesystem dump of the preinstalled macOS 14.1 on M3 Pro but still looking for 13.5 on M3 ._.
20:51:34 gonna get increasingly hard to find people who didn't update yet
20:57:08 nicolas17: jason seems to be missing from here but maybe you could tweet at him and he might boost it? or email uh whatever fucking cutesy email he left here
20:57:21 ended in textfiles.com i think
20:57:42 jesuschristmorearchiveteamcrap⊙tc
20:58:11 he does have that big follower base ™
21:00:37 JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=51117&oldid=51109
21:16:30 Jason is here, but he doesn't check IRC all that often.
21:29:55 Tumblr is reportedly on life support as its latest owner reassigns staff: https://arstechnica.com/gadgets/2023/11/tumblr-is-reportedly-on-life-support-as-its-latest-owner-reassigns-staff/ https://news.ycombinator.com/item?id=38209312
21:30:28 sighs time to grab more tumblr
21:31:44 → #tumbledown
21:49:20 Would anyone mind answering the question I sent above? They allegedly say they have 456,901 messages
21:51:46 Pedrosso: That would be appropriate for AB, yeah.
22:02:08 (y) great
22:13:42 JAA: oh maybe under another nick; i tried the two i remember
22:13:48 didn't check hostnames though :)
22:14:09 but yeah true i do remember him saying just use email
22:16:01 fireonlive: He's S.ketchCow here normally, currently S.ketchCo1 due to a netsplit or similar.
22:29:26 oh!
23:10:26 I noticed in the ArchiveBot for sporepedia2 that https://i.imgur.com/YHS5Omo.png returned a 429 code. Do those links get sent to #// or #imgone, or are they just logged somewhere?
23:11:38 They're just logged - it's possible to extract them from the meta-warc (which is a gz-compressed log file) and throw them into #imgone manually but nothing automatically happens currently
23:26:04 Considering the nature of those images (them being spore creations) that may be a good thing to do
23:59:49 he-man.org forums shutting down on November 14th https://www.he-man.org/forums/boards/showthread.php?285395-The-He-Man-Org-forums-will-close-on-Tuesday-November-14-2023
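
Note on the 23:11:38 message: the manual extraction of 429'd links from a gz-compressed log could look roughly like the sketch below. This is only an illustration under assumptions, not ArchiveBot's actual tooling. The log layout is not shown in the chat, so the sketch assumes the status code and the URL appear on the same log line; the function names and the file path are hypothetical.

```python
import gzip
import re
from typing import Iterable, Iterator, List

# Assumption: a log line reporting a rate-limited fetch contains both the
# URL and the number 429. Adjust both patterns to the real log layout.
URL_RE = re.compile(r"https?://\S+")
STATUS_RE = re.compile(r"\b429\b")


def urls_with_429(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct URL from lines that mention 429 outside the URL."""
    seen = set()
    for line in lines:
        # Blank out URLs first so a path like /429.png is not mistaken
        # for a status code.
        if not STATUS_RE.search(URL_RE.sub(" ", line)):
            continue
        for url in URL_RE.findall(line):
            url = url.rstrip(".,;:)'\"")  # trim trailing punctuation/quotes
            if url not in seen:
                seen.add(url)
                yield url


def urls_with_429_from_gz(path: str) -> List[str]:
    """Read a gz-compressed log (hypothetical path) and collect 429 URLs."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return list(urls_with_429(fh))
```

The resulting list would still need a manual look before being passed on to #imgone, since a line can mention 429 for reasons other than a rate-limited response.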