00:00:00 And I kept the DBs so I could throw the outlinks into other projects.
00:00:43 want to do that again? or should I just AB
00:01:20 might also be worth googling for the software those use, might be additional instances
00:03:00 I'm trying to get other things done, so I'd prefer not. I.e. just AB with outlinks enabled. Relevant URLs could always be extracted from the log and be fed to the appropriate other project for a more thorough coverage.
00:03:31 (The ones I ran were with --no-offsite-links.)
00:33:05 ok, will do them today
00:40:00 Hello
00:40:47 Hello again
00:45:50 hello anonlmao, what can we do for you?
00:49:41 Yeah, I've been sniffing through Google about a build of Kubuntu 14.10
00:50:04 For the uninitiated, Kubuntu released 2 versions of 14.10
00:51:12 One is the stable KDE 4 release
00:51:26 The other is the KDE 5 tech preview
00:51:52 Now, the KDE Plasma 4 version is available
00:52:11 But the KDE Plasma 5 one seems to be lost
00:52:36 I've been digging through Google for mirrors
00:52:51 Old FTP mirrors that could potentially have had it
00:53:18 Filename:
00:53:21 kubuntu-plasma5-14.10-desktop-amd64.iso
01:09:10 anonlmao: https://archive.org/download/kubuntu-15.04-desktop-amd64 has "kubuntu-14.10-desktop-amd64.iso"
01:10:41 I'm looking for the one with **plasma-5** on it
01:13:24 It's not in any of the 446 IA items surfaced by a search for 'kubuntu'. More accurately, no files with names containing 'plasma5' in any of those.
01:15:04 I'm also looking for it in FTP archives
01:15:41 That would be confusing and also,
01:15:59 idk if the guys behind kubuntu still have them
01:16:05 offline
01:23:03 anonlmao: might be worth asking on their channel
01:52:56 What did I miss?
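The check described at 01:13:24 (search IA for 'kubuntu', then look for 'plasma5' in every matching item's file list) can be sketched against the public Internet Archive advancedsearch and metadata endpoints. This is a minimal sketch assuming those endpoints' JSON response shapes; the helper names are hypothetical, only the first page of search results (up to `rows`) is fetched, and rate limiting and errors are not handled.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://archive.org/advancedsearch.php"
METADATA_URL = "https://archive.org/metadata/{}"


def search_identifiers(query: str, rows: int = 1000) -> list[str]:
    """Return identifiers of IA items matching `query` (first results page only)."""
    params = urllib.parse.urlencode(
        {"q": query, "fl[]": "identifier", "rows": rows, "output": "json"}
    )
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}") as resp:
        data = json.load(resp)
    return [doc["identifier"] for doc in data["response"]["docs"]]


def files_matching(metadata: dict, needle: str) -> list[str]:
    """Names of files in one item's metadata JSON that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return [
        f["name"]
        for f in metadata.get("files", [])
        if needle in f.get("name", "").lower()
    ]


def find_files(query: str, needle: str) -> dict[str, list[str]]:
    """Map each item identifier to its matching file names; items with no match are omitted."""
    hits = {}
    for ident in search_identifiers(query):
        url = METADATA_URL.format(urllib.parse.quote(ident))
        with urllib.request.urlopen(url) as resp:
            matches = files_matching(json.load(resp), needle)
        if matches:
            hits[ident] = matches
    return hits
```

Calling `find_files("kubuntu", "plasma5")` needs network access and returns an empty dict when no item has such a file. Keeping the filename filter as a pure function makes it easy to rerun against metadata that was already downloaded.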
01:55:03 So yeah about that
02:18:20 nicolas17: (reading back a bit) I wonder if DALL-E 2 or Stable Diffusion would work for Instagram selfie verification
02:19:23 Also Instagram posts were accessible from Threads yesterday but this has since been patched https://news.ycombinator.com/item?id=36621190
02:41:16 Yeah. I'm back since web cache's a b
02:48:54 "Also Instagram posts were accessible from Threads yesterday but this has since been patched" that explains why there haven't been any teething issues with scaling... if they're just using the already working infra
02:49:21 it's very similar to meta yes
02:49:34 the requests they make too
02:49:44 pretty sure it's all on the same infrastructure
06:43:32 11:41:40 PM <+rss> Google to explore alternatives to robots.txt: https://blog.google/technology/ai/ai-web-publisher-controls-sign-up/ → https://news.ycombinator.com/item?id=36641607
06:43:53 it's nothing more than a gross mailing list sign up now: https://services.google.com/fb/forms/ai-web-publisher-controls-external/ but watch this space i guess o_O
06:54:02 PaulWise edited IRC (-163, move the list of IRC log servers to a new…): https://wiki.archiveteam.org/?diff=50137&oldid=44174
07:14:05 PaulWise created IRC/Logs (+7695, create initial IRC logs project, mostly using…): https://wiki.archiveteam.org/?title=IRC/Logs
07:14:32 "mostly using log sites scraped from OFTC/libera/hackint/gimpnet channel topics"
07:14:36 JAA: ^
07:33:09 PaulWise edited IRC/Logs (+0, harmattan logs in progress, discovered by…): https://wiki.archiveteam.org/?diff=50139&oldid=50138
07:39:10 PaulWise edited IRC/Logs (+0, the logbot archive was done in 2021 after it…): https://wiki.archiveteam.org/?diff=50140&oldid=50139
08:00:13 FireonLive edited IRC/Logs (+235, add some found logs): https://wiki.archiveteam.org/?diff=50141&oldid=50140
08:02:02 fireonlive: ^ the raku one of those is a dupe
08:02:30 whoops
08:03:14 FireonLive edited IRC/Logs (-25, remove dupe): https://wiki.archiveteam.org/?diff=50142&oldid=50141
16:13:49 pabs: Nice, thanks!
16:59:33 do we have some examples of discourse forums?
16:59:41 do any 'weird' or extremely customized variants exist?
17:00:42 Only a wiki page full of them: https://wiki.archiveteam.org/index.php/Discourse
17:01:03 wooh awesome!
17:01:30 how popular is discourse compared to other forum software? is it used more by a certain type of people?
17:01:48 I think it's more about recency
17:01:53 Oh, GitHub Community shut down last year.
17:02:08 forums started in the past 10 years are more likely to be discourse (or xenforo I guess)
17:02:44 i plan on using #// to discover all discourse forums we come across, and then get them continuously archived (new posts/comments) with a project
17:02:55 the posts have sequential IDs as far as i can see, so that can be nicely used
17:03:08 sounds great, I would love to help adapt it to other forum software
17:03:26 we will!
17:03:40 That'd be lovely. We really need to make better use of the data flowing through #//. :-)
17:03:52 JAA: indeed! we'll start extracting various interesting things
17:03:53 but
17:04:34 without actually reading the HTML in Lua, since that is pretty expensive. so we'll use the URLs Wget-AT itself extracts to determine if the webpage is likely from a discourse forum
17:04:47 Ah
17:05:29 Are we not doing URL extraction in Lua there then?
17:05:37 barely in #//
17:05:41 I see.
17:05:44 it's expensive
17:05:47 Yeah
17:06:14 but the recent more sophisticated spam/loop blocking also uses this new method of checking URLs extracted by Wget-AT to determine if something is likely spam
17:06:21 i think this will work well
17:07:15 if something seems to be discourse, we can actually read it to determine it for sure and then queue the discourse forum to a special project for further processing
17:07:49 Sounds good.
17:12:28 btw we have #msgbored
17:53:39 imgur murdered their discourse-based (iirc?) one as well a couple years ago
17:53:47 oop
19:51:44 If you need discourse forums, I think there are internals.rust-lang.org and users.rust-lang.org.
23:41:31 albertlarsan68: please add those to the wiki page https://wiki.archiveteam.org/index.php/Discourse
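The detection idea from around 17:04 (flag a page as likely Discourse from the URLs Wget-AT already extracted, instead of parsing HTML in Lua) and the sequential post IDs mentioned at 17:02:55 can be sketched in Python. Everything here is hypothetical: the path patterns, the threshold of 3, and the helper names are assumptions for illustration, not the pattern set #// uses (that code runs in Lua inside Wget-AT and is not shown in this log). The `/posts/<id>.json` URL assumes the forum exposes Discourse's JSON API for individual posts.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical path markers that Discourse pages commonly link to (assumption).
DISCOURSE_PATH_PATTERNS = [
    re.compile(r"^/t/[^/]+/\d+"),           # topic pages: /t/<slug>/<topic_id>
    re.compile(r"^/c/[^/]+"),               # category pages
    re.compile(r"^/u/[^/]+"),               # user profiles
    re.compile(r"^/user_avatar/"),          # avatar images
    re.compile(r"^/letter_avatar_proxy/"),  # generated letter avatars
]


def discourse_score(page_url: str, outlinks: list[str]) -> int:
    """Count how many distinct patterns are matched by same-host outlinks."""
    host = urlsplit(page_url).hostname
    matched = set()
    for link in outlinks:
        # Resolve relative links against the page URL before comparing hosts.
        parts = urlsplit(urljoin(page_url, link))
        if parts.hostname != host:
            continue
        for index, pattern in enumerate(DISCOURSE_PATH_PATTERNS):
            if pattern.match(parts.path):
                matched.add(index)
    return len(matched)


def likely_discourse(page_url: str, outlinks: list[str], threshold: int = 3) -> bool:
    """Cheap pre-filter; a positive result would still be confirmed by reading the page."""
    return discourse_score(page_url, outlinks) >= threshold


def new_post_urls(base_url: str, last_seen_id: int, newest_id: int) -> list[str]:
    """JSON URLs for every post ID after last_seen_id, relying on sequential post IDs."""
    base = base_url.rstrip("/")
    return [f"{base}/posts/{i}.json" for i in range(last_seen_id + 1, newest_id + 1)]
```

Counting distinct patterns rather than total matches keeps a single page with many topic links from tripping the filter on its own, which matches the plan of only reading a page in full once the URLs already suggest Discourse.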