00:00:07 JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=51953&oldid=51930
00:20:10 Pokechu22 edited Deathwatch (+520, /* 2024 */ monster hunter now forum also closing): https://wiki.archiveteam.org/?diff=51954&oldid=51952
00:32:55 > they seem to detect us based on TLS fingerprinting?
00:33:48 what does that mean and is there an effort to get around it? (i know what tls means but i am unfamiliar with tls fingerprinting)
00:36:59 a solution is in the works as far as I know, yes
00:38:03 icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
00:38:04 "what does that mean" different TLS (the thing used to encrypt https) implementations act slightly differently, so you can "fingerprint" specific ones (like major browsers)
00:38:05 TLS fingerprinting is identifying something based on how it uses TLS e.g. which ciphers it supports. If you support this cipher but not that cipher, you must be a terrorist.
00:38:14 (nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg https://www.mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
00:38:49 and if you see an unknown fingerprint do lots of requests, and know its not a browser, you block it
00:38:56 you're saying all someone has to do is run the archiver scripts through Tor and reddit will block tor access
00:39:17 does tor actually work?
00:39:27 well no, reddit would still see the origin client's TLS handshake
00:39:29 reddit allows you to read it through tor
00:41:00 but you're saying if someone read it a lot with the wrong fingerprint, they could be made to automatically ban all tor users
00:42:41 How long are these bans? Are they just minute/hour level throttling, or do they last longer?
00:46:01 → #shreddit
01:09:33 (mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
01:13:23 Blankie edited Fandom (-1, /* Download */ Fix link to more information…): https://wiki.archiveteam.org/?diff=51955&oldid=49560
01:13:24 IDKhowToEdit edited Deathwatch (+301, Add marketplace comment deprecation for roblox): https://wiki.archiveteam.org/?diff=51956&oldid=51954
01:13:25 Dango360 edited Roblox (+7342, added roblox comments removal section): https://wiki.archiveteam.org/?diff=51957&oldid=49854
01:13:26 IDKhowToEdit edited Roblox (+384, Added marketplace comment removal): https://wiki.archiveteam.org/?diff=51958&oldid=51957
01:15:24 JustAnotherArchivist edited Roblox (-369, Remove duplicate content, datetimeify): https://wiki.archiveteam.org/?diff=51959&oldid=51958
02:28:18 https://news.ycombinator.com/item?id=39852219 < is openai going to get mad about this and lock things down lol
02:30:10 TIL /raw/ on Discourse
02:34:49 > Raw data was gathered into a single JSONL file by automating a browser using Playwright.
02:34:57 Running a full browser to fetch some JSON...
02:36:33 is an effective way to bypass any check that is looking for non-approved browsers
02:37:18 Obviously, but as far as I can see, there isn't such a check here.
02:37:44 Or at least not one that would excessively limit the retrieval rate.
03:34:17 it's also an effective way to run all the arcane bloated SPA JS code to fetch the data for you
04:24:54 Petchea edited Bilibili (+128): https://wiki.archiveteam.org/?diff=51960&oldid=51859
05:48:53 icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
05:49:03 Vatoto works for me
05:49:17 It has groups under letter categories
05:49:22 (mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
05:49:38 I've chatted with MangaDex staff in the past
05:49:54 I can handle it if you want
05:51:04 icedice: that sounds good, thank you!
05:51:30 No problem
05:59:25 thuban> (nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg https://www.mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
05:59:46 Mangaupdates has an IRC channel at #baka-updates⊙iin
06:00:18 They handed over Imgur links from their forums to me in the past
06:01:00 However, iirc they ignored me for probably like a week at least until I poked them again and they went "here's the list, now piss off"
06:01:06 Or something along those lines
06:01:19 I think that was them, at least
06:01:31 hmmm
06:04:40 i did use search to do some spot-checking with other cyrillic characters and didn't find any results other than that group, and with cjk and didn't find anything i hadn't already seen in 'all', so paging by letter is probably Good Enough™?
06:05:29 i'm much less confident in saying that about cjk/other character sets than about cyrillic, but
06:05:59 good place to start unless/until one of us talks to them about it
14:48:28 Happened upon https://narkive.com/ - doesn't look like it's been crawled in length previously
15:24:41 how to use itunes content and how to search on specific topic
17:12:55 can anyone tell me whether it's a good idea to grab a mailman instance using AB? the wiki page mentions a few tools, but doesn't say anything about AB
18:55:52 c3manu: pretty sure most of them have been done via AB?
19:41:49 pokechu22: idk, that's why i'm asking ^^
20:14:33 c3manu: yes, people have been doing it with archivebot (https://hackint.logs.kiska.pw/archiveteam-bs/20230616#c352608). from what i've heard mailman 2 and mailman 3 both work ok (https://hackint.logs.kiska.pw/archiveteam-bs/20230621#c353873)
20:14:35 https://wiki.archiveteam.org/index.php/Mailman/2 has ab tips for 2
20:33:23 thuban: oh nice, thanks. i indeed do have a 2.19 here
20:33:35 eeh 2.1.29
20:46:41 Manu edited Mailman/2 (+26, add https://lists.metalab.at/ to archived list): https://wiki.archiveteam.org/?diff=51963&oldid=51875