-
h2ibot
Pokechu22 edited Deathwatch (+520, /* 2024 */ monster hunter now forum also closing):
wiki.archiveteam.org/?diff=51954&oldid=51952
-
archivst
> they seem to detect us based on TLS fingerprinting?
-
archivst
what does that mean and is there an effort to get around it? (i know what tls means but i am unfamiliar with tls fingerprinting)
-
imer
a solution is in the works as far as I know, yes
-
thuban
icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
-
imer
"what does that mean" different TLS (the thing used to encrypt https) implementations act slightly differently, so you can "fingerprint" specific ones (like major browsers)
-
immibis
TLS fingerprinting is identifying something based on how it uses TLS e.g. which ciphers it supports. If you support this cipher but not that cipher, you must be a terrorist.
-
thuban
(nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
-
imer
and if you see an unknown fingerprint do lots of requests, and know it's not a browser, you block it
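[editor's note] A minimal sketch of the idea described above, assuming a JA3-style fingerprint (an MD5 over the decimal ClientHello fields, with GREASE values dropped). Nothing in the log says which fingerprinting method the site actually uses; the field values, the allowlist, and the `should_block` threshold below are hypothetical and only illustrate how "unknown fingerprint plus lots of requests" could turn into a block.

```python
import hashlib
from collections import Counter

# GREASE placeholder values (0x0A0A, 0x1A1A, ..., 0xFAFA) are ignored by JA3.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from ClientHello fields given as integers."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string; this is the 'fingerprint'."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def should_block(fingerprint, request_counts, known_fingerprints, threshold=1000):
    """Hypothetical policy: block fingerprints that are unknown AND noisy."""
    return (
        fingerprint not in known_fingerprints
        and request_counts[fingerprint] > threshold
    )


# Two made-up clients that differ only in the cipher suites they offer.
browser_like = ja3_hash(771, [0x0A0A, 4865, 4866, 49195], [0, 10, 11], [29, 23], [0])
script_like = ja3_hash(771, [4865, 49195], [0, 10, 11], [29, 23], [0])

counts = Counter({script_like: 5000, browser_like: 5000})
known = {browser_like}  # assumed allowlist of browser fingerprints
```

With these made-up inputs, `should_block(script_like, counts, known)` is true and `should_block(browser_like, counts, known)` is false, even though both clients sent the same number of requests.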
-
immibis
you're saying all someone has to do is run the archiver scripts through Tor and reddit will block tor access
-
imer
does tor actually work?
-
nicolas17
well no, reddit would still see the origin client's TLS handshake
-
immibis
reddit allows you to read it through tor
-
immibis
but you're saying if someone read it a lot with the wrong fingerprint, they could be made to automatically ban all tor users
-
archivst
How long are these bans? Are they just minute/hour level throttling, or do they last longer?
-
JAA
→ #shreddit
-
thuban
(mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
-
h2ibot
Blankie edited Fandom (-1, /* Download */ Fix link to more information…):
wiki.archiveteam.org/?diff=51955&oldid=49560
-
h2ibot
IDKhowToEdit edited Deathwatch (+301, Add marketplace comment deprecation for roblox):
wiki.archiveteam.org/?diff=51956&oldid=51954
-
h2ibot
Dango360 edited Roblox (+7342, added roblox comments removal section):
wiki.archiveteam.org/?diff=51957&oldid=49854
-
h2ibot
IDKhowToEdit edited Roblox (+384, Added marketplace comment removal):
wiki.archiveteam.org/?diff=51958&oldid=51957
-
h2ibot
JustAnotherArchivist edited Roblox (-369, Remove duplicate content, datetimeify):
wiki.archiveteam.org/?diff=51959&oldid=51958
-
fireonlive
news.ycombinator.com/item?id=39852219 < is openai going to get mad about this and lock things down lol
-
JAA
TIL /raw/ on Discourse
-
JAA
> Raw data was gathered into a single JSONL file by automating a browser using Playwright.
-
JAA
Running a full browser to fetch some JSON...
-
immibis
is an effective way to bypass any check that is looking for non-approved browsers
-
JAA
Obviously, but as far as I can see, there isn't such a check here.
-
JAA
Or at least not one that would excessively limit the retrieval rate.
-
immibis
it's also an effective way to run all the arcane bloated SPA JS code to fetch the data for you
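[editor's note] For contrast with the browser-automation approach, a rough sketch of fetching a post's raw text over plain HTTP. It assumes the Discourse raw path has the form `/raw/<topic_id>/<post_number>`; the forum hostname, IDs, and User-Agent string are hypothetical, and a real forum may rate-limit or reject such requests.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def raw_post_url(base_url, topic_id, post_number=None):
    """Build a /raw/ URL for a topic, or for one post within it."""
    path = f"raw/{topic_id}"
    if post_number is not None:
        path += f"/{post_number}"
    # Ensure exactly one trailing slash so urljoin appends instead of replacing.
    return urljoin(base_url.rstrip("/") + "/", path)


def fetch_raw(base_url, topic_id, post_number=None, timeout=30):
    """Fetch the raw post text with a plain HTTP GET (no browser involved)."""
    request = Request(
        raw_post_url(base_url, topic_id, post_number),
        headers={"User-Agent": "example-fetcher/0.1"},  # hypothetical UA string
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like `fetch_raw("https://forum.example.org", 123, 4)`, where the host and IDs are placeholders.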
-
icedice
<thuban> icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
-
icedice
Vatoto works for me
-
icedice
It has groups under letter categories
-
icedice
<thuban> (mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
-
icedice
I've chatted with MangaDex staff in the past
-
icedice
I can handle it if you want
-
thuban
icedice: that sounds good, thank you!
-
icedice
No problem
-
icedice
<thuban> (nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
-
icedice
Mangaupdates has an IRC channel at #baka-updates⊙iin
-
icedice
They handed over Imgur links from their forums to me in the past
-
icedice
However, iirc they ignored me for probably like a week at least until I poked them again and they went "here's the list, now piss off"
-
icedice
Or something along those lines
-
icedice
I think that was them, at least
-
thuban
hmmm
-
thuban
i did use search to do some spot-checking with other cyrillic characters and didn't find any results other than that group, and with cjk and didn't find anything i hadn't already seen in 'all', so paging by letter is probably Good Enough™?
-
thuban
i'm much less confident in saying that about cjk/other character sets than about cyrillic, but
-
thuban
good place to start unless/until one of us talks to them about it
-
HP_Archivist
Happened upon narkive.com - doesn't look like it's been crawled at length previously
-
jo70
how to use itunes content and how to search on specific topic
-
c3manu
can anyone tell me whether it's a good idea to grab a mailman instance using AB? the wiki page mentions a few tools, but doesn't say anything about AB
-
pokechu22
c3manu: pretty sure most of them have been done via AB?
-
c3manu
pokechu22: idk, that's why i'm asking ^^
-
thuban
c3manu: yes, people have been doing it with archivebot (hackint.logs.kiska.pw/archiveteam-bs/20230616#c352608). from what i've heard mailman 2 and mailman 3 both work ok (hackint.logs.kiska.pw/archiveteam-bs/20230621#c353873)
-
c3manu
thuban: oh nice, thanks. i indeed do have a 2.19 here
-
c3manu
eeh 2.1.29
-
h2ibot