-
Jackster
Has anyone got grab-site to successfully log in to a vBulletin forum? The cookies don't work for me.
-
pabs
so, uh, Iran...
-
anarcat
what about it
-
nicolas17
anarcat: iran vs israel attacks going on
-
anarcat
yes, well
-
anarcat
anything needs archiving?
-
kpcyrd
"where do you get your news from?" - "some irc channel for library enthusiasts"
-
pabs
archive both sides in case of escalation?
-
thuban
!remindme 3h scrape bbc media guides for urls-sources
-
eggdrop
[remind] ok, i'll remind you at 2024-04-14T05:42:14Z
-
fireonlive
kpcyrd: ikr? i find out about so many things here lol
-
eggdrop
[remind] thuban: scrape bbc media guides for urls-sources
-
icedice
thuban: I remembered another scanlation group link directory, a Discord server called Great Discord Links Hub (previously known as Scan Group Directory):
discord.gg/xAsyVb52a9
-
icedice
With Mangaupdates, MangaDex, Vatoto, and Great Discord Links Hub we should have pretty good coverage of scanlation group sites
-
icedice
I'll see if someone in #discard can scrape the links
-
tapos
You should scrape e-hentai.org for scanlation sites as well, scanlators sometimes post their site in the comments section
-
tapos
If it's too much work to scrape for links, then you could scrape it via Bing
-
tapos
Not as good, but better than nothing
-
tapos
Also, I think there'll probably be stuff on there that isn't covered by the link lists you've already scraped
-
Jackster
Nuked offline more like
-
tapos
Never mind the Bing scrape, it seems like Bing doesn't index E-Hentai
-
tapos
I'm guessing FAKKU got them to censor out the whole domain
-
tapos
That publisher tends to go nuclear
-
h2ibot
Manu edited Deathwatch (+294, add gimpscripts.net):
wiki.archiveteam.org/?diff=52046&oldid=52024
-
h2ibot
Manu edited Deathwatch (+38, add gimpscripts.net job reference):
wiki.archiveteam.org/?diff=52047&oldid=52046
-
Barto
woop woop irc is back after this bigass btrfs volume failure, still a lot of files to recover, but that part was saved ;-)
-
icedice
thuban: Vokun is taking care of scraping that Discord server for links
-
icedice
Should we expand the scope of the scanlation group archiving project to include social media (other than Discord)?
-
icedice
Seems like a good idea
-
icedice
Their Discord servers should probably be left alone since it's sort of an invasion of privacy to archive that and index it publicly
-
thuban
icedice: depends which social media; we don't have a good way of handling most of the major sites (facebook, twitter, instagram) right now.
-
thuban
fwiw, for the mangaupdates and vatoto scrapes i grabbed all links listed, and dumped relevant urls into appropriate projects (including telegram)
-
icedice
Ok, nice
-
icedice
Not sure what sites MangaDex lets you list, probably Twitter and Facebook, at least
-
icedice
Vokun got Tumblr, Facebook, Twitter, and Instagram links from the Discord scrape
-
icedice
The Tumblr ones are important since some groups use that for their websites
-
icedice
The rest we dump into the relevant projects, I guess?
-
thuban
mangadex's group schema only includes website, irc, discord, email, twitter, and mangaupdates links, so nothing useful for us there. (there are a bunch of groups with other social media, like telegram or vkontakte, but they get listed as 'website' so i've already covered them)
-
icedice
Ok
-
icedice
That thing said earlier about scraping E-Hentai for links might be a good idea
-
icedice
If any Google-hosted sites are going to get yeeted, it's those