00:51:21 Anyone got grab-site to successfully log in to a vBulletin forum? The cookies don't work for me.
01:29:08 so, uh, Iran...
01:37:26 what about it
01:38:35 anarcat: iran vs israel attacks going on
01:39:41 yes, well
01:39:56 anything needs archiving?
01:43:30 "where do you get your news from?" - "some irc channel for library enthusiasts"
01:53:29 archive both sides in case of escalation?
02:42:14 !remindme 3h scrape bbc media guides for urls-sources
02:42:15 -eggdrop- [remind] ok, i'll remind you at 2024-04-14T05:42:14Z
02:50:44 kpcyrd: ikr? i find out about so many things here lol
05:42:15 [remind] thuban: scrape bbc media guides for urls-sources
13:10:49 thuban: I remembered another scanlation group link directory, a Discord server called Great Discord Links Hub (previously known as Scan Group Directory): https://discord.gg/xAsyVb52a9
13:11:46 With Mangaupdates, MangaDex, Vatoto, and Great Discord Links Hub we should have pretty good coverage of scanlation group sites
13:12:04 I'll see if someone in #discard can scrape the links
15:00:32 You should scrape https://e-hentai.org/ for scanlation sites as well, scanlators sometimes post their site in the comments section
15:01:07 If it's too much work to scrape for links, then you could scrape it via Bing
15:01:14 Not as good, but better than nothing
15:03:39 Also, I think there'll probably be stuff on there that isn't covered by the link lists you've already scraped
15:30:15 Nuked offline more like
15:40:45 Never mind the Bing scrape, it seems like Bing doesn't index E-Hentai
15:40:58 I'm guessing FAKKU got them to censor out the whole domain
15:41:07 That publisher tends to go nuclear
19:15:13 Manu edited Deathwatch (+294, add gimpscripts.net): https://wiki.archiveteam.org/?diff=52046&oldid=52024
19:43:19 Manu edited Deathwatch (+38, add gimpscripts.net job reference): https://wiki.archiveteam.org/?diff=52047&oldid=52046
21:24:23 woop woop irc is back after this bigass btrfs volume failure, still a lot of files to recover, but that part was saved ;-)
22:53:30 thuban: Vokun is taking care of scraping that Discord server for links
23:41:46 Should we expand the scope of the scanlation group archiving project to include social media (other than Discord)?
23:41:53 Seems like a good idea
23:43:26 Their Discord servers should probably be left alone since it's sort of an invasion of privacy to archive that and index it publicly
23:45:23 icedice: depends which social media; we don't have a good way of handling most of the major sites (facebook, twitter, instagram) right now.
23:47:16 fwiw, for the mangaupdates and vatoto scrapes i grabbed all links listed, and dumped relevant urls into appropriate projects (including telegram)
23:49:34 Ok, nice
23:50:00 Not sure what sites MangaDex lets you list, probably Twitter and Facebook, at least
23:51:01 Vokun got Tumblr, Facebook, Twitter, and Instagram links from the Discord scrape
23:51:21 The Tumblr ones are important since some groups use that for their websites
23:51:47 The rest we dump into the relevant projects, I guess?
23:55:56 mangadex's group schema only includes website, irc, discord, email, twitter, and mangaupdates links, so nothing useful for us there. (there are a bunch of groups with other social media, like telegram or vkontakte, but they get listed as 'website' so i've already covered them)
23:58:01 Ok
23:58:50 That thing said earlier about scraping E-Hentai for links might be a good idea
23:59:09 If any Google-hosted sites are going to get yeeted, it's those