03:39:13 #archiveteam topic may need updating (pointless to still mention taringa, dunno about the other two)
05:46:40 * arkiver is back from a few days of lower availability
05:50:17 * fireonlive waves to arkiver
05:50:21 welcome back!
05:50:25 thanks :)
05:50:28 :)
07:00:33 JAABot edited CurrentWarriorProject (-38): https://wiki.archiveteam.org/?diff=52051&oldid=52007
11:48:41 fyi: a german journalist apparently has a case against him for linking to the linksunten-indymedia archive. that's after offices were raided and electronic devices confiscated. only got a german-language link for now: https://rote-hilfe.de/meldungen/unbequeme-berichterstattung-prozess-gegen-linken-journalisten
11:49:38 here's an english one from last year: https://cpj.org/2023/01/german-police-search-office-of-independent-broadcaster-and-2-journalists-homes-seize-equipment-and-documents/
12:52:56 how is (2) https://www.gesetze-im-internet.de/stgb/__85.html actually interpreted by the courts?
13:05:01 murb: not sure, the case hasn't been decided yet. it seems to be based on §129 though. and it's really fishy in this whole matter. for making the website illegal, they declared linksunten to be a german "Verein" (association), which is not at all what it was
13:05:14 assuming you can read german: https://www.tagesschau.de/inland/indymedia-verbot-101.html
13:05:42 i don't remember what the verdict on that was though, i would have to read up on that
13:15:38 Manu edited Mailman/2 (+147, http://jul.es/pipermail): https://wiki.archiveteam.org/?diff=52052&oldid=52050
13:23:39 Manu edited Mailman/2 (+0, http://dovecot.org/pipermail/): https://wiki.archiveteam.org/?diff=52053&oldid=52052
14:34:52 Manu edited Mailman/2 (-30, Running https://erlang.org/mailman): https://wiki.archiveteam.org/?diff=52054&oldid=52053
16:49:46 I'm getting rid of a bunch of old project channels today. You won't notice anything, as they've been inaccessible since late 2022 anyway. They've also been marked accordingly on the wiki since then.
16:53:25 * fireonlive pours several out
17:26:18 has anyone thought of archiving help.openstreetmap.org?
17:40:08 Yes, it was fully archived with ArchiveBot last month.
17:42:11 and coordinated with the openstreetmap admins
17:58:52 nice
19:39:48 thuban: For some reason I've decided to torture myself by manually collecting every Google Sites and Blogspot link from E-Hentai
19:40:03 * myself screams in anguish
19:40:17 Done with Google Sites, I'll do Blogspot as my mental state allows
19:40:38 o7
19:40:54 I'd assume most freely hosted hentai scanlation sites are on there
19:42:42 I also just got a good idea
19:43:05 Kemono has a ton of NSFW Google Sites and Blogspot links
19:43:07 https://kemono.su/posts?q=sites.google.com
19:43:15 https://kemono.su/posts?q=blogspot.com
19:43:41 It's basically a Patreon/etc. archiver
19:47:02 I wonder if one of these tools saves the text that contains the links: https://github.com/search?q=Kemono&type=repositories&s=stars&o=desc
19:47:36 Still, a bunch of separate txt files is a pain in the ass to deal with
19:48:07 So I guess a custom scrape would be best
19:48:15 The site uses DDoS-Guard though
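
[A minimal sketch of what that custom scrape might look like, in Python. The only inputs taken from the channel are the two kemono.su search URLs; everything else is an assumption: that the search pages paginate with an "o" offset parameter in steps of 50, that post links in the search HTML look like /<service>/user/<id>/post/<id>, and that DDoS-Guard lets a plain HTTP client through at all, which it may not.]

# Sketch: pull Google Sites / Blogspot links out of Kemono search results.
# Assumptions (not confirmed above): pagination via an "o" offset parameter
# in steps of 50, post links shaped like /<service>/user/<id>/post/<id>,
# and DDoS-Guard not blocking a plain HTTP client.
import re
import time

import requests

BASE = "https://kemono.su"
# Links we want to keep: any blogspot subdomain, or anything under sites.google.com
LINK_RE = re.compile(r"""https?://(?:[\w-]+\.blogspot\.com|sites\.google\.com)[^\s"'<>]*""")
# Guessed shape of post links on the search results page
POST_RE = re.compile(r'href="(/[^"]+/user/[^"]+/post/[^"]+)"')

def scrape(query: str, pages: int, delay: float = 2.0) -> list[str]:
    found = set()
    with requests.Session() as s:
        s.headers["User-Agent"] = "Mozilla/5.0 (link collector)"
        for page in range(pages):
            # e.g. https://kemono.su/posts?q=sites.google.com&o=50
            r = s.get(f"{BASE}/posts", params={"q": query, "o": page * 50}, timeout=30)
            r.raise_for_status()
            # Visit every post on this results page and harvest matching links
            for path in sorted(set(POST_RE.findall(r.text))):
                pr = s.get(BASE + path, timeout=30)
                if pr.ok:
                    found.update(LINK_RE.findall(pr.text))
                time.sleep(delay)  # be polite; DDoS-Guard will likely rate-limit anyway
    return sorted(found)

if __name__ == "__main__":
    # 17 pages for the sites.google.com query, per the counts mentioned later in the log
    for url in scrape("sites.google.com", pages=17):
        print(url)
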
19:52:11 This could maybe be rewritten for Google Sites and Blogspot: https://github.com/SatyamSSJ10/Kemono-youtube-fetch
20:00:28 thuban: https://transfer.archivete.am/15bVDx/E-Hentai%20Google%20Sites.txt
20:00:29 inline (for browser viewing): https://transfer.archivete.am/inline/15bVDx/E-Hentai%20Google%20Sites.txt
20:00:56 I skipped the sites that were behind a Google login
20:01:08 Which was most of them
20:02:03 And for one group I included their other links in there while I was at it
20:02:19 Seems to mostly be artists using Google Sites
20:02:28 i'm really curious as to why I was highlighted for that
20:02:34 i have no idea how we handle google sites, actually--we had a project, but i think it was just for the 'classic' sites. no idea whether it would work on current sites
20:03:05 So vanilla ArchiveBot wouldn't cut it?
20:03:42 ArchiveBot does work with Google Sites, to my understanding
20:03:48 Nice
20:04:13 but you do have to start one ArchiveBot job per site, which makes it not super useful for large quantities of sites that need to be saved quickly
20:04:13 thuban: do you think you can scrape Kemono for Google Sites and Blogspot links?
20:04:27 ^ right, just not sure whether something else would be more apt
20:04:40 Ok
20:04:50 sorry, i'm rather busy at present
20:04:58 Well, it's just 16 Google Sites from E-Hentai
20:05:19 So it could probably be fed site by site
20:05:27 Ok, no worries
20:05:39 Hey, it's inporntant. We'll figure it out :D
20:06:35 Yeah
20:06:48 I'm not doing Kemono manually though lol
20:07:25 823 posts (17 pages) of Google Sites links
20:07:52 9612 posts (193 pages) of Blogspot links
20:08:56 blogspot we can just dump in #frogger, so that's fine
20:10:54 google sites we could _maybe_ do through ab with queueh2ibot, but it would make sense to find out whether #nearlylostmygoogles does/can apply first
20:11:48 16 sites is few enough to just do it manually.
20:12:09 yeah, but 823...
20:13:08 Oh, two different sources, right.
21:07:37 thuban: it's 823 posts, not 823 sites
21:08:10 Most likely it's like 30 sites, with a few of them being linked in hundreds of posts each
21:08:48 Since some artists put their site link in every post
21:10:59 There's no way of seeing which artist made which post without opening the post though
21:11:17 Otherwise I could just speedrun through the search pages manually
21:11:44 Now if I want to do it manually, I'd have to open every single post
21:12:24 Even if I could speedrun it, the Blogspot ones are too many
21:17:50 oic, thought you were using that scraper you linked
23:17:29 -+rss- Show HN: A self-published art book about Google's first 25 years: This took me 3 years to finish. (It is 100% self-published, not endorsed by Google.) So… I wrote a book. It’s a different book with a unique approach. It’s not a novel or a technical book. It’s a biography, a company’s biography. My hope is that it serves two
23:17:30 purposes: to inspire founders and to captivate interior designers. It all [...] https://news.ycombinator.com/item?id=40067484
23:17:39 i hope this gets preserved somehow..