03:39:13 #archiveteam topic may need updating (pointless to still mention taringa, dunno about the other two)
05:46:40 * arkiver is back from a few days of lower availability
05:50:17 * fireonlive waves to arkiver
05:50:21 welcome back!
05:50:25 thanks :)
05:50:28 :)
07:00:33 JAABot edited CurrentWarriorProject (-38): https://wiki.archiveteam.org/?diff=52051&oldid=52007
11:48:41 fyi: a german journalist apparently has a case against him for linking to the linksunten-indymedia archive. that's after offices were raided and electronic devices confiscated. only got a german-language link for now: https://rote-hilfe.de/meldungen/unbequeme-berichterstattung-prozess-gegen-linken-journalisten
11:49:38 here's an english one from last year: https://cpj.org/2023/01/german-police-search-office-of-independent-broadcaster-and-2-journalists-homes-seize-equipment-and-documents/
12:52:56 how is (2) https://www.gesetze-im-internet.de/stgb/__85.html actually interpreted by the courts?
13:05:01 murb: not sure, the case hasn't been decided yet. it seems to be based on §129 though. and it's really fishy in this whole matter. for making the website illegal, they declared linksunten to be a german "Verein" (association), which is not at all what it was
13:05:14 assuming you can read german: https://www.tagesschau.de/inland/indymedia-verbot-101.html
13:05:42 i don't remember what the verdict on that was though, i would have to read up on that
13:15:38 Manu edited Mailman/2 (+147, http://jul.es/pipermail): https://wiki.archiveteam.org/?diff=52052&oldid=52050
13:23:39 Manu edited Mailman/2 (+0, http://dovecot.org/pipermail/): https://wiki.archiveteam.org/?diff=52053&oldid=52052
14:34:52 Manu edited Mailman/2 (-30, Running https://erlang.org/mailman): https://wiki.archiveteam.org/?diff=52054&oldid=52053
16:49:46 I'm getting rid of a bunch of old project channels today. You won't notice anything, as they've been inaccessible since late 2022 anyway. They've also been marked accordingly on the wiki since then.
16:53:25 * fireonlive pours several out
17:26:18 has anyone thought of archiving help.openstreetmap.org?
17:40:08 Yes, it was fully archived with ArchiveBot last month.
17:42:11 and coordinated with the openstreetmap admins
17:58:52 nice
19:39:48 thuban: For some reason I've decided to torture myself by manually collecting every Google Sites and Blogspot link from E-Hentai
19:40:03 * myself screams in anguish
19:40:17 Done with Google Sites, I'll do Blogspot as my mental state allows
19:40:38 o7
19:40:54 I'd assume most freely hosted hentai scanlation sites are on there
19:42:42 I also just got a good idea
19:43:05 Kemono has a ton of NSFW Google Sites and Blogspot links
19:43:07 https://kemono.su/posts?q=sites.google.com
19:43:15 https://kemono.su/posts?q=blogspot.com
19:43:41 It's basically a Patreon/etc. archiver
19:47:02 I wonder if one of these tools saves the text that contains the links: https://github.com/search?q=Kemono&type=repositories&s=stars&o=desc
19:47:36 Still, a bunch of separate txt files is a pain in the ass to deal with
19:48:07 So I guess a custom scrape would be best
19:48:15 The site uses DDoS-Guard though
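
[A minimal sketch of what that custom scrape might look like, in Python. The only inputs taken from the channel are the two kemono.su search URLs; everything else is an assumption: that the search pages paginate with an "o" offset parameter in steps of 50, that post links in the search HTML look like /<service>/user/<id>/post/<id>, and that DDoS-Guard lets a plain HTTP client through at all, which it may not.]

# Sketch: pull Google Sites / Blogspot links out of Kemono search results.
# Assumptions (not confirmed above): pagination via an "o" offset parameter
# in steps of 50, post links shaped like /<service>/user/<id>/post/<id>,
# and DDoS-Guard not blocking a plain HTTP client.
import re
import time

import requests

BASE = "https://kemono.su"
# Links we want to keep: any blogspot subdomain, or anything under sites.google.com
LINK_RE = re.compile(r"""https?://(?:[\w-]+\.blogspot\.com|sites\.google\.com)[^\s"'<>]*""")
# Guessed shape of post links on the search results page
POST_RE = re.compile(r'href="(/[^"]+/user/[^"]+/post/[^"]+)"')

def scrape(query: str, pages: int, delay: float = 2.0) -> list[str]:
    found = set()
    with requests.Session() as s:
        s.headers["User-Agent"] = "Mozilla/5.0 (link collector)"
        for page in range(pages):
            # e.g. https://kemono.su/posts?q=sites.google.com&o=50
            r = s.get(f"{BASE}/posts", params={"q": query, "o": page * 50}, timeout=30)
            r.raise_for_status()
            # Visit every post on this results page and harvest matching links
            for path in sorted(set(POST_RE.findall(r.text))):
                pr = s.get(BASE + path, timeout=30)
                if pr.ok:
                    found.update(LINK_RE.findall(pr.text))
                time.sleep(delay)  # be polite; DDoS-Guard will likely rate-limit anyway
    return sorted(found)

if __name__ == "__main__":
    # 17 pages for the sites.google.com query, per the counts mentioned later in the log
    for url in scrape("sites.google.com", pages=17):
        print(url)
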
19:52:11 This could maybe be rewritten for Google Sites and Blogspot: https://github.com/SatyamSSJ10/Kemono-youtube-fetch
20:00:28 thuban: https://transfer.archivete.am/15bVDx/E-Hentai%20Google%20Sites.txt
20:00:29 inline (for browser viewing): https://transfer.archivete.am/inline/15bVDx/E-Hentai%20Google%20Sites.txt
20:00:56 I skipped the sites that were behind a Google login
20:01:08 Which was most of them
20:02:03 And for one group I included their other links in there while I was at it
20:02:19 Seems to mostly be artists using Google Sites
20:02:28 i'm really curious as to why I was highlighted for that
20:02:34 i have no idea how we handle google sites, actually--we had a project, but i think it was just for the 'classic' sites. no idea whether it would work on current sites
20:03:05 So vanilla ArchiveBot wouldn't cut it?
20:03:42 ArchiveBot does work with Google Sites, to my understanding
20:03:48 Nice
20:04:13 but you do have to start one ArchiveBot job per site, which makes it not super useful for large quantities of sites that need to be saved quickly
20:04:13 thuban: do you think you can scrape Kemono for Google Sites and Blogspot links?
20:04:27 ^ right, just not sure whether something else would be more apt
20:04:40 Ok
20:04:50 sorry, i'm rather busy at present
20:04:58 Well, it's just 16 Google Sites from E-Hentai
20:05:19 So it could probably be fed site by site
20:05:27 Ok, no worries
20:05:39 Hey, it's inporntant. We'll figure it out :D
20:06:35 Yeah
20:06:48 I'm not doing Kemono manually though lol
20:07:25 823 posts (17 pages) of Google Sites links
20:07:52 9612 posts (193 pages) of Blogspot links
20:08:56 blogspot we can just dump in #frogger, so that's fine
20:10:54 google sites we could _maybe_ do through ab with queueh2ibot, but it would make sense to find out whether #nearlylostmygoogles does/can apply first
20:11:48 16 sites is few enough to just do it manually.
20:12:09 yeah, but 823...
20:13:08 Oh, two different sources, right.
21:07:37 thuban: it's 823 posts, not 823 sites
21:08:10 Most likely it's like 30 sites, with a few of them being linked in hundreds of posts each
21:08:48 Since some artists put their site link in every post
21:10:59 There's no way of seeing which artist made which post without opening the post though
21:11:17 Otherwise I could just speedrun through the search pages manually
21:11:44 Now if I want to do it manually, I'd have to open every single post
21:12:24 Even if I could speedrun it, the Blogspot ones are too many
21:17:50 oic, thought you were using that scraper you linked
23:17:29 -+rss- Show HN: A self-published art book about Google's first 25 years: This took me 3 years to finish. (It is 100% self-published, not endorsed by Google.) So… I wrote a book. It’s a different book with a unique approach. It’s not a novel or a technical book. It’s a biography, a company’s biography. My hope is that it serves two
23:17:30 purposes: to inspire founders and to captivate interior designers. It all [...] https://news.ycombinator.com/item?id=40067484
23:17:39 i hope this gets preserved somehow..