01:19:29 i hope you saved photobucket images because i recently got an email stating that my account was deactivated
01:19:58 the wiki page still says "not saved yet"
01:25:10 the only thing i could find about it is this but it seems to be outdated: https://archive.org/details/photobucketgrabs
01:42:36 someone's leaving reviews on items in that collection lmao
01:42:56 2 stars - "A couple of personal photos, photos of second hand cars and a few comics sketches and computers getting built. "
03:29:33 The IA comments seem to attract the most unhinged people for some reason
07:13:05 https://wiki.archiveteam.org/?diff=50445&oldid=48231 and https://wiki.archiveteam.org/?diff=50446&oldid=50234 look like spam.
08:27:27 Entartet edited List of websites excluded from the Wayback Machine (+34, Added vintagebigblue.org.): https://wiki.archiveteam.org/?diff=50451&oldid=50315
09:01:28 um h2ibot you wanna get your ass over to some of the other channels
10:15:29 JAA: found a former opensource.com team member on HN: https://news.ycombinator.com/item?id=37040739
12:44:53 dev.getsol.us (phabricator) is migrating to github https://getsol.us/2023/08/07/state-of-solus-august-2023/
12:45:13 I'm working on getting a full list of the git repos for SWH/Codearchiver
12:48:05 https://transfer.archivete.am/ZjsEy/dev.getsol.us-source-git-repositories.txt
13:07:57 test
13:08:51 [For French users] Hi! Is there something planned for saving Pages Perso from Orange? You may have seen the news on https://pages.perso.orange.fr/ :/
13:11:41 Antonin: I'm not French, but yes, we know about it
13:11:50 Thanks for reporting anyway
13:13:39 Great, thanks! :D What's planned? Haven't found anything on the Wiki
13:17:16 As far as I know we haven't done much yet, it's still a month out and usually ISP hosting sites are small enough that we can safely get them within ~2 weeks before shutdown
13:17:36 Not sure what kind of grab it will be, depends on the site
13:18:42 Once it starts we will (try to) have discussion in #webroasting
13:20:43 Well, there's lot of websites, but okay we'll see :) Have you any Matrix room? I'm not on IRC and if I keep this tab open it will disappear with others
13:22:23 Our IRC network implements a Matrix bridge, I believe https://hackint.org/transport/matrix is the page about it
13:22:40 But I don't use it so if you have problems you might want to ask in #hackint
13:22:53 (Or wait til someone who does comes on)
13:24:23 :)
13:25:36 Bonjour from FR !
13:26:01 Nice, it's `#archiveteam-bs:hackint.org` fyi. Thanks!
13:36:25 Np
14:00:47 I’ll start on the orange stuff in the next few days
14:04:19 Awesome! Would be happy to help 🙂
15:27:32 Bram Moolenaar the creator of Vim has died - https://www.theregister.com/2023/08/07/bram_moolenaar_obituary
15:29:32 Requiescat in pace
16:48:10 I'll also look into doing an archivebot job for orange. I can't tell if https://pages.perso.orange.fr provides a list of all the sites directly or not; it seems to list some things but I don't know French
17:08:42 "I'll also look into doing an..." <- I do, if you'd like some help
17:09:18 Sure, a list like that would be useful no matter how the project is done
17:14:30 September 5 2023 is when it closes
17:14:46 9th January 2024 is when all access is revoked
17:15:28 Corrections welcome, this is the same Orange that used an Aphex Twin song in their commercials, isn't it?
17:15:37 To Cure a Weakling Child (Contour Regard) specifically
17:28:28 pabs, pokechu22: For simple things that don't need custom stuff, just some hardcoded cookie(s), I usually just use wpull with the --load-cookies option. Of course it's also possible with qwarc.
17:28:56 pabs: Also, uh, I think I forgor about opensource.com, but fortunately it's still alive. Will grab soon.
17:29:08 nicolas17: Yeah, it went down a while ago I believe.
17:29:59 It might need special checking to make sure that it's not on the challenge page, but I can try with wpull to see if that's sufficient still
17:30:46 (I had it randomly give challenges on some of the images when loading the page normally, which means the site didn't work right; it would be better to capture the site without those)
17:31:21 Yeah, that can be checked with one of the hooks.
17:31:44 Don't think you can control the writing to WARC though. qwarc could do that.
18:08:17 wget-at can do that, I know that much
18:38:17 Not being shut down, but worth archiving imo: https://jobim.org/
18:39:03 Looks like we previously ran that in 2021: https://archive.fart.website/archivebot/viewer/job/202110212149266f4uk
18:50:38 What could help, is this directory : https://annuaire-pp.orange.fr/
18:51:16 Not all websites are listed there, but... It's a beginning.
20:54:55 AntoninDelFabbro|m um either I am getting some weird errors or there is already something wrong with the perso.orange.fr stuff
20:56:15 Did a random sampling on the first page bing and none of them resolve for me but the one you linked there above does
20:58:12 Ok our info on the orange site is out of date they are hosted on monsite-orange.fr rather than pagesperso-orange.fr
20:59:45 there is still some stuff in the cache for pageperso-orange.fr tho so I will do grabs for those and then move on to monsite-orange.fr
20:59:48 There's both (tho the first one is "more recent" I think)
21:00:52 I mean I will check both but just from my sampling the pagesperso ones didn't resolve for me. so I can do jobs for the cache stuff and whatever resolves on pagesperso and then move to monsite
21:01:49 pagesperso-orange.fr/
21:01:49 monsite-orange.fr/
21:01:51 Alright 😁
21:02:25 https://server8.kiska.pw/uploads/3617fef2e3bc1038/image.png
21:03:02 (someone uses microsoft edge? :P)
21:03:43 Fuck you microsoft will give me free windows 11 pro soon for the rewards. Plus I am too lazy to switch all my stuff over. I may have to if they start doing that screenshot bullshit or whatever tho
21:04:06 proaction.pagespro-orange.fr works for me, and perso.orange.fr/stephane.busson redirects to http://stephane.busson.perso.orange.fr/
21:04:26 Yeah, what doesn't resolve for you?
21:04:46 http://pagesperso-orange.fr/ redirects to https://e.orange.fr/error404.html for me
21:05:13 Yeah ok those resolve so I will have to check each of them individually it seems
21:05:24 Can you give an example of one that doesn't work?
21:05:47 https://chabrieres.pagesperso-orange.fr/texts/clockwork_orange.html
21:06:10 it appears fine in the cache just doesn't otherwise resolve for me
21:06:14 Maybe I am blocked?
21:06:16 Yeah that one works fine for me
21:06:23 flashfire42: just poking a bit of fun :D
21:06:24 so I guess they just hate australia?
21:06:37 I have been running skyblogs that is also french maybe they decided FUCK YOU
21:06:56 So I may not be able to assist in full scraping at this time until skyblogs is done?
21:08:14 This is annoyingly one where !a < list doesn't work, since it's a bunch of different domains
21:13:29 "https://server8.kiska.pw/uploads..." <- And I have a 404 here 😆 No, those don't work, they do only with subdomains
21:13:30 "(someone uses microsoft edge? :P..." <- I have it, I can watch sth for you tomorrow
21:14:07 ooh what are you going to watch for me
21:15:04 Idk, why have you asked "someone uses Edge ? :P" ? 😆
21:15:23 Oh nevermind, gotcha
23:28:23 bash.org still ded