00:00:09 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50315&oldid=50314
01:45:53 so do i send the url here or #archiveteam ?
01:46:01 here
01:46:27 ok whats the channel commands
01:47:07 just post what you want archived and the reason for archiving here and an admin will take care of it
01:47:22 sweet ok
01:47:37 1. https://go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/
01:47:44 2. http://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/
01:47:56 They both are under a google storage bucket
01:48:08 and there was another but that one closed a few months ago
01:48:15 and im pretty worried for those
01:48:26 https://go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io just redirects to disney.com if I ignore a certificate error
01:48:35 is there more to it?
01:48:42 pokechu22 use gsutil
01:49:31 Ah, so it's possible to generate a list of valid URLs for each bucket? I've only seen similar stuff with amazon AWS where you get an XML file when loading it, but I guess this is different
01:50:11 dolimg returns an HTTP 200 with 'Oops! We're sorry, but we're having technical problems.' for me.
01:50:26 Yeah, same for me
01:50:44 I'm taking a look at https://cloud.google.com/storage/docs/gsutil - first time I've heard of this
01:50:49 they both work when using gsutil
01:51:05 Doesn't that require auth?
01:51:10 nope
01:51:20 im pretty sure its not configured
01:51:54 ok, `apt install gsutil` gives me GrandStream BudgeTone phone backup, restore and reboot utility - http://www.pkts.ca/gsutil.shtml which is clearly not the right thing
01:52:08 It's in PyPI it seems.
01:52:54 Although the official docs don't seem to mention this, or at least not prominently if it is somewhere. They instead try to make you install an entire set of CLIs.
01:54:06 https://cloud.google.com/storage/docs/gsutil_install
01:54:35 there's the install instructions, it seems to be like aws cli but for google storage buckets
01:54:38 Wow, gsutil's code is disgusting. In the first few lines I'm reading, I'm already seeing `importlib` and `sys.path` madness.
01:54:55 ew
01:56:13 Huh, and `gsutil ls gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/` lists stuff in the root directory (though it doesn't recurse over it)
01:57:14 it should be -r
01:57:20 and -h
01:57:33 and also -l if im not wrong
01:57:46 Ah, yep
01:58:58 yeah
01:59:21 both cdns combined is 340+ gb of data
02:00:06 That should be doable for archivebot as long as we pick a pipeline with enough space
02:01:17 cool
02:02:14 I'm currently doing a listing for gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io - thanks for bringing this up
02:02:35 :)
02:02:48 I am curious where gsutil is getting the data from - it'd be nice to also save all of those metadata URLs
02:03:42 🤷
02:04:04 The core seems to be a 2000-line wrapper around boto.
02:04:17 really
02:04:29 https://github.com/GoogleCloudPlatform/gsutil/blob/667663786c06d6e5849a467645e1846c908046bf/gslib/boto_translation.py
02:04:55 whats the job id for the amazon buckets of jumpstart games?
02:05:50 That didn't run through ArchiveBot because it required deduplication.
02:06:31 meaning?
02:06:32 Interesting, it seems like most of the files on dolimg are dated to 2018?
02:06:47 Meaning there is no job ID.
02:07:11 yeah my group of friends who discovered it assumed whenever they remove or add a file all file dates get modified
02:07:32 either that or its a google cloud update
02:08:04 e.g. I see 129304 2018-02-16T20:20:01Z gs://dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/en-US/blogs/wp-content/uploads/2013/03/thomas_pocahontas_imaginary_boyfriends.jpg - that file is clearly from 2013 but is dated 2018 still
02:08:36 Perhaps they migrated to GCP in 2018?
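A minimal sketch, in Python, of how the `gsutil ls -l` output line quoted above (size, timestamp, `gs://` path) could be turned into a plain URL list for a crawler. Assumptions, none of them confirmed in the log: the listing was made without `-h` so sizes are raw byte counts, the bucket is served at a hostname equal to its name, and the helper names `parse_listing_line` and `to_http_url` are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import quote

# Assumed shape of one object line: "<size> <UTC timestamp ending in Z> gs://<bucket>/<key>"
LISTING_RE = re.compile(r"^\s*(\d+)\s+(\S+Z)\s+gs://([^/]+)/(.+)$")


class ListedObject(NamedTuple):
    size: int      # size in bytes
    updated: str   # timestamp string, kept as-is
    bucket: str    # bucket name
    key: str       # object path inside the bucket


def parse_listing_line(line: str) -> Optional[ListedObject]:
    """Parse one object line; return None for anything else (headers, summary)."""
    match = LISTING_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    size, updated, bucket, key = match.groups()
    return ListedObject(int(size), updated, bucket, key)


def to_http_url(obj: ListedObject, scheme: str = "http") -> str:
    """Build a URL, assuming the bucket name doubles as the hostname."""
    # quote() percent-encodes unsafe characters but leaves "/" alone.
    return f"{scheme}://{obj.bucket}/{quote(obj.key)}"
```

Lines that are not object entries come back as `None`, so a caller can filter them out before handing the URL list to a crawl job.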
02:08:38 its either google or disney that did that
02:08:50 I agree JAA'
02:09:25 i do know there's a file in the cdn that should not be archived; it includes ftp logins
02:09:32 and two ftps do indeed work
02:09:41 and i dont want anyone messing with the ftps
02:10:49 TOTAL: 1508969 objects, 280062417599 bytes (260.83 GiB) for dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io
02:11:18 Are the URLs used by this CDN also used on other sites? Like, would a site embed dolimg-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io/en-US/blogs/wp-content/uploads/2013/03/thomas_pocahontas_imaginary_boyfriends.jpg directly? Or is it a duplicate of a file on a different site?
02:11:19 that was quick damn
02:11:22 Not that that's really a problem
02:11:36 To be clear: that's the time to list files, not download them :)
02:11:51 i honestly have no clue
02:12:08 i also really enjoy a look thru into /worldofcars/ within dolimg
02:12:13 and i did archive it
02:12:26 https://archive.org/details/world-of-cars-and-cars-site
02:17:27 TOTAL: 509858 objects, 48022996845 bytes (44.72 GiB) for go-60de6c82-be11-98e1-4d6c-c65a234eee95.disney.io. I'm going to run that as http instead of https since they both seem to be the same content, but https has a bad cert
02:17:42 cool
02:17:52 ok
02:33:35 An ArchiveBot job for Yandex.Q has been running since late June, but the site is very obviously too large for AB. There are pretty strict rate limits which trigger captcha page redirects.
02:33:56 Probably needs the DPoS treatment to get anywhere near decent coverage.
02:34:31 Worth noting that archivebot also extracts TONS of junk URLs on Yandex, which are ignored on that job but don't help with archivebot's performance
02:35:44 Hard to say how quickly we need to move here. I'd assume it's not extremely urgent, but whenever there's room for getting a project going, this would be good to do I think.
02:35:47 Cc arkiver OrIdow6 ^
02:36:59 So I presume we arent doing trove?
I would like to push a project for that too considering how large an archive it is
02:46:38 JustAnotherArchivist edited Deathwatch (+178, /* 2023 */ Add Яндекс Кью aka Yandex.Q): https://wiki.archiveteam.org/?diff=50316&oldid=50310
02:51:07 Agreed on Trove!
04:50:20 On Yandex, I could do it, though Egloos + Wysp and then Google Drive come first
13:03:08 It's not the most important site on the web by a long-shot but any chance we can get an archivebot job on https://www.koopatv.org/ ?
13:04:03 It's a parody blog based on mario characters running since 2013 that recently closed (marked as archived)
13:14:37 themadpro: i'll see what i can do, thanks
14:06:25 Hey; could use some info on how do I properly report website shutdown announcements over on the other room
14:09:06 qyxojzh: you can just post about it, just try to keep it compact with all the important details put together
14:13:27 Sorry for editing so much, I kept forgetting info
14:24:19 wait, did you post in the other room? I'm not seeing it
14:24:20 I maintain that the "no more than 5 words per year in #at!!!" often goes too hard, especially to newcomers
14:24:36 Pretty sure I did
14:24:47 OrIdow6: Pardon?
14:24:51 We should not have people thinking that this is a formal affair
14:25:17 (I must say I'm new, sorry)
14:25:20 qyxojzh|m: Not talking about you directly
14:25:41 Saying that we should stop telling people not to post too much in #archiveteam
14:25:41 I guess that's the continuation of a previous convo?
14:25:44 No
14:25:56 Oh now I see what you mean
14:26:07 Didn't connect the dots earlier… XD
14:26:25 :)
14:26:52 Anyhow yeah, it's not loading for me, could be something based on my location/location on the Internet maybe
14:27:10 OrIdow6: Sounds about right, because _I_ can access it
14:27:23 would a VPN help?
14:27:35 If you have sources on its being shut down qyxojzh|m those would be appreciated
14:27:57 OrIdow6: Yeah good idea, excuse me
14:27:59 There are probably other people for whom it loads correctly, I don't want to bother with that
14:30:12 OrIdow6: Turns out it was fake; it's not being shut down at all… Although it would still be a good idea to back the thing up, who knows
14:30:51 One of the teachers at the uni I am enrolled in shared the “news” with me and with others in the group I am in
14:32:48 On the topic of being a good thing to back up anyway, we were discussing backing up the online version of the national library of Australia a few days ago, so yeah, definitely
14:32:59 qyxojzh|m: Can I keep the link there even if it's not being shut down?
14:34:21 In #archiveteam? Yeah it's fine - FWIW I'm (as well as many other people here) using IRC, not Matrix, so I don't even see deletions
15:05:49 yay console.hetzner.cloud/ is down
15:06:58 or it is me
15:09:22 nvm
16:35:17 Bzc6p edited News+C/hu (-192, HVG 2021 done; update on video archiving): https://wiki.archiveteam.org/?diff=50317&oldid=49781
16:57:56 What do you think; in the near future, do you have a plan to preserve MEGA, if it happens to be shut down?
18:09:42 VickoSaviour: I don’t think much has been done in that regard but I could be weong
18:09:54 Why, is it shutting down soon?
18:10:04 Wrong*
18:21:56 well, i don't think it has enough proofs to be considered endangered, so it is safe.
19:53:00 "What do you think; in the near..." <- its also a PITA to archive, javascript fuckery at its finest that doesnt play back in the WBM
21:02:16 appears google is going to delete accounts unused for 2 years. this won't affect much as basically anything will reset the timer, and having done anything on youtube will mean your account will never be garbage collected, so the only thing this should affect are google drive links.
21:03:05 tbf that ship already sailed when old shares got made private-only, though.
21:43:03 "appears google is going to..." <- Last I checked it's for after the death of a Google user
21:54:57 Nope: https://www.blog.google/technology/safety-security/updating-our-inactive-account-policies/
21:55:32 Starting no sooner than December, they'll purge accounts that have been inactive for 2 years because security.
21:56:07 Not surprised tbh
21:56:32 "security"
21:57:14 For their financial security, certainly
21:57:56 https://www.karayou.com/ is still up tho the spam is starting to creep in lol
21:58:44 'security'
21:58:57 google has gotten very 'holy shit how much space are we using?' lately it seems
21:59:01 flashfire42: what's that?
21:59:09 and has turned up the purge dials
21:59:11 SeCuRitY
21:59:22 :D
21:59:29 fireonlive: wonder if it's anything to do with that fire in Paris
21:59:30 karayou was supposed to shut down... like january 1st or something?
21:59:34 hmmmm
21:59:39 "wait, how many HDDs did we have to replace??"
21:59:40 Its a site mentioned on the deathwatch on the archiveteam wiki qyxojzh|m which was supposed to shut down at the start of this year but obviously someone forgot to flip a switch and its now starting to get spammers
21:59:44 lolol
21:59:56 flashfire42: Deathwatch, what a name
22:00:05 Yeah, 2023年 (year) 1月 (1 month = January) 1日 (1 day = 1st)
22:00:32 So it's a Japanese website?
22:01:08 https://wiki.archiveteam.org/index.php/Karayou.com
22:01:14 Also looks like Kaotic found a buyer the video saying its going down is gone so
22:01:29 Chinese, but same characters
22:01:47 also has a fairly large porn section, though not all of it is porn, just keep that in mind if you browse it
22:02:07 especially if you can't read the subforum names :P
22:02:13 pokechu22: I _just_ might be able to help a little then
22:02:25 Where? So I can avoid it? Asking for a friend (it's fireonlive).
22:02:38 We did save the entirety of it before January 1st 2023
22:02:52 It's the last group of subforums if I recall correctly
22:03:05 qyxojzh|m: because 我明白一点儿中文
22:03:15 That section has an '18' in the name, so sounds about right.
22:03:16 sorry if that's a brag XD
22:03:17 成人專區(未滿18請勿進入) - the 18 specifically
22:03:23 x3
22:03:39 Anyway, yeah, I archived all threads in late December.
22:03:41 Yes, “do not come in if you're under eighteen” at the end
22:03:52 Kudos to JAA
22:04:11 🔞
22:05:00 https://www.uktrainsim.com/ closes in a month and downloads are behind a login wall. I just put the site into archivebot but I dont think it will grab the downloads in retrospect
22:28:02 https://server8.kiska.pw/uploads/27737bab092284a5/image.png Time to grab Modworkshop.net?