03:24:13 JustAnotherArchivist edited Deathwatch (-4, /* 2023 */ Definitive deadline for OneHallyu): https://wiki.archiveteam.org/?diff=51385&oldid=51384
04:13:13 from google alerts: https://mastodon.social/@stroughtonsmith/111602502197304440 :p
04:57:26 I'm trying to qwarc OneHallyu. It's very slow.
05:23:14 AQNB reopened a few hours ago and will shut down tomorrow (21st). Running through AB now.
05:24:09 wooh :)
05:25:11 The Analysis & Policy Observatory (APO) managed to secure a partnership and isn't shutting down after all. Would be nice to archive anyway, but not as urgent. (Heavy rate limits and bans thwarted the AB attempts.)
05:25:39 > After taking a well-deserved break, APO will be re-established in the new year. The website will remain open for you to search and browse our policy and research repository and collections.
05:25:44 Whatever 're-established' means exactly.
05:25:53 https://apo.org.au/FAQ
05:26:48 hm I think something about this whole "teraleak" stuff should be mentioned in our TestFlight wiki page
05:27:18 in particular since I and others did some indexing of what is in the archived data
05:28:10 Yeah, that would probably be a good idea. It was just public data, as I understand it (this was before my time here).
05:28:49 it was just a regular project
05:28:53 like all other projects
05:29:01 i did see arkiver2 as the committer :3
05:29:37 It wasn't even a listable S3 bucket probably, right? Just S3 URLs referenced on the website?
05:29:47 I don't know how discovery worked
05:29:55 the tons of attention and "teraleak" branding is mostly people who just found out the "web archiving is pretty cool, because it saves stuff, like games, that are later maybe not available"
05:30:04 found out that*
05:30:13 remember for us it's pretty normal and obvious
05:30:32 for many non-tech people out there it's a first time they really see something like this
05:30:43 Yeah, but it might be worth clarifying that we're archiving public data, not hacking into private systems or whatever.
05:30:45 I think good old scraping of previous projects found URLs like http://testflightapp.com/install/000059cc47865bcec060d67ddb11d30b-MTM2NjI4Ng/
05:30:53 JAA: yeah!
05:31:10 someone called it 'leak' initially as well, which caught on and wasn't helpful...
05:31:24 the code is completely public though, people can see there's no hack-y stuff in there.
05:31:30 yep :)
05:31:35 and from there you ended up at the ipa download link like https://testflightapp.com/dashboard/ipa/251aaeaaf0001ec906c157f2f31ddcbd-MTMxODQ0MzI/6280ca3ee10631fd6817100ffd1ee849-MTMzMzc3/
05:31:39 fireonlive: yeah honestly sounds like Discord shouting
05:31:40 which redirected to cloudfront or s3
05:31:48 arkiver: indeed
05:32:22 i'm very happy though nicolas17 is making good use of this, and we're getting attention :)
05:32:24 arkiver: Jason Scott went into the Discord to clarify things
05:32:31 although in a different way would have been better
05:32:34 nicolas17: yeah
05:32:34 arkiver: Assuming people (a) take the time to look for the code, (b) read the code, and (c) understand the code. :-)
05:33:06 I know someone who plans to do some analysis on the executables
05:33:08 JAA: yeah but if any "official bodies" get involved or look further into this, they'll find the code and how it's all working
05:33:13 Right
05:33:39 just an Archive Team project like all others :)
05:33:42 9 years ago
05:33:54 also this month i'm 10 years with Archive Team!
05:34:03 \o/
05:34:04 :D
05:34:08 she works at a company making a decompiler and I think other reverse engineering tools, so 70000 real-world iOS binaries is a goldmine of test cases
05:35:50 nicolas17: yeah :)
05:36:44 arkiver: also for people who want to do stuff on the whole dataset, they're like "wait what is this warc thing"
05:37:09 "you mean I don't have to do 70k requests to the slow web.archive.org hostname to download each individual file?"
05:37:34 'I can download over a terabyte at 5 kB/s instead? Amazing!' :-P
05:37:43 But yeah
05:38:00 some people who have the storage quickly found the torrents
05:38:05 Right
05:38:12 Are these torrents complete?
05:38:32 there are torrents?
05:38:36 of testflight
05:38:38 I don't have the storage, so I'm piping wget into an "extract what I need and throw it away" script :P
05:38:56 arkiver: We only set noarchivetorrent since a couple years ago, so I'd expect there to be torrents.
05:38:58 archive.org's autogenerated torrents work fine, the warcs are max 50GB
05:39:02 everyone should get a free 1PiB minimum :(
05:39:03 JAA: ah
05:39:12 as a human right!
05:39:21 although they have the usual problems of IA torrents
05:39:37 nicolas17: what are those problems?
05:39:48 textfiles edited the description on the items to clarify where they came from, and added a preview image
05:40:07 which re-generated the torrents
05:40:30 so now the old ones don't work anymore, or at least don't exchange data with people who got the new ones :P
05:40:45 i don't know if we needed that image
05:40:47 on all items
05:41:16 yeah that was questionable, but I think the xml file with the metadata (including description) is in the torrent too
05:41:17 I don't think we needed it on any item.
05:41:24 JAA: yeah
05:41:28 just the logo maybe on the collection
05:41:33 so image or not, editing the description would invalidate the torrent anyway
05:41:49 :(
05:42:12 Collections can have images, I think? That could've been the appropriate place.
05:42:36 * nicolas17 beds
05:42:39 And a link to the wiki page there would've been good, too.
05:42:50 collection description maybe? hm.
05:43:02 cu nicky
07:48:16 https://www.reddit.com/r/DataHoarder/comments/18mjqjd/the_master_tapes_for_all_of_reboot_have_been/
07:48:33 master tapes for “reboot” have been found
08:14:50 Archiveteam wiki down?
08:15:12 indee
08:15:13 d
08:20:00 Is it because of the TestFlight "leak" driving traffic to the website?
08:25:13 unsure; another AT wiki went down too
08:32:43 is that the https://internetarchive.archiveteam.org/ ?
08:35:28 indee
08:35:33 ..d
08:35:51 i see
11:24:51 Times of Israel should probably get archived, yeah? it has strict cloudflare in front of it so i bet generic efforts didn't get it
11:31:07 Quick question for the mind hive. Playing around with the ia command line tool for the first time. How do I set the period when the files were uploaded/published to download? I am trying to pull all the cdx.gz files for a given period and then process them with the cdxsummary tool. The command I am using is "ia search
11:31:08 'collection:archiveteam_telegram' --itemlist | xargs -r -n 5 ia download --glob '*.cdx.gz'". I assumed there would be a switch such as --date but that does not seem to be the case. If there is a better way to do this please do share. Thanks in advance!
11:40:24 Maybe I could use identifiers or something?
11:41:42 Would also like a way to export all of the cdx.gz download links for that period as I can create a script to run them through the cdxsummary tool
11:42:59 Any example being: https://archive.org/download/archiveteam_telegram_20231220082241_fa0afc34/archiveteam_telegram_20231220082241_fa0afc34.cdx.gz
15:57:15 out of curiosity, does #down-the-tube require any special permissions to use the bot?
15:59:50 checking #down-the-tube james without mod or voice could do so
16:04:39 magmaus3: technically, no permissions needed
16:05:26 but if you're unsure if some video/channel fits in the archival scope as documented in the wiki, ask an op to approve it before submitting
16:05:53 datechnoman: `ia search 'collection:archiveteam_telegram addeddate:[2023-12-01 TO 2023-12-20]' ...` for items created on those days. There's also `publicdate` (exact details of how that's set are unclear to me), and `oai_updatedate` allows to find items that had their most recent changes in some time window. You can use `null` instead of a date to make it an open range search.
16:08:30 A wild sketchy cow appeared.
16:08:59 but you didn't catch it fast enough
16:09:46 Quick, catch him now!
16:09:50 :-)
16:40:16 My OneHallyu grab has been running for a while now. It looks like it might be tight. Their server is very slow; I'm getting 4.5 to 6 seconds average response time. Hitting it even harder is unlikely to help.
16:40:53 Current ETA is just under 5 days. They're shutting down on the 25th...
17:42:00 Is that outline of AT Wiki planned?
17:43:59 Outage, you mean?
17:44:08 503 error
17:44:48 Not planned and being worked on as mentioned in the #archiveteam topic.
18:21:30 Totally planned.
18:21:32 We never miss
18:22:49 It's a planned unplanned outage.
18:23:25 We're ArchiveTeam, we always work with the assumption everything dies and goes down
18:23:28 Nothing surprises us.
18:25:33 The wiki is moving to Fandom so you can enjoy McDonald's ads and random unrelated gameplay videos along side your archival information!
18:31:29 All the information from the wiki is now available on our Discord server
19:04:05 Wiki is back. Quick, someone make an 'ansaleak' Twitter account for Yahoo Answers!
20:06:48 So, while I'm here, any other issues I need to be aware of?
20:07:03 I haven't abandoned you kids, I just went out for a pack of cigarettes
20:07:21 That should be my title: Archive Team Co-Founder, Went Out For Pack of Cigarettes
20:08:55 missing, presumed smoked?
20:53:28 i like this new title
20:54:51 theres a discord server
20:54:58 👀
20:55:06 absolutely not
20:55:07 No
20:56:28 https://lounge.kuhaon.fun/folder/62d856ed4f653aee/3dvas5.gif
20:59:57 \ msg fire *phew* that was a close one - lurker almost found out about the secret AT Discord server. See you in vc!
21:00:05 oh shit
21:01:13 Discord Server?
21:01:32 *Makes backups of emojis from steam giveaway server and leaves*
21:01:34 Now I am ready
21:17:04 👀
21:17:13 damn it null
21:58:32 https://support.google.com/groups/answer/11036538?hl=en uff
22:13:41 Flashfire42 edited List of websites excluded from the Wayback Machine/Partial exclusions (+32): https://wiki.archiveteam.org/?diff=51386&oldid=51376
22:15:42 Flashfire42 edited List of websites excluded from the Wayback Machine/Partial exclusions (+31): https://wiki.archiveteam.org/?diff=51387&oldid=51386
22:29:45 Flashfire42 edited List of websites excluded from the Wayback Machine (+23): https://wiki.archiveteam.org/?diff=51388&oldid=51361
22:34:00 checking Safari Tech Preview links against WBM now...
22:36:13 95 versions to go, and I started to get rate limited by WBM
22:39:41 nicolas17: Are you checking for truncation, too, or are these things below 2 GiB anyway?
22:40:24 for Safari I found one truncated yes
22:40:30 and included it in my list
22:40:45 Ok, good :-)
22:41:13 the headers were weird too
22:41:16 HTTP/2 200 content-length: 1048576 x-archive-orig-x-crawler-content-length: 19521864 x-archive-orig-content-length: 1048576
22:41:52 Beautiful
22:42:32 JAA thank you very much for that information. Really appreciate it :)
22:42:45 https://web.archive.org/web/20211208042307/http://appldnld.apple.com/Safari3/061-4602.20080416.t5rGb/SafariSetup.exe (truncated)
22:42:54 https://web.archive.org/web/20231220013857/http://appldnld.apple.com/Safari3/061-4602.20080416.t5rGb/SafariSetup.exe (complete, via archivebot yesterday)
22:55:50 JustAnotherArchivist edited Deathwatch (+287, /* 2023 */ Add Inside Imaging): https://wiki.archiveteam.org/?diff=51389&oldid=51385
23:00:51 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51390&oldid=51388
23:11:34 OneHallyu seems to be getting slower. I'm seeing an average response time of over 6 seconds now. ETA: not in time
23:13:43 other people archiving maybe? D:
23:16:37 Perhaps