00:12:35 https://linuxcontainers.org/lxd/
00:13:11 “The LXD project is no longer part of the LinuxContainers project but can now be found directly on Canonical's websites”
00:22:58 04:47:56 PM a new canonical location for LXD: canonical
00:22:59 :D
00:23:11 had to share here as well :p
00:26:18 Looks like the GitHub repo and site have already changed and the forums aren't going anywhere.
00:26:24 So probably nothing to archive here.
00:27:43 oh good
00:27:58 See #6 in https://discuss.linuxcontainers.org/t/lxd-is-no-longer-part-of-the-linux-containers-project/17593 re forum contents.
00:50:10 Hopefully Canonical doesn't try any shenanigans. We don't need another Red Hat lol
00:55:55 i can totally see that happening at some point
00:56:08 it's why we should all use the one true os, Debian™
01:38:07 Responding to replies: there is an item named AppleInternal on the Internet Archive that has existed there since 2021-05. MEGA does not log IP addresses according to the privacy policy.
01:40:15 the "demo apps" in the last MEGA link are those apps you see on an Apple Store iOS device, and they definitely have modifications of some sort to them.
01:40:30 yeah there's games restricted to initial levels only etc
01:41:32 Yes, I knew there were most likely only demo games, for example. Apple does not officially support free trial apps on the App Store.
01:42:18 JAA: can you easily check the contents of your KA WARCs, to see if a file is present in them? or do you have a file listing that matches your archive?
01:42:51 MEGA does not trace through individuals' personal MEGA folders, going as far as not even opting in to the PhotoDNA stuff.
01:44:00 JAA: "DWADragonsUnity/Android/3.31.0" seems new
01:44:34 oh is the bucket still around?
01:45:00 yes
01:45:04 but it's gonna take me like an hour to list it all and tell you if anything got deleted
01:45:43 nicolas17: Not new, I downloaded that on 2023-06-14.
01:45:57 e.g. DWADragonsUnity/Android/3.31.0/Mid/sound/dlgalphadeltathankyou
01:46:45 Grepping a 20 GB log file stored on HDD is fun. :-)
01:46:49 :D
01:47:27 "curl -I https://s3.amazonaws.com/origin.ka.cdn/DWADragonsUnity/Android/3.31.0/Mid/sound/dlgalphadeltathankyou" says "Last-Modified: Fri, 30 Jun 2023 13:43:44 GMT"
01:48:22 Yeah, that file is new.
01:50:11 there's like 36 new files in the 3.31.0 directory
01:50:30 Mhm
01:51:00 When were the most recent changes?
01:53:34 DWADragonsUnity/Steam/3.31.0 has similar changes
01:56:45 JAA: https://paste.debian.net/1284958/ (still incomplete)
02:00:29 I should have those 2023-06-28 changes already. Looks like I relisted the bucket at 17:00 that day.
02:04:21 Interesting that they just added a couple of files though.
02:04:36 I'll rerun my stuff, but that'll have to wait until tomorrow.
02:06:34 same deal on WIN as I expected
02:07:43 how well did the deduplication work btw?
02:08:30 2023-06-28 17:01:28 UTC <@JAA> 32.23 TiB downloaded into 3.22 TiB of WARCs :-)
02:08:44 excellent
02:10:23 *finger tenting*
02:51:16 JAA: https://paste.debian.net/1284960/
02:53:02 nicolas17: Thanks, so ~50 files to grab I guess. Will do.
03:00:24 thanks to JAA too!
05:06:38 https://business.twitter.com/en/blog/update-on-twitters-limited-usage.html
05:08:26 Are they trying to build anti-scraping and anti-automation technology? (Might help to have employees for that.)
05:11:06 >At times, even for a brief moment, you must slow down to speed up.
05:14:14 sounds like an empty platitude musk would come up with
06:19:20 Unauthed tweet reads are available again, without replies. Profiles still require login.
06:21:56 Sometimes you need to kill a few dozen monkeys to maybe not kill some humans -Musk
06:25:48 :O
06:27:41 can confirm, unauth'd is available :o
07:01:02 Maybe they figured out that their search visibility was tanking. https://www.theverge.com/2023/7/3/23783153/google-twitter-tweets-changes-rate-limits
07:10:38 SPN works if you give it a tweet address. But yeah, it's only showing a single tweet unauthed. Even if the tweet is itself a reply, it doesn't show the tweet it's replying to. Hard to follow a thread that way. At least quote tweets show up OK.
10:43:36 imer: Did you get all that Apple software downloaded, or do you want my friend to send you any of it?
10:43:44 Or can he delete it?
11:42:35 icedice: got all of it as far as I know, handed it off to nicolas17 for uploading since he knows way more about Apple archival stuff than me
11:45:37 Ok, I'll tell my friend to delete the local copy
11:45:47 He uploaded it encrypted to Google Drive as well
11:46:21 nice, thanks :)
11:52:30 PaulWise edited Mailman2 (+41, strace-devel mailing list): https://wiki.archiveteam.org/?diff=50123&oldid=50070
14:17:37 So the Pokémon Community Forums are getting moderated now. Guess we got the only copy of a lot of those now-deleted topics.
14:22:28 pour one out for the forklift operators
15:36:33 weird how we often get stuff just at the right time..... (one of my crawls is the only surviving copy of a majority of a forum, too)
15:53:38 nice!
15:54:21 “who up rn pokin their mon” will live on in infamy :3
16:03:06 Let's see if it goes on to rival 'How is babby formed?'. :-)
16:17:42 Has anyone written about or scraped cohost.org yet?
16:17:53 https://hackers.town/@lori/110662357504916417 I just saw this reply in a long thread about their finances; it sounds a bit dire for cohost
16:18:06 tumblr is easy to archive but I have no idea where to start with cohost.
16:21:12 https://github.com/valknight/Cohost.py
16:21:30 seems like a good starting point to save individual "projects", but if that's a user page I'm not sure
16:36:04 JAA: love that one :)
16:53:14 ok, I've got something for cohost. it's a bit hacky but it would work
16:53:58 pull a user's profile posts from their API (json, save it), iterate over every available post, and save those individual posts (because the individual posts have comments from ALL shared copies of the post as well); assuming you're using wget, the individual images of each post should be saved too
16:54:07 each post could be saved into its own separate folder based on the user and post ID
16:54:15 user/post/post-name
16:54:39 if you're not logged in to a cohost account, you may get an error on user accounts that deny any posts being visible to "guests"
16:55:07 fastest way I can see to immediately start archiving stuff, ignoring that Python setup, because I do not know Python well enough
16:55:52 iterating over pages, basically. so it's a bit of a crapshoot
16:55:57 you could do "for i in {1..100}"
20:06:16 so, i fear we've missed the mark on twitter, but that doesn't mean it's all over
20:06:16 i suspect many people have kept backups of their messages -- before nuking their accounts, for example.
20:06:16 and many might have kept backups of some conversations, maybe enough to cover the blind spots of whatever the Wayback Machine managed to scrape off.
20:06:54 about that, the newest snapshots just don't show messages at all... i wonder if it's because the messages are actually missing or it's just the client tripping up
20:09:51 also somewhat ironic, the link in the MOTD won't work anymore
20:51:18 FavoritoHJS: What mark did we miss?
20:51:29 (I've not been paying attention to twitter for a bit)
20:51:45 (I know Musk is trying his hardest to kill it, much like reddit's u/spez)
20:52:04 basically, twitter seems to have set a pretty low limit on the maximum tweets viewable, and many people are leaving and nuking their posts
20:52:35 good luck archiving a billion posts when all you can see in a day is a thousand
20:53:35 FavoritoHJS: Where are you pulling the 1b number from?
20:54:01 guesstimate, could be lower, probably is higher
20:54:23 The most recent official number was ~500 million tweets per day.
20:54:29 Yeah
20:54:31 https://www.omnicoreagency.com/twitter-statistics/
20:54:36 So my guess would be "many billions"
20:54:42 And you're probably off by an order of magnitude, or more
20:55:00 Probably a couple trillion over the lifetime of Twitter.
20:55:17 disgruntled engineer makes an open bucket when?
20:55:23 lol
20:55:28 Hahaha, even with that we couldn't do it all
20:55:33 Not in this timeframe
20:55:33 xD true
20:55:50 AT overloading AWS 2: Electric Boogaloo
20:55:53 Also, consider we usually try to get rendered pages, not just the data
20:55:56 😂
20:55:59 Which... yeah
20:56:05 short of a surprise db tape shipment, this is game over
20:56:17 The game never started really.
20:56:24 'an AWS Snowmobile appears outside rewby's house'
20:56:26 It's just a ridiculous order of magnitude.
20:56:28 Can't lose a game you never played *taps forehead*
20:56:34 :-)
20:57:02 Fun fact: IA doesn't fit into a Snowmobile anymore.
20:57:10 damn lol
20:57:21 fine fine fine. *2* Snowmobiles
20:57:27 :D
20:57:41 A Snowmobile is 100 PB; IA is 135 PiB or something around there.
20:57:42 I don't actually remember the numbers.
20:57:53 rewby goes to get coffee and there's a love note from twittereng
20:58:17 136.5 PiB
20:58:19 Snowmobile. Proving once again that there is nothing in terms of bandwidth like a semi full of drives
20:58:27 yeeeep
20:58:48 1 Pbit fibre for everyone when?
20:59:03 Now imagine if it were filled with LTO-8 tape instead.
20:59:21 i like to imagine they're just all loose in a big pile
20:59:40 JAA: Density is actually higher on spinners these days
20:59:57 16TB drives these days, right?
21:00:03 16? Try 20+
21:00:09 I have a box of 6x20T sat on my desk at work
21:00:26 They're only like 350 each too
21:00:31 (euro)
21:01:15 Seagate will sell you 22T drives for 390
21:02:18 My God. What can we expect in a few years? PB drives?
21:02:47 imagine scrubbing those
21:03:02 Hmm, true.
21:03:07 I would like to point out that with 22T drives and a 90-bay chassis, you could feasibly fit 2PB in 4U
21:03:08 Tape would just be cheaper.
21:04:26 I'll give you that
21:04:27 Yeah, if you run 22T drives, you'd better have some hot spares for when you need to rebuild a drive.
21:04:30 But also so much slower
21:04:52 Sure, it'd be 2PB raw
21:04:53 Depending on your benchmark for 'slower'.
21:04:58 But even if you lose 1/5
21:05:06 LTO-9 writes at 400 MB/s.
21:05:17 Latency sucks though, yeah. :-)
21:05:34 But as a 'put the data on a shelf and keep it safe' thing, it's pretty nice.
21:05:35 Combination of latency and the number of drives needed
21:05:40 Oh definitely
21:05:43 But as an active way of moving data
21:05:47 Eeehhh
21:05:51 Yeah
21:05:57 Like, sure, an HDD is half the speed
21:06:03 But I can write to 90 of them in parallel
21:06:07 In a single 4U chassis
21:06:13 Try that with tape in the same format
21:06:19 format = size envelope
21:06:41 And if we're talking about automation, the density of tape robots is significantly lower than HDDs
21:07:17 Right
21:08:52 i could use some 2PB storage at home :3
21:09:08 just need some more funding lol
21:09:44 Supposedly the Twitter rate limits are temporary, though who knows. Not sure if there's a way to get search results unauthed now that you can view a tweet unauthed. It would be cool to have a bot you could send tweet URLs to, like we have with Imgur.
21:10:17 Maybe we can tweet Musk?
21:10:18 if anyone finds any workarounds, pls keep them out of the publicly logged channels and PM arkiver
21:10:25 ^
21:10:32 razul: Please don't
21:10:40 Oh, LTO-10 is expected this or next year. 36 TB per tape. Neat.
21:10:59 is that real-TB or inflated TB
21:11:09 Real, I ignore the other number.
21:11:12 nice :D
21:11:33 that'd be great for some backups
21:11:39 ofc test your backups etc lol
21:13:57 i wonder what the suggested verification is for tape
21:18:30 JAA: that's a good point, I wonder why Snowmobile doesn't use tapes
21:18:48 i have severe doubts about the long-term reliability of >4TiB drives...
21:18:53 I guess they would need too many drives to run in parallel
21:19:34 wait a minute, i recall reading a Backblaze blog post and they weren't too bad...
22:02:41 I wonder if trying to make a tweet backup dump would be worthwhile...
22:39:48 28 tapes per PB. At that point, they'd run Snowmobile out of a hatchback
23:13:26 Working on a web crawling/indexing project - would there be interest in me regularly dumping found mediafire/telegram/etc URLs into their respective projects?
23:13:54 DigitalDragon: sure!
23:23:38 JustAnotherArchivist edited List of websites excluded from the Wayback Machine/Partial exclusions (+35, Add https://www.swisstransfer.com/d/): https://wiki.archiveteam.org/?diff=50124&oldid=50036
23:24:38 Exorcism edited Deathwatch (+104): https://wiki.archiveteam.org/?diff=50125&oldid=50121
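[Editor's note: a quick sanity check on the storage figures quoted in the log: 90 bays of 22T drives in 4U, 28 LTO-10 tapes per PB, and IA at 136.5 PiB vs a 100 PB Snowmobile. Drive/tape sizes use decimal TB/PB as the vendors do; the IA figure is binary PiB.]

```python
import math

TB = 10**12   # decimal terabyte, as drive/tape vendors count
PB = 10**15   # decimal petabyte
PiB = 2**50   # binary pebibyte, as the IA figure is quoted

# 90-bay 4U chassis full of 22T drives: ~2 PB raw, as claimed.
raw_4u = 90 * 22 * TB              # 1.98e15 bytes

# LTO-10 at 36 TB/tape: tapes needed per decimal PB.
tapes_per_pb = math.ceil(PB / (36 * TB))   # ceil(27.78) = 28

# IA at 136.5 PiB vs a 100 PB Snowmobile: indeed doesn't fit.
ia_bytes = 136.5 * PiB             # ~153.7 PB
```

The 28-tapes-per-PB line checks out, the "2PB in 4U" claim is 1.98 PB raw, and 136.5 PiB works out to roughly 153.7 decimal PB, comfortably over a single Snowmobile's 100 PB.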