00:16:30 https://hackaday.com/2023/06/09/they-used-to-be-a-big-shot-now-eagle-is-no-more/
00:16:53 ^ Autodesk bought EAGLE
00:17:52 in 2017
00:30:08 ah, ooops
00:37:42 Switchnode edited Deathwatch (+116, /* 2023 */ update harvard blogs): https://wiki.archiveteam.org/?diff=50102&oldid=50087
01:43:38 update on twitter: musk has set a rate limit. all is lost.
01:44:01 yeeep
01:44:17 we've been kvetching about it all day :(
01:44:33 Is the CDN rate limited too?
01:44:42 We might be able to say fuck you and grab the images
01:44:45 honest question, why did y'all not start yesterday?
01:44:47 wait...
01:44:59 the api, does it have rate limits?
01:45:00 flashfire42: See 20:25
01:45:01 flashfire42: where do we get image links?
01:45:31 yes, most users are locked out, but if only one was unfortunate enough to get """"verified""""...
01:45:46 Bing, Reddit, Google, Baidu, Yandex, URLTeam dumps
01:46:06 FavoritoHJS: then that user could abuse their auth token and private API to get... 6000 tweets in a day?
01:46:10 But yeah, brute forcing is infeasible
01:46:21 oh, it has rate limiting...
01:46:28 No, it's a whopping 10k now!
01:46:36 (Out of... 500 million tweets per day?)
01:46:46 if even bots have read limits then all is indeed lost
01:46:53 Wasn't it like 3k per second of tweeting?
01:47:07 ...account spam?
01:47:11 kiska: when did you see that? everything changed in the last 24 hours
01:47:11 500 million would be 5.8k/s.
01:47:21 oh, you mean new tweets
01:47:24 But who knows what it is now.
01:47:34 600 per user, that would be a million users
01:47:35 so no
01:47:43 archive all of twitter in one day?
01:47:44 FavoritoHJS: bots are asked to pay $40k/mo to do anything remotely heavy-duty
01:48:07 literally impossible haha
01:48:11 using the bot API has *worse* limits than abusing a logged-in account
01:48:12 and what are the limits on said "remotely heavy-duty"?
01:48:32 Not very high
01:48:56 welp, better hope for a db leak, as i fear we've missed the boat
01:49:48 that would be a massive-ass torrent
01:49:48 FavoritoHJS: the "basic" level costs $100/month and lets you read up to 10000 tweets per MONTH
01:49:49 xD
01:50:17 https://developer.twitter.com/en/portal/petition/essential/basic-info
01:50:27 'ok everyone please seed my 4933PB torrent'
01:51:14 1. Introduce ridiculous API pricing. 2. Everyone starts scraping. 3. Introduce ridiculous view limits. 4. ??? 5. PROFIT, I guess?
01:51:26 yea, that would be a problem... maybe not that large once finished, after removing the endless spam that twitter never removed, but actually getting to that point?
01:52:31 this leak better include all the hot gay twitter accounts tbh, otherwise what's the point
01:52:35 :3
01:53:23 (not that large = probably 4PB, or 6 imgurs, at least)
01:53:30 If a bunch of AI companies were actually scraping Twitter, hopefully one of them publishes the data.
01:53:31 Hah!
01:53:46 Thinking that twitter is only 4PB big, hahah
01:53:54 text only, maybe?
01:54:01 when compressed to hell and back
01:54:01 Google+ was some 1.5 PB and we didn't archive it fully
01:54:10 I guess we could have a bot to let people submit image URLs from their private scrapes.
01:54:32 reddit is about 3PB atm
01:54:38 sorry, 3PiB :D
01:59:14 about scraping the AI scrapers... it could work, but i'm fairly certain they didn't bother to preserve such minor details as poster or post time
01:59:56 how much is already archived, either as Wayback Machine pages or manual backups?
02:00:06 Tons
02:00:42 is there a way to take your data out of twitter?
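For scale, here is the arithmetic behind the numbers thrown around above, as a rough shell sketch. Every input (500M tweets/day, 600 reads per unverified account, $100 per 10k tweets on the "basic" API tier) is a figure quoted in the channel, not an official number:

```sh
# Napkin math for the rate-limit discussion; all inputs are estimates.
echo $((500000000 / 86400))        # ~5787 -> the "5.8k/s" of new tweets
echo $((500000000 / 600))          # ~833333 accounts at 600 reads each,
                                   # just to cover ONE day's tweet volume
echo $((500000000 / 10000 * 100))  # ~$5,000,000 of "basic" API quota
                                   # ($100 per 10k reads) for that same day
```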
02:02:30 also someone please move twitter to Alarm in `Alive... OR ARE THEY`
02:02:58 Assuming it still works: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive
02:03:33 what does the "and more." at the end contain
02:04:08 sure hope it would be pointers to replies, since that could mean known spam accounts can be skipped over
02:04:11 The text content of all tweets alone may well be a PB. And that's before all the metadata around it.
02:04:30 Maybe not quite, but closer than you might think at first.
02:05:02 500 million tweets a day for 17 years at 100 bytes per tweet would be 310 TB.
02:05:38 Now add all the usernames, timestamps, links, and even before touching any images or videos, you're easily in the petabytes.
02:05:51 With media, well...
02:07:08 100 bytes per tweet? I think you're being optimistic there
02:07:27 i mean, for most of twitter's existence the limit was 140 characters
02:07:37 No idea about the actual average, but yeah, that.
02:07:46 in latin-based alphabets a character takes a single byte
02:07:59 Fermi estimate, mkay? :-)
02:08:08 Ok :D
02:08:13 so 100 bytes seems reasonable for lightly-compressed tweets
02:09:46 are we excluding UTF-8 here
02:10:23 💩
02:11:26 yes, but since ASCII characters (which cover English and are a significant component of Latin-script languages) only take a single byte, that's probably fine?
02:12:05 the big problem, i think, would be reducing redundancy between tweets while keeping the archive browseable
02:12:49 you could store all the info about each tweet, but that would require storing the username and timestamp and id of each tweet...
02:13:10 but most users would probably tweet multiple times, and probably in bursts
02:15:22 the eggplants must live on!
02:30:27 /!\
02:30:58 I think something got deleted from the tb2b.eu FTP
02:31:22 wait no, I got a temporary connection error listing one particular subdirectory... retrying
07:44:52 Are there any things I would probably be unhappy with if I used the `--convert-links` option with wget?
07:44:59 Seems a little weird to rewrite the links in the saved HTML
08:09:03 now i really regret not starting to archive Twitter years ago :/
09:12:16 Bzc6p edited ЯRUS (+14, not everyone knows Cyrillic): https://wiki.archiveteam.org/?diff=50103&oldid=50095
09:40:58 Gfycat's ending service September 1st
10:46:05 See #deadcat
12:07:28 how does twitter not do ipv6, smh these companies are all useless
13:29:33 someone might want to add gfycat to the channel topic in #archiveteam
14:12:10 Arkiver edited Deathwatch (+554, Add mudrunnermods.com): https://wiki.archiveteam.org/?diff=50104&oldid=50102
14:29:55 has anyone mounted a separate volume to store data on for the docker images of the project?
14:31:02 i am using "--mount type=bind,source=/data2/docker/project,target=/grab/data" and still seeing the main drive increase in size along with the secondary mounted drive
14:31:26 there was some talk about using tmpfs for the grabbers: (quoting J_AA) "With the project images and Docker: `--mount type=tmpfs,tmpfs-size=2G,destination=/grab/data`"
14:31:48 this is for youtube, so i don't have enough ram to hold the videos
14:31:55 ah
14:32:20 my / is a 100GB SSD and i have a second rust drive set up for /data2
14:32:52 when using that --mount, i am still seeing / fill up even though i am seeing data written to /data2
14:34:51 yeah, not sure. can only see usage in /grab/data here, although i'm not mounting that anywhere
14:35:00 are files appearing in the mounted dir?
14:35:14 yes
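If / keeps filling despite the bind mount, a likely culprit is Docker's own container logs under /var/lib/docker (as noted at 19:09 below). A hypothetical invocation combining the two fixes suggested in this log; the image name is a placeholder:

```sh
# Sketch only: bind-mount the grab dir onto the big rust drive, and cap
# the JSON logs Docker writes to /var/lib/docker on / regardless of mounts.
docker run -d \
  --mount type=bind,source=/data2/docker/project,target=/grab/data \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  example/project-grab   # placeholder image name
```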
15:06:19 Yts98 edited Current Projects (+448, Propose Banciyuan, Skyblog, Xuite): https://wiki.archiveteam.org/?diff=50105&oldid=50072
15:12:57 yts98: we're on it btw with banciyuan
15:15:31 arkiver: do you mean we're working on banciyuan with stwp (that's what i know), or are we cooperating with banciyuan officially?
15:16:20 "we" is archiveteam
15:16:42 now that Misty|m is completely finished with their part, we'll kick off the project at AT
15:18:27 sorry for the bad grammar comprehension. my "propose" also includes upcoming projects.
15:19:36 no worries
15:50:39 hey, could someone please archive http://www.dlammiehanson.com/ for me?
15:50:39 there's a wix site behind it (http://www.dlammiehanson.wix.com/) but the links from the navigation (targeting the wix site) end up on the correct domain. not sure how this will mess with AB
16:00:10 !a http://www.dlammiehanson.com/ --igset blogs,badvideos -e 'for manu|m'
16:00:14 wrong place lol
16:00:48 it got lost in #archivebot
17:09:46 Wickedplayer494 edited Current Projects (+154, Add Gfycat to upcoming): https://wiki.archiveteam.org/?diff=50106&oldid=50105
17:23:52 Hi, the Japanese ASMR video host zowa.app will send all its content to /dev/null on September 29, 2023 (https://note.com/zowa/n/ndf5d4f158589). Looking at the ID of the last video, there are currently no more than 23589. New content uploads will stop on July 30; looks like a good candidate for archiving
17:26:02 Probably we want to wait until new content stops before archiving, but I'll add it to Deathwatch
17:27:50 Pokechu22 edited Deathwatch (+163, /* 2023 */ ZOWA): https://wiki.archiveteam.org/?diff=50107&oldid=50104
17:29:50 if the IDs are consecutive numbers, we could just go along, as new content might still be added
17:31:51 thanks nyuuzyou :)
17:36:30 Just came across the fact that Speedtest.net results are public and sequential. They include (for the recent ones at least) ping time, up and down speed, and time.
17:39:15 Maybe something worth archiving to get historical data
17:39:26 Goes at least back to 2010
17:41:56 I've been able to go down to 2007, https://www.speedtest.net/result/110000000
17:44:31 However, nonexistent IDs seem to trigger a 500
18:00:11 is this where twitter archiving is being discussed?
18:01:41 if you have findings about archiving twitter, talk in private to arkiver. Otherwise, if it's public knowledge already, go on, you're at the right place :)
18:07:50 what sort of thing would need to be said in private?
18:09:29 i'm quoting the man: @arkiver | Got interesting information on Twitter? PM me! Do not dump the information in a publicly logged channel.
18:15:19 guest: bypasses, methods, etc
18:15:29 Would the team be OK with archiving the Mersenne primes dataset (https://mersenne.org)?
18:15:29 I was going to ask about ways to save tweets. I was also going to post some tools that kinda work; i'm not the one who found them, but i don't know how public that knowledge already is.
18:15:55 best to PM arkiver in these trying times
18:15:59 i'm guessing it should be said in private to prevent people from overusing it?
18:16:16 guest: See the wiki for what is already public knowledge
18:16:42 guest: https://wiki.archiveteam.org/index.php/Twitter
18:17:17 ok, my stuff isn't mentioned there
18:18:57 ah ok, pls pm the arkiver in that case
18:19:07 it's not a usual thing, but twitter is very… unusual
18:21:04 so
18:21:05 uh
18:21:07 Twitter
18:21:32 is there a dedicated channel for it?
18:22:23 no, just anything about bypasses etc, pm to arkiver
18:23:14 if elon makes a tweet saying you need to fart in a jar and mail it to him for 100 more tweet views, then here is fine
18:25:18 noted
18:25:35 :)
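Circling back to the sequential-ID idea from 17:36-17:44: a minimal, hypothetical sketch of walking speedtest.net result IDs, assuming (as reported above) that nonexistent IDs return a 500. The ID range and delay are made up for illustration:

```sh
# Sketch only: keep pages that return 200, skip the 500s from ID gaps.
for id in $(seq 110000000 110000100); do
  code=$(curl -s -o "result_$id.html" -w '%{http_code}' \
    "https://www.speedtest.net/result/$id")
  if [ "$code" != "200" ]; then
    rm -f "result_$id.html"   # gap or error; nothing worth keeping
  fi
  sleep 1                     # stay polite
done
```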
18:26:25 good thing I saved the profiles of the people I followed *before* musk entered and musk'ed around
18:27:20 i really should've started saving stuff when musk bought twitter
18:27:35 i knew stuff would get messed up, but i didn't start till a day ago.
18:27:39 right before the rate limit hit
18:40:33 https://www.youtube.com/watch?v=OuFv1A7fEOE
18:40:40 people already speedrunning the ratelimit LMAO
18:52:18 lmao
18:52:32 guest: :-) We did start saving shit before the shitstorm happened. You can't rely on a single entity. If you ask my personal opinion, this is the prime example of twitter digging their own grave
18:54:19 gotta love socialbot/Aramaki :)
18:54:26 this is a blvd for the fediverse
19:09:16 fuzzy8021: Docker's log files would still end up in /var/lib/docker, i.e. on your /.
19:13:58 --log-opt max-size=10m ;)
19:20:20 i really really need to remember to do that
19:20:42 (took me longer than i'll admit to finally do it as well)
19:47:41 One thing that occurred to me: there are some big text-only archives like https://archive.org/details/twitterstream, so we could complete those by grabbing the media links.
19:48:18 And people with their own archives should probably be encouraged to upload them to IA.
19:53:18 DigitalDragon edited Twitter (+249, add ratelimit information): https://wiki.archiveteam.org/?diff=50108&oldid=50101
19:53:19 AlbertLarsan68 edited Skyblog (-2, Switch Not Yet to Upcoming): https://wiki.archiveteam.org/?diff=50109&oldid=50003
20:03:31 also this: https://archive.org/details/archiveteam_twitter
20:07:38 And anything run through socialbot typically skipped videos.
20:12:28 There are some sites that specifically existed to archive tweets, but i don't remember what they are.
20:12:52 and i imagine they might have trouble saving anything new right now.
20:18:53 archive.today is fucked for twitter atm
20:42:26 https://pbs.twimg.com/media/F0CJ2SEWIAAAO-X?format=jpg&name=orig
20:42:29 nooooo way lol
20:49:36 i hope that's not real
20:50:07 who is gg551015
20:53:49 it's (supposedly) from Blind (https://en.wikipedia.org/wiki/Blind_(app)), where you verify with your work email, so the twitter logo there verifies they at least (once) worked there
20:54:08 supposedly there are reverifications, but not sure how often
20:54:39 any confirmation this is true and not a fabricated image?
20:55:31 tried a bit to find it but no luck so far
20:55:56 (here's an example post where it shows who works where; I guess mobile is different? https://www.teamblind.com/post/Twitter-bug-causes-self-DDOS-tied-to-Elon-Musks-emergency-blocks-and-rate-limits-Its-amateur-hour”-csF0MkWZ )
20:56:48 seems like a lot for logs, but i will try limiting them
21:01:00 you can choose whatever you wish, i suppose, fuzzy8021
21:01:12 i woke up to like a 20GB one one day lol
21:39:11 I found out that adding a forward slash "/" at the end of a url technically counts as a different url on some archiving sites like archive.is (but apparently not on archive.org), so keep that in mind when searching for twitter archives or saving them.
21:39:44 I think it depends on the configuration of the webserver, how it deals with trailing slashes.
21:54:30 yep, technically GET /something and GET /something/ are not the same HTTP request
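To make the trailing-slash point concrete: when hunting for existing captures, it can be worth querying both forms, since some archives index them separately. A small sketch against the Wayback Machine availability API; the status URL is just an example:

```sh
# The request lines differ ("GET /something" vs "GET /something/"),
# so check both variants for existing snapshots:
for u in "https://twitter.com/example/status/123" \
         "https://twitter.com/example/status/123/"; do
  curl -s "https://archive.org/wayback/available?url=$u"
  echo
done
```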