Autodesk bought EAGLE in 2017
update on twitter: musk has set a rate limit. all is lost.
yeeep
we've been kvetching about it all day :(
Is the CDN rate limited too?
We might be able to say fuck you and grab the images
wait...
honest question, why did y'all not start yesterday?
the api, does it have rate limits?
flashfire42: See 20:25
flashfire42: where do we get image links?
yes, most users are locked out but if only one was unfortunate enough to get """"verified""""....
Bing. reddit. google, baidu, yandex, URLteam dumps
FavoritoHJS: then that user could abuse their auth token and private API to get... 6000 tweets in a day?
But yeah brute forcing is infeasible
oh, it has rate limiting...
No, it's a whopping 10k now!
(Out of ... 500 million tweets per day?)
if even bots have read limits then all is indeed lost
Wasn't it like 3k per second of tweeting?
...account spam?
kiska: when did you see that? everything changed in the last 24 hours
500 million would be 5.8k/s.
oh you mean new tweets
But who knows what it is now.
600 per user, that would be a million users
so no
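A rough back-of-envelope check of the figures above. It assumes the 500 million tweets/day and 600 reads per account per day quoted in the chat, neither of which is verified; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_second(tweets_per_day):
    """Average posting rate implied by a daily tweet count."""
    return tweets_per_day / SECONDS_PER_DAY


def accounts_needed(tweets_per_day, reads_per_account_per_day):
    """Accounts required to read one day of tweets under a per-account cap.

    Uses ceiling division so a partial account counts as a whole one.
    """
    return -(-tweets_per_day // reads_per_account_per_day)


# Assumed inputs, taken from the discussion rather than from any official source.
DAILY_TWEETS = 500_000_000
READ_LIMIT = 600

print(f"{tweets_per_second(DAILY_TWEETS):.0f} tweets/s")
print(f"{accounts_needed(DAILY_TWEETS, READ_LIMIT)} accounts to keep up")
```

Under those assumptions it takes several hundred thousand accounts just to keep pace with new tweets, before touching any backlog.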
archive all of twitter in one day?
FavoritoHJS: bots are asked to pay $40k/mo to do anything remotely heavy-duty
literally impossible haha
using the bot API has *worse* limits than abusing a logged in account
and what are the limits on said "remotely heavy-duty"?
Not very high
welp, better hope for a db leak as i fear we've lost our boat
that would be a massive-ass torrent
FavoritoHJS: the "basic" level costs $100/month and lets you read up to 10000 tweets per MONTH
xD
https://developer.twitter.com/en/portal/petition/essential/basic-info
'ok everyone please seed my 4933PB torrent'
1. Introduce ridiculous API pricing. 2. Everyone starts scraping. 3. Introduce ridiculous view limits. 4. ??? 5. PROFIT, I guess?
yea that would be a problem... maybe not that large once finished by removing the endless spam that twitter never removed, but actually getting to that point?
this leak better include all the hot gay twitter accounts tbh otherwise whats the point
:3
(not that large = probably 4PB or 6 imgurs at least)
If a bunch of AI companies were actually scraping Twitter, hopefully one of them publishes the data.
Hah!
Thinking that twitter is only 4PB big hahah
text only, maybe?
when compressed to hell and back
Google+ was some 1.5 PB and we didn't archive it fully
I guess we could have a bot to let people submit image URLs from their private scrapes.
reddit is about 3PB atm
sorry 3PiB :D
about that scraping the ai scrapers... could work but i'm fairly certain they didn't bother to preserve such minor details as poster or post time
how much is already archived either as wayback machine pages or manual backups?
Tons
is there a way to take out your data from twitter?
also someone please move twitter to Alarm in `Alive... OR ARE THEY`
Assuming it still works: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive
what does the "and more." at the end contain
sure hope it would be pointers to replies since that could mean known spam accounts can be skipped over
The text content of all tweets alone may well be a PB. And that's before all the metadata around it.
Maybe not quite, but closer than you might think at first.
500 million tweets a day for 17 years at 100 bytes per tweet would be 310 TB.
Now add all the usernames, timestamps, links, and even before touching any images or videos, you're easily in the petabytes.
With media, well...
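The text-only estimate can be reproduced with a few lines. This is only a sketch using the chat's own assumptions (500 million tweets/day held constant for 17 years, 100 bytes per tweet), which likely overstate the early years and understate the per-tweet size; `text_bytes` is a hypothetical helper.

```python
def text_bytes(tweets_per_day, years, bytes_per_tweet):
    """Total bytes of tweet text, ignoring leap days and all metadata."""
    return tweets_per_day * 365 * years * bytes_per_tweet


total = text_bytes(500_000_000, 17, 100)
print(f"{total / 1e12:.2f} TB of raw text")

# The same volume at a larger, purely illustrative per-tweet size
# (text plus usernames, timestamps, links) crosses into petabytes.
with_metadata = text_bytes(500_000_000, 17, 1000)
print(f"{with_metadata / 1e15:.2f} PB with ~1 kB per tweet")
```

The 1 kB figure is an assumption chosen only to show how quickly metadata moves the total from hundreds of terabytes to petabytes.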
100 bytes per tweet?
Propose Banciyuan, Skyblog, Xuite
yts98: we're on it btw with banciyuan
arkiver: do you mean we're working on banciyuan with stwp (that's what i know), or we're cooperating with banciyuan official?
"we" is archiveteam
now that Misty|m is finished completely with their part, we'll kick off the project at AT
sorry for bad grammar comprehension. my "propose" also includes upcoming projects.
no worries
hey, could someone please archive http://www.dlammiehanson.com/ for me?
there's a wix site behind it (http://www.dlammiehanson.wix.com/) but the links from the navigation (targeting the wix site) end up on the correct domain. not sure how this will mess with AB
!a http://www.dlammiehanson.com/ --igset blogs,badvideos -e 'for manu|m'
wrong place lol
it got lost in #archivebot
Add Gfycat to upcoming
Hi, the Japanese ASMR video hosting site zowa.app will send all its content to /dev/null on September 29, 2023 (https://note.com/zowa/n/ndf5d4f158589). Looking at the ID of the last video, there are currently no more than 23589 videos. New content uploads will stop on July 30, so it looks like a good candidate for archiving.
Probably we want to wait until new content stops before archiving, but I'll add it to deathwatch
/* 2023 */ ZOWA
if the id's are consecutive numbers we could just go along as new content might be added still
thanks nyuuzyou :)
Just came across the fact that Speedtest.net results are public and sequential. It includes (for the recent ones at least) ping time, up and down speed, and time.
Maybe something worth archiving to get historical data
Goes at least back to 2010
I've been able to go down to 2007, https://www.speedtest.net/result/110000000
However, nonexistent IDs seem to trigger a 500
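A minimal sketch of how sequential result IDs could be enumerated. The URL pattern comes from the link above; treating HTTP 500 as "ID does not exist" is only the observation reported here, and `result_urls` and `classify_status` are hypothetical names. No requests are made.

```python
RESULT_URL = "https://www.speedtest.net/result/{id}"


def result_urls(start_id, stop_id, step=1):
    """Yield result page URLs for IDs in [start_id, stop_id)."""
    for result_id in range(start_id, stop_id, step):
        yield RESULT_URL.format(id=result_id)


def classify_status(status_code):
    """Map an HTTP status to a crawl decision.

    Assumption from the chat: missing IDs answer with 500, so a 500 is
    treated as a gap in the sequence rather than a server fault to retry.
    """
    if status_code == 200:
        return "ok"
    if status_code in (404, 500):
        return "missing"
    return "retry"


for url in result_urls(110000000, 110000003):
    print(url)
```

Sampling with a large `step` first would give a cheap map of which ID ranges correspond to which years before committing to a full crawl.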
is this where twitter archiving is being discussed?
if you have findings about archiving twitter, talk in private to arkiver. Otherwise if it's public knowledge already, go on, you're at the right place :)
what sort of thing would need to be said in private?
i'm quoting the man: @arkiver | Got interesting information on Twitter? PM me! Do not dump the information in a publicly logged channel.
guest: bypasses, methods, etc
Would the team be OK to archive the dataset of the Mersenne primes (https://mersenne.org)?
I was going to ask about ways to save tweets. I was also going to post some tools that kinda work; im not the one who found them but i dont know how public knowledge they are already.
best to PM arkiver in these trying times
im guessing it should be said in private to prevent people from overusing it?
guest: See the wiki for what is already public knowledge
guest: https://wiki.archiveteam.org/index.php/Twitter
ok my stuff isnt mentioned there
ah ok, pls pm the arkiver in that case
it's not a usual thing but twitter is very… unusual
so
uh
Twitter
is there a dedicated channel for it?
no, just anything about bypasses etc pm to arkiver
if elon makes a tweet saying you need to fart in a jar and mail it to him for 100 more tweet views then here is fine
noted
:)
good thing I saved the profiles of the people I followed *before* musk entered and musk'ed around
i really shouldve started saving stuff when musk bought twitter
i knew stuff would get messed up but i didnt start till a day ago.
right before the rate limit hit
https://www.youtube.com/watch?v=OuFv1A7fEOE
people already speedrunning ratelimit LMAO
lmao
guest: :-) We did start saving shit before the shitstorm happened. You can't rely on a single entity. If you ask my personal opinion, this is the prime example where twitter is digging their own grave.
gotta love socialbot/Aramaki :)
this is a blvd for the fediverse
fuzzy8021: Docker's log files would still end up in /var/lib/docker, i.e. on your /.
--log-opt max-size=10m ;)
i really really need to remember to do that
(took me longer than i admit to finally do it as well)
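For a daemon-wide default instead of a per-container `--log-opt`, the same limit can go in Docker's `daemon.json`. The sketch below only renders the JSON and writes nothing; `render_daemon_json` is a hypothetical helper and the `max-file` value of 3 is an arbitrary assumption. As far as I know, such defaults apply only to containers created after the daemon is restarted.

```python
import json


def render_daemon_json(max_size="10m", max_file=3):
    """Return daemon.json text that caps json-file logs per container."""
    config = {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": max_size,       # rotate once a log file reaches this size
            "max-file": str(max_file),  # rotated files to keep; must be a string
        },
    }
    return json.dumps(config, indent=2)


# On Linux this content usually lives at /etc/docker/daemon.json.
print(render_daemon_json())
```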
One thing that occurred to me is that there are some big text-only archives like https://archive.org/details/twitterstream so we could complete those by grabbing media links.
And people with their own archives should probably be encouraged to upload them to IA.
add ratelimit information
Switch Not Yet to Upcoming
also this https://archive.org/details/archiveteam_twitter
And anything run through socialbot typically skipped videos.
There are some sites that specifically existed to archive tweets, but i dont remember what they are.
and i imagine they might have trouble saving anything new right now.
archive.today is fucked for twitter atm
https://pbs.twimg.com/media/F0CJ2SEWIAAAO-X?format=jpg&name=orig
nooooo way lol
i hope that's not real
who is gg551015
it's (supposedly) from blind (https://en.wikipedia.org/wiki/Blind_(app)) where you verify with your work email so the twitter logo there verifies they at least (once) worked there
supposedly there's reverifications but not sure how often
any confirmation this is true and not a fabricated image?
tried a bit to find it but no luck so far
(here's an example post where it shows who works where, I guess mobile is different? https://www.teamblind.com/post/Twitter-bug-causes-self-DDOS-tied-to-Elon-Musks-emergency-blocks-and-rate-limits-Its-amateur-hour"-csF0MkWZ )
seems like a lot for logs but i will try limiting them
you can choose whatever you wish i suppose fuzzy8021
i woke up to like a 20GB one one day lol
I found out that adding a forward slash "/" at the end of a url technically counts as a different url on some archiving sites like archive.is (but apparently not with archive.org), so keep that in mind when searching for twitter archives or saving them.
I think it depends on the configuration of the webserver, how it deals with trailing slashes.
yep, technically GET /something and GET /something/ are not the same HTTP request
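When searching archives for a saved tweet, it may help to try both forms of the URL. A small sketch using only the standard library; `slash_variants` is a hypothetical helper and the tweet URL is a made-up example.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_variants(url):
    """Return the URL without and with a trailing slash on its path.

    Query string and fragment are preserved. A bare root path has only
    one form, so a single URL is returned in that case.
    """
    parts = urlsplit(url)
    stripped = (parts.path or "/").rstrip("/") or "/"
    paths = [stripped] if stripped == "/" else [stripped, stripped + "/"]
    return [urlunsplit(parts._replace(path=p)) for p in paths]


for candidate in slash_variants("https://twitter.com/example/status/1"):
    print(candidate)
```

Each variant can then be looked up separately on archives that treat the two paths as distinct captures.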