-
pabs
^ Autodesk bought EAGLE
-
nicolas17
in 2017
-
pabs
ah, ooops
-
h2ibot
Switchnode edited Deathwatch (+116, /* 2023 */ update harvard blogs):
wiki.archiveteam.org/?diff=50102&oldid=50087
-
FavoritoHJS
update on twitter: musk has set a rate limit. all is lost.
-
fireonlive
yeeep
-
fireonlive
we've been kvetching about it all day :(
-
flashfire42
Is the CDN rate limited too?
-
flashfire42
We might be able to say fuck you and grab the images
-
FavoritoHJS
honest question, why did y'all not start yesterday?
-
FavoritoHJS
wait...
-
FavoritoHJS
the api, does it have rate limits?
-
JAA
flashfire42: See 20:25
-
nicolas17
flashfire42: where do we get image links?
-
FavoritoHJS
yes, most users are locked out but if only one was unfortunate enough to get """"verified""""...
-
flashfire42
Bing. reddit. google, baidu, yandex, URLteam dumps
-
nicolas17
FavoritoHJS: then that user could abuse their auth token and private API to get... 6000 tweets in a day?
-
flashfire42
But yeah brute forcing is infeasible
-
FavoritoHJS
oh, it has rate limiting...
-
JAA
No, it's a whopping 10k now!
-
JAA
(Out of ... 500 million tweets per day?)
-
FavoritoHJS
if even bots have read limits then all is indeed lost
-
kiska
Wasn't it like 3k per second of tweeting?
-
FavoritoHJS
...account spam?
-
nicolas17
kiska: when did you see that? everything changed in the last 24 hours
-
JAA
500 million would be 5.8k/s.
-
nicolas17
oh you mean new tweets
-
JAA
But who knows what it is now.
-
FavoritoHJS
600 per user, that would be a million users
-
FavoritoHJS
so no
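A quick check of the per-second and per-account numbers above. It assumes JAA's 500 million new tweets per day and FavoritoHJS's figure of 600 tweet reads per account per day. Both are rough, unverified numbers from this discussion:

```python
# Rough figures from the discussion (assumptions, not measurements):
TWEETS_PER_DAY = 500_000_000      # JAA's estimate of new tweets per day
READ_LIMIT_PER_ACCOUNT = 600      # FavoritoHJS's per-account daily read limit
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate at which new tweets appear
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
# Accounts needed just to read one day's worth of new tweets
accounts_needed = TWEETS_PER_DAY / READ_LIMIT_PER_ACCOUNT

print(f"{tweets_per_second:,.0f} new tweets per second")
print(f"{accounts_needed:,.0f} accounts to read one day of tweets")
```

That works out to about 5,800 tweets per second and roughly 830,000 accounts. This is on the order of the "million users" above, and it covers only new tweets, not the backlog.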
-
fireonlive
archive all of twitter in one day?
-
nicolas17
FavoritoHJS: bots are asked to pay $40k/mo to do anything remotely heavy-duty
-
fireonlive
literally impossible haha
-
nicolas17
using the bot API has *worse* limits than abusing a logged in account
-
FavoritoHJS
and what are the limits on said "remotely heavy-duty"?
-
kiska
Not very high
-
FavoritoHJS
welp, better hope for a db leak as i fear we've lost our boat
-
fireonlive
that would be a massive-ass torrent
-
nicolas17
FavoritoHJS: the "basic" level costs $100/month and lets you read up to 10000 tweets per MONTH
-
fireonlive
xD
-
fireonlive
'ok everyone please seed my 4933PB torrent'
-
JAA
1. Introduce ridiculous API pricing. 2. Everyone starts scraping. 3. Introduce ridiculous view limits. 4. ??? 5. PROFIT, I guess?
-
FavoritoHJS
yea that would be a problem... maybe not that large once finished by removing the endless spam that twitter never removed, but actually getting to that point?
-
fireonlive
this leak better include all the hot gay twitter accounts tbh otherwise whats the point
-
fireonlive
:3
-
FavoritoHJS
(not that large = probably 4PB or 6 imgurs at least)
-
lennier1
If a bunch of AI companies were actually scraping Twitter, hopefully one of them publishes the data.
-
kiska
Hah!
-
kiska
Thinking that twitter is only 4PB big hahah
-
FavoritoHJS
text only, maybe?
-
FavoritoHJS
when compressed to hell and back
-
kiska
Google+ was some 1.5 PB and we didn't archive it fully
-
lennier1
I guess we could have a bot to let people submit image URLs from their private scrapes.
-
fireonlive
reddit is about 3PB atm
-
fireonlive
sorry 3PiB :D
-
FavoritoHJS
about that scraping the ai scrapers... could work but i'm fairly certain they didn't bother to preserve such minor details as poster or post time
-
FavoritoHJS
how much is already archived either as wayback machine pages or manual backups?
-
kiska
Tons
-
FavoritoHJS
is there a way to take out your data from twitter?
-
FavoritoHJS
also someone please move twitter to Alarm in `Alive... OR ARE THEY`
-
FavoritoHJS
what does the "and more." at the end contain
-
FavoritoHJS
sure hope it would be pointers to replies since that could mean known spam accounts can be skipped over
-
JAA
The text content of all tweets alone may well be a PB. And that's before all the metadata around it.
-
JAA
Maybe not quite, but closer than you might think at first.
-
JAA
500 million tweets a day for 17 years at 100 bytes per tweet would be 310 TB.
-
JAA
Now add all the usernames, timestamps, links, and even before touching any images or videos, you're easily in the petabytes.
-
JAA
With media, well...
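JAA's Fermi estimate can be reproduced in a few lines. The 100-byte average and the constant 500 million tweets per day are the assumptions stated above, and leap days are ignored:

```python
# JAA's Fermi estimate for the text of every tweet ever posted.
TWEETS_PER_DAY = 500_000_000   # assumed constant over the whole period
YEARS = 17
BYTES_PER_TWEET = 100          # assumed average text size; leap days ignored

total_tweets = TWEETS_PER_DAY * 365 * YEARS
total_bytes = total_tweets * BYTES_PER_TWEET

print(f"{total_tweets:.2e} tweets")
print(f"{total_bytes / 1e12:,.0f} TB of tweet text")
```

Even this text-only lower bound comes to about 310 TB, before usernames, timestamps, links, or media are counted.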
-
kiska
100 bytes per tweet? I think you're being optimistic there
-
FavoritoHJS
i mean, for most of twitter's existence the limit was 140 characters
-
JAA
No idea about the actual average, but yeah, that.
-
FavoritoHJS
in latin-based alphabets a character takes a single byte
-
JAA
Fermi estimate, mkay? :-)
-
kiska
Ok :D
-
FavoritoHJS
so 100 bytes seems reasonable for lightly-compressed tweets
-
fireonlive
are we excluding UTF-8 here
-
JAA
💩
-
FavoritoHJS
yes, but since ascii characters (which include english and are a significant component of latin-script-based languages) only take a single byte, that's probably fine?
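Here is the UTF-8 point in concrete terms. ASCII characters take one byte, while other scripts and emoji take two to four bytes each. The sample characters below are arbitrary:

```python
def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Arbitrary sample characters from different scripts
for ch in ["a", "é", "ж", "日", "💩"]:
    print(repr(ch), utf8_len(ch), "byte(s)")

# A full 140-character tweet in ASCII vs. in CJK ideographs
print(utf8_len("a" * 140), utf8_len("日" * 140))
```

A maxed-out ASCII tweet is 140 bytes, but the same number of CJK characters is 420 bytes. The 100-byte average therefore holds only if most tweets are short and mostly ASCII.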
-
FavoritoHJS
the big problem i think would be to reduce redundancy between tweets while keeping the archive browseable
-
FavoritoHJS
you could store all the info about each tweet, but that would require storing the username and timestamp and id of each tweet...
-
FavoritoHJS
but most users would probably tweet multiple times, and probably in bursts
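One way to cut the per-tweet redundancy described above works like this. Group tweets by user and store each username once. Then keep each timestamp as a delta from that user's previous tweet, which stays small when a user tweets in bursts. The `(username, timestamp, tweet_id, text)` record layout is hypothetical, and this is only a sketch. Reading a single tweet now means summing the deltas before it, which is the browseability trade-off mentioned above.

```python
from itertools import groupby
from operator import itemgetter

def compact(tweets):
    """Group (username, timestamp, tweet_id, text) tuples by user.

    Each user's entries become (delta, tweet_id, text), where delta is the
    gap to that user's previous tweet (the first delta is the full timestamp).
    """
    out = {}
    ordered = sorted(tweets, key=itemgetter(0, 1))  # by user, then time
    for user, rows in groupby(ordered, key=itemgetter(0)):
        prev = 0
        entries = []
        for _user, ts, tweet_id, text in rows:
            entries.append((ts - prev, tweet_id, text))
            prev = ts
        out[user] = entries
    return out

def expand(compacted):
    """Inverse of compact(): yield the original 4-tuples."""
    for user, entries in compacted.items():
        ts = 0
        for delta, tweet_id, text in entries:
            ts += delta  # running sum restores the absolute timestamp
            yield (user, ts, tweet_id, text)
```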
-
fireonlive
the eggplants must live on!
-
nicolas17
/!\
-
nicolas17
I think something got deleted from the tb2b.eu FTP
-
nicolas17
wait no, I got a temporary connection error listing one particular subdirectory... retrying
-
cbrts
Are there any things that I would probably be unhappy with if I used the `--convert-links` option with wget?
-
cbrts
Seems a little weird to rewrite the links in the saved HTML
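On the `--convert-links` question: according to the wget manual, wget rewrites links after the download finishes. Links to files it downloaded become relative local paths, and links to files it did not download become absolute URLs. As a result, the saved HTML no longer matches what the server sent. `-K`/`--backup-converted` keeps the untouched original as a `.orig` file. The toy function below models that rule for a single link. `convert_link` and the `downloaded` mapping are hypothetical, and the function ignores fragments, query strings, and wget's own file naming.

```python
import posixpath
from urllib.parse import urljoin

def convert_link(page_url, href, downloaded):
    """Toy model of wget's --convert-links rule for one href.

    `downloaded` maps absolute URLs to local paths (relative to the download
    root) and must contain `page_url` itself.
    """
    target = urljoin(page_url, href)  # resolve the href against the page
    if target in downloaded:
        # Downloaded target: point at the local copy, relative to this page
        page_dir = posixpath.dirname(downloaded[page_url]) or "."
        return posixpath.relpath(downloaded[target], page_dir)
    # Not downloaded: fall back to the full absolute URL
    return target
```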
-
Terbium
now i really regret not starting to archive Twitter years ago :/
-
h2ibot
Bzc6p edited ЯRUS (+14, не все знают кириллицу):
wiki.archiveteam.org/?diff=50103&oldid=50095
-
ram|m
Gfycat's ending service September 1st
-
razul
See #deadcat
-
imer
how does twitter not do ipv6, smh these companies are all useless
-
thuban
someone might want to add gfycat to the channel topic in #archiveteam
-
h2ibot
Arkiver edited Deathwatch (+554, Add mudrunnermods.com):
wiki.archiveteam.org/?diff=50104&oldid=50102
-
fuzzy8021
has anyone mounted a separate volume to store data on for the docker images of the project?
-
fuzzy8021
i am using "--mount type=bind,source=/data2/docker/project,target=/grab/data" and still seeing the main drive increase in size along with the secondary mounted drive
-
imer
there was some talk about using tmpfs for the grabbers: (quoting J_AA) "With the project images and Docker: `--mount type=tmpfs,tmpfs-size=2G,destination=/grab/data`"
-
fuzzy8021
this is for youtube so dont have enough ram to hold the videos
-
imer
ah
-
fuzzy8021
my / has 100gb ssd and i have a second rust drive setup for /data2
-
fuzzy8021
when using that --mount, i am still seeing / go full even though i am seeing data written to /data2
-
imer
yeah, not sure. can only see usage in /grab/data here, although im not mounting that to anywhere
-
imer
are files appearing in the mounted dir?
-
fuzzy8021
yes
-
h2ibot
Yts98 edited Current Projects (+448, Propose Banciyuan, Skyblog, Xuite):
wiki.archiveteam.org/?diff=50105&oldid=50072
-
arkiver
yts98: we're on it btw with banciyuan
-
yts98
arkiver: do you mean we're working on banciyuan with stwp (that's what i know), or we're cooperating with banciyuan official?
-
arkiver
"we" is archiveteam
-
arkiver
now that Misty|m is finished completely with their part, we'll kick off the project at AT
-
yts98
sorry for bad grammar comprehension. my "propose" also includes upcoming projects.
-
arkiver
no worries
-
manu|m
hey, could someone please archive dlammiehanson.com for me?
-
manu|m
there’s a wix site behind it (dlammiehanson.wix.com) but the links from the navigation (targeting the wix site) end up on the correct domain. not sure how this will mess with AB
-
Barto
!a dlammiehanson.com --igset blogs,badvideos -e 'for manu|m'
-
Barto
wrong place lol
-
manu|m
it got lost in #archivebot
-
h2ibot
Wickedplayer494 edited Current Projects (+154, Add Gfycat to upcoming):
wiki.archiveteam.org/?diff=50106&oldid=50105
-
nyuuzyou
Hi, the Japanese ASMR video hosting site zowa.app will send all its content to /dev/null on September 29, 2023 (note.com/zowa/n/ndf5d4f158589). Judging by the ID of the latest video, there are currently no more than 23589 videos. New uploads will stop on July 30, so it looks like a good candidate for archiving
-
pokechu22
Probably we want to wait until new content stops before archiving, but I'll add it to deathwatch
-
h2ibot
Pokechu22 edited Deathwatch (+163, /* 2023 */ ZOWA):
wiki.archiveteam.org/?diff=50107&oldid=50104
-
manu|m
if the IDs are consecutive numbers we could just start now and keep going, since new content might still be added
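Suppose the ZOWA IDs really are consecutive integers, which is an assumption that would need checking. In that case the known range can be split into work batches now and extended later as new uploads raise the maximum. A minimal sketch follows. `id_batches` and the default batch size are hypothetical:

```python
def id_batches(start, stop, batch_size=100):
    """Yield inclusive (first, last) ID ranges covering start..stop."""
    for first in range(start, stop + 1, batch_size):
        yield first, min(first + batch_size - 1, stop)
```

For example, a first pass could cover `id_batches(1, 23589)`. A second pass over `id_batches(23590, new_max)` could follow once uploads stop on July 30.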
-
arkiver
thanks nyuuzyou :)
-
albertlarsan68
Just came across the fact that Speedtest.net results are public and sequential. It includes (for the recent ones at least) ping time, up and down speed, and time.
-
albertlarsan68
Maybe something worth archiving to get historical data
-
albertlarsan68
Goes at least back to 2010
-
albertlarsan68
I've been able to go down to 2007, speedtest.net/result/110000000
-
albertlarsan68
However, nonexistent IDs seem to trigger a 500
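Here is a sketch of how the sequential Speedtest.net result IDs could be enumerated, using the URL pattern from the example above. Treating HTTP 500 as "no such result" follows the observation above. A genuine server error looks the same, though, so a real run would probably retry a 500 before marking an ID missing. The function names and the retry policy are hypothetical, and nothing here makes network requests:

```python
BASE = "https://speedtest.net/result/"  # pattern taken from the example URL above

def result_url(result_id: int) -> str:
    """URL of one speedtest result, assuming IDs are plain integers."""
    return f"{BASE}{result_id}"

def classify(status: int) -> str:
    """Map an HTTP status code to a crawl decision (hypothetical policy)."""
    if status == 200:
        return "ok"
    if status == 500:
        return "missing"  # observed for nonexistent IDs; may also be a real error
    if status == 429 or 500 < status < 600:
        return "retry"    # rate limited or other server-side trouble
    return "other"
```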
-
guest
is this where twitter archiving is being discussed?
-
Barto
if you have findings about archiving twitter, talk in private to arkiver. Otherwise if it's public knowledge already, go on you're at the right place :)
-
guest
what sort of thing would need to be said in private?
-
Barto
i'm quoting the man: @arkiver | Got interesting information on Twitter? PM me! Do not dump the information in a publicly logged channel.
-
fireonlive
guest: bypasses, methods, etc
-
albertlarsan68
Would the team be OK to archive the dataset of the Mersenne primes (mersenne.org)?
-
guest
I was going to ask about ways to save tweets. I was also going to post some tools that kinda work; im not the one who found them, but i dont know how much of that is public knowledge already.
-
fireonlive
best to PM arkiver in these trying times
-
guest
im guessing it should be said in private to prevent people from overusing it?
-
albertlarsan68
guest: See the wiki for what is already public knowledge
-
guest
ok my stuff isnt mentioned there
-
fireonlive
ah ok, pls pm the arkiver in that case
-
fireonlive
it’s not a usual thing but twitter is very… unusual
-
Ivan226
so
-
Ivan226
uh
-
Ivan226
Twitter
-
Ivan226
is there a dedicated channel for it?
-
fireonlive
no, just anything about bypasses etc pm to arkiver
-
fireonlive
if elon makes a tweet saying you need to fart in a jar and mail it to him for 100 more tweet views then here is fine
-
Ivan226
noted
-
fireonlive
:)
-
Ivan226
good thing I saved the profiles of the people I followed *before* musk entered and musk'ed around
-
guest
i really shouldve started saving stuff when musk bought twitter
-
guest
i knew stuff would get messed up but i didnt start till a day ago.
-
guest
right before the rate limit hit
-
Ivan226
people already speedrunning ratelimit LMAO
-
fireonlive
lmao
-
Barto
guest: :-) We did start saving shit before shitstorm happened. You can't rely on a single entity. If you ask my personal opinion this is the prime example where twitter is digging their own grave
-
fireonlive
gotta love socialbot/Aramaki :)
-
Barto
this is a blvd for the fediverse
-
JAA
fuzzy8021: Docker's log files would still end up in /var/lib/docker, i.e. on your /.
-
Barto
--log-opt max-size=10m ;)
-
fireonlive
i really really need to remember to do that
-
Barto
(took me longer than i admit to finally do it as well)
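Here is the Docker advice above combined into one sketch. It builds the `docker run` arguments with either a bind mount or a tmpfs for `/grab/data`. It also caps the json-file log, which Docker keeps under `/var/lib/docker` regardless of where the data mount points. The function name and the image name are placeholders, and any project-specific arguments the grab container expects are omitted:

```python
import shlex

def grab_container_args(image, data_dir=None, tmpfs_size=None, log_max_size="10m"):
    """Build a `docker run` argv for a project container (hypothetical helper)."""
    args = ["docker", "run", "--detach"]
    if tmpfs_size:
        # RAM-backed scratch space; too small for e.g. YouTube videos
        args += ["--mount", f"type=tmpfs,tmpfs-size={tmpfs_size},destination=/grab/data"]
    elif data_dir:
        # Downloads land on the bind-mounted host directory
        args += ["--mount", f"type=bind,source={data_dir},target=/grab/data"]
    # Container logs still go to /var/lib/docker, so cap their size
    args += ["--log-driver", "json-file", "--log-opt", f"max-size={log_max_size}"]
    args.append(image)
    return args

print(shlex.join(grab_container_args("example/project-grab", data_dir="/data2/docker/project")))
```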
-
lennier1
One thing that occurred to me is that there are some big text-only archives like archive.org/details/twitterstream so we could complete those by grabbing media links.
-
lennier1
And people with their own archives should probably be encouraged to upload them to IA.
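Here is a sketch of pulling media links out of a text-only dump like that, so they could be fed to a grab. It assumes the dump is JSON lines of Twitter API v1.1-style tweet objects. Photos are assumed to sit under `extended_entities`/`entities` → `media[]` → `media_url_https`, and video renditions under `video_info.variants[].url`. Those field names are an assumption about the format and should be checked against the actual files. The helper names are hypothetical.

```python
import json

def media_urls(tweet):
    """Collect media URLs from one v1.1-style tweet dict (field names assumed)."""
    urls = []
    # extended_entities lists every attachment; entities may list only the first
    ents = tweet.get("extended_entities") or tweet.get("entities") or {}
    for m in ents.get("media", []):
        if "media_url_https" in m:
            urls.append(m["media_url_https"])  # photo, or a video's thumbnail
        for variant in m.get("video_info", {}).get("variants", []):
            if "url" in variant:
                urls.append(variant["url"])  # one encoded video/GIF rendition
    return urls

def media_urls_from_lines(lines):
    """Yield unique media URLs from JSON-lines input, skipping blank lines."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # A retweet carries the original tweet (and its media) in retweeted_status
        for t in (tweet, tweet.get("retweeted_status") or {}):
            for url in media_urls(t):
                if url not in seen:
                    seen.add(url)
                    yield url
```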
-
h2ibot
DigitalDragon edited Twitter (+249, add ratelimit information):
wiki.archiveteam.org/?diff=50108&oldid=50101
-
h2ibot
AlbertLarsan68 edited Skyblog (-2, Switch Not Yet to Upcoming):
wiki.archiveteam.org/?diff=50109&oldid=50003
-
lennier1
And anything run through socialbot typically skipped videos.
-
guest
There are some sites that specifically existed to archive tweets, but i dont remember what they are.
-
guest
and i imagine they might have trouble saving anything new right now.
-
fireonlive
archive.today is fucked for twitter atm
-
fireonlive
nooooo way lol
-
Terbium
i hope that's not real
-
arkiver
who is gg551015
-
fireonlive
it's (supposedly) from blind (en.wikipedia.org/wiki/Blind_(app)) where you verify with your work email, so the twitter logo there verifies they at least (once) worked there
-
fireonlive
supposedly there's reverifications but not sure how often
-
arkiver
any confirmation this is true and not a fabricated image?
-
fireonlive
tried a bit to find it but no luck so far
-
fireonlive
(here's an example post where it shows who works where, I guess mobile is different? teamblind.com/post/Twitter-bug-caus…ks-and-rate-limits-Its-amateur-hour”-csF0MkWZ )
-
fuzzy8021
seems like a lot for logs but i will try limiting them
-
fireonlive
you can choose whatever you wish i suppose fuzzy8021
-
fireonlive
i woke up to like a 20GB one one day lol
-
guest
I found out that adding a forward slash "/" at the end of a url technically counts as a different url on some archiving sites like archive.is (but apparently not with archive.org), so keep that in mind when searching for twitter archives or saving them.
-
razul
I think it depends on the configuration of the webserver, how it deals with trailing slashes.
-
Barto
yep, technically GET /something and GET /something/ is not the same HTTP request
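Archives like archive.is may key `/something` and `/something/` separately. One way to search for or save both forms is to generate the two variants. Here is a sketch that uses only `urllib.parse`. `slash_variants` is a hypothetical helper, and the bare root path is left as a single `/`:

```python
from urllib.parse import urlsplit, urlunsplit

def slash_variants(url):
    """Return the URL without and with a trailing slash on its path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path == "/":
        # The site root has only one sensible form
        return [urlunsplit(parts._replace(path="/"))]
    base = path.rstrip("/")
    # Query string and fragment are kept unchanged in both variants
    return [urlunsplit(parts._replace(path=p)) for p in (base, base + "/")]
```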