-
h2ibot
JustAnotherArchivist edited Deathwatch (-4, /* 2023 */ Definitive deadline for OneHallyu):
wiki.archiveteam.org/?diff=51385&oldid=51384
-
JAA
I'm trying to qwarc OneHallyu. It's very slow.
-
JAA
AQNB reopened a few hours ago and will shut down tomorrow (21st). Running through AB now.
-
arkiver
wooh :)
-
JAA
The Analysis & Policy Observatory (APO) managed to secure a partnership and isn't shutting down after all. Would be nice to archive anyway, but not as urgent. (Heavy rate limits and bans thwarted the AB attempts.)
-
JAA
> After taking a well-deserved break, APO will be re-established in the new year. The website will remain open for you to search and browse our policy and research repository and collections.
-
JAA
Whatever 're-established' means exactly.
-
nicolas17
hm I think something about this whole "teraleak" stuff should be mentioned in our TestFlight wiki page
-
nicolas17
in particular since I and others did some indexing of what is in the archived data
-
JAA
Yeah, that would probably be a good idea. It was just public data, as I understand it (this was before my time here).
-
arkiver
it was just a regular project
-
arkiver
like all other projects
-
fireonlive
i did see arkiver2 as the committer :3
-
JAA
It wasn't even a listable S3 bucket probably, right? Just S3 URLs referenced on the website?
-
nicolas17
I don't know how discovery worked
-
arkiver
the tons of attention and "teraleak" branding is mostly people who just found out the "web archiving is pretty cool, because it saves stuff, like games, that are later maybe not available"
-
arkiver
found out that*
-
arkiver
remember for us it's pretty normal and obvious
-
arkiver
for many non-tech people out there it's a first time they really see something like this
-
JAA
Yeah, but it might be worth clarifying that we're archiving public data, not hacking into private systems or whatever.
-
arkiver
JAA: yeah!
-
fireonlive
someone called it 'leak' initially as well, which caught on and wasn't helpful...
-
arkiver
the code is completely public though, people can see there's no hack-y stuff in there.
-
fireonlive
yep :)
-
arkiver
fireonlive: yeah honestly sounds like Discord shouting
-
nicolas17
which redirected to cloudfront or s3
-
fireonlive
arkiver: indeed
-
arkiver
i'm very happy though nicolas17 is making good use of this, and we're getting attention :)
-
nicolas17
arkiver: Jason Scott went into the Discord to clarify things
-
arkiver
although in a different way would have been better
-
arkiver
nicolas17: yeah
-
JAA
arkiver: Assuming people (a) take the time to look for the code, (b) read the code, and (c) understand the code. :-)
-
nicolas17
I know someone who plans to do some analysis on the executables
-
arkiver
JAA: yeah but if any "official bodies" get involved or look further into this, they'll find the code and how it's all working
-
JAA
Right
-
arkiver
just an Archive Team project like all others :)
-
arkiver
9 years ago
-
arkiver
also this month i'm 10 years with Archive Team!
-
JAA
\o/
-
fireonlive
:D
-
nicolas17
she works at a company making a decompiler and I think other reverse engineering tools, so 70000 real-world iOS binaries is a goldmine of test cases
-
arkiver
nicolas17: yeah :)
-
nicolas17
arkiver: also for people who want to do stuff on the whole dataset, they're like "wait what is this warc thing"
-
nicolas17
"you mean I don't have to do 70k requests to the slow web.archive.org hostname to download each individual file?"
-
JAA
'I can download over a terabyte at 5 kB/s instead? Amazing!' :-P
-
JAA
But yeah
-
nicolas17
some people who have the storage quickly found the torrents
-
JAA
Right
-
JAA
Are these torrents complete?
-
arkiver
there are torrents?
-
arkiver
of testflight
-
nicolas17
I don't have the storage, so I'm piping wget into an "extract what I need and throw it away" script :P
-
JAA
arkiver: We only set noarchivetorrent since a couple years ago, so I'd expect there to be torrents.
-
nicolas17
archive.org's autogenerated torrents work fine, the warcs are max 50GB
-
fireonlive
everyone should get a free 1PiB minimum :(
-
arkiver
JAA: ah
-
fireonlive
as a human right!
-
nicolas17
although they have the usual problems of IA torrents
-
arkiver
nicolas17: what are those problems?
-
nicolas17
textfiles edited the description on the items to clarify where they came from, and added a preview image
-
nicolas17
which re-generated the torrents
-
nicolas17
so now the old ones don't work anymore, or at least don't exchange data with people who got the new ones :P
-
arkiver
i don't know if we needed that image
-
arkiver
on all items
-
nicolas17
yeah that was questionable, but I think the xml file with the metadata (including description) is in the torrent too
-
JAA
I don't think we needed it on any item.
-
arkiver
JAA: yeah
-
arkiver
just the logo maybe on the collection
-
nicolas17
so image or not, editing the description would invalidate the torrent anyway
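Editor's sketch of why an edit invalidates the old torrent: a torrent is identified by the SHA-1 of its bencoded `info` dictionary, so if the item's metadata XML is part of the torrent (as nicolas17 suspects), any change to that file yields a different infohash and a separate swarm. The item name, file names, and sizes below are hypothetical, and the `pieces` value is a placeholder rather than real piece hashes.

```python
import hashlib


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def infohash(info):
    """Return the hex SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


def make_info(meta_xml_size):
    # Hypothetical two-file item: one WARC plus the item's metadata XML.
    return {
        "name": "example_item",
        "piece length": 524288,
        "pieces": b"\x00" * 20,  # placeholder, not real piece hashes
        "files": [
            {"length": 1000, "path": ["example.warc.gz"]},
            {"length": meta_xml_size, "path": ["example_item_meta.xml"]},
        ],
    }


before = infohash(make_info(512))  # metadata XML before the description edit
after = infohash(make_info(540))   # same item after a slightly longer description
print(before != after)
```

In a real torrent the piece hashes covering the edited file would change as well, so the infohash differs even if the file size happens to stay the same.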
-
fireonlive
:(
-
JAA
Collections can have images, I think? That could've been the appropriate place.
-
» nicolas17 beds
-
JAA
And a link to the wiki page there would've been good, too.
-
fireonlive
collection description maybe? hm.
-
fireonlive
cu nicky
-
fireonlive
master tapes for “reboot” have been found
-
flashfire42
Archiveteam wiki down?
-
fireonlive
indee
-
fireonlive
d
-
angenieux
Is it because of the TestFlight "leak" driving traffic to the website?
-
fireonlive
unsure; another AT wiki went down too
-
fireonlive
indee
-
fireonlive
..d
-
angenieux
i see
-
immibis
Times of Israel should probably get archived, yeah? it has strict cloudflare in front of it so i bet generic efforts didn't get it
-
datechnoman
Quick question for the mind hive. Playing around with the ia command line tool for the first time. How do I set the period when the files were uploaded/published to download? I am trying to pull all the cdx.gz files for a given period and then process them with the cdxsummary tool. The command I am using is "ia search 'collection:archiveteam_telegram' --itemlist | xargs -r -n 5 ia download --glob '*.cdx.gz'". I assumed there would be a switch such as --date but that does not seem to be the case. If there is a better way to do this please do share. Thanks in advance!
-
datechnoman
Maybe I could use identifiers or something?
-
datechnoman
Would also like a way to export all of the cdx.gz download links for that period as I can create a script to run them through the cdxsummary tool
-
magmaus3
out of curiosity, does #down-the-tube require any special permissions to use the bot?
-
Pedrosso
Checking #down-the-tube: james could do so without mod or voice.
-
nicolas17
magmaus3: technically, no permissions needed
-
nicolas17
but if you're unsure if some video/channel fits in the archival scope as documented in the wiki, ask an op to approve it before submitting
-
JAA
datechnoman: `ia search 'collection:archiveteam_telegram addeddate:[2023-12-01 TO 2023-12-20]' ...` for items created on those days. There's also `publicdate` (exact details of how that's set are unclear to me), and `oai_updatedate` lets you find items that had their most recent changes in some time window. You can use `null` instead of a date to make it an open range search.
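Editor's sketch of how the date-range query and the list of cdx.gz links could be scripted. The helper names `build_query`, `download_url`, and `cdx_urls` are made up for illustration. `cdx_urls` assumes the `internetarchive` Python package (the library behind the `ia` tool) is installed and needs network access, so it is defined but not called here.

```python
from urllib.parse import quote


def build_query(collection, start=None, end=None, field="addeddate"):
    """Build a search query limited to a date range on the given field.

    Passing None for start or end gives an open-ended range via `null`.
    """
    lo = start or "null"
    hi = end or "null"
    return f"collection:{collection} {field}:[{lo} TO {hi}]"


def download_url(identifier, filename):
    """Return the direct archive.org download URL for one file of an item."""
    return f"https://archive.org/download/{identifier}/{quote(filename)}"


def cdx_urls(query):
    """Yield download URLs of all *.cdx.gz files in items matching the query."""
    # Imported here so the rest of the sketch runs without the package.
    from internetarchive import get_item, search_items

    for result in search_items(query):
        item = get_item(result["identifier"])
        for f in item.get_files(glob_pattern="*.cdx.gz"):
            yield download_url(item.identifier, f.name)


print(build_query("archiveteam_telegram", "2023-12-01", "2023-12-20"))
# collection:archiveteam_telegram addeddate:[2023-12-01 TO 2023-12-20]
```

Writing the URLs from `cdx_urls(...)` to a file would give a list that a separate script can download and feed to cdxsummary one item at a time.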
-
JAA
A wild sketchy cow appeared.
-
nicolas17
but you didn't catch it fast enough
-
nulldata
Quick, catch him now!
-
JAA
:-)
-
JAA
My OneHallyu grab has been running for a while now. It looks like it might be tight. Their server is very slow; I'm getting 4.5 to 6 seconds average response time. Hitting it even harder is unlikely to help.
-
JAA
Current ETA is just under 5 days. They're shutting down on the 25th...
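Editor's sketch of the arithmetic behind such an ETA, using a deliberately simple model in which throughput is the number of parallel connections divided by the average response time. The remaining request count, the concurrency, the start time, and the exact shutdown time are all hypothetical; only the 4.5 to 6 second response times and the 25th come from the chat.

```python
from datetime import datetime, timedelta


def estimate_finish(now, remaining_requests, avg_response_s, concurrency=1):
    """Estimate when a grab finishes, assuming a constant average response time."""
    seconds = remaining_requests * avg_response_s / concurrency
    return now + timedelta(seconds=seconds)


def finishes_in_time(now, deadline, remaining_requests, avg_response_s, concurrency=1):
    """Return True if the estimated finish is no later than the deadline."""
    finish = estimate_finish(now, remaining_requests, avg_response_s, concurrency)
    return finish <= deadline


# Hypothetical inputs: 1,000,000 requests left, 12 parallel connections.
now = datetime(2023, 12, 20, 12, 0)
deadline = datetime(2023, 12, 25, 0, 0)  # assumed shutdown time on the 25th

for avg in (4.5, 6.0):
    finish = estimate_finish(now, 1_000_000, avg, concurrency=12)
    verdict = "in time" if finish <= deadline else "too late"
    print(f"{avg} s/response -> {finish:%Y-%m-%d %H:%M} ({verdict})")
```

With these made-up inputs, the slow end of the observed response-time range alone is enough to push the finish past the deadline, which is why the margin is tight.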
-
ScenarioPlanet
Is that outline of AT Wiki planned?
-
JAA
Outage, you mean?
-
ScenarioPlanet
503 error
-
JAA
Not planned and being worked on as mentioned in the #archiveteam topic.
-
SketchCow
Totally planned.
-
SketchCow
We never miss
-
nulldata
It's a planned unplanned outage.
-
SketchCow
We're ArchiveTeam, we always work with the assumption everything dies and goes down
-
SketchCow
Nothing surprises us.
-
nulldata
The wiki is moving to Fandom so you can enjoy McDonald's ads and random unrelated gameplay videos along side your archival information!
-
TheTechRobo
All the information from the wiki is now available on our Discord server
-
nulldata
Wiki is back. Quick, someone make an 'ansaleak' Twitter account for Yahoo Answers!
-
SketchCow
So, while I'm here, any other issues I need to be aware of?
-
SketchCow
I haven't abandoned you kids, I just went out for a pack of cigarettes
-
SketchCow
That should be my title: Archive Team Co-Founder, Went Out For Pack of Cigarettes
-
murb
missing, presumed smoked?
-
fireonlive
i like this new title
-
that_lurker
theres a discord server
-
fireonlive
absolutely not
-
JAA
No
-
nulldata
\ msg fire *phew* that was a close one - lurker almost found out about the secret AT Discord server. See you in vc!
-
nulldata
oh shit
-
flashfire42
Discord Server?
-
flashfire42
*Makes backups of emojis from steam giveaway server and leaves*
-
flashfire42
Now I am ready
-
fireonlive
damn it null
-
h2ibot
Flashfire42 edited List of websites excluded from the Wayback Machine/Partial exclusions (+32):
wiki.archiveteam.org/?diff=51386&oldid=51376
-
h2ibot
Flashfire42 edited List of websites excluded from the Wayback Machine/Partial exclusions (+31):
wiki.archiveteam.org/?diff=51387&oldid=51386
-
h2ibot
Flashfire42 edited List of websites excluded from the Wayback Machine (+23):
wiki.archiveteam.org/?diff=51388&oldid=51361
-
nicolas17
checking Safari Tech Preview links against WBM now...
-
nicolas17
95 versions to go, and I started to get rate limited by WBM
-
JAA
nicolas17: Are you checking for truncation, too, or are these things below 2 GiB anyway?
-
nicolas17
for Safari I found one truncated yes
-
nicolas17
and included it in my list
-
JAA
Ok, good :-)
-
nicolas17
the headers were weird too
-
nicolas17
HTTP/2 200 content-length: 1048576 x-archive-orig-x-crawler-content-length: 19521864 x-archive-orig-content-length: 1048576
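Editor's sketch of an automated check for captures like this one: compare the length-related headers and flag the capture when they disagree. Treating a disagreement as a sign of possible truncation is an assumption, as the chat does not spell out what each header means. The function name `length_mismatch` is made up for illustration.

```python
LENGTH_HEADERS = (
    "content-length",
    "x-archive-orig-content-length",
    "x-archive-orig-x-crawler-content-length",
)


def length_mismatch(headers):
    """Return the length headers as ints if they disagree, else an empty dict."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    lengths = {
        name: int(normalized[name])
        for name in LENGTH_HEADERS
        if name in normalized
    }
    return lengths if len(set(lengths.values())) > 1 else {}


# Values taken from the response quoted above.
headers = {
    "content-length": "1048576",
    "x-archive-orig-x-crawler-content-length": "19521864",
    "x-archive-orig-content-length": "1048576",
}

suspect = length_mismatch(headers)
print(bool(suspect))  # True: 1048576 vs 19521864
```

A flagged capture still needs a manual look, since a mismatch only says the headers disagree, not which value is right.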
-
JAA
Beautiful
-
datechnoman
JAA thank you very much for that information. Really appreciate it :)
-
h2ibot
JustAnotherArchivist edited Deathwatch (+287, /* 2023 */ Add Inside Imaging):
wiki.archiveteam.org/?diff=51389&oldid=51385
-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=51390&oldid=51388
-
JAA
OneHallyu seems to be getting slower. I'm seeing an average response time of over 6 seconds now. ETA: not in time
-
nicolas17
other people archiving maybe? D:
-
JAA
Perhaps