-
fireonlive
-> #stackunderflow
-
h2ibot
Usernam edited List of websites excluded from the Wayback Machine (+31):
wiki.archiveteam.org/?diff=52219&oldid=52216
-
pabs
btw, why was WARC implemented instead of say adding decrypted packet content to network packet capture formats like pcap?
-
OrIdow6
pabs: I've never done anything low-level with TCP or TLS but my impression is that that would be massively more complex to read
-
OrIdow6
For very marginal benefit
-
OrIdow6
It's not very hard to write a WARC parser by hand
-
nulldata
!con 1upc5677t2hep02hvkfxmtkvc 9
-
fireonlive
:3
-
JAA
It's not very hard to write a WARC parser by hand that works for the WARCs produced by most tools.
-
JAA
It's quite another thing to write a parser that parses all valid WARCs correctly.
-
fireonlive
and those quirky invalid ones..
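A minimal sketch of the kind of hand-written parser discussed above, assuming uncompressed input and simple `Name: value` header lines. The function name `read_warc_records` and the sample record are made up for illustration. It deliberately skips things a complete parser has to handle, such as per-record gzip compression and repeated header names, which is roughly the gap JAA is pointing at.

```python
import io


def read_warc_records(stream):
    """Yield (version, headers, body) for each record in an uncompressed WARC stream.

    Illustrative only: repeated header names overwrite each other, and
    malformed records are not handled gracefully.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in (b"\r\n", b"\n"):
            continue  # blank lines between records
        version = line.strip().decode("ascii")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The body is exactly Content-Length bytes, like an HTTP message body.
        length = int(headers["content-length"])
        body = stream.read(length)
        yield version, headers, body


# Hypothetical sample record, built by hand for this example.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
records = list(read_warc_records(io.BytesIO(sample)))
```

Here `records` holds one tuple whose body is `b"hello"`. WARCs found in the wild are usually gzip-compressed per record, so a real reader would need a decompression layer in front of this.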
-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=52220&oldid=52219
-
nicolas17
OrIdow6: I think what pabs meant wasn't having to deal with TCP or TLS
-
nicolas17
but "HTTP request" as a pcap packet type
-
OrIdow6
Huh, not familiar with PCAP either, didn't know that was possible
-
OrIdow6
The short answer for why WARC is the way it is is that newline-delimited headers followed by a body is a fairly old and well-established format, for instance
en.wikipedia.org/wiki/Mbox and HTTP look the same
-
OrIdow6
Look similar
-
OrIdow6
WARC's predecessor, ARC, is older than JSON, let alone HAR
fileformats.archiveteam.org/wiki/ARC_(Internet_Archive)
-
OrIdow6
(Though it does look funky)
-
nicolas17
.pcap and .pcapng files *usually* store Ethernet frames
-
nicolas17
but besides that popular use, it can also have any of these
tcpdump.org/linktypes.html
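To make the link-type point concrete, here is a rough sketch that reads the link-layer type from the 24-byte global header of a classic `.pcap` file (not `.pcapng`, which uses a different block structure). The function name `pcap_linktype` and the small `LINKTYPE_NAMES` table are illustrative; the numeric values are taken from the tcpdump link-type list, and the header in the example is synthetic.

```python
import struct

# A few entries from the tcpdump link-type registry (not exhaustive).
LINKTYPE_NAMES = {0: "NULL", 1: "ETHERNET", 101: "RAW", 113: "LINUX_SLL"}


def pcap_linktype(header: bytes) -> int:
    """Return the link-layer type from a classic pcap global header."""
    if len(header) < 24:
        raise ValueError("a classic pcap global header is 24 bytes")
    magic = header[:4]
    # Magic 0xa1b2c3d4 (microsecond) or 0xa1b23c4d (nanosecond timestamps);
    # the byte order of the magic tells us the byte order of the file.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Fields after the magic: version major, version minor, thiszone,
    # sigfigs, snaplen, network (the link type).
    _major, _minor, _zone, _sigfigs, _snaplen, network = struct.unpack(
        endian + "HHiIII", header[4:24]
    )
    return network


# Synthetic little-endian header declaring Ethernet frames.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
linktype = pcap_linktype(header)
name = LINKTYPE_NAMES.get(linktype, "unknown")
```

For a real capture, the same function could be fed the first 24 bytes of the file, for example `open(path, "rb").read(24)`.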
-
nicolas17
pabs: btw TLS keys can be embedded in pcap files to make them decryptable
-
nicolas17
unfortunately (unlike storing decrypted content) that means they remain non-compressible
-
thuban
nicolas17: did you ever write that pcap-to-warc tool you were thinking about?
-
nicolas17
no :/
-
nicolas17
I guess it would be a wireshark/tshark plugin
-
nicolas17
since I'm not gonna write TLS decryption myself
-
OrIdow6
Incidentally on Rust WARC writers, I did write something like that a while ago, it used the "correct" method of producing them (just dumping the TCP/TLS stream to a file) but I didn't test it nearly enough
-
thuban
(istr Sanqui also did some stuff with pcap for discard2)
-
nicolas17
Wireshark can already save HTTP bodies
-
nicolas17
like, if you captured traffic while an application downloaded a file, Wireshark can then extract that file from the packet capture, undoing chunking and compression
-
nicolas17
if *that* is an acceptable feature for Wireshark to have, I don't anticipate opposition to a WARC exporter :P
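As a sketch of what "undoing chunking and compression" involves (this is not Wireshark's implementation), the snippet below decodes an HTTP/1.1 chunked body and then gunzips it. The helper name `decode_chunked` and the sample data are made up for illustration; chunk extensions are discarded and trailers are ignored.

```python
import gzip


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:eol].split(b";", 1)[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; any trailers after it are ignored here
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def encode_chunk(chunk: bytes) -> bytes:
    """Helper for the example: wrap bytes as one chunk."""
    return format(len(chunk), "x").encode("ascii") + b"\r\n" + chunk + b"\r\n"


# Build a synthetic gzip-compressed body, split across two chunks.
payload = gzip.compress(b"hello world")
wire = encode_chunk(payload[:10]) + encode_chunk(payload[10:]) + b"0\r\n\r\n"

# Undo the transfer encoding first, then the content encoding.
body = gzip.decompress(decode_chunked(wire))
```

The order matters: chunking is a property of the transfer, so it is removed before the gzip content encoding. A WARC writer, by contrast, would normally store the bytes as they appeared on the wire, which is the distinction behind OrIdow6's "correct" method above.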
-
pabs
OrIdow6: I was thinking standard pcap, but yeah I guess excluding the lower layers is one reason
-
flashfire42|m
Air Vanuatu might go under
-
flashfire42|m
Maybe grab the site
-
thuban
cloudflared, with js pow
-
flashfire42|m
Well fuck ok
-
OrIdow6
Thinking about it, pcap -> warc does sound quite nice
-
OrIdow6
Genuine Chrome sessions/whatever
-
OrIdow6
Which is, I assume, the reason people have thought of it in the past
-
c3manu
JAA: I was planning on archiving
tube.network.europa.eu which is shutting down on May 18th (see deathwatch). the contents hosted there are apparently mirrored from/to youtube.
-
c3manu
there are 202 videos in total, ranging from 1:30 min animations to some 2h or even 4h interviews
-
c3manu
what do you say: still archive it? archive the youtube channels instead (they have older videos that aren't on the peertube instance)? or do the latter and fetch the instance while ignoring the video files?
-
c3manu
i think i'd do the latter, but i'd like to hear your opinion on it :)
-
c3manu
(and anyone else’s, too btw :))
-
» xkey just saw this on Mastodon:
-
xkey
if anyone's looking for a job
-
eightthree
!help
-
eightthree
!con help
-
eightthree
> @OpenArchive⊙ms, a radical archiving organization that is empowering human rights defenders and people in war zones to preserve video evidence
-
eightthree
worded like that, you might end up on a hitlist, of a country at war or organized crime or organized crime paid by a country at war...
-
xkey
true
-
xkey
happened to friends previously working at OCCRP.org
-
JAA
c3manu:
tube.network.europa.eu sounds small enough that we could grab a copy with videos then.
-
c3manu
JAA: okay, thanks :)
-
h2ibot
Exorcism edited Discourse (+53, /* Active Discourses */):
wiki.archiveteam.org/?diff=52228&oldid=52167
-
eggdrop
[tell] icedice: [2024-05-07T20:24:57Z] <thuban> i went through the scanlation discord scrape that Vokun did; have requested it in #//, submitted relevant urls to projects, and checked for custom blogspots (there were none)
-
icedice
Thanks thuban!
-
joepie91|m
random archival-relevant find:
youtube.com/watch?v=oSsZJS26D4E (a version of the soundtrack of which few copies remain, afaik)
-
joepie91|m
not sure what the best way is currently to get a copy of this archived
-
that_lurker
arkiver: Would that fit #down-the-tube? ^