-
negativegray
Hi! I'm looking for a specific fanfic but don't really have the disk space or the internet speed to comb the WARC batches for it, is this the proper channel to ask if someone has it in an archive?
-
TheTechRobo
negativegray: which archive are you looking at?
-
TheTechRobo
most official archiveteam stuff is in the wayback machine
-
negativegray
I don't know how to effectively search there, I'm looking at these: archive.org/details/archiveteam_fanfiction
-
TheTechRobo
From the collection info (archive.org/details/archiveteam_fanfiction?tab=about): "Fanfiction.net Safety Download <archive.org/details/fanfic_download_2012_01> is a single 2 GB tar file containing epub files, which may be easier to extract."
-
TheTechRobo
negativegray: ^
-
negativegray
I've checked that, it doesn't have the complete thing I'm pretty sure
-
negativegray
TheTechRobo: or I'm very bad at searching through it
-
TheTechRobo
negativegray: Assuming it's not in the Wayback Machine (that grab was long before I joined AT so I don't know), you can look through the items' CDX
-
TheTechRobo
Not sure which is the "correct" one but they're a lot smaller than the WARC
-
TheTechRobo
They basically list the WARC's contents, e.g. urls, capture time iirc
-
TheTechRobo
they also list which WARC contains the resource
-
negativegray
TheTechRobo: oooh, thank you! How do I open a cdx?
-
TheTechRobo
negativegray: It's just text, and there's plenty of documentation.
-
TheTechRobo
let me see if I can find some.
-
TheTechRobo
negativegray: Here you go! The first line of CDX is the legend, and it has letters that correspond to what the value is representing. I think it's space separated.
-
negativegray
TheTechRobo: thank you!
-
TheTechRobo
Not all letters will be present.
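A minimal sketch of reading a CDX file along the lines described above, assuming the legend line starts with the delimiter character followed by `CDX` and then the field letters, as in the common ` CDX N b a m s k r M S V g` layout. The function names and the sample record are made up for illustration; `a` is taken to be the original URL, `b` the capture time, and `g` the name of the WARC file holding the record.

```python
def parse_cdx(lines):
    """Yield one dict per CDX record, keyed by the letters in the legend."""
    lines = iter(lines)
    legend = next(lines).rstrip("\n")
    delimiter = legend[0]  # the first character of the legend is the separator
    marker, *letters = legend[1:].split(delimiter)
    if marker != "CDX":
        raise ValueError("first line is not a CDX legend")
    for line in lines:
        line = line.rstrip("\n")
        if line:
            # Letters missing from the legend are simply absent from the dict.
            yield dict(zip(letters, line.split(delimiter)))


def find_by_url(records, needle):
    """Return records whose original URL (field 'a') contains `needle`."""
    needle = needle.lower()
    return [r for r in records if needle in r.get("a", "").lower()]


# Hypothetical sample data, not taken from a real archive item.
sample = [
    " CDX N b a m s k r M S V g\n",
    "fanfiction.net/s/123/1 20120101000000 http://www.fanfiction.net/s/123/1 "
    "text/html 200 ABC - - 1234 5678 example.warc.gz\n",
    "fanfiction.net/u/42 20120101000500 http://www.fanfiction.net/u/42 "
    "text/html 200 DEF - - 2345 9999 example.warc.gz\n",
]

for record in find_by_url(parse_cdx(sample), "/s/123/"):
    print(record["b"], record["a"], record["g"])
```

For a real file, something like `find_by_url(parse_cdx(open("item.cdx")), "/s/123/")` would do the same lookup. This only matches on URLs, so it does not help when a full-text search is needed.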
-
negativegray
TheTechRobo: I tried reading the .cdx and it did not help me, even with the legend
-
TheTechRobo
Hang on, let me download it; my internet is slow
-
TheTechRobo
I may have to go before it finishes
-
negativegray
okay!
-
TheTechRobo
negativegray: while it downloads, what information do you have about the fanfic?
-
TheTechRobo
do you have the URL? or do you need a full-text search?
-
TheTechRobo
if the latter, CDX won't work for you
-
negativegray
TheTechRobo: yeah I need a full text search. I have the author's name and the fanfic's name
-
negativegray
or title
-
TheTechRobo
In that case, yeah, CDX probably won't help you. :/
-
TheTechRobo
Unless the url contains the title or something.
-
negativegray
yeah
-
negativegray
ty though
-
TheTechRobo
I'm not sure what you can do in that case. Does anybody have the fanfic warcs downloaded?
-
TheTechRobo
I have to go to bed btw, good night!
-
negativegray
good night!
-
Doranwen
negativegray: out of curiosity, what fandom?
-
negativegray
Harry Potter
-
Doranwen
Ah, I wouldn't have it. Could ask some friends of mine, though.
-
Doranwen
We've got a Discord server where we share info on deleted fics we have.
-
negativegray
oh!
-
negativegray
That'd be great!
-
negativegray
It is in portuguese, though
-
negativegray
okay! I got a URL for the author and the fic!
-
negativegray
I can only access the first chapter, though
-
negativegray
gods, being so close hurts. I managed to get to the Wayback Machine page of the first chapter, but it seems to be the only one that was cached
-
schwarzkatz|m
Can you share a link,
-
schwarzkatz|m
Damn it, didn’t mean to send so early.
-
schwarzkatz|m
negativegray: can you share a link please?
-
JAA
(They left hours ago.)
-
schwarzkatz|m
ah. is that something only admins see?
-
JAA
I have no idea what Matrix does with that information, but on IRC, anyone can see it.
-
schwarzkatz|m
hm, weird.
-
joepie91|m
I've been noticing that parts aren't bridging correctly lately
-
joepie91|m
I suspect a bridge bug
-
Frogging101
Is yt-dlp able to download a YouTube channel that has more videos than the page limit?
-
JTL
can you provide an example channel?
-
Doranwen
schwarzkatz|m: They were looking for fanfiction.net/s/1888034/1. It's not in the FanficRepack_Redux collection, which a friend of mine suggested looking in.
-
JAA
Doranwen: Do we have any idea when it was deleted?
-
JAA
The WBM snapshot is from 2005.
-
upintheairsheep
Hello, I would like to learn what tool archive.org/details/TikTok?tab=about is scraped with
-
arkiver
internal, not related to IA
-
upintheairsheep
I know a lot about the comment API and the replies API
-
upintheairsheep
So is the ArchiveTeam not behind it?
-
arkiver
no
-
spirit
NEXT!
-
arkiver
:P
-
upintheairsheep
To remind you, TikTok is going to remove videos related to tanning after warning from medical experts. theguardian.com/technology/2022/dec…ng-after-alarm-from-medical-experts
-
TheTechRobo
How do you reverse engineer the requests that a Steam game makes? I was thinking of a proxy, but as far as I'm aware you can't configure its use.
-
TheTechRobo
Wireshark's fine but it captures ALL traffic...
-
schwarzkatz|m
it has powerful filtering tho
-
TheTechRobo
I don't know how to use it xD
-
TheTechRobo
I might be able to guess at the domain name, though. Is there a way to do that for wireshark?
-
TheTechRobo
Or guess at part of the domain name, at least.
-
TheTechRobo
(I know both the company and game name)
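A rough sketch of narrowing a capture down by a guessed part of the domain name, assuming `tshark` (Wireshark's command-line tool) is installed. It matches the fragment against DNS query names and the TLS SNI field using Wireshark display filters. The fragment `examplegame` and the function name are placeholders, and the `any` interface is a Linux convenience.

```python
import shlex


def build_tshark_command(name_fragment, interface="any"):
    """Build a tshark command that shows only DNS lookups and TLS
    handshakes whose host name contains `name_fragment`."""
    display_filter = (
        f'dns.qry.name contains "{name_fragment}" or '
        f'tls.handshake.extensions_server_name contains "{name_fragment}"'
    )
    return [
        "tshark",
        "-i", interface,        # capture interface
        "-Y", display_filter,   # display filter, same syntax as the GUI
        "-T", "fields",
        "-e", "dns.qry.name",
        "-e", "tls.handshake.extensions_server_name",
    ]


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_tshark_command("examplegame")))
```

The same filter expression can be pasted into the filter bar of the Wireshark GUI. It only reveals which hosts the game talks to; the contents of TLS connections stay encrypted.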
-
schwarzkatz|m
related documentation:
-
TheTechRobo
Wireshark also isn't great for HTTP because it just gets the raw TCP data, no? There's likely ssl.
-
schwarzkatz|m
you'd need to use docs.mitmproxy.org/stable then I guess :D
-
JAA
Depending on how the game validates TLS certs, it might be messy though.
-
JAA
If it has its own cert store or hardcoded fingerprints or similar, for example.
-
JAA
Then you'll need to either replace that (have fun) or use something like tcpdump/Wireshark and extract the master key (also fun).
-
JAA
pre-master key*
-
schwarzkatz|m
if everything goes through mitmproxy though, why would it be messy?
-
JAA
Because the client (game) needs to trust mitmproxy's CA cert for that to work.
-
JAA
If it uses the system trust store, that's easy, but if it doesn't, mess.
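For the easy case, where the game does trust mitmproxy's CA cert and its traffic actually reaches the proxy, a small mitmproxy addon script could log the requests. This is only a sketch: the class name, the `examplegame` fragment, and the file name `log_hosts.py` are placeholders.

```python
class HostLogger:
    """mitmproxy addon: log requests whose host contains a given fragment."""

    def __init__(self, fragment):
        self.fragment = fragment.lower()
        self.seen = []

    def request(self, flow):
        # mitmproxy calls this hook once per client request.
        host = flow.request.pretty_host
        if self.fragment in host.lower():
            entry = f"{flow.request.method} {flow.request.pretty_url}"
            self.seen.append(entry)
            print(entry)


# mitmproxy picks up addons from this module-level list.
addons = [HostLogger("examplegame")]
```

Saved as `log_hosts.py`, it would be loaded with `mitmdump -s log_hosts.py`. If the game pins certificates or ships its own cert store, the TLS handshake fails before any request reaches this hook.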
-
TheTechRobo
Is there a linux way to get the traffic from a specific process given its PID?
-
JAA
See also: you can't make browsers accept mitmproxy by adding the CA cert to the system trust store. Need to do it separately in the browser.
-
schwarzkatz|m
that... sucks. I thought it was system wide.
-
JAA
TheTechRobo: Maybe some iptables magic would help here, but not sure.
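One partial approach that avoids iptables, sketched here under several assumptions: it lists the remote IPv4 TCP endpoints a process currently has open by matching the socket inodes in `/proc/<pid>/fd` against `/proc/<pid>/net/tcp`. It does not capture any traffic itself, it assumes a little-endian Linux host, it ignores IPv6 and UDP, and it needs permission to read the target process's `/proc` entries. The resulting addresses could then be used in a Wireshark or tcpdump filter.

```python
import os
import re
import socket
import struct


def decode_ipv4(hex_endpoint):
    """Turn an entry like '0100007F:1F90' into ('127.0.0.1', 8080)."""
    hex_ip, hex_port = hex_endpoint.split(":")
    # The address is printed in host byte order (little-endian assumed here).
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def socket_inodes(pid):
    """Collect the inode numbers of all sockets the process has open."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were looking
        match = re.fullmatch(r"socket:\[(\d+)\]", target)
        if match:
            inodes.add(match.group(1))
    return inodes


def tcp_peers(pid):
    """Return (ip, port) pairs for the process's IPv4 TCP sockets."""
    inodes = socket_inodes(pid)
    peers = []
    with open(f"/proc/{pid}/net/tcp") as table:
        next(table)  # skip the column header line
        for row in table:
            cols = row.split()
            # Column 2 is the remote address, column 9 the socket inode.
            if cols[9] in inodes:
                peers.append(decode_ipv4(cols[2]))
    return peers
```

Calling `tcp_peers(1234)` for a hypothetical PID 1234 returns a snapshot, so short-lived connections can be missed unless it is polled repeatedly.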
-
sudofox
hiya. i'm looking for some tool recommendations. so i've been trying to archive all static assets from some websites i'm interested in for personal curiosity. i decided to finally give archiving user content from one of them a shot, but it kinda breaks my normal workflow of "try many URLs and git commit whatever i found" due to the sheer # of files
-
sudofox
i've started using git lfs but the reason i'm using git is mainly to actually see how much progress i've made/new things found each time i try something
-
sudofox
i'm wondering if there's a better tool to track progress with recovered files -- i'm also committing tooling for guessing filenames at the same time
-
sudofox
i guess i could use S3 but I still like being able to see what's new with `git status` and so on.
-
sudofox
also git lfs kinda duplicates objects into .git/lfs so double disk space
-
JAA
Yeah, you'll want to get away from 'one file per asset' anyway probably. It just doesn't scale. Eventually, your file system will be sad as well.
-
sudofox
eh, yeah, you're right -- key-based object storage is probably much better for this stuff
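As a sketch of how "what's new since last time" could be tracked without git, assuming files are identified by their SHA-256 digest and a JSON manifest is kept outside the scanned directory. The function names and the manifest layout are made up for illustration, not an existing tool's format.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_new_files(root, manifest_path):
    """Hash every file under `root`, add unseen digests to the manifest,
    and return a {digest: relative path} dict of the newly seen content."""
    root = Path(root)
    manifest_path = Path(manifest_path)
    known = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    added = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hash_file(path)
            if digest not in known and digest not in added:
                added[digest] = str(path.relative_to(root))
    known.update(added)
    manifest_path.write_text(json.dumps(known, indent=2, sort_keys=True))
    return added
```

Running `record_new_files("recovered/", "manifest.json")` after each attempt gives roughly what `git status` was being used for. The digest can double as the key if the files later move into object storage.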
-
JAA
One route is WARC, but accessibility isn't exactly great with it.
-
sudofox
i've been thinking about building a little ceph server in my basement for a while for that purpose (instead of using Amazon)
-
JAA
You get extra metadata and a technically more accurate capture that way, too.
-
JAA
I suppose that would work as well, yeah.
-
Doranwen
JAA: No, he never mentioned that. Left his Reddit nick with me but that's all I've got. Oh well, lol.