-
Ryz
Heya folks, what are some recent Warrior projects that need help, other than Reddit or Imgur? Otherwise I'll probably default to one of those two when I start a Warrior again, since it's been a while since I last ran one because of the weather, ugh
-
fireonlive
i'd say telegram probably
-
nicolas17
Ryz: well if you use the docker containers the answer can be "all of them"
-
nicolas17
reddit and imgur don't have enough tasks or are rate limited -> run both
-
Ryz
Hmm, currently with Imgur, looks like rate-limited, lemme poke into #imgone
-
Ryz
Whoa, checking the to-do, 211.87 million right now oO;
-
fireonlive
yee, running it slow
-
Ryz
Oops, that message was supposed to be in that channel
-
flashfire42|m
So Ryz, I'd say just run ArchiveTeam's Choice if you're running the Warrior VM. If you're using Docker, configure it to run all of them.
-
h2ibot
FireonLive edited Periscope (+24, it's a DPoS):
wiki.archiveteam.org/?diff=50291&oldid=47711
-
bleb
I want to experiment with writing a web archiving tool which runs a page's javascript in the browser, then stores a representation of the resulting DOM.
-
bleb
is there any standard or convention or precedent for this? it seems like archive.today might do something like this but the code is closed
-
bleb
it wouldn't be too hard to write a userscript which sends a JSON representation of the DOM to a local web server to be logged to a file. then you could write a script that takes a URL, points a fresh browser instance at it, then receives the JSON via that local web server and stores it.
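A minimal sketch of the receiving end of that idea, assuming Python's standard-library `http.server` and a JSON Lines log file (one snapshot per line). The handler name, port 8765 and file name `dom_log.jsonl` are hypothetical, not taken from any existing tool. The CORS headers only matter if the userscript uses a plain cross-origin `fetch`; they are there as an assumption about how the userscript would post.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def make_handler(log_path: Path):
    """Build a handler class that appends each POSTed JSON body to log_path."""

    class DomLogHandler(BaseHTTPRequestHandler):
        def _cors_headers(self):
            # Let a userscript running on any page origin post here (hypothetical setup).
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
            self.send_header("Access-Control-Allow-Headers", "Content-Type")

        def do_OPTIONS(self):
            # Answer the CORS preflight a browser sends before a cross-origin JSON POST.
            self.send_response(204)
            self._cors_headers()
            self.end_headers()

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                snapshot = json.loads(body)
            except json.JSONDecodeError:
                self.send_error(400, "body is not valid JSON")
                return
            # Append one snapshot per line so the log can grow without rewriting it.
            with log_path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(snapshot, ensure_ascii=False) + "\n")
            self.send_response(204)
            self._cors_headers()
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return DomLogHandler


def make_server(log_path: Path, port: int = 8765) -> HTTPServer:
    """Create (but do not start) a server bound to localhost only."""
    return HTTPServer(("127.0.0.1", port), make_handler(log_path))
```

To run it: `make_server(Path("dom_log.jsonl")).serve_forever()`. Binding to 127.0.0.1 keeps the logger off the network, and appending JSON Lines means a crash mid-crawl loses at most the snapshot being written.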
-
bleb
the result would be a lot of json; you could also convert it back to HTML but you might lose some information
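A sketch of the "convert it back to HTML" step, assuming a hypothetical node schema (not any existing format): elements are `{"tag", "attrs", "children"}` and text nodes are `{"text"}`. It also shows where information goes missing: anything the serializer didn't record (comments, the doctype, event listeners, live form values, canvas pixels) simply can't come back out.

```python
from html import escape

# Hypothetical schema, an assumption for illustration:
#   element: {"tag": "div", "attrs": {"class": "x"}, "children": [...]}
#   text:    {"text": "hello"}
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}
# Contents of these elements are raw text and must not be entity-escaped.
RAW_TEXT_ELEMENTS = {"script", "style"}


def to_html(node: dict, raw: bool = False) -> str:
    """Serialize one JSON DOM node (and its subtree) back to an HTML string."""
    if "text" in node:
        return node["text"] if raw else escape(node["text"], quote=False)
    tag = node["tag"]
    attrs = "".join(
        f' {name}="{escape(str(value), quote=True)}"'
        for name, value in node.get("attrs", {}).items()
    )
    if tag in VOID_ELEMENTS:
        return f"<{tag}{attrs}>"  # void elements have no children or end tag
    child_raw = tag in RAW_TEXT_ELEMENTS
    inner = "".join(to_html(child, child_raw) for child in node.get("children", []))
    return f"<{tag}{attrs}>{inner}</{tag}>"
```

For example, `to_html({"tag": "p", "attrs": {"class": "note"}, "children": [{"text": "a < b"}, {"tag": "br"}]})` gives `<p class="note">a &lt; b<br></p>`.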
-
bleb
I will start playing with this soon but if anyone knows of any precedent let me know
-
sknebel
bleb: there is a library/tool called freeze-dry which tries something similar
-
imer
github.com/internetarchive/brozzler could check what IA's browser based crawler does as well, might be a good starting point?
-
JAA
crocoite did this, but its WARC output has many problems. SingleFile and SingleFileZ also come to mind.
-
JAA
Doing it with brozzler would be nice, yeah.
-
fireonlive
does brozzler have the JAA 🦭 of approval?
-
JAA
fireonlive: I never verified it, but it's IA's work, and the relevant people there seem to care about following standards.
-
fireonlive
sounds good to me :)
-
fireonlive
tks
-
JAA
IA: presumed innocent until proven guilty
-
JAA
webrecorder: presumed guilty until proven innocent
-
JAA
For me personally, anyway. :-)
-
fireonlive
:D
-
fireonlive
given their track records…
-
TheTechRobo
It's not like it's undeserved
-
JAA
Yeah, I'm happy to change my stance if they address the issues and show that they care.
-
JAA
And I realise it's open-source and they'd accept PRs, but they're getting paid for this, and they've implemented other things in the meantime, i.e. correctness is clearly not one of their priorities, which immediately disqualifies it.
-
fireonlive
yeah :/. that's a big blight if it's their actual job
-
bleb
brozzler works differently though
-
bleb
it records the network activity with a MITM proxy, so if content is obfuscated with javascript you still need to use a browser when viewing it
-
bleb
ty for all the suggestions :)
-
imer
yes, I was thinking it might be a good starting point since you "just" need to add in saving the page before it's closed
-
imer
all the other automation being in place already pretty much
-
bleb
ya maybe
-
bleb
I tried testing it out a week ago and couldn't get it to run
-
imer
ah
-
bleb
iirc, either a problem with vagrant or a python version issue depending on what I tried
-
bleb
I gotta brush up on perl and start using it instead of python
-
bleb
pip sucks so much and they don't maintain compatibility between minor versions. I asked #debian what would be the easiest way to run some python code written for an earlier minor version of python 3 and they said to use a container or vm
-
imer
bleb: JA_A recommended pyenv to me in a similar situation
-
JAA
Yes, pyenv, and ignore any Python versions installed through the package manager.
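One way to notice when a script is accidentally run with a package-manager Python instead of the pyenv one is a startup check like this sketch. It assumes the project was pinned with `pyenv local`, which writes a `.python-version` file; the function names are hypothetical, and the sketch ignores `PYENV_VERSION` and the global pyenv version.

```python
import sys
from pathlib import Path
from typing import Optional


def pinned_version(start: Path) -> Optional[str]:
    """Walk up from start looking for a .python-version file; return its first entry."""
    for directory in [start, *start.parents]:
        candidate = directory / ".python-version"
        if candidate.is_file():
            entries = candidate.read_text().split()
            return entries[0] if entries else None
    return None


def matches_running_interpreter(pinned: str, version_info=sys.version_info) -> bool:
    """True if every dotted component of pinned matches the running interpreter."""
    running = [version_info.major, version_info.minor, version_info.micro]
    wanted = pinned.split(".")
    try:
        return all(int(w) == r for w, r in zip(wanted, running))
    except ValueError:
        # e.g. "system" or "pypy3.9-7.3.11": not handled by this sketch
        return False
```

A script could call `pinned_version(Path.cwd())` at startup and exit with a clear message when `matches_running_interpreter` returns False, rather than failing later on some minor-version incompatibility.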