03:45:52 <Ryz> Heya folks, are there any recent Warrior projects that need help, other than Reddit or Imgur? Otherwise I'll probably default to one of those two while running a Warrior; it's been some time since I last ran one because of the weather, ugh
03:46:44 <fireonlive> i'd say telegram probably
03:47:11 <nicolas17> Ryz: well if you use the docker containers the answer can be "all of them"
03:47:30 <nicolas17> reddit and imgur don't have enough tasks or are rate limited -> run both
03:48:15 <Ryz> Hmm, currently with Imgur, looks like rate-limited, lemme poke into #imgone
03:50:11 <Ryz> Whoa, checking the to do, 211.87 million right now oO;
03:50:36 <fireonlive> yee, running it slow
03:50:58 <Ryz> Oops, that message was supposed to be in that channel
04:23:14 <flashfire42|m> So ryz I would say just run archiveteam choice if you are running the warrior VM. If you are using docker then configure to run all
06:24:57 <h2ibot> FireonLive edited Periscope (+24, it's a DPoS): https://wiki.archiveteam.org/?diff=50291&oldid=47711
12:52:35 <h2ibot> Exorcism uploaded File:Bandcamp-logo.png: https://wiki.archiveteam.org/?title=File%3ABandcamp-logo.png
12:53:35 <h2ibot> Exorcism uploaded File:Bandcamp-screenshot.png: https://wiki.archiveteam.org/?title=File%3ABandcamp-screenshot.png
12:53:36 <h2ibot> Exorcism edited Bandcamp (+40): https://wiki.archiveteam.org/?diff=50294&oldid=48993
14:21:11 <bleb> I want to experiment with writing a web archiving tool which runs a page's javascript in the browser, then stores a representation of the resulting DOM.
14:21:45 <bleb> is there any standard or convention or precedent for this?  it seems like archive.today might do something like this but the code is closed
14:24:57 <bleb> it wouldn't be too hard to write a userscript which sends a json representation of the dom to a local web server to be logged to a file.  then you could write a script that takes a URL and points a fresh browser instance at it, then gets the json via a local web server and stores it.
14:25:58 <bleb> the result would be a lot of json; you could also convert it back to HTML but you might lose some information
14:26:37 <bleb> I will start playing with this soon but if anyone knows of any precedent let me know
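[Editor's sketch, not part of the conversation] The receiving half of the idea bleb describes (a local web server that logs each JSON DOM snapshot to a file) might look roughly like the following. This is a minimal sketch under assumptions: the names make_server and dom_snapshots.jsonl, the port, and the one-JSON-object-per-line format are all hypothetical, and browser cross-origin restrictions on the userscript side are not handled here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_PATH = "dom_snapshots.jsonl"  # hypothetical output file, one snapshot per line


def make_handler(log_path):
    """Build a request handler class that appends POSTed JSON to log_path."""

    class SnapshotHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                snapshot = json.loads(body)
            except ValueError:
                # Covers malformed JSON and undecodable bytes.
                self.send_response(400)
                self.end_headers()
                return
            # Re-serialise so each snapshot occupies exactly one line.
            with open(log_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(snapshot) + "\n")
            self.send_response(204)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # silence the default per-request logging to stderr

    return SnapshotHandler


def make_server(log_path=LOG_PATH, port=8765):
    """Create (but do not start) a server bound to localhost only."""
    return HTTPServer(("127.0.0.1", port), make_handler(log_path))
```

To run it, call make_server().serve_forever() and have the userscript POST its JSON to http://127.0.0.1:8765/. Writing one object per line keeps the log appendable and easy to process later; whether that is a good archival format is a separate question from the one raised in the channel.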
14:50:31 <sknebel> bleb: there is a library/tool called freeze-dry which tries something similar
14:57:20 <imer> https://github.com/internetarchive/brozzler could check what IA's browser based crawler does as well, might be a good starting point?
14:57:43 <JAA> crocoite did this, but its WARC output has many problems. SingleFile and SingleFileZ also come to mind.
14:58:03 <JAA> Doing it with brozzler would be nice, yeah.
15:50:24 <fireonlive> does brozzler have the JAA 🦭 of approval?
16:00:12 <JAA> fireonlive: I never verified it, but it's IA's work, and the relevant people there seem to care about following standards.
16:00:36 <fireonlive> sounds good to me :)
16:00:42 <fireonlive> tks
16:07:11 <JAA> IA: presumed innocent until proven guilty
16:07:18 <JAA> webrecorder: presumed guilty until proven innocent
16:07:24 <JAA> For me personally, anyway. :-)
16:07:55 <fireonlive> :D
16:08:12 <fireonlive> given their track records…
16:08:23 <TheTechRobo> It's not like it's undeserved
16:09:28 <JAA> Yeah, I'm happy to change my stance if they address the issues and show that they care.
16:11:09 <JAA> And I realise it's open-source and they'd accept PRs, but they're getting paid for this, and they've implemented other things in the meantime, i.e. correctness is clearly not one of their priorities, which immediately disqualifies it.
16:15:05 <fireonlive> yeah :/. that's a big blight if it's their actual job
17:43:40 <bleb> brozzler works differently though
17:44:34 <bleb> it records the network activity with a MITM proxy, so if content is obfuscated with javascript you still need to use a browser when viewing it
17:44:47 <bleb> ty for all the suggestions :)
17:45:06 <imer> yes, I was thinking it might be a good starting point since you "just" need to add in saving the page before it's closed
17:45:24 <imer> all the other automation being in place already pretty much
17:46:14 <bleb> ya maybe
17:46:30 <bleb> I tried testing it out a week ago and couldn't get it to run
17:46:37 <imer> ah
17:47:13 <bleb> iirc, either a problem with vagrant or a python version issue depending on what I tried
17:48:25 <bleb> I gotta brush up on perl and start using it instead of python
17:49:42 <bleb> pip sucks so much and they don't maintain compatibility between minor versions.  I asked #debian what would be the easiest way to run some python code written for an earlier minor version of python 3 and they said to use a container or vm
17:57:09 <imer> bleb: JA_A recommended pyenv to me in a similar situation
17:57:39 <JAA> Yes, pyenv, and ignore any Python versions installed through the package manager.