03:45:52 Heya folks, what would be any recent Warrior projects that need help that aren't Reddit or Imgur? Otherwise I'll probably default to one of the two projects I mentioned while running a Warrior, since it's been some time since I ran one because of the weather, ugh
03:46:44 i'd say telegram probably
03:47:11 Ryz: well if you use the docker containers the answer can be "all of them"
03:47:30 reddit and imgur don't have enough tasks or are rate limited -> run both
03:48:15 Hmm, Imgur currently looks rate-limited, lemme poke into #imgone
03:50:11 Whoa, checking the todo: 211.87 million right now oO;
03:50:36 yee, running it slow
03:50:58 Oops, that message was supposed to be in that channel
04:23:14 So Ryz, I would say just run archiveteam choice if you are running the warrior VM. If you are using docker, then configure it to run all
06:24:57 FireonLive edited Periscope (+24, it's a DPoS): https://wiki.archiveteam.org/?diff=50291&oldid=47711
12:52:35 Exorcism uploaded File:Bandcamp-logo.png: https://wiki.archiveteam.org/?title=File%3ABandcamp-logo.png
12:53:35 Exorcism uploaded File:Bandcamp-screenshot.png: https://wiki.archiveteam.org/?title=File%3ABandcamp-screenshot.png
12:53:36 Exorcism edited Bandcamp (+40): https://wiki.archiveteam.org/?diff=50294&oldid=48993
14:21:11 I want to experiment with writing a web archiving tool which runs a page's javascript in the browser, then stores a representation of the resulting DOM.
14:21:45 is there any standard, convention, or precedent for this? it seems like archive.today might do something like this, but the code is closed
14:24:57 it wouldn't be too hard to write a userscript which sends a json representation of the dom to a local web server to be logged to a file. then you could write a script that takes a URL, points a fresh browser instance at it, then gets the json via the local web server and stores it.
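A minimal sketch of the local-web-server half of the 14:24:57 idea, assuming the userscript POSTs a JSON body (for example `{"url": ..., "dom": ...}`) to `http://127.0.0.1:8765/`. The port, the payload shape, the `make_server` helper, and the JSON Lines output file are all hypothetical choices for illustration, not an existing tool's format; the userscript side is not shown.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class SnapshotHandler(BaseHTTPRequestHandler):
    """Appends each POSTed JSON snapshot as one line of self.server.log_path."""

    # Serializes appends so concurrent requests can't interleave lines.
    write_lock = threading.Lock()

    def _reply(self, status: int) -> None:
        self.send_response(status)
        # CORS headers let a userscript call fetch() from any page's origin.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()

    def do_OPTIONS(self) -> None:
        # A cross-origin POST with a JSON body triggers a preflight first.
        self._reply(204)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            snapshot = json.loads(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            self._reply(400)
            return
        # Compact, ASCII-only re-serialization keeps each snapshot on one line.
        line = json.dumps(snapshot, separators=(",", ":"))
        with self.write_lock, open(self.server.log_path, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
        self._reply(204)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request stderr logging


def make_server(log_path: Path, port: int = 8765) -> ThreadingHTTPServer:
    # Bind to loopback only so the receiver isn't reachable from the network.
    server = ThreadingHTTPServer(("127.0.0.1", port), SnapshotHandler)
    server.log_path = log_path  # read by the handler as self.server.log_path
    return server
```

To run it, call `make_server(Path("dom-snapshots.jsonl")).serve_forever()`. Writing one JSON document per line keeps appends cheap and lets each capture be parsed independently, which helps with the "lot of json" this approach produces.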
14:25:58 the result would be a lot of json; you could also convert it back to HTML, but you might lose some information
14:26:37 I will start playing with this soon, but if anyone knows of any precedent let me know
14:50:31 bleb: there is a library/tool called freeze-dry which tries something similar
14:57:20 https://github.com/internetarchive/brozzler you could check what IA's browser-based crawler does as well; might be a good starting point?
14:57:43 crocoite did this, but its WARC output has many problems. SingleFile and SingleFileZ also come to mind.
14:58:03 Doing it with brozzler would be nice, yeah.
15:50:24 does brozzler have the JAA 🦭 of approval?
16:00:12 fireonlive: I never verified it, but it's IA's work, and the relevant people there seem to care about following standards.
16:00:36 sounds good to me :)
16:00:42 tks
16:07:11 IA: presumed innocent until proven guilty
16:07:18 webrecorder: presumed guilty until proven innocent
16:07:24 For me personally, anyway. :-)
16:07:55 :D
16:08:12 given their track records…
16:08:23 It's not like it's undeserved
16:09:28 Yeah, I'm happy to change my stance if they address the issues and show that they care.
16:11:09 And I realise it's open-source and they'd accept PRs, but they're getting paid for this, and they've implemented other things in the meantime, so correctness is clearly not one of their priorities, which immediately disqualifies it.
16:15:05 yeah :/. that's a big blight if it's their actual job
17:43:40 brozzler works differently though
17:44:34 it records the network activity with a MITM proxy, so if content is obfuscated with javascript you still need to use a browser when viewing it
17:44:47 ty for all the suggestions :)
17:45:06 yes, I was thinking it might be a good starting point since you "just" need to add in saving the page before it's closed
17:45:24 all the other automation being in place already pretty much
17:46:14 ya maybe
17:46:30 I tried testing it out a week ago and couldn't get it to run
17:46:37 ah
17:47:13 iirc, either a problem with vagrant or a python version issue, depending on what I tried
17:48:25 I gotta brush up on perl and start using it instead of python
17:49:42 pip sucks so much and they don't maintain compatibility between minor versions. I asked #debian what would be the easiest way to run some python code written for an earlier minor version of python 3 and they said to use a container or vm
17:57:09 bleb: JA_A recommended pyenv to me in a similar situation
17:57:39 Yes, pyenv, and ignore any Python versions installed through the package manager.
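The 14:25:58 remark about converting the captured json back to HTML can be sketched as below, assuming a hypothetical node format produced by the userscript: elements as `{"tag": ..., "attrs": {...}, "children": [...]}`, text as `{"text": ...}`, comments as `{"comment": ...}`. This is a simplified serializer, not the full WHATWG algorithm (only `script` and `style` are treated as raw text). Anything the capture step never recorded, such as attached event listeners or canvas contents, cannot come back this way, which is the kind of information loss mentioned there.

```python
import html
from typing import Any

# Void elements never have children or a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}
# Simplification: text inside these is written out verbatim, unescaped.
RAW_TEXT_ELEMENTS = {"script", "style"}


def node_to_html(node: dict[str, Any], raw: bool = False) -> str:
    """Serialize one node of the hypothetical JSON DOM format to HTML."""
    if "text" in node:
        return node["text"] if raw else html.escape(node["text"], quote=False)
    if "comment" in node:
        return f"<!--{node['comment']}-->"
    tag = node["tag"].lower()
    # Attribute values are escaped, including double quotes.
    attrs = "".join(
        f' {name}="{html.escape(value, quote=True)}"'
        for name, value in node.get("attrs", {}).items()
    )
    if tag in VOID_ELEMENTS:
        return f"<{tag}{attrs}>"
    inner = "".join(
        node_to_html(child, raw=tag in RAW_TEXT_ELEMENTS)
        for child in node.get("children", [])
    )
    return f"<{tag}{attrs}>{inner}</{tag}>"


def document_to_html(root: dict[str, Any]) -> str:
    # The doctype isn't an element node, so it's prepended here.
    return "<!DOCTYPE html>" + node_to_html(root)
```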