04:41:28 Do we have any way to enumerate the images?
05:25:26 Another thing where something that stores all outlinks seen from #// might help
05:25:56 But aside from that I don't see a way?
05:27:23 Could also beg them to remove the WBM block for posterity's sake I guess, if it seems like they're ideologically compatible
05:47:25 abload.de also never responded to my email
06:42:10 https://www.marathonoil.com/ <- acquired by ConocoPhillips. Might be something for AB
11:19:32 erm, what the sigma?
11:38:23 Maybe another bot that allows people to submit URL lists like we have for imgur?
18:24:35 ThreeHM: are you talking about abload?
18:26:18 OrIdow6: abload image IDs are much longer than imgur so I doubt we can enumerate or bruteforce
18:29:40 ah no it's worse
18:29:50 images are identified by the user-provided filename + a random suffix
18:29:55 so IDs are arbitrarily long
18:41:18 ah damn it
19:23:05 nicolas17: Yeah, seems like the best way if we can't enumerate IDs
19:23:50 gotta warm up datechnoman's scrapers i suppose if we project it
19:24:39 yeah gotta hunt for URLs in existing website archives
22:23:53 Hello!
22:24:37 I found out about archive team recently, and I'm really scratching my head on how you ensure data integrity.
22:24:50 Hey
22:25:06 Do you wait until multiple people have scraped a page? Do you just trust everyone?
22:25:32 I don't really know about this
22:29:34 My understanding is that it's mostly a case of trusting everyone, along with some checks that the script used to download stuff hasn't been modified (which aren't perfect and are mainly there to handle people modifying the script in good faith based on my understanding)
22:32:13 and "don't give people ideas on how to mess up the data"
22:33:50 Understood. I'm worried that if the project gets bigger, bad actors could pollute or change the truth fairly easily.
22:34:29 One thing that *isn't* currently done is also saving the tls-level data, though there's some work on that (it requires updating the WARC spec). Though IIRC saving that isn't sufficient for proving the data hasn't been modified after the fact (something about symmetric encryption for data in transit?) though I don't know the details and might be misremembering
22:39:46 yes, the way it works you have the needed keys to forge/modify a TLS packet dump
22:40:05 but it would be a *lot* of work compared to modifying a warc
22:40:51 hopefully sxg takes off?
22:41:56 https://web.dev/articles/signed-exchanges https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html
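
To illustrate the point above about symmetric keys, here is a minimal sketch of why a saved TLS dump does not prove the data is unmodified. It is not TLS itself: HMAC-SHA256 stands in for the record protection, and the names `session_key`, `tag` and `verify` are hypothetical. The assumption is only that whoever recorded the session also holds the symmetric session key, so they can produce a valid authentication tag for any content they like.

```python
import hashlib
import hmac
import os


def tag(key: bytes, record: bytes) -> bytes:
    """Compute an authentication tag over a record with a shared symmetric key."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify(key: bytes, record: bytes, mac: bytes) -> bool:
    """Check a tag the way a later auditor holding the same key would."""
    return hmac.compare_digest(tag(key, record), mac)


# Stand-in for the session key that both client and server hold.
session_key = os.urandom(32)

# What the server really sent, with its tag.
original = b"HTTP/1.1 200 OK\r\n\r\nreal page body"
original_tag = tag(session_key, original)

# The archiver also has session_key, so it can tag altered content itself.
forged = b"HTTP/1.1 200 OK\r\n\r\naltered page body"
forged_tag = tag(session_key, forged)

# Both records verify; the tag alone cannot tell them apart.
print(verify(session_key, original, original_tag))
print(verify(session_key, forged, forged_tag))
```

A signature made with the origin's private key, which is the idea behind signed exchanges, would not have this property, because the archiver never holds that key.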