-
<OrIdow6> Do we have any way to enumerate the images?
-
<OrIdow6> Another thing where something that stores all outlinks seen from #// might help
-
<OrIdow6> But aside from that I don't see a way?
-
<OrIdow6> Could also beg them to remove the WBM block for posterity's sake I guess, if it seems like they're ideologically compatible
-
<arkiver> abload.de also never responded to my email
-
<wickerz> marathonoil.com <- acquired by ConocoPhillips. Might be something for AB
-
<skibidirizzler> erm, what the sigma?
-
<ThreeHM> Maybe another bot that allows people to submit URL lists like we have for imgur?
-
<nicolas17> ThreeHM: are you talking about abload?
-
<nicolas17> OrIdow6: abload image IDs are much longer than imgur's so I doubt we can enumerate or bruteforce
-
<nicolas17> ah no it's worse
-
<nicolas17> images are identified by the user-provided filename + a random suffix
-
<nicolas17> so IDs are arbitrarily long
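To put rough numbers on why enumeration is hopeless here: a short fixed-length ID has a finite keyspace that can in principle be walked, while an ID built from a user-chosen filename plus a suffix has no upper bound. The sketch below assumes imgur-style IDs are 7 characters drawn from [A-Za-z0-9] (62 symbols) and treats the abload suffix as alphanumeric with a hypothetical length of 2; neither detail is confirmed for abload.

```python
# Rough keyspace arithmetic for brute-forcing image IDs.
# Assumptions (not confirmed for either site): IDs use the 62 symbols
# [A-Za-z0-9], imgur-style IDs are 7 characters, and the abload-style
# suffix is a fixed-length alphanumeric string of hypothetical length.

ALPHABET_SIZE = 62


def fixed_length_keyspace(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct IDs that are exactly `length` symbols long."""
    return alphabet_size ** length


def filename_plus_suffix_keyspace(max_name_len: int, suffix_len: int,
                                  alphabet_size: int = ALPHABET_SIZE) -> int:
    """IDs built from a 1..max_name_len symbol filename plus a fixed suffix.

    Real filenames can also contain dots, dashes, etc., so this is a lower
    bound; and since filenames have no fixed maximum length, the true space
    is unbounded.
    """
    names = sum(alphabet_size ** n for n in range(1, max_name_len + 1))
    return names * alphabet_size ** suffix_len


if __name__ == "__main__":
    print(f"7-char IDs: {fixed_length_keyspace(7):,}")
    for max_len in (4, 8, 16):
        total = filename_plus_suffix_keyspace(max_len, suffix_len=2)
        print(f"filename up to {max_len} chars + 2-char suffix: {total:.2e}")
```

The point is the growth rate: every extra character a filename may have multiplies the count by roughly 62, so there is no length at which a brute-force sweep could be said to be complete.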
-
<fireonlive> ah damn it
-
<ThreeHM> nicolas17: Yeah, seems like the best way if we can't enumerate IDs
-
<fireonlive> gotta warm up datechnoman's scrapers i suppose if we project it
-
<nicolas17> yeah gotta hunt for URLs in existing website archives
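Both ideas above, a bot that accepts submitted URL lists and hunting through existing website archives, come down to pulling candidate links out of text that has already been captured. A minimal sketch of that extraction step, working on plain strings (for example, decoded HTML taken from WARC response records; reading the WARCs themselves is left out). It assumes any http(s) URL whose host is abload.de or a subdomain of it is worth collecting, and deliberately does not guess at abload's path layout.

```python
import re
from typing import Iterable, List

# Assumption: every http(s) URL whose host is abload.de or a subdomain of it is
# worth collecting; no particular abload path layout is assumed.
ABLOAD_URL = re.compile(
    r"""https?://(?:[a-z0-9-]+\.)*abload\.de(?!\.?[a-z0-9-])(?:[/?#][^\s"'<>)]*)?""",
    re.IGNORECASE,
)


def extract_abload_urls(documents: Iterable[str]) -> List[str]:
    """Unique abload.de URLs found in `documents`, in first-seen order."""
    seen = set()
    found = []
    for text in documents:
        for match in ABLOAD_URL.finditer(text):
            # Drop punctuation that usually ends a sentence rather than the URL.
            url = match.group(0).rstrip(".,;")
            if url not in seen:
                seen.add(url)
                found.append(url)
    return found


if __name__ == "__main__":
    page = ('<img src="https://abload.de/img/example.jpg"> '
            "mirror: http://www.abload.de/image.php?img=example.jpg. "
            "not this: https://abload.de.example.com/x")
    for url in extract_abload_urls([page]):
        print(url)
```

The negative lookahead after `abload\.de` keeps look-alike hosts such as `abload.de.example.com` out of the list, and keeping first-seen order makes the output easy to diff against a list that has already been submitted.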
-
<Nico> Hello!
-
<Nico> I found out about Archive Team recently, and I'm really scratching my head over how you ensure data integrity.
-
<Notrealname1234> Hey
-
<Nico> Do you wait until multiple people have scraped a page? Do you just trust everyone?
-
<Notrealname1234> I don't really know about this
-
<pokechu22> My understanding is that it's mostly a case of trusting everyone, along with some checks that the script used to download stuff hasn't been modified (which aren't perfect and, as I understand it, are mainly there to catch people modifying the script in good faith)
-
<nicolas17> and "don't give people ideas on how to mess up the data"
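The actual script checks aren't described in this conversation. A minimal sketch of one common approach, comparing a SHA-256 digest of the script against a known-good value, is shown below; the function names, and the idea that the known-good digest is distributed alongside the project, are assumptions for illustration only.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def script_is_unmodified(script_bytes: bytes, expected_hex: str) -> bool:
    """True if the script's digest matches the expected one (case-insensitive hex)."""
    return hmac.compare_digest(sha256_hex(script_bytes), expected_hex.lower())


def check_script_file(path: str, expected_hex: str) -> bool:
    """Hypothetical helper: hash a script on disk and compare it to a known digest."""
    return script_is_unmodified(Path(path).read_bytes(), expected_hex)


if __name__ == "__main__":
    original = b"print('grab')\n"
    known_good = sha256_hex(original)
    print(script_is_unmodified(original, known_good))                # True
    print(script_is_unmodified(original + b"# edit\n", known_good))  # False
```

This matches the caveat in the chat: it catches honest, accidental edits, but a worker who wants to cheat controls their own machine and can simply skip the check or report whatever digest is expected.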
-
<Nico> Understood. I'm worried that if the project gets bigger, bad actors could pollute or change the truth fairly easily.
-
<pokechu22> One thing that *isn't* currently done is also saving the TLS-level data, though there's some work on that (it requires updating the WARC spec). Though IIRC saving that isn't sufficient for proving the data hasn't been modified after the fact (something about symmetric encryption for data in transit?) though I don't know the details and might be misremembering
-
<nicolas17> yes, the way it works you have the needed keys to forge/modify a TLS packet dump
-
<nicolas17> but it would be a *lot* of work compared to modifying a warc
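Why a saved TLS capture doesn't prove authenticity can be illustrated with a shared-key MAC. Here HMAC-SHA256 stands in for TLS record protection; this is a simplification, since TLS 1.3 actually uses AEAD ciphers such as AES-GCM, but the property that matters is the same: both ends hold the same session key. Whoever holds that key, including the client that recorded the session, can produce a tag for altered bytes that verifies exactly as well as the server's genuine one.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, record: bytes) -> bytes:
    """Authentication tag for a record under a shared symmetric key."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify(key: bytes, record: bytes, record_tag: bytes) -> bool:
    """Check a record's tag in constant time."""
    return hmac.compare_digest(tag(key, record), record_tag)


# Simplification: both endpoints end up with the same session key after the handshake.
session_key = secrets.token_bytes(32)

# What the server actually sent.
genuine = b"HTTP/1.1 200 OK\r\n\r\nthe real page"
genuine_tag = tag(session_key, genuine)

# The client kept the key alongside its capture, so it can tag altered bytes too.
forged = b"HTTP/1.1 200 OK\r\n\r\na doctored page"
forged_tag = tag(session_key, forged)

# Anyone checking with the key cannot tell the two apart: both verify.
print(verify(session_key, genuine, genuine_tag))  # True
print(verify(session_key, forged, forged_tag))    # True
```

Making captures verifiable by a third party needs an asymmetric signature, where only the origin holds the private key; that is roughly the idea behind signed HTTP exchanges (SXG).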
-
<fireonlive> hopefully sxg takes off?
-
<fireonlive>