-
purplebot
Google Poly edited by Ajay (+1216, Add info about retrieving data …) just now --
archiveteam.org/?diff=46522&oldid=46512
-
atphoenix
OrIdow6, I know that sites have been put back online by original, sympathetic webmasters in cooperation with AT in a way that *only* AT could still access the site in order to complete an archiving effort. The opposite has also happened, like with Yahoo and Nintendo
-
cadence
@jodizzle: Hi, I'm one of the people who spearheaded the youtube annotations archive project, and while the chat logs are lost to time, I believe the trust system worked very well for us.
-
cadence
@AK: in the annotations archive, when a worker submitted correct data its trust would increase, and higher trust meant a smaller (but never zero) chance of being asked to redo existing work for verification.
-
AK
Ooh that's clever
-
cadence
though here we have the problem that the annotation data is always the same when requested multiple times - it's an XML - but Y!A's page almost certainly will change if people submit new answers to the question, or even if there's a "recommended" section that's somewhat randomly generated.
-
AK
I think that's gonna be the issue, static vs potentially dynamic content
-
cadence
the calculation was probably something like...
-
AK
For an api (or annotations), the trust works well
-
cadence
1/($trust+1) chance to redo existing work
-
cadence
if work is correct, $trust++
-
cadence
if work is incorrect, $trust-=10
-
cadence
negative trust means you can't do work anymore, kill the access token
-
cadence
and then only allow a certain number of access tokens to be generated per IP address
-
cadence
that's how we did it. though this was mainly to stop faulty workers rather than maliciously edited data.
-
cadence
might want to do something like $trust = min($trust/2, $trust-10) so that it drops off sharply even when high, to prevent somebody jacking up their trust ridiculously high before submitting spam
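
The trust rules cadence lists above can be sketched roughly as follows. This is a minimal illustration, not the annotations project's actual code (cadence says the calculation was "probably something like" this); the `Worker` class and function names are made up for the example, and the penalty uses the sharper `min(trust/2, trust-10)` variant suggested in the last message, with integer division as an assumption.

```python
import random


class Worker:
    """Hypothetical record for one access token."""

    def __init__(self):
        self.trust = 0        # new tokens start at trust 0
        self.revoked = False  # set once trust goes negative


def verification_probability(trust):
    # 1/(trust+1): trust 0 is always re-checked, and the chance
    # shrinks as trust grows but never reaches zero.
    return 1.0 / (trust + 1)


def should_verify(worker, rng=random):
    # Decide whether this worker gets handed already-completed work.
    return rng.random() < verification_probability(worker.trust)


def record_result(worker, correct):
    if correct:
        worker.trust += 1
    else:
        # Sharp drop even from high trust: halve it or subtract 10,
        # whichever leaves the lower value.
        worker.trust = min(worker.trust // 2, worker.trust - 10)
    if worker.trust < 0:
        # Negative trust: no more work, the token is killed.
        worker.revoked = True
```

With this variant a worker sitting at trust 100 falls to 50 after one bad submission instead of 90, so building up a large buffer before submitting spam buys much less.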
-
cadence
/shrug
-
cadence
well, the issue isn't that the submissions need to be the _same,_ they just need to be _equivalent_
-
cadence
if you can afford some kind of server-side validation that checks the basic page structure, and extracts the main question body for comparison... that might work?
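
One way such an equivalence check might look, as a rough sketch only: pull out the question body and compare that, ignoring everything else on the page. The `question-body` class name is a placeholder assumption, not the real Y!A markup, and the helper names are invented for illustration.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class QuestionExtractor(HTMLParser):
    """Collects text inside the first element carrying the given class."""

    def __init__(self, question_class="question-body"):  # assumed class name
        super().__init__()
        self.question_class = question_class
        self.depth = 0      # nesting depth inside the question element
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.question_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_question(html):
    parser = QuestionExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so formatting differences do not matter.
    return " ".join("".join(parser.parts).split())


def equivalent(page_a, page_b):
    # Two submissions count as equivalent if both contain the same
    # non-empty question text; answers and "recommended" blocks are ignored.
    question = extract_question(page_a)
    return question != "" and question == extract_question(page_b)
```

Rejecting pages with an empty question body doubles as the basic structure check: a submission that lacks the expected element never matches anything.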
-
cadence
a much simpler thing to do would be - assuming you already have a list of questions to scrape - hand them all out once, and after you reach the end, hand them all out again. only if there is a difference between the 1st and 2nd attempt of a question do you send it out a 3rd time. so if anybody tries to alter a question, they'd have to get really lucky and alter it twice somehow to not be caught.
-
cadence
this one assumes that we have enough time to scrape all questions twice, but it will at least try to go through all the questions a single time first.
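
A small sketch of that two-pass idea, assuming each pass ends up as a dict mapping question ID to the submitted data. The function names are hypothetical, and settling the third attempt by two-out-of-three agreement is an assumption; the messages above only say a mismatch triggers a third attempt.

```python
def disputed(first_pass, second_pass, equivalent=lambda a, b: a == b):
    """Return the question IDs whose two submissions disagree.

    Only these need to be handed out a third time.
    """
    return [
        qid
        for qid, first in first_pass.items()
        if qid in second_pass and not equivalent(first, second_pass[qid])
    ]


def resolve(first, second, third, equivalent=lambda a, b: a == b):
    """Pick the submission that at least two of three attempts agree on.

    Returns None when all three differ (assumed to need manual review).
    """
    if equivalent(first, second) or equivalent(first, third):
        return first
    if equivalent(second, third):
        return second
    return None


first_pass = {"q1": "A", "q2": "B", "q3": "C"}
second_pass = {"q1": "A", "q2": "tampered", "q3": "C"}
retry = disputed(first_pass, second_pass)  # only q2 goes out again
```

The `equivalent` parameter is there because, as noted earlier, two scrapes of a dynamic page need to be equivalent rather than byte-identical.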
-
cadence
food for thought!
-
Kaz
cadence: how do you authenticate workers
-
cadence
on first run, before they can do any work, they ask the central server for an access token.
-
cadence
if the requesting IP has not already generated too many tokens, the server will respond with a new access token with trust 0 that can be used
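
Roughly, the token handout described here could look like the sketch below. The class name, the in-memory storage, and the limit of 3 tokens per IP are all assumptions for illustration; the actual limit is not given.

```python
import secrets
from collections import Counter

MAX_TOKENS_PER_IP = 3  # assumed value; the real limit is not stated


class TokenServer:
    """Hypothetical central server that hands out access tokens."""

    def __init__(self, max_per_ip=MAX_TOKENS_PER_IP):
        self.max_per_ip = max_per_ip
        self.issued = Counter()  # tokens generated so far, per IP
        self.trust = {}          # token -> trust score

    def request_token(self, ip):
        # Refuse if this IP has already generated too many tokens.
        if self.issued[ip] >= self.max_per_ip:
            return None
        token = secrets.token_hex(16)
        self.issued[ip] += 1
        self.trust[token] = 0    # every new token starts at trust 0
        return token
```

Counting tokens ever issued, rather than tokens currently active, means a killed token still uses up part of its IP's allowance, so a faulty worker cannot simply re-register forever.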
-
Kaz
ah ok, so it's actually per-instance rather than like, 'per human operator' or anything like that
-
cadence
yah
-
cadence
there's no real way to verify humans
-
cadence
(fuck you stripe)
-
AK
I suppose another option would be per human operator
-
AK
Then we could remember people's trust levels for future projects
-
AK
(As in a token per operator that they use on all their instances)
-
cadence
claims to be pro-life; dies anyway
-
Wayward
:o
-
OrIdow6
atphoenix: Could work here, don't know what the specifics of Zopolis's (who has gone offline) situation are
-
purplebot
Google Poly edited by Ajay (+312) just now --
archiveteam.org/?diff=46523&oldid=46522
-
jodizzle
cadence: Thanks for this. I didn't realize that the point of the trust system was to stop faulty workers, rather than malicious actors, but that makes sense.
-
atphoenix
cadence: what is the reference to stripe? was that a human who tried to make trouble?
-
tech234a
perhaps Stripe the payment processor?
-
billy549
when are archiveteam warrior shirts happening? as cool as the stickers are, iirc they're not handled by a main AT person (though by them being linked on the wiki i trust them ;p)
-
billy549
would be cool to do a shirt ;)
-
hook54321
billy549: i'm not sure but i think the person selling the stickers might be the person that designed it
-
billy549
ahh oki
-
arkiver
chfoo: you know anything on that? ^
-
chfoo
i don't remember what license things were uploaded to the wiki under, but it should have been one of the licenses that allows sharing and commercial use
-
hook54321
no license
-
billy549
it's a very pretty logo
-
billy549
but no, i don't think ajh is the original designer
-
chfoo
oh, i think the logos were commissioned and licensing might be implied for reuse, but it was done before my time here
-
hook54321
I think I found the artist, I can ask. Either way I'm not sure a random person's redbubble store should be advertised on the wiki though.
-
billy549
yeah