Rethinking the ESP Game
Stephen Robertson, Milan Vojnovic, Ingmar Weber*
Microsoft Research & Yahoo! Research
*This work was done while I was a visiting researcher at MSRC.
- 2 -
The ESP Game – Live Demo
Show it live. (2min)
Alternative version.
- 3 -
The ESP Game - Summary
• Two players try to agree on a label to be added to an image
• No way to communicate
• Entered labels only revealed at end
• Known labels are “off-limits”
• ESP refers to “Extrasensory perception”
• Read the other person’s mind
- 4 -
The ESP Game - History
• Developed by Luis von Ahn and Laura Dabbish at CMU in 2004
• Goal: Improve image search
• Licensed by Google in 2006
• A prime example of harvesting human intelligence for difficult tasks
• Many variants (music, shapes, …)
- 5 -
The ESP Game – Strengths and Weaknesses
• Strengths
– Creative approach to a hard problem
– Fun to play
– Vast majority of labels are appropriate
– Difficult to spam
– Powerful idea: Reaching consensus with little or no communication
- 6 -
The ESP Game – Strengths and Weaknesses
• Weaknesses
– The ultimate objective is ill-defined
– Finds mostly general labels
– Already millions of images for these
– “Lowest common denominator” problem
– Human time is used sub-optimally
- 7 -
A “Robot” Playing the ESP Game
Video of recorded play.
- 8 -
The ESP Game – Labels are Predictable
• Synonyms are redundant
– “guy” => “man” for 81% of images
• Co-occurrence reduces “new” information
– “clouds” => “sky” for 68% of images
• Colors are easy to agree on
– “black” is 3.3% of all occurrences
- 9 -
How to Predict the Next Label
T = {“beach”, “water”}, next label t = ??
- 10 -
How to Predict the Next Label
Want to know:
P(“blue” next label | {“beach”, “water”})
P(“car” next label | {“beach”, “water”})
P(“sky” next label | {“beach”, “water”})
P(“bcn” next label | {“beach”, “water”})
Problem of data sparsity!
- 11 -
How to Predict the Next Label
Want to know:
P(“t” next label | T)
= P(T | “t” next label) · P(“t”) / P(T)
Use conditional independence …
Give a random topic to two people.
Ask them to each think of 3 related terms.
P(A,B) = P(A|B) · P(B) = P(B|A) · P(A)
Bayes’ Theorem
- 12 -
Conditional Independence
Topic “Spain”:
– p1: Madrid, sun, paella
– p2: beach, soccer, flamenco
Topic “blue”:
– p1: sky, water, eyes
– p2: azul, blau, bleu
P(A,B|C) = P(A|C) · P(B|C)
P(“p1: sky”, “p2: azul” | “blue”) = P(“p1: sky” | “blue”) · P(“p2: azul” | “blue”)
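A toy numeric check of the factorization P(A,B|C) = P(A|C) · P(B|C), using hypothetical per-player word probabilities for the topic “blue” (the numbers are illustrative, not from the paper):

```python
# Toy check of P(A, B | C) = P(A | C) * P(B | C): given the topic "blue",
# the two players' word choices are modeled as independent.
# The probabilities below are hypothetical, for illustration only.
p1_given_blue = {"sky": 0.5, "water": 0.3, "eyes": 0.2}
p2_given_blue = {"azul": 0.4, "blau": 0.3, "bleu": 0.3}

# Joint distribution built under the conditional-independence assumption
joint = {(a, b): p1_given_blue[a] * p2_given_blue[b]
         for a in p1_given_blue for b in p2_given_blue}

assert abs(sum(joint.values()) - 1.0) < 1e-9   # a valid distribution
assert joint[("sky", "azul")] == 0.5 * 0.4     # factorizes as claimed
```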
- 13 -
How to Predict the Next Label
P({s1, s2} | “t”) · P(“t”) / P(T)
= P(s1 | “t”) · P(s2 | “t”) · P(“t”) / P(T)
P(s | “t”) will still be zero very often
→ smoothing:
P(s | “t”) = (1−λ) · P(s | “t”) + λ · P(s)
C.I. assumption violated in practice, but “close enough”.
The term λ · P(s) is a non-zero background probability.
- 14 -
How to Predict the Next Label
P(“t” next label | T already present)
= ∏s∈T P(s | “t”) · P(“t”) / C, where C is a normalizing constant
λ chosen using a “validation set”; λ = 0.85 in the experiments.
Model trained on ~13,000 tag sets.
Also see: Naïve Bayes classifier
(Uses the conditional independence assumption and Bayes’ Theorem.)
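A minimal sketch of this scoring rule. The counts below are hypothetical stand-ins for the real training data; only the structure (smoothed naïve Bayes over co-occurrence counts) and the slide’s λ = 0.85 come from the deck:

```python
from math import prod

LAMBDA = 0.85  # smoothing weight; the value chosen on the validation set in the slides

# Hypothetical counts from a training corpus of tag sets (not the real data).
N_SETS = 100                                   # total number of tag sets
count = {"sky": 60, "blue": 40, "car": 10,     # sets containing each label
         "beach": 35, "water": 50}
cooc = {                                       # sets containing both labels
    "sky":  {"beach": 30, "water": 40},
    "blue": {"beach": 20, "water": 30},
    "car":  {"beach": 1,  "water": 2},
}

def p(s):
    """Background probability P(s)."""
    return count[s] / N_SETS

def p_smooth(s, t):
    """Smoothed P(s | t) = (1 - lambda) * P_ML(s | t) + lambda * P(s)."""
    ml = cooc[t].get(s, 0) / count[t]
    return (1 - LAMBDA) * ml + LAMBDA * p(s)

def score(t, T):
    """Unnormalized P(t next | T) = prod_{s in T} P(s | t) * P(t)."""
    return prod(p_smooth(s, t) for s in T) * p(t)

T = {"beach", "water"}
scores = {t: score(t, T) for t in ("sky", "blue", "car")}
C = sum(scores.values())                       # normalizing constant
posterior = {t: v / C for t, v in scores.items()}
# With these counts, "sky" comes out most likely and "car" least likely
```

Note that with λ = 0.85 the background term dominates, so the ranking stays close to overall label frequency unless the co-occurrence evidence is strong.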
- 15 -
Experimental Results: Part 1
Number of
– games played: 205
– images encountered: 1,335
– images w/ OLT: 1,105
Percentage w/ match
– all images: 69%
– only images with OLTs: 81%
– all entered tags: 17%
Av. number of labels entered
– per image: 4.1
– per game: 26.7
Agreement index
– mean: 2.6
– median: 2.0
The “robot” plays reasonably well.
The “robot” plays in a human-like way.
- 16 -
Quantifying “Predictability” and “Information”
So, labels are fairly predictable.
But how can we quantify “predictability”?
- 17 -
Quantifying “Predictability” and “Information”
• “sunny” vs. “cloudy” tomorrow in BCN
• The roll of a cubic die
• The next single letter in “barcelo*”
• The next single letter in “re*”
• Clicked search result for “yahoo research”
- 18 -
Entropy and Information
• An event occurring with probability p carries an information of −log2(p) bits …
… the number of bits required to encode it in an optimally compressed encoding
• Example: Compressed weather forecast:
P(“sunny”) = 0.5 → “0” (1 bit)
P(“cloudy”) = 0.25 → “10” (2 bits)
P(“rain”) = 0.125 → “110” (3 bits)
P(“thunderstorm”) = 0.125 → “111” (3 bits)
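The slide’s forecast example can be checked directly: for these dyadic probabilities, −log2(p) matches the prefix-code lengths exactly, and the expected code length equals the entropy.

```python
from math import log2

# The slide's forecast distribution and its optimal prefix code
forecast = {"sunny": 0.5, "cloudy": 0.25, "rain": 0.125, "thunderstorm": 0.125}
code = {"sunny": "0", "cloudy": "10", "rain": "110", "thunderstorm": "111"}

# For dyadic probabilities, -log2(p) exactly matches the code length
for event, prob in forecast.items():
    assert len(code[event]) == -log2(prob)

# Expected code length equals the Shannon entropy (1.75 bits here)
entropy = sum(-prob * log2(prob) for prob in forecast.values())
avg_len = sum(prob * len(code[e]) for e, prob in forecast.items())
assert abs(entropy - avg_len) < 1e-9
```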
- 19 -
Entropy and Information
• p = 1 → 0 bits of information
– A cubic die showed a number in [1,6]
• p ≈ 0 → many, many bits of information
– The winning numbers of the lottery
“information” = “amount of surprise”
- 20 -
Entropy and Information
• Expected information for p1, p2, …, pn:
Σi −pi · log2(pi) = (Shannon) entropy
• Might not know the true p1, p2, …, pn, but believe they are q1, q2, …, qn. Then, w.r.t. q you observe
Σi −pi · log2(qi), which is minimized for q = p
q is given by the earlier model; p is then observed.
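A small numeric illustration of this minimization (the two distributions are hypothetical): encoding events drawn from p with a mismatched model q always costs more bits on average than using p itself.

```python
from math import log2

def cross_entropy(p, q):
    """Observed average information sum_i -p_i * log2(q_i) when events
    follow distribution p but are encoded per the believed distribution q."""
    return sum(-pi * log2(qi) for pi, qi in zip(p, q))

p = [0.5, 0.25, 0.25]   # true distribution (hypothetical)
q = [0.4, 0.35, 0.25]   # an imperfect model of it (hypothetical)

# Gibbs' inequality: the cross-entropy is minimized exactly when q = p,
# where it equals the Shannon entropy of p (here 1.5 bits)
assert cross_entropy(p, q) > cross_entropy(p, p)
```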
- 21 -
Experimental Results: Part 2
Av. information per position of label in tag set
Position:  1    2    3    4    5
Bits:      9.2  8.5  8.0  7.7  7.5
Later labels are more predictable.
Equidistribution = 12.3 bits. “Static” distribution = 9.3 bits.
Av. information per position of human suggestions
Position:  1    2    3    4    5+
Bits:      8.7  9.4  10.0  10.6  11.7
The human thinks harder and harder.
- 22 -
Improving the ESP Game
• Could score points according to −log2(p)
– Number of bits of information added to the system
• Have an activation time limit for “obvious” labels
– Removes the immediate satisfaction for simple matches
• Hide off-limits terms
– Players have to be more careful to avoid “obvious” labels
• Try to match “experts”
– Use previous tags or meta information
• Educate players
– Use previously labeled images to unlearn behavior
• Automatically expand the off-limits list
– Easy, but 10+ terms is not practical