
Page 1: Rethinking the ESP Game

Rethinking the ESP Game

Stephen Robertson, Milan Vojnovic, Ingmar Weber*

Microsoft Research & Yahoo! Research

*This work was done while I was a visiting researcher at MSRC.

Page 2: Rethinking the ESP Game

The ESP Game – Live Demo

Show it live. (2min)

Alternative version.

Page 3: Rethinking the ESP Game

The ESP Game - Summary

• Two players try to agree on a label to be added to an image

• No way to communicate

• Entered labels only revealed at end

• Known labels are “off-limits”

• ESP refers to “Extrasensory perception”

• Read the other person’s mind

Page 4: Rethinking the ESP Game

The ESP Game - History

• Developed by Luis von Ahn and Laura Dabbish at CMU in 2004

• Goal: Improve image search

• Licensed by Google in 2006

• A prime example of harvesting human intelligence for difficult tasks

• Many variants (music, shapes, …)

Page 5: Rethinking the ESP Game

The ESP Game – Strengths and Weaknesses

• Strengths

– Creative approach to a hard problem

– Fun to play

– Vast majority of labels are appropriate

– Difficult to spam

– Powerful idea: Reaching consensus with little or no communication

Page 6: Rethinking the ESP Game

The ESP Game – Strengths and Weaknesses

• Weaknesses

– The ultimate objective is ill-defined

– Finds mostly general labels

– Already millions of images for these

– “Lowest common denominator” problem

– Human time is used sub-optimally

Page 7: Rethinking the ESP Game

A “Robot” Playing the ESP Game

Video of recorded play.

Page 8: Rethinking the ESP Game

The ESP Game – Labels are Predictable

• Synonyms are redundant

– “guy” => “man” for 81% of images

• Co-occurrence reduces “new” information

– “clouds” => “sky” for 68% of images

• Colors are easy to agree on

– “black” is 3.3% of all occurrences

Page 9: Rethinking the ESP Game

How to Predict the Next Label

T = {“beach”, “water”}, next label t = ??

Page 10: Rethinking the ESP Game

How to Predict the Next Label

Want to know:

P(“blue” next label | {“beach”, “water”})

P(“car” next label | {“beach”, “water”})

P(“sky” next label | {“beach”, “water”})

P(“bcn” next label | {“beach”, “water”})

Problem of data sparsity!

Page 11: Rethinking the ESP Game

How to Predict the Next Label

Want to know:

P(“t” next label | T)

= P(T | “t” next label) · P(“t”) / P(T)

Use conditional independence …

Give a random topic to two people.

Ask them to each think of 3 related terms.

P(A,B) = P(A|B) · P(B) = P(B|A) · P(A)

Bayes’ Theorem

Page 12: Rethinking the ESP Game

Conditional Independence

Topic “Spain”:

p1: Madrid, sun, paella

p2: beach, soccer, flamenco

Topic “blue”:

p1: sky, water, eyes

p2: azul, blau, bleu

P(A,B|C) = P(A|C) · P(B|C)

P(“p1: sky”, “p2: azul” | “blue”) = P(“p1: sky” | “blue”) · P(“p2: azul” | “blue”)

Page 13: Rethinking the ESP Game

How to Predict the Next Label

P({s1, s2} | “t”) · P(“t”) / P(T)

= P(s1 | “t”) · P(s2 | “t”) · P(“t”) / P(T)

P(s | “t”) will still be zero very often

→ smoothing

P̂(s | “t”) = (1 - λ) P(s | “t”) + λ P(s)

C.I. assumption violated in practice, but “close enough”.

P(s) is a non-zero background probability.
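This smoothing step can be sketched in Python; `cond_prob` and `background` are hypothetical containers for the raw estimates of P(s | t) and P(s), which in practice would be counted from the training tag sets:

```python
def smoothed_prob(s, t, cond_prob, background, lam=0.85):
    """Smoothed estimate of P(s | t): mix the raw conditional estimate,
    which is often zero, with the non-zero background probability P(s)."""
    return (1 - lam) * cond_prob.get((s, t), 0.0) + lam * background[s]
```

Even a pair never seen in training keeps a non-zero probability of λ · P(s).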

Page 14: Rethinking the ESP Game

How to Predict the Next Label

P(“t” next label | T already present)

= ∏_{s ∈ T} P̂(s | “t”) · P(“t”) / C,  where C is a normalizing constant

λ chosen using a “validation set”; λ = 0.85 in the experiments.

Model trained on ~13,000 tag sets.

Also see: the Naïve Bayes classifier (same conditional independence assumption and Bayes’ Theorem).
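Putting the pieces together, the predictor can be sketched as a tiny Naïve-Bayes scorer. All names and the toy probabilities below are hypothetical, not the paper’s trained model:

```python
def next_label_scores(T, prior, cond_prob, background, lam=0.85):
    """Score each candidate next label t given the present tag set T:
    P(t | T) is proportional to  prod_{s in T} P_hat(s | t) * P(t),
    where P_hat is the smoothed conditional probability."""
    scores = {}
    for t, p_t in prior.items():
        score = p_t
        for s in T:
            # smoothed estimate: mix P(s | t) with the background P(s)
            score *= (1 - lam) * cond_prob.get((s, t), 0.0) + lam * background[s]
        scores[t] = score
    total = sum(scores.values())  # plays the role of C * P(T)
    return {t: sc / total for t, sc in scores.items()}
```

With toy data where “beach” and “water” co-occur with “sky” but not with “car”, the scorer ranks “sky” first, matching the intuition on the earlier slides.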

Page 15: Rethinking the ESP Game

Experimental Results: Part 1

Number of

- games played 205

- images encountered 1,335

- images w/ OLT 1,105

Percentage w/ match

- all images 69%

- only images with OLTs 81%

- all entered tags 17%

Av. number of labels entered

- per image 4.1

- per game 26.7

Agreement index

- mean 2.6

- median 2.0

The “robot” plays reasonably well.

The “robot” plays like a human.

Page 16: Rethinking the ESP Game

Quantifying “Predictability” and “Information”

So, labels are fairly predictable.

But how can we quantify “predictability”?

Page 17: Rethinking the ESP Game

Quantifying “Predictability” and “Information”

• “sunny” vs. “cloudy” tomorrow in BCN

• The roll of a six-sided die

• The next single letter in “barcelo*”

• The next single letter in “re*”

• Clicked search result for “yahoo research”

Page 18: Rethinking the ESP Game

Entropy and Information

• An event occurring with probability p carries -log2(p) bits of information ...

… the number of bits required to encode it in an optimally compressed encoding

• Example: Compressed weather forecast:

P(“sunny”) = 0.5 → “0” (1 bit)

P(“cloudy”) = 0.25 → “10” (2 bits)

P(“rain”) = 0.125 → “110” (3 bits)

P(“thunderstorm”) = 0.125 → “111” (3 bits)
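The forecast example can be checked directly in Python (the distribution is the one on the slide):

```python
import math

forecast = {"sunny": 0.5, "cloudy": 0.25, "rain": 0.125, "thunderstorm": 0.125}

# information content of each outcome, in bits
info_bits = {w: -math.log2(p) for w, p in forecast.items()}

# expected number of bits = Shannon entropy of the distribution
entropy = sum(-p * math.log2(p) for p in forecast.values())
# the prefix code 0 / 10 / 110 / 111 matches these lengths exactly,
# so its expected length equals the entropy: 1.75 bits per forecast
```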

Page 19: Rethinking the ESP Game

Entropy and Information

• p = 1 → 0 bits of information

– A six-sided die showed a number in [1,6]

• p ≈ 0 → many, many bits of information

– The winning lottery numbers

“information” = “amount of surprise”

Page 20: Rethinking the ESP Game

Entropy and Information

• Expected information for p1, p2, …, pn:

∑i -pi · log2(pi) = (Shannon) entropy

• Might not know the true p1, p2, …, pn, but believe they are q1, q2, …, qn. Then, w.r.t. q, you observe

∑i -pi · log2(qi), which is minimized for q = p

q is given by the earlier model; p is then observed.
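The observed quantity here is the cross-entropy between the true distribution and the model’s. A small numeric sketch (toy distributions, not data from the paper):

```python
import math

def cross_entropy(p, q):
    """Average bits observed when outcomes follow p but the model
    (i.e. the code lengths) uses q; equals the entropy of p iff q == p."""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q))

p = [0.5, 0.25, 0.125, 0.125]       # true distribution
uniform = [0.25, 0.25, 0.25, 0.25]  # mismatched model
# cross_entropy(p, p) = 1.75 bits (the entropy of p)
# cross_entropy(p, uniform) = 2.0 bits — never below the entropy
```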

Page 21: Rethinking the ESP Game

Experimental Results: Part 2

Av. information per position of label in tag set

Position:  1    2    3    4    5
Bits:      9.2  8.5  8.0  7.7  7.5

Later labels are more predictable.

Equidistribution = 12.3 bits. “Static” distribution = 9.3 bits.

Av. information per position of human suggestions

Position:  1    2    3    4    5+
Bits:      8.7  9.4  10.0  10.6  11.7

Humans have to think harder and harder.

Page 22: Rethinking the ESP Game

Improving the ESP Game

• Could score points according to -log2(p)
  - the number of bits of information added to the system

• Have an activation time limit for “obvious” labels
  - removes the immediate satisfaction of simple matches

• Hide off-limits terms
  - players have to be more careful to avoid “obvious” labels

• Try to match “experts”
  - use previous tags or meta information

• Educate players
  - use previously labeled images to unlearn behavior

• Automatically expand the off-limits list
  - easy, but 10+ terms is not practical
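The first proposal is straightforward to sketch (hypothetical function; p would be the probability the label-prediction model assigns to the entered label):

```python
import math

def points_awarded(p):
    """Score a matched label by its information content, -log2(p):
    predictable labels earn few points, surprising labels earn many."""
    return -math.log2(p)

# points_awarded(0.5) -> 1.0 bit; points_awarded(1/1024) -> 10.0 bits
```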

Page 23: Rethinking the ESP Game

Questions

Thank you!

[email protected]