Movie vs movie


DESCRIPTION

What are your top ten favorite movies of all time? This is a very difficult question. But why? Irmak Sirer explains the challenges of measuring how much we like movies, books, songs, or products, combining insights from diverse sources such as the Netflix Prize, Duncan Watts' social experiments, and the beginnings of Facebook. The better we get at measuring and ranking levels of enjoyment, the better we can customize websites, sort search results, find other people with similar tastes, and recommend products. So can we overcome these challenges? Drumroll... Yes, we can.


Irmak Sirer, @frrmack

movievsmovie.datasco.pe

How much do we like things?

AGE 7

Oh cool.

Pretty good. Space and stuff.

AGE 14

Omigod Omigod Omigod.

Epic masterpiece is epic!!!!1! I'm in love with Leia.

AGE 17

WTF?

AGE 30

When you think about it, it's not that good.

Ah, who am I kidding? It's amazing. I'm still in love with Leia.

I mean... look at her.

What determines how much I like a movie?

Is my reaction to a movie / book / song predictable?

How much will I like The Book of Eli?

2006

Cinematch

1 billion user ratings

55,000 movies

I have a soulmate in taste

Irmak and Frrmack

Watched the same movies
Gave the exact same ratings

Except The Book of Eli

Frrmack watched The Book of Eli

Oh man, it was… FANTASTIC!

Predict

No perfect soulmates in real life

Irmak

Almost soulmate 1, Almost soulmate 2, Almost soulmate 3, Almost soulmate 4

87% soulmate, 74% soulmate, 95% soulmate, 82% soulmate
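The "almost soulmate" idea can be sketched as a similarity-weighted average: the more someone's taste overlaps with mine, the more their rating counts. The slides do not describe Cinematch's actual algorithm, so this is only a generic illustration. The function name `predict_rating` and the star ratings below are hypothetical; only the four similarity percentages come from the slide.

```python
def predict_rating(neighbors):
    """Similarity-weighted average of other people's ratings.

    neighbors: list of (similarity, stars) pairs, with similarity in (0, 1].
    """
    total_weight = sum(similarity for similarity, _ in neighbors)
    if total_weight == 0:
        raise ValueError("need at least one neighbor with positive similarity")
    weighted_sum = sum(similarity * stars for similarity, stars in neighbors)
    return weighted_sum / total_weight


# Similarities from the slide; the star ratings for The Book of Eli are made up.
almost_soulmates = [(0.87, 5), (0.74, 4), (0.95, 5), (0.82, 3)]
prediction = predict_rating(almost_soulmates)
print(f"Predicted rating: {prediction:.2f} stars")
```

A perfect soulmate is the special case of a single neighbor with similarity 1.0, where the prediction is simply that person's rating.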

Cinematch

Works well for movies that everybody rates

Quite bad with movies that only a few people rate

Some movies are especially difficult to predict

Biggest error source: popular but weird

15% of all errors from ONE movie

Trivial: Mean score of everyone
Error (RMSE): 1.0540 stars

Cinematch
Error (RMSE): 0.9525 stars

9.6% improvement over the trivial baseline

Better rankings, better recommendations

+ 8.6% + 1200% people watch top recommendation
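For reference, RMSE (root mean squared error) and the relative improvement quoted on the slide can be computed as below. This is a minimal sketch; the helper names `rmse` and `improvement_percent` are illustrative, and only the two error values come from the slides.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement_percent(baseline_rmse, model_rmse):
    """Relative reduction in error, in percent of the baseline error."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Error values from the slide: trivial mean predictor vs. Cinematch.
cinematch_gain = improvement_percent(1.0540, 0.9525)
print(f"Cinematch improves on the trivial predictor by {cinematch_gain:.1f}%")
```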

BigChaos Netflix Prize Report

Cinematch
Error: 0.9525 stars

2006: $1,000,000 for a 10% improvement

Bring it down to:
Error: 0.8563 stars

BellKor’s Pragmatic Chaos

How did they do it?

Before: Solid assumptions

You have a certain taste.
Your taste dictates a hidden rating for Book of Eli.
When you watch it, this rating is revealed to you.

WRONG

After:

Your rating changes with time.

It depends on...

how many you rated that day
your average rating for the day
which movies you rated on this day
shown Netflix prediction

Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009

Trivial: Mean score of everyone
Error: 1.0540 stars

Cinematch
Error: 0.9525 stars

Your time-dependent rating tendencies
Error: 0.9278 stars

12.0% improvement over the trivial baseline

without looking at which movies you like/hate!
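To make "time-dependent rating tendencies" concrete, here is a toy baseline that predicts from the global mean plus how generous a given user was on a given day, ignoring the movie entirely. It is not the BellKor model, which is considerably richer; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_day_bias_baseline(ratings):
    """Fit a global mean and a per-(user, day) offset from it.

    ratings: list of (user, movie, day, stars) tuples.
    """
    global_mean = mean(stars for _, _, _, stars in ratings)
    offsets = defaultdict(list)
    for user, _movie, day, stars in ratings:
        offsets[(user, day)].append(stars - global_mean)
    day_bias = {key: mean(values) for key, values in offsets.items()}
    return global_mean, day_bias


def predict(global_mean, day_bias, user, day):
    """Predict a rating without looking at which movie is being rated."""
    return global_mean + day_bias.get((user, day), 0.0)


# Made-up data: Irmak is generous on day 1 and grumpy on day 2.
sample = [
    ("irmak", "a", 1, 5),
    ("irmak", "b", 1, 4),
    ("irmak", "c", 2, 2),
    ("frrmack", "a", 1, 3),
]
global_mean, day_bias = fit_day_bias_baseline(sample)
print(predict(global_mean, day_bias, "irmak", 1))
```

A real system would fit such offsets on training data and evaluate on held-out ratings; here the same data is used for both purely to keep the sketch short.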

What does this suggest?

We cannot compare a movie with all others we've seen.
We compare it to a limited set.
Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own (followers / hipsters).

An experiment

Music Lab: A website for downloading and rating music

M.J. Salganik, P.S. Dodds, D.J. Watts. Science, 311:854-856, 2006

Alternative A: Other people's ratings invisible
More or less equal ratings

Alternative B: All ratings visible
Several songs snowball in popularity
It's different songs for each trial

Social influence plays a big part in determining hits and misses

Problems with rating movies

We cannot compare a movie with all others we've seen.
We compare it to a limited set.
Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own.

Degree of liking is sensitive and vague

Amazing! Total garbage
Tuesday 3am Sunday 12pm

Dependent on many other environmental factors besides our taste

Difficult to describe accurately and consistently with a number

Predicting aside, can I even reliably rate & rank movies I’ve seen in terms of enjoyment?

Irmak Frrmack

What are your top twenty movies?

Well… Ummm… I like Star Wars.

Degree of liking is sensitive and vague

Can’t we do something about this?

“Enjoyment” from a movie is very high dimensional information

Rating means projecting this onto a single dimension

But sometimes you just want to do the best projection you can

What is my top twenty?

We cannot compare a movie with all others we've seen.
We compare it to a limited set.

Trying to rate Star Wars

1. Map enjoyment to a specific scale
2. Choose the corresponding rating for this degree of liking

But we cannot keep this entire history of enjoyment in mind
We fuzzily remember a small subset
We map based on this subset

SAMPLING

BIASED SAMPLING

Tuesday

Friday

Degree of liking is sensitive and vague

Can’t we do something about this?

We can certainly handle single comparisons

less vague
little information

I can manually compare it with all others
And find exactly where it belongs:
right after Indiana Jones,
right before The Princess Bride

Full ranking: Compare all pairs

That’s a bit too much effort for me

1,000,000 comparisons?
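The scale of the problem follows from the number of distinct pairs among n movies, n(n-1)/2. A quick check, assuming hypothetical collection sizes (the slides do not say how many movies are involved):

```python
import math


def all_pairs_count(movie_count):
    """Number of distinct movie pairs in a full pairwise comparison."""
    return math.comb(movie_count, 2)


# Hypothetical collection sizes, for scale only.
for movie_count in (100, 1000, 1415):
    print(movie_count, "movies ->", all_pairs_count(movie_count), "pairs")
```

Under this count, a collection of roughly 1,400 movies already reaches about a million pairs.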

We don’t need all of them

If [image], [image], I have some information about [image]

Compare a random sample of pairs

Use a ranking algorithm that utilizes all the information
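Drawing a random sample of pairs could look like the sketch below. The helper name `sample_pairs` and the movie list are illustrative; the slides do not say how the pairs are actually chosen at this stage.

```python
import itertools
import random


def sample_pairs(items, sample_size, seed=None):
    """Return up to sample_size distinct unordered pairs, chosen at random."""
    rng = random.Random(seed)
    every_pair = list(itertools.combinations(items, 2))
    return rng.sample(every_pair, min(sample_size, len(every_pair)))


movies = ["Star Wars", "Indiana Jones", "The Princess Bride",
          "The Book of Eli", "Lord of the Rings"]
pairs_to_ask = sample_pairs(movies, sample_size=4, seed=0)
for first, second in pairs_to_ask:
    print(f"{first} vs {second}?")
```

Listing every pair first is fine for small collections; for very large ones it would be cheaper to draw random index pairs directly.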

Good idea!

Elo rating system

7.00
“hotness” range: +1.50 / -1.50

7.00 (+1.50 / -1.50)    8.00 (+1.50 / -1.50)

7.12    7.68

7.00: 36% to win
8.00: 64% to win

How do we find out what these ranges are?

Start with the same guess for every contender

5.00 5.00 5.00 5.00 5.00 5.00

5.00 vs 5.00 ? Result: 5.12 and 4.88

Update the best guesses accordingly

5.12 vs 5.00 ? Result: 5.24 and 4.88

5.24 vs 5.00 ? Result: 5.14 and 5.10

We don’t need all comparisons

If [image], [image], I have some information about [image]

7.61 vs 4.02 ?

7.61: 89% to win
4.02: 11% to win

If the favorite wins: 7.61 +.02, 4.02 -.02

If the underdog wins: 7.61 -.53, 4.02 +.53
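The win percentages on these slides are consistent with the standard Elo expected-score formula if 1.00 on the slide scale plays the role of 100 classic Elo points. The sketch below makes that assumption explicit. `SCALE` and `K` are not stated in the slides: `K = 0.24` is chosen only because it reproduces the 5.00 vs 5.00 update to 5.12 and 4.88, and it does not reproduce the +.53 upset shown above, so the original presumably uses a different step size there.

```python
SCALE = 4.0  # assumption: plays the role of the 400 in classic Elo
K = 0.24     # assumption: step size, so an even match moves each score by 0.12


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / SCALE))


def update(winner_rating, loser_rating):
    """Move both ratings by K times how surprising the result was."""
    surprise = 1.0 - expected_score(winner_rating, loser_rating)
    delta = K * surprise
    return winner_rating + delta, loser_rating - delta


print(f"7.00 beats 8.00 with probability {expected_score(7.00, 8.00):.2f}")
print(f"7.61 beats 4.02 with probability {expected_score(7.61, 4.02):.2f}")
print("After an even match:", update(5.00, 5.00))
```

The key property is that an expected result barely moves the scores, while an upset moves them a lot.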

Elo rating system

We now have scores on a single scale (estimates of people’s appreciation levels) and a ranking

Rank:  1    2    3    4    5    6
Score: 9.07 8.42 6.40 4.88 4.20 3.03

Degree of liking is sensitive and vague

Can we somehow apply this to movies, then?

We can do better: Bayesian ranking algorithms

Glicko (The Elo Killer), 1999

TrueSkill™, 2007

Bayesian ranking

4.46 4.01

+- +-

Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own.
Degree of liking is sensitive and vague

82% to win
15% to win
3% to draw

Bayesian ranking

Elo: Best guess for the center: 4.3

Bayesian: It could be centered around 4.3
Bayesian: It could also be centered around 4.2
Bayesian: or centered around 4.4
Bayesian: Less likely, but even around 4.5

Bayesian ranking

[Figure: probability distribution of the score, peaked at 4.3; x-axis from 3.5 to 5, y-axis probability; the width of the curve is the uncertainty]

Few comparisons: Lots of uncertainty (anything from 2.3 to 4.5 is quite possible)

After many comparisons: Quite sure (pretty much between 4.11 and 4.18)

Bayesian ranking

[Figure: score distributions for Star Wars and Lord of the Rings; x-axis from 2.0 to 5.0]

How did they do it?

After:

Your rating changes with time.

A small, constant increase in uncertainty before each comparison

[Figure: probability distribution of the score with its uncertainty; x-axis from 3.5 to 5, y-axis probability]
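The slides name Glicko and TrueSkill but do not give formulas or say which one the site uses. As one possible reading, here is a simplified TrueSkill-style update for two movies with no draws: each movie has a best guess `mu` and an uncertainty `sigma`, a little uncertainty (`TAU`) is added before every comparison, and the result then shifts the guesses and narrows the uncertainty. `Belief`, `BETA`, `TAU` and the starting sigmas are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

BETA = 0.5  # assumption: noise of a single comparison (mood, time of day)
TAU = 0.05  # assumption: small uncertainty added before each comparison


@dataclass
class Belief:
    mu: float     # best guess for the score
    sigma: float  # uncertainty (standard deviation) around that guess


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def win_probability(a, b, beta=BETA):
    """Chance that a is preferred over b, given both uncertainties."""
    c = math.sqrt(2.0 * beta ** 2 + a.sigma ** 2 + b.sigma ** 2)
    return _cdf((a.mu - b.mu) / c)


def update(winner, loser, beta=BETA, tau=TAU):
    """Return new beliefs after `winner` was preferred over `loser`."""
    var_w = winner.sigma ** 2 + tau ** 2  # tastes drift: widen first
    var_l = loser.sigma ** 2 + tau ** 2
    c = math.sqrt(2.0 * beta ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # large when the result was a surprise
    w = v * (v + t)        # how much the uncertainty shrinks, in (0, 1)
    new_winner = Belief(winner.mu + var_w / c * v,
                        math.sqrt(var_w * (1.0 - var_w / c ** 2 * w)))
    new_loser = Belief(loser.mu - var_l / c * v,
                       math.sqrt(var_l * (1.0 - var_l / c ** 2 * w)))
    return new_winner, new_loser


star_wars = Belief(mu=4.46, sigma=0.5)
lotr = Belief(mu=4.01, sigma=0.5)
print(f"P(Star Wars preferred) = {win_probability(star_wars, lotr):.2f}")
print(update(lotr, star_wars))  # an upset: Lord of the Rings is preferred
```

Unlike the slides, this sketch has no draw outcome, so its probabilities will not match the 82% / 15% / 3% split shown earlier.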

Degree of liking is sensitive and vague

Great! We have a system!

I don’t want to spend too much time on this

How many is too many?

Minimum Effort, Maximum Information

[Figure: rating scales labeled 1, 3, 5]

Not reliable by itself
Still carries a lot of information

I don’t want to spend too much time on this

What else can we do?

Minimum Effort, Maximum Information

98% to win, 1% to win, 1% to draw

98% to win: Did not learn anything new

2% to win: Quite a bit of new information

I can calculate the expected amount of information from a comparison!

Certain about both movies: Won’t learn a lot

Don’t know much about either: Will learn a lot regardless of outcome
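One way to put a number on "expected amount of information" is the entropy of the predicted outcome: a near-certain comparison carries almost no information, a coin flip carries one full bit. The slides do not state the exact formula used, so treat this as a stand-in. It ignores draws, and `beta` and the example scores are assumptions.

```python
import math


def surprisal_bits(probability):
    """Information gained when an outcome of this probability occurs."""
    return -math.log2(probability)


def expected_information_bits(p_win):
    """Entropy of a win/lose outcome: average surprisal before asking."""
    if p_win <= 0.0 or p_win >= 1.0:
        return 0.0
    p_lose = 1.0 - p_win
    return p_win * surprisal_bits(p_win) + p_lose * surprisal_bits(p_lose)


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=0.5):
    """Chance that A is preferred, given best guesses and uncertainties."""
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return 0.5 * math.erfc(-(mu_a - mu_b) / (c * math.sqrt(2.0)))


# Same best guesses, different uncertainty about both movies.
certain = expected_information_bits(win_probability(4.5, 0.05, 3.5, 0.05))
uncertain = expected_information_bits(win_probability(4.5, 1.5, 3.5, 1.5))
print(f"certain about both: {certain:.2f} bits")
print(f"unsure about both:  {uncertain:.2f} bits")
print(f"98% favorite wins:  {surprisal_bits(0.98):.2f} bits")
print(f"2% underdog wins:   {surprisal_bits(0.02):.2f} bits")
```

Asking the pair with the highest expected information first is one way to get the most out of each comparison.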

Irmak Frrmack

What are your top twenty movies?

movievsmovie.datasco.pe

Quantifying human reactions is hard

books

songs

food

politicians

products

celebrities

tv shows

importance of issues

what to spend ‘fun’ budget on

teams in different sports

Degree of liking is sensitive and vague

Amazing! Total garbage

Tuesday 3am Sunday 12pm

Quantifying reactions is very useful

customized websites

sorting search results

recommendations

connecting with other people of similar tastes

identifying meaningful groups of similar products / people

understanding your own preferences

Quantifying human reactions is hard

Start with a rating, pose the correct comparisons

Every decision gets us closer

Degree of liking is sensitive and vague

Many comparisons for a movie over different days average out mood and other factors

We can’t do much about social influence, but we should just accept that as a natural part of how much we like things

A great way of collecting desired data is to make it fun

movievsmovie.datasco.pe

Thanks
