What are your top ten favorite movies of all time? This is a very difficult question. But why? Irmak Sirer explains the challenges of measuring how much we like movies, books, songs, or products, combining insights from diverse sources such as the Netflix Prize, Duncan Watts' social experiments, and the beginnings of Facebook. The better we get at measuring and ranking levels of enjoyment, the better we can customize websites, sort search results, find other people with similar tastes, and recommend products. So can we overcome these challenges? Drumroll... Yes, we can.
Irmak Sirer @frrmack
movievsmovie.datasco.pe
How much do we like things?
AGE 7
Oh cool.
Pretty good. Space and stuff.
AGE 14
Omigod Omigod Omigod.
Epic masterpiece is epic!!!!1!
I'm in love with Leia.
AGE 17
WTF?
AGE 30
When you think about it, it's not that good.
Ah, who am I kidding? It's amazing.
I'm still in love with Leia.
I mean... look at her.
What determines how much I like a movie?
Is my reaction to a movie / book / song predictable?
How much will I like The Book of Eli?
2006
Cinematch
1 billion user ratings
55,000 movies
Cinematch
I have a soulmate in taste
Irmak Frrmack
Watched the same movies
Gave the exact same ratings
Except The Book of Eli
Frrmack watched The Book of Eli
Oh man, it was… FANTASTIC!
Predict
No perfect soulmates in real life
Irmak
Almost soulmate 1, Almost soulmate 2, Almost soulmate 3, Almost soulmate 4
87% soulmate, 74% soulmate, 95% soulmate, 82% soulmate
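The soulmate idea is, in spirit, neighborhood-based collaborative filtering: predict my rating as a similarity-weighted average of what my almost-soulmates gave the movie. The sketch below is only an illustration under that assumption, not Cinematch's actual algorithm. The similarity values echo the slide, the star ratings are invented, and `predict_rating` is a hypothetical helper.

```python
def predict_rating(neighbors):
    """Similarity-weighted average of neighbors' ratings.

    neighbors: list of (similarity, rating) pairs, similarity in [0, 1].
    """
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        raise ValueError("need at least one neighbor with non-zero similarity")
    return sum(sim * rating for sim, rating in neighbors) / total_weight


# Hypothetical ratings of The Book of Eli by four "almost soulmates"
# (similarities from the slide, star ratings invented).
neighbors = [(0.87, 4), (0.74, 3), (0.95, 5), (0.82, 4)]
print(round(predict_rating(neighbors), 2))  # 4.06
```

With a single perfect soulmate, `predict_rating([(1.0, 5)])` returns 5.0, which is the Frrmack thought experiment. With imperfect neighbors, the closer ones simply count more.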
Cinematch: works well for movies that everybody rates
Cinematch: quite bad with movies that only a few people rate
Cinematch
Some movies are especially difficult to predict
Biggest error source: popular but weird
15% of all errors from ONE movie
Trivial predictor (mean score of everyone): error (RMSE) 1.0540 stars
Cinematch: error (RMSE) 0.9525 stars
Improvement: 9.6%
Better rankings, better recommendations
+8.6%, +1200% people watch top recommendation
BigChaos Netflix Prize Report
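RMSE is the square root of the average squared difference between predicted and actual stars. The 9.6% above is the relative drop from 1.0540 to 0.9525. Here is a minimal sketch of both calculations; the function names and the four toy ratings are illustrative.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to a baseline predictor."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy example: always predicting 3.5 stars for four invented ratings.
print(rmse([3.5] * 4, [5, 4, 3, 2]))
# The slide's numbers: trivial mean predictor vs. Cinematch.
print(f"{relative_improvement(1.0540, 0.9525):.1%}")  # 9.6%
```

Because errors are squared before averaging, a few badly mispredicted ratings (such as the "popular but weird" movie) weigh heavily on RMSE.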
Cinematch: error 0.9525 stars
Bring it down to: error 0.8563 stars
$1,000,000 for a 10% improvement
2006
BellKor’s Pragmatic Chaos
How did they do it?
Before: solid assumptions
You have a certain taste.
Your taste dictates a hidden rating for The Book of Eli.
When you watch it, this rating is revealed to you.
WRONG
After:
Your rating changes with time.
It depends on...
how many movies you rated that day
your average rating for the day
which movies you rated that day
the Netflix prediction you were shown
Trivial predictor (mean score of everyone): error 1.0540 stars
Cinematch: error 0.9525 stars
Your time-dependent rating tendencies: error 0.9278 stars (12.0% better than the trivial predictor)
without looking at which movies you like/hate!
Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
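Below is a heavily simplified sketch of the idea. It assumes a model of the form global mean + movie offset + user-on-that-day offset, which ignores which movies a user likes or hates. Koren's actual model has many more time-dependent terms and fits them with regularized optimization. Here each offset is just an average of residuals, and the ratings are invented.

```python
from collections import defaultdict
from statistics import mean


def fit_baseline(ratings):
    """Fit mu + b_item + b_user_day by averaging residuals (no regularization).

    ratings: list of (user, item, day, stars) tuples.
    Returns a predict(user, item, day) function.
    """
    mu = mean(stars for _, _, _, stars in ratings)

    # Item offset: how far each movie's ratings sit from the global mean.
    item_res = defaultdict(list)
    for _, item, _, stars in ratings:
        item_res[item].append(stars - mu)
    b_item = {item: mean(res) for item, res in item_res.items()}

    # User-day offset: what is left over, averaged per user per day.
    # It captures "how generous was this user on this day", not movie taste.
    day_res = defaultdict(list)
    for user, item, day, stars in ratings:
        day_res[(user, day)].append(stars - mu - b_item[item])
    b_user_day = {key: mean(res) for key, res in day_res.items()}

    def predict(user, item, day):
        # Unseen items or user-days fall back to an offset of zero.
        return mu + b_item.get(item, 0.0) + b_user_day.get((user, day), 0.0)

    return predict


# Invented ratings: (user, movie, day, stars).
ratings = [
    ("irmak", "star_wars", "tue", 5),
    ("irmak", "book_of_eli", "tue", 4),
    ("irmak", "star_wars", "fri", 3),
    ("frrmack", "star_wars", "tue", 4),
    ("frrmack", "book_of_eli", "fri", 2),
]
predict = fit_baseline(ratings)
print(predict("irmak", "book_of_eli", "tue"))  # generous Tuesday
print(predict("irmak", "book_of_eli", "fri"))  # grumpy Friday
```

The same user and movie get a prediction of about 4.0 on the generous Tuesday and about 2.0 on the grumpy Friday. The difference comes purely from that day's rating tendency.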
What does this suggest?
We cannot compare a movie (say, The Book of Eli) with all the others we've seen.
We compare it to a limited set.
Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own (followers / hipsters).
An experiment
Music Lab: a website for downloading and rating music
Alternative A: other people's ratings invisible. Result: more or less equal ratings
Alternative B: all ratings visible. Result: several songs snowball in popularity
A different set of songs snowballs in each trial
Social influence plays a big part in determining hits and misses
M.J. Salganik, P.S. Dodds, D.J. Watts. Science, 311:854-856, 2006
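The snowball effect can be reproduced with a toy cumulative-advantage model. When downloads are visible, each new listener picks a song with probability proportional to its appeal times (1 + its current downloads). This is an assumed model for illustration, not the design of the Music Lab experiment. All songs get equal appeal, so any hit is pure social influence.

```python
import random


def simulate_market(appeal, n_listeners, social, seed):
    """Toy music market; returns the download count of each song.

    appeal: intrinsic weights for each song (hypothetical).
    social: if True, choices are also weighted by current downloads
        (cumulative advantage); if False, every choice is independent.
    """
    rng = random.Random(seed)
    downloads = [0] * len(appeal)
    for _ in range(n_listeners):
        if social:
            weights = [a * (1 + d) for a, d in zip(appeal, downloads)]
        else:
            weights = list(appeal)
        song = rng.choices(range(len(appeal)), weights=weights)[0]
        downloads[song] += 1
    return downloads


appeal = [1.0] * 8  # eight equally appealing songs
for seed in range(3):  # three independent "worlds" with visible downloads
    print(simulate_market(appeal, 500, social=True, seed=seed))
print(simulate_market(appeal, 500, social=False, seed=0))  # invisible downloads
```

In the social runs, a few songs typically pull far ahead, and which ones do changes with the seed (each seed plays the role of one independent trial). In the independent run, the counts stay much closer together.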
Problems with rating movies
We cannot compare a movie with all others we've seen.
We compare it to a limited set.
Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own.
Degree of liking is sensitive and vague
Amazing! (Tuesday 3am)   Total garbage (Sunday 12pm)
Dependent on many other environmental factors besides our taste
Difficult to describe accurately and consistently with a number
Predicting aside, can I even reliably rate and rank the movies I've seen in terms of enjoyment?
Irmak Frrmack
What are your top twenty movies?
Well… Ummm… I like Star Wars.
Degree of liking is sensitive and vague
Can't we do something about this?
“Enjoyment” from a movie is very high-dimensional information
Rating means projecting this onto a single dimension
But sometimes you just want to do the best projection you can
What is my top twenty?
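One way to picture "projecting onto a single dimension" is a weighted sum over a few enjoyment dimensions. The dimensions and weights below are invented, and nothing in the talk says enjoyment combines linearly. The point is only that two very different experiences can collapse to the same number.

```python
def project(enjoyment, weights):
    """Collapse a multi-dimensional enjoyment profile into one score (dot product)."""
    return sum(enjoyment[k] * w for k, w in weights.items())


# Hypothetical dimensions and weights, purely for illustration.
weights = {"story": 0.5, "visuals": 0.25, "characters": 0.25}
space_opera = {"story": 3.0, "visuals": 5.0, "characters": 5.0}
quiet_drama = {"story": 5.0, "visuals": 2.0, "characters": 4.0}

print(project(space_opera, weights), project(quiet_drama, weights))  # 4.0 4.0
```

Once only the 4.0 is stored, nothing tells the two movies apart. That lost information is what makes a single rating feel inadequate.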
Trying to rate Star Wars
1. Map enjoyment to a specific scale
2. Choose the corresponding rating for this degree of liking
But we cannot keep this entire history of enjoyment in mind
We fuzzily remember a small subset
We map based on this subset
SAMPLING
BIASED SAMPLING
[Figure: the same rating process on a Tuesday and on a Friday]
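The sampling problem can be simulated. Rate a movie by where it falls among a handful of randomly recalled movies, then bias the recall. Everything here (the enjoyment numbers, the 1 to 5 mapping, the `recall_floor` bias) is a hypothetical illustration, not a model from the talk.

```python
import random


def rate_by_sampling(enjoyment, remembered, sample_size, rng, recall_floor=None):
    """Rate a movie 1..5 by where it falls among a few recalled movies.

    enjoyment: the new movie's (hypothetical) enjoyment level.
    remembered: enjoyment levels of previously seen movies.
    recall_floor: if given, only movies at least this enjoyable come to mind,
        a made-up stand-in for biased sampling.
    """
    pool = remembered if recall_floor is None else [e for e in remembered if e >= recall_floor]
    sample = rng.sample(pool, sample_size)
    fraction_below = sum(e < enjoyment for e in sample) / sample_size
    # Map the fraction in [0, 1] onto 1..5 stars.
    return 1 + round(4 * fraction_below)


history = list(range(100))  # enjoyment of 100 past movies, 0 = worst (invented)
star_wars = 80              # better than 80 of them
for seed in (1, 2, 3):      # a different random recall on different days
    print(rate_by_sampling(star_wars, history, 5, random.Random(seed)))
# Biased recall: only the really good movies come to mind, so Star Wars looks worse.
print(rate_by_sampling(star_wars, history, 5, random.Random(1), recall_floor=70))
```

Different seeds stand in for different days. The same movie can come out with different star ratings, and biased recall shifts them systematically.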
We can certainly handle single comparisons: less vague, but little information
I can manually compare it with all others
and find exactly where it belongs:
right after Indiana Jones,
right before The Princess Bride
Full ranking: compare all pairs
That's a bit too much effort for me
1,000,000 comparisons?
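The number of pairs grows quadratically: n(n - 1)/2 for n movies. A collection of roughly 1,400 movies already needs about a million comparisons. The collection sizes below are only examples.

```python
def n_pairs(n_movies):
    """Number of distinct pairs among n movies: n * (n - 1) / 2."""
    return n_movies * (n_movies - 1) // 2


for n in (20, 100, 1415):
    print(n, n_pairs(n))  # 190, 4950, 1000405
```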
We don't need all of them
If I know how two pairs compared, I have some information about a pair I haven't compared
Compare a random sample of pairs
Use a ranking algorithm that utilizes all the information
Good idea!
Elo rating system
Each contender has a “hotness” score (e.g. 7.00) with a range of +1.50 / -1.50 around it
Two contenders: 7.00 and 8.00, each with a +1.50 / -1.50 range
In a single match each performs somewhere within its range, e.g. 7.12 vs. 7.68
So 7.00 has a 36% chance to win and 8.00 a 64% chance
How do we find out what these ranges are?
Start with the same guess for every contender: 5.00 5.00 5.00 5.00 5.00 5.00
5.00 vs. 5.00: the first one wins → 5.12 and 4.88
Update the best guesses accordingly
5.12 vs. 5.00: 5.12 wins → 5.24 and 4.88
5.24 vs. 5.00: 5.00 wins → 5.24 drops to 5.14, 5.00 rises to 5.10
We don't need all comparisons: the results we have already seen tell us something about pairs we haven't compared
Elo rating system
7.61 vs. 4.02: 89% to win vs. 11% to win
If the favorite wins: 7.61 gains .02, 4.02 loses .02
If the underdog wins: 7.61 loses .53, 4.02 gains .53
Elo rating system
We now have scores on a single scale (estimates of people's appreciation levels)
and a ranking:
1. 9.07   2. 8.42   3. 6.40   4. 4.88   5. 4.20   6. 3.03
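The slide's win chances match the standard Elo formula if one point on this scale equals 100 Elo points. A 1.00 gap gives 36% / 64%, and a 3.59 gap gives 89% / 11%. The step size K = 0.24 is an assumption, picked because it reproduces the first update (5.00 / 5.00 → 5.12 / 4.88). The later example deltas on the slides (+.02 / -.53) look illustrative, and this sketch does not reproduce them. `rank` is a hypothetical helper that consumes any list of comparisons, such as a random sample of pairs.

```python
def expected_score(r_a, r_b, scale=4.0):
    """Elo probability that A beats B.

    Chess Elo uses scale=400; scale=4.0 assumes the slides' numbers are
    Elo points divided by 100.
    """
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def update(r_winner, r_loser, k=0.24, scale=4.0):
    """Move both ratings toward the observed result; surprises move them more."""
    delta = k * (1.0 - expected_score(r_winner, r_loser, scale))
    return r_winner + delta, r_loser - delta


def rank(comparisons, start=5.0):
    """Run Elo over (winner, loser) pairs in order; return (name, rating) best-first."""
    ratings = {}
    for winner, loser in comparisons:
        rw, rl = ratings.get(winner, start), ratings.get(loser, start)
        ratings[winner], ratings[loser] = update(rw, rl)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


print(round(expected_score(7.00, 8.00), 2))            # 0.36, as on the slide
print(round(expected_score(7.61, 4.02), 2))            # 0.89
print(tuple(round(r, 2) for r in update(5.00, 5.00)))  # (5.12, 4.88)

comparisons = [
    ("Star Wars", "The Book of Eli"),
    ("Indiana Jones", "The Book of Eli"),
    ("Star Wars", "Indiana Jones"),
]
for name, rating in rank(comparisons):
    print(f"{rating:.2f}  {name}")
```

Plain Elo processes comparisons in order and keeps no notion of how sure it is about each score. That gap is what the Bayesian methods address.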
Degree of liking is sensitive and vague
Can we somehow apply this to movies, then?
We can do better: Bayesian ranking algorithms
Glicko (“The Elo Killer”), 1999
TrueSkill™, 2007
Bayesian ranking
Each score now comes with an uncertainty (±): 4.46 and 4.01
4.46: 82% to win; 4.01: 15% to win; 3% chance of a draw
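A win / loss / draw split like this can come from a Gaussian model in the style of TrueSkill. Each movie's score is uncertain, each comparison adds performance noise, and close results count as draws. The standard deviations, the noise `beta` and the `draw_margin` below are invented because the slide's ± values did not survive. So this sketch does not reproduce the slide's exact percentages.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outcome_probabilities(mu_a, sigma_a, mu_b, sigma_b, beta, draw_margin):
    """Return (P(A wins), P(B wins), P(draw)) in a TrueSkill-style Gaussian model.

    Each movie's true score is uncertain, N(mu, sigma**2); in one comparison it
    "performs" at that score plus noise N(0, beta**2). If the two performances
    differ by less than draw_margin, the comparison counts as a draw.
    """
    c = math.sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    diff = mu_a - mu_b
    p_a = normal_cdf((diff - draw_margin) / c)
    p_b = normal_cdf((-diff - draw_margin) / c)
    return p_a, p_b, 1.0 - p_a - p_b


# The slide's means; the uncertainties, beta and draw margin are made up.
print(outcome_probabilities(4.46, 0.2, 4.01, 0.2, beta=0.25, draw_margin=0.02))
# Same means, much less certain: the favorite's edge shrinks toward a coin flip.
print(outcome_probabilities(4.46, 1.0, 4.01, 1.0, beta=0.25, draw_margin=0.02))
```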
Bayesian ranking
Elo: a single best guess for the center (4.3)
Bayesian: it could be centered around 4.3,
it could also be centered around 4.2,
or centered around 4.4,
less likely, but even around 4.5
[Figure: probability of each possible center (axis 3.5 to 5), peaking at 4.3; the width of the curve is the uncertainty]
Few comparisons: lots of uncertainty (anything from 2.3 to 4.5 is quite possible)
After many comparisons: quite sure (pretty much between 4.11 and 4.18)
Bayesian ranking
[Figure: probability distributions for Star Wars and The Lord of the Rings on a 2.0 to 5.0 scale]
How did they do it? After: your rating changes with time.
A small, constant increase in uncertainty before each comparison
[Figure: probability distribution over the score (axis 3.5 to 5); its width is the uncertainty]
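Bayesian systems keep a mean and an uncertainty per movie and update both after each comparison. The sketch below follows the published two-player, no-draw TrueSkill update. It includes the dynamics term `tau`, which adds a small, constant amount of uncertainty before each comparison. The parameter values (`beta`, `tau`, the 4.0 ± 1.0 prior) are made up for a 1 to 5 star scale. Full TrueSkill and Glicko differ in details such as draws and teams.

```python
import math


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trueskill_update(winner, loser, beta=0.25, tau=0.01):
    """One comparison without a draw; winner and loser are (mu, sigma) beliefs.

    tau**2 is first added to both variances (the score may have drifted since
    the last comparison), then both Gaussians are updated from the outcome.
    Not numerically robust for extreme upsets (the CDF below underflows).
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    var_w, var_l = s_w ** 2 + tau ** 2, s_l ** 2 + tau ** 2
    c2 = 2 * beta ** 2 + var_w + var_l
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean shift: large when the winner was expected to lose
    w = v * (v + t)        # variance shrink factor, between 0 and 1
    new_winner = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c2 * w)))
    new_loser = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c2 * w)))
    return new_winner, new_loser


star_wars, lotr = (4.0, 1.0), (4.0, 1.0)  # same prior guess, lots of uncertainty
for _ in range(5):
    star_wars, lotr = trueskill_update(star_wars, lotr)
    print(f"Star Wars {star_wars[0]:.2f} ± {star_wars[1]:.2f}   LotR {lotr[0]:.2f} ± {lotr[1]:.2f}")
```

Each win nudges the means apart and shrinks both uncertainties, and a surprising result moves the means much more than an expected one. When `tau` is large compared with what a lopsided comparison teaches, the uncertainty can even grow, which is the point of the drift term.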
Degree of liking is sensitive and vague
Great! We have a system!
I don’t want to spend too much time on this
How many is too many?
Minimum effort, maximum information
[Figure: quick ratings on a 1 to 5 scale]
Not reliable by itself, but still carries a lot of information
I don’t want to spend too much time on this
What else can we do?
Minimum effort, maximum information
A lopsided comparison: 98% to win, 1% to win, 1% to draw
If the 98% favorite wins: did not learn anything new
If the 2% outcome happens instead: quite a bit of new information
I can calculate the expected amount of information from a comparison!
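One standard way to make "expected amount of information" precise is Shannon's. An outcome with probability p carries -log2(p) bits, and the expected information of a comparison is the entropy of its outcome distribution. This reading is an assumption, since the talk does not name its measure.

```python
import math


def surprisal_bits(p):
    """Information gained (in bits) when an outcome of probability p happens."""
    return -math.log2(p)


def expected_information_bits(outcome_probs):
    """Shannon entropy: the average surprisal over all possible outcomes."""
    return sum(p * surprisal_bits(p) for p in outcome_probs if p > 0)


lopsided = [0.98, 0.01, 0.01]  # win / loss / draw, as on the slide
print(round(surprisal_bits(0.98), 3))                # 0.029: the favorite wins, almost nothing new
print(round(surprisal_bits(0.02), 2))                # 5.64: the 2% outcome, a lot
print(round(expected_information_bits(lopsided), 2)) # 0.16 bits on average
print(expected_information_bits([0.5, 0.5]))         # 1.0: a coin-flip matchup
```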
Certain about both movies: won't learn a lot
Don't know much about either: will learn a lot regardless of outcome
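Outcome surprise alone does not capture "certain about both movies: won't learn a lot". Two well-known, evenly matched movies give a coin-flip outcome but barely change our beliefs. One possible criterion, assumed here rather than taken from the talk, is the expected shrinkage of the two variances under the same Gaussian update as TrueSkill. The beliefs below are invented. In the spirit of "start with a rating", their means could come from a quick 1 to 5 rating with a wide uncertainty.

```python
import itertools
import math


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _w(t):
    """TrueSkill variance-shrink factor for a win with normalized margin t."""
    v = _pdf(t) / _cdf(t)
    return v * (v + t)


def expected_variance_reduction(a, b, beta=0.25):
    """Expected drop in sigma_a**2 + sigma_b**2 from comparing a and b (no draws).

    a, b are (mu, sigma) beliefs. Under the TrueSkill update each variance
    shrinks by var**2 / c2 * w, with w depending on which side won.
    """
    (mu_a, s_a), (mu_b, s_b) = a, b
    var_a, var_b = s_a ** 2, s_b ** 2
    c2 = 2 * beta ** 2 + var_a + var_b
    t = (mu_a - mu_b) / math.sqrt(c2)
    p_a_wins = _cdf(t)
    shrink = (var_a ** 2 + var_b ** 2) / c2
    return p_a_wins * shrink * _w(t) + (1 - p_a_wins) * shrink * _w(-t)


def most_informative_pair(beliefs):
    """Pick the pair whose comparison is expected to shrink uncertainty the most."""
    return max(
        itertools.combinations(beliefs, 2),
        key=lambda pair: expected_variance_reduction(beliefs[pair[0]], beliefs[pair[1]]),
    )


beliefs = {  # (mean, uncertainty), invented
    "Star Wars": (4.5, 0.1),
    "Indiana Jones": (4.4, 0.1),
    "The Book of Eli": (3.0, 1.0),
    "The Princess Bride": (3.5, 1.2),
}
print(most_informative_pair(beliefs))
```

Here the two poorly known movies are chosen over the evenly matched pair we are already sure about.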
Irmak Frrmack
What are your top twenty movies?
movievsmovie.datasco.pe
Quantifying human reactions is hard
books
songs
food
politicians
products
celebrities
tv shows
importance of issues
what to spend ‘fun’ budget on
teams in different sports
Degree of liking is sensitive and vague
Amazing! (Tuesday 3am)   Total garbage (Sunday 12pm)
Quantifying reactions is very useful
customized websites
sorting search results
recommendations
connecting with other people of similar tastes
identifying meaningful groups of similar products / people
understanding your own preferences
Quantifying human reactions is hard
Start with a rating, pose the correct comparisons
Every decision gets us closer
Degree of liking is sensitive and vague
Amazing! (Tuesday 3am)   Total garbage (Sunday 12pm)
Many comparisons for a movie over different days average out mood and other factors
We can't do much about social influence, but we should just accept that as a natural part of how much we like things
Degree of liking is sensitive and vague
Amazing! (Tuesday 3am)   Total garbage (Sunday 12pm)
A great way of collecting desired data is to make it fun
movievsmovie.datasco.pe
Thanks