Movie #7614 movie #3250 movie #15868 Lessons from the Netflix Prize Yehuda Koren The BellKor team...

movie #7614 movie #3250

movie #15868

Lessons from the Netflix Prize

Yehuda KorenThe BellKor team

(with Robert Bell & Chris Volinsky)

We Know What You OughtTo Be Watching This

Summer

Collaborative filtering• Recommend items based on past transactions of users

• Analyze relations between users and/or items

• Specific data characteristics are irrelevant

– Domain-free: user/item attributes are not necessary

– Can identify elusive aspects

usermoviescore

Training data Test data

Movie rating data

Netflix Prize• Training data

– 100 million ratings– 480,000 users– 17,770 movies– 6 years of data: 2000-2005

• Test data– Last few ratings of each user (2.8 million)– Evaluation criterion: root mean squared error (RMSE) – Netflix Cinematch RMSE: 0.9514

• Competition– 2700+ teams– $1 million grand prize for 10% improvement on Cinematch result– $50,000 2007 progress prize for 8.43% improvement

Overall rating distribution

• Third of ratings are 4s• Average rating is 3.68

From TimelyDevelopment.com

#ratings per movie

• Avg #ratings/movie: 5627

#ratings per user

• Avg #ratings/user: 208

Average movie rating by movie count

• More ratings to better movies

From TimelyDevelopment.com

Most loved movies

TitleAvg ratingCount

The Shawshank Redemption 4.593 137812

Lord of the Rings: The Return of the King4.545 133597

The Green Mile 4.306 180883

Lord of the Rings: The Two Towers 4.460 150676

Finding Nemo 4.415 139050

Raiders of the Lost Ark 4.504 117456

Forrest Gump 4.299 180736

Lord of the Rings: The Fellowship of the ring4.433 147932

The Sixth Sense 4.325 149199

Indiana Jones and the Last Crusade 4.333 144027

Important RMSEs

Grand Prize: 0.8563; 10% improvement

BellKor: 0.8693; 8.63% improvement

Cinematch: 0.9514; baseline

Movie average: 1.0533

User average: 1.0651

Global average: 1.1296

Inherent noise ???? :

Personalization

erroneous

accurate

Challenges

• Size of data– Scalability

– Keeping data in memory

• Missing data– 99 percent missing

– Very imbalanced

• Avoiding overfitting• Test and training data differ significantly movie #16322

• Use an ensemble of complementing predictors• Two, half tuned models worth more than a single,

fully tuned model

The BellKor recommender system

Effect of ensemble size

0.8725

0.8775

0.8825

0.8875

0.8925

1 8 15 22 29 36 43 50

#Predictors

• Use an ensemble of complementing predictors• Two, half tuned models worth more than a single,

fully tuned model• But:

Many seemingly different models expose similar characteristics of the data, and won’t mix well

• Concentrate efforts along three axes...

The BellKor recommender system

The first axis:• Multi-scale modeling of the data• Combine top level, regional modeling of

the data, with a refined, local view:– k-NN: Extracting local patterns– Factorization: Addressing regional

effects

The three dimensions of the BellKor system

Global effects

Factorization

Multi-scale modeling – 1st tier

• Mean rating: 3.7 stars• The Sixth Sense is 0.5 stars above

avg• Joe rates 0.2 stars below avgBaseline estimation:

Joe will rate The Sixth Sense 4 stars

Global effects:

Multi-scale modeling – 2nd tier

• Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale

Adjusted estimate:Joe will rate The Sixth Sense 4.5 stars

Factors model:

Multi-scale modeling – 3rd tier

• Joe didn’t like related movie Signs

Final estimate:Joe will rate The Sixth Sense 4.2 stars

Neighborhood model:

The second axis:• Quality of modeling• Make the best out of a model • Strive for:

– Fundamental derivation– Simplicity– Avoid overfitting– Robustness against #iterations,

parameter setting, etc. • Optimizing is good, but don’t overdo it!

global

quality

• The third dimension will be discussed later...

• Next:Moving the multi-scale view along the quality axis

global

quality

• Earliest and most popular collaborative filtering method• Derive unknown ratings from those of “similar” items (movie-movie

variant)• A parallel user-user flavor: rely on ratings of like-minded users (not

in this talk)

Local modeling through k-NN

123456789101112

113554

2544213

324123435

424542

5434225

613324

-unknown rating -rating between 1 to 5

123456789101112

113? 554

2544213

324123435

424542

5434225

613324

-estimate rating of movie 1 by user 5

123456789101112

113? 554

2544213

324123435

424542

5434225

613324

Neighbor selection:Identify movies similar to 1, rated by user 5

123456789101112

113? 554

2544213

324123435

424542

5434225

613324

Compute similarity weights:s13=0.2, s16=0.3

123456789101112

1132.6554

2544213

324123435

424542

5434225

613324

Predict by taking weighted average:(0.2*2+0.3*3)/(0.2+0.3)=2.6

Properties of k-NN

• Intuitive• No substantial preprocessing is required• Easy to explain reasoning behind a recommendation• Accurate?

Grand Prize: 0.8563

BellKor: 0.8693

Cinematch: 0.9514

k-NN on the RMSE scale

erroneous

accurate

k-NN - Common practice1. Define a similarity measure between items: sij

2. Select neighbors -- N(i;u): items most similar to i, that were rated by u

3. Estimate unknown rating, rui, as the weighted average:

ij uj ujj N i uui ui

ijj N i u

s r br b

baseline estimate for rui

k-NN - Common practice1. Define a similarity measure between items: sij

2. Select neighbors -- N(i;u): items similar to i, rated by u

3. Estimate unknown rating, rui, as the weighted average:

Problems:

1. Similarity measures are arbitrary; no fundamental justification

2. Pairwise similarities isolate each neighbor; neglect interdependencies among neighbors

3. Taking a weighted average is restricting; e.g., when neighborhood information is limited

ij uj ujj N i uui ui

ijj N i u

s r br b

(We allow )

Interpolation weights

• Use a weighted sum rather than a weighted average:

1ijj N i u

( ; )ui ui ij uj ujj N i ur b w r b

• Model relationships between item i and its neighbors• Can be learnt through a least squares problem from all

other users that rated i:

( ; )Minw vi vi ij vj vjv u j N i u

r b w r b

Interpolation weights

( ; )Minw vi vi ij vj vjv u j N i u

r b w r b

• Interpolation weights derived based on their role; no use of an arbitrary similarity measure

• Explicitly account for interrelationships among the neighbors

Challenges:• Deal with missing values• Avoid overfitting• Efficient implementation

Mostly unknown

Estimate inner-products among movie ratings

Geared towards females

Geared towards

serious

escapist

The PrincessDiaries

The Lion King

Braveheart

Lethal Weapon

Independence Day

AmadeusThe Color Purple

Dumb and Dumber

Ocean’s 11

Sense and Sensibility

Latent factor models

544213

24123435

434225

.1-.4.2

-.5.6.5

-.2.3.5

1.12.1.3

-.72.1-2

-1.7.3

1.1-.2.3.5-2-.5.8-.4.31.42.4-.9

-.8.7.51.4.3-11.42.9-.71.2-.11.3

2.1-.4.61.72.4.9-.3.4.8.7-.6.1

A rank-3 SVD approximation

Estimate unknown ratings as inner-products of factors:

544213

24123435

434225

.1-.4.2

-.5.6.5

-.2.3.5

1.12.1.3

-.72.1-2

-1.7.3

1.1-.2.3.5-2-.5.8-.4.31.42.4-.9

-.8.7.51.4.3-11.42.9-.71.2-.11.3

2.1-.4.61.72.4.9-.3.4.8.7-.6.1

544213

24123435

434225

.1-.4.2

-.5.6.5

-.2.3.5

1.12.1.3

-.72.1-2

-1.7.3

1.1-.2.3.5-2-.5.8-.4.31.42.4-.9

-.8.7.51.4.3-11.42.9-.71.2-.11.3

2.1-.4.61.72.4.9-.3.4.8.7-.6.1

544213

24123435

434225

.1-.4.2

-.5.6.5

-.2.3.5

1.12.1.3

-.72.1-2

-1.7.3

1.1-.2.3.5-2-.5.8-.4.31.42.4-.9

-.8.7.51.4.3-11.42.9-.71.2-.11.3

2.1-.4.61.72.4.9-.3.4.8.7-.6.1

Latent factor models13554

544213

24123435

434225

.1-.4.2

-.5.6.5

-.2.3.5

1.12.1.3

-.72.1-2

-1.7.3

1.1-.2.3.5-2-.5.8-.4.31.42.4-.9

-.8.7.51.4.3-11.42.9-.71.2-.11.3

2.1-.4.61.72.4.9-.3.4.8.7-.6.1

Properties:• SVD isn’t defined when entries are unknown use

specialized methods• Very powerful model can easily overfit, sensitive to

regularization• Probably most popular model among contestants

– 12/11/2006: Simon Funk describes an SVD based method– 12/29/2006: Free implementation at timelydevelopment.com

Grand Prize: 0.8563

BellKor: 0.8693

Cinematch: 0.9514

Factorization on the RMSE scale

factorization

Our approach

• User factors: Model a user u as a vector pu ~ Nk(, )

• Movie factors:Model a movie i as a vector qi ~ Nk(γ, Λ)

• Ratings:Measure “agreement” between u and i: rui ~ N(pu

Tqi, ε2)

• Maximize model’s likelihood:– Alternate between recomputing user-factors,

movie-factors and model parameters– Special cases:

• Alternating Ridge regression• Nonnegative matrix factorization

Combining multi-scale views

global effects

regional effects

local effects

Residual fitting

factorization

Weighted average

A unified model

factorization

Localized factorization model

• Standard factorization: User u is a linear function parameterized by pu

rui = puTqi

• Allow user factors – pu – to depend on the item being predicted

rui = pu(i)Tqi

• Vector pu(i) models behavior of u on items like i

RMSE vs. #Factors

0 20 40 60 80

#Factors

SE Factorization

Localized factorization

Results on Netflix Probe set

More accurate

Top contenders for Progress Prize 2007

ML@Toronto

How low can he go?

wxyzConsulting

Gravity

BellKor

Grand prize

multi-scale modeling

• Can exploit movie titles and release year• But movies side is pretty much covered anyway...• It’s about the users!• Turning to the third dimension...

Seek alternative perspectives of the data

global

quality

The third dimension of the BellKor system

• A powerful source of information:Characterize users by which movies they rated, rather than how they rated

A binary representation of the data

544213

24123435

434225

101001001010

001100100111

110110101110

011010010010

001111000011

101010010010

usersm

The third dimension of the BellKor system

movie #17270

Great news to recommender systems:• Works even better on real life datasets (?)• Improve accuracy by exploiting implicit

feedback• Implicit behavior is abundant and easy to

collect:– Rental history– Search patterns– Browsing history– ...

• Implicit feedback allows predicting personalized ratings for users that never rated!

binaryimplicit

global

quality

Where do you want to be?• All over the global-local axis• Relatively high on the quality axis• Can we go in between the explicit-

implicit axis?• Yes! Relevant methods:

– Conditional RBMs [ML@UToronto]– NSVD [Arek Paterek]– Asymmetric factor models

ratingsexplicit

LessonsWhat it takes to win:1. Think deeper – design better algorithms2. Think broader – use an ensemble of multiple predictors3. Think different – model the data from different perspectives

At the personal level:1. Have fun with the data2. Work hard, long breath3. Good teammates

Rapid progress of science:1. Availability of large, real life data2. Challenge, competition3. Effective collaboration

movie #13043

544213

24123435

434225

Yehuda Koren

AT&T Labs – Research

yehuda@att.com

BellKor homepage:www.research.att.com/~volinsky/netflix/

Movie #7614 movie #3250 movie #15868 Lessons from the Netflix Prize Yehuda Koren The BellKor team...

Documents

8 - 1 Chapter 8 Muscular System. 8 - 2 Athletic Training and Muscular System Movie Movie Movie

Movie vs movie

The BellKor 2008 Solution to the Netflix Prize

koren.ppt - cs.duke.edu€¦ · 10/5/2009 3 • Use an ensemble of complementing predictors den uftl ha•T,ow models worth more than a single, fully tuned model • But: The BellKor

5 movie mas mayo [movie-mas.com]

Tema 3. La estrategia de marketing: briefing y ...rua.ua.es/dspace/bitstream/10045/15868/1/Tema 4. La estrategia... · Tema 4. La estrategia publicitaria. El planner Lola Fernández

Molecular genetic studies on pediocin-like bacteriocinshodhganga.inflibnet.ac.in/bitstream/10603/15868/21/21_appendix.pdfTITLE Evidence for pediocin PA-1 like bacteriocin production

Windows Movie Maker - api.ning.comapi.ning.com/.../UsingwindowsMovieMaker.pdf · Windows Movie Maker Save a Movie to the Computer 1. From File Menu, select Save Movie File. 2. On

MOVIE MAKER: -Make a movie about PUBLISHER: alternative

Movie based management wednesday movie

Analysis of Soluble Rhenium Concentrations in the Refractory ......PNNL-15868 Analysis of Soluble Rhenium Concentrations in the Refractory from Bulk Vitrification Full-Scale Test 38B

S. Manjulata Devishodhganga.inflibnet.ac.in/bitstream/10603/15868/13... · to have wider application in the field of food industry. 2. 1. 3. BACTERIOCINS Several of the bacteria have

Impact of Movie Streaming over Traditional DVD Movie ... · Impact of Movie Streaming over Traditional DVD Movie Rental—An ... rental company Blockbuster that ... streaming vs

Negnu network create movie on windows movie maker

Fandango Affiliate Network - Movie Tickets & Movie Times

Data-driven Blockbuster Planning on Online Movie Knowledge ... · movie knowledge library, a novel movie planning framework “Blockbuster Planning with Maximized Movie Configuration

Movie Monsters can be animals people creatures Movie monsters can be… Movie Monsters can be

The Three Musketeers - Daily Script - Movie Scripts and Movie

Paraphrasing A Movie: How to Cite A Movie Scene

Lessons Learned from the Netflix Contest Arthur …ceick/7362/Arthur2.pdfThe BellKor Solution to the Netflix Grand Prize Koren Notations: – Users referred to with letters u and v