The BellKor 2008 Solution to the Netflix Prize by Leenarat Leelapanyalert


Page 1: The  BellKor  2008 Solution  to the Netflix Prize

The BellKor 2008 Solution to the Netflix Prize

by Leenarat Leelapanyalert

Page 2: The  BellKor  2008 Solution  to the Netflix Prize

Netflix Dataset

• Over 100 million movie ratings with date stamps (100,480,507 ratings)
• M = 17,770 movies
• N = 480,189 customers
• 1 star = no interest, 5 stars = strong interest
• Dec 31, 1999 – Dec 31, 2005

The user-item matrix has N × M = 8,532,958,530 elements

About 98.8% of the values are missing
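As a quick arithmetic check, the short Python snippet below recomputes the size of the user-item matrix and the share of missing entries from the counts quoted on this slide. The variable names are illustrative only.

```python
# Counts quoted on the slide.
num_ratings = 100_480_507
num_movies = 17_770   # M
num_users = 480_189   # N

# Size of the full user-item matrix and fraction of cells without a rating.
matrix_size = num_users * num_movies
missing_fraction = 1 - num_ratings / matrix_size

print(f"{matrix_size:,} elements")
print(f"{missing_fraction:.1%} missing")
```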

Page 3: The  BellKor  2008 Solution  to the Netflix Prize

Netflix Competition

• About 4.2 million ratings are used for evaluation
  – Probe set (part of the Training set)
  – Qualifying set (Quiz set & Test set)

• Scoring
  – The RMSE achieved on the Quiz set is shown
  – The best RMSE on the Test set → THE WINNER!!

Page 4: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models
• Extra: Shrinking towards recent actions

Page 5: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors → Adjust for the deviations of each user (rater, customer) and item (movie)
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models
• Extra: Shrinking towards recent actions

Page 6: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models → Compare items and users via SVD
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models
• Extra: Shrinking towards recent actions

Page 7: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models → Compute the relationships between items (or users)
  – with temporal effects
• Integrated models
• Extra: Shrinking towards recent actions

Page 8: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models
• Extra: Other methods
  – Shrinking towards recent actions
  – Blending multiple solutions

Page 9: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models → Combine latent factor models and neighborhood models
• Extra: Shrinking towards recent actions

Page 10: The  BellKor  2008 Solution  to the Netflix Prize

Outline

• Necessary index letters
• Baseline predictors
  – with temporal effects
• Latent factor models
  – with temporal effects
• Neighborhood models
  – with temporal effects
• Integrated models
• Extra: Shrinking towards recent actions → New ideas

Page 11: The  BellKor  2008 Solution  to the Netflix Prize

Index Letters

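• $u, v$ → users, raters, or customers
• $i, j$ → movies, or items
• $r_{ui}$ → the score given by user $u$ to movie $i$
• $\hat{r}_{ui}$ → the predicted value of $r_{ui}$
• $t_{ui}$ → the time of rating $r_{ui}$
• $K$ → the set of $(u, i)$ pairs for which $r_{ui}$ is known (the training set)
• $R(u)$ → the set of items rated by user $u$
• $R(i)$ → the set of users who rated item $i$
• $N(u)$ → the set of items for which $u$ provided implicit feedback (items $u$ is known to have rated, even when the score itself is unknown)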

Page 12: The  BellKor  2008 Solution  to the Netflix Prize

Baseline Predictors ($b_{ui}$)

$$ b_{ui} = \mu + b_u + b_i $$

• $\mu$ → the overall average rating
• $b_u$ → deviation of user $u$
• $b_i$ → deviation of item $i$

Example: $\mu = 3.7$, Simha ($b_u$) = -0.3, Titanic ($b_i$) = 0.5
$b_{ui}$ = 3.7 – 0.3 + 0.5 = 3.9 stars
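The static baseline $b_{ui} = \mu + b_u + b_i$ is a plain sum, so the slide's example can be reproduced in a few lines of Python. The function name `baseline` is an illustrative choice, not something defined in the slides.

```python
def baseline(mu, b_u, b_i):
    """Static baseline predictor: b_ui = mu + b_u + b_i."""
    return mu + b_u + b_i

# Values from the slide's example: Simha rating Titanic.
prediction = baseline(mu=3.7, b_u=-0.3, b_i=0.5)
print(round(prediction, 1))
```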

Page 13: The  BellKor  2008 Solution  to the Netflix Prize

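Estimating the Parameters ($b_u$, $b_i$) – Formula

$$ b_{ui} = \mu + b_u + b_i $$

$$ b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_1 + |R(i)|} $$

$$ b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_2 + |R(u)|} $$

The regularization parameters ($\lambda_1$, $\lambda_2$) are determined by validation on the Probe set. In this case: $\lambda_1 = 25$, $\lambda_2 = 10$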
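A minimal Python sketch of these shrunk-average estimates follows: $b_i$ is the regularized mean of $r_{ui} - \mu$ over $R(i)$, and $b_u$ is the regularized mean of $r_{ui} - \mu - b_i$ over $R(u)$. The `(user, item, rating)` triple format and the function name are assumptions made for illustration. The default $\lambda_1 = 25$ and $\lambda_2 = 10$ are the values quoted on the slide.

```python
from collections import defaultdict

def estimate_biases(ratings, lambda1=25.0, lambda2=10.0):
    """ratings: list of (user, item, rating) triples. Returns (mu, b_u, b_i)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # b_i = sum_{u in R(i)} (r_ui - mu) / (lambda1 + |R(i)|)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_i = {i: item_sum[i] / (lambda1 + item_cnt[i]) for i in item_sum}

    # b_u = sum_{i in R(u)} (r_ui - mu - b_i) / (lambda2 + |R(u)|)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_i[i]
        user_cnt[u] += 1
    b_u = {u: user_sum[u] / (lambda2 + user_cnt[u]) for u in user_sum}
    return mu, b_u, b_i

# Toy data, invented for illustration.
toy = [("ann", "A", 5), ("ann", "B", 3), ("bob", "A", 4), ("bob", "B", 2)]
mu, b_u, b_i = estimate_biases(toy)
```

With so few ratings per user and item, the regularization terms dominate and the estimated deviations stay close to zero, which is the intended shrinkage effect.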

Page 14: The  BellKor  2008 Solution  to the Netflix Prize

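Estimating the Parameters ($b_u$, $b_i$) – The Least Squares Problem

$$ b_{ui} = \mu + b_u + b_i $$

$$ \min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda_1 \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big) $$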

Page 15: The  BellKor  2008 Solution  to the Netflix Prize

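Estimating the Parameters ($b_u$, $b_i$) – The Least Squares Problem

$$ b_{ui} = \mu + b_u + b_i $$

$$ \min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda_1 \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big) $$

• First term: fits the given ratings
• Second term: avoids overfitting by penalizing the magnitudes of the parameters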
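One common way to minimize a regularized squared-error objective of this form, $\sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2$ plus an L2 penalty on all $b_u$ and $b_i$, is stochastic gradient descent. The sketch below is an illustration under assumptions: the slides do not name the solver, and the learning rate, penalty weight and epoch count are made-up values.

```python
def fit_biases_sgd(ratings, lam=0.02, lr=0.01, epochs=200):
    """ratings: list of (user, item, rating) triples. Returns (mu, b_u, b_i)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            # Step along the negative gradient of the squared error; the
            # L2 penalty is applied at every rating update, as is common.
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
    return mu, b_u, b_i

# Toy data, invented for illustration.
toy = [("ann", "A", 5), ("ann", "B", 3), ("bob", "A", 4), ("bob", "B", 2)]
mu, b_u, b_i = fit_biases_sgd(toy)
rmse = (sum((r - mu - b_u[u] - b_i[i]) ** 2 for u, i, r in toy) / len(toy)) ** 0.5
```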

Page 16: The  BellKor  2008 Solution  to the Netflix Prize

Time Changes vs. Baseline Predictors

• An item's popularity may change over time
• Users change their baseline ratings over time

Static baseline:
$$ b_{ui} = \mu + b_u + b_i $$

Time-aware baseline:
$$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$

Page 17: The  BellKor  2008 Solution  to the Netflix Prize

$b_i(t_{ui})$

• We do not expect movie likeability to fluctuate on a daily basis
• Time periods → Bins
• 30 bins

$$ b_i(t) = b_i + b_{i,\mathrm{Bin}(t)} $$

$$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$

Page 18: The  BellKor  2008 Solution  to the Netflix Prize

$b_u(t_{ui})$

• Unlike movies, user effects can change on a daily basis
• Time deviation

$$ \mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta} $$

• $t_u$ → the mean date of rating by user $u$
• $t$ → the date on which user $u$ rated the movie
• $\beta = 0.4$, set by validation on the Probe set

Page 19: The  BellKor  2008 Solution  to the Netflix Prize

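$b_u(t_{ui})$

• Well suited to gradual drifts

$$ b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) $$

$$ \mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta} $$

$$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$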
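The time deviation $\mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta}$ and the linear drift model $b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t)$ can be sketched in Python as below. Measuring dates as day numbers and the function names are assumptions made for illustration. $\beta = 0.4$ is the value quoted in the slides.

```python
import math

def dev(t, t_mean, beta=0.4):
    """dev_u(t) = sign(t - t_u) * |t - t_u|^beta, with dates as day numbers."""
    diff = t - t_mean
    return math.copysign(abs(diff) ** beta, diff)

def user_bias_linear(t, b_u, alpha_u, t_mean, beta=0.4):
    """b_u^(1)(t) = b_u + alpha_u * dev_u(t): a gradually drifting user bias."""
    return b_u + alpha_u * dev(t, t_mean, beta)

# A rating made 32 days after the user's mean rating date (toy numbers).
late_bias = user_bias_linear(t=132, b_u=0.2, alpha_u=0.05, t_mean=100)
```

Because $\beta < 1$, the deviation grows more slowly than linearly, so ratings far from the user's mean date do not dominate the drift term.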

Page 20: The  BellKor  2008 Solution  to the Netflix Prize

$b_u(t_{ui})$

• How about sudden drifts?
  – We found that multiple ratings that a user gives in a single day tend to concentrate around a single value
• A user rates on 40 different days on average
• Thus, $b_{u,t}$ requires about 40 parameters per user

$$ b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t} $$

Page 21: The  BellKor  2008 Solution  to the Netflix Prize

Baseline Predictors

$$ b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + b_i + b_{i,\mathrm{Bin}(t_{ui})} $$

Page 22: The  BellKor  2008 Solution  to the Netflix Prize

Baseline Predictors

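$$ b_{ui} = \mu + \underbrace{b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}}}_{B_u \text{ (user bias)}} + \underbrace{b_i + b_{i,\mathrm{Bin}(t_{ui})}}_{B_i \text{ (movie bias)}} $$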

Page 23: The  BellKor  2008 Solution  to the Netflix Prize

Baseline Predictors

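• Movie bias is not completely user-independent

• $c_u(t)$ → time-dependent scaling feature
• $c_u$ → stable part
• $c_{u,t}$ → day-specific variable

$$ c_u(t) = c_u + c_{u,t} $$

Without scaling:
$$ b_{ui} = \mu + \underbrace{b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}}}_{B_u \text{ (user bias)}} + \underbrace{b_i + b_{i,\mathrm{Bin}(t_{ui})}}_{B_i \text{ (movie bias)}} $$

With scaling:
$$ b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\mathrm{Bin}(t_{ui})}) \cdot c_u(t_{ui}) $$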

Page 24: The  BellKor  2008 Solution  to the Netflix Prize

RMSE = 0.9555

$$ b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\mathrm{Bin}(t_{ui})}) \cdot c_u(t_{ui}) $$

Page 25: The  BellKor  2008 Solution  to the Netflix Prize

Frequencies (additional)

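• The number of ratings a user gave on a specific day is SIGNIFICANT

• $F_{ui}$ → the overall number of ratings that user $u$ gave on day $t_{ui}$
• $f_{ui}$ → the log-frequency, rounded down
• $b_{i,f}$ → the bias specific to item $i$ at log-frequency $f$

$$ f_{ui} = \lfloor \log_a F_{ui} \rfloor $$

$$ b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\mathrm{Bin}(t_{ui})}) \cdot c_u(t_{ui}) + b_{i,f_{ui}} $$

RMSE: 0.9555 → 0.9278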
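To make the frequency term concrete, the sketch below counts how many ratings each user gave on each day ($F_{ui}$) and maps the count to a log-frequency bucket $f_{ui} = \lfloor \log_a F_{ui} \rfloor$. The slides do not state the base $a$, so it is left as a parameter; the base 2 used in the example, the `(user, day)` event format and the function name are assumptions.

```python
import math
from collections import Counter

def log_frequencies(rating_events, base):
    """rating_events: list of (user, day) pairs, one pair per rating.

    Returns {(user, day): f}, where f = floor(log_base(F)) and F is the
    number of ratings that user gave on that day.
    """
    counts = Counter(rating_events)
    return {key: math.floor(math.log(n, base)) for key, n in counts.items()}

# Toy events: user "u" rates five movies on day 1 and one movie on day 2.
events = [("u", 1)] * 5 + [("u", 2)]
freqs = log_frequencies(events, base=2)
```

Each item would then carry one extra bias parameter per bucket, so bulk-rating days and ordinary days get separate item offsets.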

Page 26: The  BellKor  2008 Solution  to the Netflix Prize

Why Do Frequencies Work?

• They hurt when used with user-movie interaction terms
• They have no effect when used with user-related parameters
• Rating a lot in bulk → the ratings are not given close to the actual watching day
  – Positive approach
  – Negative approach

• High frequencies (or bulk ratings) do not represent much change in people's taste, but mostly a biased selection of movies

Page 27: The  BellKor  2008 Solution  to the Netflix Prize

Predicting Future Days

• The day-specific parameters should be set to their default values
  – $c_u(t_{ui}) = c_u$
  – $b_{u,t} = 0$

• The transient temporal model does not attempt to capture future changes.

Page 28: The  BellKor  2008 Solution  to the Netflix Prize

Latent Factor Models

• Transform both items and users to the same latent factor space
  – Obvious dimensions
    • Comedy vs. drama
    • Amount of action
    • Orientation to children
  – Less well-defined dimensions
    • Depth of character development

• Tool → SVD

Page 29: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

• Factoring matrices into a series of linear approximations that expose the underlying structure of the matrix

Page 30: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

        A     B     C
Simha   4     4     4
Ateeq   5     5     5
Smith   3     3     3
Greg    4     4     4
Mcq     4     4     4
Ramin   4     4     4
Xiao    4     4     4
Wu      3     3     3
Riz     5     5     5

= [4 5 3 4 4 4 4 3 5]^T * [1 1 1]

Page 31: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

Actual ratings:

        A     B     C
Simha   4     4     5
Ateeq   4     5     5
Smith   3     3     2
Greg    4     5     4
Mcq     4     4     4
Ramin   3     5     4
Xiao    4     4     3
Wu      2     4     4
Riz     5     5     5

Page 32: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

Rank-1 approximation:

        A     B     C
Simha   3.95  4.64  4.34
Ateeq   4.27  5.02  4.69
Smith   2.42  2.85  2.66
Greg    3.97  4.67  4.36
Mcq     3.64  4.28  4.00
Ramin   3.69  4.33  4.05
Xiao    3.33  3.92  3.66
Wu      3.08  3.63  3.39
Riz     4.55  5.35  5.00

= [4.34 4.69 2.66 4.36 4.00 4.05 3.66 3.39 5.00]^T * [0.91 1.07 1.00]
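The rank-1 table is simply the outer product of the user vector and the movie vector shown on the slide. The following Python sketch rebuilds it from those two vectors and computes Simha's residual against the actual ratings (4, 4, 5). The variable names are illustrative.

```python
# Vectors copied from the slide.
user_factor = [4.34, 4.69, 2.66, 4.36, 4.00, 4.05, 3.66, 3.39, 5.00]
movie_factor = [0.91, 1.07, 1.00]

# Rank-1 approximation: outer product of the two vectors.
approx = [[u * m for m in movie_factor] for u in user_factor]

# Residual for the first user (Simha): actual rating minus approximation.
actual_simha = [4, 4, 5]
residual_simha = [a - p for a, p in zip(actual_simha, approx[0])]

print([round(x, 2) for x in approx[0]])
print([round(x, 2) for x in residual_simha])
```

The residuals left over after this step are what the next factor is fitted to.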

Page 33: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

Residual = actual ratings − rank-1 approximation

Actual ratings:

        A     B     C
Simha   4     4     5
Ateeq   4     5     5
Smith   3     3     2
Greg    4     5     4
Mcq     4     4     4
Ramin   3     5     4
Xiao    4     4     3
Wu      2     4     4
Riz     5     5     5

Rank-1 approximation:

        A     B     C
Simha   3.95  4.64  4.34
Ateeq   4.27  5.02  4.69
Smith   2.42  2.85  2.66
Greg    3.97  4.67  4.36
Mcq     3.64  4.28  4.00
Ramin   3.69  4.33  4.05
Xiao    3.33  3.92  3.66
Wu      3.08  3.63  3.39
Riz     4.55  5.35  5.00

Page 34: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

Residual:

        A      B      C
Simha   0.05  -0.64   0.66
Ateeq  -0.28  -0.02   0.31
Smith   0.58   0.15  -0.66
Greg    0.03   0.33  -0.36
Mcq     0.36  -0.28   0.00
Ramin  -0.69   0.67  -0.05
Xiao    0.67   0.08  -0.66
Wu     -1.08   0.37   0.61
Riz     0.45  -0.35   0.00

≈ [-0.18 -0.38 0.80 0.15 0.35 -0.67 0.89 -1.29 0.44]^T * [0.82 -0.20 -0.53]

Page 35: The  BellKor  2008 Solution  to the Netflix Prize

Singular Value Decomposition (SVD)

Predicted Score = User Baseline Rating * Movie Average Score

Actual ratings (9 × 3) = User factors (9 × 3) * Movie factors (3 × 3)

Actual ratings:

        A     B     C
Simha   4     4     5
Ateeq   4     5     5
Smith   3     3     2
Greg    4     5     4
Mcq     4     4     4
Ramin   3     5     4
Xiao    4     4     3
Wu      2     4     4
Riz     5     5     5

User factors:

Simha   4.34  -0.18  -0.90
Ateeq   4.69  -0.38  -0.15
Smith   2.66   0.80   0.40
Greg    4.36   0.15   0.47
Mcq     4.00   0.35  -0.29
Ramin   4.05  -0.67   0.68
Xiao    3.66   0.89   0.33
Wu      3.39  -1.29   0.14
Riz     5.00   0.44  -0.36

Movie factors:

        A      B      C
1       0.91   1.07   1.00
2       0.82  -0.20  -0.53
3      -0.21   0.76  -0.62

Page 36: The  BellKor  2008 Solution  to the Netflix Prize

Latent Factor Models

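• $p_u$ → user-factors vector
• $q_i$ → item-factors vector

$$ \hat{r}_{ui} = b_{ui} + p_u^T q_i $$

• Add implicit feedback
  – Asymmetric-SVD

$$ \hat{r}_{ui} = b_{ui} + q_i^T \Big( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) $$

  – SVD++

$$ \hat{r}_{ui} = b_{ui} + q_i^T \Big( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) $$

SVD++ with 60 factors: RMSE = 0.8966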
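As a rough illustration of the SVD++ prediction rule $\hat{r}_{ui} = b_{ui} + q_i^T ( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j )$, the Python sketch below evaluates it for one user-item pair. It covers prediction only, not training. The function name, the plain-list vector representation and the toy numbers are assumptions.

```python
import math

def predict_svdpp(b_ui, q_i, p_u, y, implicit_items):
    """r_hat = b_ui + q_i . (p_u + |N(u)|^(-1/2) * sum of y_j over N(u)).

    q_i, p_u: factor vectors (lists of equal length);
    y: dict mapping item -> implicit-feedback factor vector;
    implicit_items: the set N(u).
    """
    norm = 1.0 / math.sqrt(len(implicit_items)) if implicit_items else 0.0
    user_vec = list(p_u)
    for j in implicit_items:
        for k, y_jk in enumerate(y[j]):
            user_vec[k] += norm * y_jk
    return b_ui + sum(q * v for q, v in zip(q_i, user_vec))

# Toy two-factor example; all numbers are invented.
y = {"A": [0.1, 0.0], "B": [0.3, 0.2], "C": [0.0, 0.2], "D": [0.0, 0.0]}
r_hat = predict_svdpp(3.5, [1.0, 0.5], [0.2, -0.4], y, {"A", "B", "C", "D"})
```

The user vector is shifted by the items the user interacted with, so even a user with few explicit ratings gets a profile shaped by implicit feedback.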

Page 37: The  BellKor  2008 Solution  to the Netflix Prize

Temporal Effects

• Time
  – Movie biases ($b_i$): movies go in and out of popularity over time
  – User biases ($b_u$): users change their baseline ratings over time
  – User preferences ($p_u$): genre, perception of actors and directors, household

$$ \hat{r}_{ui} = b_{ui}(t) + q_i^T \Big( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big), \quad t = t_{ui} $$

Page 38: The  BellKor  2008 Solution  to the Netflix Prize

Temporal Effects

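• User preferences can be treated the same way as the user bias

$$ p_u(t)^T = (p_{u1}(t), p_{u2}(t), \ldots, p_{uf}(t)) $$

User bias:
$$ b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) $$
$$ b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t} $$

User preferences:
$$ p_{uk}^{(1)}(t) = p_{uk} + \alpha_{uk} \cdot \mathrm{dev}_u(t), \quad k = 1, 2, \ldots, f $$
$$ p_{uk}^{(3)}(t) = p_{uk}^{(1)}(t) + p_{uk,t}, \quad k = 1, 2, \ldots, f $$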

Page 39: The  BellKor  2008 Solution  to the Netflix Prize

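RMSE

$$ \hat{r}_{ui} = b_{ui}(t) + q_i^T \Big( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) $$

• With $p_{uk}^{(1)}(t) = p_{uk} + \alpha_{uk} \cdot \mathrm{dev}_u(t)$: f = 500, RMSE = 0.8815
• With $p_{uk}^{(3)}(t) = p_{uk}^{(1)}(t) + p_{uk,t}$: f = 500, RMSE = 0.8841 !!

• Most accurate factor model (add frequencies)

$$ \hat{r}_{ui} = b_{ui}(t) + (q_i^T + q_{i,f_{ui}}^T) \Big( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) $$

f = 500, RMSE = 0.8784
f = 2000, RMSE = 0.8762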

Page 40: The  BellKor  2008 Solution  to the Netflix Prize

Neighborhood Models

• Compute the relationships between items
• Estimate the score a user gives to an item based on the same user's ratings of similar items

Page 41: The  BellKor  2008 Solution  to the Netflix Prize

The Similarity Measure

• The Pearson correlation coefficient, $\rho_{ij}$

Page 42: The  BellKor  2008 Solution  to the Netflix Prize

The Similarity Measure

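• The Pearson correlation coefficient, $\rho_{ij}$
  – $s_{ij}$ → similarity
  – $n_{ij}$ → the number of users that rated both $i$ and $j$
  – $\lambda_2 = 100$

$$ s_{ij} \overset{\text{def}}{=} \frac{n_{ij}}{n_{ij} + \lambda_2} \rho_{ij} $$

• A weighted average of the ratings of neighboring items
  – $S^k(i;u)$ → the set of $k$ items rated by $u$ that are most similar to $i$

$$ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}} $$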
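The two formulas on this slide, the shrunk similarity $s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_2} \rho_{ij}$ and the similarity-weighted average of baseline-adjusted neighbor ratings, can be sketched in Python as follows. The function names and the neighbor-tuple format are assumptions, and selecting the $k$ most similar items is left to the caller.

```python
def shrunk_similarity(rho_ij, n_ij, lambda2=100.0):
    """s_ij = n_ij / (n_ij + lambda2) * rho_ij."""
    return n_ij / (n_ij + lambda2) * rho_ij

def predict_knn(b_ui, neighbours):
    """neighbours: list of (s_ij, r_uj, b_uj), one per item j in S^k(i;u)."""
    total = sum(s for s, _, _ in neighbours)
    if total == 0:
        return b_ui  # no usable neighbours: fall back to the baseline
    weighted = sum(s * (r - b) for s, r, b in neighbours)
    return b_ui + weighted / total

# Toy example: a correlation of 0.8 supported by 100 common raters.
s = shrunk_similarity(0.8, 100)
r_hat = predict_knn(3.5, [(s, 5, 4.0), (0.1, 3, 3.5)])
```

The shrinkage pulls correlations based on few common raters towards zero, so they carry less weight in the average.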

Page 43: The  BellKor  2008 Solution  to the Netflix Prize

Problem With The Model

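Original model:

$$ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}} $$

• Isolates the relations between 2 items
• Fully relies on the neighbors, even if they are absent

New model:

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} $$

• The $w_{ij}$'s are not user-specific
• Sums over all items rated by $u$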

Page 44: The  BellKor  2008 Solution  to the Netflix Prize

Improving The Model

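Original model:

$$ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}} $$

• Isolates the relations between 2 items
• Fully relies on the neighbors, even if they are absent

New model:

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} $$

• The $w_{ij}$'s are not user-specific
• Sums over all items rated by $u$

With implicit feedback:

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij} $$

• Considers not only what the user rated, but also what the user did not rate
• $c_{ij}$ is expected to be high if $j$ is predictive of $i$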

Page 45: The  BellKor  2008 Solution  to the Netflix Prize

Improving The Model

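• The current model somewhat overemphasizes the dichotomy between heavy raters and those that rarely rate
• Moderate this behavior by normalization: scale the two sums by $|R(u)|^{-\alpha}$ and $|N(u)|^{-\alpha}$
  – $\alpha = 0$ → non-normalized rule, which encourages greater deviations
  – $\alpha = 1$ → fully normalized rule, which eliminates the effect of the number of ratings
  – In this case, $\alpha = 0.5$

Before normalization:

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij} $$

After normalization ($\alpha = 0.5$):

$$ \hat{r}_{ui} = b_{ui} + |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + |N(u)|^{-1/2} \sum_{j \in N(u)} c_{ij} $$

RMSE = 0.9002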

Page 46: The  BellKor  2008 Solution  to the Netflix Prize

Improving The Model

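RMSE = 0.9002

$$ \hat{r}_{ui} = b_{ui} + |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + |N(u)|^{-1/2} \sum_{j \in N(u)} c_{ij} $$

• Reduce the model by pruning parameters
  – $S^k(i)$ → the set of $k$ items most similar to $i$

$$ R^k(i;u) \overset{\text{def}}{=} R(u) \cap S^k(i) $$

$$ N^k(i;u) \overset{\text{def}}{=} N(u) \cap S^k(i) $$

$$ \hat{r}_{ui} = b_{ui} + |R^k(i;u)|^{-1/2} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-1/2} \sum_{j \in N^k(i;u)} c_{ij} $$

• k = 17,770 → RMSE = 0.8906
• k = 2000 → RMSE = 0.9067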
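The pruned rule restricts both sums to the $k$ items most similar to $i$: $R^k(i;u) = R(u) \cap S^k(i)$ and $N^k(i;u) = N(u) \cap S^k(i)$, each sum scaled by the inverse square root of its set size. A minimal prediction-only sketch in Python follows. The data structures (dicts keyed by item or by `(i, j)` pairs) and the function name are assumptions, and learning the weights $w_{ij}$ and $c_{ij}$ is out of scope here.

```python
import math

def predict_neighbourhood(b_ui, i, rated, implicit, top_k, w, c):
    """rated: {j: (r_uj, b_uj)} for j in R(u); implicit: the set N(u);
    top_k: the set S^k(i); w, c: dicts keyed by (i, j)."""
    r_k = [j for j in rated if j in top_k]      # R^k(i;u)
    n_k = [j for j in implicit if j in top_k]   # N^k(i;u)
    pred = b_ui
    if r_k:
        explicit = sum((rated[j][0] - rated[j][1]) * w[(i, j)] for j in r_k)
        pred += explicit / math.sqrt(len(r_k))
    if n_k:
        pred += sum(c[(i, j)] for j in n_k) / math.sqrt(len(n_k))
    return pred

# Toy data, invented for illustration; item "E" is pruned away.
rated = {"A": (5, 4.0), "B": (2, 3.0), "C": (4, 4.0), "D": (3, 3.5), "E": (1, 2.0)}
top_k = {"A", "B", "C", "D"}
w = {("X", "A"): 0.2, ("X", "B"): 0.1, ("X", "C"): 0.5, ("X", "D"): -0.2}
c = {("X", j): 0.05 for j in top_k}
r_hat = predict_neighbourhood(3.5, "X", rated, set(rated), top_k, w, c)
```

Pruning means only the weights for pairs inside $S^k(i)$ are ever looked up, which is what shrinks the number of parameters.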

Page 47: The  BellKor  2008 Solution  to the Netflix Prize

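Integrated Models

• Baseline predictors + Factor models + Neighborhood models

$$ \hat{r}_{ui} = \mu + b_u^{(1)}(t) + b_i(t) + q_i^T \Big( p_u^{(1)}(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-1/2} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-1/2} \sum_{j \in N^k(i;u)} c_{ij} $$

f = 170, k = 300 → RMSE = 0.8827

• To further improve accuracy, we add a more elaborate temporal model for the user bias

$$ \hat{r}_{ui} = \mu + b_u^{(3)}(t) + b_i(t) + q_i^T \Big( p_u^{(1)}(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-1/2} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-1/2} \sum_{j \in N^k(i;u)} c_{ij} $$

f = 170, k = 300 → RMSE = 0.8786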

Page 48: The  BellKor  2008 Solution  to the Netflix Prize

EXTRA: Shrinking Towards Recent Actions

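• To correct $\hat{r}_{ui}$
• Shrink $\hat{r}_{ui}$ towards the average rating of $u$ on day $t$
• The single-day effect is among the strongest temporal effects in the data

$$ \hat{r}_{ui} \leftarrow \frac{\hat{r}_{ui} + c_{ut} \bar{r}_{ut}}{1 + c_{ut}} $$

$$ c_{ut} = n_{ut} \cdot \exp(-\alpha V_{ut}) / \beta $$

• $\alpha = 8$, $\beta = 11$
• $n_{ut}$ → the number of ratings $u$ gave on day $t$
• $\bar{r}_{ut}$ → the mean rating of $u$ on day $t$
• $V_{ut}$ → the variance of $u$'s ratings on day $t$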

Page 49: The  BellKor  2008 Solution  to the Netflix Prize

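Shrinking Towards Recent Actions

• A stronger correction accounts for periods longer than a single day
• It also tries to characterize the recent user behavior on similar movies

$$ \hat{r}_{ui} \leftarrow \frac{\hat{r}_{ui} + c_{ui} \bar{r}_{ui}}{1 + c_{ui}} $$

$$ c_{ui} = n_{ui} \cdot \exp(-\alpha V_{ui}) / \beta $$

$$ w_{ij}^u = s_{ij} \cdot \exp(-\gamma \, |t_{ui} - t_{uj}|) $$

$$ n_{ui} = \sum_{j \in R(u)} w_{ij}^u $$

$$ \bar{r}_{ui} = \frac{\sum_{j \in R(u)} w_{ij}^u \, r_{uj}}{\sum_{j \in R(u)} w_{ij}^u} $$

$$ V_{ui} = \frac{\sum_{j \in R(u)} w_{ij}^u \, (r_{uj})^2}{\sum_{j \in R(u)} w_{ij}^u} - (\bar{r}_{ui})^2 $$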

Page 50: The  BellKor  2008 Solution  to the Netflix Prize

Q & A