The BellKor 2008 Solution to the Netflix Prize

by Leenarat Leelapanyalert

Netflix Dataset

• Over 100 million movie ratings with date-stamp (100,480,507 ratings)

• M = 17,770 movies
• N = 480,189 customers
• 1 (star) = no interest, 5 (stars) = strong interest
• Dec 31, 1999 – Dec 31, 2005

The user-item matrix: N × M = 8,532,958,530 elements

About 98.8% of the values are missing
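The matrix size and the share of missing entries follow directly from the counts quoted above. A minimal sketch of that arithmetic (the variable names are only illustrative):

```python
# Counts quoted on the dataset slide
n_ratings = 100_480_507
n_movies = 17_770       # M
n_users = 480_189       # N

# Every (user, movie) pair is one cell of the user-item matrix
n_cells = n_users * n_movies
missing_fraction = 1 - n_ratings / n_cells

print(f"{n_cells:,} cells")
print(f"{missing_fraction:.1%} of the cells are missing")
```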

Netflix Competition

• About 4.2 million of the ratings are held out
– Training set (includes the Probe set)
– Qualifying set (Quiz set & Test set)

• Scoring
– The RMSE achieved on the Quiz set is shown
– Best RMSE on the Test set → THE WINNER!!
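Since every score in these slides is an RMSE, a small sketch of the metric may help. The function name and the sample numbers below are hypothetical, not contest data:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions against hypothetical true ratings
example = rmse([3.9, 4.2, 2.5], [4, 5, 2])
print(f"RMSE = {example:.4f}")
```

Lower is better: a perfect predictor scores 0, and large individual errors are penalized more than small ones because they are squared before averaging.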

Outline

• Necessary index letters
• Baseline predictors → adjust for the deviations of each user (rater, customer) and item (movie)
– with temporal effects
• Latent factor models → compare items and users by SVD
– with temporal effects
• Neighborhood models → compute the relationship between items (or users)
– with temporal effects
• Integrated models → combine latent factor models and neighborhood models
• Extra: other methods → new ideas
– Shrinking towards recent actions
– Blending multiple solutions

Index Letters

• u, v → users (raters, customers)
• i, j → movies (items)
• $r_{ui}$ → the rating given by user u to movie i
• $\hat{r}_{ui}$ → the predicted value of $r_{ui}$
• $t_{ui}$ → the time of rating $r_{ui}$
• K → the training set: the (u, i) pairs for which $r_{ui}$ is known
• R(u) → all the items rated by user u
• R(i) → the set of users who rated item i
• N(u) → all items known to have been rated by u, even where the rating value itself is unknown

Baseline Predictors (bui)

$$ b_{ui} = \mu + b_u + b_i $$

µ → the overall average rating
$b_u$ → deviation of user u
$b_i$ → deviation of item i

Example: µ = 3.7, Simha ($b_u$) = -0.3, Titanic ($b_i$) = 0.5
$b_{ui}$ = 3.7 – 0.3 + 0.5 = 3.9 stars

Estimate Parameter (bu, bi) – Formula

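$$ b_{ui} = \mu + b_u + b_i $$

$$ b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_1 + |R(i)|} $$

$$ b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_2 + |R(u)|} $$

The regularization parameters ($\lambda_1$, $\lambda_2$) are determined by validation on the Probe set. In this case: $\lambda_1$ = 25, $\lambda_2$ = 10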
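A minimal sketch of this estimate, assuming each bias is a shrunk average: the sum of residuals divided by (λ plus the number of ratings), with item biases computed first and user biases computed on what remains. The function name and the `(user, item, rating)` input layout are illustrative choices, not part of the slides:

```python
from collections import defaultdict

def estimate_biases(ratings, lambda1=25.0, lambda2=10.0):
    """ratings: list of (user, item, rating) triples from the training set."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # b_i: shrunk average of (r_ui - mu) over the users who rated item i
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_i = {i: item_sum[i] / (lambda1 + item_cnt[i]) for i in item_sum}

    # b_u: shrunk average of (r_ui - mu - b_i) over the items rated by user u
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_i[i]
        user_cnt[u] += 1
    b_u = {u: user_sum[u] / (lambda2 + user_cnt[u]) for u in user_sum}
    return mu, b_u, b_i
```

The λ terms pull the biases of rarely rated movies and rarely rating users towards zero, so a single extreme rating cannot produce an extreme bias.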

Estimate Parameter (bu, bi) – The Least Squares Problem

$$ b_{ui} = \mu + b_u + b_i $$

$$ \min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big) $$

• First term: fits the given ratings
• Second term: avoids overfitting by penalizing the magnitudes of the parameters
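One common way to solve a regularized least-squares problem of this kind is stochastic gradient descent. The sketch below is an assumption about how it could be done, not the presenters' procedure; the learning rate, penalty weight, and epoch count are illustrative, and applying the penalty at every rating only approximates the exact objective:

```python
def fit_biases_sgd(ratings, lam=0.02, lr=0.005, epochs=50):
    """ratings: list of (user, item, rating). lam, lr, epochs are illustrative."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            # Gradient step on the squared error plus the L2 penalty
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
    return mu, b_u, b_i

def objective(ratings, mu, b_u, b_i, lam):
    """Squared fitting error plus the penalty on the parameter magnitudes."""
    fit = sum((r - mu - b_u[u] - b_i[i]) ** 2 for u, i, r in ratings)
    penalty = lam * (sum(b * b for b in b_u.values())
                     + sum(b * b for b in b_i.values()))
    return fit + penalty
```

Comparing `objective` before and after fitting is a quick check that the updates move in the right direction.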

Time Change VS Baseline Predictors

• An item’s popularity may change over time
• Users change their baseline rating over time

Static: $$ b_{ui} = \mu + b_u + b_i $$

Time-dependent: $$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$

bi(tui)

• We do not expect movie likeability to fluctuate on a daily basis
• Time periods → Bins
• 30 bins

$$ b_i(t) = b_i + b_{i,\mathrm{Bin}(t)} $$

$$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$
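A sketch of the binned item bias, assuming 30 equal-width bins over the date range quoted on the dataset slide. The slides do not say how the bin boundaries are chosen, so the equal-width split and all names here are assumptions:

```python
from datetime import date

FIRST_DAY = date(1999, 12, 31)   # dataset start, from the dataset slide
LAST_DAY = date(2005, 12, 31)    # dataset end
N_BINS = 30

def time_bin(day, first=FIRST_DAY, last=LAST_DAY, n_bins=N_BINS):
    """Map a rating date to an equal-width bin index in 0..n_bins-1."""
    span = (last - first).days + 1
    offset = (day - first).days
    if not 0 <= offset < span:
        raise ValueError("date outside the dataset range")
    return offset * n_bins // span

def item_bias(day, b_i, b_i_bin):
    """b_i(t) = b_i + b_{i,Bin(t)}; b_i_bin is a list of n_bins offsets."""
    return b_i + b_i_bin[time_bin(day)]
```

With about six years of data, each of the 30 bins covers roughly ten weeks, which matches the idea that movie likeability drifts slowly.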

bu(tui)

• Unlike movies, user effects can change on a daily basis
• Time deviation

$$ \mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta} $$

$t_u$ → the mean date of rating by user u
t → the date on which user u rated the movie
β = 0.4, set by validation on the Probe set

bu(tui)

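• Well suited to gradual drifts

$$ b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) $$

$$ \mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta} $$

$$ b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) $$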
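The linear drift model can be sketched in a few lines, assuming dates are represented as day numbers and the time deviation is sign(t − t_u) · |t − t_u|^β. The function names are illustrative:

```python
import math

BETA = 0.4  # value quoted on the slide

def dev(t, t_mean, beta=BETA):
    """dev_u(t) = sign(t - t_u) * |t - t_u|**beta, with dates as day numbers."""
    diff = t - t_mean
    return math.copysign(abs(diff) ** beta, diff) if diff else 0.0

def user_bias_linear(t, t_mean, b_u, alpha_u):
    """b_u^(1)(t) = b_u + alpha_u * dev_u(t)."""
    return b_u + alpha_u * dev(t, t_mean)
```

Because β is below 1, the deviation grows slowly: a rating 32 days after the user's mean date has a deviation of 32^0.4 = 4, not 32.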

bu(tui)

• How about sudden drifts?
– We found that the multiple ratings a user gives in a single day tend to concentrate around a single value
• A user rates on 40 different days on average
• Thus, $b_{u,t}$ requires about 40 parameters per user

$$ b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t} $$

Baseline Predictors

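$$ b_{ui} = \mu + \underbrace{b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}}}_{B_u \text{ (user bias)}} + \underbrace{b_i + b_{i,\mathrm{Bin}(t_{ui})}}_{B_i \text{ (movie bias)}} $$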

Baseline Predictors

• Movie bias is not completely user-independent

$c_u(t)$ → time-dependent scaling feature
$c_u$ → stable part
$c_{u,t}$ → day-specific variable

$$ c_u(t) = c_u + c_{u,t} $$

$$ b_{ui} = \mu + \underbrace{b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}}}_{B_u \text{ (user bias)}} + \underbrace{(b_i + b_{i,\mathrm{Bin}(t_{ui})})}_{B_i \text{ (movie bias)}} \cdot c_u(t_{ui}) $$

RMSE = 0.9555

Frequencies (additional)

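• The number of ratings a user gave on a specific day is SIGNIFICANT

$F_{ui}$ → the overall number of ratings that user u gave on day $t_{ui}$
$b_{i,f}$ → the bias specific to item i at log-frequency f

$$ f_{ui} = \lfloor \log_a F_{ui} \rfloor $$

$$ b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\mathrm{Bin}(t_{ui})}) \cdot c_u(t_{ui}) + b_{i,f_{ui}} $$

RMSE: 0.9555 → 0.9278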
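A sketch of how the log-frequency could be computed, assuming it is the log (base a) of the user's rating count for that day, rounded down. The slides do not give the base a; the default below is only a placeholder for a tuned constant, and the input layout is an assumption:

```python
import math
from collections import Counter

def log_frequencies(ratings, a=6.76):
    """ratings: list of (user, item, day). Returns {(user, item): f_ui}.

    The log base `a` is an assumed tuning constant, not taken from the slides.
    """
    # F_ui: how many ratings the user gave on the day of this rating
    per_day = Counter((u, day) for u, _, day in ratings)
    return {(u, i): math.floor(math.log(per_day[(u, day)], a))
            for u, i, day in ratings}
```

Taking the log groups days into a handful of frequency levels, so each item needs only a few extra bias parameters rather than one per possible daily count.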

Why Frequencies Work?

• Works poorly when used with user-movie interaction terms
• No effect when used with user-related parameters
• Rating a lot in bulk → the ratings are not given close to the actual watching day
– Positive approach
– Negative approach
• High frequencies (bulk ratings) do not reflect much change in people’s taste, but mostly a biased selection of movies

Predicting Future Days

• The day-specific parameters should be set to default value

• cu(tui) = cu

• bu,t = 0

• The transient temporal model doesn’t attempt to capture future changes.

Latent Factor Models

• To transform both items and users to the same latent factor space
– Obvious dimensions
 • Comedy vs. drama
 • Amount of action
 • Orientation to children
– Less well defined dimensions
 • Depth of character development

• Tool → SVD

Singular Value Decomposition (SVD)

• Factoring matrices into a series of linear approximations that expose the underlying structure of the matrix

Predicted Score = User Baseline Rating * Movie Average Score

User     Ratings
Simha    4  4  4
Ateeq    5  5  5
Smith    3  3  3
Greg     4  4  4
Mcq      4  4  4
Ramin    4  4  4
Xiao     4  4  4
Wu       3  3  3
Riz      5  5  5

Ratings matrix = user baseline column (4, 5, 3, 4, 4, 4, 4, 3, 5) * movie row (1, 1, 1)

User     Actual ratings    Rank-1 approximation
Simha    4  4  5           3.95  4.64  4.34
Ateeq    4  5  5           4.27  5.02  4.69
Smith    3  3  2           2.42  2.85  2.66
Greg     4  5  4           3.97  4.67  4.36
Mcq      4  4  4           3.64  4.28  4.00
Ramin    3  5  4           3.69  4.33  4.05
Xiao     4  4  3           3.33  3.92  3.66
Wu       2  4  4           3.08  3.63  3.39
Riz      5  5  5           4.55  5.35  5.00

Rank-1 approximation = user factor column * movie factor row (0.91, 1.07, 1.00)

User     Approximation       Actual     Residual (actual – approximation)
Simha    3.95  4.64  4.34    4  4  5     0.05  -0.64   0.66
Ateeq    4.27  5.02  4.69    4  5  5    -0.28  -0.02   0.31
Smith    2.42  2.85  2.66    3  3  2     0.58   0.15  -0.66
Greg     3.97  4.67  4.36    4  5  4     0.03   0.33  -0.36
Mcq      3.64  4.28  4.00    4  4  4     0.36  -0.28   0.00
Ramin    3.69  4.33  4.05    3  5  4    -0.69   0.67  -0.05
Xiao     3.33  3.92  3.66    4  4  3     0.67   0.08  -0.66
Wu       3.08  3.63  3.39    2  4  4    -1.08   0.37   0.61
Riz      4.55  5.35  5.00    5  5  5     0.45  -0.35   0.00

Residual ≈ second user factor column * movie factor row (0.82, -0.20, -0.53)

User     Ratings    User factors
Simha    4  4  5    4.34  -0.18  -0.90
Ateeq    4  5  5    4.69  -0.38  -0.15
Smith    3  3  2    2.66   0.80   0.40
Greg     4  5  4    4.36   0.15   0.47
Mcq      4  4  4    4.00   0.35  -0.29
Ramin    3  5  4    4.05  -0.67   0.68
Xiao     4  4  3    3.66   0.89   0.33
Wu       2  4  4    3.39  -1.29   0.14
Riz      5  5  5    5.00   0.44  -0.36

Movie factors (one row per factor):
 0.91   1.07   1.00
 0.82  -0.20  -0.53
-0.21   0.76  -0.62

Ratings matrix = user factors (9 × 3) * movie factors (3 × 3)
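The successive approximations in this example can be reproduced in spirit with a standard SVD. The sketch below uses `numpy.linalg.svd` on the 9 × 3 ratings matrix; the factors it returns are scaled differently from the ones on the slides (which appear to be normalized so that the last movie factor is 1.00), so only the overall behavior should be expected to match:

```python
import numpy as np

# The 9 x 3 ratings matrix from the example (rows: users, columns: movies)
R = np.array([
    [4, 4, 5], [4, 5, 5], [3, 3, 2],
    [4, 5, 4], [4, 4, 4], [3, 5, 4],
    [4, 4, 3], [2, 4, 4], [5, 5, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

def rank_k(k):
    """Best rank-k approximation of R in the least-squares sense."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

for k in (1, 2, 3):
    err = np.sqrt(np.mean((R - rank_k(k)) ** 2))
    print(f"rank {k}: RMSE = {err:.4f}")
```

Each added factor lowers the reconstruction error, and with all three factors the matrix is recovered exactly, mirroring the step-by-step build-up above.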

Latent Factor Models

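$p_u$ → user-factors vector
$q_i$ → item-factors vector

$$ \hat{r}_{ui} = b_{ui} + p_u^T q_i $$

• Add implicit feedback
– Asymmetric-SVD

$$ \hat{r}_{ui} = b_{ui} + q_i^T \Big( |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) $$

– SVD++ (60 factors, RMSE = 0.8966)

$$ \hat{r}_{ui} = b_{ui} + q_i^T \Big( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) $$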
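A sketch of a single SVD++ prediction, assuming the rule is the baseline plus q_i dotted with (p_u plus the normalized sum of the implicit-feedback vectors y_j over N(u)). The function name and argument layout are illustrative; training the factors is not shown:

```python
import numpy as np

def predict_svdpp(b_ui, q_i, p_u, y, implicit_items):
    """r_hat = b_ui + q_i . (p_u + |N(u)|^-1/2 * sum_{j in N(u)} y_j).

    y maps an item id to its implicit-feedback factor vector;
    implicit_items is N(u), the items the user is known to have rated.
    """
    user_vec = np.asarray(p_u, dtype=float).copy()
    if implicit_items:
        y_sum = np.sum([y[j] for j in implicit_items], axis=0)
        user_vec += y_sum / np.sqrt(len(implicit_items))
    return b_ui + float(np.dot(q_i, user_vec))
```

The implicit term lets a user with few known rating values still get a meaningful factor vector, since it depends only on which items they rated.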

Temporal Effects

• Time
– Movie biases – movies go in and out of popularity over time → $b_i$
– User biases – users change their baseline ratings over time → $b_u$
– User preferences – genre, perception of actors and directors, household → $p_u$

$$ \hat{r}_{ui} = b_{ui}(t) + q_i^T \Big( p_u(t) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) $$

Temporal Effects

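• User preferences can be treated the same way as the user bias

User bias:

$$ b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) $$

$$ b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t} $$

User factors, for k = 1, 2, …, f:

$$ p_u(t)^T = (p_{u1}(t), p_{u2}(t), \ldots, p_{uf}(t)) $$

$$ p_{uk}^{(1)}(t) = p_{uk} + \alpha_{uk} \cdot \mathrm{dev}_u(t) $$

$$ p_{uk}^{(3)}(t) = p_{uk}^{(1)}(t) + p_{uk,t} $$

f = 500, RMSE = 0.8815
f = 500, RMSE = 0.8841 !!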

• Most accurate factor model (add frequencies)

$$ \hat{r}_{ui} = b_{ui}(t) + \big( q_i^T + q_{i,f_{ui}}^T \big) \Big( p_u(t) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) $$

f = 500, RMSE = 0.8784
f = 2000, RMSE = 0.8762

Neighborhood Models

• To compute the relationship between items
• Evaluate the score of a user for an item based on ratings of similar items by the same user

The Similarity Measure

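• The Pearson correlation coefficient, $\rho_{ij}$
• Shrunk by the amount of support, with $\lambda_2$ = 100:

$$ s_{ij} = \frac{n_{ij}}{n_{ij} + \lambda_2} \rho_{ij} $$

$s_{ij}$ – the similarity between items i and j
$n_{ij}$ – the number of users that rated both i and j

• A weighted average of the ratings of neighborhood items

$S^k(i;u)$ – the set of k items rated by u which are most similar to i

$$ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}} $$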
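A sketch of the similarity-weighted prediction, assuming the usual shrunk correlation s_ij = n_ij / (n_ij + λ2) · ρ_ij and a prediction equal to the baseline plus the similarity-weighted average of the neighbors' residuals. Names and the tuple layout are illustrative:

```python
def shrunk_similarity(rho_ij, n_ij, lambda2=100.0):
    """s_ij = n_ij / (n_ij + lambda2) * rho_ij."""
    return n_ij / (n_ij + lambda2) * rho_ij

def predict_knn(b_ui, neighbors):
    """neighbors: list of (s_ij, r_uj, b_uj) for the k items in S^k(i;u)."""
    denom = sum(s for s, _, _ in neighbors)
    if denom == 0:
        return b_ui   # no usable neighbors: fall back to the baseline
    num = sum(s * (r - b) for s, r, b in neighbors)
    return b_ui + num / denom
```

The shrinkage matters for item pairs with few common raters: with λ2 = 100, a correlation of 0.8 supported by 100 users is halved to 0.4, and one supported by only a handful of users is pushed close to zero.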

Problem With The Model

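Problems with the similarity-based model:

$$ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}} $$

• It isolates the relations between two items
• It relies fully on the neighbors, even if they are absent

A model with global weights:

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} $$

• The $w_{ij}$’s are not user-specific
• The sum runs over all items rated by u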

Improving The Model

• Consider not only how the user rated items, but also which items the user rated or did not rate
• $c_{ij}$ is expected to be high if j is predictive of i

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij} $$

Improving The Model

• The current model somewhat overemphasizes the dichotomy between heavy raters and those that rarely rate

$$ \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij} $$

• Moderate this behavior by normalization

$$ \hat{r}_{ui} = b_{ui} + |R(u)|^{-\alpha} \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + |N(u)|^{-\alpha} \sum_{j \in N(u)} c_{ij} $$

• α = 0 → non-normalized rule – encourages greater deviations
• α = 1 → fully normalized rule – eliminates the effect of the number of ratings
• In this case, α = 0.5

RMSE = 0.9002
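A sketch of the normalized prediction rule, assuming both sums are scaled by the set size raised to the power −α. The dictionary-based layout of the weights and all names are illustrative; learning w_ij and c_ij is not shown:

```python
def predict_neighborhood(b_ui, rated, implicit, w_i, c_i, alpha=0.5):
    """rated: {j: (r_uj, b_uj)} for j in R(u); implicit: the set N(u).

    w_i and c_i map a neighbor item j to w_ij and c_ij (missing means 0).
    """
    explicit = sum((r - b) * w_i.get(j, 0.0) for j, (r, b) in rated.items())
    implied = sum(c_i.get(j, 0.0) for j in implicit)
    pred = b_ui
    if rated:
        pred += len(rated) ** -alpha * explicit
    if implicit:
        pred += len(implicit) ** -alpha * implied
    return pred
```

With α = 0.5, a user with four rated items has each sum halved, while a user with a hundred has it divided by ten, which keeps heavy raters from drifting too far from the baseline.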

Improving The Model

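• Reduce the model by pruning parameters

$S^k(i)$ – the set of k items most similar to i

$$ R^k(i;u) \stackrel{\mathrm{def}}{=} R(u) \cap S^k(i) $$

$$ N^k(i;u) \stackrel{\mathrm{def}}{=} N(u) \cap S^k(i) $$

$$ \hat{r}_{ui} = b_{ui} + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij} $$

RMSE = 0.9002
k = 17,770 → RMSE = 0.8906
k = 2000 → RMSE = 0.9067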

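Integrated Models

• Baseline predictors + Factor models + Neighborhood models

$$ \hat{r}_{ui} = \mu + b_u^{(1)}(t) + b_i(t) + q_i^T \Big( p_u^{(1)}(t) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij} $$

f = 170, k = 300 → RMSE = 0.8827

• To further improve accuracy, we add a more elaborate temporal model for the user bias

$$ \hat{r}_{ui} = \mu + b_u^{(3)}(t) + b_i(t) + q_i^T \Big( p_u^{(1)}(t) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij} $$

f = 170, k = 300 → RMSE = 0.8786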

EXTRA: Shrinking Towards Recent Actions

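• To correct $\hat{r}_{ui}$, shrink it towards the average rating of u on day t
• The single-day effect is among the strongest temporal effects in the data

α = 8
β = 11
$n_{ut}$ – the number of ratings u gave on day t
$\bar{r}_{ut}$ – the mean rating of u on day t
$V_{ut}$ – the variance of u’s ratings on day t

[Equations: $\hat{r}_{ui}$ is corrected using a weight $c_{ut}$ and the mean $\bar{r}_{ut}$; $c_{ut}$ is defined from $n_{ut}$ and an exponential term in $V_{ut}$]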

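Shrinking Towards Recent Actions

• A stronger correction accounts for periods longer than a single day
• It also tries to characterize the recent user behavior on similar movies

[Equations: $\hat{r}_{ui}$ is corrected using a weight $c_{ui}$, defined from $n_{ui}$ and an exponential term in $V_{ui}$; the weights $w_{ij}^u$ are defined from $s_{ij}$ and an exponential term in the rating dates $t_{ui}$ and $t_{uj}$]

$$ n_{ui} = \sum_{j \text{ rated by } u} w_{ij}^u $$

[Equations: the mean and variance used here are weighted averages over the items j rated by u, with weights $w_{ij}^u$]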
