The BellKor 2008 Solution to the Netflix Prize
by Leenarat Leelapanyalert
Netflix Dataset
• Over 100 million movie ratings with date stamps (100,480,507 ratings)
• M = 17,770 movies
• N = 480,189 customers
• 1 star = no interest, 5 stars = strong interest
• Dec 31, 1999 – Dec 31, 2005
• The user-item matrix has N × M = 8,532,958,530 elements
• About 98.8% of the values are missing
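The size and sparsity of the user-item matrix follow directly from the counts above. A minimal sketch of that arithmetic (the variable names are only illustrative):

```python
# Counts quoted on the slide
num_ratings = 100_480_507
num_movies = 17_770      # M
num_users = 480_189      # N

# Every (user, movie) pair is one cell of the user-item matrix
matrix_cells = num_users * num_movies
missing_fraction = 1 - num_ratings / matrix_cells

print(f"cells in the user-item matrix: {matrix_cells:,}")
print(f"share of missing values: {missing_fraction:.2%}")
```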
Netflix Competition
• 4.2 million from 100 million ratings
– Training set (Probe set)
– Qualifying set (Quiz set & Test set)
• Scoring
– Show the RMSE achieved on the Quiz set
– Best RMSE on the Test set → THE WINNER!!
Outline
• Necessary index letters
• Baseline predictors: adjust for the deviations of each user (rater, customer) and item (movie)
– with temporal effects
• Latent factor models: compare items and users by SVD
– with temporal effects
• Neighborhood models: compute the relationship between items (or users)
– with temporal effects
• Integrated models: combine latent factor models and neighborhood models together
• Extra: other methods (new ideas)
– Shrinking towards recent actions
– Blending multiple solutions
Index Letters
• $u, v$ → users, raters, or customers
• $i, j$ → movies, or items
• $r_{ui}$ → the score given by user u to movie i
• $\hat{r}_{ui}$ → the predicted value of $r_{ui}$
• $t_{ui}$ → the time of rating $r_{ui}$
• $K$ → the training set, i.e. the pairs (u, i) for which $r_{ui}$ is known
• $R(u)$ → all the items rated by user u
• $R(i)$ → the set of users who rated item i
• $N(u)$ → all items that can help estimate u's score (items for which u provided implicit feedback)
Baseline Predictors (bui)
• $\mu$ → the overall average rating
• $b_u$ → deviation of user u
• $b_i$ → deviation of item i

$$b_{ui} = \mu + b_u + b_i$$

Example: $\mu$ = 3.7, Simha ($b_u$) = -0.3, Titanic ($b_i$) = 0.5, so $b_{ui}$ = 3.7 – 0.3 + 0.5 = 3.9 stars
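The example on this slide can be checked in a couple of lines. This is only a sketch; the function name `baseline` is an illustrative choice, not something from the slides.

```python
def baseline(mu: float, b_u: float, b_i: float) -> float:
    """Static baseline: overall mean plus the user and item deviations."""
    return mu + b_u + b_i

# Numbers from the slide: Simha rates 0.3 below average, Titanic scores 0.5 above.
simha_titanic = baseline(mu=3.7, b_u=-0.3, b_i=0.5)
print(round(simha_titanic, 1))
```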
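Estimating the Parameters (bu, bi) – Formula

$$b_{ui} = \mu + b_u + b_i$$

$$b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_1 + |R(i)|}$$

$$b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_2 + |R(u)|}$$

The regularization parameters ($\lambda_1$, $\lambda_2$) are determined by validation on the Probe set. In this case: $\lambda_1$ = 25, $\lambda_2$ = 10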
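The two shrunk averages can be computed in one pass each. The sketch below assumes the ratings arrive as `(user, item, rating)` triples; the function name and the toy data are hypothetical, while the defaults 25 and 10 are the values quoted on the slide.

```python
from collections import defaultdict

def shrunk_biases(ratings, lam1=25.0, lam2=10.0):
    """ratings: list of (user, item, rating). Returns (mu, b_u, b_i)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    by_item = defaultdict(list)
    by_user = defaultdict(list)
    for u, i, r in ratings:
        by_item[i].append((u, r))
        by_user[u].append((i, r))

    # b_i = sum over u in R(i) of (r_ui - mu), divided by (lam1 + |R(i)|)
    b_i = {i: sum(r - mu for _, r in rs) / (lam1 + len(rs))
           for i, rs in by_item.items()}
    # b_u = sum over i in R(u) of (r_ui - mu - b_i), divided by (lam2 + |R(u)|)
    b_u = {u: sum(r - mu - b_i[i] for i, r in rs) / (lam2 + len(rs))
           for u, rs in by_user.items()}
    return mu, b_u, b_i

# Toy data, made up for illustration only
toy = [("simha", "A", 4), ("simha", "C", 5), ("wu", "A", 2), ("wu", "B", 4)]
mu, b_u, b_i = shrunk_biases(toy)
```

With so few ratings per item, the regularization constants dominate the denominators, so the estimated deviations stay close to zero.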
Estimating the Parameters (bu, bi) – The Least Squares Problem

$$b_{ui} = \mu + b_u + b_i$$

$$\min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda_1 \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big)$$

• The first term fits the given ratings
• The second term avoids overfitting by penalizing the magnitudes of the parameters
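The slides do not say how this problem is solved. One common option is stochastic gradient descent, sketched below under that assumption; the learning rate, the regularization weight, the epoch count, and the toy data are all hypothetical, and the penalty is applied once per rating, which is a stochastic approximation of the objective above.

```python
def fit_biases_sgd(ratings, lam=0.02, lr=0.01, epochs=200):
    """Approximately minimise the regularized squared error over b_u and b_i."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i])
            # Move along the error, shrink towards zero (per-rating penalty)
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
    return mu, b_u, b_i

def objective(ratings, mu, b_u, b_i, lam):
    """Squared fitting error plus the penalty on the parameter magnitudes."""
    fit = sum((r - mu - b_u[u] - b_i[i]) ** 2 for u, i, r in ratings)
    penalty = lam * (sum(v * v for v in b_u.values())
                     + sum(v * v for v in b_i.values()))
    return fit + penalty

# Toy data, made up for illustration only
toy = [("simha", "A", 4), ("simha", "C", 5), ("wu", "A", 2), ("wu", "B", 4)]
mu, b_u, b_i = fit_biases_sgd(toy)
```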
Time Change VS Baseline Predictors
• An item's popularity may change over time
• Users change their baseline rating over time

Static baseline:
$$b_{ui} = \mu + b_u + b_i$$

Time-aware baseline:
$$b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})$$
bi(tui)
• We do not expect movie likeability to fluctuate on a daily basis
• Time periods → Bins
• 30 bins

$$b_i(t) = b_i + b_{i,\mathrm{Bin}(t)}$$

$$b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})$$
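A binned item bias needs a mapping from a rating date to a bin index. The sketch below assumes 30 equal-width bins over the date range of the dataset; the slides give only the number of bins, so the equal-width layout and both function names are assumptions.

```python
from datetime import date

FIRST_DAY = date(1999, 12, 31)   # start of the dataset, per the slides
LAST_DAY = date(2005, 12, 31)    # end of the dataset, per the slides
NUM_BINS = 30

def time_bin(day: date) -> int:
    """Map a rating date to one of NUM_BINS equal-width bins (assumed layout)."""
    span = (LAST_DAY - FIRST_DAY).days + 1
    return (day - FIRST_DAY).days * NUM_BINS // span

def item_bias(b_i: float, b_i_bin: dict, day: date) -> float:
    """b_i(t) = b_i + b_{i,Bin(t)}; bins without a learned offset contribute 0."""
    return b_i + b_i_bin.get(time_bin(day), 0.0)
```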
bu(tui)
• Unlike movies, user effects can change on a daily basis
• Time deviation
• $t_u$ → the mean date of rating by user u
• $t$ → the date that user u rated the movie
• $\beta$ = 0.4 by validation on the Probe set

$$\mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta}$$
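The time deviation is a signed, damped distance from the user's mean rating date. A minimal sketch, assuming dates are expressed as day numbers (the function signature is illustrative):

```python
def dev_u(t: float, t_u: float, beta: float = 0.4) -> float:
    """sign(t - t_u) * |t - t_u| ** beta, with t and t_u given in days."""
    diff = t - t_u
    sign = (diff > 0) - (diff < 0)
    return sign * abs(diff) ** beta
```

Because the exponent is below 1, a rating 100 days from the mean date deviates by only about 6.3, so distant dates are damped rather than growing linearly.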
bu(tui)
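• Suits gradual drifts well

$$b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t)$$

$$\mathrm{dev}_u(t) = \mathrm{sign}(t - t_u) \cdot |t - t_u|^{\beta}$$

$$b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})$$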
bu(tui)
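• How about sudden drifts?
– We found that multiple ratings that a user gives in a single day tend to concentrate around a single value
• A user rates on 40 different days on average
• Thus, the day-specific parameter $b_{u,t}$ requires about 40 parameters per user

$$b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t}$$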
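The day-specific term can be stored as a sparse lookup with one entry per day on which the user rated. The sketch below assumes day numbers as keys and falls back to 0 for days without a learned value; the function name and argument layout are hypothetical.

```python
def user_bias(b_u, alpha_u, t_u, b_ut, t, beta=0.4):
    """b_u^(3)(t) = b_u + alpha_u * dev_u(t) + b_{u,t}.

    b_ut maps a day number to its day-specific offset; unseen days use 0.
    """
    diff = t - t_u
    sign = (diff > 0) - (diff < 0)
    dev = sign * abs(diff) ** beta
    return b_u + alpha_u * dev + b_ut.get(t, 0.0)

# Hypothetical user: mean rating day 100, one day-specific offset on day 132
seen_day = user_bias(0.2, 0.01, 100, {132: 0.3}, 132)
unseen_day = user_bias(0.2, 0.01, 100, {132: 0.3}, 133)
```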
Baseline Predictors

$$b_{ui} = \mu + \underbrace{b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}}}_{\text{user bias}} + \underbrace{b_i + b_{i,\mathrm{Bin}(t_{ui})}}_{\text{movie bias}}$$
Baseline Predictors
• Movie bias is not completely user-independent
• $c_u(t)$ → time-dependent scaling feature
– $c_u$ → stable part
– $c_{u,t}$ → day-specific variable

$$c_u(t) = c_u + c_{u,t}$$

$$b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + \big( b_i + b_{i,\mathrm{Bin}(t_{ui})} \big) \cdot c_u(t_{ui})$$

RMSE = 0.9555
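The scaled baseline multiplies only the movie part by the user's scaling factor. A small sketch with made-up parameter values; the argument names are illustrative, and `user_part` stands for the whole user bias term.

```python
def baseline_with_scaling(mu, user_part, b_i, b_i_bin, c_u, c_ut):
    """b_ui = mu + user bias + (b_i + b_{i,Bin(t)}) * (c_u + c_{u,t})."""
    return mu + user_part + (b_i + b_i_bin) * (c_u + c_ut)

# With c_u = 1 and c_{u,t} = 0 the movie bias passes through unchanged
neutral = baseline_with_scaling(3.7, -0.3, 0.4, 0.1, c_u=1.0, c_ut=0.0)
# A day on which this (hypothetical) user reacts more strongly to movie quality
amplified = baseline_with_scaling(3.7, -0.3, 0.4, 0.1, c_u=1.0, c_ut=0.2)
```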
Frequencies (additional)
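• The number of ratings a user gave on a specific day is SIGNIFICANT
• $F_{ui}$ → the overall number of ratings that user u gave on day $t_{ui}$
• $b_{i,f}$ → the bias specific for the item i at log-frequency f
• RMSE 0.9555 → 0.9278

$$f_{ui} = \lfloor \log_a F_{ui} \rfloor$$

$$b_{ui} = \mu + b_u + \alpha_u \cdot \mathrm{dev}_u(t_{ui}) + b_{u,t_{ui}} + \big( b_i + b_{i,\mathrm{Bin}(t_{ui})} \big) \cdot c_u(t_{ui}) + b_{i,f_{ui}}$$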
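The log-frequency groups daily rating counts into a handful of buckets. The sketch below computes the floored logarithm by repeated multiplication, which avoids floating-point surprises at exact powers; the base `a = 2.0` is only a placeholder, since the slides do not give its value.

```python
def log_frequency(F_ui: int, a: float = 2.0) -> int:
    """floor(log_a(F_ui)) for F_ui >= 1, computed without calling a log function."""
    f = 0
    threshold = a
    while F_ui >= threshold:
        f += 1
        threshold *= a
    return f

# Daily rating counts and their buckets under the placeholder base
buckets = {count: log_frequency(count) for count in (1, 2, 7, 8, 50)}
```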
Why Frequencies Work?
• Bad when used with user-movie interaction terms
• No effect when used with user-related parameters
• Rating a lot in bulk → the rating day is not close to the actual watching day
– Positive approach
– Negative approach
• High frequencies (or bulk ratings) do not represent much change in people's taste, but mostly a biased selection of movies
Predicting Future Days
• The day-specific parameters should be set to their default values
• $c_u(t_{ui}) = c_u$
• $b_{u,t} = 0$
• The transient temporal model doesn't attempt to capture future changes.
Latent Factor Models
• Transform both items and users to the same latent factor space
– Obvious dimensions: comedy vs. drama, amount of action, orientation to children
– Less well defined dimensions: depth of character development
• Tool → SVD
Singular Value Decomposition (SVD)
• Factoring matrices into a series of linear approximations that expose the underlying structure of the matrix
Singular Value Decomposition (SVD)
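User    A  B  C
Simha   4  4  4
Ateeq   5  5  5
Smith   3  3  3
Greg    4  4  4
Mcq     4  4  4
Ramin   4  4  4
Xiao    4  4  4
Wu      3  3  3
Riz     5  5  5
Predicted Score = User Baseline Rating * Movie Average Score
User baseline ratings (column vector): Simha 4, Ateeq 5, Smith 3, Greg 4, Mcq 4, Ramin 4, Xiao 4, Wu 3, Riz 5
Movie average scores (row vector): A 1, B 1, C 1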
Singular Value Decomposition (SVD)
A B C
Simha 4 4 5
Ateeq 4 5 5
Smith 3 3 2
Greg 4 5 4
Mcq 4 4 4
Ramin 3 5 4
Xiao 4 4 3
Wu 2 4 4
Riz 5 5 5
Predicted Score = User Baseline Rating * Movie Average Score
Singular Value Decomposition (SVD)
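Rank-1 approximation of the ratings:
User    A     B     C
Simha   3.95  4.64  4.34
Ateeq   4.27  5.02  4.69
Smith   2.42  2.85  2.66
Greg    3.97  4.67  4.36
Mcq     3.64  4.28  4.00
Ramin   3.69  4.33  4.05
Xiao    3.33  3.92  3.66
Wu      3.08  3.63  3.39
Riz     4.55  5.35  5.00
Predicted Score = User Baseline Rating * Movie Average Score
User baseline ratings (column vector): Simha 4.34, Ateeq 4.69, Smith 2.66, Greg 4.36, Mcq 4.00, Ramin 4.05, Xiao 3.66, Wu 3.39, Riz 5.00
Movie average scores (row vector): A 0.91, B 1.07, C 1.00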
Singular Value Decomposition (SVD)
Residual = actual ratings – rank-1 approximation, taken element by element over the two tables above
Singular Value Decomposition (SVD)
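Residual (actual rating minus rank-1 approximation):
User    A      B      C
Simha   0.05  -0.64   0.66
Ateeq  -0.28  -0.02   0.31
Smith   0.58   0.15  -0.66
Greg    0.03   0.33  -0.36
Mcq     0.36  -0.28   0.00
Ramin  -0.69   0.67  -0.05
Xiao    0.67   0.08  -0.66
Wu     -1.08   0.37   0.61
Riz     0.45  -0.35   0.00
Residual ≈ second user factor * second movie factor
Second user factor (column vector): Simha -0.18, Ateeq -0.38, Smith 0.80, Greg 0.15, Mcq 0.35, Ramin -0.67, Xiao 0.89, Wu -1.29, Riz 0.44
Second movie factor (row vector): A 0.82, B -0.20, C -0.53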
Singular Value Decomposition (SVD)
User    A  B  C
Simha   4  4  5
Ateeq   4  5  5
Smith   3  3  2
Greg    4  5  4
Mcq     4  4  4
Ramin   3  5  4
Xiao    4  4  3
Wu      2  4  4
Riz     5  5  5
Ratings = user factors * movie factors, using three factors:
User    Factor 1  Factor 2  Factor 3
Simha   4.34      -0.18     -0.90
Ateeq   4.69      -0.38     -0.15
Smith   2.66       0.80      0.40
Greg    4.36       0.15      0.47
Mcq     4.00       0.35     -0.29
Ramin   4.05      -0.67      0.68
Xiao    3.66       0.89      0.33
Wu      3.39      -1.29      0.14
Riz     5.00       0.44     -0.36
Movie     A      B      C
Factor 1  0.91   1.07   1.00
Factor 2  0.82  -0.20  -0.53
Factor 3 -0.21   0.76  -0.62
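The factor tables above can be multiplied back together to see how each extra factor refines the prediction. The sketch below copies the numbers from the slides and does the multiplication in plain Python; it reconstructs the ratings from given factors and does not compute an SVD itself. The function name `predict` is illustrative.

```python
users = ["Simha", "Ateeq", "Smith", "Greg", "Mcq", "Ramin", "Xiao", "Wu", "Riz"]

# One row per user: (factor 1, factor 2, factor 3), copied from the slides
user_factors = [
    (4.34, -0.18, -0.90), (4.69, -0.38, -0.15), (2.66, 0.80, 0.40),
    (4.36, 0.15, 0.47), (4.00, 0.35, -0.29), (4.05, -0.67, 0.68),
    (3.66, 0.89, 0.33), (3.39, -1.29, 0.14), (5.00, 0.44, -0.36),
]
# One row per factor: values for movies A, B, C
movie_factors = [
    (0.91, 1.07, 1.00),
    (0.82, -0.20, -0.53),
    (-0.21, 0.76, -0.62),
]
# Actual ratings for movies A, B, C
ratings = [
    [4, 4, 5], [4, 5, 5], [3, 3, 2], [4, 5, 4], [4, 4, 4],
    [3, 5, 4], [4, 4, 3], [2, 4, 4], [5, 5, 5],
]

def predict(u: int, m: int, rank: int) -> float:
    """Sum of the first `rank` user-factor * movie-factor products."""
    return sum(user_factors[u][k] * movie_factors[k][m] for k in range(rank))

rank1 = [[round(predict(u, m, 1), 2) for m in range(3)] for u in range(len(users))]
rank3 = [[round(predict(u, m, 3)) for m in range(3)] for u in range(len(users))]
```

With one factor the result matches the rank-1 table; with all three factors every prediction rounds to the actual rating.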
Latent Factor Models
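• $p_u$ → user-factors vector
• $q_i$ → item-factors vector

$$\hat{r}_{ui} = b_{ui} + p_u^T q_i$$

• Add implicit feedback
– Asymmetric-SVD

$$\hat{r}_{ui} = b_{ui} + q_i^T \Big( |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big)$$

– SVD++ (60 factors, RMSE = 0.8966)

$$\hat{r}_{ui} = b_{ui} + q_i^T \Big( p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big)$$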
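The SVD++ rule adds a normalized sum of implicit-feedback vectors to the user vector before taking the dot product with the item vector. A minimal prediction-only sketch; all parameter values are made up, and the function name and data layout are assumptions.

```python
import math

def svdpp_predict(b_ui, q_i, p_u, y, implicit_items):
    """r_hat = b_ui + q_i . (p_u + |N(u)|^(-1/2) * sum of y_j for j in N(u))."""
    f = len(q_i)
    norm = 1.0 / math.sqrt(len(implicit_items)) if implicit_items else 0.0
    user_vec = [p_u[k] + norm * sum(y[j][k] for j in implicit_items)
                for k in range(f)]
    return b_ui + sum(q_i[k] * user_vec[k] for k in range(f))

# Hypothetical two-factor model
y = {"A": [0.1, 0.0], "B": [0.3, 0.2], "C": [0.0, 0.2], "D": [0.0, 0.0]}
with_feedback = svdpp_predict(3.9, [1.0, 0.5], [0.2, -0.4], y, ["A", "B", "C", "D"])
without_feedback = svdpp_predict(3.9, [1.0, 0.5], [0.2, -0.4], y, [])
```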
Temporal Effects
• Time
– Movie biases ($b_i$) – go in and out of popularity over time
– User biases ($b_u$) – users change their baseline ratings over time
– User preferences ($p_u$) – genre, perception of actors and directors, household

$$\hat{r}_{ui} = b_{ui}(t_{ui}) + q_i^T \Big( p_u(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big)$$
Temporal Effects
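• We can treat the user preferences the same way we treat the user bias

$$p_u(t)^T = \big( p_{u1}(t), p_{u2}(t), \ldots, p_{uf}(t) \big)$$

User bias:
$$b_u^{(1)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t)$$
$$b_u^{(3)}(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t) + b_{u,t}$$

User preferences, for k = 1, 2, …, f:
$$p_{uk}^{(1)}(t) = p_{uk} + \alpha_{uk} \cdot \mathrm{dev}_u(t)$$
$$p_{uk}^{(3)}(t) = p_{uk}^{(1)}(t) + p_{uk,t}$$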
RMSE
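$$\hat{r}_{ui} = b_{ui}(t_{ui}) + q_i^T \Big( p_u(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big)$$

• With $p_{uk}^{(1)}(t) = p_{uk} + \alpha_{uk} \cdot \mathrm{dev}_u(t)$: f = 500, RMSE = 0.8815
• With $p_{uk}^{(3)}(t) = p_{uk}^{(1)}(t) + p_{uk,t}$: f = 500, RMSE = 0.8841 !!
• Most accurate factor model (add frequencies): f = 500, RMSE = 0.8784; f = 2000, RMSE = 0.8762

$$\hat{r}_{ui} = b_{ui}(t_{ui}) + \big( q_i^T + q_{i,f_{ui}}^T \big) \Big( p_u(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big)$$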
Neighborhood Models
• Compute the relationship between items
• Evaluate the score of a user for an item based on ratings of similar items by the same user
The Similarity Measure
• The Pearson correlation coefficient, $\rho_{ij}$
• $s_{ij}$ – similarity
• $n_{ij}$ – the number of users that rated both i and j
• $\lambda_2$ = 100

$$s_{ij} \overset{\mathrm{def}}{=} \frac{n_{ij}}{n_{ij} + \lambda_2} \rho_{ij}$$

• A weighted average of the ratings of neighborhood items
• $S^k(i;u)$ – the set of k items rated by u, which are most similar to i

$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}}$$
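The shrunk similarity and the weighted-average prediction can be sketched as two small functions. The neighbor data is made up, the function names are hypothetical, and the sketch assumes positive similarities so that the denominator is never zero for a non-empty neighborhood.

```python
def shrunk_similarity(rho_ij, n_ij, lam2=100.0):
    """s_ij = n_ij / (n_ij + lam2) * rho_ij: low support shrinks the correlation."""
    return n_ij / (n_ij + lam2) * rho_ij

def knn_predict(b_ui, neighbours, k):
    """neighbours: list of (s_ij, r_uj, b_uj) for the items j that user u rated."""
    top_k = sorted(neighbours, key=lambda n: n[0], reverse=True)[:k]
    total = sum(s for s, _, _ in top_k)
    if total == 0:
        return b_ui  # no usable neighbours: fall back to the baseline
    return b_ui + sum(s * (r - b) for s, r, b in top_k) / total

s = shrunk_similarity(rho_ij=0.8, n_ij=100)
pred = knn_predict(3.5, [(0.4, 5, 4.0), (0.1, 2, 3.0), (0.05, 1, 3.0)], k=2)
```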
Problem With The Model
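Problems with the similarity-based rule:

$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}}$$

• It isolates the relations between 2 items
• It fully relies on the neighbors, even when useful neighbors are absent

New rule with learned weights:

$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij}$$

• The $w_{ij}$'s are not user specific
• Sum over all items rated by u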
Improving The Model
• Consider not only what the user rated, but also what he did not rate
• $c_{ij}$ is expected to be high if j is predictive of i

$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij}$$
Improving The Model
• The current model somewhat overemphasizes the dichotomy between heavy raters and those that rarely rate
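• Moderate this behavior by normalization, raising $|R(u)|$ and $|N(u)|$ to the power $-\alpha$
– $\alpha$ = 0 → non-normalized rule, which encourages greater deviations
– $\alpha$ = 1 → fully normalized rule, which eliminates the effect of the number of ratings
– In this case, $\alpha$ = 0.5

RMSE = 0.9002

Before normalization:
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + \sum_{j \in N(u)} c_{ij}$$

After normalization:
$$\hat{r}_{ui} = b_{ui} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj}) w_{ij} + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} c_{ij}$$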
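The normalized rule is a sum of two scaled terms on top of the baseline. The sketch below evaluates it for given weights; learning $w_{ij}$ and $c_{ij}$ is not shown, and the function name, data layout, and all numbers are hypothetical.

```python
def neighborhood_predict(b_ui, rated, implicit, w_i, c_i, alpha=0.5):
    """rated: {j: (r_uj, b_uj)} for j in R(u); implicit: list of items in N(u);
    w_i and c_i hold the weights w_ij and c_ij for the target item i."""
    explicit_sum = sum((r - b) * w_i.get(j, 0.0) for j, (r, b) in rated.items())
    implicit_sum = sum(c_i.get(j, 0.0) for j in implicit)
    pred = b_ui
    if rated:
        pred += len(rated) ** -alpha * explicit_sum
    if implicit:
        pred += len(implicit) ** -alpha * implicit_sum
    return pred

rated = {"A": (5, 4.0), "B": (2, 3.0), "C": (4, 4.0), "D": (3, 3.0)}
implicit = ["A", "B", "C", "D"]
w_i = {"A": 0.2, "B": 0.1}
c_i = {"A": 0.04, "C": 0.02}

normalized = neighborhood_predict(3.5, rated, implicit, w_i, c_i, alpha=0.5)
non_normalized = neighborhood_predict(3.5, rated, implicit, w_i, c_i, alpha=0)
```

With four rated items the normalized rule halves both sums, so the prediction deviates less from the baseline than under the non-normalized rule.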
Improving The Model
RMSE = 0.9002
• Reduce the model by pruning parameters
• $S^k(i)$ – the set of k items most similar to i

$$R^k(i;u) \overset{\mathrm{def}}{=} R(u) \cap S^k(i)$$

$$N^k(i;u) \overset{\mathrm{def}}{=} N(u) \cap S^k(i)$$

$$\hat{r}_{ui} = b_{ui} + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij}$$

• k = 17,770 → RMSE = 0.8906
• k = 2000 → RMSE = 0.9067
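Pruning keeps only the neighbors that are among the k items most similar to the target item, which is a set intersection. A small sketch; the similarity values and the function name are made up for illustration.

```python
def pruned_sets(rated_by_u, implicit_by_u, similarity_to_i, k):
    """Return (R^k(i;u), N^k(i;u)) from the similarities s_ij of items j to item i."""
    ranked = sorted(similarity_to_i, key=similarity_to_i.get, reverse=True)
    s_k = set(ranked[:k])  # S^k(i): the k items most similar to i
    return set(rated_by_u) & s_k, set(implicit_by_u) & s_k

similarity_to_i = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}
r_k, n_k = pruned_sets({"A", "C"}, {"A", "B", "C"}, similarity_to_i, k=2)
```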
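Integrated Models
• Baseline predictors + Factor models + Neighborhood models

$$\hat{r}_{ui} = \mu + b_i(t_{ui}) + b_u^{(1)}(t_{ui}) + q_i^T \Big( p_u^{(1)}(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij}$$

f = 170, k = 300 → RMSE = 0.8827

• To further improve accuracy, we add a more elaborate temporal model for the user bias

$$\hat{r}_{ui} = \mu + b_i(t_{ui}) + b_u^{(3)}(t_{ui}) + q_i^T \Big( p_u^{(1)}(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \Big) + |R^k(i;u)|^{-\frac{1}{2}} \sum_{j \in R^k(i;u)} (r_{uj} - b_{uj}) w_{ij} + |N^k(i;u)|^{-\frac{1}{2}} \sum_{j \in N^k(i;u)} c_{ij}$$

f = 170, k = 300 → RMSE = 0.8786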
EXTRA: Shrinking Towards Recent Actions
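• To correct $\hat{r}_{ui}$, shrink it towards the average rating of u on day t
• The single day effect is among the strongest temporal effects in the data

$$\hat{r}_{ui} \leftarrow \frac{\hat{r}_{ui} + c_{ut} \cdot r_{ut}}{1 + c_{ut}}$$

• $n_{ut}$ – the number of ratings u gave on day t
• $r_{ut}$ – the mean rating of u on day t
• $V_{ut}$ – the variance of u's ratings on day t
• $c_{ut}$ – the shrinkage coefficient, computed from $n_{ut}$ and an exponential in $V_{ut}$, with the constants $\alpha$ = 8 and $\beta$ = 11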
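Shrinking Towards Recent Actions
• A stronger correction accounts for periods longer than a single day
• And tries to characterize the recent user behavior on similar movies

$$\hat{r}_{ui} \leftarrow \frac{\hat{r}_{ui} + c_{ui} \cdot r_{ui}}{1 + c_{ui}}$$

• $w_{ij}^u$ – a weight built from the similarity $s_{ij}$ and an exponential in the time gap between $t_{ui}$ and $t_{uj}$
• $c_{ui}$ – the shrinkage coefficient, computed from $n_{ui}$ and an exponential in $V_{ui}$
• Here $r_{ui}$ denotes the weighted mean rating defined below, not the observed rating

$$n_{ui} = \sum_{j \text{ rated by } u} w_{ij}^u$$

$$r_{ui} = \frac{\sum_{j \text{ rated by } u} w_{ij}^u \, r_{uj}}{\sum_{j \text{ rated by } u} w_{ij}^u}$$

$$V_{ui} = \frac{\sum_{j \text{ rated by } u} w_{ij}^u \, (r_{uj})^2}{\sum_{j \text{ rated by } u} w_{ij}^u} - (r_{ui})^2$$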
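The weighted statistics and the shrinkage step can be sketched as follows. The weights and the shrinkage coefficient `c` are taken as given inputs here, because how they are computed is not fully recoverable from this transcript; the function names and the numbers are hypothetical.

```python
def weighted_stats(weighted_ratings):
    """weighted_ratings: list of (w, r_uj) pairs with positive total weight.

    Returns (total weight, weighted mean, weighted variance).
    """
    n = sum(w for w, _ in weighted_ratings)
    mean = sum(w * r for w, r in weighted_ratings) / n
    var = sum(w * r * r for w, r in weighted_ratings) / n - mean ** 2
    return n, mean, var

def shrink(pred, target, c):
    """Pull a prediction towards target; c = 0 leaves it unchanged."""
    return (pred + c * target) / (1 + c)

n_ui, mean_ui, var_ui = weighted_stats([(1.0, 4), (1.0, 2)])
corrected = shrink(pred=3.0, target=5.0, c=1.0)
```

A larger coefficient moves the prediction further towards the user's recent mean, and a coefficient of 0 keeps the original prediction.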
Q & A