Recommender systems and the Netflix prize
Charles Elkan
January 14, 2011
Solving the World's Problems Creatively
Recommender systems
We Know What You Ought To Be Watching This Summer
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules
1. Goal: Improve the Netflix recommendation algorithm, Cinematch
2. Criterion: Reduction in error (RMSE)
3. Oct '06: Contest start
4. Oct '07: $50K progress prize for 8.43% improvement
5. Oct '08: $50K progress prize for 9.44% improvement
6. Sept '09: $1 million grand prize for 10.06% improvement
Training data
score  movie  user
1      21     1
5      213    1
4      345    2
4      123    2
3      768    2
5      76     3
4      45     4
1      568    5
2      342    5
2      234    5
5      76     6
4      56     6

Test data
score  movie  user
?      62     1
?      96     1
?      7      2
?      3      2
?      47     3
?      15     3
?      41     4
?      28     4
?      93     5
?      74     5
?      69     6
?      83     6
Movie rating data
• Training data
  – 100 million ratings
  – 480,000 users
  – 17,770 movies
  – 6 years of data: 2000-2005
• Test data
  – Last few ratings from each user (2.8 million)
• Dates of ratings are given
What is RMSE?
• RMSE stands for “root mean squared error.”
• Let p be the prediction for user u and movie m; let r be the true rating.
• RMSE = √( (1/n) ∑ (p − r)² ), where the sum runs over all n (user, movie) pairs in the test set (a small computation sketch follows this list).
• RMSE measures the average mistake, with higher penalty for big mistakes, i.e. large values of (p-r)².
• You can’t have a contest without a precise goal!
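As a minimal illustration (not from the talk), RMSE is only a few lines of Python; the ratings below are invented.

import numpy as np

def rmse(predictions, ratings):
    """Root mean squared error between predicted and true ratings."""
    predictions = np.asarray(predictions, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    return float(np.sqrt(np.mean((predictions - ratings) ** 2)))

# Toy example: predict 3.6 stars (roughly the global average) for every movie.
print(rmse([3.6, 3.6, 3.6, 3.6], [5, 3, 4, 1]))  # about 1.52 on this tiny sample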
#ratings per user
1. Avg #ratings/user: 208
Most Active Users
User ID   # Ratings  Mean Rating
305344    17,651     1.90
387418    17,432     1.81
2439493   16,560     1.22
1664010   15,811     4.26
2118461   14,829     4.08
1461435   9,820      1.37
1639792   9,764      1.33
1314869   9,739      2.95
The dataset contains 17,770 movies!
#ratings per movie
1. Avg #ratings/movie: 5,627
Movies Rated Most Often
Title                      # Ratings  Mean Rating
Miss Congeniality          227,715    3.36
Independence Day           216,233    3.72
The Patriot                200,490    3.78
The Day After Tomorrow     194,695    3.44
Pretty Woman               190,320    3.90
Pirates of the Caribbean   188,849    4.15
The Green Mile             180,883    4.31
Forrest Gump               180,736    4.30
Important RMSE levels
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Cinematch system: 0.9514
Prize '07 (BellKor): 0.8712
Prize '08 (BellKor + BigChaos): 0.8616
Grand Prize (BellKor's Pragmatic Chaos): 0.8554
Inherent noise: ????

[Figure: these levels shown on a scale from "erroneous" to "accurate"; the arrow along the scale is labeled "Personalization".]
Major Challenges
1. Size of data
   – Need memory management and efficiency of algorithms (a minimal sparse-storage sketch follows this list)
2. Training and test data are different
   – Test ratings are later in time
3. 99% of data are missing
   – Eliminates many standard methods
4. Countless factors affect ratings:
   – Genre, movie vs. TV vs. other
   – Style of action, dialogue, plot, music
   – Director, actors
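A toy sketch (not from the talk) of why sparse storage matters: keep only the (user, movie, rating) triples that exist. The matrix dimensions are the dataset sizes quoted above; the triples themselves are invented.

import numpy as np
from scipy.sparse import csr_matrix

# Invented (user, movie, rating) triples in the style of the training table above.
users   = np.array([0, 0, 1, 1, 2])
movies  = np.array([20, 212, 344, 122, 75])
ratings = np.array([1, 5, 4, 4, 5], dtype=np.float32)

# A dense 480,000 x 17,770 float32 matrix would need roughly 34 GB;
# storing only the ~100 million known ratings is about 1% of that.
R = csr_matrix((ratings, (users, movies)), shape=(480_000, 17_770))
print(R.nnz, "known ratings;", R.shape[0] * R.shape[1], "cells in the full matrix")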
movie #16322
Types of recommender systems
1. Personalized recommendations of items (e.g. Amazon products) to users
3. Content-based:
   – Pre-specified attributes measured for items
   – Users' interests estimated for same attributes
   – Examples: eHarmony, Pandora
4. Collaborative filtering (CF):
   – Does not require content information about items or user surveys
   – Infers relationships from purchases or ratings
   – Nearest neighbor methods (see the toy sketch after this list)
   – Hidden attribute methods
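A toy sketch of the nearest-neighbor idea, assuming item-item cosine similarity (one common choice; this is not code from the talk, and all numbers are invented).

import numpy as np

# Toy user-by-item rating matrix; 0 means "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two item columns."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Predict user 1's missing rating for item 2: weight each rating the user gave
# by how similar that item is to item 2.
user, target = 1, 2
rated = [j for j in range(R.shape[1]) if R[user, j] > 0]
sims = np.array([cosine_sim(R[:, target], R[:, j]) for j in rated])
prediction = sims @ R[user, rated] / sims.sum()
print(round(float(prediction), 2))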
Hidden attribute methods

[Figure: movies and two users (Gus, Dave) placed in a two-dimensional hidden-attribute space; one axis runs from "geared towards females" to "geared towards males", the other from "serious" to "escapist". Example movies: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility.]
Lessons from the Netflix contest
Movie # 13043
Lesson #1: Look at the data
1. Major steps forward were based on including new aspects of the data:
   – Time-based effects
   – Selection bias:
     • Which movies a user rates is predictive of rating values
     • Daily rating counts are predictive
2. Use human intelligence to define new features of the data.
Multiple sources of temporal dynamics
• Item-based effects:
  – Product perception and popularity change constantly
  – Seasonal patterns influence popularity
• User-based effects:
  – Customers continually change their tastes
  – Transient, short-term bias; anchoring
  – Drifting rating scale
  – Change of rater within household
Something happened in 2004…
Are old movies better than new ones?
Lesson #3: Mathematics helps!

• Matrix factorization is the leading approach
  – Gradient-descent-based optimization
  – Integration of biases
  – Incorporation of implicit feedback
  – Accounting for temporal effects
  – Combination with a neighborhood model
Matrix factorization method
[Figure: a 6-user × 12-item rating matrix with most entries missing, written as the product of a 6×3 matrix of user factor values and a 3×12 matrix of item factor values.]

This is a rank-3 linear algebra approximation!
Estimate unknown ratings as dot-products of factor values:

[Figure, repeated over several slides: one unknown entry of the rating matrix is highlighted; its estimate is the dot product of that user's row of factor values and that item's column of factor values. With the numbers on the slide: 0.5*4 + 0.6*0.5 + (-0.5)*1.4 = 1.6.]
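The slide's arithmetic, checked in a couple of lines (the vectors are copied from the figure above; this is an illustration, not the contest code).

import numpy as np

user_factors = np.array([0.5, 0.6, -0.5])   # the user's row of factor values
item_factors = np.array([4.0, 0.5, 1.4])    # the item's column of factor values

# The estimated rating is the dot product of the two factor vectors.
print(round(float(user_factors @ item_factors), 2))  # 0.5*4 + 0.6*0.5 + (-0.5)*1.4 = 1.6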
Matrix factorization model

[Figure: the same diagram, with the rating matrix written as the product of the user-factor and item-factor matrices.]
Why can’t we use standard linear algebra?
1. Standard linear algebra only applies to matrices where every entry has a known value.
3. Smoothing is necessary: We must learn as much signal as possible where there are sufficient data, but not overfit where data are scarce.
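One standard way to write this down (consistent with, but not copied from, the slides) is a least-squares fit over only the observed ratings, with an L2 penalty that does the smoothing:

$$ \min_{P,\,Q} \; \sum_{(u,i)\ \text{observed}} \bigl( r_{ui} - p_u^{\top} q_i \bigr)^2 \;+\; \lambda \Bigl( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \Bigr) $$

Here $p_u$ is user $u$'s factor vector, $q_i$ is item $i$'s factor vector, and $\lambda$ controls how strongly factor values are shrunk toward zero where data are scarce.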
Matrix approximation and the Netflix contest
• Probably the most popular contest method
  – Powerful, fast, and easy to program
  – Simon Funk described the first gradient-descent SVD method
  – Immediately ranked 3rd place on the leaderboard
  – Still today: many related discussions in the Prize forum
Monday, December 11, 2006: "Netflix Update: Try This at Home"
Simon Funk is the pseudonym of Brandyn Webb, a UCSD CSE B.S. alumnus.
Ideas needed to win the Netflix prize
1. Matrix factorization (see your linear algebra class)
2. RMSE cost function (see your statistics class)
3. Gradient descent (see your calculus class)
4. Stochastic gradient descent (machine learning)
5. Regularization (machine learning)
6. Baseline factors
7. A different target: which movies does a user rate?
8. Time-dependent factor values
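A minimal sketch of ideas 1-5 together (factorization, squared-error cost, stochastic gradient descent, regularization), in the spirit of Simon Funk's method; the hyperparameters and toy data are invented, and the baseline and time-dependent factors (ideas 6-8) are omitted.

import numpy as np

def train_mf(triples, n_users, n_items, k=3, lr=0.01, reg=0.05, epochs=100, seed=0):
    """Fit user and item factor matrices by stochastic gradient descent
    on regularized squared error, one (user, item, rating) triple at a time."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))    # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))    # item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):  # visit ratings in random order
            u, i, r = triples[idx]
            err = r - P[u] @ Q[i]                  # error on this single rating
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * pu)   # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])   # gradient step on item factors
    return P, Q

# Toy usage with invented (user, item, rating) triples.
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 1, 1), (2, 2, 5)]
P, Q = train_mf(data, n_users=3, n_items=3)
print(round(float(P[1] @ Q[2]), 2))  # predicted rating of user 1 for unseen item 2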
Conclusion
Learn to be computer scientists, then go out and change the world!
Solving the World's Problems Creatively