Recommender systems and the Netflix prize
Charles Elkan
January 14, 2011
Solving the World's Problems Creatively
Recommender systems
We Know What You Ought To Be Watching This Summer
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules
1. Goal: Improve the Netflix recommendation algorithm, Cinematch
2. Criterion: Reduction in error (RMSE)
3. Oct '06: Contest start
4. Oct '07: $50K progress prize for 8.43% improvement
5. Oct '08: $50K progress prize for 9.44% improvement
6. Sept '09: $1 million grand prize for 10.06% improvement
Training data
score  movie  user
1      21     1
5      213    1
4      345    2
4      123    2
3      768    2
5      76     3
4      45     4
1      568    5
2      342    5
2      234    5
5      76     6
4      56     6

Test data
score  movie  user
?      62     1
?      96     1
?      7      2
?      3      2
?      47     3
?      15     3
?      41     4
?      28     4
?      93     5
?      74     5
?      69     6
?      83     6
Movie rating data
• Training data
  – 100 million ratings
  – 480,000 users
  – 17,770 movies
  – 6 years of data: 2000-2005
• Test data
  – Last few ratings from each user (2.8 million)
• Dates of ratings are given
What is RMSE?
• RMSE stands for “root mean squared error.”
• Let p be the prediction for user u and movie m; let r be the true rating.
• RMSE = √( (1/n) ∑ (p − r)² ), where the sum runs over all n (user, movie) pairs in the test set (a small computation sketch follows this list).
• RMSE measures the average mistake, with higher penalty for big mistakes, i.e. large values of (p-r)².
• You can’t have a contest without a precise goal!
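As a minimal illustration (not from the talk), RMSE is only a few lines of Python; the ratings below are invented.

import numpy as np

def rmse(predictions, ratings):
    """Root mean squared error between predicted and true ratings."""
    predictions = np.asarray(predictions, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    return float(np.sqrt(np.mean((predictions - ratings) ** 2)))

# Toy example: predict 3.6 stars (roughly the global average) for every movie.
print(rmse([3.6, 3.6, 3.6, 3.6], [5, 3, 4, 1]))  # about 1.52 on this tiny sample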
#ratings per user
1. Avg #ratings/user: 208
Most Active Users
User ID   # Ratings  Mean Rating
305344    17,651     1.90
387418    17,432     1.81
2439493   16,560     1.22
1664010   15,811     4.26
2118461   14,829     4.08
1461435   9,820      1.37
1639792   9,764      1.33
1314869   9,739      2.95
The dataset contains 17,770 movies!
#ratings per movie
1. Avg #ratings/movie: 5,627
Movies Rated Most Often
Title                      # Ratings  Mean Rating
Miss Congeniality          227,715    3.36
Independence Day           216,233    3.72
The Patriot                200,490    3.78
The Day After Tomorrow     194,695    3.44
Pretty Woman               190,320    3.90
Pirates of the Caribbean   188,849    4.15
The Green Mile             180,883    4.31
Forrest Gump               180,736    4.30
Important RMSE levels
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Cinematch system: 0.9514
Prize '07 (BellKor): 0.8712
Prize '08 (BellKor + BigChaos): 0.8616
Grand Prize (BellKor's Pragmatic Chaos): 0.8554
Inherent noise: ????

[Figure: these levels shown on a scale from "erroneous" to "accurate"; the arrow along the scale is labeled "Personalization".]
Major Challenges
1. Size of data
   – Need memory management and efficiency of algorithms (a minimal sparse-storage sketch follows this list)
2. Training and test data are different
   – Test ratings are later in time
3. 99% of data are missing
   – Eliminates many standard methods
4. Countless factors affect ratings:
   – Genre, movie vs. TV vs. other
   – Style of action, dialogue, plot, music
   – Director, actors
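A toy sketch (not from the talk) of why sparse storage matters: keep only the (user, movie, rating) triples that exist. The matrix dimensions are the dataset sizes quoted above; the triples themselves are invented.

import numpy as np
from scipy.sparse import csr_matrix

# Invented (user, movie, rating) triples in the style of the training table above.
users   = np.array([0, 0, 1, 1, 2])
movies  = np.array([20, 212, 344, 122, 75])
ratings = np.array([1, 5, 4, 4, 5], dtype=np.float32)

# A dense 480,000 x 17,770 float32 matrix would need roughly 34 GB;
# storing only the ~100 million known ratings is about 1% of that.
R = csr_matrix((ratings, (users, movies)), shape=(480_000, 17_770))
print(R.nnz, "known ratings;", R.shape[0] * R.shape[1], "cells in the full matrix")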
movie #16322
Types of recommender systems
1. Personalized recommendations of items (e.g. Amazon products) to users
3. Content-based:
   – Pre-specified attributes measured for items
   – Users' interests estimated for same attributes
   – Examples: eHarmony, Pandora
4. Collaborative filtering (CF):
   – Does not require content information about items or user surveys
   – Infers relationships from purchases or ratings
   – Nearest neighbor methods (see the toy sketch after this list)
   – Hidden attribute methods
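A toy sketch of the nearest-neighbor idea, assuming item-item cosine similarity (one common choice; this is not code from the talk, and all numbers are invented).

import numpy as np

# Toy user-by-item rating matrix; 0 means "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two item columns."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Predict user 1's missing rating for item 2: weight each rating the user gave
# by how similar that item is to item 2.
user, target = 1, 2
rated = [j for j in range(R.shape[1]) if R[user, j] > 0]
sims = np.array([cosine_sim(R[:, target], R[:, j]) for j in rated])
prediction = sims @ R[user, rated] / sims.sum()
print(round(float(prediction), 2))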
Hidden attribute methods

[Figure: movies and two users (Gus, Dave) placed in a two-dimensional hidden-attribute space; one axis runs from "geared towards females" to "geared towards males", the other from "serious" to "escapist". Example movies: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility.]
Lessons from the Netflix contest
Movie # 13043
Lesson #1: Look at the data
1. Major steps forward were based on including new aspects of the data:
   – Time-based effects
   – Selection bias:
     • Which movies a user rates is predictive of rating values
     • Daily rating counts are predictive
2. Use human intelligence to define new features of the data.
Multiple sources of temporal dynamics
• Item-based effects:
  – Product perception and popularity change constantly
  – Seasonal patterns influence popularity
• User-based effects:
  – Customers continually change their tastes
  – Transient, short-term bias; anchoring
  – Drifting rating scale
  – Change of rater within household
Something happened in 2004…
Are old movies better than new ones?
Lesson #3: Mathematics helps!

• Matrix factorization is the leading approach
  – Gradient-descent-based optimization
  – Integration of biases
  – Incorporation of implicit feedback
  – Accounting for temporal effects
  – Combination with a neighborhood model
Matrix factorization method
[Figure: a 6-user × 12-item rating matrix with most entries missing, written as the product of a 6×3 matrix of user factor values and a 3×12 matrix of item factor values.]

This is a rank-3 linear algebra approximation!
Estimate unknown ratings as dot-products of factor values:

[Figure, repeated over several slides: one unknown entry of the rating matrix is highlighted; its estimate is the dot product of that user's row of factor values and that item's column of factor values. With the numbers on the slide: 0.5*4 + 0.6*0.5 + (-0.5)*1.4 = 1.6.]
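The slide's arithmetic, checked in a couple of lines (the vectors are copied from the figure above; this is an illustration, not the contest code).

import numpy as np

user_factors = np.array([0.5, 0.6, -0.5])   # the user's row of factor values
item_factors = np.array([4.0, 0.5, 1.4])    # the item's column of factor values

# The estimated rating is the dot product of the two factor vectors.
print(round(float(user_factors @ item_factors), 2))  # 0.5*4 + 0.6*0.5 + (-0.5)*1.4 = 1.6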
Matrix factorization model

[Figure: the same diagram, with the rating matrix written as the product of the user-factor and item-factor matrices.]
Why can’t we use standard linear algebra?
1. Standard linear algebra only applies to matrices where every entry has a known value.
3. Smoothing is necessary: We must learn as much signal as possible where there are sufficient data, but not overfit where data are scarce.
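One standard way to write this down (consistent with, but not copied from, the slides) is a least-squares fit over only the observed ratings, with an L2 penalty that does the smoothing:

$$ \min_{P,\,Q} \; \sum_{(u,i)\ \text{observed}} \bigl( r_{ui} - p_u^{\top} q_i \bigr)^2 \;+\; \lambda \Bigl( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \Bigr) $$

Here $p_u$ is user $u$'s factor vector, $q_i$ is item $i$'s factor vector, and $\lambda$ controls how strongly factor values are shrunk toward zero where data are scarce.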
Matrix approximation and the Netflix contest
• Probably the most popular contest method
  – Powerful, fast, and easy to program
  – Simon Funk described the first gradient-descent SVD method
  – Immediately ranked 3rd place on the leaderboard
  – Still today: many related discussions in the Prize forum
Monday, December 11, 2006: "Netflix Update: Try This at Home"
Simon Funk is the pseudonym of Brandyn Webb, a UCSD CSE B.S. alumnus.
Ideas needed to win the Netflix prize
1. Matrix factorization (see your linear algebra class)
2. RMSE cost function (see your statistics class)
3. Gradient descent (see your calculus class)
4. Stochastic gradient descent (machine learning)
5. Regularization (machine learning)
6. Baseline factors
7. A different target: which movies does a user rate?
8. Time-dependent factor values
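A minimal sketch of ideas 1-5 together (factorization, squared-error cost, stochastic gradient descent, regularization), in the spirit of Simon Funk's method; the hyperparameters and toy data are invented, and the baseline and time-dependent factors (ideas 6-8) are omitted.

import numpy as np

def train_mf(triples, n_users, n_items, k=3, lr=0.01, reg=0.05, epochs=100, seed=0):
    """Fit user and item factor matrices by stochastic gradient descent
    on regularized squared error, one (user, item, rating) triple at a time."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))    # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))    # item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):  # visit ratings in random order
            u, i, r = triples[idx]
            err = r - P[u] @ Q[i]                  # error on this single rating
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * pu)   # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])   # gradient step on item factors
    return P, Q

# Toy usage with invented (user, item, rating) triples.
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 1, 1), (2, 2, 5)]
P, Q = train_mf(data, n_users=3, n_items=3)
print(round(float(P[1] @ Q[2]), 2))  # predicted rating of user 1 for unseen item 2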
Conclusion
Learn to be computer scientists, then go out and change the world!
Solving the World's Problems Creatively