Making Collaborative Filtering Work for Recommendation Engines Patrick Roos [email protected]

Collaborative Filtering For Recommendation Engines


Page 1: Collaborative Filtering For Recommendation Engines

Making Collaborative Filtering Work for Recommendation Engines

Patrick Roos

[email protected]

Page 2: Collaborative Filtering For Recommendation Engines

About Patrick
• B.S. in “Discovery Informatics”, first graduate in 2007
• Ph.D. in Computer Science from UMD (College Park)
• National Cancer Institute (NIH)
• Miner & Kasch
• Worked on data science projects across a broad variety of fields

Page 3: Collaborative Filtering For Recommendation Engines

What we’ll talk about
• What is collaborative filtering?
• What does it do for us?
• How do we do it?
• What is good/not so good about it?
• How can we modify it to address shortcomings?

Page 4: Collaborative Filtering For Recommendation Engines

What’s Collaborative Filtering?
• Method of making automatic, personalized recommendations
• “Collaborative Filtering” because recommendations are derived from collaborative input from other users
• Has had success in a variety of web-based markets, including Amazon, iTunes, Netflix, and LastFM
• Two main types:
  • Item-Based Collaborative Filtering
  • User-Based Collaborative Filtering

Page 5: Collaborative Filtering For Recommendation Engines

Item-Based Collaborative Filtering
• Provides recommendations for a particular item
• Based on item’s similarity to other items
• Similarity defined through the users who preferred the items or not

“People who liked this item also liked this other item”

Page 6: Collaborative Filtering For Recommendation Engines

Item-Based Collaborative Filtering
• Item-based in essence just chains recommendations off of a particular item
• Item-based can recommend items to users about whom we know nothing except which item they are currently looking at

Page 7: Collaborative Filtering For Recommendation Engines

User-Based Collaborative Filtering
• Provides recommendations for a particular user
• Based on user’s similarity to other users
• Similarity defined through the items users preferred or not

“People who like a lot of the same stuff you like also like this other stuff”

Page 8: Collaborative Filtering For Recommendation Engines

User-Based Collaborative Filtering
• Takes the overall preferences of a user into account
• Can be more personalized and diverse

Page 9: Collaborative Filtering For Recommendation Engines

What do we need for Collaborative Filtering?
• Records of user ratings on items
• Rating might just be purchased (1) or not (0) or some other indication

ratings (required)

column    type
user ID   int
item ID   int
rating    int

useful other things (optional)
• Time of rating
• Tags of info on items
• Categories of items
• Newness of item
• …

Page 10: Collaborative Filtering For Recommendation Engines

User-Based Collaborative Filtering
1. Create user-to-item vectors
2. Calculate user similarity (cosine similarity)
3. Compute recommendation scores with weighted averages

Page 11: Collaborative Filtering For Recommendation Engines

1. Create user-to-item vectors

import pandas

# read ratings into a DataFrame from a DB table
# (engine is an existing database connection, defined elsewhere)
query = "select * from ratings"
ratings_df = pandas.read_sql(query, con=engine)

# pivot to get item bit vectors for each user
user_item_vectors = ratings_df.pivot_table(index='user_id', columns='item_id', values='rating', fill_value=0)

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0
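The pivot can be tried without a database. The following is a minimal sketch, assuming a small in-memory ratings table built to mirror the four example users; the `purchases` dictionary and the `reindex` step that adds never-rated items are illustrative additions, not part of the original slides.

```python
import pandas as pd

# Toy purchase records (1 = purchased); assumed data matching the example table.
purchases = {
    "A": [1, 4, 6, 7, 8, 9],
    "B": [1, 4, 8, 9],
    "C": [7],
    "D": [3, 4, 5],
}
ratings_df = pd.DataFrame(
    [(user, item, 1) for user, items in purchases.items() for item in items],
    columns=["user_id", "item_id", "rating"],
)

# Pivot to one row per user and one column per item; unrated cells become 0.
user_item_vectors = ratings_df.pivot_table(
    index="user_id", columns="item_id", values="rating", fill_value=0
)

# Item 2 was never rated, so the pivot has no column for it; add it explicitly.
user_item_vectors = user_item_vectors.reindex(columns=range(1, 10), fill_value=0)
print(user_item_vectors)
```

Note that an item nobody has rated never appears in the ratings table, so it only shows up in the matrix if the full item list is supplied separately, as the `reindex` call does here.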

Page 12: Collaborative Filtering For Recommendation Engines

2. Calculate user similarities
• Cosine similarity measures the cosine of the angle between two vectors

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0
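For reference, the standard definition of cosine similarity, worked through for Users A and B from the table above. The symbols u and v are notation introduced here; A and B share four items, A has six items and B has four.

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
\qquad\Longrightarrow\qquad
\cos(A, B) = \frac{4}{\sqrt{6}\,\sqrt{4}} \approx 0.816497
```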

Page 13: Collaborative Filtering For Recommendation Engines

2. Calculate user similarities

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0

User Similarity Matrix

User A User B User C User D

User A

User B … …

User C … … …

User D … … …

Page 14: Collaborative Filtering For Recommendation Engines

2. Calculate user similarities

import numpy

# base similarity matrix (all dot products)
sim = numpy.dot(user_item_vectors, user_item_vectors.T)

# squared magnitude of each vector (for 0/1 ratings, the number of items rated)
square_mag = numpy.diag(sim)

# inverse squared magnitude; users with no ratings give infinity, so set those to 0
inv_square_mag = 1 / square_mag
inv_square_mag[numpy.isinf(inv_square_mag)] = 0

# inverse of the magnitude
inv_mag = numpy.sqrt(inv_square_mag)

# cosine similarity (elementwise multiply by inverse magnitudes)
cosine = sim * inv_mag
cosine = cosine.T * inv_mag
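The same computation can be written as a self-contained function and checked against the example users. This is a sketch rather than the author's exact code: the function name `cosine_similarity_matrix` is made up here, and it uses `numpy.divide` with a `where` mask so that all-zero rows never trigger a division by zero.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    vectors = np.asarray(vectors, dtype=float)
    sim = vectors @ vectors.T                 # all pairwise dot products
    mag = np.sqrt(np.diag(sim))               # magnitude of each row vector
    # Invert magnitudes, leaving 0 for all-zero rows instead of dividing by zero.
    inv_mag = np.divide(1.0, mag, out=np.zeros_like(mag), where=mag > 0)
    return sim * inv_mag[:, None] * inv_mag[None, :]

# Users A-D from the example table.
user_item = np.array([
    [1, 0, 0, 1, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0],
])
cosine = cosine_similarity_matrix(user_item)
print(np.round(cosine, 6))
```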

Page 15: Collaborative Filtering For Recommendation Engines

2. Calculate user similarities

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0

User Similarity Matrix

User A User B User C User D

User A 1 0.816497 0.408248 0.235702

User B 0.816497 1 0 0.288675

User C 0.408248 0 1 0

User D 0.235702 0.288675 0 1

Page 16: Collaborative Filtering For Recommendation Engines

3. Compute recommendation scores
• Take top K most similar users
• Score all items by similarity-weighted sum of ratings

Example: User B, K = 2

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0

User Similarity Matrix

User A User B User C User D

User A 1 0.816497 0.408248 0.235702

User B 0.816497 1 0 0.288675

User C 0.408248 0 1 0

User D 0.235702 0.288675 0 1

For each user for whom we want recommendations:

Page 17: Collaborative Filtering For Recommendation Engines

3. Compute recommendation scores

For each user for whom we want recommendations:
• Take top K most similar users
• Score all items by similarity-weighted sum of ratings

Example: User B, K = 2

User Similarity Matrix

User A User B User C User D

User A 1 0.816497 0.408248 0.235702

User B 0.816497 1 0 0.288675

User C 0.408248 0 1 0

User D 0.235702 0.288675 0 1

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 0.816497 0 0 0.816497 0 0.816497 0.816497 0.816497 0.816497

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 0.288675 0.288675 0.288675 0 0 0 0

Page 18: Collaborative Filtering For Recommendation Engines

3. Compute recommendation scores
• Take top K most similar users
• Score all items by similarity-weighted average or sum of ratings

Example: User B, K = 2

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 0.816497 0 0 0.816497 0 0.816497 0.816497 0.816497 0.816497

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 0.288675 0.288675 0.288675 0 0 0 0

User Similarity Matrix

User A User B User C User D

User A 1 0.816497 0.408248 0.235702

User B 0.816497 1 0 0.288675

User C 0.408248 0 1 0

User D 0.235702 0.288675 0 1

For each user for whom we want recommendations:

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

0.408248 0 0.144338 0.552586 0.144338 0.408248 0.408248 0.408248 0.408248
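The scoring step can be sketched as follows. The function name `score_items` is hypothetical, and dividing the weighted sum by K is an assumption chosen because it reproduces the example scores for User B; dividing by the sum of the similarities instead is another common normalization.

```python
import numpy as np

def score_items(user_idx, user_item, cosine, k):
    """Score every item for one user from the k most similar other users."""
    sims = cosine[user_idx].astype(float)     # astype returns a copy
    sims[user_idx] = -np.inf                  # never pick the user themselves
    neighbors = np.argsort(sims)[::-1][:k]    # indices of the k nearest users
    # Weight each neighbor's ratings by similarity, then average over k.
    return (cosine[user_idx, neighbors] @ user_item[neighbors]) / k

user_item = np.array([
    [1, 0, 0, 1, 0, 1, 1, 1, 1],   # User A
    [1, 0, 0, 1, 0, 0, 0, 1, 1],   # User B
    [0, 0, 0, 0, 0, 0, 1, 0, 0],   # User C
    [0, 0, 1, 1, 1, 0, 0, 0, 0],   # User D
])
cosine = np.array([
    [1.0, 0.816497, 0.408248, 0.235702],
    [0.816497, 1.0, 0.0, 0.288675],
    [0.408248, 0.0, 1.0, 0.0],
    [0.235702, 0.288675, 0.0, 1.0],
])
scores_b = score_items(1, user_item, cosine, k=2)   # User B, K = 2
print(np.round(scores_b, 6))
```

For User B the two nearest neighbors are Users A and D, so Item 4 (which both of them have) gets the highest score.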

Page 19: Collaborative Filtering For Recommendation Engines

Last step
• Remove items User B already has, which leaves us with the final recommendations:

Collaborative Filtering Item Scores for User B with K = 2

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

0.408248 0 0.144338 0.552586 0.144338 0.408248 0.408248 0.408248 0.408248

Item 2 Item 3 Item 5 Item 6 Item 7

0 0.144338 0.144338 0.408248 0.408248
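A small sketch of this last step, assuming the scores and User B's item vector are held in pandas Series indexed by item name (the variable names are illustrative):

```python
import pandas as pd

items = [f"Item {i}" for i in range(1, 10)]
scores = pd.Series(
    [0.408248, 0, 0.144338, 0.552586, 0.144338, 0.408248, 0.408248, 0.408248, 0.408248],
    index=items,
)
owned = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 1], index=items).astype(bool)

# Keep only items the user does not have yet, best score first.
recommendations = scores[~owned].sort_values(ascending=False)
print(recommendations)
```

Item 2 survives the filter with a score of 0; in practice one would probably also drop zero-scored items before showing the list.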

Page 20: Collaborative Filtering For Recommendation Engines

Item-Based Collaborative Filtering
1. Create user-to-item vectors
2. Calculate item similarity (cosine similarity)
3. Compute recommendation scores with weighted averages

All we need to do is transpose the user-item vectors matrix!

End up with an item-to-item similarity matrix

User-Item Vectors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8 Item 9

User A 1 0 0 1 0 1 1 1 1

User B 1 0 0 1 0 0 0 1 1

User C 0 0 0 0 0 0 1 0 0

User D 0 0 1 1 1 0 0 0 0

Change:
sim = numpy.dot(user_item_vectors, user_item_vectors.T)
To:
sim = numpy.dot(user_item_vectors.T, user_item_vectors)
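To make the transposed version concrete, here is a sketch that builds the item-to-item cosine matrix for the example users and ranks the items most similar to Item 4. The ranking step at the end is an illustrative addition, not something shown on the slides.

```python
import numpy as np

user_item = np.array([
    [1, 0, 0, 1, 0, 1, 1, 1, 1],   # User A
    [1, 0, 0, 1, 0, 0, 0, 1, 1],   # User B
    [0, 0, 0, 0, 0, 0, 1, 0, 0],   # User C
    [0, 0, 1, 1, 1, 0, 0, 0, 0],   # User D
], dtype=float)

# Dot products between item columns instead of user rows.
sim = user_item.T @ user_item
mag = np.sqrt(np.diag(sim))
# Items nobody has rated (Item 2 here) keep an inverse magnitude of 0.
inv_mag = np.divide(1.0, mag, out=np.zeros_like(mag), where=mag > 0)
item_cosine = sim * inv_mag[:, None] * inv_mag[None, :]

# "People who liked Item 4 also liked ...": rank the other items by similarity.
item = 3                                   # zero-based index of Item 4
ranked = [int(j) for j in np.argsort(item_cosine[item])[::-1] if j != item]
print([f"Item {j + 1}" for j in ranked[:3]])
```

Items 1, 8, and 9 tie for most similar to Item 4, so their order in the printed list depends on how the sort breaks ties.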

Page 21: Collaborative Filtering For Recommendation Engines

Notes
• All the math works out just the same when using ratings instead of just 0’s and 1’s
• Similarity matrices and recommendations can all be computed in batch and stored/cached for quick recall
• There is a big memory vs. speed tradeoff in the method chosen to compute similarity matrices from user-item bit vectors
  • Vectorized operations (very fast, but lots of memory on big data sets)
  • Looping (don’t need much memory, but quite slow without parallelization)
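One middle ground between the two extremes is to compute the similarity matrix in row blocks, so each step is still vectorized but only a slice of the full matrix is in memory at once. The sketch below assumes this blocked approach; the name `cosine_rows` and the `block_size` parameter are hypothetical.

```python
import numpy as np

def cosine_rows(vectors, block_size=1000):
    """Yield (start, block) pairs of cosine similarities, block_size rows at a time."""
    vectors = np.asarray(vectors, dtype=float)
    mag = np.linalg.norm(vectors, axis=1)
    inv_mag = np.divide(1.0, mag, out=np.zeros_like(mag), where=mag > 0)
    unit = vectors * inv_mag[:, None]      # normalize each row once
    for start in range(0, len(unit), block_size):
        # Only a (block_size x n) slice of the similarity matrix exists at a time.
        yield start, unit[start:start + block_size] @ unit.T

user_item = np.array([
    [1, 0, 0, 1, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0],
])
# Stacking the blocks here is only for checking; a real job would reduce each
# block (for example to the top K neighbors per row) and discard the rest.
full = np.vstack([block for _, block in cosine_rows(user_item, block_size=2)])
print(np.round(full, 6))
```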

Page 22: Collaborative Filtering For Recommendation Engines

Strengths of Collaborative Filtering
• Content-Agnostic
  • Does not require items or users to be tagged with content information
• Recommendations are very personalized because they’re based on like-minded people
• Adaptive to the user base
• Can pick up on fitting recommendations that would be difficult or impossible to identify with a content-based method
• Inherently adaptive to preference changes over time

Page 23: Collaborative Filtering For Recommendation Engines

Weaknesses of Collaborative Filtering
• Cold start problem
  • Needs observed user ratings before it can provide recommendations
• Computationally expensive over huge amounts of data
• Lack of “heterophilous diffusion”
  • Not good at meeting the potential desire of users to be recommended items from users that are not like them
• Can have a strong bias towards popular, older items

Page 24: Collaborative Filtering For Recommendation Engines

Modifying Collaborative Filtering
• Can address some of the possible drawbacks of collaborative filtering through score modifications
• Weigh base scores by content-based info:
  • Age of item
  • Category of item
  • Popularity of item
  • Etc.

Collaborative Filtering → Base pool of recommendations with scores → Score Modifications → Final Recommendation Set
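A score modification step might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name `modify_scores`, the half-life form of the age weight, the linear popularity penalty, and the sample ages and popularities.

```python
import numpy as np

def modify_scores(base_scores, item_age_days, item_popularity,
                  age_half_life_days=365.0, popularity_penalty=0.5):
    """Down-weight old and very popular items; all parameters are illustrative."""
    base_scores = np.asarray(base_scores, dtype=float)
    # Weight halves for every age_half_life_days of item age.
    age_weight = 0.5 ** (np.asarray(item_age_days, dtype=float) / age_half_life_days)
    # Popularity is assumed to be the fraction of users who have the item, in [0, 1].
    popularity_weight = 1.0 - popularity_penalty * np.asarray(item_popularity, dtype=float)
    return base_scores * age_weight * popularity_weight

base = [0.144338, 0.408248, 0.408248]      # base collaborative filtering scores
final = modify_scores(base, item_age_days=[30, 30, 730], item_popularity=[0.25, 0.25, 0.5])
print(np.round(final, 6))
```

The second and third items start with the same base score, but the older, more popular one ends up ranked lower after modification.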

Page 25: Collaborative Filtering For Recommendation Engines

Modifying Collaborative Filtering
• Vary responsiveness to user’s recent vs. old tastes
• Multiply ratings by an age decay function reflecting how age should affect rating importance
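As one possible age decay function, the sketch below scales each rating by an exponential decay with a configurable half-life. The `rated_at` column, the function name `decay_ratings`, and the 180-day half-life are assumptions for illustration; the slides do not prescribe a particular decay shape.

```python
import numpy as np
import pandas as pd

def decay_ratings(ratings_df, now, half_life_days=180.0):
    """Return a copy with ratings scaled by an exponential age decay."""
    age_days = (now - ratings_df["rated_at"]).dt.total_seconds() / 86400.0
    decayed = ratings_df.copy()
    # A rating loses half its weight every half_life_days.
    decayed["rating"] = ratings_df["rating"] * np.power(0.5, age_days / half_life_days)
    return decayed

ratings_df = pd.DataFrame({
    "user_id": ["B", "B"],
    "item_id": [1, 4],
    "rating": [1.0, 1.0],
    "rated_at": pd.to_datetime(["2024-01-01", "2024-06-29"]),   # 180 days apart
})
decayed = decay_ratings(ratings_df, now=pd.Timestamp("2024-06-29"))
print(decayed[["item_id", "rating"]])
```

The decayed ratings can then be pivoted into user-item vectors exactly as before; a shorter half-life makes the recommender respond faster to recent tastes.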

Page 26: Collaborative Filtering For Recommendation Engines

Hybrid Recommender System Approach
• Combine collaborative filtering with content and model-based methods
• Create recommendations with both methods and combine them
• Content-based methods are complementary to collaborative filtering
  • Use clustering, classification, and content-based grouping to limit the population for which to compute similarities
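One simple way to combine the two kinds of scores is a weighted blend after rescaling each to a common range. This is only a sketch of one option: the function name `blend_scores`, the min-max scaling, the 0.7 weight, and the content-based scores are all made up for illustration.

```python
import numpy as np

def blend_scores(cf_scores, content_scores, cf_weight=0.7):
    """Weighted blend of two score vectors after scaling each to [0, 1]."""
    def scale(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        # A constant vector carries no ranking information, so map it to zeros.
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return cf_weight * scale(cf_scores) + (1.0 - cf_weight) * scale(content_scores)

cf = [0, 0.144338, 0.144338, 0.408248, 0.408248]   # Items 2, 3, 5, 6, 7 for User B
content = [0.9, 0.1, 0.5, 0.2, 0.8]                # hypothetical content-based scores
blended = blend_scores(cf, content)
print(np.round(blended, 4))
```

Here the content-based scores break the tie between the last two items, which collaborative filtering alone scored equally.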

Page 27: Collaborative Filtering For Recommendation Engines

Practical Advantages of Item-Based
• Many businesses have more customers than items
  • Item-item similarity matrix much smaller than user-user
• Item similarity scores tend to converge over time
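A quick back-of-the-envelope calculation shows how large the size difference can be. The user and item counts below are assumed for illustration, and the estimate is for a dense matrix of 8-byte floats.

```python
def dense_matrix_gib(n, bytes_per_value=8):
    """Memory for a dense n x n similarity matrix, in GiB."""
    return n * n * bytes_per_value / 2**30

n_users, n_items = 1_000_000, 10_000   # assumed customer and catalogue sizes
print(f"user-user: {dense_matrix_gib(n_users):,.0f} GiB")
print(f"item-item: {dense_matrix_gib(n_items):.2f} GiB")
```

With 100 times more users than items, the user-user matrix is 10,000 times larger than the item-item one, because matrix size grows with the square of the count.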

Page 28: Collaborative Filtering For Recommendation Engines

Summary
• Collaborative filtering can make personalized recommendations based on ‘social’ recommendations
• All we need to do collaborative filtering is ratings of users on items
• Implementation is relatively straightforward (with the caveat of a significant speed/memory tradeoff)
• A great variety of creative modifications and ways to use it are possible to “make it work” for a specific use case

Page 29: Collaborative Filtering For Recommendation Engines

Making Collaborative Filtering Work for Recommendation Engines

Patrick Roos

[email protected]