Recommendation Systems
Alpa Jain Twitter Revenue
09.10.012
Outline
• What are recommendation systems?
• How to build a recommendation system?
• What challenges do we need to further address?
• Twitter promoted products – a real-life recommendation system.
A recommendation system provides information or items that are likely to be of interest to a user, in an automated fashion.
Daily examples of recommendation systems
Snapshot from: http://itunes.apple.com/us/app/netflix/id363590051?mt=8
Promoted tweets in timeline
Promoted tweets in search results
Why do we need recommendation systems?
Need for Recommendation Systems
• Solution to large amounts of good data
• Reduce cognitive load on users
• Introduce quality control
Overview of Recommendation Systems
Users and items feed a four-stage pipeline:
• Candidate generation: automatically identify items of interest to users (focus of talk)
• Filtering: filter near duplicates, already seen, dismissed
• Rank: order recommendations by temporal, diversity, personalization
• User feedback: track dislike, click, purchase
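The four stages can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the system described in the talk: the function names, the item scores, and the `seen` and `dismissed` sets are all hypothetical.

```python
# Hypothetical four-stage pipeline: candidate generation, filtering, ranking, feedback.


def generate_candidates(item_scores):
    """Candidate generation: here every scored item is a candidate."""
    return list(item_scores.items())


def filter_candidates(candidates, seen, dismissed):
    """Filtering: drop items the user has already seen or dismissed."""
    return [(item, score) for item, score in candidates
            if item not in seen and item not in dismissed]


def rank(candidates, limit):
    """Ranking: order by score, highest first, and keep the top `limit` items."""
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ordered[:limit]]


def record_feedback(dismissed, item, action):
    """User feedback: a dismissal removes the item from future recommendations."""
    if action == "dismiss":
        dismissed.add(item)


def recommend(item_scores, seen, dismissed, limit=2):
    candidates = generate_candidates(item_scores)
    return rank(filter_candidates(candidates, seen, dismissed), limit)


scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.7}  # made-up relevance scores
dismissed = set()
first = recommend(scores, seen={"a"}, dismissed=dismissed)
record_feedback(dismissed, "d", "dismiss")
second = recommend(scores, seen={"a"}, dismissed=dismissed)
print(first, second)
```

Feeding the dismissal back into the filter changes the second set of recommendations, which is the role the feedback stage plays in the loop.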
Recommendation Algorithms
• Collaborative filtering (CF)
• Hypothesis: Similar users tend to like similar items
• Two forms of CF
– Item-based collaborative filtering
– User-based collaborative filtering
• Data collection methods
– Explicit feedback. Example: ratings, dismiss
– Implicit feedback. Example: number of views, purchases
Data Representation
• Items: i1, i2, i3, …, in
• Each user (e.g., u1) has provided ratings on some of the items; "-" marks a missing rating

      I1   I2   I3   I4   I5
U1     1    -    5    -    2
U2     4    2    -    -    5
U3     1    2    4    4    5
…
Um     -    3    -    1    5
User-Item Ratings Matrix
[Figure: sparse ratings matrix with one axis labeled users and the other labeled items]
Example of User / Movie Ratings Matrix
                            Alice   Bob   Charlie   Dave
Harry Potter …                3      5       2       3
The Shawshank Redemption      4      4       2       -
Rabbit-Proof Fence            5      1       -       -
American Pie                  -      1       1       5
• The Netflix challenge
– 20 thousand movies
– 500 thousand users
– 100 million ratings
• Training set of 99 million ratings
– Target: RMSE < 0.8572 on a test set of 1 million ratings
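RMSE (root-mean-square error) measures how far predicted ratings are from the actual ones. A minimal sketch of the metric, with made-up predictions and ratings rather than Netflix data:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equally long lists of ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


predicted = [3.5, 4.0, 2.0, 5.0]  # hypothetical model output
actual = [4, 4, 1, 5]             # hypothetical held-out ratings
error = rmse(predicted, actual)
print(round(error, 4))
```

Because the errors are squared before averaging, a single large miss costs more than several small ones.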
A Naïve Recommendation System
1. Aggregate ratings for each item
2. Recommend item with maximum rating
• Does everybody like Harry Potter movies?
$\mathrm{score}(i, u) = f(i) = \sum_{u' \in \mathrm{users}} \mathrm{rating}(u', i)$
Historical information about users is important!
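A minimal sketch of the naive scheme, applied to the example movie ratings matrix. The dictionary layout and function names are assumptions made for illustration; the Harry Potter title is shortened to a plain label.

```python
# movie -> {user: rating}; missing ratings are simply absent.
ratings = {
    "Harry Potter": {"Alice": 3, "Bob": 5, "Charlie": 2, "Dave": 3},
    "The Shawshank Redemption": {"Alice": 4, "Bob": 4, "Charlie": 2},
    "Rabbit-Proof Fence": {"Alice": 5, "Bob": 1},
    "American Pie": {"Bob": 1, "Charlie": 1, "Dave": 5},
}


def naive_scores(ratings):
    """Step 1: aggregate (sum) the ratings each item received from all users."""
    return {movie: sum(by_user.values()) for movie, by_user in ratings.items()}


def recommend_naive(ratings):
    """Step 2: recommend the item with the maximum aggregate rating."""
    scores = naive_scores(ratings)
    return max(scores, key=scores.get)


scores = naive_scores(ratings)
top = recommend_naive(ratings)
print(scores, top)
```

Every user gets the same recommendation, including Charlie, who rated that movie 2. This is why per-user history matters.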
Item-based Collaborative Filtering
• Predict a user's rating for an item i based on the user's ratings for other items
• Given a user u with I(u) preferred items:
$\mathrm{score}(i, u) = \sum_{j \in I(u)} \mathrm{rating}(u, j) \cdot \mathrm{sim}(i, j)$
– sim(i, j): similarity between items i and j
– rating(u, j): rating provided by user u for item j
Example: Item-based CF
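• Given a user u with ratings for items x and y (here Harry Potter and The Matrix)
• Candidate items a and b (here The Chronicles of Narnia and Star Wars) with known similarities to x and y

User ratings:
item       Harry Potter   The Matrix
rating         0.8           0.3

Item similarities:
item                             Harry Potter   The Matrix
The Chronicles of Narnia (N)         1.0           0.3
Star Wars (S)                        0.2           0.8

score(N, u) = 0.8 * 1.0 + 0.3 * 0.3 = 0.89
score(S, u) = 0.8 * 0.2 + 0.3 * 0.8 = 0.40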
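The item-based score is a similarity-weighted sum of the user's own ratings. A minimal sketch that reproduces this example; the function name and dictionary layout are assumptions made for illustration.

```python
def item_based_score(candidate, user_ratings, similarity):
    """Sum rating(u, j) * sim(candidate, j) over the items j the user has rated."""
    sims = similarity[candidate]
    return sum(rating * sims.get(item, 0.0) for item, rating in user_ratings.items())


# The user's ratings for the two known items.
user_ratings = {"Harry Potter": 0.8, "The Matrix": 0.3}

# Similarity of each candidate item to the items the user has rated.
similarity = {
    "The Chronicles of Narnia": {"Harry Potter": 1.0, "The Matrix": 0.3},
    "Star Wars": {"Harry Potter": 0.2, "The Matrix": 0.8},
}

scores = {c: item_based_score(c, user_ratings, similarity) for c in similarity}
best = max(scores, key=scores.get)
print(scores, best)
```

Narnia wins because it is highly similar to the item this user rated highest.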
Computing Similarity between Items
• Cosine similarity
– Items are represented as u-dimensional vectors over user space
– Similarity is the cosine of the angle between two vectors
– Score ranges between 1 (perfect) and -1 (opposite)

Example: 2 users
      u1     u2
A    0.8    0.45
B    0.4    0.8
C    0.3    0.3
[Figure: items A, B, and C plotted in the two-user space; axes U1 and U2, each from 0 to 1]
Similarity Measures – contd.
• Cosine similarity: $\mathrm{sim}(i, j) = \frac{I \cdot J}{\|I\| \cdot \|J\|}$
• Magnitude-aware measure: $\mathrm{sim}(i, j) = \frac{|U(i) \cap U(j)|}{|U(i)| \cdot |U(j)|}$
• Jaccard, Pearson correlation, …
Harry Potter movie problem?
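A minimal sketch of these measures, using the item vectors A, B, and C from the two-user example. It assumes U(i) denotes the set of users who interacted with item i; the user sets in the second half are made up for illustration.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def overlap_similarity(users_i, users_j):
    """|U(i) intersect U(j)| / (|U(i)| * |U(j)|), the set-based measure."""
    if not users_i or not users_j:
        return 0.0
    return len(users_i & users_j) / (len(users_i) * len(users_j))


def jaccard_similarity(users_i, users_j):
    """|U(i) intersect U(j)| / |U(i) union U(j)|."""
    union = users_i | users_j
    return len(users_i & users_j) / len(union) if union else 0.0


# Item vectors over the two users (u1, u2).
items = {"A": [0.8, 0.45], "B": [0.4, 0.8], "C": [0.3, 0.3]}
sim_ab = cosine_similarity(items["A"], items["B"])
sim_ac = cosine_similarity(items["A"], items["C"])
sim_bc = cosine_similarity(items["B"], items["C"])

# Hypothetical sets of users who rated two items.
fans_i, fans_j = {"u1", "u2", "u3"}, {"u2", "u3", "u4"}
overlap = overlap_similarity(fans_i, fans_j)
jaccard = jaccard_similarity(fans_i, fans_j)
print(round(sim_ab, 3), round(sim_ac, 3), round(sim_bc, 3), overlap, jaccard)
```

Cosine similarity ignores vector length, so C, which has low ratings from both users, still scores as very similar to A and B because it points in a similar direction.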
User-based Collaborative Filtering
• K-nearest neighbors (KNN)
– Cluster users after representing them as feature vectors
[Figure: diagram labeled Users, Clusters, Items]
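One way to sketch the user-based idea is a neighbor lookup: instead of building explicit clusters, find the k users most similar to the target user and average their ratings, weighted by similarity. The data is the example movie ratings matrix (the Harry Potter title is shortened to a plain label); the function names and the choice of cosine similarity are assumptions, not details from the talk.

```python
import math

# user -> {movie: rating}; missing ratings are absent.
ratings = {
    "Alice": {"Harry Potter": 3, "The Shawshank Redemption": 4, "Rabbit-Proof Fence": 5},
    "Bob": {"Harry Potter": 5, "The Shawshank Redemption": 4,
            "Rabbit-Proof Fence": 1, "American Pie": 1},
    "Charlie": {"Harry Potter": 2, "The Shawshank Redemption": 2, "American Pie": 1},
    "Dave": {"Harry Potter": 3, "American Pie": 5},
}


def user_similarity(a, b):
    """Cosine similarity between two users' rating vectors (missing ratings count as 0)."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(user, movie, ratings, k):
    """The k most similar users who have rated `movie`, as (similarity, name) pairs."""
    scored = [(user_similarity(ratings[user], theirs), name)
              for name, theirs in ratings.items()
              if name != user and movie in theirs]
    return sorted(scored, reverse=True)[:k]


def predict(user, movie, ratings, k=2):
    """Similarity-weighted average of the neighbors' ratings for `movie`."""
    neighbors = nearest_neighbors(user, movie, ratings, k)
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(sim * ratings[name][movie] for sim, name in neighbors) / total


neighbors = [name for _, name in
             nearest_neighbors("Dave", "The Shawshank Redemption", ratings, k=2)]
prediction = predict("Dave", "The Shawshank Redemption", ratings, k=2)
print(neighbors, round(prediction, 2))
```

The prediction always falls between the lowest and highest neighbor rating, since it is a weighted average with non-negative weights.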
Challenges in Building Recommender Systems
Challenges and Interesting Problems
• Data sparsity
– Users rarely purchase, rate, or click
• The more you see the less you know
– Increasing users or items increases the dimensions we need to learn
• Cold-start problem
– No historical information for new users or items
• Correlation between nearest neighbors
– Harry Potter sequels
• Scalability and recommendation accuracy are often not production-friendly
Dimensionality Reduction
• Say every user who likes “Harry Potter” also likes “The Chronicles of Narnia”
• Generalize movies into generic latent semantic characteristics – {fantasy, novel-based movies, …}
• Reduces dimensions to track and improves scalability
• Reduces data sparsity and improves prediction accuracy
Singular Value Decomposition
Singular value decomposition takes an (m x n) matrix M and produces three matrices:
• S: an (m x n) diagonal matrix with non-negative numbers
• U: an (m x m) matrix
• V: an (n x n) matrix
$M = U \cdot S \cdot V^T$
User Movie Rating Matrix
     Alice   Bob   Charlie   Dave   Ed
A      5      2       1       1     4
B      0      1       1       0     3
C      3      4       1       0     2
D      4      0       0       0     3
E      3      0       2       5     4
F      2      5       1       5     1
The matrix has 6 rows and 5 columns, denoted M (6 x 5).
Computing SVD
Matrix Decomposition
M (m x n) = U (m x m) . S (m x n) . VT (n x n)
Apply SVD – example contd.
• SVD collapses the matrix to a smaller matrix retaining important features
• Pick k dimensions and truncate the matrices to them (here k = 2)
U
A   -0.59    0.37
B   -0.17    0.13
C   -0.36    0.016
D   -0.31    0.51
E   -0.5    -0.03
F   -0.47   -0.75

VT
Alice     -0.59   0.40
Bob       -0.39   0.49
Charlie   -0.20   0.53
Dave      -0.42   0.62
Ed        -0.53   0.44

S
12.65   0
0       5.77
We have reduced our dataset to a 2-dimensional space!
[Figure: plot of the reduced 2-dimensional space, with points labeled Alice and Ed]
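A minimal sketch of a rank-2 truncated SVD for the 6 x 5 example matrix. To stay free of external libraries it uses power iteration with deflation, a technique swapped in here for illustration; in practice a linear-algebra library routine would be the usual choice. The function names are assumptions. The sign of each singular vector pair is arbitrary, so the signs may differ from a table computed elsewhere.

```python
import math
import random

# Rows are movies A-F; columns are Alice, Bob, Charlie, Dave, Ed.
M = [
    [5, 2, 1, 1, 4],
    [0, 1, 1, 0, 3],
    [3, 4, 1, 0, 2],
    [4, 0, 0, 0, 3],
    [3, 0, 2, 5, 4],
    [2, 5, 1, 5, 1],
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def top_k_svd(matrix, k, iterations=2000, seed=0):
    """Return the k largest singular triplets (s, u, v), largest first."""
    rng = random.Random(seed)
    transposed = [list(col) for col in zip(*matrix)]
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in transposed]  # random start vector
        for _ in range(iterations):
            # Deflation: remove components along right singular vectors already found.
            for _, _, found in triplets:
                overlap = dot(v, found)
                v = [a - overlap * b for a, b in zip(v, found)]
            # One power-iteration step on M^T M, followed by normalization.
            w = mat_vec(transposed, mat_vec(matrix, v))
            length = math.sqrt(dot(w, w))
            v = [a / length for a in w]
        mv = mat_vec(matrix, v)
        s = math.sqrt(dot(mv, mv))           # singular value
        u = [a / s for a in mv]              # matching left singular vector
        triplets.append((s, u, v))
    return triplets


(s1, u1, v1), (s2, u2, v2) = top_k_svd(M, 2)
print(round(s1, 2), round(s2, 2))
```

Each movie then has two coordinates (its entries in `u1` and `u2`) and each user has two coordinates (`v1` and `v2`), which is the 2-dimensional space referred to above.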
Finding Recommendations
• A user X comes in with some ratings in the original feature space: [0, 0, 2, 0, 4, 1]
• Map it to a k-dimensional vector: $X_k = X^T \cdot U_k \cdot S_k^{-1}$
• X → [-0.25, -0.15]
• Note this is similar to the problem we solved earlier using cosine similarity
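A minimal sketch of this mapping: the new user's ratings vector is multiplied by the truncated U and then divided by the singular values. `U_K` and `S_K` are the rounded values from the example, and the function name is an assumption.

```python
# Truncated U (one row per movie A-F, k = 2 columns), rounded values from the example.
U_K = [
    [-0.59, 0.37],
    [-0.17, 0.13],
    [-0.36, 0.016],
    [-0.31, 0.51],
    [-0.50, -0.03],
    [-0.47, -0.75],
]
S_K = [12.65, 5.77]  # the two largest singular values


def fold_in(ratings, u_k, s_k):
    """Compute X^T . U_k . S_k^-1 for a ratings vector over the movies."""
    return [
        sum(r * row[d] for r, row in zip(ratings, u_k)) / s_k[d]
        for d in range(len(s_k))
    ]


x = [0, 0, 2, 0, 4, 1]  # the new user's ratings for movies A-F
x_k = fold_in(x, U_K, S_K)
print([round(value, 2) for value in x_k])
```

Dividing by the singular values puts the new user on the same scale as the existing users' coordinates, so the two can be compared directly.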
SVD : Putting it All Together
• Represent data from multiple users and items as a matrix
• Apply SVD and pick k dimensions to reduce our datasets
• Map new users into this low k-dimensional space
• Compute similarity between users in this space
• Provide recommendations based on similar users
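The steps above can be sketched end to end. Assumptions are flagged in the comments: a rating of 0 is treated as "not rated", the existing users' coordinates are recomputed by projecting each column of the ratings matrix with the same mapping as the new user rather than copied from a table, and all function names are hypothetical. `U_K` and `S_K` are the rounded values from the example.

```python
import math

MOVIES = ["A", "B", "C", "D", "E", "F"]
USERS = ["Alice", "Bob", "Charlie", "Dave", "Ed"]
M = [  # rows: movies A-F, columns: USERS
    [5, 2, 1, 1, 4],
    [0, 1, 1, 0, 3],
    [3, 4, 1, 0, 2],
    [4, 0, 0, 0, 3],
    [3, 0, 2, 5, 4],
    [2, 5, 1, 5, 1],
]
U_K = [[-0.59, 0.37], [-0.17, 0.13], [-0.36, 0.016],
       [-0.31, 0.51], [-0.50, -0.03], [-0.47, -0.75]]
S_K = [12.65, 5.77]


def fold_in(ratings):
    """Map a ratings vector over the movies into the k-dimensional space."""
    return [sum(r * row[d] for r, row in zip(ratings, U_K)) / S_K[d]
            for d in range(len(S_K))]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms if norms else 0.0


def nearest_users(query_ratings, k):
    """Rank existing users by cosine similarity to the query in the reduced space."""
    query = fold_in(query_ratings)
    scored = []
    for index, name in enumerate(USERS):
        column = [row[index] for row in M]  # this user's ratings for all movies
        scored.append((cosine(query, fold_in(column)), name))
    return sorted(scored, reverse=True)[:k]


def recommend(query_ratings, k=2):
    """Suggest the unrated movie with the highest similarity-weighted neighbor rating."""
    neighbors = nearest_users(query_ratings, k)
    totals = {}
    for movie_index, movie in enumerate(MOVIES):
        if query_ratings[movie_index] != 0:  # assumption: 0 means "not rated"
            continue
        totals[movie] = sum(sim * M[movie_index][USERS.index(name)]
                            for sim, name in neighbors)
    return max(totals, key=totals.get)


x = [0, 0, 2, 0, 4, 1]
neighbors = [name for _, name in nearest_users(x, 2)]
suggestion = recommend(x)
print(neighbors, suggestion)
```

Working in two dimensions keeps the similarity computation cheap regardless of how many movies exist, which is the scalability benefit of the reduction.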
Computing SVD - Partial
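$A = \begin{pmatrix} 8 & 0 \\ -2 & 1 \end{pmatrix}$, $A^T = \begin{pmatrix} 8 & -2 \\ 0 & 1 \end{pmatrix}$
$A^T \cdot A = \begin{pmatrix} 68 & -2 \\ -2 & 1 \end{pmatrix}$, where the top-left entry is 8 * 8 + (-2) * (-2) = 68
$A^T \cdot A - cI = \begin{pmatrix} 68 - c & -2 \\ -2 & 1 - c \end{pmatrix}$
$|A^T \cdot A - cI| = (68 - c)(1 - c) - (-2)(-2) = c^2 - 69c + 64 = 0$
Solution: $c_1 \approx 68.06$, $c_2 \approx 0.94$
$S = \begin{pmatrix} \sqrt{c_1} & 0 \\ 0 & \sqrt{c_2} \end{pmatrix} \approx \begin{pmatrix} 8.25 & 0 \\ 0 & 0.97 \end{pmatrix}$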
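A minimal sketch of this computation for the 2 x 2 matrix A = [[8, 0], [-2, 1]]: form A^T . A, solve its characteristic equation, and take square roots of the eigenvalues to get the singular values. The helper names are assumptions, and the closed-form eigenvalue formula only applies to symmetric 2 x 2 matrices.

```python
import math

A = [[8, 0], [-2, 1]]


def gram(matrix):
    """Compute A^T . A for a matrix given as a list of rows."""
    size = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(size)]
            for i in range(size)]


def eigenvalues_2x2_symmetric(g):
    """Roots of c^2 - trace * c + det = 0, largest first."""
    trace = g[0][0] + g[1][1]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


G = gram(A)                                   # [[68, -2], [-2, 1]]
c1, c2 = eigenvalues_2x2_symmetric(G)
singular_values = [math.sqrt(c1), math.sqrt(c2)]
print(G, [round(s, 4) for s in singular_values])
```

A quick sanity check is that the product of the singular values equals the absolute value of the determinant of A, which is 8 here.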
http://abeautifulwww.com/wp-content/uploads/2007/04/netflixAllMovies-blackBack3[5].jpg
Cold start problem : unknown users
The Cold Start Problem
A brand new user or item is introduced into a recommendation system, which cannot draw any inferences about it due to lack of historical data
• Ratings or other metrics may contain biases
– Harry Potter 1 is more popular than Harry Potter 2
– Ads or documents often suffer from this problem
– First sight, fatigue, seasonality, …
Explore – Exploit Strategy
• Recommend items at random to a small, randomly chosen set of users
• Ensures user information on each item
• Full randomization may not be possible
• Recommender systems need to choose between
– Exploiting a model to improve quality
– Exploring a new item to reduce uncertainty
• Active research area
Explore / Exploit Schemes for Web Content Optimization, ICDM 2009
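One simple way to trade off exploring and exploiting is an epsilon-greedy policy: with a small probability show a random item, otherwise show the item with the best observed click rate. This is a generic sketch of that policy, not the scheme from the cited paper; the click-through rates, the value of epsilon, and the function name are all made up for illustration.

```python
import random


def epsilon_greedy(true_ctr, rounds=10000, epsilon=0.1, seed=42):
    """Simulate showing one item per round; return (shows, clicks) per item."""
    rng = random.Random(seed)
    n = len(true_ctr)
    shows = [0] * n
    clicks = [0] * n

    def estimate(item):
        # Observed click rate so far; unseen items start at 0.0.
        return clicks[item] / shows[item] if shows[item] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            item = rng.randrange(n)              # explore: pick any item at random
        else:
            item = max(range(n), key=estimate)   # exploit: pick the best estimate
        shows[item] += 1
        if rng.random() < true_ctr[item]:        # simulated user click
            clicks[item] += 1
    return shows, clicks


# Hypothetical click-through rates, unknown to the policy.
shows, clicks = epsilon_greedy([0.05, 0.10, 0.30])
print(shows, clicks)
```

The random rounds guarantee that every item, including a newly added one, keeps collecting some feedback, while most traffic still goes to the item that currently looks best.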
Twitter Promoted Products
Promoted Accounts
• Suggest accounts that you do not follow and may find interesting
• Advertisers increase their followers’ reach
Promoted Tweets in Timeline or Search
Users may expand, retweet, click, favorite, dismiss, …
Further Reading
1. Edwin’s blog: http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
2. Netflix challenge: http://www.technologyreview.com/news/406637/the-1-million-netflix-challenge/
3. Collaborative Filtering: A Tutorial, Carnegie Mellon University
4. Content-Boosted Collaborative Filtering for Improved Recommendations
5. Supervised Random Walks: Predicting and Recommending Links in Social Networks, WSDM 2011
6. The Link Prediction Problem for Social Networks, CIKM 2003
7. Explore/Exploit Schemes for Web Content Optimization, ICDM 2009
8. Multi-armed bandit problem: http://en.wikipedia.org/wiki/Multi-armed_bandit
• Questions?