Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on
Narges Razavian
Thanks to lecture slides from Alex Smola@CMU Yahuda Koren@Yahoo labs and Bing Liu@UIC
We Know What You Ought To Be Watching This Summer
Amazon.com
score movie user
1 21 1
5 213 1
4 345 2
4 123 2
3 768 2
5 76 3
4 45 4
1 568 5
2 342 5
2 234 5
5 76 6
4 56 6
score movie user
? 62 1
? 96 1
? 7 2
? 3 2
? 47 3
? 15 3
? 41 4
? 28 4
? 93 5
? 74 5
? 69 6
? 83 6
Training data Test data
An example
Two basic approaches
• Content-‐based recommenda2ons: • The user will be recommended items based on profile informa2on or similar to the ones the user preferred in the past;
• Collabora2ve filtering (or collabora2ve recommenda2ons): • The user will be recommended items that people with similar tastes and preferences liked in the past.
• Hybrids: Combine collabora2ve and content-‐based methods.
5
Road Map
• Introduc2on • Content-‐based recommenda@on
• Collabora2ve filtering based recommenda2on – K-‐nearest neighbor – Matrix factoriza2on
6
7
Content-‐Based Recommenda2on
• Recommend items that matches the User Profile.
• The Profile is based on items user has liked in the past or explicit interests that he defines.
• A content-‐based recommender system matches the profile of the item to the user profile to decide on its relevancy to the user.
Road Map
• Introduc2on • Content-‐based recommenda2on
• Collabora2ve filtering based recommenda2ons – K-‐nearest neighbor – Matrix factoriza2on
8
Collabora2ve Filtering Idea
A 9 B 3 C : : Z 5
A B C 9 : : Z 10
A 5 B 3 C : : Z 7
A B C 8 : : Z
A 6 B 4 C : : Z
A 10 B 4 C 8 . . Z 1
User Database
Correla2on Match
A 9 B 3 C : : Z 5
A 10 B 4 C 8 . . Z 1
Collabora2ve filtering
• Collabora2ve filtering (CF): most widely-‐used recommenda2on approach in prac2ce. • k-‐nearest neighbor, • matrix factoriza2on
• Key characteris2c of CF: it predicts the u2lity of items for a user based on the items previously rated by other like-‐minded users.
10
k-‐Nearest Neighbor
• kNN : – u2lizes the en2re user-‐item database to generate predic2ons directly, i.e., there is no model building.
• This approach includes both • User-‐based methods • Item-‐based methods
• Two primary phases: • the neighborhood forma2on phase and • the recommenda2on phase.
11
Neighborhood forma2on phase
• The similarity between the target user, u, and a neighbor, v, can be calculated using the Pearson’s correla@on coefficient:
• ru,i is the ra2ng given to item I by user u. C is the list of items rated by BOTH users, u and v
12
Recommenda2on Phase
• Then we can compute the ra2ng predic2on of item i for target user u
where V is the set of k similar users(could be all users), rv,i is the ra2ng of user v given to item i,
13
Issue with the user-‐based kNN CF
• Lack of scalability: • it requires the real-‐2me comparison of the target user to all user records in order to generate predic2ons.
• Any sugges2ons to improve this?
• A varia2on of this approach that remedies this problem is called item-‐based CF.
14
Item-‐based CF
• The item-‐based approach works by comparing items based on their pacern of ra2ngs across users. The similarity of items i and j is computed as follows:
15
Recommenda2on phase
• Ader compu2ng the similarity between items we select a set of k most similar items to the target item and generate a predicted value of user u’s ra2ng
where J is the set of k similar items
16
Prac2cal Issues : Cold Start
• New user – Rate some ini2al items – Non-‐personalized recommenda2ons – Describe tastes – Demographic info.
• New Item – Non-‐CF : content analysis, metadata
Road Map
• Introduc2on • Content-‐based recommenda2on
• Collabora2ve filtering based recommenda2ons – K-‐nearest neighbor – Matrix factoriza@on
18
Geared towards females
Geared towards males
serious
escapist
The Princess Diaries
The Lion King
Braveheart
Lethal Weapon
Independence Day
Amadeus The Color Purple
Dumb and Dumber
Ocean’s 11
Sense and Sensibility
Gus
Dave
Latent factor models
Latent factor models
45531
312445
53432142
24542
522434
42331
items
.2 -.4 .1
.5 .6 -.5
.5 .3 -.2
.3 2.1 1.1
-2 2.1 -.7
.3 .7 -1
-.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 .3 -.2 1.1
1.3 -.1 1.2 -.7 2.9 1.4 -1 .3 1.4 .5 .7 -.8
.1 -.6 .7 .8 .4 -.3 .9 2.4 1.7 .6 -.4 2.1
~
~
items
users
users
Es2mate unknown ra2ngs as inner-‐products of factors:
45531
312445
53432142
24542
522434
42331
items
.2 -.4 .1
.5 .6 -.5
.5 .3 -.2
.3 2.1 1.1
-2 2.1 -.7
.3 .7 -1
-.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 .3 -.2 1.1
1.3 -.1 1.2 -.7 2.9 1.4 -1 .3 1.4 .5 .7 -.8
.1 -.6 .7 .8 .4 -.3 .9 2.4 1.7 .6 -.4 2.1
~
~
items
users
users
?
Es2mate unknown ra2ngs as inner-‐products of factors:
45531
312445
53432142
24542
522434
42331
items
.2 -.4 .1
.5 .6 -.5
.5 .3 -.2
.3 2.1 1.1
-2 2.1 -.7
.3 .7 -1
-.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 .3 -.2 1.1
1.3 -.1 1.2 -.7 2.9 1.4 -1 .3 1.4 .5 .7 -.8
.1 -.6 .7 .8 .4 -.3 .9 2.4 1.7 .6 -.4 2.1
~
~
items
users
users
?
Es2mate unknown ra2ngs as inner-‐products of factors:
45531
312445
53432142
24542
522434
42331
items
.2 -.4 .1
.5 .6 -.5
.5 .3 -.2
.3 2.1 1.1
-2 2.1 -.7
.3 .7 -1
-.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 .3 -.2 1.1
1.3 -.1 1.2 -.7 2.9 1.4 -1 .3 1.4 .5 .7 -.8
.1 -.6 .7 .8 .4 -.3 .9 2.4 1.7 .6 -.4 2.1
~
~
items
users
2.4
users
Challenges • Similar to SVD, but less constrained:
– Factorize with missing values!
• Re-‐define objec2ve func2on:
• Can use gradient descent to deal with missing values
To avoid over-‐filng
Stochas2c Gradient Descent • For each data point,
• Deriva2ves on variables (q and p) are used for update:
• Both p and q are unknown, so we have to alternate – Will converge to local op2ma
Incorpora2ng bias • Some users rate movies higher than others • Some movies get hyped and get higher ra2ngs
• The new model:
• The new objec2ve func2on
• Deriva2ves:
Further modeling assump2ons
• Changing preferences over 2me?
• Varying confidence levels in ra2ngs?
• Other ideas?
Summary
• Recommenda2on based on – Content – Collabora2ve filtering
• Collabora2ve filtering – Neighborhood method – Matrix Factoriza2on
• Possible Further topics – Hybrid models of content and collabora2ve to impute missing values and deal with cold start