Matching Users and Items Across Domains
to Improve the Recommendation Quality
Chung-Yi Li, Shou-De Lin
Department of Computer Science
and Information Engineering,
National Taiwan University
Motivation
Lack of data is a serious concern in building a
recommender system, in particular for newly
established services.
Can we leverage the information from other
domains to improve the quality of a recommender
system?
Problem Definition
Given: Two homogeneous rating matrices
They model the same type of preference.
A decent portion of the users and of the items overlap.
𝐑1: Target Rating Matrix
𝐑2: Source Rating Matrix
Challenge:
The mapping of users is unknown,
and so is the mapping of items.
Goals:
1. Identify the user mapping and
item mapping.
2. Use the identified mappings to
boost the recommendation
performance.
Why This Problem Is Challenging
When the item correspondence is known, the problem is much easier:
Define a user similarity. If two users are highly similar, they are
likely to be the same user. [Narayanan 2008]
In our case, both the user and the item correspondences are unknown,
and there is no clear solution yet.
Basic Idea
low rank assumption and factorization models
[Figure: toy example with users m1 to m5 and items n1 to n4. R1 and R2 are partially observed rating matrices; R2 lists the same users and items in reverse order. R1 is shown as the product of two rank-2 factor matrices. For R2, two different pairs of rank-2 factors are shown: the permuted factors of R1 and a second pair with different values.]
A Two-Stage Model to Find the Matching
Solve G in $R_1 \approx G_{user} R_2 G_{item}^T$
Sizes: $R_1$ is $M_1 \times N_1$, $G_{user}$ is $M_1 \times M_2$, $R_2$ is $M_2 \times N_2$, $G_{item}^T$ is $N_2 \times N_1$
$R_1$ and $R_2$: rating matrices (partially observed)
$G_{user}$ and $G_{item}$: correspondence matrices
1. Latent Space Matching: $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
$\hat{R}_i$ (full): low-rank approximation of $R_i$
Less accurate
2. Matching Refinement: $R_1 \approx G_{user} R_2 G_{item}^T$
More accurate, but harder to solve
Pipeline: 1. Latent Space Matching → Rough Matching Result → 2. Matching Refinement → Final Matching Result
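To make the role of the correspondence matrices concrete, here is a minimal sketch, assuming each row of G holds a single 1 that points to the matched user (or item) in the source domain. The helper name `correspondence_matrix` and the toy mappings are illustrative and do not come from the slides.

```python
import numpy as np

def correspondence_matrix(mapping, n_source):
    """Build a 0/1 matrix G with G[i, mapping[i]] = 1 (one 1 per row)."""
    G = np.zeros((len(mapping), n_source), dtype=int)
    G[np.arange(len(mapping)), mapping] = 1
    return G

# Toy source matrix R2 (4 users x 3 items), fully observed for simplicity.
R2 = np.arange(12).reshape(4, 3)

# Hypothetical mappings: target user i corresponds to source user user_map[i], etc.
user_map = np.array([2, 0, 3, 1])
item_map = np.array([1, 2, 0])

G_user = correspondence_matrix(user_map, n_source=4)
G_item = correspondence_matrix(item_map, n_source=3)

# R1 = G_user R2 G_item^T simply reorders the rows and columns of R2.
R1 = G_user @ R2 @ G_item.T
```

In this sketch, multiplying by `G_user` on the left picks rows of `R2`, and multiplying by `G_item.T` on the right picks columns, so `R1[i, j]` equals `R2[user_map[i], item_map[j]]`.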
Stage 1: Latent Space Matching
We want to solve G from $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
Obstacle: $\hat{R}_1$ and $\hat{R}_2$ are not sparse, so they are hard to compute and store
Solution: Represent $\hat{R}_i$ using user and item latent factors
Next challenge: the latent factor representation must be unique
Regular matrix factorization is not applicable.
Solution: Singular Value Decomposition
Singular values are invariant under permutation.
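The invariance claim can be checked numerically. The following is a small sketch on a random, fully observed matrix (an assumption made for illustration; the slides work with low-rank approximations of partially observed matrices). It also shows that, when the singular values are distinct, the left singular vectors of the permuted matrix are the permuted originals up to a sign per column.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(size=(5, 4))          # toy dense matrix

# Shuffle users (rows) and items (columns).
row_perm = rng.permutation(5)
col_perm = rng.permutation(4)
R_perm = R[row_perm][:, col_perm]

U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_perm, s_perm, Vt_perm = np.linalg.svd(R_perm, full_matrices=False)

# Singular values do not change under row/column permutation.
same_singular_values = np.allclose(s, s_perm)

# Singular vectors are permuted the same way, up to a sign flip per column.
same_vectors_up_to_sign = np.allclose(np.abs(U_perm), np.abs(U[row_perm]))
```

The per-column sign ambiguity seen here is what the sign matrix S accounts for later in the deck.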
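How can we perform SVD on a Partially Observed Matrix?
In MF, we solve for latent factors $P$ and $Q$ using the observed entries only.
Thus, $\hat{R} = P Q^T$
In SVD, we want $\hat{R} = U D V^T$
From P, Q to U, D, V:
$P Q^T = (U_P D_P V_P^T)(U_Q D_Q V_Q^T)^T = U_P (D_P V_P^T V_Q D_Q) U_Q^T = U_P (U_x D_x V_x^T) U_Q^T = (U_P U_x) D_x (U_Q V_x)^T = U D V^T$
This transformation can be done efficiently: only the small K × K middle matrix needs a further SVD.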
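The chain of equalities above can be sketched in NumPy as follows. The function name `svd_from_factors` is hypothetical, and the factors P and Q are assumed to come from any matrix factorization routine; the point is only that the full matrix P Qᵀ is never formed.

```python
import numpy as np

def svd_from_factors(P, Q):
    """Return U, d, V such that P @ Q.T == U @ diag(d) @ V.T.

    P is (M x K) and Q is (N x K); only thin and K x K SVDs are computed.
    """
    U_p, d_p, Vt_p = np.linalg.svd(P, full_matrices=False)  # P = U_p diag(d_p) Vt_p
    U_q, d_q, Vt_q = np.linalg.svd(Q, full_matrices=False)  # Q = U_q diag(d_q) Vt_q

    # K x K middle matrix: D_P V_P^T V_Q D_Q
    middle = (d_p[:, None] * Vt_p) @ (Vt_q.T * d_q[None, :])
    U_x, d_x, Vt_x = np.linalg.svd(middle)

    U = U_p @ U_x          # left singular vectors of P Q^T
    V = U_q @ Vt_x.T       # right singular vectors of P Q^T
    return U, d_x, V
```

A quick check on random factors is to compare `U @ np.diag(d) @ V.T` with `P @ Q.T`, and `d` with the leading singular values returned by `np.linalg.svd(P @ Q.T, compute_uv=False)`.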
Matching in Latent Space
We want to solve G from $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
Now we know how to get $\hat{R}_1 = U_1 D_1 V_1^T$ and $\hat{R}_2 = U_2 D_2 V_2^T$
Thus $U_1 D_1 V_1^T \approx G_{user} U_2 D_2 V_2^T G_{item}^T$
Since the SVD is unique (up to the signs of the singular vectors), we can separate the user and item sides into two subproblems of the same form.
S: sign matrix (K by K, diagonal, -1 or 1)
Solving
$Z_1 \approx G_{user} Z_2$, where $Z_1$ is $M_1 \times K$, $G_{user}$ is $M_1 \times M_2$, and $Z_2$ is $M_2 \times K$
S (sign matrix): K by K, diagonal, +1 or -1
1. When S is given, to solve G: nearest neighbor search
• only enforce row constraints on G.
2. To solve S: greedy search
• Iteratively try $S_{kk}$
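The two steps can be sketched as below. This is an illustration under stated assumptions, not the authors' exact procedure: S is assumed to flip the columns of Z2, the match for each row of Z1 is its nearest row of Z2 in squared Euclidean distance (row constraints only, so several rows may pick the same match), and the greedy search flips one sign at a time and keeps a flip only if the total distance drops. The function names are hypothetical.

```python
import numpy as np

def match_rows(Z1, Z2, signs):
    """Nearest-neighbor match of each row of Z1 to a row of Z2 with flipped columns."""
    Z2s = Z2 * signs                                   # apply the diagonal sign matrix
    dist = ((Z1[:, None, :] - Z2s[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1).sum()

def greedy_signs(Z1, Z2, n_passes=3):
    """Greedily try flipping each S_kk, keeping flips that lower the matching cost."""
    signs = np.ones(Z1.shape[1])
    match, cost = match_rows(Z1, Z2, signs)
    for _ in range(n_passes):
        improved = False
        for k in range(len(signs)):
            trial = signs.copy()
            trial[k] = -trial[k]
            trial_match, trial_cost = match_rows(Z1, Z2, trial)
            if trial_cost < cost:
                signs, match, cost = trial, trial_match, trial_cost
                improved = True
        if not improved:
            break
    return signs, match, cost

# Toy check: Z1 is Z2 with one column flipped and the rows shuffled.
Z2 = np.arange(15, dtype=float).reshape(5, 3)
true_signs = np.array([1.0, -1.0, 1.0])
perm = np.array([3, 0, 4, 1, 2])
Z1 = (Z2 * true_signs)[perm]
signs, match, cost = greedy_signs(Z1, Z2)
```

Greedy search is a heuristic and can stop in a local optimum; it is used here because trying all 2^K sign patterns quickly becomes impractical as K grows.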
Matching Refinement:
$R_1 \approx G_{user} R_2 G_{item}^T$
More accurate but harder to solve.
Obtain a good initialization and a reduced
search space from latent space matching.
Solve $G_{user}$ and $G_{item}$ alternately.
The objective value always decreases
and converges.
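A minimal sketch of the alternating scheme, under simplifying assumptions that are not in the slides: both matrices are fully observed integer arrays, each G is represented by an index array (one match per row, row constraints only), and the objective is the squared error between R1 and the reordered R2. The function names are hypothetical.

```python
import numpy as np

def objective(R1, R2, u, v):
    """Squared error between R1 and R2 reordered by user map u and item map v."""
    return int(((R1 - R2[np.ix_(u, v)]) ** 2).sum())

def update_users(R1, R2, v):
    """With the item map fixed, match each user of R1 to the closest user of R2."""
    R2v = R2[:, v]                                          # align item order
    cost = ((R1[:, None, :] - R2v[None, :, :]) ** 2).sum(axis=2)
    return cost.argmin(axis=1)

def update_items(R1, R2, u):
    """With the user map fixed, the item update is the same problem on transposes."""
    return update_users(R1.T, R2.T, u)

def refine(R1, R2, u, v, n_iter=10):
    """Alternate user and item updates; record the objective after each round."""
    history = [objective(R1, R2, u, v)]
    for _ in range(n_iter):
        u = update_users(R1, R2, v)
        v = update_items(R1, R2, u)
        history.append(objective(R1, R2, u, v))
        if history[-1] == history[-2]:                      # no further change
            break
    return u, v, history
```

Each update minimizes the objective over one map while the other is held fixed, so the recorded objective values never increase in this sketch. Handling missing ratings and restricting candidates to a reduced search space, as the slide mentions, are left out.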
Goals
1. Identify the user mapping and item mapping
2. Then, use the identified mappings to boost
recommendation performance
Transferring Imperfect Matching to Predict Ratings
Matched latent factors are constrained to be similar
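One common way to express "matched factors should be similar" is a penalty term added to a joint factorization loss. The sketch below is an assumed form for illustration only; the slides do not give the exact objective. `user_match` and `item_match` are hypothetical index arrays produced by the matching stages, the masks mark observed ratings, and `beta` is an assumed weight on the similarity penalty.

```python
import numpy as np

def joint_loss(R1, mask1, R2, mask2, P1, Q1, P2, Q2, user_match, item_match, beta):
    """Squared error on observed ratings of both domains plus a tie penalty.

    user_match[i] / item_match[j] give the domain-2 index matched to
    user i / item j of domain 1 (assumed inputs, possibly imperfect).
    """
    fit1 = (mask1 * (R1 - P1 @ Q1.T) ** 2).sum()      # target-domain fit
    fit2 = (mask2 * (R2 - P2 @ Q2.T) ** 2).sum()      # source-domain fit
    tie = ((P1 - P2[user_match]) ** 2).sum() + ((Q1 - Q2[item_match]) ** 2).sum()
    return fit1 + fit2 + beta * tie
```

Because the penalty is soft, a wrongly matched pair only pulls two factor vectors toward each other rather than forcing them to be equal, which is one way an imperfect matching can still be used.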
Experiment Setup
• Yahoo! Music Dataset
[Figure: how users and items are divided between the training set of R1 and the training set of R2 under five settings: Disjoint Split, Overlap Split, Contained Split, Subset Split, Partial Split]
Accuracy and Mean Average Precision: the higher the better
Rating Prediction (Root Mean Square Error)
RMSE: the lower the better
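For reference, RMSE over a set of held-out ratings can be computed as in this short sketch (the function name and inputs are illustrative):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

Squaring the errors before averaging means large individual mistakes weigh more heavily than small ones.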
Conclusion
It is possible to identify user or item
correspondence in an unsupervised manner, based on
homogeneous rating data.
Even with imperfect matching, our model can still
improve the recommendation accuracy.
Questions?