Matching Users and Items Across Domains to Improve the Recommendation Quality

Chung-Yi Li, Shou-De Lin

Department of Computer Science and Information Engineering, National Taiwan University

Motivation

Lack of data is a serious concern in building a recommender system, particularly for newly established services.

Can we leverage information from other domains to improve the quality of a recommender system?

Problem Definition

Given: two homogeneous rating matrices, $R_1$ (the target rating matrix) and $R_2$ (the source rating matrix).

• They model the same type of preference.

• A decent portion of their users and of their items overlap.

Challenge: the mapping of users is unknown, and so is the mapping of items.

Goals:

1. Identify the user mapping and item mapping.

2. Use the identified mappings to boost the recommendation performance.

Why This Problem Is Challenging

When the item correspondence is known, the problem is much easier: define a user similarity, and if the similarity between two users is large, they are likely to be the same user [Narayanan 2008].
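A minimal sketch of this easier setting, assuming the columns of both matrices already refer to the same items in the same order. Cosine similarity over zero-filled ratings, the toy data, and the function name `match_users_known_items` are illustrative assumptions, not the method of [Narayanan 2008].

```python
import numpy as np

def match_users_known_items(R1, R2):
    """For each user (row) of R1, return the index of the most similar user in R2.

    Assumption: column j of R1 and column j of R2 are the same item.
    NaN marks an unobserved rating and is treated as 0 (a simplification).
    """
    A = np.nan_to_num(R1, nan=0.0)
    B = np.nan_to_num(R2, nan=0.0)
    # Normalize rows so the dot product becomes a cosine similarity.
    A = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    B = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    sim = A @ B.T                      # shape (users in R1, users in R2)
    return sim.argmax(axis=1), sim

# Toy data: the users of R2 are the users of R1 in the order (2, 0, 1).
R1 = np.array([[5.0, np.nan, 1.0],
               [np.nan, 4.0, 4.0],
               [1.0, 5.0, np.nan]])
R2 = np.array([[1.0, 5.0, 2.0],
               [5.0, np.nan, 1.0],
               [1.0, 4.0, 4.0]])
match, sim = match_users_known_items(R1, R2)
print(match)
```

Once the item side is also unknown, the columns are no longer comparable, so this row-wise similarity cannot be computed directly.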

In our case, both sides are unknown, and there is no clear solution yet.

Basic Idea

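Low-rank assumption and factorization models

Example (users $m_1, \dots, m_5$ as rows, items $n_1, \dots, n_4$ as columns; "?" marks an unobserved rating):

$$R_1 = \begin{bmatrix} 1 & ? & 3 & ? \\ ? & 3 & ? & 5 \\ 4 & ? & 6 & ? \\ ? & 1 & ? & 1 \\ 9 & ? & 9 & ? \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 0 & 1 \\ 0 & 9 \end{bmatrix} \times \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

$R_2$ lists the users in the order $m_5, \dots, m_1$ and the items in the order $n_4, \dots, n_1$:

$$R_2 = \begin{bmatrix} 9 & ? & 9 & ? \\ ? & 1 & ? & 1 \\ 7 & ? & 5 & ? \\ ? & 4 & ? & 2 \\ 4 & ? & 2 & ? \end{bmatrix} = \begin{bmatrix} 0 & 9 \\ 0 & 1 \\ 1 & 3 \\ 1 & 1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} 4 & 3 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

The same $R_2$ is also reproduced by a different pair of factors, from which the correspondence of users and items ("?") is no longer apparent:

$$R_2 = \begin{bmatrix} -18 & -9 \\ -2 & -1 \\ -9 & -5 \\ -5 & -3 \\ -3 & -2 \end{bmatrix} \times \begin{bmatrix} 2 & 1 & 0 & -1 \\ -5 & -3 & -1 & 1 \end{bmatrix}$$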
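The numbers in this example can be checked with a few lines of NumPy. The variable names are illustrative. The sketch assumes, as the factors on the slide suggest, that the second matrix is the first one with users and items listed in reverse order.

```python
import numpy as np

# Factors of R1 from the slide (users m1..m5, items n1..n4).
P1 = np.array([[1, 0], [1, 1], [1, 3], [0, 1], [0, 9]], dtype=float)
Q1 = np.array([[1, 2, 3, 4], [1, 1, 1, 1]], dtype=float)   # K x N
R1_full = P1 @ Q1

# R2 lists users as m5..m1 and items as n4..n1.
R2_full = R1_full[::-1, ::-1]

# Factorization A of R2: simply the permuted factors of R1.
P2a, Q2a = P1[::-1], Q1[:, ::-1]

# Factorization B of R2: the other pair of factors shown on the slide.
P2b = np.array([[-18, -9], [-2, -1], [-9, -5], [-5, -3], [-3, -2]], dtype=float)
Q2b = np.array([[2, 1, 0, -1], [-5, -3, -1, 1]], dtype=float)

print(np.allclose(P2a @ Q2a, R2_full))   # both factorizations reproduce R2
print(np.allclose(P2b @ Q2b, R2_full))
```

Both products give the same matrix, yet only the first pair of factors lines up with the factors of the first matrix. This is why a plain factorization cannot be used directly for matching.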

A Two-Stage Model to Find the Matching

Solve for G in

$$\underbrace{R_1}_{M_1 \times N_1} \approx \underbrace{G_{user}}_{M_1 \times M_2} \, \underbrace{R_2}_{M_2 \times N_2} \, \underbrace{G_{item}^T}_{N_2 \times N_1}$$

$R_1$ and $R_2$: rating matrices (partially observed)

$G_{user}$ and $G_{item}$: correspondence matrices

1. Latent Space Matching: $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$

$\hat{R}_i$ (full): low-rank approximation of $R_i$

Less accurate; produces the rough matching result

2. Matching Refinement: $R_1 \approx G_{user} R_2 G_{item}^T$

More accurate, but harder to solve; produces the final matching result
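To make the role of the correspondence matrices concrete, here is a small sketch that builds 0/1 matrices from index mappings and applies them to a toy source matrix. The helper `correspondence_matrix` and the toy mappings are hypothetical, and the source matrix is fully observed here for simplicity.

```python
import numpy as np

def correspondence_matrix(mapping, n_source):
    """Build G with G[i, mapping[i]] = 1: target row i corresponds to source row mapping[i]."""
    G = np.zeros((len(mapping), n_source))
    G[np.arange(len(mapping)), mapping] = 1.0
    return G

R2 = np.arange(12, dtype=float).reshape(4, 3)   # M2 = 4 users, N2 = 3 items
user_map = [2, 0, 3]                            # M1 = 3 target users
item_map = [1, 2]                               # N1 = 2 target items

G_user = correspondence_matrix(user_map, 4)     # M1 x M2
G_item = correspondence_matrix(item_map, 3)     # N1 x N2
R1 = G_user @ R2 @ G_item.T                     # M1 x N1

print(R1)
```

Multiplying by the two matrices simply selects and reorders rows and columns of the source matrix, which is why finding them amounts to finding the user and item mappings.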

Stage 1: Latent Space Matching

We want to solve for G from $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$.

Obstacle: $\hat{R}_1$ and $\hat{R}_2$ are not sparse, so they are hard to compute and store.

Solution: represent $\hat{R}_i$ using user and item latent factors.

Next challenge: the latent factor representation must be unique.

Regular matrix factorization is not applicable.

Solution: Singular Value Decomposition

Singular values are invariant under permutation.
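Both claims can be illustrated numerically. The sketch below uses random toy matrices and illustrative variable names. It shows that an ordinary factorization can be changed by any invertible matrix without changing the product, while the singular values stay the same when rows and columns are permuted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Regular MF is not unique: (P A)(Q A^{-T})^T equals P Q^T for any invertible A.
P = rng.normal(size=(5, 2))
Q = rng.normal(size=(4, 2))
A = np.array([[2.0, 1.0], [0.0, 1.0]])
P_alt = P @ A
Q_alt = Q @ np.linalg.inv(A).T
print(np.allclose(P_alt @ Q_alt.T, P @ Q.T))

# Singular values do not change when rows and columns are permuted.
R = P @ Q.T
R_perm = R[rng.permutation(5)][:, rng.permutation(4)]
s = np.linalg.svd(R, compute_uv=False)
s_perm = np.linalg.svd(R_perm, compute_uv=False)
print(np.allclose(s, s_perm))
```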

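How can we perform SVD on a Partially Observed Matrix?

In MF, we solve for factors $P$ and $Q$ on the observed ratings. Thus, $\hat{R} = P Q^T$.

In SVD, we want $\hat{R} = U D V^T$.

From $P$, $Q$ to $U$, $D$, $V$:

$$P Q^T = U_P \left( D_P V_P^T V_Q D_Q^T \right) U_Q^T = U_P \left( U_x D_x V_x^T \right) U_Q^T = (U_P U_x) \, D_x \, (U_Q V_x)^T = U D V^T$$

Here $P = U_P D_P V_P^T$ and $Q = U_Q D_Q V_Q^T$ are the SVDs of the two factors, and $U_x D_x V_x^T$ is the SVD of the small matrix in parentheses.

This transformation can be done efficiently.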
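The transformation from the MF factors to an SVD can be sketched as follows. The function name `factors_to_svd` is illustrative, and the factors are assumed to be dense arrays with one row per user or item; only small SVDs of the factors are taken, never an SVD of the full completed matrix.

```python
import numpy as np

def factors_to_svd(P, Q):
    """Given R_hat = P @ Q.T, return U, d, V with R_hat = U @ diag(d) @ V.T."""
    Up, dp, Vpt = np.linalg.svd(P, full_matrices=False)   # P = Up diag(dp) Vpt
    Uq, dq, Vqt = np.linalg.svd(Q, full_matrices=False)   # Q = Uq diag(dq) Vqt
    # Small K x K core: D_P V_P^T V_Q D_Q
    core = (dp[:, None] * Vpt) @ (Vqt.T * dq[None, :])
    Ux, dx, Vxt = np.linalg.svd(core)
    U = Up @ Ux            # left singular vectors of R_hat
    V = Uq @ Vxt.T         # right singular vectors of R_hat
    return U, dx, V

rng = np.random.default_rng(1)
P = rng.normal(size=(6, 3))
Q = rng.normal(size=(5, 3))
U, d, V = factors_to_svd(P, Q)
print(np.allclose((U * d) @ V.T, P @ Q.T))
```

The cost is dominated by the thin SVDs of the two factors, so it grows linearly with the number of users and items for a fixed number of latent dimensions.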

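Matching in Latent Space

We want to solve for G from

$$\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$$

Now we know how to get $\hat{R}_1 = U_1 D_1 V_1^T$ and $\hat{R}_2 = U_2 D_2 V_2^T$. Thus

$$U_1 D_1 V_1^T \approx G_{user} U_2 D_2 V_2^T G_{item}^T$$

Since the SVD is unique (up to the signs of the singular vectors, captured by $S$), we can separate the user and item sides into two subproblems of the same form.

$S$: sign matrix ($K \times K$, diagonal, entries $-1$ or $1$)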
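A short derivation sketch of why the two sides can be separated. It rests on assumptions that are not stated on the slide: the correspondence matrices act as permutations (so $G^T G = I$), and the singular values are distinct. The exact form used by the authors may differ, for example in how the singular values are distributed between the two sides.

```latex
\begin{align*}
U_1 D_1 V_1^T
  &\approx G_{user}\, U_2 D_2 V_2^T\, G_{item}^T
   = (G_{user} U_2)\, D_2\, (G_{item} V_2)^T \\[4pt]
(G_{user} U_2)^T (G_{user} U_2)
  &= U_2^T G_{user}^T G_{user} U_2 = U_2^T U_2 = I
  \qquad \text{(likewise for } G_{item} V_2\text{)} \\[4pt]
\Rightarrow\quad
D_1 \approx D_2, \qquad
U_1 &\approx G_{user}\, U_2\, S, \qquad
V_1 \approx G_{item}\, V_2\, S
\end{align*}
```

The right-hand side of the first line is itself an SVD, so uniqueness forces the factors to agree up to a sign flip of each singular vector pair. The same sign matrix appears on both sides because $S D_2 S = D_2$ for a diagonal $S$ with entries $\pm 1$.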

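Solving $Z_1 \approx G_{user} Z_2$

Dimensions: $Z_1$ is $M_1 \times K$, $G_{user}$ is $M_1 \times M_2$, and $Z_2$ is $M_2 \times K$.

$S$ (sign matrix): $K \times K$, diagonal, entries $+1$ or $-1$

1. When $S$ is given, solve for $G$ by nearest neighbor search.

• Only enforce the row constraints on $G$.

2. To solve for $S$: greedy search.

• Iteratively try each $S_{kk}$.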
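The two steps can be sketched together as follows. Several details are assumptions made for illustration: the sign matrix is applied to the columns of the second latent matrix, distance is squared Euclidean, each row of the first matrix independently picks its nearest row (the row constraint only), and sign flips are retried until none of them lowers the total distance. The function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_match(Z1, Z2S):
    """Match every row of Z1 to its nearest row of Z2S (squared Euclidean distance)."""
    d = ((Z1[:, None, :] - Z2S[None, :, :]) ** 2).sum(axis=-1)   # (M1, M2)
    match = d.argmin(axis=1)
    cost = d[np.arange(len(Z1)), match].sum()
    return match, cost

def greedy_sign_search(Z1, Z2):
    """Flip one diagonal entry of S at a time and keep the flip if the cost drops."""
    K = Z1.shape[1]
    s = np.ones(K)
    match, best = nearest_neighbor_match(Z1, Z2 * s)
    improved = True
    while improved:
        improved = False
        for k in range(K):
            s[k] = -s[k]                       # try flipping S_kk
            m, c = nearest_neighbor_match(Z1, Z2 * s)
            if c < best - 1e-12:
                match, best, improved = m, c, True
            else:
                s[k] = -s[k]                   # undo the flip
    return s, match, best

# Toy data: Z1 is Z2 with the second dimension negated and rows reordered.
Z2 = np.array([[1.0, 2.0], [3.0, -1.0], [-2.0, 4.0]])
Z1 = (Z2 * np.array([1.0, -1.0]))[[2, 0, 1]]
s, match, cost = greedy_sign_search(Z1, Z2)
print(s, match, cost)
```

A greedy search like this can stop in a local optimum, which is consistent with the slides treating the result of this stage as only a rough matching.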

Matching Refinement: $R_1 \approx G_{user} R_2 G_{item}^T$

More accurate but harder to solve.

Obtain a good initialization and a reduced search space from latent space matching.

Solve for $G_{user}$ and $G_{item}$ alternately.

The objective value always decreases and converges.
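A minimal sketch of the alternating scheme, under simplifying assumptions that go beyond the slide: both matrices are small and fully observed, the objective is the squared error between the first matrix and the reordered second one, and every user or item searches over all candidates rather than a reduced search space. The names `refine` and `objective` are hypothetical.

```python
import numpy as np

def objective(R1, R2, g_user, g_item):
    """Squared error between R1 and R2 reordered by the current mappings."""
    return ((R1 - R2[g_user][:, g_item]) ** 2).sum()

def refine(R1, R2, g_user, g_item, n_iters=10):
    """Alternately re-match users (items fixed) and items (users fixed)."""
    g_user, g_item = g_user.copy(), g_item.copy()
    for _ in range(n_iters):
        # Fix the item mapping and pick the best source user for each target user.
        R2_items = R2[:, g_item]                                        # (M2, N1)
        d = ((R1[:, None, :] - R2_items[None, :, :]) ** 2).sum(axis=-1)
        new_user = d.argmin(axis=1)
        # Fix the user mapping and pick the best source item for each target item.
        R2_users = R2[new_user, :]                                      # (M1, N2)
        d = ((R1.T[:, None, :] - R2_users.T[None, :, :]) ** 2).sum(axis=-1)
        new_item = d.argmin(axis=1)
        if np.array_equal(new_user, g_user) and np.array_equal(new_item, g_item):
            break                                                       # converged
        g_user, g_item = new_user, new_item
    return g_user, g_item

R1 = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [4, 5, 6, 7],
               [1, 1, 1, 1], [9, 9, 9, 9]], dtype=float)
R2 = R1[::-1, ::-1].copy()                 # users and items in reverse order
init_user = np.array([4, 2, 3, 1, 0])      # rough matching with two users swapped
init_item = np.array([3, 2, 0, 1])         # rough matching with two items swapped
g_user, g_item = refine(R1, R2, init_user, init_item)
print(g_user, g_item, objective(R1, R2, g_user, g_item))
```

Each half-step minimizes the error row by row with the other side held fixed, so the objective cannot increase from one step to the next.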

Goals

1. Identify the user mapping and item mapping.

2. Then, use the identified mappings to boost recommendation performance.

Transferring Imperfect Matching to Predict Ratings

Matched latent factors are constrained to be similar.
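One way to express such a constraint is a penalty term added to the usual factorization loss of both domains. The sketch below is an assumption about the general shape of such an objective, not the authors' exact model: `transfer_loss`, the weight `beta`, and the squared-distance penalty are all illustrative.

```python
import numpy as np

def transfer_loss(R1, R2, P1, Q1, P2, Q2, user_pairs, item_pairs, beta=1.0):
    """Fit both rating matrices and pull matched latent factors together.

    NaN marks an unobserved rating. user_pairs and item_pairs hold
    (index in domain 1, index in domain 2) tuples from the matching stage.
    """
    def fit(R, P, Q):
        observed = ~np.isnan(R)
        err = np.where(observed, R - P @ Q.T, 0.0)   # ignore unobserved entries
        return (err ** 2).sum()

    penalty = sum(((P1[a] - P2[b]) ** 2).sum() for a, b in user_pairs)
    penalty += sum(((Q1[a] - Q2[b]) ** 2).sum() for a, b in item_pairs)
    return fit(R1, P1, Q1) + fit(R2, P2, Q2) + beta * penalty

R = np.array([[1.0, np.nan], [np.nan, 4.0]])
P = np.array([[1.0], [2.0]])
Q = np.array([[1.0], [2.0]])
loss_good = transfer_loss(R, R, P, Q, P, Q, [(0, 0), (1, 1)], [(0, 0), (1, 1)])
loss_bad = transfer_loss(R, R, P, Q, P, Q, [(0, 1)], [], beta=0.5)
print(loss_good, loss_bad)
```

Because the penalty is soft, a wrongly matched pair only adds a bounded cost instead of forcing the two factors to be identical, which is one way an imperfect matching can still be tolerated.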

Experiment Setup

• Yahoo! Music Dataset

• Splits (figure: training set of $R_1$ and training set of $R_2$, plotted over users and items): Disjoint Split, Overlap Split, Contained Split, Subset Split, Partial Split

Accuracy and Mean Average Precision: the higher the better

Rating Prediction (Root Mean Square Error)

RMSE: the lower the better
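For reference, the rating prediction metric can be computed as below. The helper name `rmse` and the toy numbers are illustrative and unrelated to the reported results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

perfect = rmse([1, 2, 3], [1, 2, 3])
off = rmse([0, 0], [3, 4])      # sqrt((9 + 16) / 2)
print(perfect, off)
```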

Conclusion

It is possible to identify user and item correspondence in an unsupervised manner from homogeneous rating data.

Even with imperfect matching, our model can still improve the recommendation accuracy.

Questions?
