Matching Users and Items Across Domains to Improve the Recommendation Quality Chung-Yi Li, Shou-De Lin [email protected] [email protected] Department of Computer Science and Information Engineering, National Taiwan University 1

  • Matching Users and Items Across Domains

    to Improve the Recommendation Quality

    Chung-Yi Li, Shou-De Lin

    [email protected]

    [email protected]

    Department of Computer Science

    and Information Engineering,

    National Taiwan University


  • Motivation

    Lack of data is a serious concern in building a

    recommender system, in particular for newly

    established services.

    Can we leverage the information from other

    domains to improve the quality of a recommender

    system?


  • Problem Definition

    Given: Two homogeneous rating matrices

    They model the same type of preference.

    Decent portion of overlap in users and in items.

    [Figure: 𝐑1, the target rating matrix, and 𝐑2, the source rating matrix]

    Challenge:

    The mapping of users is unknown,

    and so is the mapping of items.

    Goals:

    1. Identify the user mapping and

    item mapping.

    2. Use the identified mappings to

    boost the recommendation

    performance.


  • Why This Problem Is Challenging

    When item correspondence is known, the problem is much easier:

    define a user similarity; if the similarity between two users is large, they are likely to be the same user. [Narayanan 2008]

    In our case, both the user and the item correspondence are unknown, and there is no clear solution yet.
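To make the easier case concrete, here is a minimal sketch of user matching when the item columns are already aligned. It uses plain cosine similarity as a stand-in; this is an assumption for illustration and not the scoring used in [Narayanan 2008]. The function name `match_users_known_items` and the toy ratings are hypothetical.

```python
import numpy as np

def match_users_known_items(R1, R2):
    """Match each user (row) of R1 to the most similar user (row) of R2.

    Assumes both matrices list the same items in the same column order
    and that 0 means "not rated". Similarity is cosine similarity.
    """
    A = R1 / np.maximum(np.linalg.norm(R1, axis=1, keepdims=True), 1e-12)
    B = R2 / np.maximum(np.linalg.norm(R2, axis=1, keepdims=True), 1e-12)
    sim = A @ B.T                      # sim[i, j] = cosine(R1 user i, R2 user j)
    return sim.argmax(axis=1), sim

# Toy data: R1 holds the same users as R2, listed in a different order.
R2 = np.array([[5.0, 0.0, 1.0, 0.0],
               [0.0, 4.0, 0.0, 2.0],
               [1.0, 1.0, 5.0, 0.0]])
R1 = R2[[2, 0, 1]]
match, sim = match_users_known_items(R1, R2)
print(match)
```

Once the item correspondence is also unknown, the columns of `R1` and `R2` are no longer comparable, so a row-wise similarity like this cannot be computed directly.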


  • Basic Idea

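    Low-rank assumption and factorization models

    Observed ratings (? = unobserved). 𝐑1 has rows m1..m5 and columns n1..n4; 𝐑2 has rows m5..m1 and columns n4..n1:

    $$\mathbf{R}_1 = \begin{pmatrix} 1 & ? & 3 & ? \\ ? & 3 & ? & 5 \\ 4 & ? & 6 & ? \\ ? & 1 & ? & 1 \\ 9 & ? & 9 & ? \end{pmatrix} \qquad \mathbf{R}_2 = \begin{pmatrix} 9 & ? & 9 & ? \\ ? & 1 & ? & 1 \\ 7 & ? & 5 & ? \\ ? & 4 & ? & 2 \\ 4 & ? & 2 & ? \end{pmatrix}$$

    Rank-2 factorization for 𝐑1 (rows m1..m5, columns n1..n4):

    $$\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 0 & 1 \\ 0 & 9 \end{pmatrix} \times \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

    Two rank-2 factorizations for 𝐑2 (rows m5..m1, columns n4..n1). The first is a reordering of the 𝐑1 factors; the second gives the same product, but its relation to the 𝐑1 factors is unclear (?):

    $$\begin{pmatrix} 0 & 9 \\ 0 & 1 \\ 1 & 3 \\ 1 & 1 \\ 1 & 0 \end{pmatrix} \times \begin{pmatrix} 4 & 3 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} -18 & -9 \\ -2 & -1 \\ -9 & -5 \\ -5 & -3 \\ -3 & -2 \end{pmatrix} \times \begin{pmatrix} 2 & 1 & 0 & -1 \\ -5 & -3 & -1 & 1 \end{pmatrix}$$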
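The numbers on this slide can be checked directly. The sketch below multiplies out the factor pairs as read off the slide and confirms that two different factorizations reproduce the same completed 𝐑2, which is why plain factor matrices cannot be compared row by row. The variable names are hypothetical.

```python
import numpy as np

# Factors for R1 (rows m1..m5, columns n1..n4), as read off the slide.
P1 = np.array([[1, 0], [1, 1], [1, 3], [0, 1], [0, 9]])
Q1t = np.array([[1, 2, 3, 4], [1, 1, 1, 1]])
R1_full = P1 @ Q1t

# R2 lists the same users and items in reverse order (m5..m1, n4..n1).
P2a = P1[::-1]
Q2a_t = Q1t[:, ::-1]
R2_full = P2a @ Q2a_t

# A second factorization of the same R2, also from the slide.
P2b = np.array([[-18, -9], [-2, -1], [-9, -5], [-5, -3], [-3, -2]])
Q2b_t = np.array([[2, 1, 0, -1], [-5, -3, -1, 1]])

same_matrix = np.array_equal(P2b @ Q2b_t, R2_full)   # both give the same R2
same_factors = np.array_equal(P2b, P2a)              # but the factors differ
print(same_matrix, same_factors)
```

The rows of `P2a` are a permutation of the rows of `P1`, so matching would be easy; the rows of `P2b` are not, even though both describe the same ratings.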

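  • A Two-Stage Model to Find the Matching

    Solve G in

    $$\underbrace{\mathbf{R}_1}_{M_1 \times N_1} \approx \underbrace{\mathbf{G}_{\text{user}}}_{M_1 \times M_2} \; \underbrace{\mathbf{R}_2}_{M_2 \times N_2} \; \underbrace{\mathbf{G}_{\text{item}}^T}_{N_2 \times N_1}$$

    𝐑1 and 𝐑2: rating matrices (partially observed)

    𝐆user and 𝐆item: correspondence matrices

    1. Latent Space Matching: $\hat{\mathbf{R}}_1 \approx \mathbf{G}_{\text{user}} \hat{\mathbf{R}}_2 \mathbf{G}_{\text{item}}^T$

       $\hat{\mathbf{R}}_i$ (full): low-rank approximation of 𝐑i

       Less accurate

    2. Matching Refinement: $\mathbf{R}_1 \approx \mathbf{G}_{\text{user}} \mathbf{R}_2 \mathbf{G}_{\text{item}}^T$

       More accurate, but harder to solve

    Pipeline: 1. Latent Space Matching → Rough Matching Result → 2. Matching Refinement → Final Matching Result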
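As a small illustration of what the correspondence matrices do, the sketch below builds 𝐆user and 𝐆item from index maps and checks that multiplying by them simply re-indexes the rows and columns of 𝐑2. It assumes the ideal case of fully observed matrices and a one-to-one mapping; `user_map` and `item_map` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 3
R2 = rng.integers(1, 6, size=(M, N)).astype(float)

# user_map[i] = index in domain 2 of user i from domain 1 (same for items).
user_map = np.array([2, 0, 3, 1])
item_map = np.array([1, 2, 0])

G_user = np.eye(M)[user_map]   # M1 x M2, exactly one 1 per row
G_item = np.eye(N)[item_map]   # N1 x N2, exactly one 1 per row

R1 = G_user @ R2 @ G_item.T    # M1 x N1

# The matrix product is the same as re-indexing rows and columns of R2.
reindexed = R2[np.ix_(user_map, item_map)]
print(np.array_equal(R1, reindexed))
```

Finding the matching therefore amounts to recovering `user_map` and `item_map` when only `R1` and `R2` are given.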

  • Stage 1: Latent Space Matching

    We want to solve G from $\hat{\mathbf{R}}_1 \approx \mathbf{G}_{\text{user}} \hat{\mathbf{R}}_2 \mathbf{G}_{\text{item}}^T$.

    Obstacle: $\hat{\mathbf{R}}_1$ and $\hat{\mathbf{R}}_2$ are not sparse, so they are hard to compute and store.

    Solution: represent $\hat{\mathbf{R}}_i$ using user and item latent factors.

    Next challenge: the latent factor representation must be unique, so regular matrix factorization is not applicable.

    Solution: Singular Value Decomposition.

    Singular values are invariant under permutation.
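The invariance claim is easy to verify numerically. This sketch shuffles the rows and columns of a random matrix and compares the singular values before and after; the matrix and permutations are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.normal(size=(6, 4))

row_perm = rng.permutation(6)
col_perm = rng.permutation(4)
R_shuffled = R[row_perm][:, col_perm]

# Singular values only; the singular vectors are permuted, the values are not.
s_original = np.linalg.svd(R, compute_uv=False)
s_shuffled = np.linalg.svd(R_shuffled, compute_uv=False)
print(np.allclose(s_original, s_shuffled))
```

Because the singular values agree, the k-th singular vectors of the two matrices describe the same latent direction, up to the row order and a possible sign flip.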

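  • How can we perform SVD on a Partially Observed Matrix?

    In MF, we solve for latent factors 𝐏 and 𝐐 that fit the observed ratings. Thus, $\hat{\mathbf{R}} = \mathbf{P}\mathbf{Q}^T$.

    In SVD, we want $\hat{\mathbf{R}} = \mathbf{U}\mathbf{D}\mathbf{V}^T$.

    From P, Q to U, D, V:

    $$\begin{aligned} \mathbf{P}\mathbf{Q}^T &= (\mathbf{U}_P \mathbf{D}_P \mathbf{V}_P^T)(\mathbf{U}_Q \mathbf{D}_Q \mathbf{V}_Q^T)^T \\ &= \mathbf{U}_P (\mathbf{D}_P \mathbf{V}_P^T \mathbf{V}_Q \mathbf{D}_Q^T) \mathbf{U}_Q^T \\ &= \mathbf{U}_P (\mathbf{U}_X \mathbf{D}_X \mathbf{V}_X^T) \mathbf{U}_Q^T \\ &= (\mathbf{U}_P \mathbf{U}_X)\, \mathbf{D}_X\, (\mathbf{U}_Q \mathbf{V}_X)^T \end{aligned}$$

    so $\mathbf{U} = \mathbf{U}_P \mathbf{U}_X$, $\mathbf{D} = \mathbf{D}_X$, and $\mathbf{V} = \mathbf{U}_Q \mathbf{V}_X$.

    This transformation can be done efficiently.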
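The conversion from MF factors to an SVD can be sketched as follows. The code takes thin SVDs of the two factor matrices, then one SVD of a small K×K core, so the dense product of the factors is never formed. The function name `svd_from_factors` is hypothetical, and the factors here are random placeholders rather than the output of an actual MF solver.

```python
import numpy as np

def svd_from_factors(P, Q):
    """Return U, d, V such that P @ Q.T == U @ diag(d) @ V.T.

    P is M x K and Q is N x K. Only thin SVDs of P and Q and one
    K x K SVD are computed; the M x N product is never built.
    """
    U_p, d_p, Vt_p = np.linalg.svd(P, full_matrices=False)   # P = U_p diag(d_p) Vt_p
    U_q, d_q, Vt_q = np.linalg.svd(Q, full_matrices=False)   # Q = U_q diag(d_q) Vt_q
    # K x K core: D_P V_P^T V_Q D_Q
    core = (d_p[:, None] * Vt_p) @ (Vt_q.T * d_q[None, :])
    U_x, d_x, Vt_x = np.linalg.svd(core)
    return U_p @ U_x, d_x, U_q @ Vt_x.T

rng = np.random.default_rng(0)
P = rng.normal(size=(7, 3))
Q = rng.normal(size=(5, 3))
U, d, V = svd_from_factors(P, Q)
print(np.allclose(U @ np.diag(d) @ V.T, P @ Q.T))
```

The cost is dominated by the two thin SVDs, which scale linearly in the number of users and items for a fixed K.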

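  • Matching in Latent Space

    We want to solve G from $\hat{\mathbf{R}}_1 \approx \mathbf{G}_{\text{user}} \hat{\mathbf{R}}_2 \mathbf{G}_{\text{item}}^T$.

    Now we know how to get $\hat{\mathbf{R}}_1 = \mathbf{U}_1 \mathbf{D}_1 \mathbf{V}_1^T$ and $\hat{\mathbf{R}}_2 = \mathbf{U}_2 \mathbf{D}_2 \mathbf{V}_2^T$.

    Thus $\mathbf{U}_1 \mathbf{D}_1 \mathbf{V}_1^T \approx \mathbf{G}_{\text{user}} \mathbf{U}_2 \mathbf{D}_2 \mathbf{V}_2^T \mathbf{G}_{\text{item}}^T$.

    Since the SVD is unique up to the signs of the singular vectors, we can separate the user and item sides. Both sides lead to the same subproblem.

    𝐒: sign matrix (K by K, diagonal, entries -1 or 1)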

  • Solving $\mathbf{Z}_1 \approx \mathbf{G}_{\text{user}} \mathbf{Z}_2 \mathbf{S}$

    Dimensions: 𝐙1 is M1 × K, 𝐆user is M1 × M2, 𝐙2 is M2 × K.

    𝐒 (sign matrix): K by K, diagonal, +1 or -1

    1. When 𝐒 is given, solve 𝐆 by nearest neighbor search.

       • Only enforce row constraints on 𝐆 (e.g. a row of the form 0 1 0).

    2. To solve 𝐒: greedy search.

       • Iteratively try each Skk.
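The two steps on this slide can be sketched as a nearest neighbor search wrapped in a greedy sign search. This is an illustrative reading of the slide, not the authors' implementation: it uses brute-force distances, flips one diagonal entry of 𝐒 at a time, and keeps a flip only if the total matching error drops. The names `nearest_rows` and `greedy_signs` and the toy data are hypothetical.

```python
import numpy as np

def nearest_rows(Z1, Z2, signs):
    """For each row of Z1, find the closest row of Z2 * signs.

    Returns the matched indices and the total squared distance.
    Only row constraints are enforced: two rows of Z1 may pick
    the same row of Z2.
    """
    Z2s = Z2 * signs
    d2 = ((Z1[:, None, :] - Z2s[None, :, :]) ** 2).sum(axis=2)
    match = d2.argmin(axis=1)
    return match, d2[np.arange(len(Z1)), match].sum()

def greedy_signs(Z1, Z2, max_passes=10):
    """Greedy search over the diagonal of S, one entry at a time."""
    K = Z1.shape[1]
    signs = np.ones(K)
    match, best = nearest_rows(Z1, Z2, signs)
    for _ in range(max_passes):
        improved = False
        for k in range(K):
            trial = signs.copy()
            trial[k] = -trial[k]                 # try flipping S_kk
            m, err = nearest_rows(Z1, Z2, trial)
            if err < best:                       # keep the flip only if it helps
                signs, match, best, improved = trial, m, err, True
        if not improved:
            break
    return signs, match

# Toy latent factors: Z1 is Z2 with the second dimension negated and rows shuffled.
Z2 = np.array([[1.0, 2.0], [3.0, -1.0], [-2.0, 4.0]])
Z1 = (Z2 * np.array([1.0, -1.0]))[[2, 0, 1]]
signs, match = greedy_signs(Z1, Z2)
print(signs, match)
```

Greedy search can stall in a local minimum on harder data, which is one reason the result of this stage is treated as a rough matching to be refined later.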

  • Matching Refinement:

    $\mathbf{R}_1 \approx \mathbf{G}_{\text{user}} \mathbf{R}_2 \mathbf{G}_{\text{item}}^T$

    More accurate but harder to solve.

    Obtain a good initialization and a reduced search space from latent space matching.

    Solve 𝐆user and 𝐆item alternately.

    The objective value decreases at every step and converges.
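A rough sketch of the alternating scheme is given below, under simplifying assumptions that are not from the slides: both matrices are fully observed, the mappings are stored as index arrays, every candidate is searched by brute force, and only row constraints are enforced. The names `objective` and `refine` are hypothetical.

```python
import numpy as np

def objective(R1, R2, u, v):
    """Squared error of R1 against R2 re-indexed by user map u and item map v."""
    return ((R1 - R2[np.ix_(u, v)]) ** 2).sum()

def refine(R1, R2, u, v, max_iters=20):
    history = [objective(R1, R2, u, v)]
    for _ in range(max_iters):
        # Fix the item map: each user of R1 picks the best-fitting user of R2.
        A = R2[:, v]                                              # M2 x N1
        u = ((R1[:, None, :] - A[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Fix the user map: each item of R1 picks the best-fitting item of R2.
        B = R2[u, :]                                              # M1 x N2
        v = ((R1.T[:, None, :] - B.T[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        history.append(objective(R1, R2, u, v))
        if history[-1] == history[-2]:                            # no further progress
            break
    return u, v, history

rng = np.random.default_rng(0)
R2 = rng.normal(size=(6, 5))
true_u, true_v = rng.permutation(6), rng.permutation(5)
R1 = R2[np.ix_(true_u, true_v)] + 0.01 * rng.normal(size=(6, 5))

start_u = true_u.copy()
start_u[:2] = start_u[:2][::-1]          # start from a partly wrong user matching
u, v, history = refine(R1, R2, start_u, true_v)
print(history)
```

Each half-step picks the best candidate among a set that includes the current choice, so the recorded objective cannot increase from one iteration to the next.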

  • Goals

    1. Identify the user mapping and item mapping

    2. Then, use the identified mappings to boost

    recommendation performance


  • Transferring Imperfect Matching to Predict Ratings

    Matched latent factors are constrained to be similar.
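One way to read "constrained to be similar" is as a soft penalty added to a joint factorization loss. The sketch below is a hypothetical illustration of that idea and not the objective from the talk: the function `joint_loss`, the squared-distance penalty, and the weight `beta` are all assumptions.

```python
import numpy as np

def joint_loss(R1, mask1, R2, mask2, P1, Q1, P2, Q2, user_match, item_match, beta):
    """Fit both rating matrices and pull matched latent factors together.

    mask1 / mask2 are 1 where a rating is observed and 0 elsewhere.
    user_match[i] is the domain-2 user matched to domain-1 user i
    (item_match likewise). beta controls how strongly matched factors
    are tied; a smaller beta tolerates a less reliable matching.
    """
    fit1 = (mask1 * (R1 - P1 @ Q1.T) ** 2).sum()
    fit2 = (mask2 * (R2 - P2 @ Q2.T) ** 2).sum()
    tie_users = ((P1 - P2[user_match]) ** 2).sum()
    tie_items = ((Q1 - Q2[item_match]) ** 2).sum()
    return fit1 + fit2 + beta * (tie_users + tie_items)

rng = np.random.default_rng(0)
P2, Q2 = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
user_match, item_match = np.array([3, 1, 0, 2]), np.array([2, 0, 1])
P1, Q1 = P2[user_match], Q2[item_match]
R1, R2 = P1 @ Q1.T, P2 @ Q2.T
mask1, mask2 = np.ones_like(R1), np.ones_like(R2)

loss_matched = joint_loss(R1, mask1, R2, mask2, P1, Q1, P2, Q2, user_match, item_match, 1.0)
loss_shifted = joint_loss(R1, mask1, R2, mask2, P1 + 0.1, Q1, P2, Q2, user_match, item_match, 1.0)
print(loss_matched, loss_shifted)
```

A soft penalty, unlike a hard equality constraint, lets a wrongly matched pair drift apart when the ratings disagree, which fits the goal of using an imperfect matching.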

  • Experiment Setup

    • Yahoo! Music Dataset

    • Splits: Disjoint Split, Overlap Split, Contained Split, Subset Split, Partial Split

    [Figure: for each split, the training set of 𝐑1 and the training set of 𝐑2, drawn over users and items]

  • Accuracy and Mean Average Precision: The higher the better
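The slides do not spell out how these two matching metrics are computed, so the following is a sketch under a stated assumption: each query user (or item) has exactly one correct counterpart, in which case average precision reduces to the reciprocal of the rank of that counterpart. The function name `matching_metrics` and the toy scores are hypothetical.

```python
import numpy as np

def matching_metrics(scores, truth):
    """scores[i, j]: similarity of query i to candidate j.
    truth[i]: index of the single correct candidate for query i.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth)
    true_scores = scores[np.arange(len(truth)), truth]
    # Rank of the correct candidate (1 = best); ties count against the match.
    ranks = (scores >= true_scores[:, None]).sum(axis=1)
    accuracy = float(np.mean(ranks == 1))      # correct candidate ranked first
    mean_ap = float(np.mean(1.0 / ranks))      # AP with one relevant candidate
    return accuracy, mean_ap

scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.4, 0.8],
                   [0.5, 0.6, 0.1]])
truth = np.array([0, 1, 0])
accuracy, mean_ap = matching_metrics(scores, truth)
print(accuracy, mean_ap)
```

Accuracy only rewards a correct top-1 match, while mean average precision also gives partial credit when the correct candidate is ranked near the top.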

  • Rating Prediction (Root Mean Square Error)

    RMSE: the lower the better
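For reference, RMSE over a set of held-out ratings can be computed as in this short sketch; the function name `rmse` and the sample values are illustrative only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between observed and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

perfect = rmse([1, 2, 3], [1, 2, 3])
off = rmse([0, 0], [3, 4])       # errors of 3 and 4
print(perfect, off)
```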


  • Conclusion

    It is possible to identify user or item correspondence in an unsupervised manner based on homogeneous rating data.

    Even with imperfect matching, our model can still improve the recommendation accuracy.

    Questions?
