Collaborative Filtering with Entity Similarity …ink-ron.usc.edu/xiangren/ijcai13_HINA.pdfenhance...

Preview:

Citation preview

Co l l a bo rat ive F i l te r i ng wi t h Entity Similarity Regularization in Heterogeneous Information Networks

Xiao Yu1, Xiang Ren1*, Quanquan Gu1, Yizhou Sun2, Jiawei Han1

1Univ. of Illinois, at Urbana-Champaign 2Northeastern Univ. *xren7@illinois.edu

1

Roadmap

• Why Study CF in HIN?

• Background and Preliminaries

• Proposed Method

• Experiments

• Conclusion and Future Work

2

Recommender Systems are Everywhere!

3

Recommendation Paradigm

4

user profiles

I1 I2 … Im

U1 ? ? ? 5

U2 ? 3 ? 4

… ? ? ? ?

Un 2 1 ? ?

user item ratings

item features

external knowledge

recommender system recommendation

Recommender System with Network

• Utilizing network relationship information can enhance the recommendation quality

• However, most of the previous studies only use single type of relationship between users or items (e.g., social network [Ma,WSDM11], trust relationship [Ester, KDD10], service membership [Yuan, RecSys11])

5

The Heterogeneous Information Network View of Recommender System

6

Why Information Network Can Help?

• Various types of information and relationships complement each other.

• Number of ratings - power law distribution

• Cold Start – How to handle new users or new items?

7

# of ratings

A very small number of users and items have a lot of ratings

Most users and items do not have enough ratings

nu

ms

of

use

rs

Roadmap

• Why Study CF in HIN?

• Background and Preliminaries

• Proposed Method

• Experiments

• Conclusion and Future Work

8

What Are Information Networks? • A network where each node represents an entity (e.g.,

user in a social network) and each link (e.g., friendship)

a relationship between entities.

– Nodes/links may have attributes, labels, and weights.

– Links may carry rich semantic information.

9

Heterogeneous Information Networks

10

Venue Paper Author

DBLP Bibliographic Network The IMDb Movie Network

Actor

Movie

Director

Movie

Studio

The Facebook Network

1. Multiple entity types and link types 2. New problems are emerging in heterogeneous networks!

Heterogeneous Information Networks Are Ubiquitous

11

Social Media Protein Networks E-commerce

Medical

Database Medical

Images

Medical

Records

Treatment Plan

Pharmacy Service

Healthcare Knowledge Graph

IMDb Network Schema

12

background

Entity Similarity

13

In heterogeneous information networks, find entities which are similar to a given entity query.

In DBLP, who are similar to “C. Faloutsos”?

In IMDb, which TVs / movies are similar to “Avatar”?

In Yelp, which restaurants are similar to “Blackdog”?

background

Meta-Path [Sun, VLDB 2011]

14

A1

A2

P1

P2

VLDB

Social Network

A3

A4

Network Snippet

• Meta-level description of a path between two entities • A path on network schema • Denote an existing or concatenated relation between two

entity types

A1-P1-A2 A1-P1-VLDB-P3-A3 A1-P1-”Social Network”-P2-A4 ……

P3

A1 is similar to A2, A3 and A4 but why?

Author-Paper-Author Author-Paper-Venue-Paper-Author Author-Paper-Term-Paper-Author

background

Similarity Measurement

• PathSim [Sun, VLDB 2011]

• Normalized path count between x and y following meta-path 𝒫

• Entities with strong connectivity and similar visibility under the given meta-path

– Path Constrained Random Walk[Lao, Machine Learning, 2010]

15

Visibility of x Visibility of y

background

Different Meta-Paths Carry Different Semantics

• Who are most similar to C. Faloutsos?

16

Christos’s students or close collaborators Work on similar topics and have similar reputation

Meta-Path: Author-Paper-Author Meta-Path: Author-Paper-Venue-Paper-Author

background

Problem Definition

• Given

• For a specific user, find items of interests based his / her previous rating history.

17

E1 e2 … em

u1 0 0 0 1

u2 0 2 0 5

… 0 0 0 0

un 3 4 0 0

Rating Data Information Network

Roadmap

• Why Study CF in HIN?

• Background and Preliminaries

• Proposed Method

• Experiments

• Conclusion and Future Work

18

Notations

• We have n users and m items.

• By computing similarity scores of all item pairs along certain meta-pat, we can get a similarity matrix.

• With L different meta-paths, we can calculate L similarity matrices as

19

Traditional Matrix Factorization

• Approximate R with product of U and V

• Non-Negative Matrix Factorization

• Weighted Non-Negative matrix Factorization

20

Objective Function

21

Approximate R with U V product Regularization on U V

Regularization on θ Similar items measured from HIN should have similar low-rank representations

Simplify Optimization Process

22

where

Revised Objective Function

23

Similar items measured from HIN should have similar low-rank representations

Parameter Estimation

24

Step 1

Step 2

Step 3

Iteratively updating U, V and θ till convergence

Roadmap

• Why Study CF in HIN?

• Background and Preliminaries

• Proposed Method

• Experiments

• Conclusion and Future Work

25

Dataset • We combine IMDb + MovieLens100K

26

We random sample training datasets of different sizes (0.4, 0.6, and 0.8)

Comparison Methods

27

We use Hete-MF to represent the proposed method.

Evaluation Metrics

• We use Mean Absolute Error and Root Mean Square Error to evaluate the performance.

28

Performance Comparison

29

Performance Analysis

30

Convergence Rate

31

Roadmap

• Why Study CF in HIN?

• Background and Preliminaries

• Proposed Method

• Experiments

• Conclusion and Future Work

32

Conclusions

• We study CF in HIN.

• We combine rating data with meta-path-based similarity matrices.

• We compared the proposed approaches with several widely employed or state-of-the-art recommendation techniques.

• We analyzed the performance of these methods under different scenarios.

33

Future Work

• Adding user and/or item rating priors to the proposed method to alleviate cold start problem

• Personalized recommendation models

• On-line version of the method to incorporate newly generated ratings

34

Thank You!!

35

Recommended