[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering

Yong Zheng, Mayur Agnani, Mili Singh

School of Applied Technology

Illinois Institute of Technology, Chicago, IL, 60616, USA


Agenda

• Background: Recommender Systems

• Grey Sheep Users In Collaborative Filtering

• Methodology and Solutions

• Experimental Results

• Conclusions and Future Work


Information Overload


Alleviating Information Overload

[Diagram labels: Information Extraction and Filtering; Query; Information Retrieval; Recommender Systems]

Recommender System (RS)

• RS: item recommendations tailored to user tastes


Recommender Systems

• E-Commerce: Amazon.com, eBay.com, BestBuy, NewEgg

• Online Streaming: Netflix, Pandora, Spotify, YouTube, etc.

• Social Media: Facebook, Twitter, Weibo, etc.

How it works

[Figure: book titles (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010) linked through a User Profile and the Recommender Systems to Recommendations]

Traditional Recommendation Algorithms

• Content-Based Recommendation Algorithms: the user is recommended items similar to the ones the user preferred in the past, as in book or movie recommender systems.

• Collaborative Filtering Based Recommendation Algorithms: the user is recommended items that people with similar tastes and preferences liked in the past, e.g., movie recommender systems.

• Hybrid Recommendation Algorithms: combine content-based and collaborative filtering based algorithms to produce item recommendations.

Collaborative Filtering


Collaborative Filtering: Algorithms

User-Based KNN Collaborative Filtering (UBCF)

Assumption: a user u's rating on item t is similar to the ratings that similar users gave to item t. This group of similar users is called the user's K-nearest neighbors.

        Pirates of the Caribbean 4    Kung Fu Panda 2    Harry Potter 6    Harry Potter 7
U1      4                             4                  1                 2
U2      3                             4                  2                 1
U3      2                             2                  4                 4
U4      4                             4                  1                 ?

Notation used in the rating prediction:

a = the target user
i = the target item
N = user neighborhood
u = a user neighbor in N
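The prediction formula itself did not survive on the slide, so the following is only a minimal sketch under stated assumptions. It uses a widely used mean-centered form: the target user's mean rating plus the similarity-weighted average of the neighbors' deviations from their own means on item i. The neighborhood size `k=2`, the cosine similarity over co-rated items, and the helper names are assumptions for illustration, not taken from the deck.

```python
import math

# Ratings from the table above; None marks the unknown rating to predict.
ratings = {
    "U1": [4, 4, 1, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}

def cosine(a, b):
    """Cosine similarity over the items both users have rated (assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_rating(r):
    """Average of the ratings a user has actually given."""
    known = [x for x in r if x is not None]
    return sum(known) / len(known)

def predict(target, item, k=2):
    """Mean-centered weighted average over the k most similar users."""
    candidates = [
        (cosine(ratings[target], r), u)
        for u, r in ratings.items()
        if u != target and r[item] is not None
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    num = sum(s * (ratings[u][item] - mean_rating(ratings[u])) for s, u in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean_rating(ratings[target]) + (num / den if den else 0.0)

prediction = predict("U4", item=3)  # item index 3 is Harry Potter 7
print(round(prediction, 2))
```

With these toy ratings, U4's nearest neighbors are U1 and U2, who both rated Harry Potter 7 low, so the predicted rating is pulled below U4's own average.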


Popular Challenges in Collaborative Filtering

• Data sparsity problems

• Cold-start users or items

• Grey-sheep users

• Incorporate content into collaborative filtering

• …


Grey Sheep Users

Definition 1 by Mark Claypool, et al., 1999

A group of users who neither agree nor disagree with any group of users. Therefore, they do not benefit from the user-based collaborative filtering technique.

Definition 2 by John McCrae, et al., 2004

• White Sheep Users: high correlations with other users

• Black Sheep Users: very few or no correlating users

• Grey Sheep Users: unusual tastes, low correlations with others


Research Problem: Identifying Grey Sheep Users

[Diagram labels: Collaborative Filtering; Other Algorithms]

Existing Approaches

There are two existing approaches:

• Clustering technique by Ghazanfar, et al., 2011

• Distribution of user ratings by Gras, et al., 2016

Both were developed based on the 1st definition.

We propose a novel approach based on the 2nd definition (white, black, and grey sheep users).

Note: a Black Sheep User is a user about whom we do not have enough knowledge.


Proposed Solution

We propose a novel approach based on the 2nd definition, which distinguishes white, black, and grey sheep users by their correlations with other users.

Key idea: the distribution of user-user correlations or similarities

Step 1: represent each user as a distribution of user correlations

Step 2: select good and bad examples

Step 3: apply outlier detection on the selected examples. Grey sheep users are the intersection of the bad examples and the identified outliers

Step 4: examine the quality of the identified grey sheep users


Step 1, Distribution Representations

We calculate user-user correlations by cosine similarity

Obtain the descriptive statistics of the distribution
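A minimal sketch of Step 1, assuming cosine similarity over co-rated items and a hypothetical choice of descriptive statistics (quartiles, mean, standard deviation, skewness). The slide does not list which statistics are used, so the set below and the names `similarity_profile` and `ratings` are illustrative assumptions.

```python
import math
import statistics

# Toy ratings (hypothetical): user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 1},
    "u3": {"a": 5, "b": 5, "c": 2},
    "u4": {"a": 4, "b": 4, "c": 1},
    "u5": {"a": 1, "b": 2, "c": 5},
}

def cosine(r1, r2):
    """Cosine similarity over the items both users have rated (assumption)."""
    common = set(r1) & set(r2)
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def similarity_profile(user, ratings):
    """Descriptive statistics of one user's similarities to all other users."""
    sims = [cosine(ratings[user], ratings[v]) for v in ratings if v != user]
    mean = statistics.fmean(sims)
    std = statistics.pstdev(sims)
    q1, q2, q3 = statistics.quantiles(sims, n=4)
    # Population skewness: third standardized moment (0 when all values are equal).
    skew = sum((s - mean) ** 3 for s in sims) / (len(sims) * std ** 3) if std else 0.0
    return {"q1": q1, "q2": q2, "q3": q3, "mean": mean, "std": std, "skew": skew}

profiles = {u: similarity_profile(u, ratings) for u in ratings}
for user, profile in profiles.items():
    print(user, {k: round(v, 3) for k, v in profile.items()})
```

In this toy data, u5 disagrees with everyone else, so its mean similarity is lower than that of u1 through u4.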


Step 2, Example Selection

Good examples: high correlations and left-skewed

Bad examples: low correlations and right-skewed
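The selection rule can be sketched as below, reading "left-skewed" as negative skewness and "right-skewed" as positive skewness. The cut-offs `low_mean` and `high_mean` and the toy profile values are hypothetical; the slide does not state how "high" and "low" correlations are thresholded.

```python
# Toy per-user statistics (hypothetical values, not results from the paper).
profiles = {
    "u1": {"mean": 0.82, "skew": -0.9},
    "u2": {"mean": 0.78, "skew": -0.4},
    "u3": {"mean": 0.55, "skew": 0.1},
    "u4": {"mean": 0.21, "skew": 1.3},
    "u5": {"mean": 0.75, "skew": 0.6},
}

def select_examples(profiles, low_mean, high_mean):
    """Split users into good and bad examples; users matching neither rule are skipped."""
    good, bad = [], []
    for user, p in profiles.items():
        if p["mean"] >= high_mean and p["skew"] < 0:
            good.append(user)  # high correlations, left-skewed
        elif p["mean"] <= low_mean and p["skew"] > 0:
            bad.append(user)   # low correlations, right-skewed
    return good, bad

good, bad = select_examples(profiles, low_mean=0.3, high_mean=0.7)
print("good:", good)
print("bad:", bad)
```

Users such as u3 and u5 here satisfy neither rule and are simply left out of the selected examples.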


Step 3, Outlier Detection by Local Outlier Factor (LOF)

LOF helps identify outliers by the local density

Observations with LOF > 1 will be considered as outliers

We set different threshold values to find the optimal one for identifying grey sheep users, for example:

LOF threshold = 1.0
LOF threshold = 1.1
LOF threshold = 1.2
LOF threshold = ….
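To make the thresholding concrete, here is a small pure-Python sketch of LOF followed by the intersection with the bad examples. It is a plain O(n^2) illustration that assumes no duplicate points. The two-dimensional features (mean similarity, skewness), the user names, and `k=2` are hypothetical toy choices rather than the authors' setup.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor of every point (assumes no duplicate points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Inverse of the mean reachability distance from point i to its neighbors.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

# Toy selected examples: (mean similarity, skewness) per user, hypothetical values.
users = ["g1", "g2", "g3", "g4", "g5", "b1", "b2", "b3", "b4"]
features = [
    (0.80, -0.50), (0.82, -0.40), (0.78, -0.60), (0.81, -0.45), (0.79, -0.55),
    (0.30, 0.40), (0.32, 0.50), (0.28, 0.45), (0.05, 1.80),
]
bad_examples = {"b1", "b2", "b3", "b4"}

scores = lof_scores(features, k=2)

def grey_sheep(threshold):
    """Grey sheep = bad examples whose LOF exceeds the threshold."""
    outliers = {u for u, s in zip(users, scores) if s > threshold}
    return sorted(outliers & bad_examples)

for threshold in (1.0, 1.1, 1.2):
    print(threshold, grey_sheep(threshold))
```

Because of the intersection, good examples can never be reported as grey sheep, even if LOF flags them. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would likely be preferred over this hand-written version.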


Step 4, Examine the quality of identified GS Users

The parameters in our solution:

• Example selection

• LOF threshold

• Number of neighbors in the LOF method

Our goals or examination criteria:

• To find as many GS users as possible

• Recommendation by UBCF should be worse for GS users than for non-GS users
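One possible reading of these two criteria, sketched under assumptions: keep only the parameter settings for which UBCF error is higher for GS users than for the rest, then prefer the setting that finds the most GS users. The `Run` record, `pick_best`, and every number below are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Run:
    """Outcome of one parameter setting (all fields are illustrative)."""
    lof_threshold: float
    n_neighbors: int
    n_grey_sheep: int
    mae_grey_sheep: float
    mae_others: float

def pick_best(runs: List[Run]) -> Optional[Run]:
    """Among runs where GS users are served worse by UBCF, take the one with most GS users."""
    valid = [r for r in runs if r.mae_grey_sheep > r.mae_others]
    return max(valid, key=lambda r: r.n_grey_sheep, default=None)

# Placeholder numbers for illustration only.
runs = [
    Run(1.0, 20, 900, 0.80, 0.82),  # rejected: GS users are not worse off
    Run(1.1, 20, 400, 0.95, 0.78),
    Run(1.2, 20, 150, 1.02, 0.77),
]

best = pick_best(runs)
print(best)
```

Filtering first and maximizing second is a design choice of this sketch; the slide only lists the two goals and does not say how they are traded off.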


Experimental Setting

• Data: MovieLens 10 million rating data

– 10M ratings

– 72K users

– 10K movies

– Each user has rated at least 20 movies

• Evaluation

– 80% as training, 20% as testing

– Mean absolute error (MAE) to evaluate rating predictions
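A minimal sketch of this evaluation protocol. The slide does not say whether the 80/20 split is made globally or per user, so a global random split is assumed here. The function names and the seed are illustrative.

```python
import random

def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle (user, item, rating) triples and hold out a fraction for testing."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(actual, predicted):
    """MAE: average of |actual - predicted| over all test ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy data: ten hypothetical (user, item, rating) triples.
toy = [("u%d" % (n % 3), "i%d" % n, 1 + n % 5) for n in range(10)]
train, test = train_test_split(toy)
print(len(train), len(test))

# Errors of 1, 0 and 2 average to an MAE of 1.0.
print(mean_absolute_error([4, 2, 5], [3, 2, 3]))
```

Computing MAE separately over the test ratings of GS users and of non-GS users gives the two numbers compared in the examination criteria.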


Results

• Impact of the number of neighbors in the LOF method

• Comparison of recommendation quality

• Visualization of GS and non-GS users


Conclusions

• We develop a novel approach to identify GS users by utilizing the definition related to the user-user correlations

• Our approach can successfully identify GS users

• Our approach is less complicated than the existing approaches


Drawbacks and Future Work

• We did not compare our solution with the two existing methods

• The user-user correlations may not be reliable if the rating data is sparse


Stay Tuned

• Yong Zheng, Mayur Agnani, Mili Singh. “Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems”. Proceedings of The 13th International Conference on Advanced Data Mining and Applications, 2017

– We improve the proposed solution in the RIIT paper

– We better measure user-user correlations

– We compare our solution with the two existing methods and demonstrate the advantages and effectiveness of our proposed solution
