[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative...

Yong Zheng, Mayur Agnani, Mili Singh

School of Applied Technology

Illinois Institute of Technology, Chicago, IL, 60616, USA

Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering

Agenda

• Background: Recommender Systems

• Grey Sheep Users In Collaborative Filtering

• Methodology and Solutions

• Experimental Results

• Conclusions and Future Work

Information Overload

Alleviating Information Overload

Information Extraction and Filtering

Query

Information Retrieval

Recommender Systems

Recommender System (RS)

• RS: item recommendations tailored to user tastes

Recommender Systems

E-Commerce: Amazon.com, eBay.com, BestBuy, NewEgg

Recommender Systems

Online Streaming: Netflix, Pandora, Spotify, YouTube, etc.

Recommender Systems

Social Media: Facebook, Twitter, Weibo, etc

How it works

[Figure: items (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010), a user profile, the recommender system, and its recommendations]

Traditional Recommendation Algorithms

Content-Based Recommendation Algorithms: The user is recommended items similar to the ones the user preferred in the past, e.g., book or movie recommender systems.

Collaborative Filtering Based Recommendation Algorithms: The user is recommended items that people with similar tastes and preferences liked in the past, e.g., movie recommender systems.

Hybrid Recommendation Algorithms: Combine content-based and collaborative filtering based algorithms to produce item recommendations.

Collaborative Filtering

Collaborative Filtering: Algorithms

User-Based KNN Collaborative Filtering (UBCF)

Assumption: user u's rating on item t is similar to the ratings that like-minded users gave to item t. This group of similar users is called the user's K nearest neighbors.

      Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1    4                            4                 1                2
U2    3                            4                 2                1
U3    2                            2                 4                4
U4    4                            4                 1                ?

a = the target user
i = the target item
N = the user neighborhood
u = a user neighbor in N
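To make the symbols a, i, N and u concrete, here is a minimal sketch of one common form of the UBCF prediction: the target user's mean rating plus a similarity-weighted average of the neighbors' mean-centered ratings. This specific formula, the use of cosine similarity here, and the choice of k = 2 are assumptions for illustration; they are not necessarily the exact variant used by the authors. The data is the toy table above.

```python
from math import sqrt

# Toy ratings from the table above; None marks the unknown rating to predict.
ratings = {
    "U1": [4, 4, 1, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}


def mean_rating(user):
    """Average of the ratings a user has actually given."""
    known = [r for r in ratings[user] if r is not None]
    return sum(known) / len(known)


def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = sqrt(sum(x * x for x, _ in pairs))
    norm_b = sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(a, i, k=2):
    """Predict target user a's rating on target item i from neighborhood N."""
    # N: the k users most similar to a among those who rated item i.
    candidates = [
        (cosine(ratings[a], ratings[u]), u)
        for u in ratings
        if u != a and ratings[u][i] is not None
    ]
    neighborhood = sorted(candidates, reverse=True)[:k]
    numerator = sum(s * (ratings[u][i] - mean_rating(u)) for s, u in neighborhood)
    denominator = sum(abs(s) for s, _ in neighborhood)
    if denominator == 0:
        return mean_rating(a)
    return mean_rating(a) + numerator / denominator


# U4's predicted rating on the fourth item (Harry Potter 7).
prediction = predict("U4", i=3)
print(round(prediction, 2))
```

With k = 2 the neighborhood of U4 is U1 and U2, who both rated the last movie below their own average, so the prediction falls below U4's mean rating of 3.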

Collaborative Filtering: Algorithms

Popular Challenges in Collaborative Filtering

Data sparsity problems

Cold-start users or items

Grey-sheep users

Incorporate content into collaborative filtering

….

Grey Sheep Users

Definition 1 by Mark Claypool, et al., 1999

A group of users who neither agree nor disagree with any group of users. Therefore, they will not benefit from the user-based collaborative filtering technique

Definition 2 by John McCrae, et al., 2004

White Sheep Users: high correlations with other users

Black Sheep Users: very few or no correlating users

Grey Sheep Users: unusual tastes, low correlations with others

Research Problem: Identifying Grey Sheep Users

[Figure labels: Collaborative Filtering; Other Algorithms]

Existing Approaches

There are two existing approaches:

Clustering Technique by Ghazanfar, et al., 2011

Distribution of User Ratings by Gras, et al., 2016

They were developed based on the 1st definition

We propose a novel approach based on the 2nd definition

White Sheep Users: high correlations with other users

Black Sheep Users: very few or no correlating users

Grey Sheep Users: unusual tastes, low correlations with others

Note: A black sheep user is a user about whom we do not have enough knowledge.

Proposed Solution

We propose a novel approach based on the 2nd definition

White Sheep Users: high correlations with other users

Black Sheep Users: very few or no correlating users

Grey Sheep Users: unusual tastes, low correlations with others

The Distribution of User-User Correlations or Similarities

Proposed Solution

Step 1, represent each user as a distribution of user-user correlations

Step 2, select good and bad examples

Step 3, apply outlier detection on the selected examples. Grey sheep users are the intersection of the bad examples and the identified outliers

Step 4, examine the quality of the identified grey sheep users

Proposed Solution

Step 1, Distribution Representations

We calculate user-user correlations by cosine similarity

Obtain the descriptive statistics of the distribution
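Step 1 can be sketched as follows: compute the cosine similarity between one user and every other user, then summarize that list of similarities with descriptive statistics. Which statistics to keep (quartiles, mean, standard deviation, skewness) is an assumption made for this sketch, as are the function names and the toy ratings.

```python
from math import sqrt
from statistics import mean, pstdev, quantiles

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "U1": {"A": 4, "B": 4, "C": 1, "D": 2},
    "U2": {"A": 3, "B": 4, "C": 2, "D": 1},
    "U3": {"A": 2, "B": 2, "C": 4, "D": 4},
    "U4": {"A": 4, "B": 4, "C": 1},
}


def cosine(a, b):
    """Cosine similarity between two users over their co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def skewness(values):
    """Population skewness (third standardized moment)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def describe_user(user):
    """Represent a user by statistics of their similarities to all others."""
    sims = [cosine(ratings[user], r) for other, r in ratings.items() if other != user]
    q1, q2, q3 = quantiles(sims, n=4)
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "mean": mean(sims),
        "std": pstdev(sims),
        "skew": skewness(sims),
    }


profiles = {user: describe_user(user) for user in ratings}
for user, profile in profiles.items():
    print(user, {name: round(value, 3) for name, value in profile.items()})
```

In this toy data U3 has the lowest mean similarity to the other users, which is the kind of signal the later steps build on.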

Proposed Solution

Step 2, Example Selection

Good examples: high correlations and left-skewed

Bad examples: low correlations and right-skewed
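A minimal sketch of how such a selection could look, assuming each user has already been summarized by the mean and skewness of their similarity distribution. Negative skewness is read as left-skewed and positive skewness as right-skewed. The cut-off value, the user names and the numbers are hypothetical; the slides do not state the actual selection thresholds.

```python
# Hypothetical per-user summaries of the similarity distribution.
profiles = {
    "alice": {"mean": 0.82, "skew": -0.9},  # high similarities, left-skewed
    "bob": {"mean": 0.75, "skew": 0.4},     # high similarities, right-skewed
    "carol": {"mean": 0.21, "skew": 1.3},   # low similarities, right-skewed
    "dave": {"mean": 0.30, "skew": -0.2},   # low similarities, left-skewed
}


def select_examples(profiles, mean_cutoff):
    """Split users into good and bad examples; the rest stay unlabeled."""
    good = {
        user for user, p in profiles.items()
        if p["mean"] >= mean_cutoff and p["skew"] < 0
    }
    bad = {
        user for user, p in profiles.items()
        if p["mean"] < mean_cutoff and p["skew"] > 0
    }
    return good, bad


# mean_cutoff=0.5 is an arbitrary illustrative value, not taken from the paper.
good_examples, bad_examples = select_examples(profiles, mean_cutoff=0.5)
print("good:", sorted(good_examples))
print("bad:", sorted(bad_examples))
```

Users who satisfy only one of the two conditions, such as bob and dave here, are neither good nor bad examples under this rule.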

Proposed Solution

Step 3, Outlier Detection by Local Outlier Factor (LOF)

LOF helps identify outliers by the local density

Observations with LOF > 1 will be considered as outliers

We set different threshold values to find the optimal one for identifying grey sheep users, for example

LOF threshold = 1.0
LOF threshold = 1.1
LOF threshold = 1.2
LOF threshold = ….
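The sketch below illustrates Step 3 with a small pure-Python LOF, written out only to keep the example dependency-free; a library implementation such as scikit-learn's `LocalOutlierFactor` would be the more usual choice in practice. The two-dimensional feature vectors, the user ids and the set of bad examples are hypothetical, and the code assumes there are no duplicate points.

```python
from math import dist


def local_outlier_factors(points, k):
    """Return the LOF score of every point, using its k nearest neighbors."""
    n = len(points)
    # For each point: its k nearest neighbors as (distance, index) pairs.
    knn = [
        sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbor.
    k_distance = [neighbors[-1][0] for neighbors in knn]

    def local_reachability_density(i):
        # Reachability distance of i from neighbor j: max(k-distance(j), d(i, j)).
        reach = [max(k_distance[j], d) for d, j in knn[i]]
        return len(reach) / sum(reach)  # assumes no duplicate points

    density = [local_reachability_density(i) for i in range(n)]
    # LOF: average neighbor density divided by the point's own density.
    return [
        sum(density[j] for _, j in knn[i]) / (len(knn[i]) * density[i])
        for i in range(n)
    ]


# Hypothetical users described by two features of their similarity distribution.
users = ["u0", "u1", "u2", "u3", "u4", "u5"]
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
bad_examples = {"u4", "u5"}  # assumed output of the example selection step

scores = local_outlier_factors(features, k=3)

# Try several thresholds; grey sheep = bad examples that are also outliers.
grey_sheep_by_threshold = {}
for threshold in (1.0, 1.1, 1.2):
    outliers = {users[i] for i, score in enumerate(scores) if score > threshold}
    grey_sheep_by_threshold[threshold] = bad_examples & outliers
    print(threshold, sorted(grey_sheep_by_threshold[threshold]))
```

Raising the threshold makes the detection stricter: at 1.2 only the clearly isolated user u5 remains, while the lower thresholds also flag u4.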

Proposed Solution

Step 4, Examine the quality of identified GS Users

The parameters in our solution

Example Selection

LOF threshold

Number of neighbors in the LOF method

Our goals or examination criteria

To find as many GS users as possible

Recommendations by UBCF should be worse for GS users than for non-GS users
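The second criterion can be checked by computing the prediction error separately for the two groups. The following is a minimal sketch under the assumption that UBCF predictions on held-out ratings are available as (user, predicted, actual) triples; the data and names are hypothetical.

```python
def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(predicted - actual) for predicted, actual in pairs) / len(pairs)


def compare_groups(predictions, grey_sheep):
    """Return (MAE for GS users, MAE for non-GS users)."""
    gs = [(p, a) for user, p, a in predictions if user in grey_sheep]
    non_gs = [(p, a) for user, p, a in predictions if user not in grey_sheep]
    return mae(gs), mae(non_gs)


# Hypothetical UBCF predictions on held-out ratings: (user, predicted, actual).
predictions = [
    ("u1", 4.0, 4),
    ("u1", 3.5, 3),
    ("u2", 2.0, 4),
    ("u2", 4.5, 3),
    ("u3", 3.0, 3),
]
grey_sheep = {"u2"}  # assumed output of the identification steps

gs_mae, non_gs_mae = compare_groups(predictions, grey_sheep)
print(f"GS MAE: {gs_mae:.3f}, non-GS MAE: {non_gs_mae:.3f}")
```

If the identified group really consists of grey sheep users, the first value should be clearly larger than the second.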

Experimental Setting

• Data: MovieLens 10 million rating data

– 10M ratings

– 72K users

– 10K movies

– Each user has rated at least 20 movies

• Evaluation

– 80% as training, 20% as testing

– Mean absolute error (MAE) to evaluate rating predictions
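As an illustration of this evaluation protocol, the sketch below splits a list of ratings 80/20 and computes MAE, the average of |predicted - actual| over the test ratings. Splitting by a random shuffle with a fixed seed is an assumption; the slides do not say how the split was drawn. The rating triples are made up.

```python
import random


def train_test_split(ratings, test_ratio=0.2, seed=42):
    """Shuffle the ratings and hold out a fraction of them for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predictions and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical (user, item, rating) triples.
ratings = [("u%d" % (n % 3), "item%d" % n, 1 + n % 5) for n in range(10)]
train, test = train_test_split(ratings)
print(len(train), "training ratings,", len(test), "test ratings")

# Example: predictions 4 and 3 against true ratings 5 and 3 give an MAE of 0.5.
print(mean_absolute_error([4, 3], [5, 3]))
```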

Results

• Impact by the # of neighbors in LOF method

Results

• Comparison of Recommendation Quality

Results

• Visualization of GS and Non-GS users

Conclusions

• We develop a novel approach to identify GS users by utilizing the definition related to the user-user correlations

• Our approach can successfully identify GS users

• Our approach is less complicated than the existing approaches

Drawbacks and Future Work

• We did not compare our solution with the two existing methods

• The user-user correlations may not be reliable if the rating data is sparse

Stay Tuned

• Yong Zheng, Mayur Agnani, Mili Singh. “Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems”. Proceedings of The 13th International Conference on Advanced Data Mining and Applications, 2017

– We improve the proposed solution in the RIIT paper

– We better measure user-user correlations

– We compare our solution with the two existing methods and demonstrate the advantages and effectiveness of our proposed solution
