Yong Zheng, Mayur Agnani, Mili Singh
School of Applied Technology
Illinois Institute of Technology, Chicago, IL, 60616, USA
Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering
Agenda
• Background: Recommender Systems
• Grey Sheep Users In Collaborative Filtering
• Methodology and Solutions
• Experimental Results
• Conclusions and Future Work
Information Overload
Alleviating Information Overload
Information Extraction and Filtering
Query
Information Retrieval
Recommender Systems
Recommender System (RS)
• RS: item recommendations tailored to user tastes
Recommender Systems
• E-Commerce: Amazon.com, eBay.com, BestBuy, NewEgg
• Online Streaming: Netflix, Pandora, Spotify, YouTube, etc.
• Social Media: Facebook, Twitter, Weibo, etc.
How it works
[Figure: a user profile is passed to a recommender system, which outputs recommendations. Book titles shown: Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010.]
Traditional Recommendation Algorithms
• Content-Based Recommendation Algorithms: the user is recommended items similar to the ones the user preferred in the past, e.g., book or movie recommender systems.
• Collaborative Filtering Based Recommendation Algorithms: the user is recommended items that people with similar tastes and preferences liked in the past, e.g., movie recommender systems.
• Hybrid Recommendation Algorithms: combine content-based and collaborative filtering based algorithms to produce item recommendations.
Collaborative Filtering
Collaborative Filtering: Algorithms
User-Based KNN Collaborative Filtering (UBCF)
Assumption: a user u's rating on item t is similar to the ratings given to t by a group of like-minded users. This group of similar users is called the user's K nearest neighbors.

      Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1    4                            4                 1                2
U2    3                            4                 2                1
U3    2                            2                 4                4
U4    4                            4                 1                ?
Notation:
• a = the target user
• i = the target item
• N = the user neighborhood
• u = a user neighbor in N
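A minimal sketch of how UBCF could fill in the missing rating for U4 in the table above. It assumes cosine similarity over co-rated items and a mean-centered, similarity-weighted average over the neighborhood N, which is one common formulation and not necessarily the exact formula used on the original slide. The function names and the choice k = 2 are illustrative.

```python
import math

# Ratings from the table above; None marks the unknown rating to predict.
ratings = {
    "U1": [4, 4, 1, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}

def cosine(x, y):
    """Cosine similarity computed over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_x = math.sqrt(sum(a * a for a, _ in pairs))
    norm_y = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def mean_rating(user_ratings):
    """Average of the ratings a user has actually given."""
    known = [v for v in user_ratings if v is not None]
    return sum(known) / len(known)

def predict(a, i, k=2):
    """Predict target user a's rating on target item i (assumed formulation)."""
    candidates = [
        (cosine(ratings[a], ratings[u]), u)
        for u in ratings
        if u != a and ratings[u][i] is not None
    ]
    neighborhood = sorted(candidates, reverse=True)[:k]  # N: the k most similar users
    num = sum(s * (ratings[u][i] - mean_rating(ratings[u])) for s, u in neighborhood)
    den = sum(abs(s) for s, _ in neighborhood)
    return mean_rating(ratings[a]) + (num / den if den else 0.0)

prediction = predict("U4", 3)  # item index 3 is Harry Potter 7
print(round(prediction, 2))
```

With k = 2 the neighborhood consists of U1 and U2, whose tastes match U4 on the first three movies, so the predicted rating lands on the low end of the scale.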
Collaborative Filtering: Algorithms
Popular Challenges in Collaborative Filtering
• Data sparsity problems
• Cold-start users or items
• Grey-sheep users
• Incorporating content into collaborative filtering
• ….
Grey Sheep Users
Definition 1 (Mark Claypool et al., 1999)
• A group of users who neither agree nor disagree with any group of users, and who therefore do not benefit from user-based collaborative filtering
Definition 2 (John McCrae et al., 2004)
• White Sheep Users: high correlations with other users
• Black Sheep Users: very few or no correlating users
• Grey Sheep Users: unusual tastes, low correlations with others
Research Problem: Identifying Grey Sheep Users
[Figure labels: Collaborative Filtering; Other Algorithms]
Existing Approaches
There are two existing approaches:
• Clustering technique by Ghazanfar et al., 2011
• Distribution of user ratings by Gras et al., 2016
Both were developed based on the 1st definition.
We propose a novel approach based on the 2nd definition:
• White Sheep Users: high correlations with other users
• Black Sheep Users: very few or no correlating users
• Grey Sheep Users: unusual tastes, low correlations with others
Note: a Black Sheep User is a user about whom we do not have enough knowledge.
Proposed Solution
We propose a novel approach based on the 2nd definition:
• White Sheep Users: high correlations with other users
• Black Sheep Users: very few or no correlating users
• Grey Sheep Users: unusual tastes, low correlations with others
The approach relies on the distribution of user-user correlations (similarities).
Step 1: represent each user by the distribution of that user's correlations with other users
Step 2: select good and bad examples
Step 3: apply outlier detection to the selected examples; grey sheep users are the intersection of the bad examples and the identified outliers
Step 4: examine the quality of the identified grey sheep users
Step 1: Distribution Representations
• We calculate user-user correlations by cosine similarity
• We obtain the descriptive statistics of each user's distribution of correlations
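A minimal sketch of Step 1, assuming dense toy rating vectors and a particular set of descriptive statistics (mean, standard deviation, quartiles, skewness). The slides do not list which statistics were used, so this selection, the helper names, and the toy data are all assumptions.

```python
import math
import statistics

def cosine(x, y):
    """Cosine similarity between two dense rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def skewness(values):
    """Population skewness (third standardized moment)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def describe_user(user, ratings):
    """Summarize the distribution of one user's similarities to all other users."""
    sims = [cosine(ratings[user], ratings[v]) for v in ratings if v != user]
    q1, median, q3 = statistics.quantiles(sims, n=4)
    return {
        "mean": statistics.fmean(sims),
        "std": statistics.pstdev(sims),
        "q1": q1,
        "median": median,
        "q3": q3,
        "skew": skewness(sims),
    }

# Hypothetical toy data: five users, four items, every item rated.
ratings = {
    "A": [5, 4, 1, 1],
    "B": [4, 5, 1, 2],
    "C": [5, 5, 2, 1],
    "D": [1, 2, 5, 4],
    "E": [2, 1, 4, 5],
}
profile = describe_user("A", ratings)
print({name: round(value, 3) for name, value in profile.items()})
```

Each user is then represented by this small vector of statistics rather than by the raw rating vector.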
Step 2: Example Selection
• Good examples: high correlations and left-skewed
• Bad examples: low correlations and right-skewed
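One way the selection rule could be expressed, assuming each user is already summarized by the mean and skewness of their similarity distribution. Left-skewed is read as negative skewness and right-skewed as positive skewness. The cut-offs `low_mean` and `high_mean` and the profile values are hypothetical; the slides do not state how "high" and "low" were chosen.

```python
def select_examples(profiles, low_mean, high_mean):
    """Split users into good and bad examples by mean similarity and skew sign."""
    good = sorted(
        user for user, p in profiles.items()
        if p["mean"] >= high_mean and p["skew"] < 0   # high correlations, left-skewed
    )
    bad = sorted(
        user for user, p in profiles.items()
        if p["mean"] <= low_mean and p["skew"] > 0    # low correlations, right-skewed
    )
    return good, bad

# Hypothetical per-user summaries of the similarity distribution.
profiles = {
    "u1": {"mean": 0.82, "skew": -0.9},
    "u2": {"mean": 0.78, "skew": -0.4},
    "u3": {"mean": 0.31, "skew": 1.2},
    "u4": {"mean": 0.55, "skew": 0.1},
    "u5": {"mean": 0.27, "skew": 0.6},
}
good, bad = select_examples(profiles, low_mean=0.35, high_mean=0.75)
print(good, bad)
```

Users such as u4 that satisfy neither rule are simply left out of both example sets.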
Step 3: Outlier Detection by Local Outlier Factor (LOF)
• LOF identifies outliers by comparing the local density of each observation with that of its neighbors
• Observations with LOF > 1 are considered outliers
• We try different threshold values to find the optimal one for identifying grey sheep users, for example:
– LOF threshold = 1.0
– LOF threshold = 1.1
– LOF threshold = 1.2
– LOF threshold = ….
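A compact pure-Python sketch of LOF followed by the intersection with the bad examples. It uses exactly the k nearest neighbors of each point (ties broken by index) and assumes no duplicate points, which simplifies the full LOF definition. In the approach described here each point would be a user's vector of descriptive statistics; the 2-D points, user ids, and bad-example set below are made up for illustration.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor for every point, using its k nearest neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        """Reachability distance of point i with respect to point j."""
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: five users in a tight cluster and one far away.
users = ["u1", "u2", "u3", "u4", "u5", "u6"]
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
bad_examples = {"u5", "u6"}  # assumed output of the example-selection step

scores = lof_scores(points, k=3)
threshold = 1.2
outliers = [i for i, score in enumerate(scores) if score > threshold]
grey_sheep = sorted(bad_examples & {users[i] for i in outliers})
print([round(s, 2) for s in scores], grey_sheep)
```

Raising or lowering `threshold` changes how many points are flagged, which is why several threshold values are worth trying.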
Step 4: Examine the quality of the identified grey sheep (GS) users
• The parameters in our solution
– Example selection
– LOF threshold
– Number of neighbors in the LOF method
• Our goals, or examination criteria
– To find as many GS users as possible
– Recommendations by UBCF should be worse for GS users than for non-GS users
Experimental Setting
• Data: MovieLens 10 million rating data
– 10M ratings
– 72K users
– 10K movies
– Each user has rated at least 20 movies
• Evaluation
– 80% as training, 20% as testing
– Mean absolute error (MAE) to evaluate rating predictions
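A small sketch of how MAE could be compared between GS and non-GS users on the test split. The predictions and the grey sheep set below are invented for illustration and are not results from the experiments.

```python
def mean_absolute_error(pairs):
    """MAE over (actual, predicted) rating pairs."""
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)

# Hypothetical test-set records: (user, actual rating, UBCF prediction).
test_predictions = [
    ("u1", 4, 3.5),
    ("u1", 2, 2.5),
    ("u2", 5, 4.5),
    ("u6", 1, 3.0),
    ("u6", 5, 3.5),
]
grey_sheep = {"u6"}  # assumed output of the identification step

gs_pairs = [(a, p) for user, a, p in test_predictions if user in grey_sheep]
non_gs_pairs = [(a, p) for user, a, p in test_predictions if user not in grey_sheep]

mae_gs = mean_absolute_error(gs_pairs)
mae_non_gs = mean_absolute_error(non_gs_pairs)
print(mae_gs, mae_non_gs)
```

If the identified users really are grey sheep, the MAE for the GS group is expected to be higher than for the non-GS group, since a larger MAE means worse rating predictions.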
Results
• Impact of the number of neighbors in the LOF method
• Comparison of recommendation quality
• Visualization of GS and non-GS users
Conclusions
• We develop a novel approach to identify GS users by utilizing the definition related to the user-user correlations
• Our approach can successfully identify GS users
• Our approach is less complicated than the existing approaches
Drawbacks and Future Work
• We did not compare our solution with the two existing methods
• The user-user correlations may not be reliable if the rating data is sparse
Stay Tuned
• Yong Zheng, Mayur Agnani, Mili Singh. “Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems”. Proceedings of The 13th International Conference on Advanced Data Mining and Applications, 2017
– We improve the proposed solution in the RIIT paper
– We better measure user-user correlations
– We compare our solution with the two existing methods and demonstrate the advantages and effectiveness of our proposed solution