
Page 1: Item-Based Collaborative Filtering Recommendation Algorithms

Item-Based Collaborative Filtering Recommendation Algorithms

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl

GroupLens Research Group/ Army HPC Research Center

Department of Computer Science and Engineering

University of Minnesota, Minneapolis, 2001

2008. Nov. 05

Presented by Eun-gyeong Kim, IDS Lab.


Copyright 2008 by CEBT

Contents

Introduction

Collaborative Filtering Based Recommender Systems

Overview of the Collaborative Filtering Process

Challenges of User-based Collaborative Filtering Algorithms

Item-based Collaborative Filtering Algorithm

Item Similarity Computation

Prediction Computation

Performance Implications

Experimental Evaluation

Contributions

Discussion & Conclusion

IDS Lab. Seminar, Center for E-Business Technology

Introduction (What is Collaborative Filtering?)

Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.

One of the most promising such technologies is collaborative filtering

Collaborative filtering (by Wikipedia)

The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.

The underlying assumption of CF approach is that those who agreed in the past tend to agree again in the future

CF systems usually take two steps

Look for users who share the same rating patterns with the active user

Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user
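The two steps above can be sketched as follows. This is a minimal illustration rather than the method of any particular system: the toy `ratings` data, the cosine similarity over co-rated items, and the helper names `user_similarity` and `predict` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Illustrative values only.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def user_similarity(r1, r2):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = sqrt(sum(r1[i] ** 2 for i in common))
    n2 = sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)

def predict(active, item, k=2):
    # Step 1: find the k users most similar to the active user who rated the item.
    neighbors = sorted(
        ((user_similarity(ratings[active], r), u)
         for u, r in ratings.items() if u != active and item in r),
        reverse=True,
    )[:k]
    # Step 2: similarity-weighted average of the neighbors' ratings.
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None
    return sum(s * ratings[u][item] for s, u in neighbors) / total

prediction = predict("alice", "d")
```

Here `bob` agrees with `alice` on every co-rated item, so his rating of `d` dominates the prediction, while `carol` contributes with a smaller weight.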

Two Main Categories of CF Algorithms

Memory-based CF Algorithms

Utilize the entire user-item database to generate a prediction

Employ statistical techniques to find the neighbors

Model-based CF Algorithms

First develop a model of user ratings

Then compute the expected value of a user's rating for an item, given his/her ratings on other items

To build the model

– Bayesian network (probabilistic)

– clustering (classification)

– rule-based approaches (association rules between co-purchased items)

Recommendation Algorithms

User-based collaborative filtering

Traditional Collaborative Filtering

Cluster Models

Item-based collaborative filtering

Search-based Methods

Item-to-item collaborative filtering

Amazon.com Recommendations: Item-to-Item Collaborative Filtering http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf

CF Based Recommender Systems

CF-based recommender systems provide item recommendations or predictions based on the opinions of other like-minded users

Traditional Collaborative Filtering (1)

Represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items

For almost all customers, this vector is extremely sparse

Generates recommendations based on a few customers (neighbors) who are most similar to the user

Measure the similarity of two customers, A and B
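The similarity formula on this slide did not survive extraction. The Amazon.com paper cited in this deck describes the cosine of the angle between the two customer vectors as a common choice; under that assumption, with A and B the N-dimensional item vectors of the two customers, it can be written as:

```latex
\[
\operatorname{similarity}(\vec{A}, \vec{B})
  = \cos(\vec{A}, \vec{B})
  = \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert}
\]
```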

Traditional Collaborative Filtering (2)

Generate recommendations

A common technique is to rank each item according to how many similar customers purchased it

O(MN) in the worst case, where M is the number of customers and N is the number of catalog items

Performance tends to be closer to O(M+N) because the average customer vector is extremely sparse

Scaling issues

Reduce the data size

– Reduce M by randomly sampling the customers or discarding customers with few purchases

– Reduce N by discarding very popular or unpopular items

These reductions also lower recommendation quality

We need better algorithms to scale to large data sets and at the same time produce high-quality recommendations

Challenges of User-based CF Algorithms

Challenges

Sparsity

– A person may have purchased well under 1% of the items

– (1% of 2 million books is 20,000 books)

– The accuracy of recommendations may be poor

Scalability

– Computation grows with both the number of users and the number of items

– Traditional CF does little or no offline computation, and its online computation scales with the number of customers and catalog items.

=> The key to item-to-item CF’s scalability and performance is that it creates the expensive similar-items table offline

Item-based CF Algorithm

Similarity computation between two items i and j

First isolate the users who have rated both of these items

Then apply a similarity computation technique to determine the similarity

Prediction generation

Take a weighted average of the target user’s ratings on these similar items

Item Similarity Computation
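The similarity formulas for this slide were images and are missing from the transcript. As a reference, the three measures discussed in the paper (cosine, correlation, and adjusted cosine) can be restated as below. The notation is an assumption of this sketch: U is the set of users who rated both items i and j, R_{u,i} is user u's rating of item i, \bar{R}_i is the average rating of item i, and \bar{R}_u is the average rating given by user u.

```latex
\[
\operatorname{sim}_{\text{cos}}(i, j)
  = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}
\]
\[
\operatorname{sim}_{\text{corr}}(i, j)
  = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}
         {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,
          \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}
\]
\[
\operatorname{sim}_{\text{adj}}(i, j)
  = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
         {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,
          \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}
\]
```

The only difference between the last two is which mean is subtracted: the item's average for correlation, the user's average for adjusted cosine, which compensates for users who rate on different scales.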

Ratings (out of 5)

          i1     i2     i3     i4     Ave
u1        3      5      3      -      3.67
u2        1      -      2      -      1.5
u3        4      2      4      2      3
u4        -      5      4      2      3.67
average   2.67   4      3.25   2

Similarity of each item pair

Item pair          (1,2)   (1,3)   (1,4)   (2,3)   (2,4)   (3,4)
Cosine              0.61    0.79    0.55    0.87    0.67    0.84
Correlation        -0.76    0.94    0      -0.57    0       0
Adjusted cosine    -0.94    0.70   -1      -0.54   -0.38   -0.76
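The slide's numbers can be checked with a short script. The sketch below makes two assumptions: the placement of each rating in the user-item grid is inferred from the row and column averages on the slide, and the first and third rows of similarity values are taken to be cosine (with missing ratings treated as 0) and adjusted cosine (over co-rating users only). Under those assumptions both rows are reproduced exactly; the middle (correlation) row is not recomputed here.

```python
from math import sqrt

# Ratings consistent with the slide's user and item averages; None marks a missing rating.
R = {
    "u1": [3, 5, 3, None],
    "u2": [1, None, 2, None],
    "u3": [4, 2, 4, 2],
    "u4": [None, 5, 4, 2],
}

def cosine(i, j):
    """Cosine of the two item columns, treating missing ratings as 0."""
    a = [r[i] or 0 for r in R.values()]
    b = [r[j] or 0 for r in R.values()]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def adjusted_cosine(i, j):
    """Cosine over co-rating users after subtracting each user's mean rating."""
    a, b = [], []
    for r in R.values():
        if r[i] is not None and r[j] is not None:
            rated = [x for x in r if x is not None]
            mean = sum(rated) / len(rated)
            a.append(r[i] - mean)
            b.append(r[j] - mean)
    denom = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

# Item pairs in the slide's order, as zero-based column indices.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cos_row = [round(cosine(i, j), 2) for i, j in pairs]
adj_row = [round(adjusted_cosine(i, j), 2) for i, j in pairs]
```

Note how the two measures disagree on sign for several pairs: plain cosine is always non-negative on non-negative ratings, whereas adjusted cosine can flag items that a user rated on opposite sides of his/her own average.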

Prediction Computation

Weighted Sum

Compute the sum of the ratings given by the user on the items similar to i

Each rating is weighted by the corresponding similarity

Regression

Similarities computed using cosine or correlation measures may be misleading

Approximated values based on a linear regression model are used (instead of the similar item N's "raw" rating values)
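The formulas for these two techniques are missing from the transcript. As a sketch of what the bullets describe, following the paper's formulation as best recalled: P_{u,i} is the prediction for user u on item i, s_{i,N} is the similarity between item i and a similar item N, and R_{u,N} is user u's rating of N. The regression variant replaces R_{u,N} with an approximated value R'_N obtained from a linear model fitted between the rating vectors of the target item i and the similar item N.

```latex
\[
P_{u,i} = \frac{\sum_{N \in \text{similar items}} s_{i,N} \, R_{u,N}}
               {\sum_{N \in \text{similar items}} \lvert s_{i,N} \rvert}
\]
\[
\bar{R}'_N = \alpha \, \bar{R}_i + \beta + \epsilon
\]
```

Dividing by the sum of absolute similarities keeps the prediction inside the rating range.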

Weighted Sum Example

Let’s predict the value of item i1 for u4

The ratings and item-pair similarities are the same as on the Item Similarity Computation slide.

Predicted rating   Pu4,i1   Pu2,i2   Pu1,i4   Pu2,i4
Cosine             3.75     1.59     3.65     1.60
Correlation        4        -        -        -
Adjusted cosine    4        -        -        -
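The predictions on this slide can be reproduced with a weighted sum over the slide's rounded similarity values. Two things in this sketch are assumptions: the cell placement of the ratings (inferred from the averages), and the rule of skipping items whose similarity is zero or negative, which is not stated on the slide but is what makes the correlation and adjusted-cosine predictions come out as 4.

```python
# User ratings on items i1..i4 (item number -> rating), as inferred from the slide's table.
ratings = {
    "u1": {1: 3, 2: 5, 3: 3},
    "u2": {1: 1, 3: 2},
    "u3": {1: 4, 2: 2, 3: 4, 4: 2},
    "u4": {2: 5, 3: 4, 4: 2},
}

# Similarities for each item pair, copied from the slide (rounded to two decimals).
cosine = {(1, 2): 0.61, (1, 3): 0.79, (1, 4): 0.55,
          (2, 3): 0.87, (2, 4): 0.67, (3, 4): 0.84}
adjusted = {(1, 2): -0.94, (1, 3): 0.70, (1, 4): -1.0,
            (2, 3): -0.54, (2, 4): -0.38, (3, 4): -0.76}

def sim(table, i, j):
    """Look up a symmetric similarity stored under the ordered pair."""
    return table[(min(i, j), max(i, j))]

def weighted_sum(table, user, target):
    """Similarity-weighted average of the user's ratings on items similar to target."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        s = sim(table, target, item)
        if s > 0:  # assumption: ignore items with zero or negative similarity
            num += s * rating
            den += s
    return num / den if den else None

predictions = {
    (u, i): round(weighted_sum(cosine, u, i), 2)
    for u, i in [("u4", 1), ("u2", 2), ("u1", 4), ("u2", 4)]
}
adjusted_u4_i1 = round(weighted_sum(adjusted, "u4", 1), 2)
```

For example, Pu4,i1 with cosine is (0.61 × 5 + 0.79 × 4 + 0.55 × 2) / (0.61 + 0.79 + 0.55) ≈ 3.75. With adjusted cosine only i3 has a positive similarity to i1, so the prediction collapses to u4's rating of i3, which is 4.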

Item-to-item CF in Amazon.com

We could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair.

However, many product pairs have no common customers, so this approach is inefficient in terms of processing time and memory usage

A better approach calculates the similarity between a single product and all related products

O(N^2 M) in the worst case

Closer to O(NM) in practice, because most customers have very few purchases
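The "single product versus all related products" idea can be sketched as follows. This is an illustration under stated assumptions, not Amazon's implementation: the `purchases` data and the function name `build_similar_items` are hypothetical, and cosine similarity between binary purchase vectors is one possible choice of metric.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase histories: customer -> set of purchased items.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "pen"},
    "c3": {"pen", "mug"},
}

def build_similar_items(purchases):
    # Invert the data once: item -> set of customers who bought it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for i1, customers in buyers.items():
        # Count co-purchases only for items that share at least one customer with i1.
        co_counts = defaultdict(int)
        for c in customers:
            for i2 in purchases[c]:
                if i2 != i1:
                    co_counts[i2] += 1
        # Cosine similarity between binary purchase vectors.
        table[i1] = {
            i2: n / sqrt(len(customers) * len(buyers[i2]))
            for i2, n in co_counts.items()
        }
    return table

similar = build_similar_items(purchases)
```

Pairs with no common customer, such as `book` and `mug` here, never enter the table, which is where the savings over the all-pairs matrix come from.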

Performance Implications

Precompute item-item similarity scores

In a typical e-commerce scenario, the set of items is usually static compared to the set of users, which changes most often

Compute all-to-all item similarities offline, then perform a quick table look-up to retrieve the required similarity values

Generating predictions for a user u on item i

Retrieve the precomputed k most similar items corresponding to the target item i

Then intersect those k items with the items purchased by the user u

The prediction is computed using the basic item-based CF algorithm
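A minimal sketch of the precompute-then-look-up flow described above. The function names `build_model` and `predict`, the input layout, and the sample numbers are assumptions for illustration; the prediction step uses the weighted-sum form.

```python
def build_model(similarity, k):
    """Offline step: keep only the k most similar items for every item."""
    return {
        item: sorted(neighbors.items(), key=lambda kv: kv[1], reverse=True)[:k]
        for item, neighbors in similarity.items()
    }

def predict(model, user_ratings, target):
    """Online step: weighted sum over the target's top-k neighbors the user has rated."""
    num = den = 0.0
    for item, s in model[target]:
        if item in user_ratings:  # intersect the k neighbors with the user's items
            num += s * user_ratings[item]
            den += abs(s)
    return num / den if den else None

# Hypothetical full similarity table for one target item.
similarity = {"i1": {"i2": 0.61, "i3": 0.79, "i4": 0.55}}

model = build_model(similarity, k=2)
prediction = predict(model, {"i2": 5, "i3": 4, "i4": 2}, "i1")
```

With k = 2 the least similar neighbor (`i4`) is dropped from the model, so the online step touches at most k entries per prediction regardless of catalog size.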

Experimental Evaluation: Data set

Movie data

Data from MovieLens

– 943 users (among 43,000 users)

– 1682 movies (among over 3,500 different movies)

– 100,000 ratings (only considered users that had rated 20 or more movies)

Divided the DB into a training set and a test set.

– X=0.8 (80% of the data is used as training set)

Sparsity level:
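The sparsity expression on this slide did not survive extraction. Assuming the usual definition (one minus the fraction of user-item cells that hold a rating), the figures above give roughly 0.937, as this small calculation shows; the variable names are illustrative.

```python
users, movies, ratings = 943, 1682, 100_000

# Sparsity level: fraction of the user-item matrix with no rating.
sparsity = 1 - ratings / (users * movies)

# 80/20 split of the ratings, matching X = 0.8 on the slide.
train_size = int(ratings * 0.8)
test_size = ratings - train_size
```

In other words, only about 6% of the 943 × 1682 cells contain a rating, even after restricting the data to users with 20 or more ratings.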

Experimental Evaluation: Evaluation Metrics

Statistical accuracy metrics

Mean Absolute Error (MAE) is a measure of the deviation of recommendations from their true user-specified values.

The lower the MAE, the more accurately the recommendation engine predicts user ratings.

Decision support accuracy metrics
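For the statistical accuracy metric above, MAE over N prediction/rating pairs is the sum of the absolute differences divided by N. A minimal sketch, with a hypothetical helper name and made-up sample values:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions and held-out ratings, for illustration only.
mae = mean_absolute_error([3.75, 1.5, 4.0], [4, 2, 4])
```

The three errors here are 0.25, 0.5, and 0, so the MAE is 0.25.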

Experimental Results (1)

Effect of Similarity Algorithms

Experimental Results (2)

Sensitivity of Training/Test Ratio

Experiments with neighborhood size

Experimental Results (3)

Quality Experiments

Sensitivity of the Model Size

High accuracy can be achieved using only a fraction of the items

It is therefore useful to precompute the item similarities using only a fraction of the items, since good prediction quality is still obtainable

[Figure annotations: 100%, 98.3%, 96%]

Impact of the model size on run-time and throughput

Contributions

Analysis of the item-based prediction algorithms and identification of different ways to implement its subtasks

Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations

An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms

Discussion & Conclusion

Discussion

The item-item scheme provides better quality of predictions than the user-user scheme

The item neighborhood is fairly static, so it can be precomputed, which results in very high online performance

It is possible to retain only a small subset of items and still produce reasonably good prediction quality

Conclusion

Item-based techniques allow CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations

My comments

Lack of explanations about recommendation process

Does the calculated similarity really represent the similarity of items?

Lack of explanations about the range of similarity value

Can’t we precompute the similarity of users?

References

Amazon.com Recommendations: Item-to-Item Collaborative Filtering http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf

Item-based Collaborative Filtering Recommendation Algorithms http://www.grouplens.org/papers/pdf/www10_sarwar.pdf