Download pdf - Collaborative Filtering Based Recommendationspeterb/2480-012/CF-2011.pdf · Recommendations Danielle Lee ... Originated from the Information Tapestry project at Xerox ... Motivations

2/20/2011

1

Collaborative Filtering Based Recommendations

Danielle Lee

Fabruary 16, 2011

1

If I have 3 million customers on the Web, I should have 3 million stores on the Web

- Jeff Bezos, CEO of Amazon.com

Collaborative Filtering Recommender System, Danielle Lee

2

2/20/2011

2

One Exemplary Recommendation

3

Classification of Recommender Systems

• Collaborative Filtering Recommender System– “Word‐of‐Mouth” phenomenon.

• Content‐based Recommender System– Recommendation generated from the content features

associated with products and the ratings from a user.

• Case‐based Recommender System– A kind of content‐based recommendation. Information are

represented as case and the system recommends the cases that are most similar to a user’s preference.

• Hybrid Recommender System– Combination of two or more recommendation techniques to

gain better performance with fewer of the drawbacks of any individual one (Burke, 2002).


4

2/20/2011

3

Recommendation TaxonomyRecommendation Method

Raw retrievalManually selected

Statistical summarization

Community Inputs

Targeted Customer Inputs

Implicit navigation

Degree of

Statistical summarizationAttribute‐based

Item‐to‐item correlationUser‐to‐user correlation

Implicit navigationExplicit navigationKeyword/ItemAttributeRatingsPurchase History

Item attributeExternal Item

PopularityPurchase History

RatingsText Comments

E‐store Engine

OutputsSuggestion, PredictionRatings, Reviews

DeliveryPersonalizationNon‐personalized

EphemeralPersistent

5Schafer, et al. (2001)

PushPull

Response/FeedbackResponse/Feedback

Good Offer for You!!

Very Simple Procedure of Recommendations

1. Understand and model users

2. Collect candidate items to recommend.

3. Based on your recommendation method, predict target users’ preferences for each candidate item.

4 Sort the candidate items according to the4. Sort the candidate items according to the prediction probability and recommend them.


6

2/20/2011

4

What is Collaborative Filtering?

Originated from the Information Tapestry project at Xerox PARC. It allows its users to annotate the documents that they read and system recommends

Is also called “nearest neighbor recommendation”.

Collaborative Filtering is ‘the process of filtering or evaluating items using the opinions of other people.’

CF recommends items which are likely interesting to a target user based on the evaluation averaging the opinions of people with similar tastes.

People who agreed with me in the past will also agree People who agreed with me in the past, will also agree in the future. On the other hand, the assumption of Content‐based recommendation is that Items with similar objective features will be rated similarly.


7

General Procedure of CF Recommendation

1.Select like‐minded peer group for a target user

2. Choose candidate items which are not in the list of the target user but in the list of peer group.

3.Score the items by producing a weighted score and predict the ratings for the given items.

8

4.Select the best candidate items and recommend them to a target user.

Redo all the procedures through 1 ~ 4 on a timely basis.

2/20/2011

5

User‐based Nearest Neighbor Recommendation (2)

The input for the CF prediction algorithms is a matrix of users’ ratings on items, referred as the

Item 1 Item 2 Item 3 Item 4 Item 5 Average

Alice 5 3 4 4 ??? 16/4

User1 3 1 2 3 3 9/4

Target UserTarget User

g ,ratings matrix.

User2 4 3 4 3 5 14/4

User3 3 3 1 5 4 12/4

User4 1 5 5 2 1 13/4


9


6

2

3

4

5

Alice

User1

User2

User4


10

0

1

Item 1 Item 2 Item 3 Item 4

2/20/2011

6

User‐based Nearest Neighbor Recommendation (3)_User Similarity

• Pearson’s Correlation Coefficient for User a and User b for all rated Products PUser b for all rated Products, P.

)(

2,

)(

2,

)(,,

)()(

))((),(

Pproductp bpbPproductp apa

Pproductp bpbapa

rrrr

rrrrbasim

• Pearson correlation takes values from +1 (Perfectly positive correlation) to ‐1 (Perfectly negative correlation) .


11

User‐based Nearest Neighbor Recommendation (4) _Rating

Prediction

)(

)(,

),(

)(),(),(

nneighborsb

nneighborsb bpb

a basim

rrbasimrpapred


12

2/20/2011

7


• Adjusted Cosine similarity, Spearman’s rank correlation coefficient, or mean squared different correlation coefficient, or mean squared different measures.

• Necessity to reduce the relative importance of the agreement on universally liked items : inverse user frequency (Breese, et al., 1998) and variance weighting factor (Herlocker, et al., 1999).

• Skewed neighboring is possible: Significance• Skewed neighboring is possible: Significance weighting (Herlocker, et al., 1999).

• Calculating a user’s perfect neighborhood is immensely resource intensive calculations


13

Core Concepts in CF

• User: any individual who provides ratings to a system– User who provides ratings and user who receive recommendationsUser who provides ratings and user who receive recommendations

• Item: anything for which a human can provide a rating. – Ex) art, books, CDs, journal articles, music, movie, or vacation

destinations

• Ratings: vote from a user for an item by means of some value– Scalar/ordinal ratings (5 points Likert scale), binary ratings

(like/dislike), unary rating (observed/abase of rating) ( ) y g ( g)

– Explicit ratings and implicit ratings


14

2/20/2011

8

One Typical CF recommendation

15

One Typical CF recommendation

16

2/20/2011

9

Motivations for Collaborative Filtering based Recommendations

• Collaborative filtering systems work by people in system, and it is expected that people to be better at evaluating information than a computed function

• CF doesn’t require contents. • Completely independent of any machine‐readable representation of the objects being recommended.recommended.– Works well for complex objects (or multimedia) such as music, pictures and movies

• More diverse and serendipitous recommendation


17

Prediction/Recommendation Generation

Prediction/Recomm

Probabilistic Algorithm

Non‐probabilistic Algorithm

endation Algorithm


18

User‐based Nearest Neighbor

Item‐based Nearest Neighbor

DimensionReduction

Bayesian‐NetworkModels

Others

2/20/2011

10

Item‐based Nearest Neighbor Recommendation (1)

Item 1 Item 2 Item 3 Item 4 Item 5 AverageTarget UserTarget User Item 1 Item 2 Item 3 Item 4 Item 5 Average

Alice 5 3 4 4 ??? 4.0

User1 3 1 2 3 3 2.4

User2 4 3 4 3 5 3.8

User3 3 3 1 5 4 3.2

Target UserTarget User


19

User4 1 5 5 2 1 2.8


Item 1 Item 2 Item 3 Item 4 Item 5

Alice 1 ‐1 0 0

User1 0.6 ‐1.4 ‐0.4 0.6 0.6

User2 0.2 ‐0.8 0.2 ‐0.8 1.2

User3 ‐0.2 ‐0.2 ‐2.2 1.8 0.8

User4 1 8 2 2 2 2 0 8 1 8User4 ‐1.8 2.2 2.2 ‐0.8 ‐1.8


20

2/20/2011

11


Generate predictions based on similarities between items: Prediction for a user a and item x is composeditems: Prediction for a user a and item x is composed of a weighted sum of the users’ ratings for items most similar to x.

Adjusted Cosine Similarity

uyuuUuxu rrrr

yxsim,, ))((

)(

21

Uu uyuUu uxu rrrr

yxsim2

,2

, )()(),(

)(

,)(

),(

),(),(

umssimilarItei

iuumssimilarItei

pisim

rpisimpupred


More computationally efficient than user‐based nearest neighborsneighbors.

Compared with user‐based approach that is affected by the small change of users’ ratings, item‐based approach is more stable.

Recommendation algorithm used by Amazon.com (Linden et al., 2003).


22

2/20/2011

12

Other Non‐Probabilistic Algorithms (1)

• Dimensionality Reduction

– Map item space to a smaller number of underlying “dimensions.”

– Matrix Factorization/Latent Factor models such as Singular Value Decomposition, Principal Component Analysis, Latent Semantic Analysis, tetc.

– Expensive offline computation and mathematical complexity


23

Other Non‐Probabilistic Algorithms (2)• Matrix Factorization got an attention since Netflix Prize competition.


24

2/20/2011

13


• Association Rule Mining

– Ex) “If a customer purchases baby food then the customer l b d f h ”also buys diapers in 70% of the cases.”

– Build Models based on commonly occurring patterns in the ratings matrix.

– “If user X liked both item 1 and item 2, then X will most probably also like item 5.”

S t (X Y)Number of Transactions containing X U Y


25

Support (X→Y) =g

Number of Transactions

Confident(X→Y) =Number of Transactions containing X U Y

Number of Transactions containing X


• Association Rule Mining

I 1 I 2 I 3 I 4 I 5Item 1 Item 2 Item 3 Item 4 Item 5

Alice 1 0 0 0

User1 1 0 0 1 1

User2 1 0 1 0 1

User3 0 0 0 1 1

• For association rule of item1 →item5, the support is 2/4 and confidence 2/2


26

User4 0 1 1 0 0

2/20/2011

14


• Recommendation procedure based on association rules (by Sarwar, et al., 2000)association rules (by Sarwar, et al., 2000)– Determine the set of X→Y association rules that are relevant for a target user.

– Compute the union of items appearing in the consequent Y of these association rules that have not been purchased by the target user.

– Sort the products according to the confidence of the l th t di t d th If lti l l t drule that predicted them. If multiple rules suggested

one product, take the rule with the highest confidence– Return the first N elements of this ordered list as a recommendation.


27

Probabilistic Algorithm

• Bayesian‐Network– Derive probabilistic dependencies among users orDerive probabilistic dependencies among users or items using decision trees.

• Probabilistic Clustering/Dimensionality Reduction Techniques.

• Expectation Maximization (EM) algorithm for CF with Gaussian probability distribution.

• Probabilistic algorithms can produce a probability• Probabilistic algorithms can produce a probability distribution across possible rating values –information that captures the likelihood of each possible rating value.


28

2/20/2011

15

Properties of Domains

Are the properties of Data Distribution suitable for CF? There are many items. Most users rate a single item. There are more items to be recommended than users. Very skewed rating distribution.

Are the Underlying Meaning properties suitable for CF? For each user of the community, there are other users with common needs or tastes.

Item evaluation requires personal tasteq p Items are heterogeneous

Are these properties of Data Persistence suitable for CF? Dynamically changing items (e.g. news or job cases) Persistent Taste


29

User Tasks studied in CF recomendation

• Help me find new items I might like

• Advise me on a particular item

• Help me find a user (or some users) I might like

• Help our group find something new that we might likemight like

• Help me find a mixture of “new” and “old” items


30

2/20/2011

16

Evaluation of Collaborative Filtering System

To determine the quality of the predictions and recommendations Accuracy Rating accuracy : error between the predicted ratings and the true ratings. Mean Absolute Error (MAE) = average absolute difference between the predicted ratings and the actual rating given by a user

Precision Rank accuracy : half‐life utility.

Novelty / Serendipity (Karypis, 2001) Coverage (Sarwar, et. al., 2000)g ( ) Learning Rate (Schein, et. al., 2001) Confidence (Herlocker, 2000) User Satisfaction (Swearingen & Sinha, 2001; Dahlen, B. J., 1998) Site Performance


31

Exercise to recommend using CF technology

Astronomy for Kid

Bagheera: In the Wild

Learning Network

The GeoNet G

Math Maniac

Leonardo Homepag

Kids Wild Game e

Bob 4 1 5

Alice 5

Mark 1 5 4 ???

32

Mark 1 5 4 ???

Kate 1 5 4 3 5

Modified from the original table in Walker, et al., 2004

2/20/2011

17

Problems regarding CF (Cont.)

Data Sparsity & Ratings scarcity Th ti t i i d l ll The ratings matrix is sparse and only a small fraction of all possible user item entries is known.

Many CF algorithms have been designed specifically for data sets where there are many more users than items (e.g., the MovieLens data set has 65,000 users and 5,000 movies).

CF may be inappropriate in a domain where there are many more items than users.

Implicit vs. explicit ratings


33


Problems regarding cold‐start. New item problem : the fact that if the number of users that prated an item is small, accurate prediction for this item cannot be generated.

New user problem : the fact that if the number of items rated by a user is small, it is unlikely that there could be an overlap of items rated by this user and active users. User‐to‐user similarity cannot be reliably computed.

New community problem : Without sufficient ratings, it’s h d diff i l b li d CFhard to differentiate value by personalized CF recommendations.

Clear reward systems are necessary to convince users to vote or rate items.


34

2/20/2011

18

Possible solutions for Cold‐start Problem

As the solution for new user problem: Displaying non‐personalized recommendation until the user has rated enough

Asking the user to describe their taste in aggregate Asking the user for demographic information and using ratings of other users with similar demographics as recommendations

As the solution for new item problem: Recommending items through non‐CF techniques content analysis or metadata

R d l l i i i h f i d ki Randomly selecting items with few or no ratings and asking user to rate those items.

As the solution for new community problem: Provide ratings incentives to a small “bootstrap” subset of the community, before inviting the entire community.


35


• Rarely‐rated entities : users, items, and user d it i ith f tiand item pairs with few co‐ratings

– Discard rarely‐rated entities: Simple and clean approach but the decreased coverage.

– Adjust calculation for rarely‐rated entities: Adjustment amount inversely proportional to the number of ratings


36

2/20/2011

19


• Opinionated users : Provided more than 4 ti d th td d i t th 1 5ratings and the std. dev. is greater than 1.5

• Black sheep (Peculiar users) : provided more than 4 ratings and for which the average distance of their rating on item i with respect to mean rating of item i is greater than 1g g

• Controversial items : received rating whose std. dev. Is greater than 1.5


37


Explanation “Why this item was recommended to me?” Most recommender systems are black box approach and need to

provide transparency. Explanations provide transparency, exposing the reasoning and data

behind a recommendation (Herlocker, et al., 2000) Benefits of Explanations are

Transparency, Scrutability, User Involvement, Education, Acceptance, Trust, Effectiveness, Persuasiveness, Satisfaction (Tintarev & Masthoff, 2007)

Explanations for ‘How’ and ‘Why’ are required (Explanations about model/process error & data error)model/process error & data error)

Privacy & Security Trust / Social Network based recommendations Confidence Matrix


38

2/20/2011

20

Problems regarding CF

Ad hoc user profiles / Copy profile attack Malicious intent to bias recommendations in their favor. Real profile attack case about sex manual http://www.news.com/2100‐1023‐976435.html

Shilling attacks (profile injection attacks) Push attacks

Nuke attacks

Robust statistical methods to detect spam or random noise are required.

M‐estimator SVD (Singular value decomposition) / new SVD based on Hebbianlearning.

PLSA


39

“Active/Trusted” Collaborative Filtering

• More closer approach to the “word of mouth”

• To search trustable users by exploiting trust propagation over• To search trustable users by exploiting trust propagation over the trust network, not to search similar users as CF (Massa & Avesani, 2007)– possible to cover more than half users with reasonable error just

based on their small number of ratings like 2, 3, or 4 ratings.

– For users with 4 ratings, trust can make recommendation for 66% of the users while CF can do for 14% and with a higher error


40

2/20/2011

21

Trust Networks and Trust Metrics

Trust Metrics : Algorithms whose goal is to di t b d th t t t k thpredict, based on the trust network, the

trustworthiness of “unknown” users.

Local Trust Metrics : the very personal and subjective views of the users. Different value of trust in other users for every user MoleTrust

Global Trust Metrics : a global “reputation” value that approximates how the community as a whole considers a certain user. PageRank


41

Trust‐Aware Recommender Architecture


42

(Massa & Avesani, 2004; Massa & Avesani, 2007)

2/20/2011

22

Hybrid Recommender System• Combination of Two or more different Recommendation Technologies

• The spaces of possible hybrid recommender systems (Burke, 2007)

Weight Mixed Switch FC Cascade FA Meta

CF/CN

CF/DM

CF/KB

CN/CF

CN/DM

CN/KB

DM/CF

43

/

DM/CN

DM/KB

KB/CF

KB/CN

KB/DM

FC = Feature Combination, FA = Feature Augmentation, CF = Collaborative,CN = Content-based, DM = Demographic, KB = Knowledge-based

Other Useful Resources

Adomavicius, G. & Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A Survey of the State‐of‐the‐Art and Possible Extensions IEEE Transactions on Knowledge and Data Engineering 17 (6)Extensions, IEEE Transactions on Knowledge and Data Engineering, 17 (6), pp. 734 ~ 749

Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. (2001) Evaluating Collaborative Filtering Recommender Systems, ACM Transations Inf. Syst., 22 (1), pp. 5 ~ 53

Schafer, J. B., Konstan, J. & Riedl, J. (2001) E‐Commerce Recommendation Applications, Data Mining & Knowledge Discovery, 5, pp. 115 ~ 153

Paulson, P. & Tzanavari, A. (2003) Combining collaborative and content filtering using conceptual graphs

Kaultz, H., Selman, B. & Shah, M. (1997) Referral Web: Combining Social Networks and Collaborative Filtering.

For more useful resources, refer to the CF related page in course wiki.


44