Evaluation of Collaborative Filtering Algorithms for Recommending Articles on CiteULike

June 29th, 2009

HT 2009, Workshop “Web 3.0: Merging Semantic Web and Social Web”

Dr. Peter Brusilovsky, Associate Professor
Denis Parra, PhD Student
School of Information Sciences
University of Pittsburgh



Outline

• Motivation
• Methods
– CCF
– NwCF
– BM25
• The Study
• Description of the Data
• Results
• Conclusions

Motivation

Based on information available on CiteULike:
• Develop user-centered recommendations of scientific articles.
• Investigate the potential of users’ tags in collaborative tagging systems to provide recommendations.
• Compare the accuracy of user-based collaborative filtering methods.

Why CiteULike?
• Popular collaborative tagging system, more topic-oriented than delicious: article references.
• Familiarity with the system.

CiteULike

Methods: CCF (1 / 2)

• Classic Collaborative Filtering (CCF): user-based recommendations, using Pearson Correlation (users’ similarity) and adjusted ratings to rank items to recommend [1]

$$userSim(u,n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}$$

$$pred(u,i) = \bar{r}_u + \frac{\sum_{n \in neighbors(u)} userSim(u,n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \in neighbors(u)} userSim(u,n)}$$
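The two CCF formulas can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that ratings are stored as nested dictionaries (`ratings[user][item]`), that CR_{u,n} is the set of items rated by both users, that each user's mean is taken over all of that user's ratings, and that neighbors who have not rated the item are skipped. The function names are hypothetical.

```python
from math import sqrt

def mean_rating(user_ratings):
    """Average of one user's ratings (the r-bar term)."""
    return sum(user_ratings.values()) / len(user_ratings)

def user_sim(ratings_u, ratings_n):
    """Pearson correlation over the items co-rated by u and n."""
    co_rated = set(ratings_u) & set(ratings_n)
    if not co_rated:
        return 0.0
    mean_u, mean_n = mean_rating(ratings_u), mean_rating(ratings_n)
    num = sum((ratings_u[i] - mean_u) * (ratings_n[i] - mean_n) for i in co_rated)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in co_rated))
    den_n = sqrt(sum((ratings_n[i] - mean_n) ** 2 for i in co_rated))
    if den_u == 0 or den_n == 0:
        return 0.0  # assumption: undefined correlation is treated as no similarity
    return num / (den_u * den_n)

def pred(user, item, ratings, neighbors):
    """Adjusted-rating prediction pred(u, i) for a single item."""
    mean_u = mean_rating(ratings[user])
    num = den = 0.0
    for n in neighbors:
        if item not in ratings[n]:
            continue  # assumption: only neighbors who rated the item contribute
        sim = user_sim(ratings[user], ratings[n])
        num += sim * (ratings[n][item] - mean_rating(ratings[n]))
        den += sim
    # Fall back to the user's own mean when no neighbor contributes.
    return mean_u if den == 0 else mean_u + num / den
```

With a single neighbor the similarity cancels out, so the prediction reduces to the user's mean plus the neighbor's deviation from their own mean on that item.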

Methods: CCF (2 / 2)

[Figure: CCF example (rating values on a 1 to 5 scale)]

Methods: NwCF (1 / 2)

• Neighbor weighted Collaborative Filtering (NwCF): Similar to CCF, yet incorporates the “amount of neighbors rating an item” in the ranking formula of recommended items

$$userSim(u,n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}$$

$$pred'(u,i) = \log_{10}(1 + nbr(i)) \cdot pred(u,i)$$
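A small sketch of the NwCF re-weighting, assuming nbr(i) is the number of neighbors who rated item i and that pred(u, i) has already been computed. The candidate values below are made up for illustration only.

```python
from math import log10

def nwcf_score(pred_ui, nbr_i):
    """pred'(u, i) = log10(1 + nbr(i)) * pred(u, i)."""
    return log10(1 + nbr_i) * pred_ui

# Hypothetical candidates: item -> (pred(u, i), number of neighbors who rated it)
candidates = {"article_a": (4.5, 1), "article_b": (4.0, 9)}

ranked = sorted(candidates, key=lambda i: nwcf_score(*candidates[i]), reverse=True)
```

Here `article_b` ranks first even though its predicted rating is lower, because nine neighbors rated it while only one rated `article_a`.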

Methods: NwCF (2 / 2)

[Figure: NwCF example (rating values on a 1 to 5 scale)]

Methods: BM25 (1 / 2)

• BM25: We obtain the similarity between users (neighbors) using their set of tags as “documents” and performing an Okapi BM25 (probabilistic IR model) Retrieval Status Value [2] calculation.

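$$pred'(u,i) = \log_{10}(1 + nbr(i)) \cdot pred(u,i)$$

$$RSV_d = \sum_{t \in q} IDF \cdot \frac{(k_1 + 1)\, tf_{td}}{k_1\left((1 - b) + b \times (L_d / L_{ave})\right) + tf_{td}} \cdot \frac{(k_3 + 1)\, tf_{tq}}{k_3 + tf_{tq}}$$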
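The RSV computation can be sketched as below, treating each user's tags as a bag of terms. This is an illustrative sketch under stated assumptions: IDF is taken as log(N / df_t) as in [2], the parameter values `k1`, `b`, and `k3` are placeholders (the slides do not state the values used), and `all_docs` is assumed to contain the document being scored.

```python
from collections import Counter
from math import log

def bm25_rsv(query_tags, doc_tags, all_docs, k1=1.2, b=0.75, k3=1.5):
    """Okapi BM25 Retrieval Status Value of one tag 'document' for a tag 'query'."""
    n_docs = len(all_docs)
    avg_len = sum(len(d) for d in all_docs) / n_docs
    tf_d, tf_q = Counter(doc_tags), Counter(query_tags)
    score = 0.0
    for t in tf_q:
        df = sum(1 for d in all_docs if t in d)
        if tf_d[t] == 0 or df == 0:
            continue  # term absent from the document contributes nothing
        idf = log(n_docs / df)
        length_norm = (1 - b) + b * (len(doc_tags) / avg_len)
        doc_part = ((k1 + 1) * tf_d[t]) / (k1 * length_norm + tf_d[t])
        query_part = ((k3 + 1) * tf_q[t]) / (k3 + tf_q[t])
        score += idf * doc_part * query_part
    return score

# Hypothetical tag profiles: the center user is the query, other users are documents.
docs = [["ir", "ir", "recsys"], ["bio", "genes"], ["recsys", "tags"]]
query = ["ir", "recsys"]
scores = [bm25_rsv(query, d, docs) for d in docs]
```

Users with the highest scores would then serve as the neighbors of the center user.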

Methods: BM25 (2 / 2)

[Figure: query terms compared against Doc_1, Doc_2, Doc_3]

The Study

• 7 subjects

• For each subject, four lists of 10 recommendations each were created (CCF, NwCF, BM25_10, BM25_20)

• The four lists were combined and sorted randomly (because recommendations overlapped, fewer than 40 items)

• Subjects were asked to evaluate relevance (relevant / somewhat relevant / not relevant) and novelty (novel / somewhat novel / not novel)
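The list-merging step can be sketched as follows. This is only an illustration of the procedure described above; the function name, the `seed` parameter, and the sample data are hypothetical.

```python
import random

def build_evaluation_list(lists_by_method, seed=None):
    """Merge per-method recommendation lists, drop duplicates, and shuffle."""
    sources = {}
    for method, items in lists_by_method.items():
        for item in items:
            # Remember which methods recommended each item for later scoring.
            sources.setdefault(item, []).append(method)
    order = list(sources)
    random.Random(seed).shuffle(order)
    return order, sources

# Hypothetical two-method example with one overlapping article.
order, sources = build_evaluation_list(
    {"CCF": ["art1", "art2"], "NwCF": ["art2", "art3"]}, seed=0
)
```

Keeping the item-to-method mapping lets a single judgment of an overlapping article count toward every list that contained it.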

Description of the Data

Crawled CiteULike (CUL) for 20 “center users” (only 7 were used for the study)

Annotation: tuple {user, article, tag}

Item          # of unique instances
users         358
articles      186,122
tags          51,903
annotations   902,711
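As a sketch of how such counts could be derived from the annotation tuples, assuming the crawl is available as a list of `(user, article, tag)` triples (the sample values are made up):

```python
def count_unique(annotations):
    """Count unique users, articles, tags, and annotation tuples."""
    unique = set(annotations)
    return {
        "users": len({user for user, _, _ in unique}),
        "articles": len({article for _, article, _ in unique}),
        "tags": len({tag for _, _, tag in unique}),
        "annotations": len(unique),
    }

# Hypothetical sample of {user, article, tag} tuples.
sample = [
    ("u1", "art1", "recsys"),
    ("u1", "art1", "cf"),
    ("u2", "art1", "recsys"),
    ("u2", "art2", "tagging"),
]
stats = count_unique(sample)
```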

Results

[Figure: results per method, six panels] (a) nDCG (b) Average Novelty (c) Precision_2@5 (d) Precision_2@10 (e) Precision_2_1@5 (f) Precision_2_1@10
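The metrics named in the panels can be sketched as below. The coding is an assumption, not something the slides state: judgments are taken as 2 = relevant, 1 = somewhat relevant, 0 = not relevant, so that Precision_2 counts only items rated 2 and Precision_2_1 counts items rated 2 or 1. The DCG variant (gain divided by log2 of position + 1) is one common form and may differ from the one used in the study.

```python
from math import log2

def precision_at_k(judgments, k, accepted=(2,)):
    """Fraction of the top-k items whose judgment is in `accepted`."""
    return sum(1 for r in judgments[:k] if r in accepted) / k

def dcg(judgments):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(r / log2(pos + 1) for pos, r in enumerate(judgments, start=1))

def ndcg(judgments):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(judgments, reverse=True))
    return dcg(judgments) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one recommendation list, in ranked order.
judged = [2, 0, 1, 2, 0]
p2_at_5 = precision_at_k(judged, 5)                      # only "relevant"
p21_at_5 = precision_at_k(judged, 5, accepted=(2, 1))    # "relevant" or "somewhat relevant"
```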

Conclusions

• The rating scale must be considered carefully in a CF approach.

• NwCF, which incorporates the number of raters, decreases the uncertainty produced by items with too few ratings.

• The tag-based user similarity approach shows interesting results, which suggest it can be considered a valid alternative to Pearson correlation when using CF algorithms.

• We will incorporate more users in our future studies to make the results more conclusive.

Questions?

Bibliography

• [1] Schafer, J., Frankowski, D., Herlocker, J. and Sen, S. 2007. Collaborative Filtering Recommender Systems. The Adaptive Web (May 2007), 291-324.

• [2] Manning, C., Raghavan, P. and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press.