
Conceptual collaborative filtering recommendation: A probabilistic learning approach


Neurocomputing 73 (2010) 2793–2796




Letters

Conceptual collaborative filtering recommendation: A probabilistic learning approach

Jae-won Lee a, Han-Joon Kim b,*, Sang-goo Lee a

a School of Computer Science and Engineering, Seoul National University, Republic of Korea
b School of Electrical and Computer Engineering, The University of Seoul, Republic of Korea

Article info

Article history:

Received 8 April 2009

Received in revised form 16 March 2010

Accepted 4 April 2010

Communicated by R.W. Newcomb

Available online 4 May 2010

Keywords:

Information retrieval

Collaborative filtering

Probabilistic learning

Concept

Recommendation systems

0925-2312/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2010.04.005

* Corresponding author. Tel.: +822 2210 5632; fax: +822 2249 6802.
E-mail address: [email protected] (H.-J. Kim).

Abstract

Collaborative filtering is one of the most successful and popular methods in developing recommendation systems. However, conventional collaborative filtering methods suffer from the item sparsity and new item problems. In this paper, we propose a probabilistic learning approach that solves the item sparsity problem while describing users and items with domain concepts. Our method uses a probabilistic match with domain concepts, whereas conventional collaborative filtering uses an exact match to find similar users. Empirical experiments show that our method outperforms the conventional ones.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

As the amount of information items on the Web, such as music, movies, images, and web pages, has grown explosively, it has become highly important to find the items that are relevant to users' preferences. When a user does not know the exact keywords for target items, recommendation systems can be more useful than information search systems. A recommendation system presents information items that are likely to be of interest to the user, and it predicts the rating that a user would give to an item by utilizing user preferences. In developing recommendation systems, one of the most successful and popular methodologies is 'collaborative filtering'.

Collaborative filtering (CF) predicts the preferences of an active user from the preference data of his/her similar users. Thus, it is critical to compute the similarity between users accurately. Currently, the Pearson correlation coefficient and cosine similarity are widely used in conventional systems [1,3,8,9].

However, despite their success and popularity, conventional CF methods have two drawbacks. One is the 'item sparsity' problem: as the number of items in the database increases, the density of each user's records with respect to these items decreases [5]. When similar users have rated different items, it is hard to detect that those users have similar preferences. The other is the 'new item' or 'cold start' problem: it is impossible to recommend new or recently added items until they have been rated by other users.

Our paper focuses on a probabilistic learning framework for solving the 'item sparsity' problem while describing a conceptual collaborative filtering method that associates users and items with concepts. The 'item sparsity' problem cannot be avoided as long as a CF system uses similarity measures such as the Pearson correlation coefficient and cosine similarity. This is because such a CF system requires an exact match to compute the similarity between users, relying only on explicit feedback on items such as users' scores and purchase logs [4]. In contrast, in our work, items and users are semantically represented as a set of domain concepts within a probabilistic framework, which allows us to avoid the item sparsity problem and to improve the precision of recommendation.

2. Conceptual collaborative filtering

2.1. Concept representation

For concept representation, the proposed learning framework exploits the concept hierarchy of a comprehensive web directory such as the Open Directory Project (refer to http://dmoz.org). Thus, a concept (or category) can be described with the web documents that are classified into it.

Definition 1 (Concepts). Let C = {c_1, ..., c_i, ..., c_n} be a set of concepts, and n be the total number of concepts. Then, the concept c_i is defined as the average of the web document term vectors under c_i as follows:

$$c_i = \frac{1}{|D_{c_i}|} \sum_{d_j^{c_i} \in D_{c_i}} d_j^{c_i}, \quad \text{where } D_{c_i} = \{d_1^{c_i}, \ldots, d_j^{c_i}, \ldots, d_{|D_{c_i}|}^{c_i}\} \qquad (1)$$

where D_{c_i} is the set of web documents that belong to the concept c_i, and d_j^{c_i} is the j-th document in D_{c_i}. Each web document d_j^{c_i} is represented as the weighted term vector d_j^{c_i} = ⟨w_1, ..., w_m, ..., w_|V|⟩, where V denotes the set of index terms and each weight w_m for the index term t_m is calculated using the traditional information retrieval metrics: term frequency (TF) and inverse document frequency (IDF) [2].

To reduce the number of terms used, we remove stopwords (such as articles, prepositions, and conjunctions), which do not help to describe the concepts, and also use the Porter stemming algorithm [6] to transform inflected words into their stem or root forms.
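As a concrete sketch of Definition 1 and the preprocessing above, the following toy code averages TF-IDF document vectors into a concept vector. The stopword list, the naive suffix stripper (standing in for the Porter stemmer), and the smoothed IDF variant log(1 + N/df) are illustrative simplifications, not the paper's exact setup:

```python
from collections import Counter
from math import log

STOPWORDS = {"the", "a", "an", "and", "of", "in"}  # toy stoplist

def stem(token):
    # Crude suffix stripping; a real system would use the Porter stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in text.lower().split() if t not in STOPWORDS]

def tfidf_vectors(docs):
    tokenized = [preprocess(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF keeps weights positive even for terms in every document.
    return [{t: tf * log(1 + n / df[t]) for t, tf in Counter(toks).items()}
            for toks in tokenized]

def concept_vector(docs):
    # Definition 1: the concept is the average of its documents' term vectors.
    vecs = tfidf_vectors(docs)
    total = Counter()
    for v in vecs:
        total.update(v)
    return {t: w / len(vecs) for t, w in total.items()}

concept = concept_vector(["the guitar rock bands", "rock drumming and guitars"])
```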

2.2. Probabilistic learning framework

In terms of probabilistic learning, recommending the item i_k to the user u_i can be regarded as estimating the conditional probability p(i_k | u_i).¹ That is, the system recommends the items argsmax_{i_k ∈ I', n} p(i_k | u_i) to the user u_i, where n is the number of recommended items and argsmax_{x ∈ X, n} F(x) denotes the top-n values of x sorted by F(x), assuming that the values of x are discrete.² The probability p(i_k | u_i) measures how much an active user u_i likes i_k ∈ I', even though the user u_i has not yet accessed the item i_k.
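The top-n operator argsmax_{x ∈ X, n} F(x) described above amounts to sorting candidates by score; a minimal sketch (names and scores are illustrative):

```python
def argsmax(scores, n):
    """Top-n keys of `scores` ranked by their values, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

argsmax({"i1": 0.1, "i2": 0.7, "i3": 0.4}, 2)  # -> ["i2", "i3"]
```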

The collaborative filtering recommendation predicts a user's preference for items by using the preference information of his/her similar users, and thus p(i_k | u_i) needs to contain the random variable u_t representing all the (target) users other than u_i. By applying the law of total probability to p(i_k | u_i), it is written as follows:

$$\operatorname{argsmax}_{\{i_k \in I', n\}} p(i_k \mid u_i) = \operatorname{argsmax}_{\{i_k \in I', n\}} \sum_{u_t \in U} p(i_k \mid u_t) \cdot p(u_i \mid u_t) \cdot p(u_t) = \operatorname{argsmax}_{\{i_k \in I', n\}} \sum_{u_t \in U} p(i_k \mid u_t) \cdot p(u_i \mid u_t) \qquad (2)$$

where U is the set of users, within which u_i is the active user and u_t is a similar user of u_i. I' is the set of items that only the similar users (i.e., u_t) have accessed. p(u_t) is the probability that any random user u_t is selected; for simplicity's sake, it is assumed to be uniform over all users, that is, p(u_t) = 1/(number of total users). Thus, by the definition of argsmax above, we can remove p(u_t) from Eq. (2). And p(i_k | u_t) is the probability that a randomly chosen item among the items that the user u_t has accessed is the item i_k. It is estimated as access(u_t, i_k)/access(u_t), where access(u_t, i_k) is the access count of the user u_t for the item i_k, and access(u_t) is the total access count of the user u_t. The access count can be replaced with a click-through count, rating score, or purchase count depending on the properties of the items. As for p(u_i | u_t), we interpret it as the probability that a random user among the users similar to the user u_t is the user u_i, and thus this probability expresses the degree of similarity between the two users.
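The estimate p(i_k | u_t) = access(u_t, i_k)/access(u_t) can be sketched as follows; the event log is invented for illustration, and the counts could equally be click-throughs, rating scores, or purchase counts:

```python
from collections import Counter

def item_given_user(events):
    """events: list of (user, item) access records -> p(item | user)."""
    per_user = Counter(u for u, _ in events)   # access(u_t)
    per_pair = Counter(events)                 # access(u_t, i_k)
    return {(u, i): c / per_user[u] for (u, i), c in per_pair.items()}

events = [("u1", "i1"), ("u1", "i1"), ("u1", "i2"), ("u2", "i2")]
p = item_given_user(events)  # p[("u1", "i1")] == 2/3
```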

As we intend to associate users with concepts, the probability p(u_i | u_t) is re-written with regard to the concepts c_j as follows:

$$p(u_i \mid u_t) = \sum_{c_j \in C} p(u_i \mid c_j) \cdot p(u_t \mid c_j) \cdot p(c_j) = p(u_t \mid u_i) \qquad (3)$$

¹ Normally, discrete probability distributions are characterized by a probability mass function p such that Pr(X = x) = p(x), where X is a random variable and x is a value (i.e., an event) for X.
² In comparison with argsmax_{x ∈ X, n} F(x), argmax_{x ∈ X} F(x) is the single value of x for which F(x) is largest.

where p(u_i | c_j) is the probability that a randomly chosen user among the users related to the concept c_j is the user u_i. Eq. (3) can play the role of a similarity function since it is symmetric (i.e., p(u_i | u_t) equals p(u_t | u_i)). In the context of collaborative filtering, users can be represented by the items that they have selected (or bought), and thus we introduce the item i_k into p(u_i | c_j) by applying both the law of total probability and Bayes' theorem as follows:

$$p(u_i \mid c_j) = \sum_{i_k \in I} p(u_i \mid i_k) \cdot p(c_j \mid i_k) \cdot p(i_k) = \sum_{i_k \in I} \frac{p(i_k \mid u_i)\, p(u_i)}{p(i_k)} \cdot \frac{p(i_k \mid c_j)\, p(c_j)}{p(i_k)} \cdot p(i_k) = \sum_{i_k \in I} \frac{p(c_j)\, p(u_i)}{p(i_k)} \cdot p(i_k \mid c_j) \cdot p(i_k \mid u_i) \qquad (4)$$

where I denotes the set of items that the active user u_i has accessed. p(c_j), p(u_i), and p(i_k) are the prior probabilities of the concept c_j, the user u_i, and the item i_k, respectively, and they are assumed to be uniform over all values of the related random variables. Consequently,

$$p(u_i \mid c_j) \propto \sum_{i_k \in I} p(i_k \mid c_j) \cdot p(i_k \mid u_i) \qquad (5)$$

where p(i_k | c_j) is the probability that a randomly chosen item among the items related to the concept c_j is the item i_k, and p(i_k | u_i) is analogous to p(i_k | u_t) in Eq. (2). The problem is how to estimate p(i_k | c_j). Suppose I = {i_1, ..., i_l} is a set of items and i_k = ⟨w_k1, ..., w_k|V|⟩, where w_km is the weighted value of term t_m. We want to associate an item with a concept, and thus each value w_km (i.e., p(t_m | i_k)) is used to define p(i_k | c_j), which is written as the following proportional equation, as in Eq. (5):

$$p(i_k \mid c_j) \propto \sum_{t_m \in i_k} p(t_m \mid i_k) \cdot p(t_m \mid c_j) \qquad (6)$$

where p(t_m | i_k) can be estimated as w(i_k, t_m)/w(i_k), where w(i_k, t_m) denotes the weight of term t_m in i_k and w(i_k) denotes the total weight of all terms in i_k. Similarly, p(t_m | c_j) is estimated as w(c_j, t_m)/w(c_j), where w(c_j, t_m) denotes the weight of term t_m in c_j and w(c_j) denotes the total weight of all terms in c_j. Note that each concept is described by the documents within the corresponding web category.
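The estimates above, plugged into Eq. (6), can be sketched as follows; the term weights are invented for illustration:

```python
def term_dist(weights):
    """p(t | x) = w(x, t) / w(x), from a {term: weight} mapping."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def item_concept_score(item_weights, concept_weights):
    # Eq. (6): p(i_k | c_j) is proportional to sum_m p(t_m|i_k) * p(t_m|c_j)
    p_t_i = term_dist(item_weights)
    p_t_c = term_dist(concept_weights)
    return sum(p * p_t_c.get(t, 0.0) for t, p in p_t_i.items())

item_concept_score({"guitar": 2.0, "rock": 1.0}, {"rock": 3.0, "jazz": 1.0})
# -> 0.25, i.e. (1/3) * (3/4): only "rock" is shared between item and concept
```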

Finally, the probability p(i_k | u_i) for recommending the item i_k to the user u_i is re-written by combining Eqs. (3), (4), and (6) with Eq. (2) as follows:

$$\operatorname{argsmax}_{\{i_k \in I', n\}} p(i_k \mid u_i) = \operatorname{argsmax}_{\{i_k \in I', n\}} \sum_{u_t \in U} p(i_k \mid u_t) \cdot p(u_i \mid u_t)$$
$$= \operatorname{argsmax}_{\{i_k \in I', n\}} \sum_{u_t \in U} p(i_k \mid u_t) \cdot \sum_{c_j \in C} \left[ \left\{ \sum_{i_k \in I} \left( \sum_{t_m \in i_k} p(t_m \mid i_k) \cdot p(t_m \mid c_j) \right) \cdot p(i_k \mid u_i) \right\} \times \left\{ \sum_{i_k \in I'} \left( \sum_{t_m \in i_k} p(t_m \mid i_k) \cdot p(t_m \mid c_j) \right) \cdot p(i_k \mid u_t) \right\} \right] \qquad (7)$$

As a result, we build the prediction model for recommendation, which is composed of p(i|u), p(t|i), and p(t|c), from the past training data.
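Under the assumption that p(i|u) and p(i|c) have already been estimated (e.g. as sketched for Eqs. (2) and (6)), the prediction model of Eq. (7) can be sketched end to end as follows; all probabilities below are invented for illustration:

```python
def user_given_concept(p_iu, p_ic, user, concept):
    # Eq. (5): p(u | c) is proportional to sum_i p(i | c) * p(i | u)
    return sum(p_ic.get((i, concept), 0.0) * p
               for (u, i), p in p_iu.items() if u == user)

def user_similarity(p_iu, p_ic, concepts, ui, ut):
    # Eq. (3) with uniform p(c): p(ui | ut) ~ sum_c p(ui | c) * p(ut | c)
    return sum(user_given_concept(p_iu, p_ic, ui, c)
               * user_given_concept(p_iu, p_ic, ut, c) for c in concepts)

def recommend(p_iu, p_ic, concepts, users, ui, n):
    # Eq. (2)/(7): rank unseen items by sum_t p(i | ut) * p(ui | ut)
    seen = {i for (u, i) in p_iu if u == ui}
    scores = {}
    for ut in users:
        if ut == ui:
            continue
        sim = user_similarity(p_iu, p_ic, concepts, ui, ut)
        for (u, i), p in p_iu.items():
            if u == ut and i not in seen:
                scores[i] = scores.get(i, 0.0) + p * sim
    return sorted(scores, key=scores.get, reverse=True)[:n]

p_iu = {("u1", "i1"): 1.0, ("u2", "i1"): 0.5, ("u2", "i2"): 0.5, ("u3", "i3"): 1.0}
p_ic = {("i1", "rock"): 0.6, ("i2", "rock"): 0.4, ("i3", "jazz"): 1.0}
recommend(p_iu, p_ic, ["rock", "jazz"], ["u1", "u2", "u3"], "u1", 1)  # -> ["i2"]
```

The key point this sketch illustrates is that u1 never rated i2, yet i2 is still recommended because u1 and u2 are conceptually similar through the shared "rock" concept, which is how the item sparsity problem is sidestepped.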

3. Experiments

In our experiment, we extracted 11,584 concepts from the Open Directory Project (ODP), whose domain is limited to music; that is, categories under /Top/Arts/Music in ODP. The click-through logs of 9394 users from Last.fm were crawled for the experiments. Among these users, we assigned 50 as active users. In addition, 22,216 music meta-data records were used as items. For evaluation, we introduce two performance measures: Precision_k and InverseRankPrecision_k.

$$\text{Precision}_k = \frac{\text{number of items the user selects in the top-}k\text{ items}}{k}$$

$$\text{InverseRankPrecision}_k = \frac{\sum_{i=1}^{k} \frac{1}{\text{rank according to user's preference}}}{\sum_{i=1}^{k} \frac{1}{i}}$$
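The two measures above can be sketched directly; the ranked list and the set of selected items below are illustrative:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items the user actually selects."""
    return sum(1 for i in ranked[:k] if i in relevant) / k

def inverse_rank_precision(ranked, relevant, k):
    # Numerator sums 1/rank over selected items in the top-k; the
    # denominator sum_{i=1..k} 1/i normalizes the score to at most 1.0.
    num = sum(1.0 / (r + 1) for r, i in enumerate(ranked[:k]) if i in relevant)
    den = sum(1.0 / i for i in range(1, k + 1))
    return num / den

ranked, relevant = ["i2", "i5", "i1", "i4"], {"i2", "i1"}
precision_at_k(ranked, relevant, 4)          # -> 0.5
inverse_rank_precision(ranked, relevant, 4)  # (1 + 1/3) / (1 + 1/2 + 1/3 + 1/4)
```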

Precision_k examines whether the top-k retrieved items are relevant to the user's preferences, and it represents how well the system reflects the user's general preferences. InverseRankPrecision_k is similar to the 'mean reciprocal rank' [7], and it represents how well the system reflects the user's main preferences, because it gives more weight to higher-ranked selected items. The closer InverseRankPrecision_k is to 1.0, the better the results match the user's main preferences. For comparison, we use Pearson correlation coefficient based CF and cosine similarity based CF, denoted as PCF and CCF, respectively; our proposed method is denoted as SCF.

[Fig. 1. Changes in Precision_k and InverseRankPrecision_k (k = 10).]

[Fig. 2. Changes in Precision_k and InverseRankPrecision_k for different k.]

Fig. 1 shows the effect of the number of similar users on Precision_10 and InverseRankPrecision_10, where the proportion of the training set is 50%. The number of similar users ranges from 1 to 25. In all cases, the figure shows that the proposed SCF significantly outperforms PCF and CCF. The result confirms that SCF can reduce the sparsity of items by mapping users and items to a set of concepts. In this experiment, Precision_10 and InverseRankPrecision_10 are highest when the number of similar users is 5 for PCF and CCF, and 10 for SCF. The number of distinct items to recommend increases sharply with the number of similar users in PCF and CCF, whereas it increases slowly in SCF; therefore, the optimal numbers of similar users for PCF and CCF are smaller than that for SCF. As for PCF and CCF, even though there is little difference between their Precision_10 values, the InverseRankPrecision_10 of CCF is considerably lower than that of PCF in our experiments. This is because the Pearson correlation coefficient in PCF accounts for differences in users' preference scales.

Fig. 2 shows the changes in Precision_k and InverseRankPrecision_k when varying k, with the number of similar users fixed at 10. Since the average number of items that a user has accessed is 50 and the proportion of the training set is 50% in our experiments, we increase k from 5 to 25. This figure also shows that the proposed SCF performs much better than PCF and CCF. Note that Precision_k of SCF, PCF, and CCF decreases as k increases, while their InverseRankPrecision_k increases with k. A larger InverseRankPrecision_k means that users can locate their preferred items among the higher-ranked items. This result suggests that SCF is a more effective method for accommodating users' main preferences when developing collaborative filtering systems.

4. Summary

The contribution of this paper is a first attempt at a probabilistic learning framework for collaborative filtering recommendation. The basic idea behind our approach is to estimate the item recommendation probability p(i|u) by applying the law of total probability and Bayes' theorem. By associating users and items with a set of concepts, we improve the precision of recommendation and alleviate the 'item sparsity' problem of conventional CF. Through intensive experiments, we demonstrate that our conceptual CF method is more precise than conventional CF methods. Currently, we are investigating a hybrid collaborative filtering method that combines semantic collaborative filtering with content-based filtering.

Acknowledgments

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (Grant number NIPA-2010-C1090-1031-0002).

References

[1] H.J. Ahn, A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem, Information Sciences: An International Journal 178 (1) (2008) 37–51.

[2] R. Baeza-Yates, B. Ribeiro-Neto, Modeling, in: Modern Information Retrieval, Addison Wesley, Boston, MA, 1999.

[3] J. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence, Wisconsin, United States, 1998, pp. 43–52.

[4] T.Q. Lee, Y. Park, Y. Park, A time based approach to effective recommender systems using implicit feedback, Expert Systems with Applications: An International Journal 34 (4) (2008) 3055–3062.

[5] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, Lecture Notes in Artificial Intelligence, vol. 3209, Springer, Berlin, 2004, pp. 57–76.

[6] M.F. Porter, An algorithm for suffix stripping, in: Readings in Information Retrieval, Morgan Kaufmann, San Francisco, 1997, pp. 313–316.

[7] D. Radev, H. Qi, H. Wu, W. Fan, Evaluating web-based question answering systems, in: Proceedings of the Third International Conference on Language Resources and Evaluation, Las Palmas, Spain, 2002.

[8] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: an open architecture for collaborative filtering of netnews, in: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, Chapel Hill, United States, 1994, pp. 175–186.

[9] J. Wang, A.P. de Vries, M.J. Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, in: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, United States, 2006, pp. 501–508.

Jae-won Lee received the B.S. degree in Information and Telecommunication Engineering from Soongsil University, Seoul, Korea, in 2003, and the M.S. degree in Computer Science and Engineering from Seoul National University, Seoul, Korea, in 2006. He is currently a Ph.D. candidate at the School of Computer Science and Engineering in Seoul National University, Seoul, Korea. His research interests include machine learning, ontology, and personalized search.

Han-joon Kim received the B.S. and M.S. degrees in Computer Science and Statistics from Seoul National University, Seoul, Korea in 1994 and 1996, and the Ph.D. degree in Computer Science and Engineering from Seoul National University, Seoul, Korea in 2002. He is currently an associate professor at the School of Electrical and Computer Engineering, University of Seoul, Korea. His current research interests include text mining, machine learning, and intelligent information retrieval.

Sang-goo Lee received the B.S. degree in Computer Science from Seoul National University at Seoul, Korea, in 1985. He also received the M.S. and Ph.D. in Computer Science from Northwestern University at Evanston, Illinois, USA, in 1987 and 1990, respectively. He is currently a professor in the School of Computer Science and Engineering at Seoul National University, Seoul, Korea. He has been the director of the Center for e-Business Technology at Seoul National University since 2001. His current research interests include e-Business technology, database systems, product information management, and ontology.