Technical Report, March 2010
Social Recommender Systems
on IEML Semantic Space
Heung-Nam Kim, Andrew Roczniak, Pierre Lévy, Abdulmotaleb El-Saddik
Collective Intelligence Lab, University of Ottawa
4 March 2010
H. N. Kim, A. Roczniak, P. Lévy, A. El-Saddik, “Social recommender systems on IEML semantic space,”
Collective Intelligence Lab, University of Ottawa, Technical Report, March 2010
Social Recommender Systems
on IEML Semantic Space
Heung-Nam Kim1,2, Andrew Roczniak1, Pierre Lévy1, Abdulmotaleb El Saddik2
1 Collective Intelligence Lab, University of Ottawa 2Multimedia Communication Research Lab, University of Ottawa
Abstract
In this report, we present two social recommendation methods that incorporate the
semantics of tags: a user-based semantic collaborative filtering and an item-based
semantic collaborative filtering. Social tagging is employed as an approach to capture
and filter users’ preferences for items. In addition, we analyze the potential benefits of
IEML models for social recommender systems in solving the polysemy, synonymy, and
semantic interoperability problems, which are notable challenges in information
filtering. Experimental results show that our methods offer significant advantages both
in improving recommendation quality and in dealing with polysemy, synonymy, and
interoperability issues.
1 Introduction
The prevalence of social media sites has brought considerable changes not only to
people’s life patterns but also to the generation and distribution of information. This
social phenomenon has transformed the masses, who were merely information
consumers via mass media, into producers of information. However, as rich information
is shared through social media sites, an amount of information that was not available
before is growing exponentially with daily additions. Beyond finding the most attractive
and relevant content, users struggle with the great challenge of information overload.
Recommender systems, which have emerged in response to these challenges, provide
users with recommendations of items that are likely to fit their needs [17].
With the popularity of social tagging (also known as collaborative tagging or
folksonomies) a number of researchers have recently concentrated on recommender
systems with social tagging [10, 13, 20, 22, 25]. Because modern social media sites,
such as Flickr1, YouTube2, Twitter3, and Delicious4 allow users to freely annotate their
contents with any kind of descriptive words, also known as tags [6], the users tend to
use the descriptive tags to annotate the contents that they are interested in [13].
Recommender systems incorporated with tags can alleviate limitations of traditional
recommender systems, such as the sparsity problem and the cold start problem [1], and
thus the systems eventually provide promising possibilities to better generate
personalized recommendations. Although these studies obtain reasonable promise of
improving the performance, they do not take into consideration the semantics of tags
themselves. Consequently, the lack of semantic information suffers from fundamental
problems: polysemy and synonymy of the tags, as clearly discussed in [6]. Without the
semantics of the tags used by users, the systems cannot differentiate the various social
interests of the users from the same tags. Furthermore, they cannot provide semantic
interoperability that is a notable challenge in the cyberspace [11].
To address the discussed issues, we introduce a new concept to capture the semantics of
user-generated tags. We then propose two social recommendation methods that
incorporate the semantics of the tags: a user-based semantic collaborative filtering
and an item-based semantic collaborative filtering. First, in the user-based method, we
determine similarities between users by utilizing users’ semantic-oriented tags,
collectively called a Uniform Semantic Locator (USL), and subsequently identify
semantically like-minded users for each user. Finally, we recommend social items (e.g.,
text, pictures, videos) based on the social ranking of the items that are semantically
associated with the tags that like-minded users have assigned. Second, in the item-based method,
we determine similarities between items by utilizing USLs and identify semantically
similar items for each item. Finally, we recommend social items based on the
semantically similar items.
1 http://www.flickr.com 2 http://www.youtube.com 3 http://twitter.com 4 http://delicious.com
The main contributions of this study toward social recommender systems can be
summarized as follows: 1) We present and formalize models for semantic-oriented
social tagging in dealing with the issues of polysemy, synonymy, and semantic
interoperability. We also illustrate how the models can be adapted and applied to
existing social tagging systems. 2) We propose the methods of social recommendations
in semantic space that aim to find semantically similar users/items and discover social
items semantically relevant to users’ needs.
The rest of this report is organized as follows: in the next section, we review concepts
related to collaborative filtering and survey recent studies applying social tagging to
recommender systems. In Section 3, we present the models used in our study. We
then describe our semantic models for social recommender systems and provide a
detailed description of how the models are applied to item recommendations in Section
4. In Section 5, we present the effectiveness of our methods through experimental
evaluations. Finally, we summarize our work.
2 Related Work
In this section, we summarize previous studies and position our study with respect to
other related works in the area.
2.1 Collaborative Filtering
Following the proposal of GroupLens [16], automated recommendations based on
Collaborative Filtering (CF) have seen the widest use. CF is based on the fact that
“word of mouth” opinions of other people have considerable influence on the buyers’
decision making [10]. If advisors have similar preferences to the buyer, he/she is much
more likely to be affected by their opinions. In CF-based recommendation schemes,
two approaches have mainly been developed: user-based approaches [4, 16, 18] and
item-based approaches [5, 17]. Usually, user-based and item-based CF systems involve
two steps. First, the neighborhood, that is, the group of users whose preferences are
similar to those of a target user, called the k nearest neighbors (for user-based CF), or
the set of items similar to a target item, called the k most similar items (for item-based CF), is
determined by using a variety of similarity computing methods, such as Pearson
correlation-based similarity, cosine-based similarity, and so on. This step is an
important task in CF-based recommendations because different neighbor users or items
lead to different recommendations [18]. Once the neighborhood is generated, in the
second step, the prediction values of particular items, estimating how much the target
user is likely to prefer them, are computed based on the group of neighbors. The more
similar a neighbor is to the target user or the target item, the more influence he/she or it
has on the prediction value. After predicting how much the target user will like
particular items not previously rated by him/her, the top-N item set, the set of ordered
items with the highest predicted values, is identified and recommended. The target user
can give feedback on whether he/she actually likes the recommended top-N items, or
on how much he/she prefers those items, as scaled ratings.
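The two-step scheme described above can be sketched in a few lines. The following is a generic illustration with hypothetical data and function names, not the method proposed in this report:

```python
import math

# Toy user-item rating matrix (hypothetical data): user -> {item: rating}
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i4": 5},
}

def pearson(u, v):
    """Pearson correlation computed over the items both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = (math.sqrt(sum((a - mu) ** 2 for a in ru))
           * math.sqrt(sum((b - mv) ** 2 for b in rv)))
    return num / den if den else 0.0

def predict(u, item, k=2):
    """Step 1: pick the k nearest neighbors who rated `item`;
    step 2: similarity-weighted average of their ratings."""
    neighbors = sorted(
        (v for v in ratings if v != u and item in ratings[v]),
        key=lambda v: pearson(u, v), reverse=True)[:k]
    num = sum(pearson(u, v) * ratings[v][item] for v in neighbors)
    den = sum(abs(pearson(u, v)) for v in neighbors)
    return num / den if den else 0.0
```

Ranking all unrated items by `predict` and keeping the N highest values yields the top-N recommendation set described above.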
2.2 Social Tagging in Recommender Systems
Social tagging is the practice of allowing any user to freely annotate the content with
any kind of arbitrary keywords (i.e., tags) [6]. Social media sites with social tagging
have become tremendously popular in recent years. Therefore, the area of recommender
systems with social tagging (folksonomies) has become an active and growing topic of
study. These studies can be broadly divided into three topics: tag suggestions, social
searches, and social recommendations.
With the popularity of the usage of tags, many researchers have proposed new
applications for recommender systems supporting the suggestion of suitable tags during
folksonomy development. In [21], a tag recommender system with Flickr’s dataset is
presented based on an analysis of how users annotate photos and what information is
contained in the tagging. In [9], three classes of algorithms for tag recommendations,
such as an adaptation of user-based CF, a graph-based FolkRank [8] algorithm, and
simple methods based on tag counts, are presented and evaluated. Xu et al. [23] propose
an algorithm for collaborative tag suggestions that employs a reputation score for each
user based on the quality of the tags contributed by the user. In [14], a new semantic
tagging system, SemKey, is proposed in order to combine semantic technologies with
the collaborative tagging paradigm in a way that can be highly beneficial to both areas.
Unlike our aims, the purpose of these studies using social tagging is basically
to recommend appropriate tags for assisting the user in annotation-related tasks. Our
approach takes a different stance: rather than offering tag recommendations, our
aim is to find like-minded users through the semantics of their tags and to identify
personal resources semantically relevant to user needs.
Research has also been very active in information retrieval using social
tagging. In [8], the authors presented a formal model and a new search algorithm for
folksonomies, called FolkRank. FolkRank is applied not only to find communities
within the folksonomy but also to recommend tags and resources. In [24], a social
ranking mechanism is proposed to answer a user’s query that aims to transparently
improve content searches based on emergent tags semantics. It exploits users’ similarity
and tags’ similarity based on their past tagging activity. In [19], the ContextMerge
algorithm is introduced to efficiently support user-centric searches in social networks,
dynamically including related users in the execution. The algorithm adopts two-dimensional
expansion: social expansion considers the strength of relations among
users, and semantic expansion considers the relatedness of different tags. In [2], two
algorithms are proposed, SocialSimRank and SocialPageRank. The former algorithm
calculates the similarity between tags and user queries whereas the latter one captures
page popularity based on its annotations. All these works attempt to improve users’
searches by incorporating social annotations into query expansion. Differing from these
works, our goal is to automatically identify, without user queries, resources that are
likely to fit users’ needs.
Other researchers have studied the same area as our study. In [10], the authors proposed
a collaborative filtering method via collaborative tagging. They first determine similarities
between users with social tags and subsequently identify the latent tags for each user to
recommend items via a naïve Bayes approach. Tso-Sutter et al. [22] proposed a generic
method that allows tags to be incorporated into CF algorithms by reducing the three-dimensional
correlations to three two-dimensional correlations and then applying a
fusion method to re-associate them. A similar approach is presented in [25]
in order to provide improved recommendations to users. Although these studies
give reasonable promise of improving the performance, they do not take the semantics
of tags into consideration. Consequently, the lack of semantic information has
limitations, such as the polysemy and synonymy of tags, when identifying similar users
through user-generated tags. We believe the semantic information of tags can be helpful
not only to better grasp users’ interests but also to enhance the quality of
recommendations. The recent literature also focuses on semantic recommender
systems, whose goal is similar to ours. Unlike our approach, however, the existing work on
semantic recommender systems relies on a prefixed ontology and uses technologies
from the Semantic Web. The state of the art for semantic recommender systems has
been well analyzed in [15].
3 IEML Models
To aid understanding of our semantic approach, this section briefly explains preliminary
concepts of the Information Economy MetaLanguage (IEML5) that will be exploited in the
remaining sections of this report. The IEML research program promotes a radical innovation
in the notation and processing of semantics. IEML is a regular language and a symbolic
system for the notation of meaning. It is “semantic content oriented” rather than
“instruction oriented” like programming languages or “format oriented” like data
standards. IEML provides new methods for semantic interoperability, semantic
navigation, collective categorization, and self-referential collective intelligence [11].
The IEML research program is compatible with the major standards of the Web of data and
is in tune with current trends in social computing.
3.1 IEML Overview
IEML expressions are built from a syntactically regular combination of six symbols,
called primitives. In IEML, a sequence is a succession of 3^l primitives, where l ∈
{0, 1, 2, 3, 4, 5, 6}; l is called the layer of the sequence. For the successive layers, the
sequences have lengths of 1, 3, 9, 27, 81, 243, and 729 primitives, respectively [12].
From a syntactic point of view, any IEML expression is nothing other than a set of
sequences. As there is a distinct semantic for each distinct sequence, there is also a
distinct semantic for each distinct set of sequences. In general, the meaning of a set of
sequences corresponds to
5 http://ieml.org
the union of the meaning of the sequences of this set. The main result is that any
algebraic operation that can be made on sets in general can also be made on semantics
(significations) once they are expressed in IEML. An IEML dictionary provides the
correspondence between IEML sequences and a natural language descriptor of an IEML
expression. The terms of the dictionary belong to layers 0-3. There are rules to create
inflected words from these terms, to create sentences from inflected words and to create
relations between sentences by using some terms as conjunctions. Given these rules, it
is possible to express any network of relations between sentences by using sequences up
to layer 6. The notation, syntax, semantics, and further examples of IEML have been
presented in [12]; due to lack of space, we refer the reader to [12] for more details.
3.2 IEML Language Model
We present the model of the IEML language, along with the model of semantic
variables.
Let Σ be a nonempty and finite set of symbols, Σ = {S, B, T, U, A, E}. Let a string s be
a finite sequence of symbols chosen from Σ. The length of this string is denoted by |s|.
An empty string ε is a string with zero occurrences of symbols, and its length is |ε| = 0.
The set of all strings of length k composed of symbols from Σ is defined as Σ^k = {s
where |s| = k}. Note that Σ^0 = {ε} and Σ^1 = {S, B, T, U, A, E}. Although Σ and Σ^1 are
sets containing exactly the same members, the former contains symbols, and the latter
strings. The set of all strings over Σ is defined as Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …

A useful operation on strings is concatenation, defined as follows. For all si =
a1a2a3a4…ai ∈ Σ* and sj = b1b2b3b4…bj ∈ Σ*, sisj denotes the string concatenation such
that sisj = a1a2a3a4…ai b1b2b3b4…bj and |sisj| = i + j. The IEML language over Σ is a
subset of Σ*, L_IEML ⊆ Σ*:

L_IEML = {s ∈ Σ* | |s| = 3^l, 0 ≤ l ≤ 6}    (1)
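Under this definition, membership in L_IEML reduces to an alphabet check and a length check. A minimal sketch (function names are ours):

```python
import math

PRIMITIVES = set("SBTUAE")  # the alphabet of the six primitives

# Valid sequence lengths are 3**l for layers l = 0..6: 1, 3, 9, 27, 81, 243, 729.
VALID_LENGTHS = {3 ** l for l in range(7)}

def in_ieml(s: str) -> bool:
    """True iff s is a sequence of L_IEML: a string over the primitives
    whose length is 3**l for some layer 0 <= l <= 6."""
    return set(s) <= PRIMITIVES and len(s) in VALID_LENGTHS

def layer(s: str) -> int:
    """Layer l of a semantic sequence (len(s) == 3**l)."""
    assert in_ieml(s)
    return round(math.log(len(s), 3))
```

Note that the empty string is rejected, since its length 0 is not a power of three between 1 and 729.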
3.3 Model of Semantic Sequences
Definition 1 (Semantic sequence) A string s is called a semantic sequence if and only if
s ∈ L_IEML.

To denote the n-th primitive of a sequence s, we use a superscript n, where 1 ≤ n ≤ 3^l, and
write s^n. Note that for any sequence s of layer l, s^n is undefined for any n > 3^l. Two
semantic sequences are distinct if and only if one of the following holds: i) their
layers are different, ii) they are composed from different primitives, iii) their primitives
do not follow the same order. For any sa and sb:

sa ≠ sb ⇔ |sa| ≠ |sb| ∨ ∃n: sa^n ≠ sb^n    (2)
Let us now consider binary relations between semantic sequences in general. These
are obtained by performing a Cartesian product of two sets6. For any sets of semantic
sequences X, Y, where sa ∈ X and sb ∈ Y, and using Equation 2, we define four binary relations,
whole ⊆ X × Y, substance ⊆ X × Y, attribute ⊆ X × Y, and mode ⊆ X × Y, as follows:

whole = {(sa, sb) | sa = sb}
substance = {(sa, sb) | |sa| = 3|sb|, sa^n = sb^n, 1 ≤ n ≤ |sb|}
attribute = {(sa, sb) | |sa| = 3|sb|, sa^(n+|sb|) = sb^n, 1 ≤ n ≤ |sb|}
mode = {(sa, sb) | |sa| = 3|sb|, sa^(n+2|sb|) = sb^n, 1 ≤ n ≤ |sb|}    (3)
Any two semantic sequences that are equal are in a whole relationship. In addition,
any two semantic sequences that share specific subsequences may be in substance,
attribute or mode relationship. For any two semantic sequences sa and sb, if they are in
one of the above relations, then we say that sb plays a role w.r.t sa and we call sb a seme
of sequence.
Definition 2 (Seme of a sequence) For any semantic sequences sa and sb, if (sa, sb) ∈
whole ∪ substance ∪ attribute ∪ mode, then sb plays a role w.r.t. sa and sb is called a seme.
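In this reading, substance, attribute, and mode correspond to sb occurring as the first, middle, or last third of sa, with sb one layer below sa. A minimal sketch (function names are ours):

```python
def whole(sa: str, sb: str) -> bool:
    return sa == sb

def substance(sa: str, sb: str) -> bool:
    # sb is the first third of sa: |sa| = 3|sb| and sa^n = sb^n.
    return len(sa) == 3 * len(sb) and sa[:len(sb)] == sb

def attribute(sa: str, sb: str) -> bool:
    # sb is the middle third of sa.
    return len(sa) == 3 * len(sb) and sa[len(sb):2 * len(sb)] == sb

def mode(sa: str, sb: str) -> bool:
    # sb is the last third of sa.
    return len(sa) == 3 * len(sb) and sa[2 * len(sb):] == sb

def is_seme(sa: str, sb: str) -> bool:
    """Definition 2: sb is a seme of sa if one of the four relations holds."""
    return whole(sa, sb) or substance(sa, sb) or attribute(sa, sb) or mode(sa, sb)
```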
We can now group distinct semantic sequences together into sets. A useful grouping
is based on the layer of those semantic sequences.
6 A Cartesian product of two sets X and Y is written as follows: X × Y = {(x, y) | xX, yY}.
3.4 Model of Semantic Categories and Catsets
A category c of L_IEML is a subset such that all strings of that subset have the same length:

c ⊆ L_IEML, where ∀si, sj ∈ c: |si| = |sj|    (4)
Definition 3 (semantic category) A semantic category c is a set containing semantic
sequences at the same layer.
The layer of any category c is exactly the same as the layer of the semantic
sequences included in that category. The set of all categories of layer l is given as the
powerset7 of the set of all strings of layer l of L_IEML:

Cl = Powerset({s ∈ L_IEML where |s| = 3^l})    (5)
Two categories are distinct if and only if they differ by at least one element. For any
ca and cb:

ca ≠ cb ⇔ (ca ∪ cb) \ (ca ∩ cb) ≠ ∅    (6)

A weaker condition can be applied to categories of distinct layers (since two
categories are different if their layers are different) and is written as:

l(ca) ≠ l(cb) ⇒ ca ≠ cb    (7)
where l(ca) and l(cb) denote the layers of categories ca and cb, respectively. Analogously
to sequences, we consider binary relations between any categories ci and cj where l(ci),
l(cj) ≥ 1. For any sets of categories X, Y, where ca ∈ X and cb ∈ Y, we define four binary
relations, wholeC ⊆ X × Y, substanceC ⊆ X × Y, attributeC ⊆ X × Y, and modeC ⊆ X × Y, as
follows:

wholeC = {(ca, cb) | ca = cb}
substanceC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ substance}
attributeC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ attribute}
modeC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ mode}    (8)
7 A powerset of S is the set of all subsets of S, including the empty set .
For any two categories ca and cb, if they are in one of the above relations, (ca, cb) ∈
wholeC ∪ substanceC ∪ attributeC ∪ modeC, then we say that cb plays a role with
respect to ca and cb is called a seme of category.
A catset is a set of distinct categories of the same layer, as defined in Definition 4.

Definition 4 (Catset) A catset σ is a set containing categories such that σ = {cn | ∀i, j,
i ≠ j: ci ≠ cj, l(ci) = l(cj)}.

The layer of a catset σ is given by the layer of any of its members: if some c ∈ σ, then
l(σ) = l(c). Note that a category c can be written as c ∈ Cl, while a catset σ can be
written as σ ⊆ Cl. All standard set operations, such as union and intersection (e.g.,
σa ∪ σb and σa ∩ σb), can be performed on catsets of the same layer.
3.5 Model of Uniform Semantic Locator
A USL is composed of up to seven catsets of different layers, as follows:

Definition 5 (Uniform Semantic Locator, USL) A USL ψ is a set containing catsets of
different layers such that ψ = {σn | ∀i, j, i ≠ j: l(σi) ≠ l(σj)}.

Note that since there are seven distinct layers, a USL can have at most seven
members. All standard set operations, such as union and intersection (e.g., ψa ∪ ψb and
ψa ∩ ψb), on USLs are always performed on sets of categories (and therefore on sets of
sequences), layer by layer. Since at each layer l there are |Cl| distinct catsets, the whole
semantic space is defined by the tuple Ł = C0 × C1 × C2 × C3 × C4 × C5 × C6.
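Because all set operations on USLs proceed layer by layer, a USL can be represented as a mapping from layer to catset. A minimal sketch (the representation and names are our choices, not prescribed by the model):

```python
# A category is a frozenset of same-layer sequences; a catset is a set of
# categories; a USL maps each layer (0..6) to a catset.

def usl_union(a, b):
    """Union of two USLs, performed layer by layer (missing layers = empty)."""
    return {l: a.get(l, set()) | b.get(l, set()) for l in range(7)}

def usl_intersection(a, b):
    """Intersection of two USLs, performed layer by layer."""
    return {l: a.get(l, set()) & b.get(l, set()) for l in range(7)}

# Hypothetical USLs built from toy sequences:
a = {1: {frozenset({"SBT"})}, 2: {frozenset({"SSSBBBTTT"})}}
b = {1: {frozenset({"SBT"}), frozenset({"UAE"})}}
```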
In the IEML notation of USLs, the categories are separated by “/”. Table 1 shows an
example of USLs used as *tags for “Wikipedia” and “XML”, together with a layer-by-layer
English translation of those USLs. A *tag holds the place of an IEML expression by
suggesting its meaning rather than uttering the IEML expression. The meaning of a *tag
has to be understood from the singular place that its corresponding IEML expression
occupies in the network of IEML semantic relations [12].
Table 1 An example of USLs in semantic space [12].

Tag: *Wikipedia
USL
  L0: (U: + S:)
  L1: (d.) / (t.)
  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)
  L3: (a.u.-we.h.-’) / (n.o.-y.y.-s.y.-’)
  L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,) / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’ ,)
Semantics in English
  L0: knowledge networks
  L1: truth / memory
  L2: get one’s bearings in knowledge / act for the sake of society / synthesize / organized knowledge / collective creation
  L3: opening public space / encyclopedia
  L4: collective intelligence encyclopedia in cyberspace / volunteers producing didactic material on any subject

Tag: *XML
USL
  L0: (A: + S:)
  L1: (b.)
  L2: (we.g.-) / (we.b.-)
  L3: (e.o.-we.h.-’) / (b.i.-b.i.-’) / (t.e.-d.u.-’)
  L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)
Semantics in English
  L0: document networks
  L1: language
  L2: unify the documentation / cultivate information system
  L3: establishing norms and standards / cyberspace / meeting information needs
  L5: guarantee compatibility of data through same formal structure
4 Social Recommender Systems on IEML Semantic Space
Fig. 1 System overview of social recommender systems based on IEML semantic tagging: a
user-based semantic collaborative filtering and an item-based semantic collaborative filtering
In this section, we present two social recommendation methods that incorporate
USLs. We adapt two well-known collaborative filtering approaches to our
study: a user-based approach and an item-based approach. Fig. 1 illustrates the overall
process of each method with two phases: a neighborhood formation phase and an item
recommendation phase. In the neighborhood formation phase, we first compute
similarities between users (for a user-based method) and between items (for an item-
based method) by utilizing USLs. Thereafter, we generate a set of semantically like-
minded users for each user and a set of semantically similar items for each item. Based
on the neighborhood, in the recommendation phase, we predict a social ranking of items
to decide which items to recommend.
4.1 Semantic Models in Bipartite Space
Fig. 2 Bipartite space of social tagging space and USL semantic space
The social tagging in our study is free-for-all, allowing any user to annotate any item with
any tag [6]. Therefore, if there is a list of r users Ũ = {u1, u2, …, ur}, a list of m tags Ť =
{t1, t2, …, tm}, and a list of n items Ĩ = {i1, i2, …, in}, the social tagging, or folksonomy, F
is a tuple F = ⟨Ũ, Ť, Ĩ, Y⟩, where Y ⊆ Ũ × Ť × Ĩ is a ternary relationship called tag
assignments [9]. More conceptually, the triplet of the tagging space can be represented
as a three-dimensional data cube. Beyond the tagging space, in our study there is another
space where the tags are connected to USLs according to their semantics. We call this
space the IEML semantic space, as illustrated in Fig. 2. Note that a USL is composed
of catsets (Definition 5), where each catset consists of semantic categories c
(Definition 4). Therefore, an extended formal definition of the folksonomy, called a
semantic folksonomy, is defined as follows:
Definition 6 (Semantic Folksonomy) Let Ł = C0 × C1 × C2 × C3 × C4 × C5 × C6 be the
whole semantic space. A semantic folksonomy is a tuple SF = ⟨Ũ, Ť, Ĩ, Y, Ň⟩, where Ň is
a ternary relationship such that Ň ⊆ Ũ × Ť × Ł.
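A semantic folksonomy can be stored as two relations: the tag assignments Y and the tag-to-USL links Ň. A minimal sketch with hypothetical data (the USLs abbreviate the *XML and *Wikipedia entries of Table 1 to (layer, category) pairs):

```python
# Tag assignments Y: (user, tag, item) triples.
Y = {("alice", "xml", "media1"), ("bob", "wikipedia", "media2")}

# N links each user's tag to a USL in the semantic space.
N = {
    ("alice", "xml"): {(0, "(A: + S:)"), (1, "(b.)")},
    ("bob", "wikipedia"): {(0, "(U: + S:)"), (1, "(d.)"), (1, "(t.)")},
}

def tags_of(user):
    """All tags that `user` has used (projected from Y)."""
    return {t for (u, t, i) in Y if u == user}

def usls_of(user):
    """All USL categories attached to `user`'s tags (looked up in N)."""
    return set().union(*(N.get((user, t), set()) for t in tags_of(user)), set())
```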
From semantic folksonomies, we present a formal description of two models, a
semantic user model and a semantic item model, which are used in our social
recommender system.
4.1.1 Semantic Model for the User
From semantic folksonomies, a semantic user model is defined as follows:
Definition 7 (Semantic User Model) Given a user u ∈ Ũ, a formal description of a user
model for user u, Mu, follows: Mu = ⟨Ťu, Ňu⟩, where Ťu = {(t, i) ∈ Ť × Ĩ | (u, t, i) ∈ Y} and
Ňu = {(t, ψ) ∈ Ť × Ł | (u, t, ψ) ∈ Ň}.

For clarity, some definitions of the sets used in this work are introduced. We define,
for a given user u, the set of all tags that user u has used, Tu* := {t ∈ Ť | ∃i ∈ Ĩ: (t, i) ∈
Ťu}. Therefore, the set of all USLs of the user u can be defined as Nu* := {ψ ∈ Ł | ∃t ∈
Tu*: (t, ψ) ∈ Ňu}. For a certain item h, we define the set of tags with which the user u
annotated the item h, Tu^h := {t ∈ Ť | Ťu ∩ ({t} × {h}) ≠ ∅}. Accordingly, the set of all
USLs of user u for the item h can be defined as Nu^h := {ψ ∈ Ł | ∃t ∈ Tu^h: (t, ψ) ∈ Ňu}.
As stated previously, all standard set operations on USLs, such as union and intersection,
can always be performed on sets of categories, layer by layer. Therefore, for a given user
model Mu for user u, with respect to the semantic space, the tags that user u has used to
annotate a certain item h can be represented as the union of the USLs in Nu^h:

USLu^h = ∪_{l=0}^{6} USLu^h(l), where USLu^h(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nu^h}    (9)
,,)()( (9)
Likewise, all tags that user u has used are represented as the union of the USLs in Nu*:

USLu* = ∪_{h∈Ĩ} USLu^h = ∪_{l=0}^{6} USLu*(l), where USLu*(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nu*}
(10)
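The layer-by-layer unions of Equations 9 and 10 can be sketched as follows, with each USL flattened to a set of (layer, category) pairs (representation and names are ours):

```python
def usl_layers(usls):
    """Union of a collection of USLs, layer by layer: out[l] is the set of
    categories of layer l appearing in any USL of `usls`."""
    out = {l: set() for l in range(7)}
    for usl in usls:          # each USL: iterable of (layer, category) pairs
        for l, c in usl:
            out[l].add(c)
    return out

# N_u_h: the USLs of the tags user u assigned to item h (hypothetical data).
N_u_h = [{(0, "c01"), (1, "c11")}, {(1, "c12"), (3, "c31")}]
USL_u_h = usl_layers(N_u_h)
```

Feeding all of a user’s USLs instead of just those for one item yields the union of Equation 10 in the same way.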
Fig. 3 Conceptual user models with respect to IEML semantic space
4.1.2 Semantic Model for the Item
From semantic folksonomies, a semantic item model is defined as follows:
Definition 8 (Semantic Item Model) Given an item i ∈ Ĩ, a formal description of a
semantic item model for item i, M(i), follows: M(i) = ⟨Ť(i), Ň(i)⟩, where Ť(i) = {(u, t) ∈
Ũ × Ť | (u, t, i) ∈ Y} and Ň(i) = {(t, ψ) ∈ Ť × Ł | ∃u ∈ Ũ: (u, t) ∈ Ť(i), (u, t, ψ) ∈ Ň}.

For clarity, we introduce some definitions of the sets from an item perspective. We
define, for a given item i, the set of all tags with which users have annotated item i, T*^i := {t
∈ Ť | ∃u ∈ Ũ: (u, t) ∈ Ť(i)}. Therefore, the set of all USLs of the item i can be defined as
N*^i := {ψ ∈ Ł | ∃t ∈ T*^i: (t, ψ) ∈ Ň(i)}. For a certain user v, we define the set of tags
with which the user v annotated the item i, Tv^i := {t ∈ Ť | Ť(i) ∩ ({v} × {t}) ≠ ∅}.
Accordingly, the set of all USLs of user v for the item i can be defined as Nv^i := {ψ ∈ Ł |
∃t ∈ Tv^i: (t, ψ) ∈ Ň(i)}. As stated previously, all standard set operations on USLs, such
as union and intersection, can always be performed on sets of categories, layer by layer.
Therefore, for a given item model M(i) for item i, with respect to the semantic space, the
tags that a certain user v has used to annotate item i can be represented as the union of
the USLs in Nv^i:

USLv^i = ∪_{l=0}^{6} USLv^i(l), where USLv^i(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nv^i}    (11)
Likewise, all tags that have been assigned to item i are represented as the union of
the USLs in N*^i:

USL*^i = ∪_{u=1}^{r} USLu^i = ∪_{l=0}^{6} USL*^i(l), where USL*^i(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ N*^i}
(12)
(12)
Fig. 4 Conceptual item models with respect to IEML semantic space
4.2 User-based Semantic Collaborative Filtering
In this section, we describe a social recommendation method based on the semantic user
model (Definition 7). The basic idea of user-based semantic collaborative filtering
starts from the assumption that a certain user is likely to prefer items that like-minded users
have annotated with tags similar to the tags he/she used. Therefore, we first
look into the set of like-minded users who have tagged a target item and then compute
how semantically similar they are to the target user, called a user-user semantic
similarity. Based on the semantically similar users, the semantic social ranking of the
item is computed to decide whether or not to recommend.
4.2.1 Generating Semantically Nearest Neighbors
One of the most important tasks in CF recommender systems is the neighborhood
formation to identify a set of users who have similar taste, often called k nearest
neighbors. Those users can be directly defined as a group of connected friends in social
networks such as followings in Twitter, connections in Twine, people in Delicious,
friends in Facebook, and so on. However, most users have insufficient connections with
their friends. In addition, finding like-minded users in current social network services
still relies on manually browsing networks of friends or on keyword searches. Thus, this
form of establishing neighbors becomes a time-consuming and ineffective process if we
take into consideration the huge number of people available in the network [3].
As mentioned in Section 2.1, typical collaborative filtering methods adopt a variety
of similarity measurements to determine similar users automatically. However, they also
encounter a serious limitation, namely the sparsity problem [1, 10]. It is often the case that
there is no intersection between two users, and hence the similarity is not computable at
all. Even when the computation of similarity is possible, it may not be very reliable
because insufficient information is processed. To deal with this problem, recent studies
have determined similarities between users by using user-generated tags that reflect
users’ characteristics [10, 13, 20, 22, 25]. However, there still remain limitations that
should be treated, such as noisy, polysemous, and synonymous tags. To this end, our study
identifies the nearest neighbors by using the Uniform Semantic Locators, USLs, of each
user for a more valuable and personalized analysis.
We define semantically similar users as a group of users presenting interest
categories of IEML close to those of the target user. Semantic similarity between two
users, u and v, can be computed by the sum of layer similarities from layer 0 to layer 6.
Formally, the semantic user similarity measure is defined as:

semUSim(u, v) = η · Σ_{l=0}^{6} simULayer_l(u, v)    (13)

where η is a normalizing factor such that the layer weights sum to unity, and simULayer_l(u,
v) denotes the layer similarity of the two users at layer l. The layer similarity
between two USL sets is defined as the size of the intersection divided by the size of the
union of the USL sets. In other words, it is determined by computing the weighted
Jaccard coefficient of the two USL sets. Formally, the layer similarity is given by:

simULayer_l(u, v) = ((l + 1)/7) · |USLu*(l) ∩ USLv*(l)| / |USLu*(l) ∪ USLv*(l)|    (14)
where USLu*(l) and USLv*(l) refer to the unions of the USLs of users u and v at layer l, 0 ≤ l
≤ 6, respectively. Here we give more weight to higher layers when computing the
semantic user similarity. That is, intersections at higher layers contribute more than
intersections at lower layers. When η is set to 0.25 for normalization, the similarity
value between two users is in the range of 0 to 1. The higher a user’s score, the more
similar he/she is to the target user. Finally, for a given user u ∈ Ũ, the k users with the
highest semantic similarity are identified as the semantically k nearest neighbors such
that:
SSNk(u) = argmax^(k)_{v ∈ Ũ\{u}} semUSim(u, v)    (15)
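The semantic user similarity and the selection of semantically nearest neighbors can be sketched as follows (the layered-set representation and names are our choices):

```python
def sim_layer(l, usl_u, usl_v):
    """Layer similarity: weighted Jaccard ((l + 1) / 7) * |A & B| / |A | B|."""
    a, b = usl_u.get(l, set()), usl_v.get(l, set())
    union = a | b
    return 0.0 if not union else (l + 1) / 7 * len(a & b) / len(union)

def sem_usim(usl_u, usl_v, eta=0.25):
    """Semantic user similarity: eta times the sum of the layer similarities."""
    return eta * sum(sim_layer(l, usl_u, usl_v) for l in range(7))

def ssn_k(u, profiles, k):
    """Semantically k nearest neighbors of user u among `profiles`."""
    others = [v for v in profiles if v != u]
    return sorted(others, key=lambda v: sem_usim(profiles[u], profiles[v]),
                  reverse=True)[:k]

# Two users sharing every category at every layer reach similarity 1.0,
# since eta = 0.25 and the layer weights (l + 1) / 7 sum to 4.
full = {l: {"c"} for l in range(7)}
```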
(13)
To illustrate the computation of the semantic user similarity with a simple example,
consider three users: Alice, Bob, and Nami. Alice annotated Media1 with the tags
“Community” and “XML”, Bob annotated Media2 with the tags “Wikipedia” and “OWL”, and
Nami annotated Media3 with the tags “Web of data” and “Folksonomy”. The USLs of each
tag are shown in Fig. 5.
*Community: L0: (A:+B:); L1: (k.) / (m.); L2: (k.o.-) / (p.a.-); L3: (s.o.-a.a.-')
*XML: L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.b.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-'); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Web of data: L0: (U:+S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
*OWL: L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Folksonomy: L0: (A:+B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')
*Wikipedia: L0: (U:+S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',)

Fig. 5 An example of tag assignments and USLs for computing the semantic user similarity
Let us now compute the semantic similarity between Alice and Bob. Table 2 shows the
union of the USLs of Alice's “Community” and “XML” tags and the union of the USLs of
Bob's “Wikipedia” and “OWL” tags.
Table 2 The union of USLs for Alice and Bob.

USL*_Alice (the union of Alice's USLs):
L0: (A:+B:) / (A:+S:)
L1: (k.) / (m.) / (b.)
L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-)
L3: (s.o.-a.a.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-')
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

USL*_Bob (the union of Bob's USLs):
L0: (U:+S:) / (A:+S:)
L1: (d.) / (t.) / (b.)
L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-) / (we.g.-) / (we.h.-)
L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-')
L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',) / (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',)
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
From these two USL sets, we can compute the USL layer similarity, layer by layer. For
example, at layer 0 the size of the intersection of the two USL sets is 1 (namely
(A:+S:), which means documentary networks), and the size of the union is 3 (namely
(A:+B:), (A:+S:), and (U:+S:)). The layer weight of layer 0 is 1/7 ≈ 0.143.
Consequently, the layer similarity at layer 0 is simULayer_0(Alice, Bob) = 1/7 × 1/3 ≈
0.048. At layer 1, the size of the intersection is 1 whereas the size of the union is 5,
and thus simULayer_1(Alice, Bob) = 2/7 × 1/5 ≈ 0.057. In a similar fashion,
simULayer_2(Alice, Bob) ≈ 0.043, simULayer_3(Alice, Bob) ≈ 0.163, simULayer_4(Alice,
Bob) = 0, simULayer_5(Alice, Bob) ≈ 0.857, and simULayer_6(Alice, Bob) = 0. Finally,
summing the layer similarities from layer 0 to layer 6, the semantic similarity between
Alice and Bob is semUSim(Alice, Bob) = 0.25 × (0.048 + 0.057 + 0.043 + 0.163 + 0.857) =
0.25 × 1.168 ≈ 0.292.
In the same way, the semantic similarities between Alice and Nami and between Bob and
Nami are calculated as semUSim(Alice, Nami) = 0.11 and semUSim(Bob, Nami) = 0.132,
respectively. This means that Alice is semantically more similar to Bob than to Nami.
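The layer-by-layer computation above can be sketched in Python. The category identifiers below are placeholders standing in for the USLs of Fig. 5; only the per-layer intersection and union sizes matter, and they are chosen to match the Alice/Bob example (this is an illustrative sketch, not the report's implementation):

```python
def sim_ulayer(usl_u, usl_v, l):
    """Weighted Jaccard similarity at layer l (Eq. 12)."""
    union = usl_u | usl_v
    if not union:
        return 0.0
    return ((l + 1) / 7) * len(usl_u & usl_v) / len(union)

def sem_usim(user_u, user_v, lam=0.25):
    """Semantic user similarity: normalized sum over layers 0..6 (Eq. 11)."""
    return lam * sum(
        sim_ulayer(user_u.get(l, set()), user_v.get(l, set()), l)
        for l in range(7)
    )

# Placeholder categories with the same per-layer intersection/union sizes
# as Alice and Bob in the worked example (Table 2).
alice = {0: {"a1", "s1"}, 1: {"k", "m", "b"},
         2: {"c1", "c2", "c3", "c4"},
         3: {"d1", "d2", "d3", "d4"}, 5: {"e1"}}
bob = {0: {"s1", "u1"}, 1: {"d", "t", "b"},
       2: {"c4", "f1", "f2", "f3", "f4", "f5", "f6"},
       3: {"d3", "d4", "g1", "g2", "g3"},
       4: {"h1", "h2", "h3"}, 5: {"e1"}}
round(sem_usim(alice, bob), 3)  # ≈ 0.292, as in the text
```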
4.2.2 Item Recommendation via Semantic Social Ranking
Once we have identified a group of semantically nearest neighbors, the final step is
prediction, that is, estimating how strongly a certain user would prefer unseen items.
In our study, the basic idea of uncovering relevant items starts from assuming
that a target user is likely to prefer items that have been tagged by semantically
similar users; items tagged by users similar to the target user should therefore be
ranked higher. We call this prediction strategy social ranking based on the semantic
user model. Formally, the semantic social ranking score of the target user u for the
target item h, denoted SUR(u, h), is obtained as follows:
SUR(u, h) = Σ_{v ∈ SSN_k(u)} (|USL*_u ∩ USL^h_v| / |USL^h_v|) × semUSim(u, v)    (14)
where SSN_k(u) is the set of k nearest neighbors of user u under the semantic user
similarity, USL^h_v is the union of the USLs connected to the tags that user v has
assigned to item h, and semUSim(u, v) denotes the semantic similarity between users u
and v.
Once the ranking scores of the target user for the items he/she has not previously
tagged are computed, the items are sorted in descending order of SUR(u, h). Two
strategies can then be used to select the items relevant to user u. First, the items
whose ranking scores exceed a reasonable threshold δ, i.e., SUR(u, h) > δ, are
recommended to user u. Second, the set of top-N ranked items with the highest scores is
identified and recommended to user u.
USL^M2_Bob (Bob's USLs for Media2, tag “OWL”): L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
USL^M2_Nami (Nami's USLs for Media2, tag “Web of data”): L0: (U:+S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
USL^M3_Bob (Bob's USLs for Media3, tag “Wikipedia”): L0: (U:+S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',)
USL^M3_Nami (Nami's USLs for Media3, tag “Folksonomy”): L0: (A:+B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')
USL*_Alice (the union of Alice's USLs): as in Table 2.

Intersection and union sizes:
|USL*_Alice ∩ USL^M2_Bob| = 6, |USL^M2_Bob| = 9
|USL*_Alice ∩ USL^M2_Nami| = 3, |USL^M2_Nami| = 8
|USL*_Alice ∩ USL^M3_Bob| = 0, |USL^M3_Bob| = 12
|USL*_Alice ∩ USL^M3_Nami| = 4, |USL^M3_Nami| = 9

Fig. 6 An example of computing the semantic social ranking of Alice for Media2 and Media3
Consider the same situation as in the example of the previous section, and assume that
Alice is the target user to whom the system should recommend items and that her
neighbors are Bob and Nami. Which items should the system recommend to Alice? To answer
this, we compute the semantic social ranking scores for the items that Alice has not
previously tagged (e.g., Media2 and Media3). First, let us calculate the social ranking
of Alice for Media2. To this end, we first count the intersection categories between
Alice and each of her neighbors for Media2. Bob annotated Media2 with the tag “OWL”; as
can be seen in Fig. 6, the number of intersection categories is |USL*_Alice ∩
USL^M2_Bob| = 6, and the size of the union of Bob's USLs for Media2 is |USL^M2_Bob| = 9.
Nami annotated Media2 with the tag “Web of data”; the number of intersection categories
is |USL*_Alice ∩ USL^M2_Nami| = 3, and the size of the union of Nami's USLs for Media2
is |USL^M2_Nami| = 8. Finally, the semantic ranking score of Alice for Media2 is
computed as the weighted sum of the intersection ratios, using the semantic similarity
as the weight:
SUR(Alice, M2) = (6/9) × 0.292 + (3/8) × 0.11 ≈ 0.236
Second, let us calculate the semantic ranking score of Alice for Media3. In this case,
Alice and Bob have no intersection at any layer, i.e., |USL*_Alice ∩ USL^M3_Bob| = 0.
For Nami, the number of intersection categories is |USL*_Alice ∩ USL^M3_Nami| = 4.
Consequently, the ranking score for Media3 becomes SUR(Alice, M3) = (0/12) × 0.292 +
(4/9) × 0.11 ≈ 0.049. Comparing the two items, the system predicts that Media2 is more
likely to fit Alice's needs than Media3.
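For each neighbor, the social ranking computation of Eq. (14) reduces to an intersection ratio weighted by the semantic user similarity. A sketch using the counts of Fig. 6 (function and variable names are ours):

```python
def sur_score(neighbor_overlaps):
    """Semantic social ranking for one target item (Eq. 14).

    neighbor_overlaps: list of (intersection_size, neighbor_usl_size, sem_usim),
    one triple per neighbor who tagged the target item.
    """
    return sum(inter / size * w for inter, size, w in neighbor_overlaps)

# Alice's scores for Media2 and Media3, from the counts in Fig. 6 and the
# similarities semUSim(Alice, Bob) = 0.292 and semUSim(Alice, Nami) = 0.11.
sur_media2 = sur_score([(6, 9, 0.292), (3, 8, 0.11)])   # ≈ 0.236
sur_media3 = sur_score([(0, 12, 0.292), (4, 9, 0.11)])  # ≈ 0.049
```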
4.3 Item-based Semantic Collaborative Filtering
In this section, we explain the item perspective of our social recommendation method,
which uses the semantic item model (Definition 8). As in item-based collaborative
filtering, we first look into the set of items that the target user has tagged and
compute how semantically similar they are to the target item; we call this the semantic
item-item similarity. Based on the semantically similar items, we recommend
relevant items to the target user by capturing how he/she annotated those similar
items.
4.3.1 Generating Semantically Similar Items
We define semantically similar items as a group of items whose tagged IEML categories
are close to those of the target item. The semantic similarity between two items, i and
j, is computed as the weighted sum of layer similarities from layer 0 to layer 6.
Formally, the semantic item similarity measure is defined as:
semISim(i, j) = λ Σ_{l=0}^{6} simILayer_l(i, j)    (15)

where λ is a normalizing factor such that the layer weights sum to unity, and
simILayer_l(i, j) denotes the layer similarity of the two items at layer l. The layer
similarity between two USL sets is defined as the weighted Jaccard coefficient of the
two sets. Formally, the layer similarity is given by:
simILayer_l(i, j) = ((l + 1)/7) × |USL*_i(l) ∩ USL*_j(l)| / |USL*_i(l) ∪ USL*_j(l)|    (16)
where USL*_i(l) and USL*_j(l) refer to the unions of USLs for items i and j at layer l,
0 ≤ l ≤ 6, respectively. Higher layers receive larger weights when computing the
semantic item similarity; that is, intersections at higher layers contribute more than
intersections at lower layers. When λ is set to 0.25 for normalization, the similarity
value between two items lies in the range 0 to 1; the higher the score, the more
similar an item is to the target item. Finally, for a given item i ∈ Ĩ, the k items
with the highest semantic similarity are identified as the semantically k most similar
items such that:
SSI_k(i) = argmax^k_{j ∈ Ĩ\{i}} semISim(i, j)    (17)
4.3.2 Item Recommendation via Semantic Social Ranking
Once we have identified a group of semantically similar items, the final step is
prediction, that is, estimating how strongly a certain user would prefer unseen items.
In our study, the basic idea of discovering relevant items starts from assuming that a
target user is likely to prefer items that are semantically similar to the items he/she
has tagged before. We call this prediction strategy social ranking based on the
semantic item model. Formally, the prediction value of the target user u for the target
item i, denoted SIR(u, i), is obtained as follows:
SIR(u, i) = Σ_{j ∈ SSI_k(i)} (|USL*_i ∩ USL^j_u| / |USL^j_u|) × semISim(i, j)    (18)
where SSI_k(i) is the set of k most similar items to item i under the semantic item
similarity, and USL^j_u is the union of the USLs connected to the tags with which user
u has annotated item j. semISim(i, j) denotes the semantic similarity between items i
and j.
Once the ranking scores of the target user for the items he/she has not previously
tagged are computed, the items are sorted in descending order of SIR(u, i). Finally,
the set of top-N ranked items with the highest scores is identified and recommended to
user u.
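Both selection strategies (thresholding and top-N) operate on the same sorted score list. A minimal sketch (the names are ours, not from the report):

```python
def recommend(scores, n=10, threshold=None):
    """Rank untagged items by their SUR/SIR score, optionally keep only those
    above a threshold, and return the top-N item identifiers."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(item, s) for item, s in ranked if s > threshold]
    return [item for item, _ in ranked[:n]]

# With the scores from the worked example, only Media2 passes a 0.1 threshold.
recommend({"Media2": 0.236, "Media3": 0.049}, n=10, threshold=0.1)  # ["Media2"]
```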
5 Experimental Evaluation
In this section, we present experimental evaluations of the proposed approach and
compare its performance against that of the benchmark algorithms.
5.1 Evaluation Design and Metrics
The experimental data come from BibSonomy8, a collaborative tagging application that
allows users to organize and share scholarly references. The dataset used in this study
is the p-core at level 5 of BibSonomy [26]; a p-core at level 5 means that each user,
tag, and item has/occurs in at least 5 posts [9]. The original dataset
8 http://bibsonomy.org
contains several useless tags and system tags, such as “r”, “!”, and
“system:imported”, which we removed for the experiments. Table 3 briefly describes our
dataset.
Table 3 Characteristic of Bibsonomy dataset (p-core at level 5)
| Ũ | | Ĩ | | Ť | | Ł | | Y | | Ň | # of posts
116 361 400 325 9,996 3,783 2,494
To evaluate the recommendation performance, we randomly divided the dataset into a
training set and a test set. For each user u, we randomly selected 20% of the items he
had previously posted as the test set and used the remaining 80% as the training set.
To ensure that our results are not sensitive to a particular training/test partitioning
for each user, we used a 5-fold cross-validation scheme [7]. The values reported in the
experiment section are therefore averages over all five runs.
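The per-user 80/20 split described above can be sketched as follows (a simplified illustration of the protocol, not the authors' evaluation code):

```python
import random

def split_user_items(items, test_frac=0.2, seed=0):
    """Randomly hold out `test_frac` of one user's posted items as the test set;
    the remainder becomes the training set."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_test = max(1, round(len(items) * test_frac))
    return items[n_test:], items[:n_test]  # (train, test)

# For a user with 10 posted items: 8 training items, 2 test items,
# disjoint by construction.
train, test = split_user_items(range(10))
```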
To measure the recommendation performance, we adopted precision and recall, which judge
how relevant a set of ranked recommendations is to the user [7]. Precision is the ratio
of recommended items that are also contained in the test set to the total number of
items recommended; recall is the ratio of recommended items that are also contained in
the test set to the overall number of items in the test set. For a given user u,
precision and recall are given by:
precision(u) = |Test(u) ∩ TopN(u)| / |TopN(u)|,   recall(u) = |Test(u) ∩ TopN(u)| / |Test(u)|    (19)
where Test(u) is the item list of user u in the test set and TopN(u) is the top-N
recommended item list for user u. Finally, the overall precision and recall for all
users in the test set are computed by averaging the per-user precision(u) and
recall(u).
However, precision and recall are often in conflict with each other: increasing the
number of recommended items tends to decrease precision but increase recall [18].
Therefore, to take both into account when judging recommendation quality, we use the
standard F1 metric, which combines precision and recall into a single number:
F1 = 2 × precision × recall / (precision + recall)    (20)
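Putting the three metrics together for one user (a sketch assuming set-valued inputs; names are ours):

```python
def prf1(test_items, top_n_items):
    """Precision, recall, and F1 for one user, per the definitions above."""
    hits = len(test_items & top_n_items)
    precision = hits / len(top_n_items) if top_n_items else 0.0
    recall = hits / len(test_items) if test_items else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# 2 of 4 recommendations hit a 5-item test set:
p, r, f1 = prf1({"i1", "i2", "i3", "i4", "i5"}, {"i1", "i2", "i9", "i10"})
# precision 0.5, recall 0.4, F1 ≈ 0.444
```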
To compare the performance of our algorithms, we implemented a user-based CF algorithm
using cosine similarity (denoted UCF) [18], an item-based CF algorithm using cosine
similarity (denoted ICF) [5], and a most-popular-tags approach based on the tags a user
has already used (denoted MPT) [9]. Top-N recommendation via the semantic social
ranking of the user-based method (denoted SUR) and of the item-based method (denoted
SIR) was evaluated against these benchmark algorithms.
5.2 Experiment Results
In this section, we present detailed experimental results. The performance evaluation is
divided into two dimensions. The influence of the number of neighbors on the
performance is first investigated, and then the quality of item recommendations is
evaluated in comparison with the benchmark methods.
5.2.1 Experiments with Neighborhood Size
As noted in a number of previous studies, the neighborhood size has a significant
impact on the recommendation quality of neighborhood-based algorithms [5, 10, 17, 18].
We therefore varied the neighborhood size k from 5 to 80. For UCF and SUR, k denotes
the number of nearest users, whereas for ICF and SIR it is the number of most similar
items. Although MPT is not affected by the neighborhood size at all, we also report its
result for comparison. In this experiment, we set the number of recommended items N to
10 (i.e., top-10) for each user in the test set.
Fig. 7 shows how the precision and recall of the four methods change as the
neighborhood size grows. The precision of UCF improves as the neighborhood size
increases from 5 to 20 and decreases slightly thereafter; with respect to recall, its
curve tends to be flat. In the case of ICF, precision and recall improved up to a
neighborhood size of 40; beyond this point, further increases of the size have a
negative
influence on performance. With respect to SIR, precision and recall tend to improve
slightly as k increases from 5 to 30 and 40, respectively; beyond these points, further
increases lead to worse results. For SUR, the results look different: unlike UCF, ICF,
and SIR, the curves show that SUR is strongly affected by the neighborhood size. The
two charts for SUR reveal that increasing the neighborhood size has detrimental effects
on both metrics; in other words, SUR provides better precision and recall when the
neighborhood size is relatively small. For example, in terms of both precision and
recall, increasing the neighborhood size much beyond 20 yielded worse results than a
size of 5. This indicates that superfluous users can negatively impact the
recommendation quality of our method.
Fig. 7 Precision and recall with respect to increasing the neighborhood size
We then examined the F1 values to compare the methods and to select the best
neighborhood size for each. Fig. 8 depicts the F1 variation of the four methods as the
neighborhood size increases. SUR outperforms MPT and ICF at all neighborhood sizes, and
it outperforms UCF for neighborhood sizes from 5 to 40; beyond this point, F1 is poorer
for SUR than for UCF. Examining the best value of each method: F1 of UCF is 0.107
(k=20), F1 of ICF is 0.099 (k=40), F1 of MPT is 0.0825, F1 of SUR is 0.119 (k=10), and
F1 of SIR is 0.114 (k=30). This implies that SUR can outperform the other methods even
when data are sparse or the available data for users are relatively insufficient. In
practice, CF recommender systems trade off recommendation quality against real-time
efficiency by pre-selecting a number of neighbors. In
consideration of both quality and computation cost, we selected 20, 40, 10, and 30 as
the neighborhood size of UCF, ICF, SUR, and SIR, respectively, in subsequent
experiments.
Fig. 8 F1 value with respect to increasing the neighborhood size
5.2.2 Comparisons with Other Methods
To experimentally evaluate the performance of top-N recommendation, we calculated the
precision, recall, and F1 obtained by UCF, ICF, MPT, SUR, and SIR while varying the
number of recommended items N from 2 to 10 in increments of 2. Since in practice users
tend to click on items with higher ranks, we examined only a small number of
recommended items.
Fig. 9 depicts the precision-recall plot, showing how the precision and recall of the
methods change as N increases. Data points on the curves correspond to the number of
recommended items: the first point of each curve represents N=2 (i.e., top-2) whereas
the last point is N=10 (i.e., top-10). As can be observed from the graph, the curves of
all methods tend to descend, implying that recommending more items tends to decrease
precision but increase recall. With respect to precision, SIR demonstrates the best
performance for N=2 and N=4; however, SUR demonstrates the best performance
as N is increased. With respect to recall, SUR outperforms the other four methods on all
occasions.
Fig. 9 Precision and recall as the value of the number of recommended items N increases
Fig. 10 Comparisons of F1 values as the number of recommended items N increases
Let us now focus on the F1 results. Fig. 10 depicts the F1 values, showing how SUR and
SIR outperform the other methods. As shown, both methods perform considerably better
than UCF, ICF, and MPT. For example, for top-2 (N=2), SUR achieves improvements of
0.1%, 1.6%, and 1.5% over UCF, ICF, and MPT, respectively, whereas SIR achieves
improvements of 0.3%, 1.9%, and 1.7%. Comparing the results achieved by SUR and SIR,
the recommendation quality of SUR becomes superior to that of SIR as N increases.
Averaged over the five cases, SUR obtains improvements of 0.6%, 1.8%, 2.6%, and 0.3%
over UCF, ICF, MPT, and SIR, respectively.
Fig. 11 Comparisons of precision, recall, and F1 for cold start users and active users
We further examined the recommendation performance for users with few posts in the
training set, namely cold start users, and for users with many posts, namely active
users. A CF-based recommender system is generally unable to make high quality
recommendations for cold start users, compared to active users; this is one of its
well-known limitations. We considered two subsets of users: those with fewer than 6
posts (21 users) and those with more than 25 posts (21 users). For the two groups we
calculated precision, recall, and F1 within the top-10 ranked results obtained by UCF,
ICF, MPT, and SUR. Fig. 11 shows these results for the cold start users (left) and the
active users (right). As the graphs show, the F1 values of the cold start group are
considerably lower than those of the active group. This is because the cold start users
did not provide enough information (items or tags) to analyze their posting propensity.
Nevertheless, on the cold start dataset, the precision, recall, and F1 values achieved
by SUR were found to be superior to those of the benchmark algorithms. For example, SUR
obtains 11.9%, 14.3%, and 2.4% improvements in recall over UCF, ICF, and MPT,
respectively; in terms of F1, SUR outperforms UCF, ICF, and MPT by 2.2%, 2.6%, and
0.4%, respectively. Only MPT achieves comparable results. This indicates that utilizing
tagging information can help alleviate the cold start problem and thus improve the
quality
of item recommendations. With respect to the active dataset, SUR performs better than
ICF and MPT on all occasions, and the difference in F1 between UCF and SUR appears
insignificant. Although the precision of SUR is slightly worse than that of UCF on the
active dataset, the proposed method notably provides better overall quality than the
benchmark methods. That is, SUR can provide suitable items not only to cold start users
but also to active users. Comparing the cold start and active results achieved by MPT
reveals an interesting pattern: a simple tag-based approach works well enough for cold
start users, compared to UCF and ICF, but in the active scenario the superfluous tags
of users introduce noise instead. We conclude from these comparison experiments that
our approaches provide consistently better recommendation quality than the other
methods. Furthermore, we believe the results of the proposed approach will become even
more significant in practice on large-scale Web 2.0 frameworks.
6 Concluding Remarks
Looking toward the future of the social Web, we have presented in this report semantic
models that address the interoperability challenges facing semantic technology. We also
proposed two collaborative filtering methods that apply these semantic models and
analyzed the potential benefits of IEML for social recommender systems. As our
experimental results show, our methods can successfully enhance the performance of item
recommendations. Moreover, we observed that our methods can provide items better suited
to user interests even when the number of recommended items is small. The main
contributions of this study can be summarized as follows: 1) our methods address
traditional stumbling blocks such as polysemy, synonymy, data sparseness, the cold
start problem, and semantic interoperability; 2) they can offer trustworthy items
semantically relevant to a user's needs, because capturing the semantics of
user-generated tags makes it easier both to grasp a user's preferences and to make
recommendations to him/her.
Acknowledgment
The work was mainly funded since 2009 by the Canada Research Chair in Collective
Intelligence at University of Ottawa.
References
1. Adomavicius, G., Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data
Engineering 17(6): 734-749
2. Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., Yu, Y. (2007) Optimizing web search using social
annotations. In: Proceedings of the 16th International Conference on World Wide Web, pp. 501-510
3. Bonhard, P., Sasse, A. (2006) ‘Knowing me, knowing you’ - using profiles and social networking to
improve recommender systems. BT Technology Journal 24(3): 84-98
4. Breese, J. S., Heckerman, D., Kadie, C. (1998) Empirical analysis of predictive algorithms for
collaborative filtering. In: Proceedings of the Fourteenth Annual Conference on Uncertainty in
Artificial Intelligence, pp. 43–52
5. Deshpande, M., Karypis, G. (2004) Item-based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems 22(1): 143-177
6. Golder, S. A., Huberman, B. A. (2006) Usage patterns of collaborative tagging systems. Journal of
Information Science 32(2): 198-208
7. Herlocker, J. L., Konstan, J. A., Terveen, L. G., Riedl, J. T. (2004) Evaluating collaborative filtering
recommender systems. ACM Transactions on Information Systems 22(1): 5-53
8. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. (2006) Information Retrieval in Folksonomies:
Search and ranking. In: Proceedings of the 3rd European Semantic Web Conference, pp. 411-426
9. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G. (2008) Tag recommendations
in social bookmarking systems. AI Communications 21(4): 231-247
10. Kim, H.-N., Ji, A.-T., Ha, I., Jo, G.-S. (2009) Collaborative filtering based on collaborative tagging
for enhancing the quality of recommendation. Electronic Commerce Research and Applications, Doi:
10.1016/j.elerap.2009.08.004
11. Lévy, P. (2009) Toward a self-referential collective intelligence some philosophical background of
the IEML research program. In: Proceedings of 1st International Conference on Computational
Collective Intelligence - Semantic Web, Social Networks & Multiagent Systems, pp. 22-35
12. Lévy, P. (2010) From social computing to reflexive collective intelligence: The IEML research
program. Information Sciences 180(1): 71-94
13. Li, X., Guo, L., Zhao, Y. (2008) Tag-based social interest discovery. In: Proceedings of the 17th
International Conference on World Wide Web, pp. 675-684
14. Marchetti, A., Tesconi, M., Ronzano, F. (2007) SemKey: A semantic collaborative tagging system.
In: Proceedings of Tagging and Metadata for Social Information Organization Workshop in the 16th
International Conference on World Wide Web
15. Peis, E., Morales-del-Castillo, J. M., Delgado-López, J. A. (2008) Semantic recommender systems.
Analysis of the state of the topic. Hipertext.net number 6.
http://www.hipertext.net/english/pag1031.htm. Accessed 15 Dec 2009
16. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J. (1994) GroupLens: An open
architecture for collaborative filtering of netnews. In: Proceedings of the ACM 1994 Conference on
Computer Supported Cooperative Work, pp. 175–186
17. Sarwar, B., Karypis, G., Konstan, J., Reidl, J. (2001) Item-based collaborative filtering
recommendation algorithms. In: Proceedings of the Tenth International World Wide Web
Conference, pp. 285-295
18. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2000) Analysis of recommendation algorithms for E-
commerce. In: Proceedings of ACM Conference on Electronic Commerce, pp. 158–167
19. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J. X., Weikum, G. (2008)
Efficient top-k querying over social-tagging networks. In: Proceedings of the 31st Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.
523-530
20. Siersdorfer, S., Sizov, S. (2009) Social recommender systems for web 2.0 folksonomies. In:
Proceedings of the 20th ACM conference on Hypertext and hypermedia, pp. 261-270
21. Sigurbjörnsson, B., van Zwol, R. (2008) Flickr tag recommendation based on collective knowledge.
In: Proceedings of the 17th International Conference on World Wide Web, pp. 327-336
22. Tso-Sutter, K. H. L., Marinho, L. B., Schmidt-Thieme, L. (2008) Tag-aware recommender systems by
fusion of collaborative filtering algorithms. In: Proceedings of the 2008 ACM symposium on Applied
computing, pp. 1995-1999
23. Xu, Z., Fu, Y., Mao, J., Su, D. (2006) Towards the Semantic Web: collaborative tag suggestions. In:
Proceedings of the Collaborative Web Tagging Workshop in the 15th International Conference on
the World Wide Web
24. Zanardi, V., Capra, L. (2008) Social Ranking: Uncovering relevant content using tag-based
recommender systems. In: Proceedings of the 2008 ACM conference on Recommender Systems, pp.
51-58
25. Zhang, Z.-K., Zhou, T., Zhang, Y.-C. (2010) Personalized recommendation via integrated diffusion
on user-item-tag tripartite graphs. Physica A: Statistical Mechanics and its Applications 389(1): 179-
186
26. Knowledge and Data Engineering Group (2007) University of Kassel: Benchmark Folksonomy Data
from BibSonomy, version of April 30th, 2007. http://www.kde.cs.uni-kassel.de/bibsonomy/dumps/.
Accessed 15 Dec 2009