Technical Report, March 2010
Social Recommender Systems
on IEML Semantic Space
Heung-Nam Kim, Andrew Roczniak, Pierre Lévy, Abdulmotaleb El-Saddik
Collective Intelligence Lab, University of Ottawa
4 March 2010
H. N. Kim, A. Roczniak, P. Lévy, A. El-Saddik, “Social recommender systems on IEML semantic space,”
Collective Intelligence Lab, University of Ottawa, Technical Report, March 2010
Social Recommender Systems
on IEML Semantic Space
Heung-Nam Kim1,2, Andrew Roczniak1, Pierre Lévy1, Abdulmotaleb El Saddik2
1 Collective Intelligence Lab, University of Ottawa 2Multimedia Communication Research Lab, University of Ottawa
Abstract
In this report, we present two social recommendation methods that incorporate the
semantics of tags: a user-based semantic collaborative filtering and an item-based
semantic collaborative filtering. Social tagging is employed as an approach to capture
and filter users’ preferences for items. In addition, we analyze the potential benefits of
IEML models for social recommender systems in solving the polysemy, synonymy, and
semantic interoperability problems, which are notable challenges in information
filtering. Experimental results show that our methods offer significant advantages both
in improving recommendation quality and in dealing with polysemy, synonymy, and
interoperability issues.
1 Introduction
The prevalence of social media sites has brought considerable changes not only to
people’s life patterns but also to the generation and distribution of information. This
social phenomenon has transformed the masses, who were merely information
consumers via mass media, into producers of information. However, as rich information
is shared through social media sites, an amount of information that was not available
before is growing exponentially with daily additions. Beyond finding the most attractive
and relevant content, users struggle with the great challenge of information overload.
Recommender systems, which have emerged in response to these challenges, provide
users with recommendations of items that are likely to fit their needs [17].
With the popularity of social tagging (also known as collaborative tagging or
folksonomies) a number of researchers have recently concentrated on recommender
systems with social tagging [10, 13, 20, 22, 25]. Because modern social media sites,
such as Flickr1, YouTube2, Twitter3, and Delicious4 allow users to freely annotate their
contents with any kind of descriptive words, also known as tags [6], the users tend to
use the descriptive tags to annotate the contents that they are interested in [13].
Recommender systems incorporated with tags can alleviate limitations of traditional
recommender systems, such as the sparsity problem and the cold start problem [1], and
thus the systems eventually provide promising possibilities to better generate
personalized recommendations. Although these studies obtain reasonable promise of
improving the performance, they do not take into consideration the semantics of tags
themselves. Consequently, the lack of semantic information suffers from fundamental
problems: polysemy and synonymy of the tags, as clearly discussed in [6]. Without the
semantics of the tags used by users, the systems cannot differentiate the various social
interests of the users from the same tags. Furthermore, they cannot provide semantic
interoperability that is a notable challenge in the cyberspace [11].
To address the discussed issues, we introduce a new concept to capture the semantics of
user-generated tags. We then propose two social recommendation methods that
incorporate the semantics of the tags: a user-based semantic collaborative filtering
and an item-based semantic collaborative filtering. First, in the user-based method, we
determine similarities between users by utilizing users’ semantic-oriented tags,
collectively called a Uniform Semantic Locator (USL), and subsequently identify
semantically like-minded users for each user. Finally, we recommend social items (e.g.,
text, pictures, videos) based on the social ranking of the items that are semantically
associated with the tags that like-minded users have assigned. Second, in the item-based method,
we determine similarities between items by utilizing USLs and identify semantically
similar items for each item. Finally, we recommend social items based on the
semantically similar items.
1 http://www.flickr.com 2 http://www.youtube.com 3 http://twitter.com 4 http://delicious.com
The main contributions of this study toward social recommender systems can be
summarized as follows: 1) We present and formalize models for semantic-oriented
social tagging in dealing with the issues of polysemy, synonymy, and semantic
interoperability. We also illustrate how the models can be adapted and applied to
existing social tagging systems. 2) We propose the methods of social recommendations
in semantic space that aim to find semantically similar users/items and discover social
items semantically relevant to users’ needs.
The rest of this report is organized as follows: in the next section, we review concepts
related to collaborative filtering and survey recent studies applying social tagging to
recommender systems. In Section 3, we present the models used in our study. We
then describe our semantic models for social recommender systems and provide a
detailed description of how the models are applied to item recommendations in Section
4. In Section 5, we present the effectiveness of our methods through experimental
evaluations. Finally, we summarize our work.
2 Related Work
In this section, we summarize previous studies and position our study with respect to
other related works in the area.
2.1 Collaborative Filtering
Following the proposal of GroupLens [16], automated recommendations based on
Collaborative Filtering (CF) have seen the widest use. CF is based on the fact that
“word of mouth” opinions of other people have considerable influence on the buyers’
decision making [10]. If advisors have similar preferences to the buyer, he/she is much
more likely to be affected by their opinions. In CF-based recommendation schemes,
two approaches have mainly been developed: user-based approaches [4, 16, 18] and
item-based approaches [5, 17]. Usually, user-based and item-based CF systems involve
two steps. First, the neighborhood, that is, the group of users whose preferences are
similar to those of a target user, called the k nearest neighbors (for user-based CF), or
the set of items similar to a target item, called the k most similar items (for item-based CF), is
determined by using a variety of similarity computing methods, such as Pearson
correlation-based similarity, cosine-based similarity, and so on. This step is an
important task in CF-based recommendations because different neighbor users or items
lead to different recommendations [18]. Once the neighborhood is generated, in the
second step, the prediction values of particular items, estimating how much the target
user is likely to prefer them, are computed based on the group of neighbors. The more
similar a neighbor is to the target user or the target item, the more influence he/she or it
has on the prediction value. After predicting how much the target user will like
particular items not previously rated by him/her, the top-N item set, the set of ordered
items with the highest predicted values, is identified and recommended. The target user
can give feedback on whether he/she actually likes the recommended top-N items, or
on how much he/she prefers those items, as scaled ratings.
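The two-step scheme described above can be sketched in a few lines. The following is a generic illustration with hypothetical data and function names, not the method proposed in this report:

```python
import math

# Toy user-item rating matrix (hypothetical data): user -> {item: rating}
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i4": 5},
}

def pearson(u, v):
    """Pearson correlation computed over the items both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = (math.sqrt(sum((a - mu) ** 2 for a in ru))
           * math.sqrt(sum((b - mv) ** 2 for b in rv)))
    return num / den if den else 0.0

def predict(u, item, k=2):
    """Step 1: pick the k nearest neighbors who rated `item`;
    step 2: similarity-weighted average of their ratings."""
    neighbors = sorted(
        (v for v in ratings if v != u and item in ratings[v]),
        key=lambda v: pearson(u, v), reverse=True)[:k]
    num = sum(pearson(u, v) * ratings[v][item] for v in neighbors)
    den = sum(abs(pearson(u, v)) for v in neighbors)
    return num / den if den else 0.0
```

Ranking all unrated items by `predict` and keeping the N highest values yields the top-N recommendation set described above.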
2.2 Social Tagging in Recommender Systems
Social tagging is the practice of allowing any user to freely annotate the content with
any kind of arbitrary keywords (i.e., tags) [6]. Social media sites with social tagging
have become tremendously popular in recent years. Therefore, the area of recommender
systems with social tagging (folksonomies) has become an active and growing topic of
study. These studies can be broadly divided into three topics: tag suggestions, social
searches, and social recommendations.
With the popularity of the usage of tags, many researchers have proposed new
applications for recommender systems supporting the suggestion of suitable tags during
folksonomy development. In [21], a tag recommender system with Flickr’s dataset is
presented based on an analysis of how users annotate photos and what information is
contained in the tagging. In [9], three classes of algorithms for tag recommendations,
such as an adaptation of user-based CF, a graph-based FolkRank [8] algorithm, and
simple methods based on tag counts, are presented and evaluated. Xu et al. [23] propose
an algorithm for collaborative tag suggestions that employs a reputation score for each
user based on the quality of the tags contributed by the user. In [14], a new semantic
tagging system, SemKey, is proposed in order to combine semantic technologies with
the collaborative tagging paradigm in a way that can be highly beneficial to both areas.
Unlike our aims, the purpose of these studies using social tagging is basically
to recommend appropriate tags for assisting the user in annotation-related tasks. Our
approach takes a different stance: rather than offering tag recommendations, our
aim is to find like-minded users through the semantics of their tags and to identify
personal resources semantically relevant to user needs.
Research has also been very active in information retrieval using social
tagging. In [8], the authors presented a formal model and a new search algorithm for
folksonomies, called FolkRank. FolkRank is applied not only to find communities
within the folksonomy but also to recommend tags and resources. In [24], a social
ranking mechanism is proposed to answer a user’s query that aims to transparently
improve content searches based on emergent tags semantics. It exploits users’ similarity
and tags’ similarity based on their past tagging activity. In [19], the ContextMerge
algorithm is introduced to efficiently support user-centric searches in social networks,
dynamically including related users in the execution. The algorithm adopts two-dimensional
expansion: social expansion considers the strength of relations among
users, and semantic expansion considers the relatedness of different tags. In [2], two
algorithms are proposed, SocialSimRank and SocialPageRank. The former algorithm
calculates the similarity between tags and user queries whereas the latter one captures
page popularity based on its annotations. All these works attempt to improve users’
searches by incorporating social annotations into query expansion. Differing from these
works, our goal is to automatically identify, without user queries, resources that are
likely to fit users’ needs.
Other researchers have studied the same area as our study. In [10], the authors proposed
a collaborative filtering method via collaborative tagging. They first determine similarities
between users with social tags and subsequently identify the latent tags for each user to
recommend items via a naïve Bayes approach. Tso-Sutter et al. [22] proposed a generic
method that allows tags to be incorporated into CF algorithms by reducing the three-dimensional
correlations to three two-dimensional correlations and then applying a
fusion method to re-associate them. A similar approach is presented in [25]
in order to provide improved recommendations to users. Although these studies
give reasonable promise of improving the performance, they do not take the semantics
of tags into consideration. Consequently, the lack of semantic information has
limitations, such as the polysemy and synonymy of tags, when identifying similar users
through user-generated tags. We believe the semantic information of tags can be helpful
not only to better grasp users’ interests but also to enhance the quality of
recommendations. The recent literature also focuses on semantic recommender
systems, whose goal is similar to ours. Unlike our approach, however, the existing work on
semantic recommender systems relies on a prefixed ontology and uses technologies
from the Semantic Web. The state of the art for semantic recommender systems has
been well analyzed in [15].
3 IEML Models
To aid understanding of our semantic approach, this section briefly explains preliminary
concepts of the Information Economy MetaLanguage (IEML5) that will be exploited in the
remaining sections of this report. The IEML research program promotes a radical innovation
in the notation and processing of semantics. IEML is a regular language and a symbolic
system for the notation of meaning. It is “semantic content oriented” rather than
“instruction oriented” like programming languages or “format oriented” like data
standards. IEML provides new methods for semantic interoperability, semantic
navigation, collective categorization, and self-referential collective intelligence [11].
The IEML research program is compatible with the major standards of the Web of data and
is in tune with current trends in social computing.
3.1 IEML Overview
IEML expressions are built from a syntactically regular combination of six symbols,
called primitives. In IEML, a sequence is a succession of 3^l primitives, where l ∈
{0, 1, 2, 3, 4, 5, 6}; l is called the layer of the sequence. For the successive layers, the
sequences have lengths of 1, 3, 9, 27, 81, 243, and 729 primitives, respectively [12].
From a syntactic point of view, any IEML expression is nothing other than a set of
sequences. As there is a distinct semantic for each distinct sequence, there is also a
distinct semantic for each distinct set of sequences. In general, the meaning of a set of
sequences corresponds to
5 http://ieml.org
the union of the meaning of the sequences of this set. The main result is that any
algebraic operation that can be made on sets in general can also be made on semantics
(significations) once they are expressed in IEML. An IEML dictionary provides the
correspondence between IEML sequences and a natural language descriptor of an IEML
expression. The terms of the dictionary belong to layers 0-3. There are rules to create
inflected words from these terms, to create sentences from inflected words and to create
relations between sentences by using some terms as conjunctions. Given these rules, it
is possible to express any network of relations between sentences by using sequences up
to layer 6. The notation, syntax, semantics, and further examples of IEML have been
presented in [12]; due to lack of space, we refer the reader to [12] for more details.
3.2 IEML Language Model
We present the model of the IEML language, along with the model of semantic
variables.
Let Σ be a nonempty and finite set of symbols, Σ = {S, B, T, U, A, E}. Let a string s be
a finite sequence of symbols chosen from Σ. The length of this string is denoted by |s|.
An empty string ε is a string with zero occurrences of symbols, and its length is |ε| = 0.
The set of all strings of length k composed of symbols from Σ is defined as Σ^k = {s
where |s| = k}. Note that Σ^0 = {ε} and Σ^1 = {S, B, T, U, A, E}. Although Σ and Σ^1 are
sets containing exactly the same members, the former contains symbols, and the latter
strings. The set of all strings over Σ is defined as Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …

A useful operation on strings is concatenation, defined as follows. For all si =
a1a2a3a4…ai ∈ Σ* and sj = b1b2b3b4…bj ∈ Σ*, sisj denotes the string concatenation such
that sisj = a1a2a3a4…ai b1b2b3b4…bj and |sisj| = i + j. The IEML language over Σ is a
subset of Σ*, L_IEML ⊆ Σ*:

L_IEML = {s ∈ Σ* | |s| = 3^l, 0 ≤ l ≤ 6}    (1)
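Under this definition, membership in L_IEML reduces to an alphabet check and a length check. A minimal sketch (function names are ours):

```python
import math

PRIMITIVES = set("SBTUAE")  # the alphabet of the six primitives

# Valid sequence lengths are 3**l for layers l = 0..6: 1, 3, 9, 27, 81, 243, 729.
VALID_LENGTHS = {3 ** l for l in range(7)}

def in_ieml(s: str) -> bool:
    """True iff s is a sequence of L_IEML: a string over the primitives
    whose length is 3**l for some layer 0 <= l <= 6."""
    return set(s) <= PRIMITIVES and len(s) in VALID_LENGTHS

def layer(s: str) -> int:
    """Layer l of a semantic sequence (len(s) == 3**l)."""
    assert in_ieml(s)
    return round(math.log(len(s), 3))
```

Note that the empty string is rejected, since its length 0 is not a power of three between 1 and 729.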
3.3 Model of Semantic Sequences
Definition 1 (Semantic sequence) A string s is called a semantic sequence if and only if
s ∈ L_IEML.

To denote the n-th primitive of a sequence s, we use a superscript n, where 1 ≤ n ≤ 3^l, and
write s^n. Note that for any sequence s of layer l, s^n is undefined for any n > 3^l. Two
semantic sequences are distinct if and only if one of the following holds: i) their
layers are different, ii) they are composed from different primitives, iii) their primitives
do not follow the same order. For any sa and sb:

sa ≠ sb ⇔ |sa| ≠ |sb| ∨ ∃n: sa^n ≠ sb^n    (2)
Let us now consider binary relations between semantic sequences in general. These
are obtained by performing a Cartesian product of two sets6. For any sets of semantic
sequences X, Y, where sa ∈ X and sb ∈ Y, and using Equation 2, we define four binary relations,
whole ⊆ X × Y, substance ⊆ X × Y, attribute ⊆ X × Y, and mode ⊆ X × Y, as follows:

whole = {(sa, sb) | sa = sb}
substance = {(sa, sb) | |sa| = 3|sb|, sa^n = sb^n, 1 ≤ n ≤ |sb|}
attribute = {(sa, sb) | |sa| = 3|sb|, sa^(n+|sb|) = sb^n, 1 ≤ n ≤ |sb|}
mode = {(sa, sb) | |sa| = 3|sb|, sa^(n+2|sb|) = sb^n, 1 ≤ n ≤ |sb|}    (3)
Any two semantic sequences that are equal are in a whole relationship. In addition,
any two semantic sequences that share specific subsequences may be in substance,
attribute or mode relationship. For any two semantic sequences sa and sb, if they are in
one of the above relations, then we say that sb plays a role w.r.t sa and we call sb a seme
of sequence.
Definition 2 (Seme of a sequence) For any semantic sequences sa and sb, if (sa, sb) ∈
whole ∪ substance ∪ attribute ∪ mode, then sb plays a role w.r.t. sa and sb is called a seme.
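In this reading, substance, attribute, and mode correspond to sb occurring as the first, middle, or last third of sa, with sb one layer below sa. A minimal sketch (function names are ours):

```python
def whole(sa: str, sb: str) -> bool:
    return sa == sb

def substance(sa: str, sb: str) -> bool:
    # sb is the first third of sa: |sa| = 3|sb| and sa^n = sb^n.
    return len(sa) == 3 * len(sb) and sa[:len(sb)] == sb

def attribute(sa: str, sb: str) -> bool:
    # sb is the middle third of sa.
    return len(sa) == 3 * len(sb) and sa[len(sb):2 * len(sb)] == sb

def mode(sa: str, sb: str) -> bool:
    # sb is the last third of sa.
    return len(sa) == 3 * len(sb) and sa[2 * len(sb):] == sb

def is_seme(sa: str, sb: str) -> bool:
    """Definition 2: sb is a seme of sa if one of the four relations holds."""
    return whole(sa, sb) or substance(sa, sb) or attribute(sa, sb) or mode(sa, sb)
```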
We can now group distinct semantic sequences together into sets. A useful grouping
is based on the layer of those semantic sequences.
6 A Cartesian product of two sets X and Y is written as follows: X × Y = {(x, y) | xX, yY}.
3.4 Model of Semantic Categories and Catsets
A category c of L_IEML is a subset such that all strings of that subset have the same length:

c ⊆ L_IEML, where ∀si, sj ∈ c: |si| = |sj|    (4)
Definition 3 (semantic category) A semantic category c is a set containing semantic
sequences at the same layer.
The layer of any category c is exactly the same as the layer of the semantic
sequences included in that category. The set of all categories of layer l is given as the
powerset7 of the set of all strings of layer l of L_IEML:

Cl = Powerset({s ∈ L_IEML where |s| = 3^l})    (5)
Two categories are distinct if and only if they differ by at least one element. For any
ca and cb:

ca ≠ cb ⇔ (ca ∪ cb) \ (ca ∩ cb) ≠ ∅    (6)

A weaker condition can be applied to categories of distinct layers (since two
categories are different if their layers are different) and is written as:

l(ca) ≠ l(cb) ⇒ ca ≠ cb    (7)
where l(ca) and l(cb) denote the layers of categories ca and cb, respectively. Analogously
to sequences, we consider binary relations between any categories ci and cj where l(ci),
l(cj) ≥ 1. For any sets of categories X, Y, where ca ∈ X and cb ∈ Y, we define four binary
relations, wholeC ⊆ X × Y, substanceC ⊆ X × Y, attributeC ⊆ X × Y, and modeC ⊆ X × Y, as
follows:

wholeC = {(ca, cb) | ca = cb}
substanceC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ substance}
attributeC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ attribute}
modeC = {(ca, cb) | ∃sa ∈ ca, ∃sb ∈ cb, (sa, sb) ∈ mode}    (8)
7 A powerset of S is the set of all subsets of S, including the empty set .
For any two categories ca and cb, if they are in one of the above relations, (ca, cb) ∈
wholeC ∪ substanceC ∪ attributeC ∪ modeC, then we say that cb plays a role with
respect to ca and cb is called a seme of category.
A catset is a set of distinct categories of the same layer, as defined in Definition 4.

Definition 4 (Catset) A catset σ is a set containing categories such that σ = {cn | ∀i, j,
i ≠ j: ci ≠ cj, l(ci) = l(cj)}.

The layer of a catset σ is given by the layer of any of its members: if some c ∈ σ, then
l(σ) = l(c). Note that a category c can be written as c ∈ Cl, while a catset σ can be
written as σ ⊆ Cl. All standard set operations, such as union and intersection (e.g.,
σa ∪ σb and σa ∩ σb), can be performed on catsets of the same layer.
3.5 Model of Uniform Semantic Locator
A USL is composed of up to seven catsets of different layers, as follows:

Definition 5 (Uniform Semantic Locator, USL) A USL ψ is a set containing catsets of
different layers such that ψ = {σn | ∀i, j, i ≠ j: l(σi) ≠ l(σj)}.

Note that since there are seven distinct layers, a USL can have at most seven
members. All standard set operations, such as union and intersection (e.g., ψa ∪ ψb and
ψa ∩ ψb), on USLs are always performed on sets of categories (and therefore on sets of
sequences), layer by layer. Since at each layer l there are |Cl| distinct catsets, the whole
semantic space is defined by the tuple Ł = C0 × C1 × C2 × C3 × C4 × C5 × C6.
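Because all set operations on USLs proceed layer by layer, a USL can be represented as a mapping from layer to catset. A minimal sketch (the representation and names are our choices, not prescribed by the model):

```python
# A category is a frozenset of same-layer sequences; a catset is a set of
# categories; a USL maps each layer (0..6) to a catset.

def usl_union(a, b):
    """Union of two USLs, performed layer by layer (missing layers = empty)."""
    return {l: a.get(l, set()) | b.get(l, set()) for l in range(7)}

def usl_intersection(a, b):
    """Intersection of two USLs, performed layer by layer."""
    return {l: a.get(l, set()) & b.get(l, set()) for l in range(7)}

# Hypothetical USLs built from toy sequences:
a = {1: {frozenset({"SBT"})}, 2: {frozenset({"SSSBBBTTT"})}}
b = {1: {frozenset({"SBT"}), frozenset({"UAE"})}}
```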
In the IEML notation of USLs, the categories are separated by “/”. Table 1 shows an
example of USLs used as *tags for “Wikipedia” and “XML”, together with a layer-by-layer
English translation of those USLs. A *tag holds the place of an IEML expression by
suggesting its meaning rather than uttering the IEML expression. The meaning of a *tag
has to be understood from the singular place that its corresponding IEML expression
occupies in the network of IEML semantic relations [12].
Table 1 An example of USLs in semantic space [12].

Tag: *Wikipedia
USL
  L0: (U: + S:)
  L1: (d.) / (t.)
  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)
  L3: (a.u.-we.h.-’) / (n.o.-y.y.-s.y.-’)
  L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,) / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’ ,)
Semantics in English
  L0: knowledge networks
  L1: truth / memory
  L2: get one’s bearings in knowledge / act for the sake of society / synthesize / organized knowledge / collective creation
  L3: opening public space / encyclopedia
  L4: collective intelligence encyclopedia in cyberspace / volunteers producing didactic material on any subject

Tag: *XML
USL
  L0: (A: + S:)
  L1: (b.)
  L2: (we.g.-) / (we.b.-)
  L3: (e.o.-we.h.-’) / (b.i.-b.i.-’) / (t.e.-d.u.-’)
  L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)
Semantics in English
  L0: document networks
  L1: language
  L2: unify the documentation / cultivate information system
  L3: establishing norms and standards / cyberspace / meeting information needs
  L5: guarantee compatibility of data through same formal structure
4 Social Recommender Systems on IEML Semantic Space
Fig. 1 System overview of social recommender systems based on IEML semantic tagging: a
user-based semantic collaborative filtering and an item-based semantic collaborative filtering
In this section, we present two social recommendation methods that incorporate
USLs. We adapt two well-known collaborative filtering approaches to our
study: a user-based approach and an item-based approach. Fig. 1 illustrates the overall
process of each method with two phases: a neighborhood formation phase and an item
recommendation phase. In the neighborhood formation phase, we first compute
similarities between users (for a user-based method) and between items (for an item-
based method) by utilizing USLs. Thereafter, we generate a set of semantically like-
minded users for each user and a set of semantically similar items for each item. Based
on the neighborhood, in the recommendation phase, we predict a social ranking of items
to decide which items to recommend.
4.1 Semantic Models in Bipartite Space
Fig. 2 Bipartite space of social tagging space and USL semantic space
The social tagging in our study is free-for-all, allowing any user to annotate any item with
any tag [6]. Therefore, if there is a list of r users Ũ = {u1, u2, …, ur}, a list of m tags Ť =
{t1, t2, …, tm}, and a list of n items Ĩ = {i1, i2, …, in}, the social tagging, or folksonomy, F
is a tuple F = ⟨Ũ, Ť, Ĩ, Y⟩, where Y ⊆ Ũ × Ť × Ĩ is a ternary relationship called tag
assignments [9]. More conceptually, the triplet of the tagging space can be represented
as a three-dimensional data cube. Beyond the tagging space, in our study there is another
space where the tags are connected to USLs according to their semantics. We call this
space the IEML semantic space, as illustrated in Fig. 2. Note that a USL is composed
of catsets (Definition 5), where each catset consists of semantic categories c
(Definition 4). Therefore, an extended formal definition of the folksonomy, called a
semantic folksonomy, is defined as follows:
Definition 6 (Semantic Folksonomy) Let Ł = C0 × C1 × C2 × C3 × C4 × C5 × C6 be the
whole semantic space. A semantic folksonomy is a tuple SF = ⟨Ũ, Ť, Ĩ, Y, Ň⟩, where Ň is
a ternary relationship such that Ň ⊆ Ũ × Ť × Ł.
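A semantic folksonomy can be stored as two relations: the tag assignments Y and the tag-to-USL links Ň. A minimal sketch with hypothetical data (the USLs abbreviate the *XML and *Wikipedia entries of Table 1 to (layer, category) pairs):

```python
# Tag assignments Y: (user, tag, item) triples.
Y = {("alice", "xml", "media1"), ("bob", "wikipedia", "media2")}

# N links each user's tag to a USL in the semantic space.
N = {
    ("alice", "xml"): {(0, "(A: + S:)"), (1, "(b.)")},
    ("bob", "wikipedia"): {(0, "(U: + S:)"), (1, "(d.)"), (1, "(t.)")},
}

def tags_of(user):
    """All tags that `user` has used (projected from Y)."""
    return {t for (u, t, i) in Y if u == user}

def usls_of(user):
    """All USL categories attached to `user`'s tags (looked up in N)."""
    return set().union(*(N.get((user, t), set()) for t in tags_of(user)), set())
```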
From semantic folksonomies, we present a formal description of two models, a
semantic user model and a semantic item model, which are used in our social
recommender system.
4.1.1 Semantic Model for the User
From semantic folksonomies, a semantic user model is defined as follows:
Definition 7 (Semantic User Model) Given a user u ∈ Ũ, a formal description of a user
model for user u, Mu, follows: Mu = ⟨Ťu, Ňu⟩, where Ťu = {(t, i) ∈ Ť × Ĩ | (u, t, i) ∈ Y} and
Ňu = {(t, ψ) ∈ Ť × Ł | (u, t, ψ) ∈ Ň}.

For clarity, some definitions of the sets used in this work are introduced. We define,
for a given user u, the set of all tags that user u has used, Tu* := {t ∈ Ť | ∃i ∈ Ĩ: (t, i) ∈
Ťu}. Therefore, the set of all USLs of the user u can be defined as Nu* := {ψ ∈ Ł | ∃t ∈
Tu*: (t, ψ) ∈ Ňu}. For a certain item h, we define the set of tags with which the user u
annotated the item h, Tu^h := {t ∈ Ť | Ťu ∩ ({t} × {h}) ≠ ∅}. Accordingly, the set of all
USLs of user u for the item h can be defined as Nu^h := {ψ ∈ Ł | ∃t ∈ Tu^h: (t, ψ) ∈ Ňu}.
As stated previously, all standard set operations on USLs, such as union and intersection,
can always be performed on sets of categories, layer by layer. Therefore, for a given user
model Mu for user u, with respect to the semantic space, the tags that user u has used to
annotate a certain item h can be represented as the union of the USLs in Nu^h:

USLu^h = ∪_{l=0}^{6} USLu^h(l), where USLu^h(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nu^h}    (9)
,,)()( (9)
Likewise, all tags that user u has used are represented as the union of the USLs in Nu*:

USLu* = ∪_{h∈Ĩ} USLu^h = ∪_{l=0}^{6} USLu*(l), where USLu*(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nu*}
(10)
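The layer-by-layer unions of Equations 9 and 10 can be sketched as follows, with each USL flattened to a set of (layer, category) pairs (representation and names are ours):

```python
def usl_layers(usls):
    """Union of a collection of USLs, layer by layer: out[l] is the set of
    categories of layer l appearing in any USL of `usls`."""
    out = {l: set() for l in range(7)}
    for usl in usls:          # each USL: iterable of (layer, category) pairs
        for l, c in usl:
            out[l].add(c)
    return out

# N_u_h: the USLs of the tags user u assigned to item h (hypothetical data).
N_u_h = [{(0, "c01"), (1, "c11")}, {(1, "c12"), (3, "c31")}]
USL_u_h = usl_layers(N_u_h)
```

Feeding all of a user’s USLs instead of just those for one item yields the union of Equation 10 in the same way.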
Fig. 3 Conceptual user models with respect to IEML semantic space
4.1.2 Semantic Model for the Item
From semantic folksonomies, a semantic item model is defined as follows:
Definition 8 (Semantic Item Model) Given an item i ∈ Ĩ, a formal description of a
semantic item model for item i, M(i), follows: M(i) = ⟨Ť(i), Ň(i)⟩, where Ť(i) = {(u, t) ∈
Ũ × Ť | (u, t, i) ∈ Y} and Ň(i) = {(t, ψ) ∈ Ť × Ł | ∃u ∈ Ũ: (u, t) ∈ Ť(i), (u, t, ψ) ∈ Ň}.

For clarity, we introduce some definitions of the sets from an item perspective. We
define, for a given item i, the set of all tags with which users have annotated item i, T*^i := {t
∈ Ť | ∃u ∈ Ũ: (u, t) ∈ Ť(i)}. Therefore, the set of all USLs of the item i can be defined as
N*^i := {ψ ∈ Ł | ∃t ∈ T*^i: (t, ψ) ∈ Ň(i)}. For a certain user v, we define the set of tags
with which the user v annotated the item i, Tv^i := {t ∈ Ť | Ť(i) ∩ ({v} × {t}) ≠ ∅}.
Accordingly, the set of all USLs of user v for the item i can be defined as Nv^i := {ψ ∈ Ł |
∃t ∈ Tv^i: (t, ψ) ∈ Ň(i)}. As stated previously, all standard set operations on USLs, such
as union and intersection, can always be performed on sets of categories, layer by layer.
Therefore, for a given item model M(i) for item i, with respect to the semantic space, the
tags that a certain user v has used to annotate item i can be represented as the union of
the USLs in Nv^i:

USLv^i = ∪_{l=0}^{6} USLv^i(l), where USLv^i(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ Nv^i}    (11)
Likewise, all tags that have been assigned to item i are represented as the union of
the USLs in N*^i:

USL*^i = ∪_{u=1}^{r} USLu^i = ∪_{l=0}^{6} USL*^i(l), where USL*^i(l) = {cn | l(cn) = l, cn ∈ ψ, ψ ∈ N*^i}
(12)
(12)
Fig. 4 Conceptual item models with respect to IEML semantic space
4.2 User-based Semantic Collaborative Filtering
In this section, we describe a social recommendation method based on the semantic user
model (Definition 7). The basic idea of user-based semantic collaborative filtering
starts from the assumption that a certain user is likely to prefer items that like-minded users
have annotated with tags similar to the tags he/she used. Therefore, we first
look into the set of like-minded users who have tagged a target item and then compute
how semantically similar they are to the target user, called a user-user semantic
similarity. Based on the semantically similar users, the semantic social ranking of the
item is computed to decide whether or not to recommend.
4.2.1 Generating Semantically Nearest Neighbors
One of the most important tasks in CF recommender systems is the neighborhood
formation to identify a set of users who have similar taste, often called k nearest
neighbors. Those users can be directly defined as a group of connected friends in social
networks such as followings in Twitter, connections in Twine, people in Delicious,
friends in Facebook, and so on. However, most users have insufficient connections with
their friends. In addition, finding like-minded users in current social network services
still relies on manually browsing networks of friends or on keyword searches. Thus, this
form of establishing neighbors becomes a time-consuming and ineffective process if we
take into consideration the huge number of people available in the network [3].
As mentioned in Section 2.1, typical collaborative filtering methods adopt a variety
of similarity measurements to determine similar users automatically. However, they also
encounter a serious limitation, namely the sparsity problem [1, 10]. It is often the case that
there is no intersection between two users, and hence the similarity is not computable at
all. Even when the computation of similarity is possible, it may not be very reliable
because insufficient information is processed. To deal with this problem, recent studies
have determined similarities between users by using user-generated tags that reflect
users’ characteristics [10, 13, 20, 22, 25]. However, there still remain limitations that
should be treated, such as noisy, polysemous, and synonymous tags. To this end, our study
identifies the nearest neighbors by using the Uniform Semantic Locators, USLs, of each
user for a more valuable and personalized analysis.
We define semantically similar users as a group of users presenting interest
categories of IEML close to those of the target user. Semantic similarity between two
users, u and v, can be computed by the sum of layer similarities from layer 0 to layer 6.
Formally, the semantic user similarity measure is defined as:

semUSim(u, v) = η · Σ_{l=0}^{6} simULayer_l(u, v)    (13)

where η is a normalizing factor such that the layer weights sum to unity, and simULayer_l(u,
v) denotes the layer similarity of the two users at layer l. The layer similarity
between two USL sets is defined as the size of the intersection divided by the size of the
union of the USL sets. In other words, it is determined by computing the weighted
Jaccard coefficient of the two USL sets. Formally, the layer similarity is given by:

simULayer_l(u, v) = ((l + 1)/7) · |USLu*(l) ∩ USLv*(l)| / |USLu*(l) ∪ USLv*(l)|    (14)
where USLu*(l) and USLv*(l) refer to the unions of the USLs of users u and v at layer l, 0 ≤ l
≤ 6, respectively. Here we give more weight to higher layers when computing the
semantic user similarity. That is, intersections at higher layers contribute more than
intersections at lower layers. When η is set to 0.25 for normalization, the similarity
value between two users is in the range of 0 to 1. The higher a user’s score, the more
similar he/she is to the target user. Finally, for a given user u ∈ Ũ, the k users with the
highest semantic similarity are identified as the semantically k nearest neighbors such
that:
SSNk(u) = argmax^(k)_{v ∈ Ũ\{u}} semUSim(u, v)    (15)
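The semantic user similarity and the selection of semantically nearest neighbors can be sketched as follows (the layered-set representation and names are our choices):

```python
def sim_layer(l, usl_u, usl_v):
    """Layer similarity: weighted Jaccard ((l + 1) / 7) * |A & B| / |A | B|."""
    a, b = usl_u.get(l, set()), usl_v.get(l, set())
    union = a | b
    return 0.0 if not union else (l + 1) / 7 * len(a & b) / len(union)

def sem_usim(usl_u, usl_v, eta=0.25):
    """Semantic user similarity: eta times the sum of the layer similarities."""
    return eta * sum(sim_layer(l, usl_u, usl_v) for l in range(7))

def ssn_k(u, profiles, k):
    """Semantically k nearest neighbors of user u among `profiles`."""
    others = [v for v in profiles if v != u]
    return sorted(others, key=lambda v: sem_usim(profiles[u], profiles[v]),
                  reverse=True)[:k]

# Two users sharing every category at every layer reach similarity 1.0,
# since eta = 0.25 and the layer weights (l + 1) / 7 sum to 4.
full = {l: {"c"} for l in range(7)}
```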
(13)
To illustrate the computation of the semantic user similarity with a simple example,
consider three users: Alice, Bob, and Nami. Alice annotated Media1 with the tags
“Community” and “XML”, Bob annotated Media2 with the tags “Wikipedia” and “OWL”, and
Nami annotated Media3 with the tags “Web of data” and “Folksonomy”. The USLs of each
tag are shown in Fig. 5.
*Community: L0: (A:+B:); L1: (k.) / (m.); L2: (k.o.-) / (p.a.-); L3: (s.o.-a.a.-')
*XML: L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.b.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-'); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Web of data: L0: (U:+S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
*OWL: L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Folksonomy: L0: (A:+B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')
*Wikipedia: L0: (U:+S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',)

Fig. 5 An example of tag assignments and USLs for computing the semantic user similarity
Let us now compute the semantic similarity between Alice and Bob. Table 2 shows the
union of the USLs of Alice's “Community” and “XML” tags and the union of the USLs of
Bob's “Wikipedia” and “OWL” tags.
Table 2 The union of USLs for Alice and Bob.

USL*_Alice (the union of Alice's USLs):
L0: (A:+B:) / (A:+S:)
L1: (k.) / (m.) / (b.)
L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-)
L3: (s.o.-a.a.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-')
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

USL*_Bob (the union of Bob's USLs):
L0: (U:+S:) / (A:+S:)
L1: (d.) / (t.) / (b.)
L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-) / (we.g.-) / (we.h.-)
L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-')
L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',) / (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',)
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
From these two USL sets, we can compute the USL layer similarity, layer by layer. For
example, at layer 0 the size of the intersection of the two USL sets is 1 (namely
(A:+S:), which means documentary networks), and the size of the union is 3 (namely
(A:+B:), (A:+S:), and (U:+S:)). The layer weight of layer 0 is 1/7 ≈ 0.143.
Consequently, the layer similarity at layer 0 is simULayer_0(Alice, Bob) = 1/7 × 1/3 ≈
0.048. At layer 1, the size of the intersection is 1 whereas the size of the union is 5,
and thus simULayer_1(Alice, Bob) = 2/7 × 1/5 ≈ 0.057. In a similar fashion,
simULayer_2(Alice, Bob) ≈ 0.043, simULayer_3(Alice, Bob) ≈ 0.163, simULayer_4(Alice,
Bob) = 0, simULayer_5(Alice, Bob) ≈ 0.857, and simULayer_6(Alice, Bob) = 0. Finally,
summing the layer similarities from layer 0 to layer 6, the semantic similarity between
Alice and Bob is semUSim(Alice, Bob) = 0.25 × (0.048 + 0.057 + 0.043 + 0.163 + 0.857) =
0.25 × 1.168 ≈ 0.292.
In the same way, the semantic similarities between Alice and Nami and between Bob and
Nami are calculated as semUSim(Alice, Nami) = 0.11 and semUSim(Bob, Nami) = 0.132,
respectively. This means that Alice is semantically more similar to Bob than to Nami.
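The layer-by-layer computation above can be sketched in Python. The category identifiers below are placeholders standing in for the USLs of Fig. 5; only the per-layer intersection and union sizes matter, and they are chosen to match the Alice/Bob example (this is an illustrative sketch, not the report's implementation):

```python
def sim_ulayer(usl_u, usl_v, l):
    """Weighted Jaccard similarity at layer l (Eq. 12)."""
    union = usl_u | usl_v
    if not union:
        return 0.0
    return ((l + 1) / 7) * len(usl_u & usl_v) / len(union)

def sem_usim(user_u, user_v, lam=0.25):
    """Semantic user similarity: normalized sum over layers 0..6 (Eq. 11)."""
    return lam * sum(
        sim_ulayer(user_u.get(l, set()), user_v.get(l, set()), l)
        for l in range(7)
    )

# Placeholder categories with the same per-layer intersection/union sizes
# as Alice and Bob in the worked example (Table 2).
alice = {0: {"a1", "s1"}, 1: {"k", "m", "b"},
         2: {"c1", "c2", "c3", "c4"},
         3: {"d1", "d2", "d3", "d4"}, 5: {"e1"}}
bob = {0: {"s1", "u1"}, 1: {"d", "t", "b"},
       2: {"c4", "f1", "f2", "f3", "f4", "f5", "f6"},
       3: {"d3", "d4", "g1", "g2", "g3"},
       4: {"h1", "h2", "h3"}, 5: {"e1"}}
round(sem_usim(alice, bob), 3)  # ≈ 0.292, as in the text
```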
4.2.2 Item Recommendation via Semantic Social Ranking
Once we have identified a group of semantically nearest neighbors, the final step is
prediction, that is, estimating how strongly a certain user would prefer unseen items.
In our study, the basic idea of uncovering relevant items starts from assuming
that a target user is likely to prefer items that have been tagged by semantically
similar users; items tagged by users similar to the target user should therefore be
ranked higher. We call this prediction strategy social ranking based on the semantic
user model. Formally, the semantic social ranking score of the target user u for the
target item h, denoted SUR(u, h), is obtained as follows:
SUR(u, h) = Σ_{v ∈ SSN_k(u)} (|USL*_u ∩ USL^h_v| / |USL^h_v|) × semUSim(u, v)    (14)
where SSN_k(u) is the set of k nearest neighbors of user u under the semantic user
similarity, USL^h_v is the union of the USLs connected to the tags that user v has
assigned to item h, and semUSim(u, v) denotes the semantic similarity between users u
and v.
Once the ranking scores of the target user for the items he/she has not previously
tagged are computed, the items are sorted in descending order of SUR(u, h). Two
strategies can then be used to select the items relevant to user u. First, the items
whose ranking scores exceed a reasonable threshold δ, i.e., SUR(u, h) > δ, are
recommended to user u. Second, the set of top-N ranked items with the highest scores is
identified and recommended to user u.
USL^M2_Bob (Bob's USLs for Media2, tag “OWL”): L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
USL^M2_Nami (Nami's USLs for Media2, tag “Web of data”): L0: (U:+S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
USL^M3_Bob (Bob's USLs for Media3, tag “Wikipedia”): L0: (U:+S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',)
USL^M3_Nami (Nami's USLs for Media3, tag “Folksonomy”): L0: (A:+B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')
USL*_Alice (the union of Alice's USLs): as in Table 2.

Intersection and union sizes:
|USL*_Alice ∩ USL^M2_Bob| = 6, |USL^M2_Bob| = 9
|USL*_Alice ∩ USL^M2_Nami| = 3, |USL^M2_Nami| = 8
|USL*_Alice ∩ USL^M3_Bob| = 0, |USL^M3_Bob| = 12
|USL*_Alice ∩ USL^M3_Nami| = 4, |USL^M3_Nami| = 9

Fig. 6 An example of computing the semantic social ranking of Alice for Media2 and Media3
Consider the same situation as in the example of the previous section, and assume that
Alice is the target user to whom the system should recommend items and that her
neighbors are Bob and Nami. Which items should the system recommend to Alice? To answer
this, we compute the semantic social ranking scores for the items that Alice has not
previously tagged (e.g., Media2 and Media3). First, let us calculate the social ranking
of Alice for Media2. To this end, we first count the intersection categories between
Alice and each of her neighbors for Media2. Bob annotated Media2 with the tag “OWL”; as
can be seen in Fig. 6, the number of intersection categories is |USL*_Alice ∩
USL^M2_Bob| = 6, and the size of the union of Bob's USLs for Media2 is |USL^M2_Bob| = 9.
Nami annotated Media2 with the tag “Web of data”; the number of intersection categories
is |USL*_Alice ∩ USL^M2_Nami| = 3, and the size of the union of Nami's USLs for Media2
is |USL^M2_Nami| = 8. Finally, the semantic ranking score of Alice for Media2 is
computed as the weighted sum of the intersection ratios, using the semantic similarity
as the weight:
SUR(Alice, M2) = (6/9) × 0.292 + (3/8) × 0.11 ≈ 0.236
Second, let us calculate the semantic ranking score of Alice for Media3. In this case,
Alice and Bob have no intersection at any layer, i.e., |USL*_Alice ∩ USL^M3_Bob| = 0.
For Nami, the number of intersection categories is |USL*_Alice ∩ USL^M3_Nami| = 4.
Consequently, the ranking score for Media3 becomes SUR(Alice, M3) = (0/12) × 0.292 +
(4/9) × 0.11 ≈ 0.049. Comparing the two items, the system predicts that Media2 is more
likely to fit Alice's needs than Media3.
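For each neighbor, the social ranking computation of Eq. (14) reduces to an intersection ratio weighted by the semantic user similarity. A sketch using the counts of Fig. 6 (function and variable names are ours):

```python
def sur_score(neighbor_overlaps):
    """Semantic social ranking for one target item (Eq. 14).

    neighbor_overlaps: list of (intersection_size, neighbor_usl_size, sem_usim),
    one triple per neighbor who tagged the target item.
    """
    return sum(inter / size * w for inter, size, w in neighbor_overlaps)

# Alice's scores for Media2 and Media3, from the counts in Fig. 6 and the
# similarities semUSim(Alice, Bob) = 0.292 and semUSim(Alice, Nami) = 0.11.
sur_media2 = sur_score([(6, 9, 0.292), (3, 8, 0.11)])   # ≈ 0.236
sur_media3 = sur_score([(0, 12, 0.292), (4, 9, 0.11)])  # ≈ 0.049
```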
4.3 Item-based Semantic Collaborative Filtering
In this section, we explain the item perspective of our social recommendation method,
which uses the semantic item model (Definition 8). As in item-based collaborative
filtering, we first look into the set of items that the target user has tagged and
compute how semantically similar they are to the target item; we call this the semantic
item-item similarity. Based on the semantically similar items, we recommend
relevant items to the target user by capturing how he/she annotated those similar
items.
4.3.1 Generating Semantically Similar Items
We define semantically similar items as a group of items whose tagged IEML categories
are close to those of the target item. The semantic similarity between two items, i and
j, is computed as the weighted sum of layer similarities from layer 0 to layer 6.
Formally, the semantic item similarity measure is defined as:
semISim(i, j) = λ Σ_{l=0}^{6} simILayer_l(i, j)    (15)

where λ is a normalizing factor such that the layer weights sum to unity, and
simILayer_l(i, j) denotes the layer similarity of the two items at layer l. The layer
similarity between two USL sets is defined as the weighted Jaccard coefficient of the
two sets. Formally, the layer similarity is given by:
simILayer_l(i, j) = ((l + 1)/7) × |USL*_i(l) ∩ USL*_j(l)| / |USL*_i(l) ∪ USL*_j(l)|    (16)
where USL*_i(l) and USL*_j(l) refer to the unions of USLs for items i and j at layer l,
0 ≤ l ≤ 6, respectively. Higher layers receive larger weights when computing the
semantic item similarity; that is, intersections at higher layers contribute more than
intersections at lower layers. When λ is set to 0.25 for normalization, the similarity
value between two items lies in the range 0 to 1; the higher the score, the more
similar an item is to the target item. Finally, for a given item i ∈ Ĩ, the k items
with the highest semantic similarity are identified as the semantically k most similar
items such that:
SSI_k(i) = argmax^k_{j ∈ Ĩ\{i}} semISim(i, j)    (17)
4.3.2 Item Recommendation via Semantic Social Ranking
Once we have identified a group of semantically similar items, the final step is
prediction, that is, estimating how strongly a certain user would prefer unseen items.
In our study, the basic idea of discovering relevant items starts from assuming that a
target user is likely to prefer items that are semantically similar to the items he/she
has tagged before. We call this prediction strategy social ranking based on the
semantic item model. Formally, the prediction value of the target user u for the target
item i, denoted SIR(u, i), is obtained as follows:
SIR(u, i) = Σ_{j ∈ SSI_k(i)} (|USL*_i ∩ USL^j_u| / |USL^j_u|) × semISim(i, j)    (18)
where SSI_k(i) is the set of k most similar items to item i under the semantic item
similarity, and USL^j_u is the union of the USLs connected to the tags with which user
u has annotated item j. semISim(i, j) denotes the semantic similarity between items i
and j.
Once the ranking scores of the target user for the items he/she has not previously
tagged are computed, the items are sorted in descending order of SIR(u, i). Finally,
the set of top-N ranked items with the highest scores is identified and recommended to
user u.
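Both selection strategies (thresholding and top-N) operate on the same sorted score list. A minimal sketch (the names are ours, not from the report):

```python
def recommend(scores, n=10, threshold=None):
    """Rank untagged items by their SUR/SIR score, optionally keep only those
    above a threshold, and return the top-N item identifiers."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(item, s) for item, s in ranked if s > threshold]
    return [item for item, _ in ranked[:n]]

# With the scores from the worked example, only Media2 passes a 0.1 threshold.
recommend({"Media2": 0.236, "Media3": 0.049}, n=10, threshold=0.1)  # ["Media2"]
```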
5 Experimental Evaluation
In this section, we present experimental evaluations of the proposed approach and
compare its performance against that of the benchmark algorithms.
5.1 Evaluation Design and Metrics
The experimental data come from BibSonomy8, a collaborative tagging application that
allows users to organize and share scholarly references. The dataset used in this study
is the p-core at level 5 of BibSonomy [26]; a p-core at level 5 means that each user,
tag, and item has/occurs in at least 5 posts [9]. The original dataset
8 http://bibsonomy.org
contains several useless tags and system tags, such as “r”, “!”, and
“system:imported”, which we removed for the experiments. Table 3 briefly describes our
dataset.
Table 3 Characteristic of Bibsonomy dataset (p-core at level 5)
| Ũ | | Ĩ | | Ť | | Ł | | Y | | Ň | # of posts
116 361 400 325 9,996 3,783 2,494
To evaluate the recommendation performance, we randomly divided the dataset into a
training set and a test set. For each user u, we randomly selected 20% of the items he
had previously posted as the test set and used the remaining 80% as the training set.
To ensure that our results are not sensitive to a particular training/test partitioning
for each user, we used a 5-fold cross-validation scheme [7]. The values reported in the
experiment section are therefore averages over all five runs.
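The per-user 80/20 split described above can be sketched as follows (a simplified illustration of the protocol, not the authors' evaluation code):

```python
import random

def split_user_items(items, test_frac=0.2, seed=0):
    """Randomly hold out `test_frac` of one user's posted items as the test set;
    the remainder becomes the training set."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_test = max(1, round(len(items) * test_frac))
    return items[n_test:], items[:n_test]  # (train, test)

# For a user with 10 posted items: 8 training items, 2 test items,
# disjoint by construction.
train, test = split_user_items(range(10))
```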
To measure the recommendation performance, we adopted precision and recall, which judge
how relevant a set of ranked recommendations is to the user [7]. Precision is the ratio
of recommended items that are also contained in the test set to the total number of
items recommended; recall is the ratio of recommended items that are also contained in
the test set to the overall number of items in the test set. For a given user u,
precision and recall are given by:
precision(u) = |Test(u) ∩ TopN(u)| / |TopN(u)|,   recall(u) = |Test(u) ∩ TopN(u)| / |Test(u)|    (19)
where Test(u) is the item list of user u in the test set and TopN(u) is the top-N
recommended item list for user u. Finally, the overall precision and recall for all
users in the test set are computed by averaging the per-user precision(u) and
recall(u).
However, precision and recall are often in conflict with each other: increasing the
number of recommended items tends to decrease precision but increase recall [18].
Therefore, to take both into account when judging recommendation quality, we use the
standard F1 metric, which combines precision and recall into a single number:
F1 = 2 × precision × recall / (precision + recall)    (20)
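Putting the three metrics together for one user (a sketch assuming set-valued inputs; names are ours):

```python
def prf1(test_items, top_n_items):
    """Precision, recall, and F1 for one user, per the definitions above."""
    hits = len(test_items & top_n_items)
    precision = hits / len(top_n_items) if top_n_items else 0.0
    recall = hits / len(test_items) if test_items else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# 2 of 4 recommendations hit a 5-item test set:
p, r, f1 = prf1({"i1", "i2", "i3", "i4", "i5"}, {"i1", "i2", "i9", "i10"})
# precision 0.5, recall 0.4, F1 ≈ 0.444
```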
To compare the performance of our algorithms, we implemented a user-based CF algorithm
using cosine similarity (denoted UCF) [18], an item-based CF algorithm using cosine
similarity (denoted ICF) [5], and a most-popular-tags approach based on the tags a user
has already used (denoted MPT) [9]. Top-N recommendation via the semantic social
ranking of the user-based method (denoted SUR) and of the item-based method (denoted
SIR) was evaluated against these benchmark algorithms.
5.2 Experiment Results
In this section, we present detailed experimental results. The performance evaluation is
divided into two dimensions. The influence of the number of neighbors on the
performance is first investigated, and then the quality of item recommendations is
evaluated in comparison with the benchmark methods.
5.2.1 Experiments with Neighborhood Size
As noted in a number of previous studies, the neighborhood size has a significant
impact on the recommendation quality of neighborhood-based algorithms [5, 10, 17, 18].
We therefore varied the neighborhood size k from 5 to 80. For UCF and SUR, k denotes
the number of nearest users, whereas for ICF and SIR it is the number of most similar
items. Although MPT is not affected by the neighborhood size at all, we also report its
result for comparison. In this experiment, we set the number of recommended items N to
10 (i.e., top-10) for each user in the test set.
Fig. 7 shows how the precision and recall of the four methods change as the
neighborhood size grows. The precision of UCF improves as the neighborhood size
increases from 5 to 20 and decreases slightly thereafter; with respect to recall, its
curve tends to be flat. In the case of ICF, precision and recall improved up to a
neighborhood size of 40; beyond this point, further increases of the size have a
negative
influence on performance. With respect to SIR, precision and recall tend to improve
slightly as k increases from 5 to 30 and 40, respectively; beyond these points, further
increases lead to worse results. For SUR, the results look different: unlike UCF, ICF,
and SIR, the curves show that SUR is strongly affected by the neighborhood size. The
two charts for SUR reveal that increasing the neighborhood size has detrimental effects
on both metrics; in other words, SUR provides better precision and recall when the
neighborhood size is relatively small. For example, in terms of both precision and
recall, increasing the neighborhood size much beyond 20 yielded worse results than a
size of 5. This indicates that superfluous users can negatively impact the
recommendation quality of our method.
Fig. 7 Precision and recall with respect to increasing the neighborhood size
We then examined the F1 values to compare the methods and to select the best
neighborhood size for each. Fig. 8 depicts the F1 variation of the four methods as the
neighborhood size increases. SUR outperforms MPT and ICF at all neighborhood sizes, and
it outperforms UCF for neighborhood sizes from 5 to 40; beyond this point, F1 is poorer
for SUR than for UCF. Examining the best value of each method: F1 of UCF is 0.107
(k=20), F1 of ICF is 0.099 (k=40), F1 of MPT is 0.0825, F1 of SUR is 0.119 (k=10), and
F1 of SIR is 0.114 (k=30). This implies that SUR can outperform the other methods even
when data are sparse or the available data for users are relatively insufficient. In
practice, CF recommender systems trade off recommendation quality against real-time
efficiency by pre-selecting a number of neighbors. In
consideration of both quality and computation cost, we selected 20, 40, 10, and 30 as
the neighborhood size of UCF, ICF, SUR, and SIR, respectively, in subsequent
experiments.
Fig. 8 F1 value with respect to increasing the neighborhood size
5.2.2 Comparisons with Other Methods
To experimentally evaluate the performance of top-N recommendation, we calculated the
precision, recall, and F1 obtained by UCF, ICF, MPT, SUR, and SIR while varying the
number of recommended items N from 2 to 10 in increments of 2. Since in practice users
tend to click on items with higher ranks, we examined only a small number of
recommended items.
Fig. 9 depicts the precision-recall plot, showing how the precision and recall of the
methods change as N increases. Data points on the curves correspond to the number of
recommended items: the first point of each curve represents N=2 (i.e., top-2) whereas
the last point is N=10 (i.e., top-10). As can be observed from the graph, the curves of
all methods tend to descend, implying that recommending more items tends to decrease
precision but increase recall. With respect to precision, SIR demonstrates the best
performance for N=2 and N=4; however, SUR demonstrates the best performance
as N is increased. With respect to recall, SUR outperforms the other four methods on all
occasions.
Fig. 9 Precision and recall as the value of the number of recommended items N increases
Fig. 10 Comparisons of F1 values as the number of recommended items N increases
Let us now focus on the F1 results. Fig. 10 depicts the F1 values, showing how SUR and
SIR outperform the other methods. As shown, both methods perform considerably better
than UCF, ICF, and MPT. For example, for top-2 (N=2), SUR achieves improvements of
0.1%, 1.6%, and 1.5% over UCF, ICF, and MPT, respectively, whereas SIR achieves
improvements of 0.3%, 1.9%, and 1.7%. Comparing the results achieved by SUR and SIR,
the recommendation quality of SUR becomes superior to that of SIR as N increases.
Averaged over the five cases, SUR obtains improvements of 0.6%, 1.8%, 2.6%, and 0.3%
over UCF, ICF, MPT, and SIR, respectively.
Fig. 11 Comparisons of precision, recall, and F1 for cold start users and active users
We further examined the recommendation performance for users with few posts in the
training set, namely cold start users, and for users with many posts, namely active
users. A CF-based recommender system is generally unable to make high quality
recommendations for cold start users, compared to active users; this is one of its
well-known limitations. We considered two subsets of users: those with fewer than 6
posts (21 users) and those with more than 25 posts (21 users). For the two groups we
calculated precision, recall, and F1 within the top-10 ranked results obtained by UCF,
ICF, MPT, and SUR. Fig. 11 shows these results for the cold start users (left) and the
active users (right). As the graphs show, the F1 values of the cold start group are
considerably lower than those of the active group. This is because the cold start users
did not provide enough information (items or tags) to analyze their posting propensity.
Nevertheless, on the cold start dataset, the precision, recall, and F1 values achieved
by SUR were found to be superior to those of the benchmark algorithms. For example, SUR
obtains 11.9%, 14.3%, and 2.4% improvements in recall over UCF, ICF, and MPT,
respectively; in terms of F1, SUR outperforms UCF, ICF, and MPT by 2.2%, 2.6%, and
0.4%, respectively. Only MPT achieves comparable results. This indicates that utilizing
tagging information can help alleviate the cold start problem and thus improve the
quality
of item recommendations. With respect to the active dataset, SUR performs better than
ICF and MPT on all occasions, and the difference in F1 between UCF and SUR appears
insignificant. Although the precision of SUR is slightly worse than that of UCF on the
active dataset, the proposed method notably provides better overall quality than the
benchmark methods. That is, SUR can provide suitable items not only to cold start users
but also to active users. Comparing the cold start and active results achieved by MPT
reveals an interesting pattern: a simple tag-based approach works well enough for cold
start users, compared to UCF and ICF, but in the active scenario the superfluous tags
of users introduce noise instead. We conclude from these comparison experiments that
our approaches provide consistently better recommendation quality than the other
methods. Furthermore, we believe the results of the proposed approach will become even
more significant in practice on large-scale Web 2.0 frameworks.
6 Concluding Remarks
Looking toward the future of the social Web, we have presented in this report semantic
models that address the interoperability challenges facing semantic technology. We also
proposed two collaborative filtering methods that apply these semantic models and
analyzed the potential benefits of IEML for social recommender systems. As our
experimental results show, our methods can successfully enhance the performance of item
recommendations. Moreover, we observed that our methods can provide items better suited
to user interests even when the number of recommended items is small. The main
contributions of this study can be summarized as follows: 1) our methods address
traditional stumbling blocks such as polysemy, synonymy, data sparseness, the cold
start problem, and semantic interoperability; 2) they can offer trustworthy items
semantically relevant to a user's needs, because capturing the semantics of
user-generated tags makes it easier both to grasp a user's preferences and to make
recommendations to him/her.
Acknowledgment
The work was mainly funded since 2009 by the Canada Research Chair in Collective
Intelligence at University of Ottawa.
References
1. Adomavicius, G., Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data
Engineering 17(6): 734-749
2. Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., Yu, Y. (2007) Optimizing web search using social
annotations. In: Proceedings of the 16th International Conference on World Wide Web, pp. 501-510
3. Bonhard, P., Sasse, A. (2006) ‘Knowing me, knowing you’ - using profiles and social networking to
improve recommender systems. BT Technology Journal 24(3): 84-98
4. Breese, J. S., Heckerman, D., Kadie, C. (1998) Empirical analysis of predictive algorithms for
collaborative filtering. In: Proceedings of the Fourteenth Annual Conference on Uncertainty in
Artificial Intelligence, pp. 43–52
5. Deshpande, M., Karypis, G. (2004) Item-based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems 22(1): 143-177
6. Golder, S. A., Huberman, B. A. (2006) Usage patterns of collaborative tagging systems. Journal of
Information Science 32(2): 198-208
7. Herlocker, J. L., Konstan, J. A., Terveen, L. G., Riedl, J. T. (2004) Evaluating collaborative filtering
recommender systems. ACM Transactions on Information Systems 22(1): 5-53
8. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. (2006) Information Retrieval in Folksonomies:
Search and ranking. In: Proceedings of the 3rd European Semantic Web Conference, pp. 411-426
9. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G. (2008) Tag recommendations
in social bookmarking systems. AI Communications 21(4): 231-247
10. Kim, H.-N., Ji, A.-T., Ha, I., Jo, G.-S. (2009) Collaborative filtering based on collaborative tagging
for enhancing the quality of recommendation. Electronic Commerce Research and Applications, Doi:
10.1016/j.elerap.2009.08.004
11. Lévy, P. (2009) Toward a self-referential collective intelligence some philosophical background of
the IEML research program. In: Proceedings of 1st International Conference on Computational
Collective Intelligence - Semantic Web, Social Networks & Multiagent Systems, pp. 22-35
12. Lévy, P. (2010) From social computing to reflexive collective intelligence: The IEML research
program. Information Sciences 180(1): 71-94
13. Li, X., Guo, L., Zhao, Y. (2008) Tag-based social interest discovery. In: Proceedings of the 17th
International Conference on World Wide Web, pp. 675-684
14. Marchetti, A., Tesconi, M., Ronzano, F. (2007) SemKey: A semantic collaborative tagging system.
In: Proceedings of Tagging and Metadata for Social Information Organization Workshop in the 16th
International Conference on World Wide Web
15. Peis, E., Morales-del-Castillo, J. M., Delgado-López, J. A. (2008) Semantic recommender systems.
Analysis of the state of the topic. Hipertext.net number 6.
http://www.hipertext.net/english/pag1031.htm. Accessed 15 Dec 2009
16. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J. (1994) GroupLens: An open
architecture for collaborative filtering of netnews. In: Proceedings of the ACM 1994 Conference on
Computer Supported Cooperative Work, pp. 175–186
17. Sarwar, B., Karypis, G., Konstan, J., Reidl, J. (2001) Item-based collaborative filtering
recommendation algorithms. In: Proceedings of the Tenth International World Wide Web
Conference, pp. 285-295
18. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2000) Analysis of recommendation algorithms for E-
commerce. In: Proceedings of ACM Conference on Electronic Commerce, pp. 158–167
19. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J. X., Weikum, G. (2008)
Efficient top-k querying over social-tagging networks. In: Proceedings of the 31st Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.
523-530
20. Siersdorfer, S., Sizov, S. (2009) Social recommender systems for web 2.0 folksonomies. In:
Proceedings of the 20th ACM conference on Hypertext and hypermedia, pp. 261-270
21. Sigurbjörnsson, B., van Zwol, R. (2008) Flickr tag recommendation based on collective knowledge.
In: Proceedings of the 17th International Conference on World Wide Web, pp. 327-336
22. Tso-Sutter, K. H. L., Marinho, L. B., Schmidt-Thieme, L. (2008) Tag-aware recommender systems by
fusion of collaborative filtering algorithms. In: Proceedings of the 2008 ACM symposium on Applied
computing, pp. 1995-1999
23. Xu, Z., Fu, Y., Mao, J., Su, D. (2006) Towards the Semantic Web: collaborative tag suggestions. In:
Proceedings of the Collaborative Web Tagging Workshop in the 15th International Conference on
the World Wide Web
24. Zanardi, V., Capra, L. (2008) Social Ranking: Uncovering relevant content using tag-based
recommender systems. In: Proceedings of the 2008 ACM conference on Recommender Systems, pp.
51-58
25. Zhang, Z.-K., Zhou, T., Zhang, Y.-C. (2010) Personalized recommendation via integrated diffusion
on user-item-tag tripartite graphs. Physica A: Statistical Mechanics and its Applications 389(1): 179-
186
26. Knowledge and Data Engineering Group (2007) University of Kassel: Benchmark Folksonomy Data
from BibSonomy, version of April 30th, 2007. http://www.kde.cs.uni-kassel.de/bibsonomy/dumps/.
Accessed 15 Dec 2009