I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems September 5-7, 2012 Graz, Austria
LINKED OPEN DATA TO SUPPORT CONTENT-BASED RECOMMENDER SYSTEMS
Tommaso Di Noia (1), Roberto Mirizzi (2), Vito Claudio Ostuni (1), Davide Romito (1), Markus Zanker (3)
t.dinoia@poliba.it, roberto.mirizzi@hp.com, ostuni@deemail.poliba.it, romito@deemail.poliba.it, markus.zanker@uni-klu.ac.at
(1) Politecnico di Bari, Via Orabona 4, 70125 Bari (Italy)
(2) HP Labs, 1501 Page Mill Road, Palo Alto, CA 94304 (US)
(3) Alpen-Adria-Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria
Outline
What are (Content-based) Recommender Systems?
  The main drawback: limited content analysis
Vector Space Model for Linked Open Data (LOD)
  Vector Space Model adapted to RDF graphs
A Semantic Content-based Recommender System
  A memory-based algorithm which uses a LOD-based item similarity measure
Evaluation
  Precision and Recall experiments with MovieLens
Conclusion
Recommender Systems
Input Data: a set of users U = {u_1, …, u_N}; a set of items I = {i_1, …, i_M}; the rating matrix R = [r_{u,i}]
Problem Definition:
Given user u and target item i, predict the rating r_{u,i}
A definition Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
Content-based Recommender Systems
CB-RSs recommend items to a user based on their description and on the profile of the user’s interests *
[Figure: a recommender system takes the user profile (rated items, e.g. Item1: 5, Item2: 1, Item5: 4, Item10: 5, …) and the items' descriptions (Item1, Item2, …, Item100) as input and returns the Top-N recommendations (e.g. Item7, Item15, Item11, …).]
(*) Pazzani, M. J., & Billsus, D. Content-Based Recommendation Systems. The Adaptive Web. Lecture Notes in Computer Science vol. 4321, 325-341, 2007
Main CB RS Drawback: Limited Content Analysis
Need for domain knowledge! We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*
(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.
A Linked Data based Solution
Use Linked Data to mitigate the limited content analysis issue
Plenty of structured data available. No Content Analyzer required.
LINKED DATA as a structured information source for items' descriptions
Rich item descriptions
Let’s use all this ontological knowledge to build smarter CB RSs
Computing similarity in LOD datasets
Vector Space Model for LOD (i)
[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
Quick recap on the Vector Space Model: it is an algebraic model that represents both text documents and queries as vectors of index-term weights w_{t,d}, which are positive and non-binary.
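$$ v_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T $$
$$ w_{t,d} = tf_{t,d} \cdot idf_t $$
$$ tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}} $$
$$ idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|} $$
$$ sim(d_j, q) = \frac{d_j \cdot q}{\|d_j\| \, \|q\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}} $$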
Vector Space Model for LOD(ii)
[Figure: RDF graph fragment linking the movies Righteous Kill and Heat to the resources Robert De Niro, Al Pacino, Brian Dennehy, Jon Avnet, Serial killer films, Heist films, Crime films and Drama through the properties starring, director, subject/broader and genre. Next to it, an adjacency matrix over all these resources, and the portion for the property starring with rows Righteous Kill and Heat and columns Robert De Niro, Al Pacino and Brian Dennehy.]
Vector Space Model for LOD(iii)
STARRING               Al Pacino (a_1)    Robert De Niro (a_2)    Brian Dennehy (a_3)
Righteous Kill (m_1)   w_{a_1,m_1}        w_{a_2,m_1}             w_{a_3,m_1}
Heat (m_2)             w_{a_1,m_2}        w_{a_2,m_2}             0

$$ w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x} $$
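A minimal sketch of the weighting above for the property starring. It assumes (the slide does not say so) a binary term frequency, i.e. tf = 1 when the movie is linked to the actor and 0 otherwise, and idf = log(M / number of movies starring the actor), with M the number of movies. The function name `property_weights` and the third movie are hypothetical; the extra movie is there only so that the idf values are not all zero.

```python
import math

# Toy data: the two movies from the slides plus one hypothetical movie.
starring = {
    "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
    "Heat": {"Al Pacino", "Robert De Niro"},
    "Other Movie": {"Other Actor"},  # hypothetical, not from the slides
}

def property_weights(links):
    """Return {movie: {resource: tf * idf}} for a single property.

    Assumption: tf is 1 when the movie links to the resource (else 0),
    and idf = log(M / number of movies linking to the resource).
    Resources a movie does not link to are simply absent (weight 0).
    """
    n_movies = len(links)
    movies_per_resource = {}
    for resources in links.values():
        for r in resources:
            movies_per_resource[r] = movies_per_resource.get(r, 0) + 1
    return {
        movie: {r: math.log(n_movies / movies_per_resource[r]) for r in resources}
        for movie, resources in links.items()
    }

weights = property_weights(starring)
for movie, row in weights.items():
    print(movie, row)
```

Under these assumptions an actor who appears in fewer movies (here Brian Dennehy) gets a larger weight than actors shared by many movies.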
Vector Space Model for LOD(iv)
$$ sim^{starring}(m_1, m_2) = \frac{w_{a_1,m_1} w_{a_1,m_2} + w_{a_2,m_1} w_{a_2,m_2} + w_{a_3,m_1} w_{a_3,m_2}}{\sqrt{w_{a_1,m_1}^2 + w_{a_2,m_1}^2 + w_{a_3,m_1}^2} \cdot \sqrt{w_{a_1,m_2}^2 + w_{a_2,m_2}^2 + w_{a_3,m_2}^2}} $$

$$ \alpha_{starring} \cdot sim^{starring}(m_1, m_2) + \alpha_{director} \cdot sim^{director}(m_1, m_2) + \alpha_{subject} \cdot sim^{subject}(m_1, m_2) + \ldots = sim(m_1, m_2) $$
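The two steps of this slide, a cosine similarity per property and a weighted sum over properties, can be sketched as follows. The weight values, the resource keys (`a1`, `d1`, …) and the coefficients in `alphas` are hypothetical numbers chosen only to exercise the functions.

```python
import math

def cosine(w1, w2):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * w2.get(r, 0.0) for r, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # one movie has no value for this property
    return dot / (norm1 * norm2)

def combined_similarity(m1, m2, alphas):
    """Weighted sum of the per-property cosine similarities."""
    return sum(a * cosine(m1.get(p, {}), m2.get(p, {})) for p, a in alphas.items())

# Hypothetical per-property weight vectors for the two movies.
righteous_kill = {
    "starring": {"a1": 0.4, "a2": 0.4, "a3": 1.1},
    "director": {"d1": 1.0},
}
heat = {
    "starring": {"a1": 0.4, "a2": 0.4},
    "director": {"d2": 1.0},
}
alphas = {"starring": 0.7, "director": 0.3}  # hypothetical coefficients

score = combined_similarity(righteous_kill, heat, alphas)
print(score)
```

Because the two movies share no director, only the starring term contributes to the score in this toy example.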
Semantic Content-based Recommender
Given a user profile, defined as:
$$ profile(u) = \{ \langle m_j, v_j \rangle \mid v_j = 1 \text{ if } u \text{ likes } m_j, \; v_j = -1 \text{ otherwise} \} $$
We predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities
$$ \tilde{r}(u, m_i) = \frac{\sum_{m_j \in profile(u)} v_j \cdot \frac{\sum_p \alpha_p \cdot sim^p(m_j, m_i)}{P}}{|profile(u)|} $$
If this predicted value is greater than or equal to 0, we suggest the movie m_i to the user u.
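A minimal sketch of the prediction step, assuming that P is the number of properties (the slide does not define it) and that the per-property similarities are already available through a lookup function. The similarity table, the movie "Movie X" and the coefficients are hypothetical.

```python
def predict(profile, candidate, sim, alphas):
    """Estimate the rating of `candidate` for the owner of `profile`.

    profile: {movie: v} with v = +1 (liked) or -1 (not liked)
    sim:     function (property, m_j, m_i) -> similarity
    alphas:  {property: alpha_p}
    """
    if not profile:
        raise ValueError("empty profile: no neighbours to compare with")
    n_props = len(alphas)  # P, assumed to be the number of properties
    total = 0.0
    for m_j, v_j in profile.items():
        combined = sum(a * sim(p, m_j, candidate) for p, a in alphas.items())
        total += v_j * combined / n_props
    return total / len(profile)

# Hypothetical similarity values; missing entries count as 0.
toy_sim = {
    ("starring", "Heat", "Righteous Kill"): 0.5,
    ("starring", "Movie X", "Righteous Kill"): 0.2,
}

def sim(prop, m_j, m_i):
    return toy_sim.get((prop, m_j, m_i), 0.0)

profile = {"Heat": 1, "Movie X": -1}           # the user likes Heat only
alphas = {"starring": 0.5, "director": 0.5}    # hypothetical coefficients

score = predict(profile, "Righteous Kill", sim, alphas)
recommend = score >= 0  # decision rule from the slide
print(score, recommend)
```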
Training the system(i)
In order to identify the best possible values for the coefficients αp (i.e., the weights associated with the properties), we train the system via a genetic algorithm.
Fitness function: Minimize the number of misclassification errors ei on the training data (user profile)
$$ \min \sum_{i=1}^{|profile(u)|} e_i $$
[Figure: the user profile (Item1: 1, Item2: -1, Item5: 1, …) is the training data for user u; the optimization returns the optimal values (αp1, αp2, αp3, …).]
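The slides do not describe the encoding or the operators of the genetic algorithm, so the following is only a generic sketch of the idea: candidate coefficient vectors are evolved to minimize the number of misclassified items in the profile. All settings (population size, one-point crossover, Gaussian mutation, coefficients clamped to [0, 1]) and the training examples are assumptions. Each example holds hypothetical signed per-property neighbour scores for one profile item; the constant factors 1/P and 1/|profile(u)| are omitted because they do not change the sign of the prediction.

```python
import random

def errors(alphas, examples):
    """Number of misclassified training examples (the fitness to minimize)."""
    wrong = 0
    for scores, label in examples:
        value = sum(a * s for a, s in zip(alphas, scores))
        predicted = 1 if value >= 0 else -1
        wrong += predicted != label
    return wrong

def train(examples, n_props, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_props)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: errors(a, examples))
        parents = population[: pop_size // 2]      # keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_props) if n_props > 1 else 0
            child = p1[:cut] + p2[cut:]             # one-point crossover
            i = rng.randrange(n_props)              # mutate one coefficient
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return min(population, key=lambda a: errors(a, examples))

# Hypothetical training data: (scores for [starring, director], label).
examples = [
    ([0.6, -0.8], 1),
    ([-0.5, 0.7], -1),
    ([0.4, -0.1], 1),
    ([-0.3, 0.2], -1),
]

best = train(examples, n_props=2)
print(best, errors(best, examples))
```

Since the best half of each generation is carried over unchanged, the error count of the best vector never increases from one generation to the next.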
Training the system(ii)
In some cases (e.g. the new-user problem) the user may not have rated any item yet. The user profile is empty, so we cannot learn the αp coefficients!
Look at Amazon.com: use Amazon's collaborative results to capture movie similarities. We collected a set of 1,000 movies from Amazon. For each of these movies we look at the corresponding recommendation list.
[Figure: example of a first suggestion on Amazon, relating Righteous Kill and Heat.]
Increment the weights αp associated with the properties whose values the two movies share. E.g. they have actors in common but no directors, hence we can increase the weight of the property starring.
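The increment rule above can be sketched as a simple counting procedure. The normalization of the counts to sum to 1 is an assumption (the slide only says the weights are incremented), the function name `cold_start_alphas` is hypothetical, and the movie descriptions are toy data.

```python
from collections import Counter

def cold_start_alphas(pairs, descriptions):
    """Count, per property, how often co-recommended movies share a value,
    then normalize the counts so that they sum to 1 (assumed step)."""
    counts = Counter()
    for m1, m2 in pairs:
        d1, d2 = descriptions[m1], descriptions[m2]
        for prop in d1.keys() & d2.keys():
            if d1[prop] & d2[prop]:  # at least one common value
                counts[prop] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}

# Toy descriptions of the two movies used in the slides.
descriptions = {
    "Righteous Kill": {
        "starring": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "director": {"Jon Avnet"},
    },
    "Heat": {
        "starring": {"Al Pacino", "Robert De Niro"},
        "director": {"Michael Mann"},
    },
}
pairs = [("Righteous Kill", "Heat")]  # one co-recommended pair

alphas = cold_start_alphas(pairs, descriptions)
print(alphas)
```

With a single pair that shares actors but no director, all the weight goes to starring; with the full set of collected movies the counts would spread over several properties.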
Experiment settings(i)
MovieLens 1M dataset. One-to-one mapping between MovieLens and DBpedia, using SPARQL queries and Levenshtein distance: 3,654 matched movies out of 3,952.
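As an illustration of the string-matching part of the mapping, here is a minimal sketch of the Levenshtein distance and of picking the closest candidate label. The SPARQL step is not shown; the candidate labels are assumed to have been retrieved already, and the threshold `max_distance` and the function name `best_match` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]

def best_match(title, candidates, max_distance=3):
    """Closest candidate label, or None if nothing is close enough."""
    if not candidates:
        return None
    scored = [(levenshtein(title.lower(), c.lower()), c) for c in candidates]
    distance, label = min(scored)
    return label if distance <= max_distance else None

# Hypothetical candidate labels for a slightly misspelled title.
print(best_match("Righteous Kil", ["Righteous Kill", "Heat"]))
```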
$$ profile(u) = \{ \langle m_j, v_j \rangle \mid v_j = 1 \text{ if } r(u, m_j) \geq r_u, \; v_j = -1 \text{ otherwise} \} $$
Binarization of the 1-5 rating scale
$$ P@N = \frac{|Rec@N \cap TestSet|}{N} $$
$$ R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|} $$
$$ N = 1, 2, \ldots, 20 $$
Evaluation goal: Top-N recommendations. Metrics: Precision@N and Recall@N.
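The two metrics can be computed as in the following sketch. The ranked list and the test set are hypothetical; `ranked` stands for the recommendation list sorted by predicted value and `test_set` for the items the user actually liked in the test data.

```python
def precision_recall_at_n(ranked, test_set, n):
    """Precision@N and Recall@N for one user."""
    top_n = ranked[:n]
    hits = len(set(top_n) & set(test_set))
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall

ranked = ["Item7", "Item15", "Item11", "Item3", "Item9"]  # hypothetical ranking
test_set = {"Item15", "Item9", "Item42"}                   # hypothetical liked items

for n in (1, 2, 5):
    print(n, precision_recall_at_n(ranked, test_set, n))
```

Precision divides by N even when fewer than N items are available, which matches the definition given on the slide.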
Experiment settings(ii)
Extracted graph: 53,840 actors, 18,149 directors, 29,352 distinct writers and 27,035 categories from DBpedia; 667 genres from Freebase; 26 genres from LinkedMDB.
Properties: dcterms:subject + skos:broader + DBpedia Ontology + Freebase + LinkedMDB genres
Alpha-coefficients evaluation
The α-coefficients obtained with the genetic algorithm give us the best performance.
Property subset evaluation
The subject+broader solution is better than subject alone or subject with more levels of broader.
The best solution is achieved with subject+broader+genres.
Too many broaders introduce noise.
Evaluation against other approaches
Our solution outperforms a Linked Data approach (LDSD) and other content-based approaches that do not leverage LOD.
Conclusion & Future directions
The huge amount of data available on Linked Data datasets can be successfully exploited to overcome limited content analysis.
We have presented a semantic version of the classical vector space model to compute item similarities.
The evaluation against historical data (MovieLens), with high values of precision and recall, supports the validity of our approach.
We are currently working on: Testing the approach with different domains
Improving the recommendation with a hybrid approach (content-based and collaborative filtering)
Q & A
We acknowledge partial support of HP IRP 2011. Grant CW267313.