Linked Open Data-enabled Strategies for Top-N Recommendations - Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro - 1st Workshop on New Trends in Content-based Recommender Systems, co-located with ACM Recommender Systems 2014
Linked Open Data-enabled Strategies for Top-N Recommendations
Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
CBRecSys 2014 Workshop on New Trends in Content-based Recommender Systems
Foster City (CA, United States), October 6, 2014
Outline
• Background
  • Content-based RecSys (CBRS)
  • Limitations
• Linked Open Data
  • What?
  • Introducing LOD in CBRS
• Experiments
• Conclusions
Cataldo Musto, Pierpaolo Basile, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis. Linked Open Data-enabled Strategies for Top-N Recommendation. CBRecSys 2014 Workshop, Silicon Valley (US), 6.10.2014
Content-based Recommender Systems
Suggest items similar to those the user liked in the past (e.g. I bought Converse shoes, so I will keep buying similar sport shoes)
Content-based Recommender Systems
Limitations
• Limited content (in several domains)
• Poor semantics
Problem
How can we boost Content-based Recommender Systems with semantics (and with more content)?
Semantics in CBRS
State of the art
• Ontologies
• Folksonomies
• Encyclopedic Knowledge
• Distributional Semantics
• Linked Open Data
Top-down approaches
What is the difference?
                           Formal Semantics   Large-scale
Folksonomies               X                  X
Ontologies                 V                  X
Encyclopedic Knowledge     X                  V
Distributional Semantics   X                  V
Linked Open Data           V                  V
Linked Open Data merge the vastness of encyclopedic knowledge with the formal semantics typical of ontologies
We focus on the introduction of Linked Open Data in
Content-based Recommender Systems
Linked Open Data
What are we talking about?
Linked Open Data
Definition
A methodology to publish, share and link structured data on the Web
Linked Open Data (cloud)
What is it?
A (large) set of interconnected semantic datasets
Linked Open Data (cloud)
What kind of datasets?
DBpedia (http://dbpedia.org)
DBpedia is the structured mapping of Wikipedia. It is the core of the LOD cloud.
Linked Open Data (cloud)
Example: unstructured content from Wikipedia
“Foster City is a town in United States located in California” (example from Wikipedia page)
Linked Open Data (cloud)
How are these data represented?
Semantic Web cake
Information from the LOD cloud is represented in RDF
Linked Open Data (cloud)
How are these data represented?
“Foster City is a town in United States located in California” (example from Wikipedia page)
Nodes:
• Foster City: http://dbpedia.org/resource/Foster_City,_California
• California: http://dbpedia.org/resource/California
• United States: http://dbpedia.org/resource/United_States
Properties:
• Foster City dbpedia-owl:isPartOf California
• Foster City dbpedia-owl:country United States
Data coming from the LOD cloud have a formal semantics represented in RDF
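As a minimal sketch of the statement above, the Foster City example can be written as subject-predicate-object triples. The direction of the two edges (both starting from the Foster City node) is an assumption read off the figure, and `objects_of` is a hypothetical helper written for this illustration, not part of any RDF library.

```python
# Common prefix of the DBpedia resource URIs shown on the slide.
DBR = "http://dbpedia.org/resource/"

# Each RDF statement is a (subject, predicate, object) triple.
# Assumption: both properties start from the Foster City node.
TRIPLES = [
    (DBR + "Foster_City,_California", "dbpedia-owl:isPartOf", DBR + "California"),
    (DBR + "Foster_City,_California", "dbpedia-owl:country", DBR + "United_States"),
]


def objects_of(triples, subject, predicate):
    """Return the objects linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(objects_of(TRIPLES, DBR + "Foster_City,_California", "dbpedia-owl:country"))
```

Because every node is a URI and every edge is a named property, the same lookup works for any resource in the graph, which is what makes the representation machine-readable.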
Our checklist
Can Linked Open Data boost content-based recommender systems?
• More Semantics: V
• More Content: ?
Linked Open Data (cloud)
How much data?
1048 datasets and 58 billion triples (source: http://stats.lod2.eu)
Our checklist
Can Linked Open Data boost content-based recommender systems?
• More Semantics: V
• More Content: V
…but
Research Question
Approach
We propose two methodologies to introduce LOD-based features into CBRS:
1. Direct Access to DBpedia
2. Entity Linking algorithms
Introducing LOD-based features in CBRS
Methodology :: Direct Access to DBpedia
The simplest way to introduce LOD-based features:
1. Domain-dependent features are manually defined (e.g. book recommendation -> genre, author, publisher, subject, etc.)
2. SPARQL queries extract the values of the features
(We assume that each item to be recommended is already in the LOD cloud)
Example: The Great and Secret Show (Clive Barker’s book)
e.g. Book Recommendation: author, genre, publisher, subject
Each item is represented through the set of (manually defined) features extracted from the LOD cloud.
9 LOD-based features: author (Clive Barker), genre (Fantasy Literature), publisher (William Collins), series (Books of the Art), subject (1980s fantasy novels, William Collins books, Novels by Clive Barker, British Fantasy Novels)
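The two steps of the direct-access methodology can be sketched as follows. This is only an illustration under stated assumptions: the mapping from features to DBpedia properties (`dbpedia-owl:author`, `dbpedia-owl:literaryGenre`, `dbpedia-owl:publisher`, `dcterms:subject`), the resource URI of the book and the helper `build_feature_query` are choices made here, not taken from the slides. The sketch only builds the query text and does not contact any endpoint.

```python
# Step 1: manually defined, domain-dependent features.
# Assumption: these DBpedia properties are a plausible mapping for books.
FEATURE_PROPERTIES = {
    "author": "dbpedia-owl:author",
    "genre": "dbpedia-owl:literaryGenre",
    "publisher": "dbpedia-owl:publisher",
    "subject": "dcterms:subject",
}

PREFIXES = (
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
)


def build_feature_query(item_uri, properties=FEATURE_PROPERTIES):
    """Step 2: build a SPARQL query returning (property, value) pairs."""
    wanted = " ".join(properties.values())
    return (
        PREFIXES
        + "SELECT ?property ?value WHERE {\n"
        + "  VALUES ?property { " + wanted + " }\n"
        + "  <" + item_uri + "> ?property ?value .\n"
        + "}"
    )


if __name__ == "__main__":
    # Assumed resource URI for the example book.
    print(build_feature_query("http://dbpedia.org/resource/The_Great_and_Secret_Show"))
```

Changing domain (e.g. from books to movies) means rewriting `FEATURE_PROPERTIES` by hand, which is the domain dependence of this approach.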
Direct Access to DBpedia
Analysis
Pros:
- Very straightforward approach
- SPARQL queries can be easily built
Cons:
- Properties are manually defined
- Approach is strongly domain-dependent
- Does not exploit unstructured information
Introducing LOD-based features in CBRS
Methodology :: Entity Linking algorithms
• Entity Linking Algorithms
  • Input: free text (in this setting, the textual description of the items, e.g. the Wikipedia abstract)
  • Output: identification of the most relevant entities mentioned in the text
• State of the art
  • tag.me (1)
  • DBpedia Spotlight (2)
  • Wikipedia Miner (3)
(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
(3) http://wikipedia-miner.cms.waikato.ac.nz
Entity Linking - output (from Tagme)
Very human-readable representation
Free n-grams and entity recognition, free sense disambiguation
Each entity is a reference to a DBpedia node (e.g. http://dbpedia.org/resource/Harry_D'Amour), not a simple textual feature!
LOD-based representation can be enriched through broader categories (encoded in the dcterms:subject property) by exploiting SPARQL queries.
The final representation of each item is obtained by merging the DBpedia nodes identified in the text with those the dcterms:subject property refers to (broader categories):
Features = DBpedia nodes + broader categories
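The merge of linked entities and broader categories can be sketched as a set union. Everything below is illustrative: the annotation format (a `uri` plus a confidence `score`), the `min_score` threshold, the category URI and the helper `build_item_features` are assumptions, not the output format of Tagme or of any other entity linker.

```python
def build_item_features(annotations, broader_categories, min_score=0.1):
    """Merge linked DBpedia nodes with their broader categories.

    annotations: list of {"uri": str, "score": float} (assumed format).
    broader_categories: dict mapping an entity URI to the category URIs
        reached through dcterms:subject (assumed to be pre-fetched).
    """
    # Keep only the entities the linker is reasonably confident about.
    entities = {a["uri"] for a in annotations if a["score"] >= min_score}
    categories = set()
    for uri in entities:
        categories.update(broader_categories.get(uri, ()))
    # Features = DBpedia nodes + broader categories
    return entities | categories


if __name__ == "__main__":
    # Hypothetical annotations and categories, for illustration only.
    annotations = [
        {"uri": "http://dbpedia.org/resource/Harry_D'Amour", "score": 0.4},
        {"uri": "http://dbpedia.org/resource/Book", "score": 0.02},
    ]
    categories = {
        "http://dbpedia.org/resource/Harry_D'Amour": [
            "http://dbpedia.org/resource/Category:Fictional_detectives"
        ]
    }
    print(sorted(build_item_features(annotations, categories)))
```

The `min_score` parameter is where the threshold of the entity linking algorithm enters: a low value adds noisy entities, a high value discards relevant ones.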
Entity Linking Algorithms
Analysis
Pros:
- Very general approach
- Exploits unstructured information
- May introduce unexpected (but relevant) features
Cons:
- Strong feature engineering (which ones are the best?)
- The threshold score of Entity Linking algorithms is difficult to set
LOD-based features in CBRS
Experimental Evaluation
Research Hypothesis
1. What is the contribution of Linked Open Data features to the accuracy of recommendation algorithms?
2. Does the representation based on Linked Open Data outperform existing state-of-the-art recommendation algorithms?
Experimental Evaluation
Description of the dataset
• Book recommendation
• ESWC 2014 Challenge Dataset (*)
• 6,733 books
• 6,181 users
• 72,372 binary ratings
  • 11.71 ratings/user
  • Very sparse dataset!
  • Only 5.37 positive ratings/user!
(*) http://challenges.2014.eswc-conferences.org/index.php/RecSys
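The sparsity claim can be checked with a few lines of arithmetic on the figures above. The definition of density used here (ratings divided by the number of user-item pairs) is a standard one but is an assumption, since the slides do not define it.

```python
# Figures reported for the ESWC 2014 Challenge dataset.
BOOKS = 6733
USERS = 6181
RATINGS = 72372

# Average number of ratings per user (the slide reports 11.71).
ratings_per_user = RATINGS / USERS

# Fraction of the user-item matrix that is actually filled.
density = RATINGS / (BOOKS * USERS)
sparsity = 1.0 - density

if __name__ == "__main__":
    print(f"ratings/user: {ratings_per_user:.2f}")
    print(f"density:      {density:.4%}")
    print(f"sparsity:     {sparsity:.4%}")
```

Fewer than 2 cells out of 1,000 in the user-item matrix carry a rating.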
Experimental Evaluation
Feature combinations
• Content (crawled from Wikipedia + NLP processing)
• LOD (direct access to DBpedia)
• Entity Linking (Tagme)
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All
7 combinations for each run
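The 7 combinations are the non-empty subsets of the three feature groups (2^3 - 1 = 7). A short sketch that enumerates them; the helper name `feature_combinations` is introduced here for illustration only.

```python
from itertools import combinations

FEATURE_GROUPS = ["Content", "LOD", "Entity Linking"]


def feature_combinations(groups):
    """Return every non-empty subset of the feature groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


if __name__ == "__main__":
    for combo in feature_combinations(FEATURE_GROUPS):
        print(" + ".join(combo))
```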
Experimental Evaluation
Setup
• Evaluation of the effectiveness of LOD-based features across six different recommendation algorithms
• Vector Space Models
  • VSM
  • BM25
  • eVSM (*)
• Classifiers
  • Random Forests
  • Linear Regression
• Graph-based Approaches
  • PageRank with Priors
(*) C. Musto: Enhanced vector space models for content-based recommender systems. RecSys 2010: 361-364
Experimental Evaluation
Design of the Experiment :: Vector Space Models
The user profile (built upon the features describing the items the user liked) is used as a query.
Cosine similarity is used to retrieve the most similar items.
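A minimal sketch of this design, assuming a plain bag-of-features representation with raw counts as weights. The slides do not state the weighting scheme, so the counts, the feature strings and the helper names are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(weight * b[feature] for feature, weight in a.items() if feature in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(liked_items):
    """Sum the features of all the items the user liked."""
    profile = Counter()
    for features in liked_items:
        profile.update(features)
    return profile


def rank_items(profile, candidates):
    """Rank candidate items by decreasing similarity to the profile."""
    scored = [(cosine(profile, Counter(f)), item) for item, f in candidates.items()]
    return [item for _, item in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Hypothetical feature strings, for illustration only.
    profile = build_profile([["author:Clive_Barker", "genre:Fantasy"]])
    candidates = {
        "book_a": ["author:Clive_Barker", "genre:Horror"],
        "book_b": ["genre:Romance"],
    }
    print(rank_items(profile, candidates))
```

Content, LOD and Entity Linking features can all be fed to the same functions, which is why the seven feature sets can be compared without changing the algorithm.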
Experimental Evaluation
Design of the Experiment :: Classifiers
Random Forests learn a classification model which is used to predict the class (positive/negative) of unlabeled items. The model is based on the features coming from labeled items.
Linear Regression also uses “basic” features (e.g. positive and negative ratings, average rating of the user, ratio between positive and negative ratings, etc.) to learn the model.
Experimental Evaluation
Design of the Experiment :: PageRank with Priors (PRP)
Graph-based representation: users and items are nodes, positive feedback defines the edges.
PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections. By default, equal probability is assigned to all the nodes.
PageRank with Priors introduces a bias towards some nodes (in our setting, the items the user liked).
Several strategies to build the graph are compared:
1. no-LOD. The graph only models users and items.
2. small-LOD. The graph is expanded with new nodes by adding basic properties of the items (subject, genre, publisher, author, etc.), as well as their relationships.
3. big-LOD. The graph is further expanded by introducing more nodes (e.g. other resources of the same genre, other resources written by the same authors, etc.), as well as their relationships.
Rationale: the introduction of new nodes and connections coming from the LOD cloud can improve the effectiveness of PageRank.
PRP is run and the items in the test set are ranked according to their PageRank score.
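The following is a minimal sketch of PageRank with Priors by power iteration, assuming an undirected graph, a damping factor of 0.85 and a fixed number of iterations. None of these settings come from the slides, and the function name and toy graph are hypothetical. Adding LOD nodes (small-LOD, big-LOD) amounts to passing more edges.

```python
def pagerank_with_priors(edges, priors, damping=0.85, iterations=50):
    """Personalised PageRank on an undirected graph.

    edges: iterable of (node, node) pairs.
    priors: dict node -> non-negative weight; the random walk restarts
        from these nodes (here: the items the user liked).
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = sorted(neighbours)
    total = sum(priors.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one prior node must be in the graph")
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour spreads its score evenly over its own edges.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) * prior[n] + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy graph: u1 liked A and B, u2 liked B and C.
    edges = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C")]
    ranks = pagerank_with_priors(edges, priors={"A": 1.0})
    print(sorted(ranks.items(), key=lambda p: -p[1]))
```

In the toy graph the bias towards A pushes B (two hops away) above C (four hops away), which is the behaviour the prior is meant to produce.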
Experimental Evaluation
Recap
6 algorithms:
• VSM
• BM25
• eVSM
• Linear Regression
• Random Forests
• PageRank with Priors
7 sets of features:
• Content
• LOD
• Entity Linking
• Content + LOD
• Content + Entity Linking
• LOD + Entity Linking
• All
Experiment 1
Impact of LOD-based features
Experiment 1
Impact of LOD-based features :: VECTOR SPACE MODEL
Feature set        F1
CONTENT            54.42
LOD                53.79
ENTITY             54.62
CONTENT+LOD        54.59
CONTENT+ENTITY     54.47
LOD+ENTITY         54.69
ALL                54.36
• LOD-based features improve F1-measure: +0.17 (CONTENT+LOD) and +0.05 (CONTENT+ENTITY) over CONTENT
• Statistically significant improvement (paired t-test, p<0.01)
• Best: LOD+Entity Linking (No Content!), +0.27 over CONTENT
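For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it on a top-N list for one user; the cut-off `n` and the way per-user values are aggregated are not stated on the slides, so both are assumptions, and `f1_at_n` is a name introduced here.

```python
def f1_at_n(recommended, relevant, n):
    """F1 of the first n recommended items against the relevant set."""
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_n)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 2 hits in the top 4, 3 relevant items overall.
    print(f1_at_n(["a", "b", "c", "d"], {"a", "c", "e"}, n=4))
```

With precision 2/4 and recall 2/3, the example evaluates to 4/7.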
Experiment 1
Impact of LOD-based features :: BM25
Feature set        F1
CONTENT            54.43
LOD                53.43
ENTITY             53.90
CONTENT+LOD        54.56
CONTENT+ENTITY     54.51
LOD+ENTITY         53.91
ALL                54.60
• Worst (again): LOD alone (-1.00% with respect to CONTENT)
• Best (again): LOD+Entity Linking (With Content!), +0.17 over CONTENT (paired t-test, p<0.01)
Experiment 1
Impact of LOD-based features :: EVSM
Feature set        F1
CONTENT            52.90
LOD                52.06
ENTITY             53.37
CONTENT+LOD        52.80
CONTENT+ENTITY     53.07
LOD+ENTITY         53.04
ALL                53.02
• Introduction of LOD-based features leads to an improvement again: +0.47 (ENTITY), +0.17 (CONTENT+ENTITY), +0.14 (LOD+ENTITY) and +0.12 (ALL) over CONTENT (paired t-test, p<0.01)
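The significance claims rely on a paired t-test. As a hedged sketch, the test statistic can be computed from two lists of scores obtained on the same units (for instance, per-user F1 under two feature sets; what exactly was paired in the experiments is an assumption here). The function returns only the t value; turning it into a p-value requires the Student t distribution with n - 1 degrees of freedom.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired differences scores_a[i] - scores_b[i]."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the differences.
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(variance / n)


if __name__ == "__main__":
    # Toy scores, not taken from the experiments.
    print(paired_t_statistic([1, 2, 3, 4], [0, 1, 1, 2]))
```

Pairing matters because each comparison is made on the same users: small but consistent gains, such as the ones in the tables, can be significant even when the averages look close.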
Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR VSMS (VSM, BM25, eVSM)
1. LOD features alone are always the worst configuration.
2. (At least) one LOD-based representation based on Entity Linking always improves on content alone.
Experiment 1
Impact of LOD-based features :: RANDOM FORESTS
Feature set        F1
CONTENT            53.52
LOD                53.34
ENTITY             53.68
CONTENT+LOD        53.75
CONTENT+ENTITY     53.76
LOD+ENTITY         53.77
ALL                53.86
• Similar outcomes: all but LOD alone lead to improvement
• Content does matter: LOD+entity+content is the best (+0.36)
Experiment 1
Impact of LOD-based features :: LINEAR REGRESSION
Feature set        F1
CONTENT            55.59
LOD                55.59
ENTITY             55.67
CONTENT+LOD        55.50
CONTENT+ENTITY     55.64
LOD+ENTITY         55.61
ALL                55.57
• Entity-based representation is the best one (+0.08; paired t-test, p<0.01)
• Smaller improvements (due to basic features?)
Experiment 1
Impact of LOD-based features :: LESSONS LEARNED FOR CLASSIFIERS (LR, RF)
1. LOD features alone never overcome the content.
2. (At least) one LOD-based representation based on Entity Linking always improves on content alone.
Same outcomes (algorithm-independent behaviour)
Experiment 1
Impact of LOD-based features :: PAGERANK WITH PRIORS
Graph        F1
NO-LOD       54.28
SMALL-LOD    54.73
BIG-LOD      55.44
• The more LOD-based data, the better the accuracy: +0.45 (SMALL-LOD) and +1.16 (BIG-LOD) over NO-LOD (paired t-test, p<0.001)
• Drawback: more nodes produce an exponential growth of computational costs (from 3 hours to 120 hours to run the experiment!)
Experiment 2
Comparison to state of the art
• SPRANK (Semantic Path Ranking) [*]
• BPRMF (Bayesian Personalized Ranking) [+]
• U2U_CF (User to User CF)
• I2I_CF (Item to Item CF)
[*] V. Ostuni, T. Di Noia, E. Di Sciascio, R. Mirizzi: Top-N recommendations from implicit feedback leveraging Linked Open Data. RecSys 2013.
[+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.
Our best-performing configurations are considered as baselines.
Algorithm    F1
VSM          54.69
LR           55.67
PRP          55.44
SPRANK       52.27
BPRMF        54.12
U2U_CF       52.28
I2I_CF       52.24
• Classical CF techniques perform poorly (sparsity?)
• +3.4% over the LOD-based state-of-the-art algorithm
• Our approaches overcome Matrix Factorization (+0.57, +0.32, +1.55)
Conclusions
Lessons Learned
INVESTIGATION ABOUT THE EFFECTIVENESS OF LINKED OPEN DATA IN CONTENT-BASED RECOMMENDATION TASKS
Two solutions have been proposed: Direct Access to DBpedia and Entity Linking algorithms.
Evaluation.
Research Question: What is the impact of LOD-based features on VSM, Classifiers and Graph-based Algorithms?
• All recommendation approaches significantly benefit from the introduction of LOD-based features
• Our best-performing configurations outperform both collaborative and LOD-based state-of-the-art algorithms
Future Research
• Evaluation against different datasets and stronger baselines
• Better (automatic) tuning of parameters and integration of more LOD-based data sources
• Evaluation of Novelty, Diversity and Serendipity of LOD-based recommendations