Recommender Systems meet Linked Open Data
Tommaso Di Noia
16th International Conference on Web Engineering, June 7th, 2016
[email protected] @TommasoDiNoia
Agenda
• Linked Open Data
• What is a Recommender System and how does it work?
• Evaluating a Recommender System
• Recommender Systems and Linked Open Data
A quick introduction to LINKED OPEN DATA
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Linked (Open) Data
Some definitions:
– A method of publishing data on the Web
– (An instance of) the Web of Data
– A huge database distributed in the Web
– Linked Data is the Semantic Web done right
Web vs Linked Data
                 Web            Linked Data
Analogy          File System    Database
Designed for     Men            Machines (Software Agents)
Main elements    Documents      Things
Links between    Documents      Things
Semantics        Implicit       Explicit
Courtesy of Prof. Enrico Motta, The Open University, Milton Keynes, UK. Semantic Web: Technologies and Applications.
LOD is the Web
Which technologies?
Data Language
Query Language
Schema Languages
URI
• Every resource/entity/thing/relation is identified by a (unique) URI
– URI: <http://dbpedia.org/resource/Lugano>
– CURIE: dbr:Lugano
– URI: <http://purl.org/dc/terms/subject>
– CURIE: dct:subject
Which vocabularies/ontologies?
• Most popular on http://prefix.cc (June 6, 2016)
– YAGO: http://yago-knowledge.org/resource/
– FOAF: http://xmlns.com/foaf/0.1/
– DBpedia Ontology: http://dbpedia.org/ontology/
– DBpedia Properties: http://dbpedia.org/property/
– Dublin Core: http://dublincore.org/
• Most popular on http://lov.okfn.org (June 6, 2016)
– VANN: http://purl.org/vocab/vann/
– SKOS: http://www.w3.org/2004/02/skos/core
– FOAF
– DCTERMS
– DCE: http://purl.org/dc/elements/1.1/
RDF – Resource Description Framework
• Basic element: triple
[subject]  [predicate]  [object]
URI        URI          URI | Literal
Literal: "string"@lang | "string"^^datatype
RDF – Resource Description Framework
dbr:Lugano dbo:country dbr:Switzerland .
dbr:Lugano rdfs:label "Lugano"@en .
dbr:Lugano rdfs:label "Lugano"@it .
dbr:Lugano dbo:populationTotal "67201"^^xsd:integer .
dbr:Lugano dct:subject dbc:Cities_in_Switzerland .
dbr:Lugano rdf:type yago:PopulatedPlacesOnLakeLugano .
dbr:Switzerland dbo:leaderParty dbr:Ticino_League .
dbr:Switzerland dbp:neighboringMunicipalities dbr:Melide,_Switzerland .
RDF – Resource Description Framework
[Figure: the triples drawn as a graph. Nodes: Lugano, Switzerland, Melide,_Switzerland, Ticino_League, Cities_in_Switzerland, PopulatedPlacesOnLakeLugano and the literals "Lugano"@en, "Lugano"@it, "67201"^^xsd:integer. Edge labels: country, leaderParty, neighboringMunicipalities, type, subject, label, populationTotal.]
RDFS and OWL in two statements
dbo:country rdfs:range dbo:Country .
dbr:Lugano owl:sameAs wikidata:Lugano .
SPARQL
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?city ?name
WHERE {
  ?city dct:subject dbc:Cities_in_Switzerland .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER(?population < 70000) .
  FILTER(lang(?name) = 'en')
}
SPARQL
curl -g -H 'Accept: application/json' 'http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+dbo%3A%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+dct%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E+PREFIX+dbc%3A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3A%3E+SELECT+DISTINCT+%3Fcity+%3Fname+WHERE%7B%3Fcity+dct%3Asubject+dbc%3ACities_in_Switzerland.%3Fcity+rdfs%3Alabel+%3Fname.%3Fcity+dbo%3ApopulationTotal+%3Fpopulation.FILTER%28%3Fpopulation+%3C+70000%29.FILTER+%28lang%28%3Fname%29%3D%27en%27%29%7D'
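The URL in the curl call is the SPARQL query percent-encoded into the `query` parameter. A minimal Python sketch of that encoding step, using only the standard library, could look as follows. The helper name `build_request_url` is an illustrative assumption, and the sketch only builds the URL; it does not contact the endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint used in the curl call

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?city ?name WHERE {
  ?city dct:subject dbc:Cities_in_Switzerland .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER(?population < 70000) .
  FILTER(lang(?name) = 'en')
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Hypothetical helper: percent-encode the query into a GET URL."""
    params = {"default-graph-uri": "http://dbpedia.org", "query": query.strip()}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
# Decoding the URL again gives back the original query text.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
```

Sending `url` with the header `Accept: application/json` would then mirror the curl call.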
RECOMMENDER SYSTEMS
The information overload problem
60 seconds in the Web
Personalized Information Access
• Help the user in finding the information they might be interested in
• Consider their preferences/past behaviour
• Filter irrelevant information
Recommender Systems
• Help users in dealing with Information/Choice Overload
• Help to match users with items
Some definitions
– In its most common formulation, the recommendation problem is reduced to the problem of estimating ratings for the items that have not been seen by a user.
[G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. TKDE, 2005.]
– Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2015.]
The problem
• Estimate a utility function to automatically predict how much a user will like an item which is unknown to them.
Input
Set of users: $$U = \{u_1, \dots, u_M\}$$
Set of items: $$X = \{x_1, \dots, x_N\}$$
Utility function: $$f: U \times X \to R$$
Output
$$\forall u \in U,\quad x'_u = \arg\max_{x \in X} f(u, x)$$
The rating matrix
            The Matrix  Titanic  I love shopping  Argo  Love Actually  The hangover
Tommaso     5           1        2                4     3              ?
Francesco   2           4        5                3     5              2
Vittoria    4           3        2                4     1              3
Jessica     3           5        1                5     2              4
Paolo       4           4        5                3     5              2
The rating matrix (in the real world)
            The Matrix  Titanic  I love shopping  Argo  Love Actually  The hangover
Tommaso     5           ?        ?                4     3              ?
Francesco   2           4        5                ?     5              ?
Vittoria    ?           3        ?                4     ?              3
Jessica     3           5        ?                5     2              ?
Paolo       4           ?        5                ?     5              2
How sparse is a rating matrix?
$$sparsity = 1 - \frac{|R|}{|X| \cdot |U|}$$
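As a quick check of the sparsity formula, the sketch below applies it to the real-world rating matrix from the previous slide. `None` stands for a missing rating ("?"); the function name `sparsity` is an illustrative assumption.

```python
# Real-world rating matrix: rows are users, columns are items, None = "?".
R = [
    [5, None, None, 4, 3, None],     # Tommaso
    [2, 4, 5, None, 5, None],        # Francesco
    [None, 3, None, 4, None, 3],     # Vittoria
    [3, 5, None, 5, 2, None],        # Jessica
    [4, None, 5, None, 5, 2],        # Paolo
]


def sparsity(matrix):
    """1 - |R| / (|X| * |U|), where |R| counts the known ratings."""
    n_users = len(matrix)
    n_items = len(matrix[0])
    n_ratings = sum(r is not None for row in matrix for r in row)
    return 1 - n_ratings / (n_items * n_users)


value = sparsity(R)  # 18 known ratings out of 30 cells
```

With 18 of the 30 cells filled, the sparsity of this toy matrix is 0.4; real datasets are typically far sparser.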
Ratings
Explicit
Implicit
Rating Prediction vs Ranking
[Figure: items ordered from Best to Worst]
Recommendation techniques
• Content-based
• Collaborative filtering
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommender systems
Collaborative RS
Collaborative RSs recommend items to a user by identifying other users with a similar profile
[Figure: the Recommender System receives the user profile (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the profiles of the other users (Item1, 4; Item2, 2; Item5, 5; Item10, 3; …) and returns the Top-N Recommendations (Item7, Item15, Item11, …)]
Content-based RS
CB-RSs recommend items to a user based on their description and on the profile of the user's interests
[Figure: the Recommender System receives the user profile (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the items' descriptions (Item1, Item2, …, Item100) and returns the Top-N Recommendations (Item7, Item15, Item11, …)]
Knowledge-based RS
KB-RSs recommend items to a user based on their description and domain knowledge encoded in a knowledge base
[Figure: the Recommender System receives the items' descriptions (Item1, Item2, …, Item100) and a Knowledge-base and returns the Top-N Recommendations (Item7, Item15, Item11, …)]
Collaborative Filtering
• Memory-based
– Mainly based on k-NN
– Does not require any preliminary model building phase
• Model-based
– Learn a predictive model before computing recommendations
User-based Collaborative Recommendation
[Figure: the example rating matrix (users Tommaso, Francesco, Vittoria, Jessica, Paolo; items The Matrix, Titanic, I love shopping, Argo, Love Actually, The hangover), where Tommaso's rating for The hangover is unknown]
Pearson's correlation coefficient
$$sim(u_i, u_j) = \frac{\sum_{x \in X} (r_{u_i,x} - \bar{r}_{u_i}) \cdot (r_{u_j,x} - \bar{r}_{u_j})}{\sqrt{\sum_{x \in X} (r_{u_i,x} - \bar{r}_{u_i})^2} \cdot \sqrt{\sum_{x \in X} (r_{u_j,x} - \bar{r}_{u_j})^2}}$$
Rate prediction
$$\tilde{r}(u_i, x') = \bar{r}_{u_i} + \frac{\sum_{u_j \in N} sim(u_i, u_j) \cdot (r_{u_j,x'} - \bar{r}_{u_j})}{\sum_{u_j \in N} sim(u_i, u_j)}$$
k-Nearest Neighbors
[Figure: neighborhood N with k = 5]
A neighborhood of 20 to 50 neighbors is a reasonable choice [Herlocker et al. An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms, Information Retrieval 5 (2002), no. 4, 287–310.]
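The user-based scheme can be sketched in a few lines of Python on the example rating matrix. Several choices below are assumptions made for illustration and are not stated on the slides: sums run over the items rated by both users, each user's mean is taken over all of their known ratings, only positively correlated users are kept as neighbours, and k = 2 because the toy matrix has only five users.

```python
from math import sqrt

USERS = ["Tommaso", "Francesco", "Vittoria", "Jessica", "Paolo"]
ITEMS = ["The Matrix", "Titanic", "I love shopping", "Argo",
         "Love Actually", "The hangover"]
R = [  # None marks the unknown rating to predict
    [5, 1, 2, 4, 3, None],
    [2, 4, 5, 3, 5, 2],
    [4, 3, 2, 4, 1, 3],
    [3, 5, 1, 5, 2, 4],
    [4, 4, 5, 3, 5, 2],
]


def mean(u):
    """Average of the known ratings of user u."""
    known = [r for r in R[u] if r is not None]
    return sum(known) / len(known)


def pearson(u, v):
    """Pearson correlation over the items rated by both u and v."""
    common = [x for x in range(len(ITEMS))
              if R[u][x] is not None and R[v][x] is not None]
    du = [R[u][x] - mean(u) for x in common]
    dv = [R[v][x] - mean(v) for x in common]
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def predict(u, x, k=2):
    """Mean-centred weighted average over the k most similar users."""
    sims = [(pearson(u, v), v) for v in range(len(USERS))
            if v != u and R[v][x] is not None]
    neighbours = sorted(s for s in sims if s[0] > 0)[-k:]
    if not neighbours:
        return mean(u)
    norm = sum(s for s, _ in neighbours)
    return mean(u) + sum(s * (R[v][x] - mean(v)) for s, v in neighbours) / norm


prediction = predict(USERS.index("Tommaso"), ITEMS.index("The hangover"))
```

Under these assumptions Vittoria is the only user with a clearly positive correlation with Tommaso, so the prediction is about 3.17, slightly above Tommaso's mean of 3.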
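Item-based Collaborative Recommendation
[Figure: the example rating matrix (users Tommaso, Francesco, Vittoria, Jessica, Paolo; items The Matrix, Titanic, I love shopping, Argo, Love Actually, The hangover), where Tommaso's rating for The hangover is unknown]
Cosine Similarity
$$sim(x_i, x_j) = \frac{x_i \cdot x_j}{|x_i| \cdot |x_j|} = \frac{\sum_{u} r_{u,x_i} \cdot r_{u,x_j}}{\sqrt{\sum_{u} r_{u,x_i}^2} \cdot \sqrt{\sum_{u} r_{u,x_j}^2}}$$
Adjusted Cosine Similarity
$$sim(x_i, x_j) = \frac{\sum_{u} (r_{u,x_i} - \bar{r}_u) \cdot (r_{u,x_j} - \bar{r}_u)}{\sqrt{\sum_{u} (r_{u,x_i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{u} (r_{u,x_j} - \bar{r}_u)^2}}$$
Rate prediction
$$\tilde{r}(u_i, x') = \frac{\sum_{x \in X_{u_i}} sim(x, x') \cdot r_{u_i,x}}{\sum_{x \in X_{u_i}} sim(x, x')}$$
where $X_{u_i}$ is the set of items rated by $u_i$.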
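A minimal sketch of the item-based variant on the same example matrix, using plain cosine similarity between item columns. Restricting each similarity to the users who rated both items is an assumption made here for illustration.

```python
from math import sqrt

R = [  # rows: Tommaso, Francesco, Vittoria, Jessica, Paolo
    [5, 1, 2, 4, 3, None],
    [2, 4, 5, 3, 5, 2],
    [4, 3, 2, 4, 1, 3],
    [3, 5, 1, 5, 2, 4],
    [4, 4, 5, 3, 5, 2],
]


def column(x):
    """Known ratings of item x, keyed by user index."""
    return {u: row[x] for u, row in enumerate(R) if row[x] is not None}


def cosine(xi, xj):
    """Cosine similarity of two item columns over their common raters."""
    ci, cj = column(xi), column(xj)
    common = ci.keys() & cj.keys()
    num = sum(ci[u] * cj[u] for u in common)
    den = (sqrt(sum(ci[u] ** 2 for u in common))
           * sqrt(sum(cj[u] ** 2 for u in common)))
    return num / den if den else 0.0


def predict(u, target):
    """Similarity-weighted average of the ratings user u already gave."""
    rated = [x for x, r in enumerate(R[u]) if r is not None and x != target]
    sims = {x: cosine(x, target) for x in rated}
    norm = sum(sims.values())
    return sum(sims[x] * R[u][x] for x in rated) / norm if norm else 0.0


prediction = predict(0, 5)  # Tommaso, The hangover
```

Because all ratings are positive, every cosine is positive and the prediction is a convex combination of Tommaso's own ratings, so it stays within the 1 to 5 scale.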
CF drawbacks
• Sparsity / Cold-start
– New user
– New item
• Grey sheep problem
Content-Based RS
• Items are described in terms of attributes/features
• A finite set of values is associated to each feature
• Item representation is a (Boolean) vector
Content-based
CB-RSs try to recommend items similar* to those a given user has liked in the past
[M. de Gemmis et al. Recommender Systems Handbook. Springer. 2015]
• Heuristic-based
– Usually adopt techniques borrowed from IR
• Model-based
– Often we have a model for each user
(*) similar from a content-based perspective
CB drawbacks
• Content overspecialization
• Portfolio effect
• Sparsity / Cold-start
– New user
Knowledge-based RS
• Conversational approaches
• Reasoning techniques
– Case-based reasoning
– Constraint reasoning
Hybrid recommender systems
[Robin D. Burke. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact., 12(4):331–370, 2002.]
Weighted: The scores (or votes) of several recommendation techniques are combined together to produce a single recommendation.
Switching: The system switches between recommendation techniques depending on the current situation.
Mixed: Recommendations from several different recommenders are presented at the same time.
Feature combination: Features from different recommendation data sources are thrown together into a single recommendation algorithm.
Cascade: One recommender refines the recommendations given by another.
Feature augmentation: Output from one technique is used as an input feature to another.
Meta-level: The model learned by one recommender is used as input to another.
EVALUATION
Dataset split
[Figure: hold-out split of the dataset into Training Set and Test Set (TS), with 80% and 20% portions; k-fold cross-validation]
Protocols
• Rated test-items
• All unrated items: compute a score for every item not rated by the user (also items not appearing in the user test set)
Accuracy metrics for rating prediction
Mean Absolute Error
$$MAE = \frac{1}{|TS|} \cdot \sum_{(u,x_i) \in TS} |\tilde{r}_{u,x_i} - r_{u,x_i}|$$
Root Mean Squared Error
$$RMSE = \sqrt{\frac{1}{|TS|} \cdot \sum_{(u,x_i) \in TS} (\tilde{r}_{u,x_i} - r_{u,x_i})^2}$$
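Both error metrics are simple to compute once predicted and actual ratings are paired up. The sketch below assumes the test set is given as a list of (predicted, actual) pairs; the three pairs are made-up values for illustration.

```python
from math import sqrt


def mae(pairs):
    """Mean Absolute Error over (predicted, actual) rating pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


def rmse(pairs):
    """Root Mean Squared Error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


# Hypothetical test set: errors of 1, 0 and 2 rating points.
test_set = [(4.0, 5), (3.0, 3), (5.0, 3)]
mae_value = mae(test_set)
rmse_value = rmse(test_set)
```

Here MAE is 1.0 while RMSE is about 1.29: squaring penalises the single large error more heavily.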
MAE and RMSE drawback
• Not very suitable for top-N recommendation
– Errors in the highest part of the recommendation list are considered in the same way as the ones in the lowest part
Accuracy metrics for top-N recommendation
Precision@N
$$P_u@N = \frac{|L_u(N) \cap TS_u^+|}{N}$$
Recall@N
$$R_u@N = \frac{|L_u(N) \cap TS_u^+|}{|TS_u^+|}$$
normalized Discounted Cumulative Gain@N
$$nDCG_u@N = \frac{1}{IDCG@N} \sum_{k=1}^{N} \frac{2^{r_{u,k}} - 1}{\log_2(1 + k)}$$
$L_u(N)$ is the recommendation list up to the N-th element
$TS_u^+$ is the set of relevant test items for $u$
$IDCG@N$ indicates the score obtained by an ideal ranking of $L_u(N)$
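The three top-N metrics can be sketched as follows. The function names and the toy data are assumptions for illustration; the gain of an item is looked up in a dictionary (0 when unknown), and IDCG is computed by re-sorting the gains of the recommended list, following the definition on the slide.

```python
from math import log2


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return len(set(ranked[:n]) & relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant test items found in the top-n."""
    return len(set(ranked[:n]) & relevant) / len(relevant)


def dcg(gains):
    """Discounted cumulative gain of a list of gains (positions from 1)."""
    return sum((2 ** g - 1) / log2(1 + k) for k, g in enumerate(gains, start=1))


def ndcg_at_n(ranked, gain, n):
    """DCG of the top-n divided by the DCG of its ideal reordering."""
    gains = [gain.get(x, 0) for x in ranked[:n]]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Toy example: 'a' and 'c' are hits, 'e' is relevant but not recommended.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
gain = {x: 1 for x in relevant}

p = precision_at_n(ranked, relevant, 4)
r = recall_at_n(ranked, relevant, 4)
g = ndcg_at_n(ranked, gain, 4)
```

Moving "c" from position 3 to position 2 would raise nDCG to 1 while leaving precision and recall unchanged, which is the position sensitivity the set-based metrics lack.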
Is it all about precision?
• Novelty
– Recommend items in the long tail
• Diversity
– Avoid recommending only items in a small subset of the catalog
– Suggest diverse items in the recommendation list
• Serendipity
– Suggest unexpected but interesting items
Novelty
Entropy-Based Novelty
$$EBN_u@N = - \sum_{x_i \in L_u(N)} p_i \cdot \log_2 p_i$$
$$p_i = \frac{|\{u \in U \mid x_i \text{ is relevant to } u\}|}{|U|}$$
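A small sketch of Entropy-Based Novelty. The data layout (a dictionary from users to their sets of relevant items) and the convention of skipping items with $p_i = 0$ are assumptions made for illustration.

```python
from math import log2


def entropy_based_novelty(rec_list, relevant_by_user):
    """EBN of one recommendation list; p is the share of users who find x relevant."""
    n_users = len(relevant_by_user)
    total = 0.0
    for x in rec_list:
        p = sum(x in rel for rel in relevant_by_user.values()) / n_users
        if p > 0:  # treat 0 * log(0) as 0
            total -= p * log2(p)
    return total


# Hypothetical relevance data for four users.
relevant_by_user = {
    "u1": {"a", "b"},
    "u2": {"a"},
    "u3": {"c"},
    "u4": set(),
}
ebn = entropy_based_novelty(["a", "b"], relevant_by_user)
```

Item "a" is relevant to half of the users and "b" to a quarter, so each contributes 0.5 and the list scores 1.0.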
Diversity
Intra-List Diversity
$$ILD_u@N = \frac{1}{2} \cdot \sum_{x_i \in L_u(N)} \sum_{x_j \in L_u(N)} (1 - sim(x_i, x_j))$$
$$ILD@N = \frac{1}{|U|} \cdot \sum_{u \in U} ILD_u@N$$
Aggregate Diversity
$$ADiv@N = \frac{|\bigcup_{u \in U} L_u(N)|}{|X|}$$
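The two diversity measures can be sketched as below. The item features and the Jaccard similarity used as `sim` are illustrative assumptions; any item-to-item similarity could be plugged in.

```python
# Hypothetical item features used only to define a similarity function.
features = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2}, "d": {4}}


def sim(x, y):
    """Jaccard similarity between the feature sets of two items."""
    return len(features[x] & features[y]) / len(features[x] | features[y])


def intra_list_diversity(rec_list, sim):
    """Half the sum of (1 - sim) over all ordered pairs of the list."""
    return 0.5 * sum(1 - sim(a, b) for a in rec_list for b in rec_list)


def aggregate_diversity(rec_lists, catalog):
    """Share of the catalog that appears in at least one user's list."""
    recommended = set().union(*rec_lists.values())
    return len(recommended) / len(catalog)


rec_lists = {"u1": ["a", "b"], "u2": ["a", "c"]}
ild_u1 = intra_list_diversity(rec_lists["u1"], sim)
ild = sum(intra_list_diversity(l, sim) for l in rec_lists.values()) / len(rec_lists)
adiv = aggregate_diversity(rec_lists, features.keys())
```

The list of u2 contains two items with identical features, so its ILD is 0, while three of the four catalog items are recommended to someone, giving an aggregate diversity of 0.75.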
RECOMMENDER SYSTEMS AND LINKED OPEN DATA
Content-Based Recommender Systems
P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
Need of domain knowledge! We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*
(*) M. de Gemmis et al. Recommender Systems Handbook. Springer. 2015
The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.
Limited Content Analysis
Traditional Content-based RSs
• Base on keyword/attribute-based item representations
• Rely on the quality of the content-analyzer to extract expressive item features
• Lack of knowledge about the items
Semantics-aware approaches
Traditional Ontological/Semantic Recommender Systems make use of limited domain ontologies;
What about Linked Data?
Use Linked Data to mitigate the limited content analysis issue
• Plenty of structured data available
• No Content Analyzer required
Why RS + LOD
• Multi-Domain knowledge
• Standardized (distributed) access to data

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor WHERE {
  dbpedia:Pulp_Fiction dbo:starring ?actor .
}

PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT {
  ?book ?p ?o .
  ?book yago:linksTo ?yagolink .
}
WHERE {
  SERVICE <http://live.dbpedia.org/sparql> {
    ?book rdf:type dbpedia-owl:Book .
    ?book ?p ?o .
    ?book owl:sameAs ?yago .
    FILTER(regex(str(?yago), "http://yago-knowledge.org/resource/")) .
  }
  SERVICE <http://lod2.openlinksw.com/sparql> {
    ?yago yago:linksTo ?yagolink .
  }
}

• Semantic Analysis
A high level architecture
V.C. Ostuni et al., Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/OODSD16/
Item Linker
• Direct Item Linking
• Item Description Linking
Direct Item Linking
dbr:I_Am_Legend_(film)
dbr:Troy_(film)
dbr:Troy
dbr:Scarface_(1983_film)
dbr:Scarface:_The_World_Is_Yours
Direct Item Linking
dbr:Divine_Comedy
dbr:The_Da_Vinci_Code
???
Direct Item Linking
• The easy way

SELECT DISTINCT ?uri ?title WHERE {
  ?uri rdf:type dbpedia-owl:Film .
  ?uri rdfs:label ?title .
  FILTER langMatches(lang(?title), "EN") .
  FILTER regex(?title, "matrix", "i")
}
Direct Item Linking
• Other approaches
– DBpedia Lookup https://github.com/dbpedia/lookup
– Silk Framework http://silk-framework.com/
Item Description Linking
Item Graph Analyzer
• Build your own knowledge graph
– Select relevant properties. Possible solutions:
• Ontological properties
• Categorical properties
• Frequent properties
• Feature selection techniques
– Explore the graph up to a limited depth
Which LOD RSs?
• Content-based
– Heuristic-based
– Model-based
• Hybrid
• Knowledge-based
Common features
Linked Data as a structured information source for item descriptions
Rich item descriptions
Different item features representations
• Direct properties
• Property paths
• Node paths
• Neighborhoods
• …
Datasets
Subset of Movielens mapped to DBpedia
Subset of Last.fm mapped to DBpedia
Subset of The Library Thing mapped to DBpedia
Mappings
https://github.com/sisinflab/LODrecsys-datasets
Direct properties
Jaccard similarity
$$sim_{jaccard}(x_i, x_j) = \frac{|N(x_i) \cap N(x_j)|}{|N(x_i) \cup N(x_j)|}$$
Content-based prediction
$$\tilde{r}(u, x_j) = \frac{\sum_{x_i \in N \cap profile(u)} r(u, x_i) \cdot sim(x_i, x_j)}{\sum_{x_i \in N \cap profile(u)} sim(x_i, x_j)}$$
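A minimal sketch of the direct-properties approach: each item is represented by the set of entities it links to, similarity is Jaccard, and the rating is a similarity-weighted average over the user's profile. The feature sets are toy values, and taking the k most similar profile items as the neighbourhood is an assumption made for illustration.

```python
# Hypothetical sets of entities directly linked to each item.
features = {
    "Heat": {"Al_Pacino", "Robert_De_Niro", "Crime_films"},
    "Righteous_Kill": {"Al_Pacino", "Robert_De_Niro", "Brian_Dennehy", "Crime_films"},
    "Titanic": {"Leonardo_DiCaprio", "Drama"},
}


def jaccard(a, b):
    """|N(a) ∩ N(b)| / |N(a) ∪ N(b)| on the direct-property neighbours."""
    union = features[a] | features[b]
    return len(features[a] & features[b]) / len(union) if union else 0.0


def predict(profile, target, k=10):
    """Weighted average of the user's ratings over the k most similar items."""
    scored = sorted(((jaccard(x, target), r) for x, r in profile.items()),
                    reverse=True)[:k]
    norm = sum(s for s, _ in scored)
    return sum(s * r for s, r in scored) / norm if norm else 0.0


profile = {"Heat": 5, "Titanic": 2}  # hypothetical ratings of one user
similarity = jaccard("Heat", "Righteous_Kill")
prediction = predict(profile, "Righteous_Kill")
```

Titanic shares no entity with the target, so it gets weight 0 and the prediction equals the rating given to Heat.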
Vector Space Model for LOD
[Figure: the movies Righteous Kill and Heat connected through the properties starring, director, subject/broader and genre to entities such as Robert De Niro, Al Pacino, Brian Dennehy, John Avnet, Serial killer films, Heist films, Crime films and Drama; the starring links are then isolated as one slice of the graph]
Vector Space Model for LOD
STARRING             Al Pacino (v1)  Robert De Niro (v2)  Brian Dennehy (v3)
Righteous Kill (m1)  X               X                    X
Heat (m2)            X               X

                     v1              v2                   v3
Righteous Kill (x1)  w_{v1,x1}       w_{v2,x1}            w_{v3,x1}
Heat (x2)            w_{v1,x2}       w_{v2,x2}            0

$$w_{AlPacino,Heat} = tf_{AlPacino,Heat} \cdot idf_{AlPacino}$$
$$tf \in \{0, 1\}$$
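Vector Space Model for LOD
$$sim_{starring}(x_i, x_j) = \frac{w_{v_1,x_i} \cdot w_{v_1,x_j} + w_{v_2,x_i} \cdot w_{v_2,x_j} + w_{v_3,x_i} \cdot w_{v_3,x_j}}{\sqrt{w_{v_1,x_i}^2 + w_{v_2,x_i}^2 + w_{v_3,x_i}^2} \cdot \sqrt{w_{v_1,x_j}^2 + w_{v_2,x_j}^2 + w_{v_3,x_j}^2}}$$
$$sim(x_i, x_j) = \alpha_{starring} \cdot sim_{starring}(x_i, x_j) + \alpha_{director} \cdot sim_{director}(x_i, x_j) + \alpha_{subject} \cdot sim_{subject}(x_i, x_j) + \dots$$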
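The per-property vector space model can be sketched as follows: one tf-idf vector per property and item, cosine similarity within each property, and a weighted sum across properties. Only the starring values for Righteous Kill and Heat come from the slides; the third movie, the subject values and the weights `alpha` are made-up assumptions so that the idf is not degenerate.

```python
from math import log, sqrt

# property -> item -> set of linked entities (toy catalog)
catalog = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
        "Titanic": {"Leonardo DiCaprio"},
    },
    "subject": {
        "Righteous Kill": {"Serial killer films"},
        "Heat": {"Heist films", "Crime films"},
        "Titanic": {"Disaster films"},
    },
}
alpha = {"starring": 0.7, "subject": 0.3}  # hypothetical property weights


def tfidf_vector(prop, item):
    """tf-idf weights of the entities linked to item via prop (tf is 0 or 1)."""
    items = catalog[prop]
    vec = {}
    for v in items[item]:
        df = sum(v in values for values in items.values())
        vec[v] = log(len(items) / df)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(a[v] * b[v] for v in a.keys() & b.keys())
    den = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return num / den if den else 0.0


def sim_property(prop, xi, xj):
    return cosine(tfidf_vector(prop, xi), tfidf_vector(prop, xj))


def sim(xi, xj):
    """Linear combination of the per-property similarities."""
    return sum(alpha[p] * sim_property(p, xi, xj) for p in alpha)


overall = sim("Righteous Kill", "Heat")
```

In this toy catalog the two movies share actors but no subject category, so only the starring term contributes to the overall similarity.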
VSM Content-based Recommender
We predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities
$$\tilde{r}(u, x_j) = \frac{\sum_{x_i \in profile(u)} r(u, x_i) \cdot \frac{\sum_{p \in P} \alpha_p \cdot sim_p(x_i, x_j)}{|P|}}{|profile(u)|}$$
If this similarity is greater than or equal to 0, we suggest the movie $x_j$ to the user $u$.
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS), 2012 (Best Paper Award)
Selected properties
heuristic-based → model-based
Property subset evaluation
The subject+broader solution is better than only subject or subject+more broaders.
The best solution is achieved with subject+broader+genres.
Too many broaders introduce noise.
Rated test items protocol
Evaluation against other content-based approaches (rated test items protocol)
Evaluation against other approaches (rated test items protocol)
Property paths
Path-based features
Analysis of complex relations between the user preferences and the target item
T. Di Noia et al., SPRank: Semantic Path-based Ranking for Top-N Recommendations using Linked Open Data. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/DOTD16/
Data model
Implicit Feedback Matrix $\hat{S}$ =
      i1  i2  i3  i4
u1    1   1   0   0
u2    1   0   1   0
u3    0   1   1   0
u4    0   1   0   1
Knowledge Graph
[Figure: the implicit feedback matrix is combined with the knowledge graph]
Path-based features
Path: acyclic sequence of relations $(s, \dots, r_l, \dots, r_L)$
Example: u3 s i2 p2 e1 p1 i1 → (s, p2, p1)
Frequency of the j-th path in the sub-graph related to u and x:
$$w_{ux}(j) = \frac{\#path_{ux}(j)}{\sum_{j} \#path_{ux}(j)}$$
• The more the paths, the more the relevance of the item.
• Different paths have different meaning.
• Not all types of paths are relevant.
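Problem formulation
Set of relevant items for u: $$X_u^+ = \{x \in X \mid \hat{s}_{ux} = 1\}$$
Set of irrelevant items for u: $$X_u^- = \{x \in X \mid \hat{s}_{ux} = 0\}$$
Sample of irrelevant items for u: $$X_u^{-*} \subseteq X_u^-$$
Feature vector: $$w_{ux} \in \mathbb{R}^D$$
Training Set: $$TR = \bigcup_u \{\langle w_{ux}, \hat{s}_{ux} \rangle \mid x \in (X_u^+ \cup X_u^{-*})\}$$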
Path-based features: worked example
[Figure: graph connecting users u1–u4, items x1–x4 and entities e1–e5; the paths between u3 and x1 are counted step by step]
$w_{u_3 x_1}$ = ?
path(1) (s, s, s): 2
path(2) (s, p2, p1): 2
path(3) (s, p2, p3, p1): 1
$$w_{u_3 x_1}(1) = \frac{2}{5} \qquad w_{u_3 x_1}(2) = \frac{2}{5} \qquad w_{u_3 x_1}(3) = \frac{1}{5}$$
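The normalisation step of the worked example is easy to reproduce: each path type's count is divided by the total number of paths between the user and the item. The sketch below starts from the counts shown in the example (it does not enumerate paths in a graph) and uses exact fractions; the function name is an illustrative assumption.

```python
from fractions import Fraction

# Path counts between u3 and x1, as given in the worked example.
path_counts = {
    ("s", "s", "s"): 2,
    ("s", "p2", "p1"): 2,
    ("s", "p2", "p3", "p1"): 1,
}


def path_features(counts):
    """Normalise path counts so that the feature vector sums to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {path: Fraction(c, total) for path, c in counts.items()}


w_u3_x1 = path_features(path_counts)
```

The resulting vector has the components 2/5, 2/5 and 1/5, one per path type.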
Evaluation of different ranking functions
[Figure: Movielens, recall@5 vs user profile size (given 5, given 10, given 20, given 30, given 50, given All) for BagBoo, GBRT and Sum]
[Figure: Last.fm, recall@5 vs user profile size (given 5, given 10, given 20, given All) for BagBoo, GBRT and Sum]
Comparative approaches
• BPRMF, Bayesian Personalized Ranking for Matrix Factorization
• BPRLin, Linear Model optimized for BPR (Hybrid alg.)
• SLIM, Sparse Linear Methods for Top-N Recommender Systems
• SMRMF, Soft Margin Ranking Matrix Factorization
MyMediaLite
Comparison with other approaches
[Figure: Movielens, precision@5 vs user profile size (given 5, given 10, given 20, given 30, given 50, given All) for SPrank, BPRMF, SLIM, BPRLin and SMRMF]
[Figure: Last.fm, precision@5 vs user profile size (given 5, given 10, given 20, given All) for SPrank, BPRMF, SLIM, BPRLin and SMRMF]
Neighborhoods
Graph-based Item Representation
[Figure: The Godfather and American Gangster linked via subject to the categories Mafia_films, Gangster_films, Films_about_organized_crime_in_the_United_States, Best_Picture_Academy_Award_winners, Best_Thriller_Empire_Award_winners and Films_shot_in_New_York_City; in the following steps the categories are linked via broader to Films_about_organized_crime, Films_about_organized_crime_by_country and Awards_for_best_film]
V.C. Ostuni et al., Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/OODSD16/
Exploit entity descriptions
h-hop Item Neighborhood Graph
[Figure: neighborhood graph of The Godfather, linked via subject to Mafia_films, Gangster_films, Best_Picture_Academy_Award_winners and Films_shot_in_New_York_City, with broader links reaching Films_about_organized_crime and Awards_for_best_film]
Kernel Methods
Work by embedding data in a vector space and looking for linear patterns in such space
$$x \to \phi(x)$$
[Figure: the map $\phi$ from the input space to the feature space]
[Kernel Methods for General Pattern Analysis. Nello Cristianini. http://www.kernel-methods.net/tutorials/KMtalk.pdf]
We can work in the new space F by specifying an inner product function between points in it
$$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$$
h-hop Item Entity-based Neighborhood Graph Kernel
Explicit computation of the feature map
$$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$$
$$\phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \dots, w_{x_i,e_m}, \dots, w_{x_i,e_t})$$
$w_{x_i,e_m}$: importance of the entity $e_m$ in the neighborhood graph for the item $x_i$
$$w_{x_i,e_m} = \sum_{l=1}^{h} \alpha_l \cdot c_{\hat{P}^l(x_i), e_m}$$
$c_{\hat{P}^l(x_i), e_m}$: # edges involving $e_m$ at $l$ hops from $x_i$, a.k.a. frequency of the entity in the item neighborhood graph
$\alpha_l$: factor taking into account at which hop the entity appears
Weights computation example
[Figure: item i with entities e1 and e2 at one hop and e4 and e5 at two hops, connected through properties p2 and p3; h = 2]
$$c_{\hat{P}^1(x_i), e_1} = 2 \qquad c_{\hat{P}^1(x_i), e_2} = 1 \qquad c_{\hat{P}^2(x_i), e_4} = 1 \qquad c_{\hat{P}^2(x_i), e_5} = 2$$
An entity can be informative about the item even if it is not directly related to it
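The weights of the example can be computed with a few lines of Python. The edge counts are the ones from the example; the decay factors in `alpha` are hypothetical values chosen for illustration, since the slides do not state them.

```python
# counts[l][e]: edges involving entity e at l hops from the item (h = 2).
counts = {
    1: {"e1": 2, "e2": 1},
    2: {"e4": 1, "e5": 2},
}
alpha = {1: 1.0, 2: 0.5}  # hypothetical per-hop factors


def entity_weights(counts, alpha):
    """w[e] = sum over hops l of alpha[l] * counts[l][e]."""
    weights = {}
    for hop, per_entity in counts.items():
        for entity, c in per_entity.items():
            weights[entity] = weights.get(entity, 0.0) + alpha[hop] * c
    return weights


def kernel(wi, wj):
    """Inner product of two explicit feature maps stored as dicts."""
    return sum(wi[e] * wj[e] for e in wi.keys() & wj.keys())


w_i = entity_weights(counts, alpha)
k_ii = kernel(w_i, w_i)
```

With these factors, e5 receives the same weight as e2 although it sits one hop further away, because it is involved in twice as many edges.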
Experimental Settings
• Trained an SVM Regression model for each user
• Accuracy Evaluation: Precision, Recall
• Novelty Evaluation: Entropy-based Novelty (All Items protocol) [the lower the better]
Comparative approaches
• NB: 1-hop item neigh. + Naive Bayes classifier
• VSM: 1-hop item neigh. Vector Space Model (tf-idf) + SVM regr
• WK: 2-hop item neigh. Walk-based kernel + SVM regr
Comparison with other approaches (i)
[Figure: Prec@10 for the settings [20/80], [40/60] and [80/20]; series NK-bestPrec, NK-bestEntr, NB, VSM and WK. Rated test items protocol]
Comparison with other approaches (ii)
[Figure: EBN@10 for the settings [20/80], [40/60] and [80/20]; series NK-bestPrec, NK-bestEntr, NB, VSM and WK]
Neighborhoods (path-based)
The FreeSound case study
Vito Claudio Ostuni, Sergio Oramas, Tommaso Di Noia, Xavier Serra, Eugenio Di Sciascio. A Semantic Hybrid Approach for Sound Recommendation. 24th World Wide Web Conference, 2015
FreeSound Knowledge Graph
Item textual descriptions enrichment: Entity Linking tools can be used to enrich item textual descriptions with LOD
h-hop Item Node-Based Neighborhood Graph Kernel
Explicit computation of the feature map
$$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$$
$$\phi_{G^h}(x_i) = (w_{x_i,p^*_1}, \dots, w_{x_i,p^*_m}, \dots, w_{x_i,p^*_t})$$
$$w_{x_i,p^*_m} = \frac{\# p^*_m(x_i)}{p_m - p^*_m}$$
$\# p^*_m(x_i)$: # sequences and subsequences of nodes from $x_i$ to $e_m$
$p_m - p^*_m$: normalization factor
Hybrid Recommendation via Feature Combination
The hybridization is based on the combination of different data sources
Final approach: collaborative + LOD + textual description + tags
Item Feature Vector: u1 u2 u3 … | entity1 entity2 … | keyw1 keyw2 … | tag1 …
• u1 u2 u3 …: users who rated the item
• entity1 entity2 …: entities from the knowledge graph (explicit feature mapping)
• keyw1 keyw2 …: keywords extracted from the textual description
• tag1 …: tags associated to the item
Accuracy
All items protocol
Long Tail
Aggregate Diversity
Implementation
• LODreclib: a Java library to build a LOD based recommender system
https://github.com/sisinflab/lodreclib
• Cinemappy (currently for iOS only): a context-aware mobile recommender system
https://itunes.apple.com/it/app/cinemappy/id681762350?mt=8
V.C. Ostuni et al., Mobile Movie Recommendations with Linked Data. CD-ARES 2013: 400-415
Dataset selection
Select the domain(s) of your RS

SELECT (COUNT(?i) AS ?num) ?c WHERE {
  ?i a ?c .
  FILTER(regex(str(?c), "^http://dbpedia.org/ontology")) .
}
GROUP BY ?c
ORDER BY DESC(?num)
Open issues
• Generalize to graph pattern extraction to represent features
• Automatically select the triples related to the domain of interest
• Automatically select meaningful properties to represent items
• Analysis with respect to «knowledge coverage» of the dataset
– What is the best approach?
• Cross-domain recommendation
• More graph-based similarity/relatedness metrics
Does the LOD dataset selection matter?
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study in the music domain. The 14th International Semantic Web Conference, ISWC 2015
Conclusions
• Linked Open Data to enrich the content descriptions of items
• Exploit different characteristics of the semantic network to represent/learn features
• Improved accuracy
• Improved novelty
• Improved Aggregate Diversity
• Entity linking for a better exploitation of text-based data
• Select the right approach, dataset, set of properties to build your RS
Not covered here
• User profile
• Preferences
• Context-aware
• Knowledge-based approaches
• Cross-domain
• Feature selection
• …
Q&A