Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 07 June 2016 v1.1


Recommender Systems meet Linked Open Data

Tommaso Di Noia

16th International Conference on Web Engineering, June 7th, 2016

tommaso.dinoia@poliba.it @TommasoDiNoia

Agenda

• Linked Open Data
• What is a Recommender System and how does it work?
• Evaluating a Recommender System
• Recommender Systems and Linked Open Data

A quick introduction to LINKED OPEN DATA

Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Linked (Open) Data

Some definitions:

– A method of publishing data on the Web
– (An instance of) the Web of Data
– A huge database distributed in the Web
– Linked Data is the Semantic Web done right

Web vs Linked Data

                  Web            Linked Data
Analogy           File System    Database
Designed for      Humans         Machines (Software Agents)
Main elements     Documents      Things
Links between     Documents      Things
Semantics         Implicit       Explicit

Courtesy of Prof. Enrico Motta, The Open University, Milton Keynes, UK. Semantic Web: Technologies and Applications.

LOD is the Web

Which technologies?

• Data Language
• Query Language
• Schema Languages

URI

• Every resource/entity/thing/relation is identified by a (unique) URI

– URI: <http://dbpedia.org/resource/Lugano> – CURIE: dbr:Lugano
– URI: <http://purl.org/dc/terms/subject> – CURIE: dct:subject

Which vocabularies/ontologies?

• Most popular on http://prefix.cc (June 6, 2016)
– YAGO: http://yago-knowledge.org/resource/
– FOAF: http://xmlns.com/foaf/0.1/
– DBpedia Ontology: http://dbpedia.org/ontology/
– DBpedia Properties: http://dbpedia.org/property/
– Dublin Core: http://dublincore.org/

• Most popular on http://lov.okfn.org (June 6, 2016)
– VANN: http://purl.org/vocab/vann/
– SKOS: http://www.w3.org/2004/02/skos/core
– FOAF
– DCTERMS
– DCE: http://purl.org/dc/elements/1.1/

RDF – Resource Description Framework

• Basic element: triple

[subject]  [predicate]  [object]
URI        URI          URI | Literal

A literal is written as "string"@lang or "string"^^datatype

RDF – Resource Description Framework

dbr:Lugano dbo:country dbr:Switzerland .
dbr:Lugano rdfs:label "Lugano"@en .
dbr:Lugano rdfs:label "Lugano"@it .
dbr:Lugano dbo:populationTotal "67201"^^xsd:integer .
dbr:Lugano dct:subject dbc:Cities_in_Switzerland .
dbr:Lugano rdf:type yago:PopulatedPlacesOnLakeLugano .
dbr:Switzerland dbo:leaderParty dbr:Ticino_League .
dbr:Switzerland dbp:neighboringMunicipalities dbr:Melide,_Switzerland .

RDF – Resource Description Framework

[Figure: the example triples drawn as a graph. Nodes: Lugano, Switzerland, Melide,_Switzerland, Ticino_League, Cities_in_Switzerland, PopulatedPlacesOnLakeLugano, "Lugano"@en, "Lugano"@it, "67201"^^xsd:integer. Edge labels: country, leaderParty, neighboringMunicipalities, type, subject, label, populationTotal.]

RDFS and OWL in two statements

dbo:country rdfs:range dbo:Country .

dbr:Lugano owl:sameAs wikidata:Lugano .

SPARQL

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?city ?name WHERE {
  ?city dct:subject dbc:Cities_in_Switzerland .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER(?population < 70000) .
  FILTER(lang(?name) = 'en')
}

SPARQL

curl -g -H 'Accept: application/json' 'http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+dbo%3A%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+dct%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E+PREFIX+dbc%3A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3A%3E+SELECT+DISTINCT+%3Fcity+%3Fname+WHERE%7B%3Fcity+dct%3Asubject+dbc%3ACities_in_Switzerland.%3Fcity+rdfs%3Alabel+%3Fname.%3Fcity+dbo%3ApopulationTotal+%3Fpopulation.FILTER%28%3Fpopulation+%3C+70000%29.FILTER+%28lang%28%3Fname%29%3D%27en%27%29%7D'
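The same HTTP request can be assembled programmatically. The sketch below only builds the request URL (it does not contact the endpoint); the helper name `build_request_url` is hypothetical, and the endpoint and `default-graph-uri` value are the ones used in the curl call.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?city ?name WHERE {
  ?city dct:subject dbc:Cities_in_Switzerland .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER(?population < 70000)
  FILTER(lang(?name) = 'en')
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return the GET URL for a SPARQL query (hypothetical helper)."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        # Collapse whitespace so the query travels as a single line.
        "query": " ".join(query.split()),
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
```

Sending `url` with an `Accept: application/json` header (for instance through `urllib.request`) should return the bindings as JSON, provided the public endpoint is reachable.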

RECOMMENDER SYSTEMS

The information overload problem

60 seconds in the Web

Personalized Information Access

• Help the user in finding the information they might be interested in
• Consider their preferences/past behaviour
• Filter irrelevant information

Recommender Systems

• Help users in dealing with Information/Choice Overload
• Help to match users with items

Some definitions

– In its most common formulation, the recommendation problem is reduced to the problem of estimating ratings for the items that have not been seen by a user.
[G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. TKDE, 2005.]

– Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2015.]

The problem

• Estimate a utility function to automatically predict how much a user will like an item which is unknown to them.

Input

Set of users: $U = \{u_1, \ldots, u_M\}$
Set of items: $X = \{x_1, \ldots, x_N\}$
Utility function: $f: U \times X \rightarrow R$

Output

$\forall u \in U,\; x'_u = \arg\max_{x \in X} f(u, x)$

The rating matrix

             The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso      5           1        2                4     3              ?
Francesco    2           4        5                3     5              2
Vittoria     4           3        2                4     1              3
Jessica      3           5        1                5     2              4
Paolo        4           4        5                3     5              2

The rating matrix (in the real world)

             The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso      5           ?        ?                4     3              ?
Francesco    2           4        5                ?     5              ?
Vittoria     ?           3        ?                4     ?              3
Jessica      3           5        ?                5     2              ?
Paolo        4           ?        5                ?     5              2

How sparse is a rating matrix?

\[ sparsity = 1 - \frac{|R|}{|X| \cdot |U|} \]

where $|R|$ is the number of known ratings.
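As a quick check of the sparsity formula, the sketch below applies it to the "real world" rating matrix of the tutorial, encoding unknown ratings as `None` (an encoding chosen here for illustration).

```python
# Rows: Tommaso, Francesco, Vittoria, Jessica, Paolo.
# Columns: The Matrix, Titanic, I love shopping, Argo, Love Actually, The Hangover.
ratings = [
    [5, None, None, 4, 3, None],
    [2, 4, 5, None, 5, None],
    [None, 3, None, 4, None, 3],
    [3, 5, None, 5, 2, None],
    [4, None, 5, None, 5, 2],
]


def sparsity(matrix):
    """sparsity = 1 - |R| / (|X| * |U|), with |R| the number of known ratings."""
    known = sum(1 for row in matrix for r in row if r is not None)
    cells = sum(len(row) for row in matrix)
    return 1 - known / cells


value = sparsity(ratings)  # 18 known ratings out of 30 cells
```

Real catalogs are far sparser than this toy matrix, since each user rates only a tiny fraction of the items.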

Ratings

• Explicit
• Implicit

Rating Prediction vs Ranking

[Figure: a ranked list of items, from Best to Worst]

Recommendation techniques

• Content-based
• Collaborative filtering
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommender systems

Collaborative RS

Collaborative RSs recommend items to a user by identifying other users with a similar profile.

[Figure: the profiles of many users (lists of item, rating pairs such as Item1, 5; Item2, 1; Item5, 4; Item10, 5) feed the Recommender System, which outputs the Top-N Recommendations (Item7, Item15, Item11, …).]

Content-based RS

CB-RSs recommend items to a user based on their description and on the profile of the user's interests.

[Figure: the user profile (item, rating pairs) and the items' descriptions (Item1, Item2, …, Item100) feed the Recommender System, which outputs the Top-N Recommendations (Item7, Item15, Item11, …).]

Knowledge-based RS

KB-RSs recommend items to a user based on their description and domain knowledge encoded in a knowledge base.

[Figure: the items' descriptions (Item1, Item2, …, Item100) and a knowledge base feed the Recommender System, which outputs the Top-N Recommendations (Item7, Item15, Item11, …).]

Collaborative Filtering

• Memory-based
– Mainly based on k-NN
– Does not require any preliminary model building phase

• Model-based
– Learn a predictive model before computing recommendations

User-based Collaborative Recommendation

[Rating matrix as in "The rating matrix": users Tommaso, Francesco, Vittoria, Jessica and Paolo; the unknown entry is Tommaso's rating for The Hangover.]

Pearson's correlation coefficient

\[ sim(u_i, u_j) = \frac{\sum_{x \in X} (r_{u_i,x} - \bar{r}_{u_i}) \cdot (r_{u_j,x} - \bar{r}_{u_j})}{\sqrt{\sum_{x \in X} (r_{u_i,x} - \bar{r}_{u_i})^2} \cdot \sqrt{\sum_{x \in X} (r_{u_j,x} - \bar{r}_{u_j})^2}} \]

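Rate prediction

\[ \tilde{r}(u_i, x') = \bar{r}_{u_i} + \frac{\sum_{u_j \in N} sim(u_i, u_j) \cdot (r_{u_j,x'} - \bar{r}_{u_j})}{\sum_{u_j \in N} sim(u_i, u_j)} \]

k-Nearest Neighbors: $N$ is the set of the $k$ users most similar to $u_i$ (in the slide, $k = 5$).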

A neighborhood of 20 to 50 neighbors is a reasonable choice [Herlocker et al. An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms, Information Retrieval 5 (2002), no. 4, 287–310.]
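The user-based scheme can be sketched on the tutorial's small rating matrix. The code below is an illustrative sketch, not the tutorial's implementation: it assumes sums run over co-rated items, uses each user's mean over all their known ratings, and keeps only the `k` most similar neighbors with positive similarity (the last choice is an assumption made here so that the denominator stays positive).

```python
from math import sqrt

# Columns: The Matrix, Titanic, I love shopping, Argo, Love Actually, The Hangover.
R = {
    "Tommaso":   [5, 1, 2, 4, 3, None],
    "Francesco": [2, 4, 5, 3, 5, 2],
    "Vittoria":  [4, 3, 2, 4, 1, 3],
    "Jessica":   [3, 5, 1, 5, 2, 4],
    "Paolo":     [4, 4, 5, 3, 5, 2],
}


def mean(user):
    """Average of the ratings the user has actually given."""
    vals = [r for r in R[user] if r is not None]
    return sum(vals) / len(vals)


def pearson(a, b):
    """Pearson correlation between two users over their co-rated items."""
    ma, mb = mean(a), mean(b)
    pairs = [(ra - ma, rb - mb)
             for ra, rb in zip(R[a], R[b])
             if ra is not None and rb is not None]
    num = sum(da * db for da, db in pairs)
    den = (sqrt(sum(da * da for da, _ in pairs))
           * sqrt(sum(db * db for _, db in pairs)))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Mean-centered weighted average over the k most similar positive neighbors."""
    candidates = [(pearson(user, v), v)
                  for v in R if v != user and R[v][item] is not None]
    neighbors = [(s, v) for s, v in sorted(candidates, reverse=True)[:k] if s > 0]
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return mean(user)  # fall back to the user's own average
    return mean(user) + sum(s * (R[v][item] - mean(v)) for s, v in neighbors) / norm


prediction = predict("Tommaso", 5)  # Tommaso's rating for The Hangover
```

On this toy matrix Vittoria is the only neighbor with a clearly positive correlation, so the prediction is Tommaso's mean shifted by Vittoria's deviation from her own mean.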

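Item-based Collaborative Recommendation

[Rating matrix as in "The rating matrix": users Tommaso, Francesco, Vittoria, Jessica and Paolo; the unknown entry is Tommaso's rating for The Hangover.]

Cosine Similarity

\[ sim(x_i, x_j) = \frac{x_i \cdot x_j}{|x_i| \cdot |x_j|} = \frac{\sum_u r_{u,x_i} \cdot r_{u,x_j}}{\sqrt{\sum_u r_{u,x_i}^2} \cdot \sqrt{\sum_u r_{u,x_j}^2}} \]

Rate prediction

\[ \tilde{r}(u_i, x') = \frac{\sum_{x \in X_{u_i}} sim(x, x') \cdot r_{u_i,x}}{\sum_{x \in X_{u_i}} sim(x, x')} \]

where $X_{u_i}$ is the set of items rated by $u_i$.

Adjusted Cosine Similarity

\[ sim(x_i, x_j) = \frac{\sum_u (r_{u,x_i} - \bar{r}_u) \cdot (r_{u,x_j} - \bar{r}_u)}{\sqrt{\sum_u (r_{u,x_i} - \bar{r}_u)^2} \cdot \sqrt{\sum_u (r_{u,x_j} - \bar{r}_u)^2}} \]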
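A minimal sketch of the item-based variant on the same toy matrix follows. It uses plain cosine similarity between item columns, restricted to users who rated both items (an assumption made here), and predicts Tommaso's missing rating as a similarity-weighted average of his own ratings.

```python
from math import sqrt

# Columns: The Matrix, Titanic, I love shopping, Argo, Love Actually, The Hangover.
R = {
    "Tommaso":   [5, 1, 2, 4, 3, None],
    "Francesco": [2, 4, 5, 3, 5, 2],
    "Vittoria":  [4, 3, 2, 4, 1, 3],
    "Jessica":   [3, 5, 1, 5, 2, 4],
    "Paolo":     [4, 4, 5, 3, 5, 2],
}


def column(item):
    """Known ratings for one item, keyed by user."""
    return {u: row[item] for u, row in R.items() if row[item] is not None}


def cosine(i, j):
    """Cosine similarity between two item columns over users who rated both."""
    ci, cj = column(i), column(j)
    common = [u for u in ci if u in cj]
    num = sum(ci[u] * cj[u] for u in common)
    den = (sqrt(sum(ci[u] ** 2 for u in common))
           * sqrt(sum(cj[u] ** 2 for u in common)))
    return num / den if den else 0.0


def predict(user, target):
    """Similarity-weighted average of the user's ratings on the items they rated."""
    rated = [i for i, r in enumerate(R[user]) if r is not None and i != target]
    sims = {i: cosine(i, target) for i in rated}
    norm = sum(sims.values())
    if norm == 0:
        return None  # no usable neighbor item
    return sum(sims[i] * R[user][i] for i in rated) / norm


prediction = predict("Tommaso", 5)  # Tommaso's rating for The Hangover
```

Because raw ratings are all positive, plain cosine values are close to each other; the adjusted cosine, which subtracts each user's mean first, usually discriminates better between items.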

CF drawbacks

• Sparsity / Cold-start
– New user
– New item

• Grey sheep problem

Content-Based RS

• Items are described in terms of attributes/features

• A finite set of values is associated to each feature

• Item representation is a (Boolean) vector

Content-based

CB-RSs try to recommend items similar* to those a given user has liked in the past
[M. de Gemmis et al. Recommender Systems Handbook. Springer, 2015]

• Heuristic-based
– Usually adopt techniques borrowed from IR

• Model-based
– Often we have a model for each user

(*) similar from a content-based perspective

CB drawbacks

• Content overspecialization
• Portfolio effect
• Sparsity / Cold-start
– New user

Knowledge-based RS

• Conversational approaches
• Reasoning techniques
– Case-based reasoning
– Constraint reasoning

Hybrid recommender systems

[Robin D. Burke. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact., 12(4):331–370, 2002.]

• Weighted: The scores (or votes) of several recommendation techniques are combined together to produce a single recommendation.
• Switching: The system switches between recommendation techniques depending on the current situation.
• Mixed: Recommendations from several different recommenders are presented at the same time.
• Feature combination: Features from different recommendation data sources are thrown together into a single recommendation algorithm.
• Cascade: One recommender refines the recommendations given by another.
• Feature augmentation: Output from one technique is used as an input feature to another.
• Meta-level: The model learned by one recommender is used as input to another.

EVALUATION

Dataset split

[Figure: the dataset is split into a Training Set and a Test Set (TS), for example 80% / 20%, either by hold-out or by k-fold cross-validation.]

Protocols

• Rated test-items

• All unrated items: compute a score for every item not rated by the user (also items not appearing in the user test set)

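Accuracy metrics for rating prediction

Mean Absolute Error

\[ MAE = \frac{1}{|TS|} \cdot \sum_{(u, x_i) \in TS} |\tilde{r}_{u,x_i} - r_{u,x_i}| \]

Root Mean Squared Error

\[ RMSE = \sqrt{\frac{1}{|TS|} \cdot \sum_{(u, x_i) \in TS} (\tilde{r}_{u,x_i} - r_{u,x_i})^2} \]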
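Both error metrics are straightforward to compute once the test set is available as (predicted, actual) pairs. The pairs below are made-up values used only to exercise the formulas.

```python
from math import sqrt


def mae(pairs):
    """Mean Absolute Error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def rmse(pairs):
    """Root Mean Squared Error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Hypothetical test set: three (predicted, actual) pairs.
test_set = [(4, 5), (3, 3), (1, 3)]
mae_value = mae(test_set)    # (1 + 0 + 2) / 3
rmse_value = rmse(test_set)  # sqrt((1 + 0 + 4) / 3)
```

RMSE penalizes the single large error more than MAE does, which is why it is the larger of the two on this sample.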

MAE and RMSE drawback

• Not very suitable for top-N recommendation
– Errors in the highest part of the recommendation list are considered in the same way as the ones in the lowest part

Accuracy metrics for top-N recommendation

Precision@N

\[ P_u@N = \frac{|L_u(N) \cap TS_u^+|}{N} \]

Recall@N

\[ R_u@N = \frac{|L_u(N) \cap TS_u^+|}{|TS_u^+|} \]

normalized Discounted Cumulative Gain@N

\[ nDCG_u@N = \frac{1}{IDCG@N} \sum_{k=1}^{N} \frac{2^{r_{u,k}} - 1}{\log_2(1 + k)} \]

$L_u(N)$ is the recommendation list up to the N-th element

$TS_u^+$ is the set of relevant test items for $u$

$IDCG@N$ indicates the score obtained by an ideal ranking of $L_u(N)$
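The three ranking metrics can be sketched as follows. The code assumes that $r_{u,k}$ is the graded relevance of the item at position k (0 when unknown) and that the ideal ranking simply reorders the same top-N list by decreasing relevance, as the definition of IDCG@N states; the item identifiers are hypothetical.

```python
from math import log2


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return len(set(ranked[:n]) & relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top-n list."""
    return len(set(ranked[:n]) & relevant) / len(relevant) if relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain of a list of graded relevance values."""
    return sum((2 ** g - 1) / log2(1 + k) for k, g in enumerate(gains, start=1))


def ndcg_at_n(ranked, ratings, n):
    """DCG of the top-n list divided by the DCG of its ideal reordering."""
    gains = [ratings.get(x, 0) for x in ranked[:n]]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Hypothetical recommendation list and test data for one user.
ranked = ["b", "a", "c", "d"]
relevant = {"a", "c", "e"}
p4 = precision_at_n(ranked, relevant, 4)  # 2 relevant items in 4 positions
r4 = recall_at_n(ranked, relevant, 4)     # 2 of the 3 relevant items found
n2 = ndcg_at_n(ranked, {"a": 1}, 2)       # the relevant item is ranked second
```

Unlike precision and recall, nDCG rewards placing relevant items near the top: swapping "a" and "b" in the example would raise it to 1.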

Is it all about precision?

• Novelty
– Recommend items in the long tail

• Diversity
– Avoid recommending only items in a small subset of the catalog
– Suggest diverse items in the recommendation list

• Serendipity
– Suggest unexpected but interesting items

Novelty

Entropy-Based Novelty

\[ EBN_u@N = - \sum_{x_i \in L_u(N)} p_i \cdot \log_2 p_i \]

\[ p_i = \frac{|\{u \in U \mid x_i \text{ is relevant to } u\}|}{|U|} \]
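A small sketch of the entropy-based novelty computation is given below. The users, items and relevance sets are hypothetical, and items that are relevant to nobody are treated as contributing zero (an assumption made here, since $\log_2 0$ is undefined).

```python
from math import log2


def ebn_at_n(ranked, relevant_by_user, n):
    """Entropy-based novelty of the top-n list."""
    n_users = len(relevant_by_user)
    total = 0.0
    for item in ranked[:n]:
        # p: fraction of users for whom the item is relevant.
        p = sum(1 for rel in relevant_by_user.values() if item in rel) / n_users
        if p > 0:
            total -= p * log2(p)
    return total


# Hypothetical relevance judgments for four users.
relevant_by_user = {
    "u1": {"a", "b"},
    "u2": {"a"},
    "u3": {"c"},
    "u4": {"c"},
}
ebn = ebn_at_n(["a", "b"], relevant_by_user, 2)  # p = 0.5 and p = 0.25
```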

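Diversity

Intra-List Diversity

\[ ILD_u@N = \frac{1}{2} \cdot \sum_{x_i \in L_u(N)} \sum_{x_j \in L_u(N)} (1 - sim(x_i, x_j)) \]

\[ ILD@N = \frac{1}{|U|} \cdot \sum_{u \in U} ILD_u@N \]

Aggregate Diversity

\[ ADiv@N = \frac{|\bigcup_{u \in U} L_u(N)|}{|X|} \]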
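The two diversity measures can be sketched as below. The similarity function is an assumption: Jaccard similarity over hypothetical item feature sets stands in for whatever `sim` the recommender uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def ild_at_n(ranked, features, n):
    """Intra-list diversity: half the sum of (1 - sim) over all ordered pairs."""
    top = ranked[:n]
    return 0.5 * sum(1 - jaccard(features[i], features[j])
                     for i in top for j in top)


def aggregate_diversity(lists, catalog_size, n):
    """Share of the catalog that appears in at least one user's top-n list."""
    recommended = set().union(*(items[:n] for items in lists.values()))
    return len(recommended) / catalog_size


# Hypothetical item features and recommendation lists.
features = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
ild = ild_at_n(["a", "b", "c"], features, 3)
adiv = aggregate_diversity({"u1": ["a", "b"], "u2": ["b", "c"]}, 5, 2)
```

The factor 1/2 compensates for counting each unordered pair twice; pairs of an item with itself contribute zero because their similarity is 1.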

RECOMMENDER SYSTEMS AND LINKED OPEN DATA

Content-Based Recommender Systems

[Figure from: P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners]

Need of domain knowledge! We need rich descriptions of the items!

No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*

(*) M. de Gemmis et al. Recommender Systems Handbook. Springer, 2015

The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.

Limited Content Analysis

Traditional Content-based RSs

• Based on keyword/attribute-based item representations

• Rely on the quality of the content analyzer to extract expressive item features

• Lack of knowledge about the items

Semantics-aware approaches

Traditional Ontological/Semantic Recommender Systems make use of limited domain ontologies.

What about Linked Data?

Use Linked Data to mitigate the limited content analysis issue

• Plenty of structured data available
• No Content Analyzer required

Why RS + LOD

• Multi-Domain knowledge

• Standardized (distributed) access to data

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor WHERE {
  dbpedia:Pulp_Fiction dbo:starring ?actor .
}

PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT {
  ?book ?p ?o .
  ?book yago:linksTo ?yagolink .
}
WHERE {
  SERVICE <http://live.dbpedia.org/sparql> {
    ?book rdf:type dbpedia-owl:Book .
    ?book ?p ?o .
    ?book owl:sameAs ?yago .
    FILTER(regex(str(?yago), "http://yago-knowledge.org/resource/")) .
  }
  SERVICE <http://lod2.openlinksw.com/sparql> {
    ?yago yago:linksTo ?yagolink .
  }
}

• Semantic Analysis

A high-level architecture

V. C. Ostuni et al., Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/OODSD16/

Item Linker

• Direct Item Linking
• Item Description Linking

Direct Item Linking

[Figure sequence: items are matched to DBpedia resources, some of them ambiguous. URIs shown: dbr:I_Am_Legend_(film); dbr:Troy_(film) vs dbr:Troy; dbr:Scarface_(1983_film) vs dbr:Scarface:_The_World_Is_Yours; dbr:Divine_Comedy; dbr:The_Da_Vinci_Code; and one item labelled "???".]

• The easy way

SELECT DISTINCT ?uri ?title WHERE {
  ?uri rdf:type dbpedia-owl:Film .
  ?uri rdfs:label ?title .
  FILTER langMatches(lang(?title), "EN") .
  FILTER regex(?title, "matrix", "i")
}

Direct Item Linking

• Other approaches
– DBpedia Lookup: https://github.com/dbpedia/lookup
– Silk Framework: http://silk-framework.com/

Item Description Linking

Item Graph Analyzer

• Build your own knowledge graph
– Select relevant properties. Possible solutions:
  • Ontological properties
  • Categorical properties
  • Frequent properties
  • Feature selection techniques
– Explore the graph up to a limited depth

Which LOD RSs?

• Content-based
– Heuristic-based
– Model-based

• Hybrid
• Knowledge-based

Common features

Linked Data as a structured information source for item descriptions

Rich item descriptions

Different item feature representations

• Direct properties
• Property paths
• Node paths
• Neighborhoods
• …

Datasets

Subset of Movielens mapped to DBpedia

Subset of Last.fm mapped to DBpedia

Subset of The Library Thing mapped to DBpedia

Mappings: https://github.com/sisinflab/LODrecsys-datasets

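Direct properties

Jaccard similarity

\[ sim_{jaccard}(x_i, x_j) = \frac{|N(x_i) \cap N(x_j)|}{|N(x_i) \cup N(x_j)|} \]

where $N(x)$ denotes the neighborhood of $x$ in the graph.

Content-based prediction

\[ \tilde{r}(u, x_j) = \frac{\sum_{x_i \in N \cap profile(u)} r(u, x_i) \cdot sim(x_i, x_j)}{\sum_{x_i \in N \cap profile(u)} sim(x_i, x_j)} \]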
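The Jaccard-based prediction can be sketched on toy data. Everything below is hypothetical: the item identifiers, the neighborhood sets (standing in for the resources each item is directly linked to in the knowledge graph) and the ratings. For simplicity the sum runs over the whole user profile rather than over a restricted neighbor set.

```python
def jaccard(a, b):
    """Jaccard similarity between two neighborhood sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def predict(profile, neighborhoods, target):
    """Similarity-weighted average of the ratings in the user profile."""
    sims = {x: jaccard(neighborhoods[x], neighborhoods[target]) for x in profile}
    norm = sum(sims.values())
    if norm == 0:
        return None  # the target shares nothing with the profile
    return sum(r * sims[x] for x, r in profile.items()) / norm


# Hypothetical direct neighborhoods of three items.
neighborhoods = {
    "x1": {"a", "b", "c"},
    "x2": {"d", "e", "f"},
    "x3": {"b", "c", "d"},
}
profile = {"x1": 5, "x2": 1}  # hypothetical ratings by one user
prediction = predict(profile, neighborhoods, "x3")
```

Here x3 shares two of four resources with x1 (similarity 0.5) and one of five with x2 (similarity 0.2), so the prediction leans towards the rating given to x1.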

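Vector Space Model for LOD

[Figure: the movies Righteous Kill and Heat linked through the properties starring, director, subject/broader and genre to resources such as Robert De Niro, Jon Avnet, Serial killer films, Drama, Al Pacino, Brian Dennehy, Heist films and Crime films. The starring sub-graph connects Righteous Kill and Heat to Robert De Niro, Al Pacino and Brian Dennehy.]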

Vector Space Model for LOD

STARRING              Al Pacino (v1)  Robert De Niro (v2)  Brian Dennehy (v3)
Righteous Kill (m1)   X               X                    X
Heat (m2)             X               X

                      v1          v2          v3
Righteous Kill (x1)   w_{v1,x1}   w_{v2,x1}   w_{v3,x1}
Heat (x2)             w_{v1,x2}   w_{v2,x2}   0

\[ w_{AlPacino,Heat} = tf_{AlPacino,Heat} \cdot idf_{AlPacino} \]

with $tf \in \{0, 1\}$

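Vector Space Model for LOD

\[ sim_{starring}(x_i, x_j) = \frac{w_{v_1,x_i} \cdot w_{v_1,x_j} + w_{v_2,x_i} \cdot w_{v_2,x_j} + w_{v_3,x_i} \cdot w_{v_3,x_j}}{\sqrt{w_{v_1,x_i}^2 + w_{v_2,x_i}^2 + w_{v_3,x_i}^2} \cdot \sqrt{w_{v_1,x_j}^2 + w_{v_2,x_j}^2 + w_{v_3,x_j}^2}} \]

\[ sim(x_i, x_j) = \alpha_{starring} \cdot sim_{starring}(x_i, x_j) + \alpha_{director} \cdot sim_{director}(x_i, x_j) + \alpha_{subject} \cdot sim_{subject}(x_i, x_j) + \ldots \]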
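The per-property similarity and its linear combination can be sketched as follows. The starring sets for Righteous Kill and Heat follow the table in the slides; the third movie and its actor are hypothetical, added only so that the idf values are not all zero. The idf definition (natural log of catalog size over document frequency) and the `alphas` weights are assumptions.

```python
from math import log, sqrt

starring = {
    "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
    "Heat": {"Al Pacino", "Robert De Niro"},
    "Hypothetical Movie": {"Hypothetical Actor"},  # not from the slides
}


def idf(value, prop):
    """Inverse document frequency of a property value across the catalog."""
    n = sum(1 for values in prop.values() if value in values)
    return log(len(prop) / n)


def vector(item, prop):
    """tf-idf vector of an item for one property (tf is 1 for present values)."""
    return {v: idf(v, prop) for v in prop[item]}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * b[v] for v, w in a.items() if v in b)
    den = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return num / den if den else 0.0


def property_similarity(xi, xj, prop):
    """sim_p(x_i, x_j) for a single property p."""
    return cosine(vector(xi, prop), vector(xj, prop))


def combined_similarity(sims, alphas):
    """Linear combination of per-property similarities."""
    return sum(alphas[p] * s for p, s in sims.items())


sim_starring = property_similarity("Righteous Kill", "Heat", starring)
total = combined_similarity({"starring": 0.5, "director": 1.0},
                            {"starring": 0.5, "director": 0.5})
```

With only two movies in the catalog, the shared actors would get idf 0 and the similarity would collapse to zero, which is why the sketch adds a third item.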

VSM Content-based Recommender

We predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities.

\[ \tilde{r}(u, x_j) = \frac{\sum_{x_i \in profile(u)} r(u, x_i) \cdot \frac{\sum_{p \in P} \alpha_p \cdot sim_p(x_i, x_j)}{|P|}}{|profile(u)|} \]

If this similarity is greater than or equal to 0, we suggest the movie $x_j$ to the user $u$.

Notes on the formula: $P$ is the set of selected properties; learning the $\alpha_p$ weights moves the approach from heuristic-based to model-based.

Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS), 2012 (Best Paper Award)

Property subset evaluation

The subject+broader solution is better than only subject or subject+more broaders.

The best solution is achieved with subject+broader+genres.

Too many broaders introduce noise.

[Figures: property subset evaluation, evaluation against other content-based approaches, and evaluation against other approaches, all under the rated test items protocol.]

Property paths

Path-based features

Analysis of complex relations between the user preferences and the target item

T. Di Noia et al., SPRank: Semantic Path-based Ranking for Top-N Recommendations using Linked Open Data. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/DOTD16/

Data model

Implicit Feedback Matrix $\hat{S}$ =

      i1  i2  i3  i4
u1    1   1   0   0
u2    1   0   1   0
u3    0   1   1   0
u4    0   1   0   1

[Figure: Knowledge Graph]

Path-based features

Path: acyclic sequence of relations $(s, \ldots, r_l, \ldots, r_L)$

Example: u3 s i2 p2 e1 p1 i1 → (s, p2, p1)

Frequency of the j-th path in the sub-graph related to u and x:

\[ w_{ux}(j) = \frac{\#path_{ux}(j)}{\sum_j \#path_{ux}(j)} \]

• The more paths there are, the more relevant the item.
• Different paths have different meanings.
• Not all types of paths are relevant.

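Problem formulation

Feature vector: $w_{ux} \in \mathbb{R}^D$

Set of relevant items for u: $X_u^+ = \{x \in X \mid \hat{s}_{ux} = 1\}$

Set of irrelevant items for u: $X_u^- = \{x \in X \mid \hat{s}_{ux} = 0\}$

Sample of irrelevant items for u: $X_u^{-*} \subseteq X_u^-$

Training Set: $TR = \bigcup_u \{\langle w_{ux}, \hat{s}_{ux} \rangle \mid x \in (X_u^+ \cup X_u^{-*})\}$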

[Figure sequence: a graph with users u1–u4, items x1–x4 and entities e1–e5, built up step by step to count the paths between u3 and x1.]

Path-based features: computing $w_{u_3 x_1}$

path(1) (s, s, s): 2
path(2) (s, p2, p1): 2
path(3) (s, p2, p3, p1): 1

\[ w_{u_3 x_1}(1) = \frac{2}{5} \qquad w_{u_3 x_1}(2) = \frac{2}{5} \qquad w_{u_3 x_1}(3) = \frac{1}{5} \]
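The normalization in the worked example for u3 and x1 amounts to dividing each path count by the total number of paths. A minimal sketch, where the helper name `path_weights` is hypothetical and the counts are the ones shown in the example:

```python
def path_weights(path_counts):
    """Normalize path counts between a user and an item into feature weights."""
    total = sum(path_counts.values())
    if total == 0:
        return {path: 0.0 for path in path_counts}
    return {path: count / total for path, count in path_counts.items()}


# Path counts between u3 and x1, as in the worked example.
counts = {
    ("s", "s", "s"): 2,
    ("s", "p2", "p1"): 2,
    ("s", "p2", "p3", "p1"): 1,
}
weights = path_weights(counts)
```

Each distinct path type becomes one dimension of the feature vector, so two user-item pairs are comparable as long as the same ordering of path types is used.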

Evaluation of different ranking functions

[Figure: recall@5 as a function of user profile size (given 5, 10, 20, 30, 50, All) on Movielens, comparing BagBoo, GBRT and Sum.]

[Figure: recall@5 as a function of user profile size (given 5, 10, 20, All) on Last.fm, comparing BagBoo, GBRT and Sum.]

Comparative approaches

• BPRMF, Bayesian Personalized Ranking for Matrix Factorization
• BPRLin, Linear Model optimized for BPR (Hybrid alg.)
• SLIM, Sparse Linear Methods for Top-N Recommender Systems
• SMRMF, Soft Margin Ranking Matrix Factorization

MyMediaLite

Comparison with other approaches

[Figure: precision@5 as a function of user profile size (given 5, 10, 20, 30, 50, All) on Movielens, comparing SPrank, BPRMF, SLIM, BPRLin and SMRMF.]

[Figure: precision@5 as a function of user profile size (given 5, 10, 20, All) on Last.fm, comparing SPrank, BPRMF, SLIM, BPRLin and SMRMF.]

Neighborhoods

Graph-based Item Representation

[Figure sequence: The Godfather and American Gangster linked via subject to categories such as Mafia_films, Gangster_films, Films_about_organized_crime_in_the_United_States, Best_Picture_Academy_Award_winners, Best_Thriller_Empire_Award_winners and Films_shot_in_New_York_City; the categories are in turn linked via broader to Films_about_organized_crime, Films_about_organized_crime_by_country and Awards_for_best_film.]

V. C. Ostuni et al., Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 2016. http://sisinflab.poliba.it/publications/2016/OODSD16/

Exploit entity descriptions

h-hop Item Neighborhood Graph

[Figure: the neighborhood graph of The Godfather, reaching Mafia_films, Gangster_films, Best_Picture_Academy_Award_winners and Films_shot_in_New_York_City via subject, and Films_about_organized_crime and Awards_for_best_film via broader.]

Kernel Methods

Work by embedding data in a vector space and looking for linear patterns in such space

\[ x \rightarrow \phi(x) \]

[Figure: the map $\phi$ from the input space to the feature space]

[Kernel Methods for General Pattern Analysis. Nello Cristianini. http://www.kernel-methods.net/tutorials/KMtalk.pdf]

We can work in the new space F by specifying an inner product function between points in it

\[ k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle \]

h-hop Item Entity-based Neighborhood Graph Kernel

\[ k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle \]

Explicit computation of the feature map

\[ \phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \ldots, w_{x_i,e_m}, \ldots, w_{x_i,e_t}) \]

where $w_{x_i,e_m}$ is the importance of the entity $e_m$ in the neighborhood graph for the item $x_i$:

\[ w_{x_i,e_m} = \sum_{l=1}^{h} \alpha_l \cdot c_{\hat{P}^l(x_i), e_m} \]

• $c_{\hat{P}^l(x_i), e_m}$: number of edges involving $e_m$ at $l$ hops from $x_i$, a.k.a. frequency of the entity in the item neighborhood graph
• $\alpha_l$: factor taking into account at which hop the entity appears

Weights computation example

[Figure: item i with 1-hop neighbors e1 and e2 and 2-hop neighbors e4 and e5, connected through properties p2 and p3; h = 2.]

\[ c_{\hat{P}^1(x_i), e_1} = 2 \qquad c_{\hat{P}^1(x_i), e_2} = 1 \qquad c_{\hat{P}^2(x_i), e_4} = 1 \qquad c_{\hat{P}^2(x_i), e_5} = 2 \]

An entity can be informative about the item even if it is not directly related to it.
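The weights in this example can be turned into an explicit feature vector and compared with a dot product. In the sketch below the edge counts for the first item are the ones from the example; the decay factors `alpha` and the second item are hypothetical values chosen only for illustration.

```python
def entity_weights(counts_by_hop, alpha):
    """w[e] = sum over hops l of alpha[l] * (number of edges involving e at hop l)."""
    weights = {}
    for hop, counts in counts_by_hop.items():
        for entity, count in counts.items():
            weights[entity] = weights.get(entity, 0.0) + alpha[hop] * count
    return weights


def kernel(wa, wb):
    """Dot product of two explicit feature maps stored as sparse dicts."""
    return sum(w * wb[e] for e, w in wa.items() if e in wb)


# Edge counts from the example (h = 2).
counts_i = {1: {"e1": 2, "e2": 1}, 2: {"e4": 1, "e5": 2}}
# Hypothetical decay factors: entities farther from the item weigh less.
alpha = {1: 1.0, 2: 0.5}
# Hypothetical second item sharing e1 and e5 with the first one.
counts_j = {1: {"e1": 1}, 2: {"e5": 1}}

w_i = entity_weights(counts_i, alpha)
w_j = entity_weights(counts_j, alpha)
k_ij = kernel(w_i, w_j)
```

Because the feature map is computed explicitly, the resulting vectors can be fed to any standard learner, such as the per-user SVM regression mentioned in the experimental settings.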

Experimental Settings

• Trained an SVM Regression model for each user
• Accuracy Evaluation: Precision, Recall
• Novelty Evaluation: Entropy-based Novelty (All Items protocol) [the lower the better]

Comparative approaches

• NB: 1-hop item neigh. + Naive Bayes classifier
• VSM: 1-hop item neigh. Vector Space Model (tf-idf) + SVM regr
• WK: 2-hop item neigh. Walk-based kernel + SVM regr

Comparison with other approaches (i)

[Figure: Prec@10 for the 20/80, 40/60 and 80/20 splits, comparing NK-bestPrec, NK-bestEntr, NB, VSM and WK. Rated test items protocol.]

Comparison with other approaches (ii)

[Figure: EBN@10 for the 20/80, 40/60 and 80/20 splits, comparing NK-bestPrec, NK-bestEntr, NB, VSM and WK.]

Neighborhoods (path-based)

The FreeSound case study

Vito Claudio Ostuni, Sergio Oramas, Tommaso Di Noia, Xavier Serra, Eugenio Di Sciascio. A Semantic Hybrid Approach for Sound Recommendation. 24th World Wide Web Conference, 2015

FreeSound Knowledge Graph

Item textual descriptions enrichment: Entity Linking tools can be used to enrich item textual descriptions with LOD

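h-hop Item Node-Based Neighborhood Graph Kernel

\[ k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle \]

Explicit computation of the feature map

\[ \phi_{G^h}(x_i) = (w_{x_i,p^*_1}, \ldots, w_{x_i,p^*_m}, \ldots, w_{x_i,p^*_t}) \]

\[ w_{x_i,p^*_m} = \frac{\#p^*_m(x_i)}{p_m - p^*_m} \]

• Numerator: number of sequences and subsequences of nodes from $x_i$ to $e_m$
• Denominator: normalization factor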

Hybrid Recommendation via Feature Combination

The hybridization is based on the combination of different data sources

Final approach: collaborative + LOD + textual description + tags

Item Feature Vector

u1 u2 u3 … | entity1 entity2 … | keyw1 keyw2 … | tag1 …

• u1, u2, u3, …: users who rated the item
• entity1, entity2, …: entities from the knowledge graph (explicit feature mapping)
• keyw1, keyw2, …: keywords extracted from the textual description
• tag1, …: tags associated to the item

[Figures: results under the All items protocol for Accuracy, Long Tail and Aggregate Diversity.]

Implementation

• LODreclib – a Java library to build a LOD based recommender system
https://github.com/sisinflab/lodreclib

• Cinemappy (currently for iOS only) – a context-aware mobile recommender system
https://itunes.apple.com/it/app/cinemappy/id681762350?mt=8

V. C. Ostuni et al., Mobile Movie Recommendations with Linked Data. CD-ARES 2013: 400-415

Dataset selection

Select the domain(s) of your RS

SELECT (COUNT(?i) AS ?num) ?c WHERE {
  ?i a ?c .
  FILTER(regex(str(?c), "^http://dbpedia.org/ontology")) .
}
GROUP BY ?c
ORDER BY DESC(?num)

Open issues

• Generalize to graph pattern extraction to represent features
• Automatically select the triples related to the domain of interest
• Automatically select meaningful properties to represent items
• Analysis with respect to «knowledge coverage» of the dataset
– What is the best approach?
• Cross-domain recommendation
• More graph-based similarity/relatedness metrics

Does the LOD dataset selection matter?

Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study in the music domain. The 14th International Semantic Web Conference, ISWC 2015

Conclusions

• Linked Open Data to enrich the content descriptions of items
• Exploit different characteristics of the semantic network to represent/learn features
• Improved accuracy
• Improved novelty
• Improved Aggregate Diversity
• Entity linking for a better exploitation of text-based data
• Select the right approach, dataset, set of properties to build your RS

Not covered here

• User profile
• Preferences
• Context-aware
• Knowledge-based approaches
• Cross-domain
• Feature selection
• …

Q&A

Tommaso Di Noia
tommaso.dinoia@poliba.it @TommasoDiNoia
