Meta Structure: Computing Relevance in Large Heterogeneous Information Networks Zhipeng Huang http://i.cs.hku.hk/~zphuang/

Meta Structure: Computing Relevance in Large Heterogeneous … · Nikos Mamoulis, Xiang Li, “Meta Structure: Computing Relevance in Large Heterogeneous Information Networks”, SIGKDD’


Meta Structure: Computing Relevance in Large Heterogeneous Information Networks

Zhipeng Huang
http://i.cs.hku.hk/~zphuang/


Introduction

•  Computing relevance on networks (e.g., social networks, co-author networks) supports many applications:
   –  similarity search
   –  recommendation

•  Many measures have been studied:
   –  Jaccard coefficient, common neighbors, shortest path
   –  PageRank, Personalized PageRank, SimRank, etc.


Heterogeneous Information Network

•  HIN: Directed graph with multiple node types and edge types.

[Figure: example bibliographic HIN. Object types: author (a1, a2, a3), paper (p1,1 to p3,2; labels KDD'07, KDD'15, VLDB'06, VLDB'15, AAAI'15, ICDM'12), venue (v1 to v4; labels KDD, VLDB, AAAI, ICDM), topic (t1 to t4; labels "mining", "efficient", "privacy", "social"). Edge types: write, publish, mention.]


Meta Path-Based Relevance Measures

•  Meta Path: a sequence of node and edge types.

•  Measures: PathCount [1], PathSim [1] and PCRW [2]
•  Source: Automatically generate meta paths (WWW'15)
•  Limitation: Meta paths fail to capture common nodes.
   –  Example: A researcher wants to search for authors who have published papers in the same venue and on the same topic as his own papers.
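To make the path-based measures concrete, here is a minimal Python sketch of PathCount and PathSim for the meta path author-paper-venue-paper-author. The toy graph, the dictionary names `WRITES` and `PUBLISHED_IN`, and the helper functions are all hypothetical; they are not taken from the slide's figure. PathSim is written in its usual symmetric-meta-path form from [1]; PCRW is omitted.

```python
from collections import Counter

# Hypothetical toy HIN (illustrative only, not the slide's figure):
# authors write papers, and each paper is published in one venue.
WRITES = {"a1": ["p1", "p2"], "a2": ["p3"], "a3": ["p4"]}
PUBLISHED_IN = {"p1": "KDD", "p2": "VLDB", "p3": "KDD", "p4": "AAAI"}


def papers_by_venue():
    """Invert PUBLISHED_IN: venue -> list of papers."""
    index = {}
    for paper, venue in PUBLISHED_IN.items():
        index.setdefault(venue, []).append(paper)
    return index


def authors_by_paper():
    """Invert WRITES: paper -> list of authors."""
    index = {}
    for author, papers in WRITES.items():
        for paper in papers:
            index.setdefault(paper, []).append(author)
    return index


def path_count(x):
    """Count instances of the meta path A-P-V-P-A that start at author x.

    Returns a Counter mapping each reachable author y to the number of
    path instances x -> paper -> venue -> paper -> y.
    """
    by_venue, by_paper = papers_by_venue(), authors_by_paper()
    counts = Counter()
    for p in WRITES[x]:
        for q in by_venue[PUBLISHED_IN[p]]:
            for y in by_paper[q]:
                counts[y] += 1
    return counts


def path_sim(x, y):
    """PathSim for a symmetric meta path: 2*|x~>y| / (|x~>x| + |y~>y|).

    Every author in the toy data has at least one path back to itself,
    so the denominator is never zero here.
    """
    return 2 * path_count(x)[y] / (path_count(x)[x] + path_count(y)[y])


print(path_count("a1"))      # path counts from a1 to every reachable author
print(path_sim("a1", "a2"))  # normalized similarity of a1 and a2
```

PathSim divides by the self-path counts so that a very prolific author does not dominate the ranking purely through volume.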


Linear Combination

•  R(a1, a2) = R(a1, a2 | P1) + R(a1, a2 | P2) = 1 + 1 = 2
•  R(a2, a3) = R(a2, a3 | P1) + R(a2, a3 | P2) = 1 + 1 = 2
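The arithmetic above can be reproduced with a small sketch. It assumes that P1 is the venue path (author-paper-venue-paper-author) and P2 is the topic path (author-paper-topic-paper-author), and it uses a hypothetical toy graph in which every paper has exactly one venue and one topic. The edges are invented so that the scores match the slide; they are not read off the figure.

```python
# Hypothetical toy data (illustrative only; not read off the slide's figure).
WRITES = {"a1": ["p11", "p12"], "a2": ["p21", "p22"], "a3": ["p31", "p32"]}
VENUE = {"p11": "KDD", "p12": "VLDB", "p21": "KDD",
         "p22": "AAAI", "p31": "AAAI", "p32": "ICDM"}
TOPIC = {"p11": "mining", "p12": "efficient", "p21": "mining",
         "p22": "privacy", "p31": "social", "p32": "privacy"}


def path_count(x, y, attribute):
    """Count paper pairs (p by x, q by y) that share the given attribute.

    Because each paper has exactly one venue and one topic in this toy
    data, this equals PathCount under A-P-V-P-A (attribute=VENUE) or
    A-P-T-P-A (attribute=TOPIC).
    """
    return sum(1 for p in WRITES[x] for q in WRITES[y]
               if attribute[p] == attribute[q])


def linear_combination(x, y, weights=(1.0, 1.0)):
    """Weighted sum of the two meta-path scores (weights are assumptions)."""
    w1, w2 = weights
    return w1 * path_count(x, y, VENUE) + w2 * path_count(x, y, TOPIC)


print(linear_combination("a1", "a2"))  # shared venue and topic via the same paper pair
print(linear_combination("a2", "a3"))  # shared venue and topic via different papers
```

In this toy graph both pairs receive the same score, although only a1 and a2 have a pair of papers that share the venue and the topic at the same time. This is the limitation the previous slide describes.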

[Figure: the example bibliographic HIN (authors a1, a2, a3; papers; venues; topics), repeated from the "Heterogeneous Information Network" slide.]


Meta Structure

•  A powerful extension of the meta path: a directed acyclic graph (DAG).

•  More powerful.
   –  Contains more information than a meta path.
   –  Can express more semantic meaning.

•  Challenges:
   –  How to define measures based on a meta structure?
   –  Higher complexity leads to high computational cost.
   –  How to derive a meta structure? (Not yet well studied)


Relevance Measures

•  StructCount: extension of PathCount

   $\mathrm{StructCount}(x_0, y_0 \mid S) = \mathrm{GraphIns}(x_0, y_0 \mid S)$

•  Structure Constrained Random Walk

•  Biased Structure Constrained Random Walk, a combination of the previous two measures.
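A minimal sketch of StructCount follows. It assumes that GraphIns(x0, y0 | S) denotes the number of instances of S between x0 and y0, and that S is the structure in which two papers must share a venue and a topic simultaneously (author, paper, {venue, topic}, paper, author). The toy graph and all names are hypothetical.

```python
# Hypothetical toy data (illustrative only; not read off the slide's figure).
WRITES = {"a1": ["p11", "p12"], "a2": ["p21", "p22"], "a3": ["p31", "p32"]}
VENUE = {"p11": "KDD", "p12": "VLDB", "p21": "KDD",
         "p22": "AAAI", "p31": "AAAI", "p32": "ICDM"}
TOPICS = {"p11": {"mining"}, "p12": {"efficient"}, "p21": {"mining"},
          "p22": {"privacy"}, "p31": {"social"}, "p32": {"privacy"}}


def struct_count(x, y):
    """Count instances of S = A - P - {V, T} - P - A between x and y.

    An instance is a tuple (p, v, t, q): p written by x, q written by y,
    both published in venue v, and both mentioning topic t.
    """
    total = 0
    for p in WRITES[x]:
        for q in WRITES[y]:
            if VENUE[p] == VENUE[q]:
                # One instance per topic the two papers have in common.
                total += len(TOPICS[p] & TOPICS[q])
    return total


print(struct_count("a1", "a2"))  # p11 and p21 share both KDD and "mining"
print(struct_count("a2", "a3"))  # no single paper pair shares venue and topic
```

Unlike a sum over separate meta paths, the count requires the venue and topic constraints to hold for the same pair of papers, so the two author pairs are no longer tied in this toy graph.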


Recursive Tree

•  To calculate the BSCSE relevance of a2 and a1:

[Figure: recursive tree of expansions over the example HIN; each tree node is annotated with a weight (values such as 1.0, 0.5, 0.25 and 0.0 appear on the original slide).]
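The recursion can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the biased measure divides the summed weights of the children by the number of children raised to a bias parameter `alpha`, so that `alpha = 0` counts instances and `alpha = 1` expands uniformly at random. The toy graph, the layer encoding and the function names are hypothetical.

```python
# Hypothetical toy data (illustrative only; not read off the slide's figure).
WRITES = {"a1": ["p11", "p12"], "a2": ["p21", "p22"], "a3": ["p31", "p32"]}
VENUE = {"p11": "KDD", "p12": "VLDB", "p21": "KDD",
         "p22": "AAAI", "p31": "AAAI", "p32": "ICDM"}
TOPICS = {"p11": {"mining"}, "p12": {"efficient"}, "p21": {"mining"},
          "p22": {"privacy"}, "p31": {"social"}, "p32": {"privacy"}}


def successors(state, layer):
    """Possible expansions of a partial instance from `layer` to `layer + 1`.

    Layers of the assumed structure: 0 author, 1 paper, 2 (venue, topic),
    3 paper, 4 author.
    """
    if layer == 0:   # author -> papers the author wrote
        return list(WRITES[state])
    if layer == 1:   # paper -> (venue, topic) pairs of that paper
        return [(VENUE[state], t) for t in sorted(TOPICS[state])]
    if layer == 2:   # (venue, topic) -> papers matching both
        v, t = state
        return [q for q in VENUE if VENUE[q] == v and t in TOPICS[q]]
    if layer == 3:   # paper -> its authors
        return [a for a, papers in WRITES.items() if state in papers]
    raise ValueError(layer)


def bscse(state, target, alpha, layer=0, last_layer=4):
    """Recursive weight of `state` at `layer` with respect to `target`."""
    if layer == last_layer:
        return 1.0 if state == target else 0.0
    children = successors(state, layer)
    if not children:
        return 0.0
    total = sum(bscse(c, target, alpha, layer + 1, last_layer)
                for c in children)
    return total / (len(children) ** alpha)


for alpha in (0, 0.5, 1):
    print(alpha, bscse("a2", "a1", alpha))
```

Each recursive call corresponds to one node of the tree, and the value returned at the root is the relevance score. Many subtrees repeat across queries, which is what makes caching intermediate results attractive.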


i-LTable

•  Index the probability distribution starting from the i-th level of a meta structure.

[Figure: the example bibliographic HIN (authors a1, a2, a3; papers; venues; topics), repeated from the "Heterogeneous Information Network" slide.]
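One way to picture such an index is sketched below. The slide does not describe the table's layout, so this is only an assumption: for i = 2, the table maps every (venue, topic) state to the probability of ending at each author under uniform expansion, and a query expands only the first two levels before looking the rest up. The toy graph and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (illustrative only; not read off the slide's figure).
WRITES = {"a1": ["p11", "p12"], "a2": ["p21", "p22"], "a3": ["p31", "p32"]}
VENUE = {"p11": "KDD", "p12": "VLDB", "p21": "KDD",
         "p22": "AAAI", "p31": "AAAI", "p32": "ICDM"}
TOPICS = {"p11": {"mining"}, "p12": {"efficient"}, "p21": {"mining"},
          "p22": {"privacy"}, "p31": {"social"}, "p32": {"privacy"}}


def authors_of(paper):
    """Authors who wrote the given paper."""
    return [a for a, papers in WRITES.items() if paper in papers]


def build_l_table():
    """Precompute, for each level-2 state (venue, topic), the distribution
    over final authors when every expansion step is chosen uniformly."""
    papers_at = defaultdict(list)
    for q in VENUE:
        for t in TOPICS[q]:
            papers_at[(VENUE[q], t)].append(q)
    table = {}
    for state, papers in papers_at.items():
        dist = defaultdict(float)
        for q in papers:
            authors = authors_of(q)
            for a in authors:
                dist[a] += 1.0 / (len(papers) * len(authors))
        table[state] = dict(dist)
    return table


def relevance_with_table(x, y, table):
    """Expand levels 0 to 2 from author x, then look up the stored
    distribution instead of expanding the remaining levels."""
    papers = WRITES[x]
    score = 0.0
    for p in papers:
        states = [(VENUE[p], t) for t in TOPICS[p]]
        for s in states:
            score += table[s].get(y, 0.0) / (len(papers) * len(states))
    return score


table = build_l_table()
print(table[("KDD", "mining")])
print(relevance_with_table("a2", "a1", table))
```

The table is built once and reused across queries, trading memory for shorter expansions at query time. Choosing a smaller i stores more of the computation but needs more space.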


Experiment: Entity Resolution

•  To find duplicated entities in YAGO
   –  e.g., Barack_Obama and Presidency_Of_Barack_Obama

•  Metric: AUC

Setting                         Measure      AUC
P1                              PathCount    0.1324
P1                              PCRW         0.0120
P1                              PathSim      0.0097
P2                              PathCount    0.0003
P2                              PCRW         0.0014
P2                              PathSim      0.0002
Linear Combination (optimal)    PathCount    0.2898
Linear Combination (optimal)    PCRW         0.2606
Linear Combination (optimal)    PathSim      0.2920
Meta Structure S                SC           0.5556
Meta Structure S                SCSE         0.5640
Meta Structure S                BSCSE*       0.5640


Relevance Ranking

•  We label the relevance of venues in DBLP_4_Area.

•  0 for not relevant, 1 for relevant and 2 for strongly relevant.

•  Labels consider both the scope and the level of the venues (e.g., SIGMOD and VLDB are labeled 2).

•  Metric: Normalized Discounted Cumulative Gain (NDCG)
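For readers unfamiliar with the metric, here is a minimal NDCG sketch over graded labels in {0, 1, 2}. It assumes the common formulation with a linear gain and a log2 rank discount; the slides do not say which variant was used, and the function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of labels listed in ranked order.

    Assumes a linear gain and a log2(rank + 1) discount, ranks from 1.
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Labels of the venues returned by a hypothetical ranking, best rank first.
print(ndcg([2, 1, 0]))  # already in ideal order
print(ndcg([0, 1, 2]))  # the strongly relevant venue is ranked last
```

A ranking that places strongly relevant venues first scores 1, and the score drops as they move down the list.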


Relevance Ranking

Setting                         Measure      nDCG
P1                              PathCount    0.9004
P1                              PCRW         0.9047
P1                              PathSim      0.9083
P2                              PathCount    0.8224
P2                              PCRW         0.8901
P2                              PathSim      0.8834
Linear Combination (optimal)    PathCount    0.9004
Linear Combination (optimal)    PCRW         0.9100
Linear Combination (optimal)    PathSim      0.9083
Meta Structure S                SC           0.9056
Meta Structure S                SCSE         0.9104
Meta Structure S                BSCSE*       0.9130


Reference

•  [1] Sun, Yizhou, et al. "PathSim: Meta path-based top-k similarity search in heterogeneous information networks." VLDB'11 (2011).

•  [2] Lao, Ni, and William W. Cohen. "Relational retrieval using a combination of path-constrained random walks." Machine Learning 81.1 (2010): 53-67.

•  [3] Meng, Changping, et al. "Discovering meta-paths in large heterogeneous information networks." WWW'15.

•  [4] Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. "Meta Structure: Computing Relevance in Large Heterogeneous Information Networks." SIGKDD'16.