How well does your Instance Matching system perform? Experimental evaluation with LANCE
Tzanina Saveta, Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki
Institute of Computer Science – FORTH, Greece
Axel-Cyrille Ngonga Ngomo
IFI/AKSW, University of Leipzig, Germany
10/31/16 ISWC 2016: How well does your Instance Matching system perform? Experimental evaluation with LANCE
Why Instance Matching?
Different sources contain different descriptions of the same real-world entity
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
Instance Matching for Linked Data
A set of RDF triples constitutes an RDF graph
Sparse data
Rich semantics expressed in terms of ontologies
Large number of sources to integrate
Value, structure and semantics heterogeneities
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
Benchmarking
Instance matching has led to the development of a number of matching techniques and tools
• How to compare those?
• How to assess their performance (efficiency and effectiveness)?
• How to “push” systems into becoming better?
• Benchmark your systems!
Instance Matching Benchmark Components
• Datasets
  – Source and target datasets that will be matched against each other to find the entities that refer to the same real-world object
• Ground truth / Gold standard / Reference alignment
  – The “correct answer sheet” used to judge the completeness and soundness of the results produced by the system under test (SUT)
• Organized into test cases, each addressing a different kind of instance matching requirement
• Metrics
  – The performance metric(s) that determine the systems’ efficiency and effectiveness
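To make the role of the gold standard concrete, here is a minimal sketch of how a system's output could be scored against it. It assumes that soundness and completeness are measured as precision and recall, which is the common choice but is not spelled out on the slide; the function name `evaluate` and the sample links are hypothetical.

```python
def evaluate(system_links, gold_links):
    """Return (precision, recall, f_measure) for a set of predicted links.

    Each link is a (source_uri, target_uri) pair. Precision stands in for
    soundness and recall for completeness (an assumption of this sketch).
    """
    system_links, gold_links = set(system_links), set(gold_links)
    true_positives = len(system_links & gold_links)
    precision = true_positives / len(system_links) if system_links else 0.0
    recall = true_positives / len(gold_links) if gold_links else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical gold standard and system output.
gold = {("s:a", "t:a"), ("s:b", "t:b"), ("s:c", "t:c"), ("s:d", "t:d")}
found = {("s:a", "t:a"), ("s:b", "t:b"), ("s:c", "t:x")}

precision, recall, f_measure = evaluate(found, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f={f_measure:.3f}")
```

Two of the three reported links are correct and two of the four expected links are found, so the sketch reports a precision of 2/3 and a recall of 1/2.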
LANCE
• A novel instance matching benchmark generator
• Domain-independent
• Highly configurable and scalable
• Standard value-based and structure-based test cases
• Advanced semantics-aware test cases considering OWL 2 expressive constructs
• Rich weighted gold standard
• Additional metrics: similarity score metric
LANCE Architecture
[Figure: LANCE architecture diagram. Labeled components: Data Ingestion Module, RDF Repository, Initialization Module, Test Case Generator, Resource Generator, Resource Transformation Module, Weight Computation Module, RESCAL [NT12], MATCHER, SAMPLER. Labeled data flows: Source Data, Test Case Generation Parameters, SPARQL Queries (Schema Stats), SPARQL Queries (IR), Matched Instances, Target Data, Weighted Gold Standard.]
Test Cases
Test cases are built using a variety of transformations
• Value-based test cases
  – Transformations of values of datatype properties
• Structure-based test cases
  – Transformations of the structure of object and datatype properties
• Semantics-aware test cases
  – Transformations at the instance level considering the schema
• Simple and complex combinations of the first three categories
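As an illustration of a value-based test case, the sketch below applies one simple transformation, a swap of two adjacent characters, to a datatype property value and records a weight for the resulting match. This is not LANCE's implementation: the helper names are hypothetical, the instance is a plain dictionary rather than RDF, and the weight is computed with Python's `difflib.SequenceMatcher` as a stand-in for the benchmark's own weight computation.

```python
import random
from difflib import SequenceMatcher


def swap_adjacent_chars(value, rng):
    """Introduce a typo by swapping two adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def transform_instance(source, prop, rng):
    """Copy the source instance, transform one property value, and return
    the target instance together with a similarity weight in [0, 1]."""
    target = dict(source)
    target[prop] = swap_adjacent_chars(source[prop], rng)
    # Stand-in similarity; the real benchmark computes weights differently.
    weight = SequenceMatcher(None, source[prop], target[prop]).ratio()
    return target, weight


rng = random.Random(42)  # fixed seed so the sketch is reproducible
source = {"id": "source/1", "title": "Instance Matching"}
target, weight = transform_instance(source, "title", rng)
print(target["title"], round(weight, 3))
```

A structure-based transformation would follow the same pattern but change which properties the target instance has, rather than the value of one of them.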
LANCE Performance Metrics
• Average similarity score: average difficulty of the matched instances
  – Benchmark with a high average similarity score: matched instances are easier to find
• Standard deviation: spread of the similarity scores for the matched instances
  – Benchmark with a high standard deviation:
    • scores are spread out from the average
    • more heterogeneity among the matched instances
10/31/16 HOBBIT Plenary 2
Obtain a more fine-grained understanding of the IM system’s performance by comparing the average similarity score and standard deviation of the system and the benchmark
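The comparison described above can be sketched as follows, assuming the weighted gold standard is available as a mapping from matched pairs to similarity scores. The function name `similarity_profile` and the sample data are hypothetical, and the population standard deviation is used here as an assumption, since the slide does not say which variant applies.

```python
from statistics import mean, pstdev


def similarity_profile(weighted_matches):
    """Return (average similarity score, standard deviation) for a
    mapping of matched pairs to similarity scores."""
    scores = list(weighted_matches.values())
    return mean(scores), pstdev(scores)


# Hypothetical weighted gold standard: pair -> similarity score.
gold = {
    ("s:a", "t:a"): 0.9,
    ("s:b", "t:b"): 0.7,
    ("s:c", "t:c"): 0.8,
}

# True positives of a hypothetical system, with their gold-standard scores.
system_tp = {pair: gold[pair] for pair in [("s:a", "t:a"), ("s:c", "t:c")]}

bench_avg, bench_std = similarity_profile(gold)
sys_avg, sys_std = similarity_profile(system_tp)
print(f"benchmark: avg={bench_avg:.3f} std={bench_std:.3f}")
print(f"system:    avg={sys_avg:.3f} std={sys_std:.3f}")
```

In this toy data the system's average is higher than the benchmark's, which would suggest it found mainly the easier matches and missed the pair with the lowest score.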
Experiments
• Efficiency and effectiveness of IM systems using LANCE benchmarks
  – Systems:
    • LogMap Version 2.4 [JG11] (MORe Reasoner [RG13])
    • OtO [DP12]
    • LIMES (EAGLE IM algorithm [NL12])
  – Datasets
    • LDBC’s SPIMBENCH Generator (Semantic Publishing Benchmark)
    • UOBM
  – Matching Task
    • All 5 categories introduced previously
    • All instances were transformed
SPIMBENCH: Standard Metrics
• LogMap
  – Responds well in the value-based test cases
  – Reduced performance when semantics-aware test cases were also applied
SPIMBENCH: Standard Metrics
• OtO and EAGLE
  – Give good results for the value-based transformations
  – Reduced performance in the remaining categories
    • EAGLE is non-deterministic and uses unsupervised learning
UOBM: Standard Metrics
• LogMap
  1. Does not perform well in any of the categories
  2. Performance not affected by the dataset size
• OtO
  1. Performs better
  2. Reduced performance when increasing the dataset size
SPIMBENCH: Additional Metrics
Distribution of similarity scores for LANCE and true positive matches from IM systems for semantics-aware test cases in the case of the 10K triples dataset.
• LogMap can address difficult test cases
• EAGLE & OtO can address mostly value-based test cases
[Figure: log(# of mappings) versus similarity score (x-axis from 0.7 to 1) for OtO, EAGLE, LogMap and LANCE.]
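A distribution like the one in this figure could be computed along the following lines. This is only a sketch: the function name `score_histogram`, the bin width of 0.02 (read off the axis ticks) and the sample data are assumptions, not taken from the LANCE code.

```python
from collections import Counter


def score_histogram(scores, bin_width_pct=2):
    """Count scores per bin; bins are keyed by their lower edge in percent."""
    counts = Counter()
    for score in scores:
        pct = round(score * 100)  # integer percent avoids float drift
        counts[(pct // bin_width_pct) * bin_width_pct] += 1
    return dict(sorted(counts.items()))


# Hypothetical weighted gold standard: pair -> similarity score.
gold = {
    ("s:a", "t:a"): 0.71,
    ("s:b", "t:b"): 0.84,
    ("s:c", "t:c"): 0.85,
    ("s:d", "t:d"): 0.99,
}
# Hypothetical system output; the last link is a false positive.
system_links = {("s:c", "t:c"), ("s:d", "t:d"), ("s:e", "t:z")}

# Only true positives have a gold-standard score to plot.
tp_scores = [gold[pair] for pair in system_links if pair in gold]

benchmark_hist = score_histogram(gold.values())
system_hist = score_histogram(tp_scores)
print("benchmark:", benchmark_hist)
print("system:   ", system_hist)
```

Plotting both histograms on a logarithmic count axis gives the kind of view in the figure: bins where the benchmark has matches but the system has none mark the difficulty levels the system does not reach.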
Standard Deviation
UOBM: Additional Metrics
Distribution of similarity scores for LANCE and true positive matches from IM systems for structure-based test cases in the case of the 10K triples dataset.
• LogMap cannot handle well the change of URIs in the instances
[Figure: log(# of mappings) versus similarity (x-axis from 0.6 to 0.9) for OtO, LogMap and LANCE; a second chart on a 0 to 0.08 scale for OtO, LogMap and LANCE.]
Lessons Learned
• Different types of transformations affect an IM system’s performance
• The characteristics of the source datasets affect the behavior of IM systems
Questions?
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.
References
[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[RG13] A. A. Romero, B. C. Grau, et al. MORe: a Modular OWL Reasoner for Ontology Classification. In ORE, pages 61-67, 2013.
[DP12] E. Daskalaki and D. Plexousakis. OtO Matching System: A Multi-strategy Approach to Instance Matching. In CAiSE, 2012.
[NL12] A.-C. Ngonga Ngomo and K. Lyko. EAGLE: Efficient Active Learning of Link Specifications using Genetic Programming. In ESWC, 2012.