Mapping and RDF Dataset Quality Assessment
with RDFUnit SPARQL-based test cases applied to [R2]RML mappings
Test-driven Assessment of [R2]RML
Mappings to Improve Dataset Quality
http://rml.io • http://rdfunit.aksw.org
Anastasia Dimou1, Dimitris Kontokostas2, Markus Freudenberg2, Ruben Verborgh1,
Jens Lehmann2, Erik Mannens1, Sebastian Hellmann2, Rik Van de Walle1
…WHERE { ?resource %%P1%% ?c.
FILTER (DATATYPE(?c) != %%D1%%) }
<#Mapping>
rr:predicateObjectMap [
rr:predicate foaf:age; rr:objectMap [ rml:reference "Age" ] ].
[R2]RML Mapping
Name       Surname      Age
Anastasia  Dimou        12
Dimitris   Kontokostas  15
http://example.com/{Name}_{Surname}       foaf:Project   foaf:age "Age"   xsd:float
http://example.com/Anastasia_Dimou        foaf:Project   foaf:age "12"    xsd:float
http://example.com/Dimitris_Kontokostas   foaf:Project   foaf:age "15"    xsd:float
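The mapping and the generated resources above can be read as one template applied to every input row. The following is a minimal sketch in Python, assuming the two-row table shown earlier and plain tuples in place of a real RDF library. The `generate` function and the tuple layout are hypothetical and only illustrate the idea; they are not how an [R2]RML processor is implemented.

```python
# Illustrative sketch only: mimics the effect of the mapping on the sample
# rows; it is not an [R2]RML processor.
ROWS = [
    {"Name": "Anastasia", "Surname": "Dimou", "Age": "12"},
    {"Name": "Dimitris", "Surname": "Kontokostas", "Age": "15"},
]

SUBJECT_TEMPLATE = "http://example.com/{Name}_{Surname}"
AGE_DATATYPE = "xsd:float"  # datatype shown next to foaf:age in the diagram


def generate(rows):
    """Yield (subject, predicate, object, datatype) tuples for every row."""
    for row in rows:
        subject = SUBJECT_TEMPLATE.format(**row)
        yield (subject, "rdf:type", "foaf:Project", None)
        yield (subject, "foaf:age", row["Age"], AGE_DATATYPE)


triples = list(generate(ROWS))
for triple in triples:
    print(triple)
```

Every row produces the same typed `foaf:age` value, so a datatype chosen once in the mapping is repeated for each generated resource.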
RDF Dataset
for [R2]RML mappings • for RDF dataset
… WHERE { ?resource foaf:age ?c.
FILTER (DATATYPE(?c) != xsd:int) }
… WHERE {
?resource rr:predicateObjectMap ?poMap.
?poMap rr:predicate %%P1%%;
rr:objectMap ?objM.
?objM rr:datatype ?c.
FILTER (?c != %%D1%%) }
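The `%%P1%%` and `%%D1%%` placeholders are bound to a concrete property and datatype to turn a pattern into a test case. The sketch below shows this with plain string substitution in Python, which is an assumption made for illustration and not necessarily how RDFUnit instantiates patterns. The `instantiate` helper is hypothetical, and the elided head of each query ("…") is left out, as in the source.

```python
# Test-case patterns as shown on the poster (query head omitted, as in the source).
DATASET_PATTERN = (
    "WHERE { ?resource %%P1%% ?c.\n"
    "FILTER (DATATYPE(?c) != %%D1%%) }"
)
MAPPING_PATTERN = (
    "WHERE {\n"
    "?resource rr:predicateObjectMap ?poMap.\n"
    "?poMap rr:predicate %%P1%%;\n"
    "rr:objectMap ?objM.\n"
    "?objM rr:datatype ?c.\n"
    "FILTER (?c != %%D1%%) }"
)


def instantiate(pattern, bindings):
    """Replace every %%NAME%% placeholder with its bound value (hypothetical helper)."""
    for name, value in bindings.items():
        pattern = pattern.replace("%%" + name + "%%", value)
    return pattern


# The same bindings serve both the dataset and the mapping test case.
bindings = {"P1": "foaf:age", "D1": "xsd:int"}
dataset_test = instantiate(DATASET_PATTERN, bindings)
mapping_test = instantiate(MAPPING_PATTERN, bindings)

print(dataset_test)
print(mapping_test)
```

Using one set of bindings for both patterns is what allows the same expectation (here, `foaf:age` values typed as `xsd:int`) to be checked either against the generated triples or against the mapping that produces them.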
            size   time   #fail test cases   #violations
DBPedia EN  115K   11s    1                  160
DBPedia NL  53K    6s     1                  124
DBLP        368    12s    2                  8
Quality Assessment results for [R2]RML mappings
            size   time   #fail test cases   #violations
DBPedia EN  62M    16h    1,128              3.2M
DBPedia NL  21M    1.5h   683                815K
DBLP        12M    12h    7                  8.1M
Quality Assessment results for RDF dataset
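To put the two sets of timings side by side, the short Python sketch below converts the reported dataset times to seconds and divides them by the reported mapping times. It uses only the figures from the two tables; the `speedup` helper is illustrative, and the ratios are rough because the reported times are rounded.

```python
# Assessment times copied from the two result tables.
MAPPING_SECONDS = {"DBPedia EN": 11, "DBPedia NL": 6, "DBLP": 12}
DATASET_HOURS = {"DBPedia EN": 16, "DBPedia NL": 1.5, "DBLP": 12}


def speedup(name):
    """Ratio of dataset assessment time to mapping assessment time."""
    return DATASET_HOURS[name] * 3600 / MAPPING_SECONDS[name]


for name in MAPPING_SECONDS:
    print(f"{name}: mapping assessment is roughly {speedup(name):,.0f}x faster")
```

With the reported figures, the ratios come out at roughly 5,200x for DBPedia EN, 900x for DBPedia NL and 3,600x for DBLP.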
The number of errors grows
linearly with the number of iterations, and
geometrically when multiple references and returned values are involved
Extend the Quality Assessment from the RDF dataset
to the Mapping definitions that generate the dataset
The number of errors grows
linearly with the number of applicable Term Maps.
The time to execute the assessment is significantly reduced
The same violations appear repeatedly over distinct entities.
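The difference between the two growth behaviours can be illustrated with a toy model in Python. It assumes a single Term Map that assigns `xsd:float` to `foaf:age` while `xsd:int` is expected, as in the example test case. The names `EXPECTED`, `TERM_MAPS`, `mapping_violations` and `dataset_violations` are hypothetical and stand in for the real assessment.

```python
# Toy model: one faulty Term Map, checked once in the mapping or once per row
# in the generated dataset. All names here are illustrative assumptions.
EXPECTED = {"foaf:age": "xsd:int"}        # expected datatype per property
TERM_MAPS = [("foaf:age", "xsd:float")]   # (predicate, datatype) per object map


def mapping_violations(term_maps):
    """One violation per Term Map whose datatype differs from the expectation."""
    return [(p, d) for p, d in term_maps if p in EXPECTED and d != EXPECTED[p]]


def dataset_violations(term_maps, n_rows):
    """Each input row repeats every Term Map once, and so repeats its violation."""
    triples = [(f"row{i}", p, d) for i in range(n_rows) for p, d in term_maps]
    return [t for t in triples if t[1] in EXPECTED and t[2] != EXPECTED[t[1]]]


for n in (2, 100, 10_000):
    print(n, len(mapping_violations(TERM_MAPS)), len(dataset_violations(TERM_MAPS, n)))
```

In this model the mapping always reports one violation, while the dataset reports one per row, which is the repetition over distinct entities noted above.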
Quality Assessment (QA)
(-) is not incorporated into the publishing workflow
(-) is performed by third parties
Dataset Quality Assessment (DQA)
(-) results are not incorporated into the dataset
(-) adjustments are applied manually, rarely & not at the root
(-) adjustments are overwritten when
a new version of the original data is mapped & published
Mapping Quality Assessment (MQA)
(+) discovers violations before they are even generated
(+) specifies the origin of the violation
(+) structural adjustments can still be applied easily
(+) reduces the effort required to act upon QA results
(+) prevents the same violations from appearing repeatedly
within the dataset and over distinct entities
(+) prevents the generation of low-quality RDF datasets
(+) uniform Mapping and Dataset Quality Assessment
The violations derive from mapping definitions
that specify how the RDF dataset will be generated
1 Ghent University – iMinds – Multimedia Lab • 2 AKSW, University of Leipzig