The Role of Reasoning for RDF Validation
Thomas Bosch, GESIS - Leibniz Institute for the Social Sciences
Erman Acar, University of Mannheim
Andreas Nolle, Albstadt-Sigmaringen University
Kai Eckert, Stuttgart Media University
RDF Validation
• high data quality
• XML validation
• RDF Validation Workshop
• working groups
– W3C Data Shapes Working Group
– DCMI RDF Application Profiles Task Group
• existing constraint languages
(ShEx, OWL 2, DSP, ReSh, SPIN, SPARQL, …)
Constraint Types
http://purl.org/net/rdf-validation
• database of 81 requirements on RDF validation
• based on findings of WGs and case studies
• from case studies to solutions and back
• requirements correspond to constraint types
RDF Validator
http://purl.org/net/rdfval-demo
example: disjoint classes
what role does reasoning play in RDF validation?
why is reasoning beneficial for validation?
how to overcome the major shortcomings when validating?
(1) reasoning may resolve violations
Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)
→ Person(Mark-Twain)
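The effect can be sketched in a few lines of Python. This is only an illustration under simple assumptions: facts are stored as plain tuples, and the helper names apply_universal_restriction and untyped_authors are hypothetical, not taken from any RDF library or from the validator.

```python
# Class assertions are (class, individual); property assertions are (property, subject, object).
types = {("Book", "Huckleberry-Finn")}
triples = {("author", "Huckleberry-Finn", "Mark-Twain")}


def apply_universal_restriction(types, triples, cls, prop, filler):
    """Apply cls ⊑ ∀ prop.filler: every prop-value of a cls instance becomes a filler."""
    inferred = set(types)
    for p, s, o in triples:
        if p == prop and (cls, s) in types:
            inferred.add((filler, o))
    return inferred


def untyped_authors(types, triples):
    """Closed-world check: report authors that are not explicitly typed as Person."""
    return sorted(
        o for p, s, o in triples if p == "author" and ("Person", o) not in types
    )


# Without reasoning the author is reported; after reasoning the violation is gone.
before = untyped_authors(types, triples)
after = untyped_authors(
    apply_universal_restriction(types, triples, "Book", "author", "Person"), triples
)
```

Here before lists Mark-Twain as a violation, while after is empty because Person(Mark-Twain) has been inferred.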
(2) reasoning may cause violations
Book ⊑ Publication
Publication ⊑
∃ publisher.Publisher
Book(Huckleberry-Finn)
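A minimal sketch of this case, again with facts as plain tuples. The helpers materialize_superclasses and missing_publisher are hypothetical names, and the subclass hierarchy is assumed to be acyclic.

```python
types = {("Book", "Huckleberry-Finn")}
triples = set()  # the data contains no publisher statement
subclass_of = {"Book": "Publication"}


def materialize_superclasses(types, subclass_of):
    """Add (super, x) for every (sub, x), following the subclass chain upward."""
    inferred = set(types)
    for cls, ind in types:
        while cls in subclass_of:
            cls = subclass_of[cls]
            inferred.add((cls, ind))
    return inferred


def missing_publisher(types, triples):
    """Closed-world check of Publication ⊑ ∃ publisher.Publisher."""
    violations = []
    for cls, ind in sorted(types):
        if cls != "Publication":
            continue
        has_publisher = any(
            p == "publisher" and s == ind and ("Publisher", o) in types
            for p, s, o in triples
        )
        if not has_publisher:
            violations.append(ind)
    return violations


# Without reasoning nothing is typed as Publication, so nothing is checked.
before = missing_publisher(types, triples)
# After reasoning Huckleberry-Finn is a Publication and lacks a publisher.
after = missing_publisher(materialize_superclasses(types, subclass_of), triples)
```

The violation only surfaces once Publication(Huckleberry-Finn) has been inferred from Book ⊑ Publication.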
(3) reasoning solves redundancy
Publication ⊑
∃ publicationDate.xsd:date
Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication
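The sketch below shows the idea: the date constraint is stated once on Publication and reaches the subclasses through subsumption. The instance data and the names is_a, is_xsd_date and violations are made up for illustration, and date.fromisoformat is only an approximation of the xsd:date lexical space.

```python
from datetime import date

subclass_of = {
    "Book": "Publication",
    "Conference-Proceeding": "Publication",
    "Journal-Article": "Publication",
}
# The constraint is declared once, on the superclass only.
required_date_property = {"Publication": "publicationDate"}


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = subclass_of.get(cls)
    return False


def is_xsd_date(value):
    """Rough stand-in for xsd:date: accept ISO calendar dates such as 1884-12-10."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def violations(types, triples):
    """Instances covered by the constraint that lack a well-formed date value."""
    found = []
    for cls, ind in sorted(types):
        for target, prop in required_date_property.items():
            if is_a(cls, target) and not any(
                p == prop and s == ind and is_xsd_date(o) for p, s, o in triples
            ):
                found.append(ind)
    return found


# Illustrative data: one book with a date, one article without.
types = {("Book", "Huckleberry-Finn"), ("Journal-Article", "Article-1")}
triples = {("publicationDate", "Huckleberry-Finn", "1884-12-10")}
result = violations(types, triples)
```

Without subsumption the same constraint would have to be repeated for Book, Conference-Proceeding and Journal-Article.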
for which constraint types may reasoning be performed
prior to validation to enhance data quality?
constraint types with and without reasoning
• 𝑹: set of constraint types with reasoning
– 43.2%
– RQL: OWL 2 QL reasoning
– RDL: OWL 2 DL reasoning
– determine if reasoning should be performed on different levels
• 𝑹̄: set of constraint types without reasoning
– 56.8%
constraint types with reasoning
sub-properties
editor ⊑ creator
editor (A+Journal-Volume, A+Editor)
creator (A+Journal-Volume, A+Editor)
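As a small sketch, the sub-property axiom can be materialized before validation so that a check requiring a creator statement succeeds. The function name materialize_subproperties is hypothetical, and the sub-property hierarchy is assumed to be acyclic.

```python
subproperty_of = {"editor": "creator"}
triples = {("editor", "A+Journal-Volume", "A+Editor")}


def materialize_subproperties(triples, subproperty_of):
    """Add (super, s, o) for every (sub, s, o), following the chain upward."""
    inferred = set(triples)
    for p, s, o in triples:
        while p in subproperty_of:
            p = subproperty_of[p]
            inferred.add((p, s, o))
    return inferred


closed = materialize_subproperties(triples, subproperty_of)
has_creator = ("creator", "A+Journal-Volume", "A+Editor") in closed
```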
constraint types with reasoning
property domain
∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland,
Lewis-Carroll)
rdf:type(Alices-Adventures-In-Wonderland,
Publication)
constraint types without reasoning
literal pattern matching
# nine digits followed by a digit or X as the check character
:ISBN a rdfs:Datatype ;
    owl:equivalentClass [ a rdfs:Datatype ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions
            ( [ xsd:pattern "^\\d{9}[\\dX]$" ] ) ] .
Book ⊑ ∀ identifier.ISBN
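No inference is involved in this check; it is a plain test on literal values. A possible sketch with Python's re module follows. The character class is written as [\dX], since inside a character class a | would be matched literally. The identifier values and the helper invalid_identifiers are invented for illustration.

```python
import re

# Nine digits followed by a digit or the check character X.
ISBN_PATTERN = re.compile(r"^\d{9}[\dX]$")


def invalid_identifiers(types, triples):
    """Return (book, identifier) pairs whose identifier does not match the pattern."""
    return sorted(
        (s, o)
        for p, s, o in triples
        if p == "identifier" and ("Book", s) in types and not ISBN_PATTERN.fullmatch(o)
    )


# Illustrative data only.
types = {("Book", "Book-1"), ("Book", "Book-2")}
triples = {
    ("identifier", "Book-1", "012345678X"),
    ("identifier", "Book-2", "12345"),
}
result = invalid_identifiers(types, triples)
```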
constraint types without reasoning
allowed values
Book ≡ ∀ subject.
{Computer-Science, Librarianship}
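This check is again a direct lookup that needs no reasoning. A minimal sketch, with a hypothetical helper disallowed_subjects and invented instance data:

```python
ALLOWED_SUBJECTS = {"Computer-Science", "Librarianship"}


def disallowed_subjects(types, triples):
    """Return (book, subject) pairs whose subject is outside the allowed set."""
    return sorted(
        (s, o)
        for p, s, o in triples
        if p == "subject" and ("Book", s) in types and o not in ALLOWED_SUBJECTS
    )


# Illustrative data only.
types = {("Book", "Book-1"), ("Book", "Book-2")}
triples = {
    ("subject", "Book-1", "Computer-Science"),
    ("subject", "Book-2", "Gardening"),
}
result = disallowed_subjects(types, triples)
```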
how efficiently, in terms of runtime, is
validation performed with and without reasoning?
performance in worst case
• computational complexity
• mapping to description logics
performance in worst case
validation type           complexity class
𝑹̄ (without reasoning)     PSPACE-complete
RQL                       PTIME
RDL                       N2EXPTIME
PTIME ⊆ PSPACE ⊆ N2EXPTIME
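for which constraint types do validation results differ depending on
(1) whether the closed-world assumption (CWA) or the open-world
assumption (OWA) holds and (2) whether the unique name assumption (UNA)
or the non-unique name assumption (nUNA) holds?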
• reasoning and validation assume different semantics
– reasoning: OWA + nUNA
– validation: CWA + UNA
• different semantics lead to different validation results
• does the constraint type depend on the CWA?
• does the constraint type depend on the UNA?
semantics
CWA dependent constraint types
minimum qualified cardinality restrictions
Publication ⊑ ≥1 author.Person
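A closed-world check of this axiom can be sketched as follows. Under the OWA the missing author would not be an error, because an unnamed author could exist; under the CWA the absence of an author statement is a violation. The instance data and the helper names are illustrative assumptions, and counting distinct names as distinct authors additionally presumes the UNA.

```python
def qualified_values(ind, prop, filler, types, triples):
    """Values of prop for ind that are explicitly typed as filler."""
    return {
        o for p, s, o in triples if p == prop and s == ind and (filler, o) in types
    }


def min_cardinality_violations(cls, prop, filler, minimum, types, triples):
    """Closed-world check of cls ⊑ ≥minimum prop.filler."""
    return sorted(
        ind
        for c, ind in types
        if c == cls
        and len(qualified_values(ind, prop, filler, types, triples)) < minimum
    )


# Illustrative data: a publication with no author statement.
types = {("Publication", "Huckleberry-Finn")}
triples = set()
violations_cwa = min_cardinality_violations(
    "Publication", "author", "Person", 1, types, triples
)

# Once an author typed as Person is stated, the violation disappears.
types_fixed = types | {("Person", "Mark-Twain")}
triples_fixed = {("author", "Huckleberry-Finn", "Mark-Twain")}
violations_fixed = min_cardinality_violations(
    "Publication", "author", "Person", 1, types_fixed, triples_fixed
)
```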
CWA independent constraint types
disjoint classes
Book ⊓ JournalArticle ⊑ ⊥
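Disjointness is violated by asserted facts alone, and adding further facts cannot repair it, which is why the result does not depend on the CWA. A short sketch with invented instance data and a hypothetical helper disjointness_violations:

```python
def disjointness_violations(types, disjoint_pairs):
    """Individuals asserted to belong to both classes of some disjoint pair."""
    classes_of = {}
    for cls, ind in types:
        classes_of.setdefault(ind, set()).add(cls)
    return sorted(
        ind
        for ind, classes in classes_of.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )


# Illustrative data: one individual typed with both disjoint classes.
types = {
    ("Book", "Publication-1"),
    ("JournalArticle", "Publication-1"),
    ("Book", "Publication-2"),
}
result = disjointness_violations(types, [("Book", "JournalArticle")])
```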
UNA dependent constraint types
functional properties
funct(title)
title(The-Adventures-of-Huckleberry-Finn,
"The Adventures of Huckleberry Finn")
title(The-Adventures-of-Huckleberry-Finn,
"Die Abenteuer des Huckleberry Finn")
UNA independent constraint types
literal value comparison
birthDate(Albert-Einstein, "1955-04-18")
deathDate(Albert-Einstein, "1879-03-14")
birthDate(Albert_Einstein, "1879-03-14")
deathDate(Albert_Einstein, "1955-04-18")
owl:sameAs(Albert-Einstein, Albert_Einstein)
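The following sketch checks the data above both ways, assuming the intended constraint is that a birth date must lie before the death date (the constraint itself is not spelled out on the slide). The helpers merge_same_as and date_order_violations are hypothetical names.

```python
from datetime import date

triples = {
    ("birthDate", "Albert-Einstein", "1955-04-18"),
    ("deathDate", "Albert-Einstein", "1879-03-14"),
    ("birthDate", "Albert_Einstein", "1879-03-14"),
    ("deathDate", "Albert_Einstein", "1955-04-18"),
}
# owl:sameAs, expressed as a mapping to one canonical name.
same_as = {"Albert_Einstein": "Albert-Einstein"}


def merge_same_as(triples, same_as):
    """nUNA view: rewrite subjects so that equal individuals share one name."""
    return {(p, same_as.get(s, s), o) for p, s, o in triples}


def date_order_violations(triples):
    """Individuals with some birthDate that is not strictly before some deathDate."""
    births, deaths = {}, {}
    for p, s, o in triples:
        if p == "birthDate":
            births.setdefault(s, set()).add(date.fromisoformat(o))
        elif p == "deathDate":
            deaths.setdefault(s, set()).add(date.fromisoformat(o))
    return sorted(
        s
        for s in births
        if any(b >= d for b in births[s] for d in deaths.get(s, ()))
    )


under_una = date_order_violations(triples)
under_nuna = date_order_violations(merge_same_as(triples, same_as))
```

Both runs report the same individual, so the outcome does not change with the name assumption.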
evaluation results on semantics
• CWA dependent: 56.8%
• UNA dependent: 66.6%
Contributions
1. role reasoning plays for validation
2. how reasoning improves data quality
3. efficiency with and without reasoning
4. dependency on different semantics