2015.09. - The Role of Reasoning for RDF Validation (SEMANTiCS 2015)

The Role of Reasoning for RDF Validation

Thomas Bosch, Gesis - Leibniz Institute for the Social Sciences

Erman Acar, University of Mannheim

Andreas Nolle, Albstadt-Sigmaringen University

Kai Eckert, Stuttgart Media University

RDF Validation

• high data quality

• XML validation

• RDF Validation Workshop

• working groups

– W3C Data Shapes Working Group

– DCMI RDF Application Profiles Task Group

• existing constraint languages

(ShEx, OWL 2, DSP, ReSh, SPIN, SPARQL, …)

Constraint Types

http://purl.org/net/rdf-validation

• database of 81 requirements on RDF validation

• based on findings of WGs and case studies

• from case studies to solutions and back

• requirements correspond to constraint types

RDF Validator

http://purl.org/net/rdfval-demo

example: disjoint classes

what is the role reasoning plays for RDF Validation?

why is reasoning beneficial for validation?

how to overcome the major shortcomings when validating?

(1) reasoning may resolve violations

Book ⊑ ∀ author.Person

Book(Huckleberry-Finn)

author(Huckleberry-Finn, Mark-Twain)

→ Person(Mark-Twain)
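Case (1) can be sketched roughly as follows. This is only an illustration, assuming triples are plain Python tuples and that the axiom is read as Book ⊑ ∀ author.Person; the function names are made up for this sketch and do not belong to any validator.

```python
# Illustrative in-memory data taken from the slide: (subject, predicate, object).
data = {
    ("Huckleberry-Finn", "rdf:type", "Book"),
    ("Huckleberry-Finn", "author", "Mark-Twain"),
}

def types_of(triples, node):
    """Return all classes explicitly asserted for a node."""
    return {o for s, p, o in triples if s == node and p == "rdf:type"}

def author_violations(triples):
    """Validation: every author of a Book must be typed as Person."""
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "author"
        and "Book" in types_of(triples, s)
        and "Person" not in types_of(triples, o)
    )

def infer_author_types(triples):
    """Reasoning: for each author o of a Book, add Person(o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "author" and "Book" in types_of(triples, s):
            inferred.add((o, "rdf:type", "Person"))
    return inferred

before = author_violations(data)                      # violation without reasoning
after = author_violations(infer_author_types(data))   # resolved after reasoning
```

Running the check before and after the inference step shows the violation disappearing once Person(Mark-Twain) has been derived.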

(2) reasoning may cause violations

Book ⊑ Publication

Publication ⊑ ∃ publisher.Publisher

Book(Huckleberry-Finn)
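Case (2) can be sketched in the same style. The sketch assumes a single-parent class hierarchy stored in a dictionary and a closed-world check for the existential constraint; all names are illustrative.

```python
# Book ⊑ Publication, stored as child -> parent (assumed single inheritance).
subclass_of = {"Book": "Publication"}

data = {("Huckleberry-Finn", "rdf:type", "Book")}

def with_superclasses(triples, hierarchy):
    """Reasoning: propagate rdf:type along the subclass hierarchy."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        if p == "rdf:type" and o in hierarchy:
            new = (s, "rdf:type", hierarchy[o])
            if new not in inferred:
                inferred.add(new)
                frontier.append(new)
    return inferred

def missing_publisher(triples):
    """Validation: each Publication needs a publisher typed as Publisher."""
    publications = {s for s, p, o in triples if p == "rdf:type" and o == "Publication"}
    satisfied = {
        s
        for s, p, o in triples
        if p == "publisher" and (o, "rdf:type", "Publisher") in triples
    }
    return sorted(publications - satisfied)

without_reasoning = missing_publisher(data)
with_reasoning = missing_publisher(with_superclasses(data, subclass_of))
```

Without reasoning the book is not known to be a publication, so the constraint is never checked; after reasoning it is, and the missing publisher is reported.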

(3) reasoning solves redundancy

Publication ⊑ ∃ publicationDate.xsd:date

Book ⊑ Publication

Conference-Proceeding ⊑ Publication

Journal-Article ⊑ Publication

for which constraint types may reasoning be performed prior to validation to enhance data quality?

constraint types with and without reasoning

• 𝑹: set of constraint types with reasoning

– 43.2%

– RQL: OWL 2 QL reasoning

– RDL: OWL 2 DL reasoning

– determine if reasoning should be performed on different levels

• 𝑹̅: set of constraint types without reasoning

– 56.8%

constraint types with reasoning

sub-properties

editor ⊑ creator

editor(A+Journal-Volume, A+Editor)

creator(A+Journal-Volume, A+Editor)
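A minimal sketch of the sub-property inference, assuming each property has at most one direct super-property and triples are plain tuples; the helper name is hypothetical.

```python
# editor ⊑ creator, stored as sub-property -> super-property.
sub_property_of = {"editor": "creator"}

data = {("A+Journal-Volume", "editor", "A+Editor")}

def apply_sub_properties(triples, hierarchy):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = hierarchy.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            frontier.append((s, parent, o))  # follow longer chains as well
    return inferred

closure = apply_sub_properties(data, sub_property_of)
```

A constraint that requires a creator for each journal volume would then be satisfied by data that only states the editor.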

constraint types with reasoning

property domain

∃ author.⊤ ⊑ Publication

author(Alices-Adventures-In-Wonderland, Lewis-Carroll)

rdf:type(Alices-Adventures-In-Wonderland, Publication)

constraint types without reasoning

literal pattern matching

ISBN a rdfs:Datatype ;
    owl:equivalentClass [ a rdfs:Datatype ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions ( [ xsd:pattern "^\\d{9}[\\dX]$" ] ) ] .

Book ⊑ ∀ identifier.ISBN
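Literal pattern matching needs no reasoning; it is a direct check on each value. The sketch below assumes the intended pattern is nine digits followed by a digit or X, and uses made-up identifier values for illustration.

```python
import re

# Assumed reading of the slide's pattern: nine digits, then a digit or 'X'.
ISBN_PATTERN = re.compile(r"\d{9}[\dX]")

# Illustrative data; the identifier values are invented for this sketch.
data = {
    ("Book-A", "identifier", "123456789X"),
    ("Book-B", "identifier", "0123456789"),
    ("Book-C", "identifier", "12345"),
}

def invalid_identifiers(triples):
    """Return (subject, value) pairs whose identifier does not match the pattern."""
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "identifier" and ISBN_PATTERN.fullmatch(o) is None
    )

invalid = invalid_identifiers(data)
```

`fullmatch` is used so that the whole literal must match, which plays the role of the `^` and `$` anchors.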

constraint types without reasoning

allowed values

Book ≡ ∀ subject.{Computer-Science, Librarianship}

How efficient, in terms of runtime, is validation with and without reasoning?

performance in worst case

• computational complexity

• mapping to description logics

performance in worst case

validation type              complexity class

𝑹̅ (without reasoning)        PSPACE-complete

RQL                          PTIME

RDL                          N2EXPTIME

PTIME ⊆ PSPACE ⊆ N2EXPTIME

for which constraint types do validation results differ (1) if the CWA or the OWA and (2) if the UNA or the nUNA is assumed?

• reasoning and validation assume different semantics

– reasoning: OWA + nUNA

– validation: CWA + UNA

• different semantics lead to different validation results

• does the constraint type depend on the CWA?

• does the constraint type depend on the UNA?

semantics

CWA dependent constraint types

minimum qualified cardinality restrictions

Publication ⊑ ≥1 author.Person
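Why this constraint type depends on the CWA can be sketched as follows. The example data (a publication with no stated author) and the three status labels are assumptions made for illustration.

```python
# Illustrative data: a publication for which no author is stated.
data = {("Huckleberry-Finn", "rdf:type", "Publication")}

def check_min_author(triples, closed_world):
    """Check Publication ⊑ ≥1 author.Person under the chosen assumption."""
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    result = {}
    for pub in sorted(s for s, t in types if t == "Publication"):
        has_person_author = any(
            s == pub and p == "author" and (o, "Person") in types
            for s, p, o in triples
        )
        if has_person_author:
            result[pub] = "ok"
        else:
            # CWA: what is not stated is false, so the author is missing.
            # OWA: an author may exist but simply be unknown.
            result[pub] = "violation" if closed_world else "unknown"
    return result

cwa_result = check_min_author(data, closed_world=True)
owa_result = check_min_author(data, closed_world=False)
```

The same data yields a violation only when the closed-world assumption is made.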

CWA independent constraint types

disjoint classes

Book ⊓ JournalArticle ⊑ ⊥
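Disjointness, by contrast, is violated only by what is explicitly stated, so the outcome does not change with the CWA. A minimal sketch, with a hypothetical individual Some-Item typed with both classes:

```python
# Book ⊓ JournalArticle ⊑ ⊥, stored as a pair of disjoint classes.
disjoint_pairs = [("Book", "JournalArticle")]

# Illustrative data; Some-Item is a hypothetical individual.
data = {
    ("Some-Item", "rdf:type", "Book"),
    ("Some-Item", "rdf:type", "JournalArticle"),
    ("Huckleberry-Finn", "rdf:type", "Book"),
}

def disjointness_violations(triples, pairs):
    """Return individuals asserted to be members of two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        s
        for s, classes in types.items()
        for a, b in pairs
        if a in classes and b in classes
    )

violations = disjointness_violations(data, disjoint_pairs)
```

Adding further facts can never remove this violation, which is why missing information plays no role here.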

UNA dependent constraint types

functional properties

funct(title)

title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn")

title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")

UNA independent constraint types

literal value comparison

birthDate(Albert-Einstein, "1955-04-18")

deathDate(Albert-Einstein, "1879-03-14")

birthDate(Albert_Einstein, "1879-03-14")

deathDate(Albert_Einstein, "1955-04-18")

owl:sameAs(Albert-Einstein, Albert_Einstein)
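The example can be sketched as below, assuming the constraint is that a birth date must precede the death date. The owl:sameAs handling is deliberately simplified to direct pairs and is not a full equivalence-class computation.

```python
from datetime import date

data = [
    ("Albert-Einstein", "birthDate", "1955-04-18"),
    ("Albert-Einstein", "deathDate", "1879-03-14"),
    ("Albert_Einstein", "birthDate", "1879-03-14"),
    ("Albert_Einstein", "deathDate", "1955-04-18"),
    ("Albert-Einstein", "owl:sameAs", "Albert_Einstein"),
]

def date_violations(triples, merge_same_as):
    """Return individuals with some birth date not before some death date."""
    # Map each name to a representative; identity unless owl:sameAs is applied.
    rep = {}
    if merge_same_as:
        for s, p, o in triples:
            if p == "owl:sameAs":
                rep[o] = rep.get(s, s)
    births, deaths = {}, {}
    for s, p, o in triples:
        key = rep.get(s, s)
        if p == "birthDate":
            births.setdefault(key, set()).add(date.fromisoformat(o))
        elif p == "deathDate":
            deaths.setdefault(key, set()).add(date.fromisoformat(o))
    return sorted(
        k for k in births
        if any(b >= d for b in births[k] for d in deaths.get(k, ()))
    )

una_result = date_violations(data, merge_same_as=False)   # names kept distinct
nuna_result = date_violations(data, merge_same_as=True)   # sameAs names merged
```

In this sketch the violation is reported whether or not the two names are treated as the same individual.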

• CWA dependent: 56.8%

• UNA dependent: 66.6%

evaluation results on semantics

Contributions

1. role reasoning plays for validation

2. how reasoning improves data quality

3. efficiency with and without reasoning

4. dependency on different semantics
