Sem tech 2010_integrity_constraints


Using OWL in Closed World Applications

Evren Sirin, CTO
Clark & Parsia, LLC

evren@clarkparsia.com

Who are we?

• Clark & Parsia is a semantic software startup
– HQ in Washington, DC & office in Boston
• Provides software development and integration services
• Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers

http://clarkparsia.com/
Twitter: @candp

Some Applications

• Customer and product data
– Find which customer would be interested in buying a certain product
• System and component descriptions
– Configure components to build a desired system
• Workforce and employee data
– Locate employees with desired expertise
• Patient history and drug data
– Detect and prevent potentially harmful drug interactions

Common Theme

• There is data and lots of it!
• Adding semantics to the data helps a lot
– Sometimes simple taxonomies, but other times, complex ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
– Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc.

Data Validation

• Fundamental data management problem
– Verify data integrity and correctness
– Enforce validity of updates
• Relevant in many scenarios
– Storing data for stand-alone applications
– Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
– Harder to achieve as data semantics increase and/or more expressive integrity conditions are required

Disclaimer

• Data validity not important for every use case
– Invalid data may be fine for an application
– Invalidity may even be a requirement
• Focus of this talk is cases where data consistency and integrity are crucial

Roadmap for an App

• How to build one of these applications?
– Represent data as RDF triples
    • First step for accomplishing data integration and analysis
– Enrich data with more semantics (RDFS, OWL)
    • Infer implicit information from explicit assertions
– Ensure data validity
    • Detect errors in the data
– Do something cool with the data
    • Obviously...

Reasoning Example

• Input ontology
    # Every manager is an employee (schema)
    Manager subClassOf Employee
    # Person0853 is a manager (instance data)
    Person0853 type Manager
• Output inferences
    # Person0853 is an employee
    Person0853 type Employee
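The inference in the reasoning example can be sketched in a few lines of plain Java. This is only an illustration of the single rule "(C subClassOf D) and (x type C) gives (x type D)" applied until nothing new appears; it is not how Pellet or any OWL reasoner is implemented. The class name, the method names, and the representation of a triple as a three-element list are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SubclassInference {

    // A triple is a three-element list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Repeatedly apply the rule
    //   (C subClassOf D) and (x type C)  =>  (x type D)
    // until no new triple is produced.
    static Set<List<String>> infer(Set<List<String>> asserted) {
        Set<List<String>> closure = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over snapshots so the closure can grow inside the loops.
            for (List<String> schema : new ArrayList<>(closure)) {
                if (!schema.get(1).equals("subClassOf")) {
                    continue;
                }
                for (List<String> data : new ArrayList<>(closure)) {
                    if (data.get(1).equals("type") && data.get(2).equals(schema.get(0))) {
                        changed |= closure.add(triple(data.get(0), "type", schema.get(2)));
                    }
                }
            }
        }
        return closure;
    }

    // Runs the example from the slide and returns asserted plus inferred triples.
    static Set<List<String>> demo() {
        Set<List<String>> input = new LinkedHashSet<>();
        input.add(triple("Manager", "subClassOf", "Employee")); // schema
        input.add(triple("Person0853", "type", "Manager"));     // instance data
        return infer(input);
    }

    public static void main(String[] args) {
        for (List<String> t : demo()) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

The result contains the two asserted triples plus the inferred "Person0853 type Employee".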

Validating RDF Data

• Common misunderstanding
– RDFS/OWL is to RDF what XML Schema is to XML
– Describe integrity conditions in RDFS or OWL
    • Typing constraints - RDFS domain/range
    • Participation constraints - OWL someValuesFrom restrictions
    • Uniqueness constraints - OWL cardinality restrictions
– Use a reasoner to find inconsistencies
• Problem: Open World Assumption

Closed vs. Open World

• Two different views on truth:
– CWA: Any statement that is not known to be true is false
– OWA: A statement is false only if it is known to be false
• Used in different contexts
– Databases use CWA because (typically) they contain complete information
– Ontologies use OWA because (typically) they don't... that is, they contain incomplete information
• Data validation results significantly different when using CWA instead of OWA

Typing Constraint

• Only managers can supervise employees
• Input ontology
    o supervises domain Manager
    o Person085 supervises Person173

              OWA      CWA
Consistent    true     false

Reason (OWA): Infer that Person085 type Manager
Reason (CWA): Assume that Person085 type not Manager
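The two readings of the domain axiom can be contrasted with a small, self-contained Java sketch. It is a hand-written illustration under simplifying assumptions (triples as three-element lists, only explicitly listed triples count as known), not the formal IC semantics. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TypingConstraintDemo {

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // The data from the slide: one supervises assertion, no type assertion.
    static Set<List<String>> data() {
        Set<List<String>> d = new HashSet<>();
        d.add(triple("Person085", "supervises", "Person173"));
        return d;
    }

    // OWA reading of "property domain cls": the missing type is inferred,
    // so (subject type cls) is added for every triple using the property.
    static Set<List<String>> inferDomain(Set<List<String>> data, String property, String cls) {
        Set<List<String>> result = new HashSet<>(data);
        for (List<String> t : data) {
            if (t.get(1).equals(property)) {
                result.add(triple(t.get(0), "type", cls));
            }
        }
        return result;
    }

    // CWA reading of the same axiom: a subject without a known type
    // assertion is reported as a violation instead of being given the type.
    static Set<String> domainViolations(Set<List<String>> data, String property, String cls) {
        Set<String> violators = new HashSet<>();
        for (List<String> t : data) {
            if (t.get(1).equals(property) && !data.contains(triple(t.get(0), "type", cls))) {
                violators.add(t.get(0));
            }
        }
        return violators;
    }

    public static void main(String[] args) {
        System.out.println("OWA inferences: " + inferDomain(data(), "supervises", "Manager"));
        System.out.println("CWA violations: " + domainViolations(data(), "supervises", "Manager"));
    }
}
```

Under the open-world reading the data gains "Person085 type Manager"; under the closed-world reading Person085 is reported as a violator because that type is not known.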

Participation Constraint

• Each supervisor must supervise at least one employee
• Input axioms
    o Supervisor subClassOf supervises some Employee
    o Person085 type Supervisor

              OWA      CWA
Consistent    true     false

Reason (OWA): Infer that Person085 supervises _:b . _:b type Employee
Reason (CWA): Assume that Person085 supervises _:b does not exist

Uniqueness Constraint

• Employees can have at most one supervisor
• Input axioms
    o supervises InverseFunctional
    o Person085 supervises Person173
    o Person632 supervises Person173

              OWA      CWA
Consistent    true     false

Reason (OWA): Infer that Person085 sameAs Person632
Reason (CWA): Assume that Person085 sameAs Person632 does not hold
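A closed-world check for the uniqueness constraint can be sketched as follows. The sketch assumes that distinct names denote distinct individuals, so two differently named supervisors of the same employee count as a violation rather than as grounds for a sameAs inference. The class name, the method name, and the pair-array representation are hypothetical choices for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniquenessConstraintDemo {

    // Each pair is {supervisor, employee}, i.e. "supervisor supervises employee".
    static final String[][] SUPERVISES = {
        {"Person085", "Person173"},
        {"Person632", "Person173"},
    };

    // Group supervisors by employee and report every employee that has
    // more than one distinct supervisor name.
    static List<String> employeesWithSeveralSupervisors(String[][] supervises) {
        Map<String, Set<String>> supervisorsOf = new LinkedHashMap<>();
        for (String[] pair : supervises) {
            supervisorsOf.computeIfAbsent(pair[1], k -> new LinkedHashSet<>()).add(pair[0]);
        }
        List<String> violators = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : supervisorsOf.entrySet()) {
            if (e.getValue().size() > 1) {
                violators.add(e.getKey());
            }
        }
        return violators;
    }

    public static void main(String[] args) {
        System.out.println(employeesWithSeveralSupervisors(SUPERVISES));
    }
}
```

For the slide's data the only reported employee is Person173.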

Workarounds for CW

• Manually close the world
– Declare all individuals different from each other
– Count existing property values and add a max cardinality restriction
– Make all disjointness statements explicit and add negated types to individuals
• Drawbacks
– Can be computationally expensive
– Likely to be error-prone

Problem Summary

• Definitions in an OWL schema may have two purposes
– Infer new statements
– Check if existing statements are valid
• Using OWA for validation is undesirable
– Not always but in many cases
• In a problem domain we may have:
– Complete knowledge about some parts of the domain
– Incomplete knowledge about the other parts

Integrity Constraint Solution

• We defined an alternative semantics for OWL
– Integrity Constraint (IC) semantics use CWA
– Can be combined with regular inference axioms
• Ontology developer chooses which axioms will be interpreted with...
– OWA - regular OWL axiom, or
– CWA - integrity constraint

IC Extension

• Syntax specification
– How do we syntactically say an axiom is an IC and not a regular OWL axiom?
• Semantics specification
– How do we exactly interpret an IC?
• Validation algorithm
– Given the semantics how do we check for IC violations?

IC Syntax

• Similar approach to using owl:imports
• Define a new annotation property in a new namespace

    Ont1 owl:imports Ont2
    Ont1 ic:imports IC1

• Backward compatible, requires minimum change in tools

IC Semantics

• OWL semantics based on model theory
– Similar to First Order Logic
– Formal, precise, and unambiguous
• IC semantics specification
– Extends OWL model theory
– Change a couple of basic definitions, everything else follows
• Details published in technical papers
– We are submitting a W3C member submission soon

Use Case: SKOS

• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic structure and content of concept schemes
– Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc.
• SKOS data model specification
– Informal (Text): http://www.w3.org/TR/skos-reference/
– Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf

SKOS Example

skos-constraints.ttl
    # Constraints from SKOS reference expressed as ICs
    skos:related propertyDisjointWith skos:broaderTransitive

skos-reference.ttl
    # SKOS reference ontology that contains inference rules
    skos:broaderTransitive Transitive
    skos:broader subPropertyOf skos:broaderTransitive

skos-invalid.ttl
    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-reference.ttl ;
                        ic:imports skos-constraints.ttl .
    A skos:broader B ; skos:related C .
    B skos:broader C .

Explanation

VIOLATION: A violates related propertyDisjointWith broaderTransitive
  INFERRED: A related C
    ASSERTED: A related C
  INFERRED: A broaderTransitive C
    ASSERTED: A broader B
    ASSERTED: B broader C
    ASSERTED: broader subPropertyOf broaderTransitive
    ASSERTED: broaderTransitive Transitive
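The SKOS violation combines open-world inference with a closed-world check: broaderTransitive pairs are first inferred from the asserted broader pairs, and only then is the disjointness constraint evaluated. A minimal Java sketch of that two-step idea follows. It hard-codes the two inference axioms (broader subPropertyOf broaderTransitive, broaderTransitive Transitive) and the one constraint, and its class and method names are assumptions made for illustration, not part of any SKOS or Pellet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SkosDisjointnessCheck {

    // A pair (s, o) stands for "s property o".
    static List<String> pair(String s, String o) {
        return Arrays.asList(s, o);
    }

    // Inference step: every broader pair is a broaderTransitive pair, and
    // broaderTransitive is transitive, so compute the transitive closure.
    static Set<List<String>> broaderTransitive(Set<List<String>> broader) {
        Set<List<String>> closure = new HashSet<>(broader);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<String> ab : new ArrayList<>(closure)) {
                for (List<String> bc : new ArrayList<>(closure)) {
                    if (ab.get(1).equals(bc.get(0))) {
                        changed |= closure.add(pair(ab.get(0), bc.get(1)));
                    }
                }
            }
        }
        return closure;
    }

    // Validation step: related propertyDisjointWith broaderTransitive, read as
    // a constraint, means any pair present in both relations is a violation.
    static Set<List<String>> violations(Set<List<String>> related, Set<List<String>> broader) {
        Set<List<String>> both = new HashSet<>(related);
        both.retainAll(broaderTransitive(broader));
        return both;
    }

    // The data from skos-invalid.ttl.
    static Set<List<String>> demo() {
        Set<List<String>> broader = new HashSet<>(Arrays.asList(pair("A", "B"), pair("B", "C")));
        Set<List<String>> related = new HashSet<>(Arrays.asList(pair("A", "C")));
        return violations(related, broader);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only violating pair is (A, C): it is asserted as related and inferred as broaderTransitive.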

Another SKOS Example

skos-xl.ttl
    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1

skos-data.ttl
    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-xl.ttl .
    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .

Result: Consistent

Another SKOS Example

skos-xl.ttl
    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1

skos-data.ttl
    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-xl.ttl ;
                        ic:imports skos-xl.ttl .
    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .

Result: IC Violation

Linked Data Application

• Large amounts of instance data
• Validate before publishing/consuming LOD
• Instance data + Inference axioms + Constraints
– Infer new facts using inference axioms with OWA
– Validate data using constraints with CWA
– Inference axioms and constraints are both expressed in OWL

Validation Algorithm

• An automated translation algorithm
• Automatically maps an OWL IC to ...
– A SPARQL query, or
– A RIF rule
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation

SPARQL Translation

    Supervisor subClassOf supervises some Employee

    SELECT * {
      ?x type Supervisor.
      NOT EXISTS {
        ?x supervises ?y.
        ?y type Employee.
      }
    }
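To make the translation idea concrete, here is a hedged Java sketch that builds the violation query for one axiom shape only, "C subClassOf (p some D)". It is not the translation algorithm of the Pellet IC validator, which is not shown in these slides. The class and method names are hypothetical, the generated text uses `a` for rdf:type, and negation is written as `FILTER NOT EXISTS`, the form used in the SPARQL 1.1 Recommendation.

```java
class SomeValuesFromToSparql {

    // Builds a query that selects every instance of subClass lacking a
    // property value of type filler. Each row returned is one violation.
    // Names are inserted verbatim, so they must already be valid SPARQL terms
    // (for example prefixed names whose prefixes are declared elsewhere).
    static String violationQuery(String subClass, String property, String filler) {
        return String.join("\n",
            "SELECT ?x WHERE {",
            "  ?x a " + subClass + " .",
            "  FILTER NOT EXISTS {",
            "    ?x " + property + " ?y .",
            "    ?y a " + filler + " .",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // The ":" prefix is hypothetical; declare it to match your data.
        System.out.println(violationQuery(":Supervisor", ":supervises", ":Employee"));
    }
}
```

An empty result set means the constraint holds for the data the query was run against.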

RIF Translation

    Supervisor subClassOf supervises some Employee

    Forall ?x ?y (
      invalid() :- And (
        ?x[type -> Supervisor]
        Naf And (
          ?x[supervises -> ?y]
          ?y[type -> Employee] )))

Solution Summary

• Separate ICs from regular OWL axioms
– No new syntax
– Import-based mechanism
• Alternative semantics for ICs
– Extends OWL model theory
– Provides the meanings of ICs formally
• Validation algorithm
– Translate ICs to another formalism
– SPARQL or RIF engines can be used

Performance

• Using ICs can improve performance!
• Expressive OWL reasoning is not easy
• Profiles of OWL defined for tractable reasoning
– OWL 2 QL, OWL 2 EL, OWL 2 RL
– Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce the overall expressivity

Prototype

• Pellet IC validator
– Translates ICs into SPARQL queries automatically
– Executes SPARQL queries with Pellet
– Query results show constraint violations
– Automatically explain constraint violations
• Free download
– http://clarkparsia.com/pellet/icv

Code Example

    // Create an inferencing model using the Pellet reasoner
    InfModel dataModel = ModelFactory.createInfModel(r);

    // Load the schema and instance data into Pellet
    dataModel.read("file:data.rdf");
    dataModel.read("file:schema.owl");

    // Create the IC validator and associate it with the dataset
    JenaICValidator validator = new JenaICValidator(dataModel);

    // Load the constraints into the IC validator
    validator.getConstraints().read("file:constraints.owl");

    // Get the constraint violations
    Iterator<ConstraintViolation> violations = validator.getViolations();

Next Steps

• W3C Member submission for IC semantics
• Robust IC validator implementation
– Incremental validation
– Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
– Scalable reasoning + validation

References

• Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008.
• Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. To appear, The 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010.

Questions