A little more semantics goes a lot further! Getting more out of Linked Data with OWL
Dr. Michel Dumontier, Dr. Robert Hoehndorf


Abstract
This tutorial provides detailed instruction on creating and using formalized ontologies from linked open data for advanced knowledge discovery, including consistency checking and answering sophisticated questions.

Automated reasoning in OWL offers the tantalizing possibility of advanced knowledge discovery, including verifying the consistency of conceptual schemata in information systems, verifying data integrity, and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge so that the intended semantics become explicit and the reasoning is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, the use of less expressive OWL profiles enables efficient reasoning and supports different application scenarios.

In this tutorial, we
- describe how to generate OWL ontologies from linked data
- check the consistency of knowledge
- automatically transform ontologies into OWL profiles
- use this knowledge in applications to integrate data and answer sophisticated questions across domains

- expressive ontologies enable data integration, verifying the consistency of knowledge and answering questions
- formalization of linked data will create new opportunities for knowledge discovery
- OWL 2 profiles support more efficient reasoning and query answering procedures
- recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles
- OWL ontologies can dramatically extend the functionality of semantically-enabled web sites

OWLED2011::Dumontier|Hoehndorf::Formalizing Linked Data Tutorial 2


Skills obtained
• understand the nature and capability of a formal ontology and information system
• understand the subtle differences between OWL 2 and its profiles, including differences in constructs, when to apply these profiles and how to convert ontologies into these profiles
• understand the distinction between a class and an individual and their descriptions
• understand how to convert RDF triples in Linked Data into axioms for an OWL ontology
• understand how to execute standard reasoning services (classification, consistency checking, realization, query answering) on an OWL ontology using the OWL API and an OWL reasoner, with a focus on OWL EL ontologies and reasoners
• understand how to identify inconsistencies and apply simple patterns to remove or repair them
• understand how to convert large amounts of linked data into a large-scale OWL knowledge base and enable tractable reasoning over it


Outline (90 min)

1. Introduction (10 min)
• case study: SGD
• linked data vs ontology
• RDF vs OWL
• Motivation: can we use some features of OWL to organize, verify and exploit Linked Data?

2. Formalization
• OWL 2 – elements, expressions and axioms
• Triples to axioms
• Role of top-level ontologies (classes + relations)
• Axiom patterns

3. Practical Reasoning
• classification using CEL/CB/Pellet/HermiT/...
• OWL profiles
• Modularization (EL Vira)
• Diagnosis and Repair
• Explanations
• Inference of new triples

4. Conclusion


Saccharomyces Genome Database

A repository for all things yeast.

includes:
• molecular entities, their parts
    o chromosomes; genes, open reading frames, etc.
    o RNA, proteins; domains
• qualities, realizables (dispositions, functions)
• interactions and their participants
• complexes, their parts and their topology
• pathways and their components
• phenotypes and their basis


Hexokinase (HXK1)

The HXK1 gene encodes the HXK1 protein - which is responsible for the conversion of glucose to glucose-6-phosphate in the first step of glycolysis.

Gene (region of DNA)

Protein (macromolecule)


Questions we may want to ask about HXK1:

• What kind of thing is HXK1?
• What are the implications of being a gene?
    o In which chromosome does it appear?
    o Which entities does it encode?
• What are the implications of being a protein?
    o What is its function?
    o Where is it located in the cell?
    o If HXK1 participates in processes that involve other cellular components, where else must HXK1 be located?
• Is the HXK1 annotation consistent?
    o Does the annotation contradict common biological knowledge?
    o Is it possible for HXK1 to have multiple locations when it can only be located on one chromosome?


SGD refers to other data sources

Gene Ontology
- functions, locations, processes

Ascomycetes Phenotype Ontology
- experiments, interactions and phenotypes

PubMed
- abstracts of published research articles + MeSH terms

over 40 references to other molecular/data entities for which the relation is unclear…


Bio2RDF’s RDFized data fits together

syntactic integration

SGD as RDF-based Linked Open Data


Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

SGD is provided by Bio2RDF and forms part of the growing linked open data cloud


Semantic Integration

• Requires a level of abstraction/generalization where the relationship between each resource is formalized
    – classes
    – relations

• How do we ensure that our representation facilitates integration across datasets?

• How can we get our formalization to interoperate with ontologies?


Early conceptualization


More advanced conceptualization


Semantic Technologies: RDF vs OWL

RDF: simple triples, graph-based queries, supports very large amounts of data

OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but can be rather slow


RDF-based Linked Data

• Provides the basis for simple data syndication and syntactic data integration
    o IRIs
    o Statements (aka triples) take the form of <subject> <predicate> <object>
• Easy to implement
    o stand-alone datasets
    o logical layer over databases
• Limited reasoning
    o class and property hierarchies
    o domain/range restrictions
    o can’t automatically discover inconsistency
• Standardized Queries - SPARQL
• Scalable - to billions of triples
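As a rough illustration of the <subject> <predicate> <object> form, the sketch below stores a few triples that appear later in this tutorial as plain Python tuples and retrieves them by pattern. The `match` helper is a made-up name for this example and only mimics a single SPARQL triple pattern, with `None` standing in for a variable; it is not a SPARQL engine.

```python
# Triples taken from the SGD examples used later in the tutorial.
TRIPLES = {
    ("HXK1", "rdf:type", "OpenReadingFrame"),
    ("HXK1", "isPartOf", "chromosome6_Crick"),
    ("GO:0005737", "isLocationOf", "HXK1"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Everything stated about HXK1 as a subject.
for triple in match(TRIPLES, s="HXK1"):
    print(triple)
```

Pattern matching of this kind retrieves only what was asserted; it does not by itself detect that two statements contradict each other.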


OWL - The Web Ontology Language

• Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values
    o quantifiers (existential, universal, cardinality restriction)
    o negation
    o disjunction
    o property characteristics
    o complex classes in domain and range restrictions
    o property chains

• Advanced reasoning


Advanced Reasoning

• Consistency: determines whether the ontology contains contradictions.

• Satisfiability: determines whether classes can have instances.

• Subsumption: is class C1 implicitly a subclass of C2?

• Classification: repeated application of subsumption to discover implicit subclass links between named classes

• Realization: find the most specific class that an individual belongs to.
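To make classification and realization concrete, the sketch below treats them in the simplest possible setting: named classes connected only by asserted subClassOf links, with no cycles. It is an illustrative toy and not how a DL reasoner works internally; the function names (`classify`, `realize`) and the small hexokinase hierarchy are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical asserted hierarchy: (subclass, superclass) pairs.
ASSERTED = [
    ("hexokinase", "protein"),
    ("protein", "molecule"),
    ("molecule", "material entity"),
]

def classify(asserted):
    """Map every class to all of its superclasses, asserted or implied."""
    supers = defaultdict(set)
    for sub, sup in asserted:
        supers[sub].add(sup)
        supers[sup]  # touching the key registers classes that have no parent
    changed = True
    while changed:  # repeat until no new subclass link can be derived
        changed = False
        for cls in list(supers):
            for parent in list(supers[cls]):
                new = supers[parent] - supers[cls]
                if new:
                    supers[cls] |= new
                    changed = True
    return dict(supers)

def realize(asserted_types, supers):
    """Return the most specific classes among an individual's known types."""
    all_types = set(asserted_types)
    for cls in asserted_types:
        all_types |= supers.get(cls, set())
    return {
        cls for cls in all_types
        if not any(cls in supers.get(other, set())
                   for other in all_types if other != cls)
    }

hierarchy = classify(ASSERTED)
print(sorted(hierarchy["hexokinase"]))
print(realize({"protein", "hexokinase"}, hierarchy))
```

A real reasoner also has to handle class expressions, property axioms and equivalences, which is where most of the computational cost comes from.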


OWL Challenges and Solutions

Inconsistency:
• needs to be resolved to ask any questions involving the ontology
• Solution: explicitly accommodate multiple meanings, remove contradictory axioms

Unsatisfiability (of a class):
• may indicate a modelling error
• needs to be resolved to ask meaningful questions about the class
• Solution: explicitly accommodate multiple meanings, redefine class, remove contradicting class restrictions


OWL Challenges and Solutions

Scalability:
• answering OWL queries requires reasoning
• inference in OWL is highly complex (worst case: 2NEXPTIME)
• highly optimized reasoners are getting better and better, but can still be slow with large ontologies
• tractable OWL profiles (EL, QL, RL) enable more efficient, guaranteed polynomial-time inference
• use ontology modularization approaches to increase performance


Linked data and OWL: Motivation

• use OWL reasoning to identify mistakes in RDF data
    o incorrect content of assertions
    o incorrect use of relations
    o conflicting conceptualizations
    o incorrect same-as assertions

• verify, fix and exploit Linked Data through expressive OWL reasoning

• generate/infer new triples to write back into RDF and use for efficient retrieval

Proposal: Convert RDF to OWL to perform inferences, and represent the inferences in RDF after classification.


OWL can help you create rich, machine-understandable descriptions!
• transform our expert knowledge into axioms and expressions that can be automatically reasoned about
    o a transcription factor is a protein that binds to DNA and regulates the expression of a gene.
    o can we mine 'omic datasets to discover which proteins are transcription factors?

• create rich expressions from combinations of classes, relations and individuals

• assert statements of truth using axioms.


Elements of OWL 2
• The “ontology” of OWL 2 consists of:
    • Classes
    • Object properties
    • Data properties
    • Individuals
    • Expressions
    • Axioms
    • Plus RDF stuff (like datatypes)


Classes and class axioms

• a class is a set of individuals that share one or more characteristics
    o e.g. a protein
• classes can be organized in a hierarchy using subClassOf axioms
    o i.e. if C2 is a subclass of C1, every member of C2 is a member of C1
    o SubClassOf (protein molecule)
• special classes
    o owl:Thing is the superclass of all classes
    o owl:Nothing is the subclass of all classes and denotes the empty set
• classes can be made disjoint from one another
    o i.e. there is no member of C1 that is also a member of C2
    o DisjointClasses (protein DNA)
• classes can be said to be equivalent
    o i.e. all members of C1 are members of C2 and all members of C2 are members of C1
    o EquivalentClasses (Peptide Polypeptide)
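The interplay of subClassOf and disjointness axioms can be sketched with a few lines of Python. The example below flags a class as unsatisfiable (equivalent to owl:Nothing) when it falls under two classes that were declared disjoint. The class "chimeric entity" and the function names are hypothetical and exist only for this illustration; a real reasoner detects many more causes of unsatisfiability.

```python
# Hypothetical toy ontology; "chimeric entity" is a deliberate modelling error.
SUBCLASS_OF = {
    "protein": {"molecule"},
    "DNA": {"molecule"},
    "chimeric entity": {"protein", "DNA"},
}
DISJOINT = [("protein", "DNA")]

def ancestors(cls, subclass_of):
    """Return cls together with every class reachable via subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Classes that fall under two classes declared disjoint."""
    classes = set(subclass_of).union(*subclass_of.values())
    result = set()
    for cls in classes:
        above = ancestors(cls, subclass_of)
        if any(a in above and b in above for a, b in disjoint_pairs):
            result.add(cls)
    return result

print(unsatisfiable_classes(SUBCLASS_OF, DISJOINT))
```

An unsatisfiable class does not make the ontology inconsistent by itself; the ontology becomes inconsistent once an individual is asserted to be a member of such a class.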


Object Properties and axioms

• an object property OP is a relation between two individuals
    o 'has part' is an object property that denotes the mereological relation between two individuals
• OPs can be organized in a hierarchy
    o given OP1 and OP2, where OP2 is a subproperty of OP1: if an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y
    o subPropertyOf ('has proper part' 'has part')
    o owl:topObjectProperty, owl:bottomObjectProperty
• We can restrict the domain and range to allowed values
    o ObjectPropertyDomain ('is participant in' 'physical entity')
    o ObjectPropertyRange ('is participant in' 'process')
• We can also assert object properties to be disjoint or equivalent


Description of object properties
• Inverse
    o we say that 'has part' is an inverse of 'is part of'
    o we can also refer to this as inv('is part of')
• Symmetric
    o applies to cases where the inverse relation is the very same relation
    o e.g. the inverse of 'is related to' is 'is related to'
• Transitive
    o for a transitive relation R, if an individual x is connected by R to an individual y that is connected by R to an individual z, then x is also connected by R to z
    o e.g. 'has part' is transitive
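The effect of these property characteristics can be illustrated by treating a relation as a set of (x, y) pairs. The sketch below is a minimal illustration with made-up function names and example individuals; it materializes the pairs that an inverse, symmetric or transitive declaration would entail.

```python
def inverse(pairs):
    """Pairs of the inverse relation: swap subject and object."""
    return {(y, x) for x, y in pairs}

def symmetric_closure(pairs):
    """A symmetric relation contains the inverse of each of its pairs."""
    return set(pairs) | inverse(pairs)

def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        derived = {(x, z) for x, y in closure for w, z in closure if y == w}
        if derived <= closure:
            return closure
        closure |= derived

# Hypothetical 'has part' assertions between individuals.
HAS_PART = {("cell", "nucleus"), ("nucleus", "chromosome")}

print(sorted(transitive_closure(HAS_PART)))  # now includes ('cell', 'chromosome')
print(sorted(inverse(HAS_PART)))             # the 'is part of' pairs
```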


Description of object properties
• Reflexive
    o a reflexive relation relates every individual to itself
    o e.g. 'has part' is reflexive because a protein has itself as a part
• Functional
    o each individual is related to at most one individual, so all values of the relation for a given individual must be the same
    o e.g. 'has unique identifier'
• Inverse Functional
    o each individual is the value of the relation for at most one individual, so all individuals related to the same value must be the same
    o e.g. 'is unique identifier of'
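A functional property does not reject a second value; under OWL semantics it lets a reasoner infer that the values denote the same individual. The sketch below shows that inference on a toy scale. The identifiers, record names and function names are hypothetical and chosen only for this example.

```python
def merged_by_functional(pairs):
    """Groups of values that a functional property forces to be identical."""
    values = {}
    for subject, value in pairs:
        values.setdefault(subject, set()).add(value)
    return [sorted(group) for group in values.values() if len(group) > 1]

def merged_by_inverse_functional(pairs):
    """Groups of subjects that an inverse functional property forces to be identical."""
    return merged_by_functional((value, subject) for subject, value in pairs)

# Hypothetical assertions for 'has unique identifier' (entity, identifier).
HAS_IDENTIFIER = [
    ("HXK1 gene", "id:0001"),
    ("HXK1 gene", "id:0002"),
    ("another gene", "id:0003"),
]

# The same facts stated with 'is unique identifier of' (identifier, entity).
IS_IDENTIFIER_OF = [(identifier, entity) for entity, identifier in HAS_IDENTIFIER]

print(merged_by_functional(HAS_IDENTIFIER))
print(merged_by_inverse_functional(IS_IDENTIFIER_OF))
```

Both calls report that id:0001 and id:0002 must denote the same individual. An inconsistency only arises if the two are additionally asserted to be different.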


Class Expressions
Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, datatype properties, individuals)

Protein subClassOf
    molecule and ‘has proper part’ min 2 ‘amino acid residues’

Combinations specified using logical operators
• conjunction (and), disjunction (or), negation (not)

Object or data property expressions provide a qualified cardinality over the relation
    o minimum: rel min # Y
    o maximum: rel max # Y
    o exact: rel exactly # Y (minimum + maximum)
    o some: rel min 1 Y
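For readers who prefer a set-theoretic reading, the qualified cardinality restrictions above can be written out as follows. This is the standard description-logic semantics, with R a relation, Y a class, n a non-negative integer and I an interpretation; the notation is added here for illustration and is not part of the original slides.

```latex
\begin{align*}
(R \text{ min } n\; Y)^{\mathcal{I}} &= \{\, x \mid \#\{\, y \mid (x,y) \in R^{\mathcal{I}} \wedge y \in Y^{\mathcal{I}} \,\} \ge n \,\} \\
(R \text{ max } n\; Y)^{\mathcal{I}} &= \{\, x \mid \#\{\, y \mid (x,y) \in R^{\mathcal{I}} \wedge y \in Y^{\mathcal{I}} \,\} \le n \,\} \\
R \text{ exactly } n\; Y &\equiv (R \text{ min } n\; Y) \sqcap (R \text{ max } n\; Y) \\
R \text{ some } Y &\equiv R \text{ min } 1\; Y
\end{align*}
```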


Class Expressions
    o The quantification can be qualified by the object type
    o rel only Y – the only values allowed are of type Y

• To form complex class expressions like
    o 'molecule' and not 'dna'
    o 'has part' min 2 'amino acid'
    o 'is located in' only ('nucleus' or 'cytoplasm')
• and be expressed as axioms in the ontology

Protein subClassOf
    molecule and ‘has proper part’ min 2 ‘amino acid residues’

Transcription Factor equivalentClass
    ‘protein’
    and ‘has disposition’ some ‘to bind to DNA’
    and ‘has function’ some ‘to regulate gene expression’


Triples to axioms

Convert RDF triples into OWL axioms.

Triple in RDF:
<Nucleus> <partOf> <Cell>

• Nucleus and Cell are classes
• partOf is a relation between 2 classes
• intended meaning: every instance of Nucleus is partOf some instance of Cell
• formalize as OWL axiom:

Nucleus subClassOf partOf some Cell


Triples to axioms

Triple in RDF:
<Cytosol> <isLocationOf> <HXK1>

• Cytosol and HXK1 are classes
• isLocationOf is an axiom pattern involving 2 classes
• intended meaning:
    • every instance of HXK1 is located at some instance of Cytosol
• not intended:
    • for every instance of Cytosol, there is an instance of HXK1 located in it.

HXK1 subClassOf
    hasLocation some Cytosol
    inv(isLocationOf) some Cytosol
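The difference between the two readings is easiest to see when both are written as first-order formulas. The rendering below is added for illustration and uses the predicate names from the triple; the first formula corresponds to HXK1 subClassOf inv(isLocationOf) some Cytosol, the second to Cytosol subClassOf isLocationOf some HXK1.

```latex
\begin{align*}
\text{intended:} \quad & \forall x\, \bigl( \mathrm{HXK1}(x) \rightarrow \exists y\, ( \mathrm{isLocationOf}(y, x) \wedge \mathrm{Cytosol}(y) ) \bigr) \\
\text{not intended:} \quad & \forall y\, \bigl( \mathrm{Cytosol}(y) \rightarrow \exists x\, ( \mathrm{isLocationOf}(y, x) \wedge \mathrm{HXK1}(x) ) \bigr)
\end{align*}
```

The same RDF triple is compatible with both formulas, which is why the direction of the axiom has to be chosen explicitly during formalization.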


Triples to axioms

Convert RDF triples into OWL axioms.

Triple in RDF: <C1 R C2>
• C1 and C2 are classes, R a relation between 2 classes
• the intended meaning could be any of:
    o C1 SubClassOf: C2
    o C1 SubClassOf: R some C2
    o C1 SubClassOf: R only C2
    o C2 SubClassOf: R some C1
    o C1 SubClassOf: S some C2
    o C1 DisjointWith: C2
    o C1 and C2 SubClassOf: owl:Nothing
    o R some C1 DisjointWith: R some C2
    o C1 EquivalentTo: C2
    o ...
• in general: P(C1, C2), where P is an OWL axiom (template)

Challenge: Formalizing data requires one to commit to a particular meaning, that is, to make an ontological commitment


Triples to axioms

Formalizing RDF triples in OWL often introduces new OWL object properties.

Challenges
• Which object properties should be included?
• What axioms hold for included object properties?
• Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them?


Top-level ontologies contain generalized (domain-independent) classes and relations

They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms).


Basic classes in top-level ontologies

• Material entity
    • Example: Apple, Human, Cell, Planet
    • Has mass as a quality
    • Located in space and time
    • Independent of other entities
    • Exists as a whole whenever it exists
• Quality
    • Example: mass, color, concentration
    • Dependent: always the quality of some entity
    • Quality of object: size, shape, length
    • Quality of process: duration, rate
    • Quality of quality: shade (of color), intensity


Basic classes in top-level ontologies

• Function
    • e.g. to bind, to catalyze (a reaction), to kill bacteria
    • Dependent: always the function of some thing
    • Similar to a property of an object
    • Represents the potential to do something (an action) in some process
    • capabilities, dispositions and tendencies
• Process
    • Example: running a marathon, binding, cell division
    • Located in space and time
    • Independent of other entities
    • Temporally extended


Top-level ontologies make a commitment to these being different things

Material object, Process, Function and Quality are mutually disjoint.


Basic Relations in Top Level Ontologies

• Mereological: parthood
    – ‘has part’, ‘has proper part’, ‘has component part’
• Participatory
    – ‘is participant in’, ‘is agent in’, ‘is target in’
• Topology
    – ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’
• Temporal
    – ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc.
• Referential
    – ‘describes’, ‘references’, ‘represents’


Relations in top-level ontologies

• relations (object properties) in OWL hold between instances

• domain and range restrictions from the top-level ontology can be applied for general relations, e.g.:
    o ‘has part’ can be restricted with "Material object" as both domain and range
    o ‘participates in’ can be restricted with a domain of "Material object" and a range of "Process"
• re-use of relations enables inferences across resources
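Domain and range restrictions become useful for consistency checking once the top-level classes are declared mutually disjoint. The sketch below infers types from the two restrictions mentioned above and reports individuals that end up in more than one top-level class. It is a simplified illustration; the individual names and function names are hypothetical, and a real reasoner would perform this check as part of general consistency checking.

```python
# Top-level classes, assumed mutually disjoint as stated in the tutorial.
TOP_LEVEL = {"Material object", "Process", "Function", "Quality"}

# Domain and range restrictions (relation names follow the slide).
DOMAIN = {"has part": "Material object", "participates in": "Material object"}
RANGE = {"has part": "Material object", "participates in": "Process"}

def infer_types(triples):
    """Derive class membership of individuals from domain and range axioms."""
    types = {}
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types

def find_clashes(triples):
    """Individuals inferred to belong to two disjoint top-level classes."""
    return sorted(
        individual
        for individual, classes in infer_types(triples).items()
        if len(classes & TOP_LEVEL) > 1
    )

# Hypothetical data: 'has part' is used here between processes.
TRIPLES = [
    ("hxk1 molecule", "participates in", "glycolysis run"),
    ("glycolysis run", "has part", "phosphorylation step"),
]

print(find_clashes(TRIPLES))
```

Here "glycolysis run" is inferred to be both a Process (range of 'participates in') and a Material object (domain of 'has part'), which signals that the data uses 'has part' differently from the top-level ontology.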


Enforce ontological commitment by mapping to a top-level ontology
Foundation of domain classes and relations in top-level ontology:
• every domain class becomes a subclass of a class in the top-level ontology
• every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology
• assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate)
    o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, the additional constraint can be added


Top-level ontology

Application of a top-level ontology:

• can help to make the ontological commitment that is employed within an information system explicit,
• can guarantee basic agreement about fundamental types,
• agreement about common relations,
• provides common domain and range restrictions across multiple domains, and therefore
• enables re-use of relations and types across data sources, domains, levels of granularity, information systems.


Formalization of SGD’s Linked Data

SGD uses at least the following relations in RDF:

• isPartOf
• hasParticipant
• isFunctionOf
• isLocationOf

Can we create patterns from which linked data can be appropriately formalized into OWL axioms?

axiom patterns


Formalization of SGD Linked Data

?X isPartOf ?Y

Can be translated to axiom pattern

?X subClassOf: part-of some ?Y

"part-of" is an object property contained in our top-level ontology.

Example:
HXK1 isPartOf chromosome6_Crick
translated to
HXK1 subClassOf: part-of some chromosome6_Crick


Formalization of SGD Linked Data

?X hasParticipant ?Y

translated to axiom pattern

?Y subClassOf: participates-in some ?X

"participates-in" is an object property contained in our top-level ontology.

Example:
GO:0005975 (carbohydrate metabolism) hasParticipant HXK1
translated to
HXK1 subClassOf: participates-in some GO:0005975


Formalization of SGD Linked Data

?X isLocationOf ?Y

translated to axiom schema

?Y subClassOf: located-in some ?X

Example:
GO:0005737 (cytoplasm) isLocationOf HXK1
translated to
HXK1 subClassOf: located-in some GO:0005737

What if "located-in" is not present in our top-level ontology…
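The three axiom patterns given for isPartOf, hasParticipant and isLocationOf can be sketched as string templates in which ?X is the subject and ?Y the object of the RDF triple. This is only an illustration of the idea and produces text in the notation used on the slides; the names `PATTERNS` and `triple_to_axiom` are made up here, and no pattern is included for isFunctionOf because none is given in this part of the tutorial.

```python
# Axiom patterns from the slides; {X} is the triple's subject, {Y} its object.
PATTERNS = {
    "isPartOf": "{X} subClassOf: part-of some {Y}",
    "hasParticipant": "{Y} subClassOf: participates-in some {X}",
    "isLocationOf": "{Y} subClassOf: located-in some {X}",
}

def triple_to_axiom(subject, predicate, obj):
    """Render the axiom for a triple, or None if no pattern is known."""
    pattern = PATTERNS.get(predicate)
    if pattern is None:
        return None
    return pattern.format(X=subject, Y=obj)

# The example triples used on the slides.
SGD_TRIPLES = [
    ("HXK1", "isPartOf", "chromosome6_Crick"),
    ("GO:0005975", "hasParticipant", "HXK1"),
    ("GO:0005737", "isLocationOf", "HXK1"),
]

for triple in SGD_TRIPLES:
    print(triple_to_axiom(*triple))
```

Note that two of the three patterns swap subject and object, so the direction of each RDF relation has to be known before a pattern is chosen.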


Formalization of SGD Linked Data

Top-level foundation for located-in relation:

• declare located-in as sub-property of part-of
    o verify how located-in is used within SGD, i.e., does located-in imply part-of?
    o counter-example: misfolded protein located-in chaperone protein, but not misfolded protein part-of chaperone protein
• create located-in as super-property of part-of in our top-level ontology:
    o does part-of imply located-in within SGD?
    o cell body part-of cell, but not cell body located-in cell


Formalization of SGD Linked Data

Top-level foundation for located-in relation:

• add located-in to our top-level ontology
    o adding the new relation allows its reuse across multiple resources
    o inclusion may require addition of further classes (e.g., spatial regions)
    o relation to part-of must be clarified (and part-of may even be replaced by located-in)

Establishing the relation between relations and classes depends on how the relations and classes are being applied.


Formalization of SGD Linked Data

Top-level foundation:

Translate
HXK1 rdf:type OpenReadingFrame
to
HXK1 subClassOf: OpenReadingFrame

OpenReadingFrame (Sequence Ontology) is a subclass of Sequence.


Formalization of SGD Linked Data

Foundation for SGD classes in top-level ontology:
• declare Sequence to be a subclass of Material object
• import (owl:imports) Sequence Ontology
• declare Biological Process (GO) subclass of Process
• declare Molecular Function (GO) subclass of Function
• import GO
• ...

to create a top-level foundation (i.e., super-class in top-level ontology for all classes) for SGD


Implementation

• expand relations in RDF based on relational patterns
• relational patterns are OWL axioms with 2 variables (which are filled by subject and object, respectively)
• implementation based on OWL API
• adopt implementation of relational patterns in OBO language (http://code.google.com/p/obo2owl/)

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED).
paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo
BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441


Another way?

• OPPL is an abstract formalism that allows for manipulating ontologies written in OWL.

• Use OPPL to select triples and create the axioms


Operations on OWL ontologies

• Consistency: determines whether the ontology contains contradictions.

• Satisfiability: determines whether classes can have instances.

• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repeated application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to.

OWL reasoners can perform these operations and make the results accessible for further processing.


Practical reasoning with OWL ontologies

• Ontology editors such as Protege interface with reasoners to check consistency and class satisfiability, perform classification and realisation, and provide explanations.

• Some reasoners are set up to be used from the command line to execute requests, including SPARQL querying.

• Programmatic use of reasoners via APIs offers maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate.


Page 55: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL


OWLAPI

Page 56: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Classifying the ontology


Page 57: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Classifying the ontology


Page 58: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

OWL Reasoners

OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• FaCT++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.

OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:


Page 59: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Automated reasoning over SGD

• SGD in OWL contains more than 800,000 axioms
• included ontologies contain several thousand axioms
  o GO has approx. 35,000 classes
  o ChEBI contains almost 100,000 classes
  o complex definitions of classes create links between large ontologies
• Reasoning in OWL 2 DL is highly complex (worst-case 2NEXPTIME-complete).
• Consequence: OWL reasoning can rarely be employed on a large scale.
• Expressive OWL reasoners do not classify the formalized SGD repository.


Page 60: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

OWL Profiles

• OWL 2 defines three different tractable profiles:
• EL
  o polynomial time reasoning for schema and data
  o Useful for ontologies with large conceptual part
• QL
  o fast (logspace) query answering using RDBMSs via SQL
  o Useful for large datasets already stored in RDBs
• RL
  o fast (polynomial) query answering using rule-extended DBs
  o Useful for large datasets stored as RDF triples


Page 61: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

OWL RL

Features:
• identity of classes, instances, properties
• subproperties, subclasses, domains, ranges
• union and intersection of classes (some restrictions)
• property characterizations (functional, symmetric, etc)
• property chains
• keys
• some property restrictions (but not all inferences are possible)

Limitations:
• not all datatypes are available
• no datatype restrictions
• no minimum or exact cardinality restrictions
• maximum cardinality only with 0 and 1
• some consequences cannot be drawn
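Because OWL RL reasoning can be implemented with rules over triples, a small forward-chaining loop gives a feel for it. The sketch below applies only three rules from the OWL 2 RL rule set (cax-sco, prp-dom, prp-symp) and is far from a complete RL engine. The `ex:` names and the `rl_closure` function are assumptions made for illustration.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
SYMMETRIC = "owl:SymmetricProperty"

def rl_closure(triples):
    """Apply three OWL 2 RL rules repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == TYPE:
                # cax-sco: x type C, C subClassOf D => x type D
                new |= {(s, TYPE, d) for c, q, d in facts
                        if q == SUBCLASS and c == o}
            # prp-dom: p domain C, x p y => x type C
            new |= {(s, TYPE, c) for r, q, c in facts
                    if q == DOMAIN and r == p}
            # prp-symp: p type SymmetricProperty, x p y => y p x
            if (p, TYPE, SYMMETRIC) in facts:
                new.add((o, p, s))
        if new <= facts:
            return facts
        facts |= new

# Toy data (all ex: names are illustrative).
data = [
    ("ex:interacts_with", TYPE, SYMMETRIC),
    ("ex:interacts_with", DOMAIN, "ex:Protein"),
    ("ex:Protein", SUBCLASS, "ex:Molecule"),
    ("ex:a", "ex:interacts_with", "ex:b"),
]
inferred = rl_closure(data)
for triple in sorted(inferred - set(data)):
    print(triple)
```

Each pass only adds triples, so the loop terminates once a pass produces nothing new.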


Page 62: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

OWL EL

Features
• existential quantification to a class expression or data range
• existential quantification to an individual or a literal
• self-restriction
• enumerations involving a single individual or a single literal
• intersection of classes and data range
• class axioms: subClassOf, equivalence, disjointness
• property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys
• assertions (sameAs, DifferentFrom, Class, Object Property, Data Property, Negative Object/Data Property)

Not supported
• universal quantification to a class expression or a data range
• cardinality restrictions
• disjunction (union)
• class negation
• enumerations involving more than one individual
• object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional


Page 63: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Ontology modularization

Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology?

1. D EquivalentTo: not A (not EL)
2. C EquivalentTo: not B (not EL)
3. B subClassOf: A (EL)

Inference:
• D subClassOf: C (EL) (Inference from (1)-(3))

EL module of (1)-(3):
• {B subClassOf: A}, or
• {B subClassOf: A, D subClassOf: C}
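The difference between the two candidate modules can be made concrete with a purely syntactic filter. The sketch below encodes class expressions as nested tuples (an encoding assumed here for illustration) and rejects axioms that use constructors outside EL, such as negation, union, universal quantification or cardinality. The inferred axiom is supplied by hand, since deriving it requires a reasoner.

```python
# Constructors that are not allowed in EL (per the "Not supported" list).
NON_EL_CONSTRUCTORS = {"not", "or", "only", "min", "max", "exactly"}

def uses_only_el(expr):
    """True if a class expression (a name or a nested tuple) stays within EL."""
    if isinstance(expr, str):
        return True
    constructor, *args = expr
    if constructor in NON_EL_CONSTRUCTORS:
        return False
    return all(uses_only_el(a) for a in args)

def el_axioms(axioms):
    """Keep the axioms whose class expressions are all within EL."""
    return [ax for ax in axioms if all(uses_only_el(e) for e in ax[1:])]

# Axioms (1)-(3) from the slide.
ontology = [
    ("EquivalentTo", "D", ("not", "A")),
    ("EquivalentTo", "C", ("not", "B")),
    ("SubClassOf", "B", "A"),
]
# Inference from (1)-(3), added by hand here; a reasoner would derive it.
inferred = [("SubClassOf", "D", "C")]

print(el_axioms(ontology))
print(el_axioms(ontology + inferred))
```

Filtering only the asserted axioms yields the smaller module, while filtering the asserted axioms together with the inferred one yields the larger module.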


Page 64: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

EL Vira modularization

• ontology modularization
• identify EL, QL, RL axioms in deductive closure
• retain signature of ontology
• maximality is an open problem


http://el-vira.googlecode.com

Page 65: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Consistency repair

• Unsatisfiable classes result from contradictory class definitions

• Conflict in asserted axioms, in imported ontologies or through combination of both

• Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc.

• Conflicting axioms may be challenging to identify!
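A toy version of such a hidden conflict: a class becomes unsatisfiable when two of its (possibly indirect) superclasses are declared disjoint, even though no single axiom looks wrong. The sketch below handles only named-class subclass and disjointness axioms, and all class names are hypothetical.

```python
def superclasses(cls, subclass_axioms):
    """All asserted or implied superclasses of cls (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def unsatisfiable_classes(subclass_axioms, disjoint_pairs):
    """Map each unsatisfiable class to the disjoint pairs that it falls under."""
    named = {c for axiom in subclass_axioms for c in axiom}
    result = {}
    for cls in named:
        ancestors = superclasses(cls, subclass_axioms) | {cls}
        clashes = [(a, b) for a, b in disjoint_pairs
                   if a in ancestors and b in ancestors]
        if clashes:
            result[cls] = clashes
    return result

# Toy axioms; imagine the two chains come from different ontologies.
subclass_axioms = [
    ("Hxk1", "Gene"),
    ("Gene", "Material_object"),
    ("Hxk1", "Metabolic_participant"),
    ("Metabolic_participant", "Process"),
]
disjoint_pairs = [("Material_object", "Process")]
print(unsatisfiable_classes(subclass_axioms, disjoint_pairs))
```

The clash only appears two steps up the hierarchy, which is why explanation support in tools matters in practice.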


Page 66: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Protege 4: Explanations


Page 67: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Consistency repair


Page 68: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Ontology repair and disambiguation

• Ontological constraints may have been too strong
• Complex relations (between classes) that are used in multiple meanings can be relaxed by explicitly introducing a disjunction that accommodates the different meanings, e.g.:
  o (1) Hxk1 part-of Chromosome6_Crick_strand
  o (2) Hxk1 part-of Hxk1_ATP_complex
  o (3) Hxk1 part-of Carbohydrate_metabolism
  o only (1) is consistent with background knowledge that Genes (as material objects) must be part of material objects (more specifically DNA), and that Genes cannot be part of protein complexes


Page 69: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Ontology repair and disambiguation

1. Hxk1 part-of Chromosome6_Crick_strand
2. Hxk1 part-of Hxk1_ATP_complex
3. Hxk1 part-of Carbohydrate_metabolism

part-of here means either
?X subClassOf: part-of some ?Y, or
?X subClassOf: encodes some (part-of some ?Y), or
?X subClassOf: participates-in some ?Y, or
?X subClassOf: encodes some (participates-in some ?Y)


Page 70: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Ontology repair and disambiguation

?X subClassOf: part-of some ?Y, or
?X subClassOf: encodes some (part-of some ?Y), or
?X subClassOf: participates-in some ?Y, or
?X subClassOf: encodes some (participates-in some ?Y)

All four interpretations are disjoint! Create new interpretation for part-of:

?X subClassOf:
part-of some ?Y or
encodes some (part-of some ?Y) or
participates-in some ?Y or
encodes some (participates-in some ?Y)


Page 71: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Inference of revised RDF representation

• Query OWL ontology for relational patterns that were used in relation expansion
• generates deductive closure of a set of RDF triples with respect to inferences in OWL
• naive implementation:
  o given a pattern P(?X, ?Y), substitute all combinations of named classes for ?X and ?Y
  o runtime: n*n (for n named classes)
  o more efficient implementation work-in-progress
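The naive strategy amounts to one entailment check per ordered pair of named classes. In the sketch below the `entails` callback stands in for a reasoner query, and the `ENTAILED` set is a made-up stand-in for its answers; neither is part of any real reasoner API.

```python
from itertools import product

def query_pattern(named_classes, entails):
    """Try every ordered pair of named classes as (?X, ?Y); keep entailed ones."""
    matches, checks = [], 0
    for x, y in product(named_classes, repeat=2):
        checks += 1
        if entails(x, y):
            matches.append((x, y))
    return matches, checks

# Stand-in for a reasoner: pairs for which P(x, y) is assumed to be entailed.
ENTAILED = {("Hxk1", "Chromosome6_Crick_strand")}

classes = ["Hxk1", "Chromosome6_Crick_strand", "Hxk1_ATP_complex"]
matches, checks = query_pattern(classes, lambda x, y: (x, y) in ENTAILED)
print(matches)
print(checks)
```

With n named classes the loop performs n*n checks, each of which is a full reasoner call in practice, which is why this approach does not scale to large ontologies.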


Page 72: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Inference of revised RDF representation

In the definition:

?X subClassOf:
part-of some ?Y or
encodes some (part-of some ?Y) or
participates-in some ?Y or
encodes some (participates-in some ?Y)

one or more of the classes in the disjunction may become unsatisfiable!
• reasoner can be used to decide which interpretation is correct
• eliminate remaining interpretations
• useful to "split" relations in RDF that have multiple conflicting meanings
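The elimination step can be mimicked with a lookup table in place of a reasoner. In the sketch below, the `KIND` and `ALLOWED_KIND` tables are toy background knowledge invented for illustration: in particular, the rule that a gene itself never participates in a process is an assumption, not a claim from the tutorial. A real system would ask an OWL reasoner whether each disjunct is satisfiable.

```python
# Toy background knowledge (assumed for illustration only).
KIND = {
    "Chromosome6_Crick_strand": "DNA",
    "Hxk1_ATP_complex": "ProteinComplex",
    "Carbohydrate_metabolism": "Process",
}

# For a gene ?X: the kind of ?Y under which each disjunct stays satisfiable.
# None means the disjunct is assumed never satisfiable for a gene.
ALLOWED_KIND = {
    "part-of some ?Y": "DNA",
    "encodes some (part-of some ?Y)": "ProteinComplex",
    "participates-in some ?Y": None,
    "encodes some (participates-in some ?Y)": "Process",
}

def surviving_interpretations(y):
    """Disjuncts that remain satisfiable for 'gene part-of y' in the toy model."""
    return [disjunct for disjunct, kind in ALLOWED_KIND.items()
            if kind is not None and kind == KIND[y]]

for target in KIND:
    print("Hxk1 part-of", target, "->", surviving_interpretations(target))
```

Under these toy constraints each original triple keeps exactly one interpretation, which is the "splitting" effect described above.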


Page 73: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Summary - RDF and OWL

RDF provides
• light-weight semantics
• fast queries
• highly scalable implementations
• large volumes of data (e.g., DBPedia, other Linked Data repositories)

OWL provides
• Constructs to formalize the intended semantics
• An OWLAPI to develop, manage, and serialize OWL ontologies
• Efficient reasoners to get inferences, compute modules and get explanations.
• syntactic subsets for better performance, albeit some inferences may be lost

Page 74: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Summary - Reasoning in OWL

• verification: reveal contradictory definitions of classes (unsatisfiable classes), conflicting conceptualizations and reveal hidden inferences (that may be considered invalid through manual verification)

• repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced

• more facts: OWL queries for relational patterns can be used to generate RDF triples that are closed against the constraints and axioms of an OWL knowledge base

• powerful queries: queries in OWL can be made for instances and for classes satisfying complex expressions


Page 75: A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

Conclusions

• ontologies are tools for better knowledge management

• ontology (philosophy) is a useful source of well-developed theories that can be applied to ontology design, but only when put into practice as a formalized ontology

• formal ontologies can help in getting us closer to the goal of large-scale integration, verification and analysis of data across domains and levels of granularity

• The combination of formal ontologies + scalable reasoning will be instrumental in making sense of the Semantic Web.
