A little more semantics goes a lot further!
Getting more out of Linked Data with OWL
Dr. Michel Dumontier
Dr. Robert Hoehndorf
Abstract
This tutorial provides detailed instruction on creating and using formalized ontologies from linked open data for advanced knowledge discovery, including consistency checking and answering sophisticated questions.
Automated reasoning in OWL offers the tantalizing possibility of advanced knowledge discovery, including verifying the consistency of conceptual schemata in information systems, verifying data integrity, and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge so that the intended semantics become explicit and the reasoning is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, the use of less expressive OWL profiles enables efficient reasoning and supports different application scenarios.
In this tutorial, we
- describe how to generate OWL ontologies from linked data
- check the consistency of knowledge
- automatically transform ontologies into OWL profiles
- use this knowledge in applications to integrate data and answer sophisticated questions across domains.
Key points:
- expressive ontologies enable data integration, verifying consistency of knowledge and answering questions
- formalization of linked data will create new opportunities for knowledge discovery
- OWL 2 profiles support more efficient reasoning and query answering procedures
- recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles
- OWL ontologies can dramatically extend the functionality of semantically-enabled web sites
OWLED2011::Dumontier|Hoehndorf::Formalizing Linked Data Tutorial 2
skills obtained
• understand the nature and capability of a formal ontology and information system
• understand the subtle differences between OWL 2 and its profiles, including differences in constructs, when to apply these profiles and how to convert ontologies into them
• understand the distinction between a class and an individual and their descriptions
• understand how to convert RDF triples in Linked Data into axioms for an OWL ontology
• understand how to execute standard reasoning services (classification, consistency checking, realization, query answering) on an OWL ontology using the OWL API and an OWL reasoner, with focus on OWL-EL ontologies and reasoners
• understand how to identify inconsistencies and simple patterns to remove or repair them
• understand how to convert large amounts of linked data into a large-scale OWL knowledge base and enable tractable reasoning over it
90 min Outline
1. Introduction (10 min)
  • case study: SGD
  • linked data vs ontology
  • RDF vs OWL
  • Motivation: can we use some features of OWL to organize, verify and exploit Linked Data?
2. Formalization
  • OWL2 – elements, expressions and axioms
  • Triples to axioms
  • Role of top level ontologies (classes + relations)
  • Axiom patterns
3. Practical Reasoning
  • classification using CEL/CB/Pellet/HermiT/...
  • OWL profiles
  • Modularization (EL Vira)
  • Diagnosis and Repair
  • Explanations
  • Inference of new triples
4. Conclusion
Saccharomyces Genome Database
A repository for all things yeast.
includes:
• molecular entities, their parts
  o chromosomes; genes, open reading frames, etc.
  o RNA, proteins; domains
• qualities, realizables (dispositions, functions)
• interactions and their participants
• complexes, their parts and their topology
• pathways and their components
• phenotypes and their basis
Hexokinase (HXK1)
The HXK1 gene encodes the HXK1 protein - which is responsible for the conversion of glucose to glucose-6-phosphate in the first step of glycolysis.
Gene (region of DNA)
Protein (macromolecule)
Questions we may want to ask about HXK1:
• What kind of thing is HXK1?
• What are the implications of being a gene?
  o In which chromosome does it appear?
  o Which entities does it encode?
• What are the implications of being a protein?
  o What is its function?
  o Where is it located in the cell?
  o If HXK1 participates in processes that involve other cellular components, where else must HXK1 be located?
• Is the HXK1 annotation consistent?
  o Does the annotation contradict common biological knowledge?
  o Is it possible for HXK1 to have multiple locations when it can only be located on one chromosome?
SGD refers to other data sources
Gene Ontology- functions, locations, processes
Ascomycetes Phenotype Ontology- experiments, interactions and phenotypes
Pubmed- abstracts of published research articles + MeSH terms
over 40 references to other molecular/data entities for which the relation is unclear…
Bio2RDF’s RDFized data fits together
syntactic integration
SGD as RDF-based Linked Open Data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
SGD is provided by Bio2RDF and forms part of the growing linked open data cloud
Semantic Integration
• Requires a level of abstraction/generalization where the relationship between each resource is formalized
  – classes
  – relations
• How do we ensure that our representation facilitates integration across datasets?
• How can we get our formalization to interoperate with ontologies?
Early conceptualization
More advanced conceptualization
Semantic Technologies: RDF vs OWL
RDF: simple triples, graph-based queries, supports very large amount of data
OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but can be rather slow
RDF-based Linked Data
• Provides the basis for simple data syndication and syntactic data integration
  o IRIs
  o Statements (aka triples) take the form of <subject> <predicate> <object>
• Easy to implement
  o stand-alone datasets
  o logical layer over databases
• Limited reasoning
  o class and property hierarchies
  o domain/range restrictions
  o can’t automatically discover inconsistency
• Standardized Queries - SPARQL
• Scalable - to billions of triples
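To make the <subject> <predicate> <object> form concrete, here is a minimal sketch of triples and wildcard pattern matching in plain Python. It is not SPARQL and not Bio2RDF's actual data: the prefixed names in `TRIPLES` and the helper `match` are assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample triples; the prefixes only imitate Bio2RDF-style names.
TRIPLES: List[Triple] = [
    ("sgd:HXK1", "rdf:type", "sgd:OpenReadingFrame"),
    ("sgd:HXK1", "sgd:isPartOf", "sgd:chromosome6_Crick"),
    ("go:0005737", "sgd:isLocationOf", "sgd:HXK1"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every given position (None is a wildcard)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything stated with HXK1 as subject.
for triple in match(TRIPLES, s="sgd:HXK1"):
    print(triple)
```

Matching of this kind retrieves what was asserted; it does not tell us whether the assertions contradict each other.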
OWL - The Web Ontology Language
• Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values
  o quantifiers (existential, universal, cardinality restriction)
  o negation
  o disjunction
  o property characteristics
  o complex classes in domain and range restrictions
  o property chains
• Advanced reasoning
Advanced Reasoning
• Consistency: determines whether the ontology contains contradictions.
• Satisfiability: determines whether classes can have instances.
• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repetitive application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to.
OWL Challenges and Solutions
Inconsistency:
• needs to be resolved to ask any questions involving the ontology
• Solution: explicitly accommodate multiple meanings, remove contradictory axioms
Unsatisfiability (of a class):
• may indicate a modelling error
• needs to be resolved to ask meaningful questions about the class
• Solution: explicitly accommodate multiple meanings, redefine class, remove contradicting class restrictions
OWL Challenges and Solutions
Scalability:
• answering OWL queries requires reasoning
• inference in OWL is highly complex (worst case: 2NEXPTIME)
• highly optimized reasoners are getting better and better, but can still be slow with large ontologies
• tractable OWL profiles (EL, QL, RL) enable more efficient and guaranteed polynomial-time inferences
• use ontology modularization approaches to increase performance
Linked data and OWL: Motivation
• use OWL reasoning to identify mistakes in RDF data
  o incorrect content of assertions
  o incorrect use of relations
  o conflicting conceptualizations
  o incorrect same-as assertions
• verify, fix and exploit Linked Data through expressive OWL reasoning
• generate/infer new triples to write back into RDF and use for efficient retrieval
Proposal: Convert RDF to OWL to perform inferences and represent inferences in RDF after classification.
OWL can help you create rich, machine-understandable descriptions!
• transform our expert knowledge into axioms and expressions that can be automatically reasoned about
  o a transcription factor is a protein that binds to DNA and regulates the expression of a gene.
  o can we mine 'omic datasets to discover which proteins are transcription factors?
• create rich expressions from combinations of classes, relations and individuals
• assert statements of truth using axioms.
Elements of OWL 2
• The “ontology” of OWL 2 consists of:
  • Classes
  • Object properties
  • Data properties
  • Individuals
  • Expressions
  • Axioms
  • Plus RDF stuff (like datatypes)
Classes and class axioms
• a class is a set of individuals that share one or more characteristics
  o e.g. a protein
• classes can be organized in a hierarchy using subClassOf axioms
  o i.e. if C1 is a subclass of C2, every member of C1 is a member of C2
  o subClassOf (protein molecule)
• special classes
  o owl:Thing is the superclass of all classes
  o owl:Nothing is the subclass of all classes and denotes the empty set
• classes can be made disjoint from one another
  o i.e. there is no member of C1 that is also a member of C2
  o disjointClasses (protein DNA)
• classes can be said to be equivalent
  o i.e. all members of C1 are members of C2 and all members of C2 are members of C1
  o EquivalentClass (Peptide Polypeptide)
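The set reading of these class axioms can be sketched with Python sets. The individuals and the `extension` table below are invented for illustration. Note the limitation: the functions only test whether this one interpretation satisfies an axiom, whereas an OWL reasoner decides what holds in every interpretation that satisfies the ontology.

```python
# One toy interpretation: each class name maps to the set of individuals in it.
# All individual names are hypothetical.
extension = {
    "molecule": {"m1", "p1", "d1"},
    "protein": {"p1"},
    "DNA": {"d1"},
    "Peptide": {"p1"},
    "Polypeptide": {"p1"},
}


def sub_class_of(c1: str, c2: str) -> bool:
    """subClassOf(C1 C2): every member of C1 is a member of C2."""
    return extension[c1] <= extension[c2]


def disjoint_classes(c1: str, c2: str) -> bool:
    """disjointClasses(C1 C2): no individual belongs to both classes."""
    return extension[c1].isdisjoint(extension[c2])


def equivalent_classes(c1: str, c2: str) -> bool:
    """EquivalentClass(C1 C2): both classes have exactly the same members."""
    return extension[c1] == extension[c2]


print(sub_class_of("protein", "molecule"))
print(disjoint_classes("protein", "DNA"))
print(equivalent_classes("Peptide", "Polypeptide"))
```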
Object Properties and axioms
• an object property OP is a relation between two individuals
  o 'has part' is an object property that denotes the mereological relation between two individuals
• OPs can be organized in a hierarchy
  o if OP2 is a subproperty of OP1 and an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y
  o subPropertyOf ('has proper part' 'has part')
  o owl:topObjectProperty, owl:bottomObjectProperty
• We can restrict the domain and range to allowed values
  o ObjectPropertyDomain ('is participant in', 'physical entity')
  o ObjectPropertyRange ('is participant in', 'process')
• We can also assert object properties to be disjoint or equivalent
description of object properties
• Inverse
  o we say that 'has part' is an inverse of 'is part of'
  o we can also refer to this as inv('is part of')
• Symmetric
  o applies to cases where the inverse relation is the very same relation
  o e.g. the inverse of 'is related to' is 'is related to'
• Transitive
  o a relation is transitive if, whenever an individual x is connected by it to an individual y that is connected by it to an individual z, x is also connected by it to z
  o e.g. 'has part' is transitive
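As a small illustration of inverse and transitive properties, a relation can be treated as a set of (x, y) pairs. This is a sketch over invented individuals ("cell", "nucleus", "chromosome"), not how a reasoner is implemented.

```python
from typing import Set, Tuple

Pairs = Set[Tuple[str, str]]


def inverse(pairs: Pairs) -> Pairs:
    """inv(R): swap the two ends of every pair."""
    return {(y, x) for (x, y) in pairs}


def transitive_closure(pairs: Pairs) -> Pairs:
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new


# Hypothetical 'has part' assertions between individuals.
has_part: Pairs = {("cell", "nucleus"), ("nucleus", "chromosome")}

is_part_of = inverse(has_part)
print(sorted(transitive_closure(has_part)))
print(sorted(is_part_of))
```

Because 'has part' is declared transitive, the pair ("cell", "chromosome") follows even though it was never asserted.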
description of object properties
• Reflexive
  o a reflexive relation relates every individual to itself
  o e.g. 'has part' is reflexive because a protein has itself as a part
• Functional
  o each individual is related to at most one individual, so all values of the relation for a given individual must be the same individual
  o e.g. 'has unique identifier'
• Inverse Functional
  o each individual is the value of the relation for at most one individual, so all individuals related to a given value must be the same individual
  o e.g. 'is unique identifier of'
Class Expressions
Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, data properties, individuals)
Protein subClassOf
  molecule and ‘has proper part’ min 2 ‘amino acid residues’
Combinations specified using logical operators
• conjunction (and), disjunction (or), negation (not)
Object or data property expressions provide a qualified cardinality over the relation
  o minimum: rel min # Y
  o maximum: rel max # Y
  o exact: rel exactly # Y (minimum + maximum)
  o some: rel min 1 Y
Class Expressions
  o The quantifications can be qualified by the object type
  o rel only Y: the only values allowed are of type Y
• To form complex class expressions like
  o 'molecule' and not 'dna'
  o 'has part' min 2 'amino acid'
  o 'is located in' only ('nucleus' or 'cytoplasm')
• and be expressed as axioms in the ontology
Protein subClassOf
  molecule and ‘has proper part’ min 2 ‘amino acid residues’
Transcription Factor equivalentClass
  ‘protein’
  and ‘has disposition’ some ‘to bind to DNA’
  and ‘has function’ some ‘to regulate gene expression’
Triples to axioms
Convert RDF triples into OWL axioms.
Triple in RDF:
<Nucleus> <partOf> <Cell>
• Nucleus and Cell are classes
• partOf is a relation between 2 classes
• intended meaning: every instance of Nucleus is partOf some instance of Cell
• formalize as OWL axiom:
Nucleus subClassOf partOf some Cell
Triples to axioms
Triple in RDF:
<Cytosol> <isLocationOf> <HXK1>
• Cytosol and HXK1 are classes
• isLocationOf is an axiom pattern involving 2 classes
• intended meaning:
  o every instance of HXK1 is located at some instance of Cytosol
• not intended:
  o for every instance of Cytosol, there is an instance of HXK1 located in it
HXK1 subClassOf
  hasLocation some Cytosol
  inv(isLocationOf) some Cytosol
Triples to axioms
Convert RDF triples into OWL axioms.
Triple in RDF: <C1 R C2>
• C1 and C2 are classes, R a relation between 2 classes
• intended meaning (one of):
  o C1 SubClassOf: C2
  o C1 SubClassOf: R some C2
  o C1 SubClassOf: R only C2
  o C2 SubClassOf: R some C1
  o C1 SubClassOf: S some C2
  o C1 DisjointFrom C2
  o C1 and C2 SubClassOf: owl:Nothing
  o R some C1 DisjointFrom: R some C2
  o C1 EquivalentClasses C2
  o ...
• in general: P(C1, C2), where P is an OWL axiom (template)
Challenge: Formalizing data requires one to commit to a particular meaning, that is, to make an ontological commitment
Triples to axioms
Formalizing RDF triples in OWL often introduces new OWL object properties.
• Which object properties should be included?
• What axioms hold for included object properties?
• Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them?
Challenges
Top-level ontologies contain generalized (domain-independent) classes and relations
They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms).
Basic classes in top-level ontologies
• Material entity
  • Example: Apple, Human, Cell, Planet
  • Has mass as a quality
  • Located in space and time
  • Independent of other entities
  • It exists in whole whenever it exists
• Quality
  • Example: mass, color, concentration
  • Dependent: always the quality of some entity
  • Quality of object: size, shape, length
  • Quality of process: duration, rate
  • Quality of quality: shade (of color), intensity
Basic classes in top-level ontologies
• Function
  • Example: to bind, to catalyze (a reaction), to kill bacteria
  • Dependent: always the function of some thing
  • Similar to a property of an object
  • Represents the potential to do something (an action) in some process
  • capabilities, dispositions and tendencies
• Process
  • Example: running a marathon, binding, cell division
  • Located in space and time
  • Independent of other entities
  • Temporally extended
Top-level ontologies make a commitment to these being different things
Material object, Process, Function and Quality are mutually disjoint.
Basic Relations in Top Level Ontologies
• Mereological: parthood
  – ‘has part’, ‘has proper part’, ‘has component part’
• Participatory
  – ‘is participant in’, ‘is agent in’, ‘is target in’
• Topology
  – ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’
• Temporal
  – ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc.
• Referential
  – ‘describes’, ‘references’, ‘represents’
Relations in top-level ontologies
• relations (object properties) in OWL hold between instances
• domain and range restrictions from top-level ontology can be applied for general relations, e.g.:
  o ‘has part’ can be restricted with "Material object" as both domain and range
  o ‘participates in’ can be restricted with a domain of "Material object" and a range of "Process"
  o re-use of relations enables inferences across resources
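How domain and range restrictions combine with disjointness to expose misuse of a relation can be sketched as follows. This is a hand-rolled toy, not an OWL reasoner: the individuals `hxk1_protein` and `glycolysis` and the functions `infer_types` and `clashes` are assumptions for illustration. In OWL, domain and range do not reject data; they let the reasoner infer types, and a clash only arises once the inferred types are declared disjoint.

```python
from typing import Dict, List, Set, Tuple

# Restrictions taken from the slide for 'participates in'.
DOMAIN = {"participates in": "Material object"}
RANGE = {"participates in": "Process"}
# Top-level disjointness between the two categories.
DISJOINT = [frozenset({"Material object", "Process"})]


def infer_types(asserted: Dict[str, Set[str]],
                facts: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Add the types implied by the domain and range of every asserted relation."""
    types = {ind: set(classes) for ind, classes in asserted.items()}
    for s, p, o in facts:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types


def clashes(types: Dict[str, Set[str]]) -> List[str]:
    """Individuals that end up in two classes declared disjoint."""
    return sorted(ind for ind, classes in types.items()
                  if any(pair <= classes for pair in DISJOINT))


# Hypothetical individuals with their asserted top-level types.
asserted = {"hxk1_protein": {"Material object"}, "glycolysis": {"Process"}}

ok = infer_types(asserted, [("hxk1_protein", "participates in", "glycolysis")])
bad = infer_types(asserted, [("glycolysis", "participates in", "hxk1_protein")])
print(clashes(ok))
print(clashes(bad))
```

Using the relation in the wrong direction forces both individuals into disjoint classes, which is the kind of inconsistency the tutorial proposes to detect.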
Enforce ontological commitment by mapping to a top-level ontology
Foundation of domain classes and relations in top-level ontology:
• every domain class becomes a subclass of a class in the top-level ontology
• every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology
• assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate)
  o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, the additional constraint can be added
Top-level ontology
Application of a top-level ontology:
• can help to make the ontological commitment that is employed within an information system explicit,
• can guarantee basic agreement about fundamental types,
• agreement about common relations,
• provides common domain and range restrictions across multiple domains, and therefore
• enables re-use of relations and types across data sources, domains, levels of granularity, information systems.
Formalization of SGD’s Linked Data
SGD uses at least the following relations in RDF:
• isPartOf
• hasParticipant
• isFunctionOf
• isLocationOf
Can we create patterns from which linked data can be appropriately formalized into OWL axioms?
axiom patterns
Formalization of SGD Linked Data
?X isPartOf ?Y
Can be translated to axiom pattern
?X subClassOf: part-of some ?Y
"part-of" is an object property contained in our top-level ontology.
Example:
HXK1 isPartOf chromosome6_Crick
translated to
HXK1 subClassOf: part-of some chromosome6_Crick
Formalization of SGD Linked Data
?X hasParticipant ?Y
translated to axiom pattern
?Y subClassOf: participates-in some ?X
"participates-in" is an object property contained in our top-level ontology.
Example:
GO:0005975 (carbohydrate metabolism) hasParticipant HXK1
translated to
HXK1 subClassOf: participates-in some GO:0005975
Formalization of SGD Linked Data
?X isLocationOf ?Y
translated to axiom schema
?Y subClassOf: located-in some ?X
Example:
GO:0005737 (cytoplasm) isLocationOf HXK1
translated to
HXK1 subClassOf: located-in some GO:0005737
What if "located-in" is not present in our top-level ontology…
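The three axiom patterns shown so far (isPartOf, hasParticipant, isLocationOf) can be written down as string templates. This is only a sketch that produces axiom text; the tutorial's own implementation is based on the OWL API, and the names `PATTERNS` and `triple_to_axiom` are made up here.

```python
# Each RDF predicate maps to an axiom template over the subject {s} and object {o}.
# Note that two of the patterns swap subject and object.
PATTERNS = {
    "isPartOf": "{s} subClassOf: part-of some {o}",
    "hasParticipant": "{o} subClassOf: participates-in some {s}",
    "isLocationOf": "{o} subClassOf: located-in some {s}",
}


def triple_to_axiom(s: str, p: str, o: str) -> str:
    """Expand one triple into axiom text using the pattern registered for its predicate."""
    if p not in PATTERNS:
        raise KeyError(f"no axiom pattern registered for predicate {p!r}")
    return PATTERNS[p].format(s=s, o=o)


print(triple_to_axiom("HXK1", "isPartOf", "chromosome6_Crick"))
print(triple_to_axiom("GO:0005975", "hasParticipant", "HXK1"))
print(triple_to_axiom("GO:0005737", "isLocationOf", "HXK1"))
```

Keeping the patterns in one table makes the ontological commitment for each predicate explicit and easy to revise.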
Formalization of SGD Linked Data
Top-level foundation for located-in relation:
• declare located-in as sub-property of part-of
  o verify how located-in is used within SGD, i.e., does located-in imply part-of?
  o counter-example: misfolded protein located-in chaperone protein, but not misfolded protein part-of chaperone protein
• create located-in as super-property of part-of in our top-level ontology:
  o does part-of imply located-in within SGD?
  o cell body part-of cell, but not cell body located-in cell
Formalization of SGD Linked Data
Top-level foundation for located-in relation:
• add located-in to our top-level ontology
  o adding the new relation allows its reuse across multiple resources
  o inclusion may require addition of further classes (e.g., spatial regions)
  o relation to part-of must be clarified (and part-of may even be replaced by located-in)
Establishing the relation between relations and classes depends on how the relations and classes are being applied.
Formalization of SGD Linked Data
Top-level foundation:
Translate
HXK1 rdf:type OpenReadingFrame
to
HXK1 subClassOf: OpenReadingFrame
OpenReadingFrame (Sequence Ontology) is a subclass of Sequence.
Formalization of SGD Linked Data
Foundation for SGD classes in top-level ontology:
• declare Sequence to be a subclass of Material object
• import (owl:imports) Sequence Ontology
• declare Biological Process (GO) subclass of Process
• declare Molecular Function (GO) subclass of Function
• import GO
• ...
to create a top-level foundation (i.e., super-class in top-level ontology for all classes) for SGD
Implementation
• expand relations in RDF based on relational patterns
• relational patterns are OWL axioms with 2 variables (which are filled by subject and object, respectively)
• implementation based on OWL API
• adopt implementation of relational patterns in OBO language (http://code.google.com/p/obo2owl/)
Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED).
paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo
BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441
Another way?
• OPPL is an abstract formalism that allows for manipulating ontologies written in OWL.
• Use OPPL to select triples and create the axioms
Operations on OWL ontologies
• Consistency: determines whether the ontology contains contradictions.
• Satisfiability: determines whether classes can have instances.
• Subsumption: is class C1 implicitly a subclass of C2?
• Classification: repetitive application of subsumption to discover implicit subclass links between named classes
• Realization: find the most specific class that an individual belongs to.
OWL reasoners can perform these operations and make the results accessible for further processing.
Practical reasoning with OWL ontologies
• Ontology editors such as Protege interface with reasoners to check consistency and class satisfiability, perform classification and realisation, and provide explanations.
• Some reasoners are set up to be used from the command line to execute requests, including SPARQL queries.
• Programmatic use of reasoners via APIs offers maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate.
OWLAPI
Classifying the ontology
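Only the slide titles survive in this transcript at this point. As a rough idea of what classification computes over named classes, the sketch below derives all implied superclasses from told subClassOf links. It is a toy in Python, not the OWL API and not an OWL reasoner: it handles only atomic subclass axioms, and the function name `classify` is an assumption. The three axioms are the ones stated elsewhere in this tutorial.

```python
from typing import Dict, Set, Tuple


def classify(told: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every named class to all of its (direct and implied) superclasses."""
    supers: Dict[str, Set[str]] = {}
    for sub, sup in told:
        supers.setdefault(sub, set()).add(sup)
        supers.setdefault(sup, set())
    changed = True
    while changed:  # repeat until no class gains a new superclass
        changed = False
        for cls in supers:
            inherited: Set[str] = set()
            for parent in supers[cls]:
                inherited |= supers[parent]
            if not inherited <= supers[cls]:
                supers[cls] |= inherited
                changed = True
    return supers


# Told axioms taken from the SGD formalization described above.
TOLD = {
    ("HXK1", "OpenReadingFrame"),
    ("OpenReadingFrame", "Sequence"),
    ("Sequence", "Material object"),
}

hierarchy = classify(TOLD)
print(sorted(hierarchy["HXK1"]))
```

A real reasoner additionally takes class expressions, property axioms and disjointness into account, which is where the computational cost comes from.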
OWL Reasoners
OWL DL Reasoners
• Pellet: Clark & Parsia, dual-licensed, Java.
• FaCT++: Manchester University, open-source, C++ with a Java API.
• HermiT: Oxford University, open-source, Java.
• Racer Pro: Racer Systems, commercial, Lisp with a Java API.
OWL Profile/subset reasoners
• Jena: Hewlett-Packard, open-source, Java.
• OWLIM: Ontotext, dual-licensed, Java.
• CB:
• CEL:
• JCEL (Pellet)
• ELLY:
Automated reasoning over SGD
• SGD in OWL contains more than 800,000 axioms
• included ontologies contain several thousand axioms
  o GO has approx. 35,000 classes
  o ChEBI contains almost 100,000 classes
  o complex definitions of classes create links between large ontologies
• Reasoning in OWL 2 DL is highly complex (worst-case 2NEXPTIME-complete).
• Consequence: OWL reasoning can rarely be employed at large scale.
• Expressive OWL reasoners do not classify the formalized SGD repository.
OWL Profiles
• OWL 2 defines three different tractable profiles:
• EL
  o polynomial time reasoning for schema and data
  o Useful for ontologies with large conceptual part
• QL
  o fast (logspace) query answering using RDBMSs via SQL
  o Useful for large datasets already stored in RDBs
• RL
  o fast (polynomial) query answering using rule-extended DBs
  o Useful for large datasets stored as RDF triples
OWL RL
Features:
• identity of classes, instances, properties
• subproperties, subclasses, domains, ranges
• union and intersection of classes (some restrictions)
• property characterizations (functional, symmetric, etc.)
• property chains
• keys
• some property restrictions (but not all inferences are possible)
Limitations:
• not all datatypes are available
• no datatype restrictions
• no minimum or exact cardinality restrictions
• maximum cardinality only with 0 and 1
• some consequences cannot be drawn
OWL EL
Features
• existential quantification to a class expression or data range
• existential quantification to an individual or a literal
• self-restriction
• enumerations involving a single individual or a single literal
• intersection of classes and data ranges
• class axioms: subClassOf, equivalence, disjointness
• property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys
• assertions (sameAs, DifferentFrom, Class, Object Property, Data Property, Negative Object/Data Property)
Not supported
• universal quantification to a class expression or a data range
• cardinality restrictions
• disjunction (union)
• class negation
• enumerations involving more than one individual
• object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional
Ontology modularization
Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology?
1. D EquivalentTo: not A (not EL)
2. C EquivalentTo: not B (not EL)
3. B subClassOf: A (EL)
Inference:
• D subClassOf: C (EL) (Inference from (1)-(3))
EL module of (1)-(3):
• {B subClassOf: A}, or
• {B subClassOf: A, D subClassOf: C}
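The inference in this example can be sanity-checked by brute force: enumerate every way of interpreting A, B, C and D as subsets of a tiny universe, keep the interpretations that satisfy axioms (1)-(3), and confirm that D is a subset of C in all of them. This is a sketch for this one example, not a reasoning procedure and not how EL Vira works; the two-element universe and the helper names are assumptions.

```python
from itertools import combinations, product

UNIVERSE = frozenset({0, 1})
# All subsets of the universe: candidate extensions for a class.
SUBSETS = [frozenset(c)
           for r in range(len(UNIVERSE) + 1)
           for c in combinations(sorted(UNIVERSE), r)]


def satisfies_axioms(A, B, C, D) -> bool:
    """(1) D EquivalentTo: not A, (2) C EquivalentTo: not B, (3) B subClassOf: A."""
    return D == UNIVERSE - A and C == UNIVERSE - B and B <= A


def entails(conclusion) -> bool:
    """True if the conclusion holds in every interpretation satisfying (1)-(3)."""
    return all(conclusion(A, B, C, D)
               for A, B, C, D in product(SUBSETS, repeat=4)
               if satisfies_axioms(A, B, C, D))


print(entails(lambda A, B, C, D: D <= C))  # D subClassOf: C
print(entails(lambda A, B, C, D: C <= D))  # the converse, for contrast
```

The entailed axiom D subClassOf: C uses only EL constructs although it depends on two non-EL axioms, which is why extracting a module from the deductive closure can retain more than a purely syntactic filter would.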
EL Vira modularization
• ontology modularization
• identify EL, QL, RL axioms in deductive closure
• retain signature of ontology
• maximality is an open problem
http://el-vira.googlecode.com
Consistency repair
• Unsatisfiable classes result from contradictory class definitions
• Conflict in asserted axioms, in imported ontologies or through combination of both
• Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc.
• Conflicting axioms may be challenging to identify!
Protege 4: Explanations
Consistency repair
Ontology repair and disambiguation
• Ontological constraints may have been too strong
• Complex relations (between classes) that are used with multiple meanings can be relaxed by explicitly introducing a disjunction that accommodates the different meanings, e.g.:
  o (1) Hxk1 part-of Chromosome6_Crick_strand
  o (2) Hxk1 part-of Hxk1_ATP_complex
  o (3) Hxk1 part-of Carbohydrate_metabolism
  o only (1) is consistent with background knowledge that Genes (as material objects) must be part of material objects (more specifically DNA), and that Genes cannot be part of protein complexes
Ontology repair and disambiguation
1. Hxk1 part-of Chromosome6_Crick_strand
2. Hxk1 part-of Hxk1_ATP_complex
3. Hxk1 part-of Carbohydrate_metabolism
part-of here means either
?X subClassOf: part-of some ?Y, or
?X subClassOf: encodes some (part-of some ?Y), or
?X subClassOf: participates-in some ?Y, or
?X subClassOf: encodes some (participates-in some ?Y)
Ontology repair and disambiguation
?X subClassOf: part-of some ?Y, or
?X subClassOf: encodes some (part-of some ?Y), or
?X subClassOf: participates-in some ?Y, or
?X subClassOf: encodes some (participates-in some ?Y)
All four interpretations are disjoint! Create new interpretation for part-of:
?X subClassOf:
  part-of some ?Y or
  encodes some (part-of some ?Y) or
  participates-in some ?Y or
  encodes some (participates-in some ?Y)
Inference of revised RDF representation
• Query OWL ontology for relational patterns that were used in relation expansion
• generates deductive closure of a set of RDF triples with respect to inferences in OWL
• naive implementation:
  o given a pattern P(?X, ?Y), substitute all combinations of named classes for ?X and ?Y
  o runtime: n*n entailment checks for n named classes
  o more efficient implementation work-in-progress
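The naive implementation can be sketched as a double loop over named classes. The entailment check is passed in as a function: in practice it would be a call to an OWL reasoner, while here `toy_oracle` simply looks axioms up in a hand-written set. The class `Chromosome`, the contents of `ENTAILED` and all function names are assumptions for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple


def naive_pattern_closure(classes: List[str],
                          pattern: str,
                          is_entailed: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Substitute every ordered pair of named classes into the pattern: n*n checks."""
    hits = []
    for x, y in product(classes, repeat=2):
        if is_entailed(pattern.format(X=x, Y=y)):
            hits.append((x, y))
    return hits


# Stand-in for a reasoner: the axioms we pretend are entailed by the ontology.
ENTAILED = {
    "HXK1 subClassOf: part-of some chromosome6_Crick",
    "HXK1 subClassOf: part-of some Chromosome",
}
calls: List[str] = []


def toy_oracle(axiom: str) -> bool:
    calls.append(axiom)  # record each check to make the n*n cost visible
    return axiom in ENTAILED


CLASSES = ["HXK1", "chromosome6_Crick", "Chromosome"]
PATTERN = "{X} subClassOf: part-of some {Y}"

pairs = naive_pattern_closure(CLASSES, PATTERN, toy_oracle)
print(pairs)
print(len(calls))
```

Each pair that passes the check can be written back as an RDF triple; with tens of thousands of classes the quadratic number of reasoner calls is what makes the naive approach impractical.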
Inference of revised RDF representation
In the definition:
?X subClassOf:
  part-of some ?Y or
  encodes some (part-of some ?Y) or
  participates-in some ?Y or
  encodes some (participates-in some ?Y)
one or more of the classes in the disjunction may become unsatisfiable!
• reasoner can be used to decide which interpretation is correct
• eliminate remaining interpretations
• useful to "split" relations in RDF that have multiple conflicting meanings
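The elimination step can be imitated with a hand-written category table in place of a reasoner. Everything in this sketch is an assumption made for illustration: the `CATEGORY` assignments, and the simplified background rules that a gene can only be part of DNA, a protein can be part of a protein complex, any material entity can participate in a process, and a gene encodes a protein. A real reasoner would derive the unsatisfiable disjuncts from the ontology's axioms.

```python
from typing import List

# Hypothetical top-level categories for the classes in the example.
CATEGORY = {
    "Hxk1": "gene",
    "Chromosome6_Crick_strand": "dna",
    "Hxk1_ATP_complex": "protein complex",
    "Carbohydrate_metabolism": "process",
}
MATERIAL = {"gene", "dna", "protein", "protein complex"}


def part_of_ok(x_cat: str, y_cat: str) -> bool:
    """Assumed background knowledge about which parthood links are admissible."""
    return (x_cat, y_cat) in {("gene", "dna"), ("protein", "protein complex")}


def participates_ok(x_cat: str, y_cat: str) -> bool:
    """Assumed: material entities participate in processes."""
    return x_cat in MATERIAL and y_cat == "process"


def surviving_interpretations(x: str, y: str) -> List[str]:
    """Return the disjuncts that do not contradict the assumed background knowledge."""
    cx, cy = CATEGORY[x], CATEGORY[y]
    encodes_protein = cx == "gene"  # assumed: a gene encodes a protein
    survivors = []
    if part_of_ok(cx, cy):
        survivors.append("part-of some ?Y")
    if encodes_protein and part_of_ok("protein", cy):
        survivors.append("encodes some (part-of some ?Y)")
    if participates_ok(cx, cy):
        survivors.append("participates-in some ?Y")
    if encodes_protein and participates_ok("protein", cy):
        survivors.append("encodes some (participates-in some ?Y)")
    return survivors


for target in ("Chromosome6_Crick_strand", "Hxk1_ATP_complex", "Carbohydrate_metabolism"):
    print(target, surviving_interpretations("Hxk1", target))
```

Under these assumptions the first two statements are left with a single reading each, while the third keeps two candidate readings, so further axioms would be needed to separate them.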
Summary - RDF and OWL
RDF provides
• light-weight semantics
• fast queries
• highly scalable implementations
• large volumes of data (e.g., DBpedia, other Linked Data repositories)
OWL provides
• Constructs to formalize the intended semantics
• An OWLAPI to develop, manage, and serialize OWL ontologies
• Efficient reasoners to get inferences, compute modules and get explanations.
• syntactic subsets for better performance, albeit some inferences may be lost
Summary - Reasoning in OWL
• verification: reveal contradictory definitions of classes (unsatisfiable classes), conflicting conceptualizations and reveal hidden inferences (that may be considered invalid through manual verification)
• repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced
• more facts: OWL queries for relational patterns can be used to generate RDF triples that are closed against the constraints and axioms of an OWL knowledge base
• powerful queries: queries in OWL can be made for instances and for classes satisfying complex expressions
Conclusions
• ontologies are tools for better knowledge management
• ontology (philosophy) is a useful source of well-developed theories that can be applied to ontology design, but only when put into practice as a formalized ontology
• formal ontologies can help in getting us closer to the goal of large-scale integration, verification and analysis of data across domains and levels of granularity
• The combination of formal ontologies + scalable reasoning will be instrumental in making sense of the Semantic Web.