http://ontologist.com 1
The Art of Building Useful Ontologies
Barry Smith
where in the body? where in the cell?
what kind of organism?
what kind of disease process?
how to create broad-coverage semantic annotation systems for biomedicine?
Unified Medical Language System, Semantic Web, ontowiki, ...
let a million flowers bloom,
and rely for integration on post hoc mappings
problem: what to do with weeds?
problem: how to support reasoning across the annotated data?
an alternative approach, for science
based on prospective standardization designed to support annotation of data in ways which will be able to support reasoning with this data
The OBO Foundry
http://obofoundry.org
a family of interoperable gold standard biomedical reference ontologies built around the Gene Ontology at its core
A prospective standard
designed to guarantee interoperability of ontologies from the very start (and to keep out weeds)
initial set of 10 criteria tested in the annotation of:
• scientific literature
• model organism databases
• life science experimental results
Ontology | Scope | URL | Custodians
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of biomedical entities | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Building out from the original GO
(rows: granularity; columns: relation to time)
Granularity | Continuant, independent | Continuant, dependent | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Ontologies being built to satisfy Foundry principles ab initio
Clinical Trial Ontology (CTO)
Common Anatomy Reference Ontology (CARO, DB1 & DB2)
Mosquito Anatomy Ontology (MAO)
Ontology for Biomedical Investigations (OBI)
Phenotypic Quality Ontology (PATO, DB1 & DB2)
Protein Ontology (PRO)
Relation Ontology (RO)
RNA Ontology (RnaO)
Foundry Ontologies in planning phase
Biobank/Biorepository Ontology (BrO, part of OBI)
Environment Ontology (EnvO)
Fish Multi-Species Anatomy Ontology (funding received; no acronym yet)
Infectious Disease Ontology (IDO)
Mouse Adult Neurogenesis Ontology (MANGO)
Xenopus Anatomy Ontology (XAO)
CRITERIA
http://obofoundry.org/
OPENNESS: The ontology is open and available to be used by all.
FORMAL LANGUAGE: The ontology is in, or can be instantiated in, a common formal language.
ORTHOGONALITY: The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
CONVERGENCE: The developers agree to work towards a single ontology for each domain.
UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
DEFINITIONS: The ontology includes textual definitions for all terms.
CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
DOCUMENTATION: The ontology is well-documented.
USERS: The ontology has a plurality of independent users.
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
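Some of these criteria, notably IDENTIFIERS and DEFINITIONS, lend themselves to mechanical checking. The sketch below assumes terms are written in the OBO flat-file format ([Term] stanzas with id:, name: and def: tags) and reports terms outside the ontology's own identifier prefix, terms lacking a textual definition, and identifiers used more than once. The function names and the "XO" prefix are hypothetical; this is not the Foundry's own review procedure, and real OBO files carry many more tags than it looks at.

```python
import re
from collections import Counter
from typing import Optional


def parse_terms(obo_text: str) -> list[dict[str, str]]:
    """Split OBO flat-file text into [Term] stanzas (tag -> first value seen)."""
    terms: list[dict[str, str]] = []
    current: Optional[dict[str, str]] = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), value.strip())
    return terms


def check_terms(obo_text: str, prefix: str) -> list[str]:
    """Report problems relevant to the IDENTIFIERS and DEFINITIONS criteria."""
    terms = parse_terms(obo_text)
    problems: list[str] = []
    for term in terms:
        term_id = term.get("id", "<missing id>")
        # IDENTIFIERS: ids should look like PREFIX:digits in the ontology's own space.
        if not re.fullmatch(rf"{re.escape(prefix)}:\d+", term_id):
            problems.append(f"{term_id}: not in the {prefix} identifier space")
        # DEFINITIONS: every term needs a def: tag with some text.
        if not term.get("def"):
            problems.append(f"{term_id}: no textual definition")
    counts = Counter(term.get("id") for term in terms)
    for term_id, n in counts.items():
        if term_id is not None and n > 1:
            problems.append(f"{term_id}: identifier used {n} times")
    return problems


sample = """\
[Term]
id: XO:0000001
name: example term
def: "An example definition." []

[Term]
id: XO:0000002
name: undefined term

[Typedef]
id: part_of
name: part of
"""

for problem in check_terms(sample, "XO"):
    print(problem)  # XO:0000002: no textual definition
```

Checks like these cover only the formal side of a criterion; whether a definition is any good still needs human review.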
ORTHOGONALITY
annotations can be additive
ontologies do not need to create tiny theories of anatomy or chemistry within themselves
modularity ensures division of labor amongst domain experts
compare: legends for maps
ontologies are legends for data
natural language labels organized in a graph-theoretic structure, designed to make the data
• cognitively accessible to human beings
• algorithmically accessible to machines
• linked up to other data resources because the same labels have been used
The OBO Foundry Idea
[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, and the GO term ‘sphingolipid transporter activity’]
The OBO Foundry Idea
[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, and the GO term ‘Holliday junction helicase complex’]
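The linking effect can be sketched in a few lines: if several databases annotate their records with the same ontology labels, inverting those annotations gives an index from each term to every record, in every database, that uses it, with no post hoc mapping. A minimal sketch, assuming each database is reduced to a mapping from record identifiers to term labels; only the database names and the two GO term labels come from the slides, while the record identifiers and which database uses which term are invented for illustration.

```python
from collections import defaultdict

# Hypothetical annotations: database -> {record id -> set of ontology term labels}.
annotations: dict[str, dict[str, set[str]]] = {
    "MouseEcotope": {"rec-1": {"sphingolipid transporter activity"}},
    "GlyProt": {"rec-7": {"sphingolipid transporter activity",
                          "Holliday junction helicase complex"}},
    "DiabetInGene": {"rec-3": {"Holliday junction helicase complex"}},
    "GluChem": {"rec-9": {"sphingolipid transporter activity"}},
}


def index_by_term(
    db_annotations: dict[str, dict[str, set[str]]],
) -> dict[str, set[tuple[str, str]]]:
    """Invert the annotations: term label -> {(database, record id)}."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for database, records in db_annotations.items():
        for record_id, terms in records.items():
            for term in terms:
                index[term].add((database, record_id))
    return dict(index)


index = index_by_term(annotations)
print(sorted(db for db, _ in index["sphingolipid transporter activity"]))
# ['GluChem', 'GlyProt', 'MouseEcotope']
```

The join works only because every database used exactly the same label; a database that wrote its own variant of the term would silently drop out of the index.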
Five bangs for your GO buck
science base
cross-species database integration (human, mouse, fly ...)
cross-granularity database integration
through links to the entities in biological reality
semantic searchability links people to software
Applications for which GO has already been used:
• integrating genomic and proteomic information from different organisms
• finding functional similarities in genes that are overexpressed or underexpressed in diseases and as we age
• predicting the likelihood that a particular gene is involved in diseases that haven't yet been mapped to specific genes
• verifying models of genetic, metabolic and gene product interaction networks
Google, April 23, 2007 (results in millions):
ontology: 14.80
“Gene Ontology”: 0.96
“Dublin Core” Ontology: 0.65
SUMO Ontology: 0.30
Reasons why GO has been successful
It is a system for prospective standardization, built from the ground up by domain specialists on the basis of what works and of real needs
It is built on the basis of community consensus, but with considerable central leadership in the imposition of best practice: authority is the only way to yield a coordinated system of interoperable ontologies
It is subject to continuous update of content, documentation and formal architecture (updates every night), in such a way as to ensure backwards compatibility with prior annotations
It was initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL, though the GO community still recommends caution)
GO has learned the lessons of successful cooperation
Clear documentation
The ontology terms chosen are already familiar
Fully open source (no secrets, thorough testing in manifold combinations with other ontologies)
Subjected to considerable third-party critique
Embraces simple rules and simple technology wherever possible, but in such a way as to create an evolutionary path to logical reasoning and integration
Prospective standardization is a good thing
Prospective standardization is the only thing which will work in mission critical domains
Prospective standardization means that certain limits to tolerance must be imposed, authorities must be recognized
But not every prospective standardization is a good thing
ISO 15926-2
http://ontology.buffalo.edu/bfo/west.pdf
Proceedings of FOIS 2006
How Not to Build Useful Ontologies
ISO/FDIS 15926-2
Lifecycle integration of process plant data including oil and gas production facilities
Heh! Let’s reinvent the wheel
What ISO 15926 Part 2 says
What it is ...
• rigorous 4D ontology
• a full ISO standard (2003)
• 201 entities in upper ontology
• some 50,000 entities in all
• limited axiomatisation
• significant industrial support from Oil and Process industries
... good for:
– integrating diverse information systems
– engineering applications
– applications involving time and space
– managing change
– integrating/analyzing mid-level ontologies
2006 NIST Upper Ontology Summit
March 14-15, 2006, Gaithersburg, MD
ISO 15926 proposed for general use as an upper level ontology – for ‘integrating diverse information systems’ and ‘integrating [and] analyzing mid-level ontologies’ without restriction.
Matthew West, “ISO 15926 – Integration of Lifecycle Data”, http://ontolog.cim3.net/file/work/UpperOntologySummit/UO-Summit-Meeting_20050315/UOS--west_20060315.ppt
[Figure: ISO 15926 as foundation. Subject areas of common interest and process areas built on ISO 15926: Common Objects, Time, Properties, Products and Materials, Organizations, Locations, Agreements, Intentional Thing, Buy/Sell, Manufacture, Accounts, General Management, Carrier, CRM, Demand, Movement, Project/Activity, Transport Constraint. Entity types can be referenced in one subject area from another. Total of 1546 entity types so far.]
“The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them.”
The importance of consensus-based uptake
An ontology is like a telephone network: it is designed to support exchange of information.
Its value depends on the number of users who agree to adopt and to help maintain this common network
Thus it depends also on the existence of a straightforward learning path for new users, and of clear and easily accessible documentation.
The importance of consensus-based uptake
is even greater in the case of an upper level ontology
which is designed to support exchange of information about all subjects
This is not a problem of money
Robust ontologies need to be thoroughly tested by being critically examined and pulled apart, and above all by being combined dynamically with other artifacts in real use cases à la GO
An upper ontology should not be proprietary
Confusion of Data Models and Ontologies
Data Model
mass of plunger: an integer
location of plunger: a string
Ontology
mass of plunger: a quality (which can be measured)
location of plunger: a place (which can have a name)
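The contrast can be made concrete in code. In the data-model reading the mass is just an int and the location just a str; in the ontology reading the mass is a quality borne by the plunger, which a measurement reports in some unit, and the location is a place that may or may not carry a name. A minimal Python sketch; the class and field names, the identifier "plunger-17" and the choice of units are assumptions for illustration and are not taken from ISO 15926.

```python
from dataclasses import dataclass
from typing import Optional


# Data-model view: fields are just typed slots in a record.
@dataclass
class PlungerRecord:
    mass: int       # an integer; the unit is not recorded anywhere
    location: str   # a string


# Ontology view: a place, which can (but need not) have a name.
@dataclass(frozen=True)
class Place:
    name: Optional[str] = None


# Ontology view: the mass is a quality that inheres in a particular plunger.
@dataclass(frozen=True)
class MassQuality:
    bearer_id: str


# A measurement reports on a quality; the unit belongs to the measurement.
@dataclass(frozen=True)
class Measurement:
    quality: MassQuality
    value: float
    unit: str


@dataclass
class Plunger:
    plunger_id: str
    location: Place

    @property
    def mass(self) -> MassQuality:
        return MassQuality(bearer_id=self.plunger_id)


plunger = Plunger("plunger-17", Place(name="Pump room B"))
reading = Measurement(plunger.mass, 2.5, "kg")
print(reading.quality == plunger.mass)  # True
```

Two measurements in different units (say 2.5 kg and 2500 g) then refer to one and the same quality, which the integer field in the data model cannot express.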
Is ISO 15926 an ontology or a data model?
I do not know
Principle of intelligibility
an ontology that is advocated for general use should be understandable to its intended users. Its features should be explained in clear, simple English, extended where necessary with technical terms.
First Great Mystery
Of the 201 terms included in the ISO 15926 upper-level ontology, 88 are of the form ‘class of X’, for example:
class_of_composite_material, class_of_compound, class_of_dimension_for_shape, class_of_feature, class_of_feature_whole_part, class_of_functional_object, class_of_inanimate_physical_object, class_of_indirect_connection
Definition of ‘class’
A <class> is a <thing> that is an understanding of the nature of things and that divides things into those which are members of the class and those which are not according to one or more criteria.
Example: ‘Centrifugal pump is a <class>’.
What logic governs classes in ISO 15926?
Not, say, ZF set theory
but the theory of ‘non-well-founded sets’ devised for the special purposes of logical modeling of certain non-terminating computational processes;
allows sets to contain themselves, thereby generating infinitely descending chains of the form:
… ∈ A ∈ A ∈ A ∈ A ∈ A
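In ZF set theory the axiom of foundation forbids any infinite descending membership chain, so no set can be a member of itself. In non-well-founded set theory (for example under Aczel's anti-foundation axiom) a set can be pictured as a graph whose edges point from a set to its members; a set A = {A} is then a node with an edge to itself, and following the edges produces the chain above without end. A minimal sketch, assuming a dict-of-lists membership graph with hypothetical node names: for a finite graph, a set is well-founded exactly when no cycle is reachable from it.

```python
def is_well_founded(members: dict[str, list[str]], root: str) -> bool:
    """True if no membership cycle is reachable from root (finite graphs only).

    members maps each set's name to the names of its members;
    an edge x -> y means y ∈ x.
    """
    visiting: set[str] = set()   # nodes on the current depth-first path
    finished: set[str] = set()   # nodes already shown to be well-founded

    def visit(node: str) -> bool:
        if node in finished:
            return True
        if node in visiting:
            return False  # back edge: an infinite descending ∈-chain exists
        visiting.add(node)
        ok = all(visit(child) for child in members.get(node, []))
        visiting.discard(node)
        if ok:
            finished.add(node)
        return ok

    return visit(root)


# Von Neumann ordinals: 0 = {}, 1 = {0}, 2 = {0, 1}. Well-founded.
ordinals = {"2": ["0", "1"], "1": ["0"], "0": []}
# A = {A}: the set that contains itself.
self_member = {"A": ["A"]}

print(is_well_founded(ordinals, "2"))     # True
print(is_well_founded(self_member, "A"))  # False
```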
The principle of simple tools
An ontology is an artifact created to support exchange of information; it is not the place to try out the latest new bits of mathematics you learned about last week
But worse
ISO 15926 complicates its theory of classes by allowing classes with both actual and possible members:
‘Although there is only one <class> that has no members, there can be a <class> that has no members in the actual world, but which does have members in other possible worlds.’
ISO 15926 does not appeal to any standard theory of modal logic
The principle of re-using available resources
if an ontology deals with what is dealt with perfectly well already in some recognized resource, then it should utilize this recognized resource.
Example
There is a perfectly good theory of relations, ranges, domains, ordered pairs, and of the transitivity, symmetry, etc. of such relations, which is part of standard set theory.
What does ISO 15926-2 do here?
‘class_of_relationship_with_related_end_1’
DEFINITION: a <class_of_relationship> where a particular <thing> is related in the <class_of_relationship>, rather than the members of a <class>. The related <thing> plays the <role_and_domain> indicated by the class_of_end_1 attribute.
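For comparison, this is what the standard set-theoretic treatment already provides: a binary relation is a set of ordered pairs, its domain and range are the sets of first and second elements, and transitivity and symmetry are simple conditions on those pairs. A minimal Python sketch; the function names and the example pairs (valve seat, valve, pipeline, pump, flange) are illustrative choices, not part of ISO 15926.

```python
Pair = tuple[str, str]


def domain(relation: set[Pair]) -> set[str]:
    """Set of first elements of the ordered pairs."""
    return {x for x, _ in relation}


def range_of(relation: set[Pair]) -> set[str]:
    """Set of second elements (named range_of to avoid shadowing built-in range)."""
    return {y for _, y in relation}


def is_symmetric(relation: set[Pair]) -> bool:
    """Every (x, y) is matched by (y, x)."""
    return all((y, x) in relation for x, y in relation)


def is_transitive(relation: set[Pair]) -> bool:
    """Whenever (x, y) and (y, z) are present, so is (x, z)."""
    return all((x, z) in relation
               for x, y in relation
               for y2, z in relation if y == y2)


part_of = {("valve seat", "valve"), ("valve", "pipeline"), ("valve seat", "pipeline")}
connected_to = {("pump", "flange"), ("flange", "pump")}

print(is_transitive(part_of), is_symmetric(part_of))  # True False
print(is_symmetric(connected_to))                      # True
```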
The principle of terminological moderation
Stay as close as possible to the terms already used by your intended audience and to their already established meanings. Use only terms for which either (1) there is a reasonable expectation that intended users of the ontology will have a need for them, or (2) such terms are required to fill gaps in the ontology in order to create a complete hierarchy.
The principle of intelligible definitions
Use definitions which are both (1) humanly intelligible (to avoid error in human use and maintenance) and (2) formally specifiable (as far as possible in such a way as to support one or other standard type of software e.g. for error-checking of the ontology).
The principle of terminological coherence
For any expression ‘E’ in an ontology, ‘E’ means E.
The principle of univocity
Each expression in an ontology should have the same meaning on every occasion of use.
Hence:
an ontology should construct its complex terms in such a way that their constituent parts preserve their ordinary meanings. In the ISO 15926 documentation however the expression ‘individual’ is often used to mean, not: individual, but rather: possible individual. Thus:
‘class_of_individual’ = ‘a class whose members are instances of <possible_individual>’.
‘possible individual’ = ‘thing that exists in space and time’.
Follow rules for formulating definitions
If ‘A’ does not mean: A, but rather: possible A, then ‘possible A’ itself means something like: possible possible A, and so on, ad exasperandum.
‘Class_of_class_of_X’ terms
*class_of_class_of_composition
class_of_class_of_definition
class_of_class_of_description
class_of_class_of_identification
class_of_class_of_individual
class_of_class_of_information_representation
class_of_class_of_relationship
class_of_class_of_relationship_with_signature*,†
class_of_class_of_representation
class_of_class_of_representation_translation
class_of_class_of_responsibility_for_representation
class_of_class_of_usage_of_representation
*have no corresponding ‘class of’ term in the ontology, †contains a reference to such a term in its definition
‘Class_of_class_of_X’ terms in ISO 15926
if one needs to iterate the ‘class_of’ operator, then why not do this by means of some general facility, rather than by giving names in ad hoc fashion to just those ‘class of class of’ terms one thinks one needs?
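One way to read this suggestion: treat class_of as an operator that can be applied any number of times, so that ‘class of class of composition’ is simply class_of applied twice to composition, rather than a separately named and separately defined term. A minimal sketch, assuming classes are modelled extensionally as Python frozensets; the helper names and the tuple encoding of a composition are hypothetical, and the sketch ignores ISO 15926's possible-world semantics.

```python
from collections.abc import Callable, Hashable

Predicate = Callable[[Hashable], bool]


def class_of(base: Predicate, depth: int = 1) -> Predicate:
    """Membership test for class_of applied `depth` times to `base`.

    Depth 0 is the base predicate itself; at depth n > 0 a thing qualifies
    when it is a class (here: a frozenset) all of whose members qualify at
    depth n - 1.
    """
    if depth == 0:
        return base

    inner = class_of(base, depth - 1)

    def test(thing: Hashable) -> bool:
        return isinstance(thing, frozenset) and all(inner(m) for m in thing)

    return test


# Hypothetical base predicate: a composition encoded as (part, "part_of", whole).
def is_composition(thing: Hashable) -> bool:
    return isinstance(thing, tuple) and len(thing) == 3 and thing[1] == "part_of"


fact = ("valve seat", "part_of", "valve")
a_class = frozenset({fact})                 # an instance of class_of_composition
a_class_of_classes = frozenset({a_class})   # an instance of class_of_class_of_composition

class_of_class_of_composition = class_of(is_composition, 2)
print(class_of_class_of_composition(a_class_of_classes))  # True
print(class_of_class_of_composition(a_class))             # False
```

One general operator covers every depth, so nothing has to be named in advance. Following the usual set-theoretic convention, the empty class qualifies vacuously at every depth.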
<class_of_class_of_composition>
DEFINITION: a <class_of_class_of_relationship> whose members are instances of <class_of_composition>. It indicates that a member of a member of the class_of_class_of_part is a part of a member of an instance of the class_of_class_of_whole,
EXAMPLE: Toxicity description is a class_of_class_of_part of a material data sheet, where the description “has carcinogenic components” is a class_of_part on the Mogas Material Safety Data Sheet, and copy #5 of the Mogas Material Safety Data Sheet has “has carcinogenic components” as a part.
From this we learn that: a description is a class ...
Believe me, this is not the best way to deal with part-whole relations in an ontology
[Figure: an anatomy ontology fragment around the pleura, showing is_a and part_of links among Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Visceral Pleura, Parietal Pleura, Mediastinal Pleura, Mesothelium of Pleura, Pleural Cavity and Interlobar recess]
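The alternative the figure points to is to state part-whole and subtype facts with two simple, reusable relations, part_of and is_a, and to let software compute their consequences, since both relations are transitive. A minimal sketch; the six edges below are illustrative assumptions chosen for the example, not an extract from any published anatomy ontology, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative edges only (assumed for this example).
EDGES: list[tuple[str, str, str]] = [
    ("Mediastinal Pleura", "part_of", "Parietal Pleura"),
    ("Parietal Pleura", "part_of", "Pleura"),
    ("Visceral Pleura", "part_of", "Pleura"),
    ("Pleural Sac", "is_a", "Serous Sac"),
    ("Serous Sac", "is_a", "Organ"),
    ("Organ", "is_a", "Anatomical Structure"),
]


def ancestors(term: str, relation: str,
              edges: list[tuple[str, str, str]] = EDGES) -> set[str]:
    """All terms reachable from `term` via `relation` (its transitive closure)."""
    graph: dict[str, list[str]] = defaultdict(list)
    for child, rel, parent in edges:
        if rel == relation:
            graph[child].append(parent)
    seen: set[str] = set()
    stack = list(graph[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


print(sorted(ancestors("Mediastinal Pleura", "part_of")))
# ['Parietal Pleura', 'Pleura']
print(sorted(ancestors("Pleural Sac", "is_a")))
# ['Anatomical Structure', 'Organ', 'Serous Sac']
```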
The principle of non-circularity
avoid circular definitions; and, a fortiorissimo, avoid nonsense-definitions of the forms:
‘an a is the b of an a’
(A disease is the observation of a disease [HL7])
or:
‘an a is an a which is b’
(A person is a person with an identity document [HL7])
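The crudest form of circularity can be caught mechanically: a definition whose body repeats the term being defined. A minimal sketch using whole-word, case-insensitive matching on the two HL7 examples quoted above, plus a shortened form of the ISO 15926 definition of <event> as a non-circular control. It is only a heuristic (it ignores plurals, synonyms, and circles that run through several definitions), and the function names are hypothetical.

```python
import re


def mentions(definition: str, term: str) -> bool:
    """Whole-word, case-insensitive test for `term` occurring in `definition`."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None


def circular_terms(definitions: dict[str, str]) -> list[str]:
    """Terms whose definition uses the term itself."""
    return [term for term, body in definitions.items() if mentions(body, term)]


hl7_style = {
    "disease": "the observation of a disease",
    "person": "a person with an identity document",
    "event": "a possible individual with zero extent in time",
}
print(circular_terms(hl7_style))  # ['disease', 'person']
```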
The ISO 15926 tiny theory of arithmetic
DEFINITION: A <class_of_number> is a <class_of_class> whose members are members of <arithmetic_number>
DEFINITION: An <integer_number> is an <arithmetic_number> that is an integer number.
The ISO 15926 tiny theory of geometry
DEFINITION: A <class_of_dimension_for_shape> is a <class_of_class_of_relationship> that indicates that members of the class_of_shape have a dimension that is a member of the class_of_dimension.
ELUCIDATION: Specifying that members of the “class of circle” have members of “class of diameter” is an instance of <class_of_dimension_for_shape>.
DEFINITION: a <dimension_of_shape> is a <class_of_class_of_relationship> that indicates that members of the <shape_dimension> are dimensions of the <shape> members.
EXAMPLE: The sets of 10m lines that are diameters of 10m circles is an example of <dimension_of_shape>
I think this means:
Circles have diameters.
The principle of non-subjective definitions
When formulating definitions avoid the use of phrases like ‘which may ...’, ‘that indicates …’, ‘… characterize …’, ‘an aspect of …’, ‘may have ...’, which invite subjective interpretation.
DEFINITION: A <feature_whole_part> is an <arrangement_of_individual> that indicates that the part is a non-separable, contiguous part of the whole.
DEFINITION: A <class_of_relationship_with_signature> is a <class_of_relationship> that may have a <role_and_domain> specified for each end.
The ISO 15926 tiny theory of physics
DEFINITION: A <class_of_sub_atomic_particle> is a <class_of_arranged_individual> whose members are constituent particles of atoms.
EXAMPLE: Proton, electron, meson, neutron, positron, muon, quark, and neutrino can be represented by instances of <class_of_sub_atomic_particle>
DEFINITION: An <arranged_individual> is a <possible_individual> that has parts that play distinct roles with respect to the whole. The qualities of an <arranged_individual> are distinct from the qualities of its parts.
What are the parts of a neutrino? What distinct roles do they play?
The principle of non-redundant definitions
do not include clauses in definitions which contribute nothing to the application of the definition.
DEFINITION: An <event> is a <possible_individual> with zero extent in time. An <event> is the temporal boundary of one or more <possible_individual>s, although there may be no knowledge of these <possible_individual>s.
Problems with ISO Standardization
Adoption by ISO does not guarantee that an artifact satisfies all the requirements which might reasonably be placed on an international standard.*
*See “Wüsteria”, Stud Health Technol Inform, 2005;116:647–652.
ISO standardization may bring costs
– harder to correct errors*
– often involves less-than-ideal compromises (since adoption by ISO requires compatibility with prior ISO standards)
*Some good news on this front re Lifecycle Integration Schema:
http://www.tc184-sc4.org/wg3ndocs/wg3n1328/lifecycle_integration_schema.html (not corrected since 2003)
More good news: Norwegian Truth
http://projects.dnv.com/reference_data/RD7Browser/
RD 7 Browser Version 2007/1
class_of_arranged_individual
DNV: A class of individuals that are arranged from parts
Lifecycle Integration Schema: A <class_of_arranged_individual> is a <class_of_individual> whose members are an arrangement of components.
but also problems:
DNV: Class of Individual =Def. A Class of Individual that is the part (component) in an assembly
DNV: Individual =Def. A Role used in ST 3451 and 3601 to designate the object that has the given status or material type, respectively
Perhaps I am entirely mistaken
Perhaps ISO 15926-2 is an excellent artifact of its kind – a ‘data model’; I do not know
But it is not an excellent ontology
What the oil and gas industry tells me
The ‘upper ontology’ of ISO 15926 is hardly used
The lower branches of ISO 15926 are useful as a controlled vocabulary (because what came before was so bad) – these lower levels deal with pump, flange, valve, ...
What the oil and gas industry needs
rules governing prospective standards for controlled vocabularies for annotations of data which will support not only search but also interoperability of and reasoning with this data
‘interoperability’ here means not only between machines but also between people who need to build and use an ontology
National Center for Ontological Research (http://ncor.us) in conjunction with various organizations, including the GO consortium and NIST, is attempting to formulate, test and disseminate such rules