OpenGALEN
Ontologies, Clinical and Genomic Information
How to say what we mean and mean what we say
Opportunities & Pitfalls
Alan Rector, Jeremy Rogers, Chris Wroe
Information Management Group / Bio Health Informatics Group
Department of Computer Science, University of Manchester
[email protected]
www.clinical-escience.org
www.co-ode.org
www.opengalen.org
protege.stanford.org

What Is An “Ontology”?
• Ontology (Socrates & Aristotle 400-360 BC)
– The study of being
• Word borrowed by computing for the explicit description of the conceptualisation of a domain:
– concepts (“entities”)
– properties and attributes of concepts
– constraints on properties and attributes
– Individuals (often, but not always)
• An ontology defines
– a common vocabulary
– a shared understanding
– a classification
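The ingredients listed on this slide (concepts, properties, constraints, individuals) can be made concrete with a minimal Python sketch. The class names `Concept`, `Property`, `Individual` and the helper `allowed` are assumptions made for illustration, as is the toy clinical content; none of this is OpenGALEN or OWL syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A category (class/type/entity), optionally placed under a parent."""
    name: str
    parent: Optional["Concept"] = None

    def is_a(self, other: "Concept") -> bool:
        """True if this concept is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Property:
    """A relation between concepts, constrained by domain and range."""
    name: str
    domain: Concept
    range_: Concept


@dataclass(frozen=True)
class Individual:
    """A specific thing that is an instance of a concept."""
    name: str
    concept: Concept


def allowed(subject: Individual, prop: Property, target: Individual) -> bool:
    """Constraint check: may `subject --prop--> target` be asserted?"""
    return subject.concept.is_a(prop.domain) and target.concept.is_a(prop.range_)


# Toy content, for illustration only.
disease = Concept("Disease")
diabetes = Concept("Diabetes", parent=disease)
body_part = Concept("BodyPart")
pancreas = Concept("Pancreas", parent=body_part)
has_location = Property("has_location", domain=disease, range_=body_part)

case_1 = Individual("diabetes_case_1", diabetes)
organ_1 = Individual("pancreas_1", pancreas)

print(allowed(case_1, has_location, organ_1))  # True
print(allowed(organ_1, has_location, case_1))  # False
```

The parent links give the classification, the names give the shared vocabulary, and the domain/range check is one simple kind of constraint.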

Sharing info, sharing meaning
Metadata
• Data describing the content and meaning of resources and services.
• But everyone must speak the same language…
Terminologies
• Shared and common vocabularies
• For search engines, agents, curators, authors and users
• But everyone must mean the same thing…
Ontologies
• Shared and common understanding of a domain
• Essential for search, exchange and discovery
[Diagram: multiple service providers]

Measure the world… quantitative models (not ontologies)
• Quantitative
– Numerical data:
    • 2mm, 2.4V, between 4 and 5 feet
– Unambiguous tokens
– Main problem is accuracy at initial capture
– Numerical analysis (e.g. statistics) well understood
• Examples:
– How big is this breast lump?
– What is the average age of patients with cancer?
– How much time elapsed between original referral and first appointment at the hospital?

Describe the world – ontologies
• Qualitative
– Descriptive data
    • Cold, colder, blueish, not pink, drunk
– Ambiguous tokens
    • What’s wrong with being drunk?
    • Ask a glass of water.
– Accuracy poorly defined
• More examples
– How pleomorphic are the cells in the biopsy?
– What is a protein’s function?
– What is the derivation of a tissue?

Why Develop an Ontology?
Naming, Classifying, Indexing
• To share common understanding of the structure of descriptive information
– among people
– among software agents
– between people and software
• To enable reuse of domain knowledge
– to introduce standards to allow interoperability
• To index and annotate other resources
Semantic Interoperability
Foundation of the Semantic Web/Grid

More Reasons
• To make domain assumptions explicit
– easier to change domain assumptions (consider a genetics knowledge base)
– easier to understand and update legacy data
• To separate domain knowledge from the operational knowledge
– re-use domain and operational knowledge separately (e.g., configuration based on constraints)
• To manage the combinatorial explosion

A semantic continuum
[Mike Uschold, Boeing Corp]
• Implicit: shared human consensus
• Informal (explicit): text descriptions
– Pump: “a device for moving a gas or liquid from one place or container to another”
• Formal (for humans): semantics hardwired; used at runtime
• Formal (for machines): semantics processed and used at runtime
– (pump has (superclasses (…))
Further to the right:
• Less ambiguity
• Better inter-operation
• More robust – less hardwiring
• More difficult

An Ontology should be just the Beginning
[Diagram: Ontologies at the centre, with arrows labelled “Declare structure” and “Provide domain description” leading to:]
• Databases
• Knowledge bases
• Software agents
• Problem-solving methods
• Domain-independent applications
• The “Semantic Web”

What an Ontology Isn’t
(“It won’t make the coffee”)
• A database
– Ontologies are about categories/classes/types/concepts/entities, not instances
    • ABOUT diseases, genes, proteins, ...
    • NOT ABOUT specific patients, samples, studies, …
• A database/EHR schema
– An ontology is about meaning rather than storage
    • Although ontology technologies are a means for merging schemas
• A decision support/protocol management system
– The entities used in the rules, not the rules
• A metadata schema
– The entities used in the metadata, not the schema itself
• A lexicon
– Meaning rather than language
    • But every ontology needs language tools

Ontology Technologies
• Description logics (DLs), OWL
– Designed to provide logical support for automatic classification and consistency checking
    • Designed for sharing and software engineering
    • Leverage off Semantic Web / Grid community
– But not everything in OWL is an ontology
• RDF(S)
• Specialised for groups
– DAGEdit and other OBO tools; FMA explorer, …
• UML
– Carefully developed UML models convey much information for an ontology
    • But support only very simple inference and checking

Why it’s hard (1)
• Language is slippery & local; Rigour & logic are hard
– Classification is too easy for people (to do badly)
    • But logical/computational properties unintuitive
– Combinatorial explosions
– Philosophical & “religious” differences
• Information capture
– Data quality
– Tools & environments
• Different points of view
– Oncology, Cardiology, …
– Adult, developmental, aetiological, …
– Clinical, genetic, genomic, …

Why it’s hard (2)
• Need a combined model of meaning
– The EHR/Database holding the ontology PLUS the ontology held
• Hard to scope – easy to do too much
– “Just in time” ontology
    • Better in the bio than the medical community
• Software engineering methods poorly understood

Classification is easy for people (to do badly)
“On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance”
From The Celestial Emporium of Benevolent Knowledge, Borges

Avoiding combinatorial explosions
• The “Exploding Bicycle”: from “phrase book” to “dictionary + grammar”
– 1980 - ICD-9 (E826) 8
– 1990 - READ-2 (T30..) 81
– 1995 - READ-3 87
– 1996 - ICD-10 (V10-19 Australian) 587
    • V31.22 Injury or accident to the occupant of three-wheeled motor vehicle in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
– and meanwhile elsewhere in ICD-10
    • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
    • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
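The arithmetic behind the “Exploding Bicycle” can be sketched in a few lines of Python. The axes and values below are made up for illustration and are not the real ICD-10 axes. The point is only that a “phrase book” has to enumerate the product of the axes, while a “dictionary + grammar” stores their sum and composes on demand.

```python
from itertools import product
from math import prod

# Hypothetical description axes for a transport accident (illustrative only).
axes = {
    "victim_vehicle": ["pedal cycle", "three-wheeled motor vehicle", "car", "bus"],
    "counterpart": ["pedal cycle", "pedestrian", "car", "fixed object"],
    "victim_role": ["driver", "passenger", "person on outside of vehicle"],
    "traffic": ["traffic accident", "nontraffic accident"],
    "activity": ["sports", "leisure", "working for income", "resting", "other"],
}

# "Phrase book": one pre-coordinated entry per combination of values.
phrase_book = [", ".join(combo) for combo in product(*axes.values())]

# "Dictionary + grammar": each primitive is stored once and combined on demand.
dictionary_size = sum(len(values) for values in axes.values())

# The phrase book grows as the product of the axis sizes.
assert len(phrase_book) == prod(len(values) for values in axes.values())

print(len(phrase_book))  # 480
print(dictionary_size)   # 18
```

Adding one more five-valued axis would multiply the phrase book by five but add only five entries to the dictionary.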

The ontology nested in the EHR
the EHR (HL7 RIM)
[moodCode=“Event” subject=“Relative” code={ }]
diabetes (subject person_in_family)
the ontology (SNOMED-CT)
<family_hx (assoc_find Diabetes)>
the combined meaning
What is legal? Required? Mandatory? …

Developing Software Engineering Methodologies for Ontologies:
• Building a life cycle
– Use/test cases & exemplars
– Identifying problems – alternative solutions - exploring consequences – deciding amongst alternatives
– Specifying solutions
    • Human and machine readable form
– Setting conformance tests for specifications
    • Building reference implementations
– Monitoring for problems
    • Recording of problems and changes

Logic-based Ontologies: Conceptual Lego
[Figure: Lego-style building blocks labelled hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]

Logic-based Ontologies: Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”

Logical Constructs build complex concepts from modularised primitives
Primitives: Genes, Species, Protein, Function, Disease
Composed concepts:
– Gene in humans
– Protein coded by gene in humans
– Function of Protein coded by gene in humans
– Disease caused by abnormality in Function of Protein coded by gene in humans
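One way to read this slide is as nested expressions built from a handful of primitives. The Python sketch below assumes hypothetical helper names (`Primitive`, `Which`, `describe`, `primitives_used`); it is neither GALEN nor OWL notation, only an illustration of composing complex concepts from modular parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """An undefined building block, e.g. 'gene' or 'protein'."""
    name: str


@dataclass(frozen=True)
class Which:
    """A base concept restricted by a relation to some filler concept."""
    base: object
    relation: str
    filler: object


def describe(concept) -> str:
    """Render a concept expression as a readable phrase."""
    if isinstance(concept, Primitive):
        return concept.name
    return f"{describe(concept.base)} {concept.relation} {describe(concept.filler)}"


def primitives_used(concept) -> set:
    """Collect the names of all primitives an expression is built from."""
    if isinstance(concept, Primitive):
        return {concept.name}
    return primitives_used(concept.base) | primitives_used(concept.filler)


gene, humans = Primitive("gene"), Primitive("humans")
protein, function, disease = Primitive("protein"), Primitive("function"), Primitive("disease")

# Each new concept re-uses the previous one as a module.
gene_in_humans = Which(gene, "in", humans)
coded_protein = Which(protein, "coded by", gene_in_humans)
protein_function = Which(function, "of", coded_protein)
linked_disease = Which(disease, "caused by abnormality in", protein_function)

print(describe(linked_disease))
print(sorted(primitives_used(linked_disease)))
```

Five primitives are enough to express the most complex concept on the slide, and changing `gene_in_humans` would change every concept built on it.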

Normalising (untangling) Ontologies
[Figure: hierarchy axes labelled Structure, Function and Part-whole]

A simplified example: Build a simple tree
easy to maintain

If you want more abstractions, just add new definitions
(re-use existing data)
“Diseases linked to abnormal proteins”

And again – even for a quite different category
“Diseases linked to genes described in the mouse”
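The idea of adding abstractions as new definitions over unchanged data can be sketched as follows. The records, field names and the `classify` helper are placeholders invented for this illustration; the two labels are the categories quoted on the slides.

```python
# Made-up records; disease names and field names are placeholders.
RECORDS = [
    {"disease": "disease A", "protein_state": "abnormal", "gene_described_in": {"human", "mouse"}},
    {"disease": "disease B", "protein_state": "normal", "gene_described_in": {"human"}},
    {"disease": "disease C", "protein_state": "abnormal", "gene_described_in": {"human"}},
]

# Each new abstraction is only a new definition over the same records.
DEFINITIONS = {
    "Diseases linked to abnormal proteins":
        lambda r: r["protein_state"] == "abnormal",
    "Diseases linked to genes described in the mouse":
        lambda r: "mouse" in r["gene_described_in"],
}


def classify(records, definitions):
    """Group disease names under every definition they satisfy."""
    return {
        label: sorted(r["disease"] for r in records if test(r))
        for label, test in definitions.items()
    }


result = classify(RECORDS, DEFINITIONS)
for label, diseases in result.items():
    print(label, "->", diseases)
```

A third category would need one more entry in `DEFINITIONS` and no change to the records, which is the re-use the slides point at.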

Ontologies and Reference Information Resources
• An ontology is just one part
– Naming - Definitions & necessary conditions
– Classification
– Indexing
• Knowledge bases
– What we know about those entities – what is true in general
• Databases
– What we know about individuals
– Instance stores – specialised databases that link to ontologies
• Plus
– Lexicons
– Metadata
– Mappings

[Diagram of linked resources:]
• Data store: data on individuals
• Knowledge base: prototypical knowledge
• Ontology: definitional knowledge
• Meta data: annotation
• Linguistic knowledge

Example 1: Indexing Drug Contraindications
(or guidelines or information or…)
[Diagram:]
• beta blocker + asthma: “use of beta blocker in asthma” indexes a serious contraindication
• cardioselective beta blocker + asthma: “use of cardioselective beta blocker in asthma” indexes a mild contraindication
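The indexing idea can be sketched as a lookup that walks up the kind-of hierarchy and returns the entry attached to the most specific matching concept. The hierarchy, the `INDEX` table and the `advice` function below are assumptions made for illustration; this is a toy, not clinical guidance and not the authors' implementation.

```python
# Toy kind-of hierarchy: child -> parent (illustrative only).
PARENT = {
    "atenolol": "cardioselective beta blocker",
    "propranolol": "beta blocker",
    "cardioselective beta blocker": "beta blocker",
    "beta blocker": "drug",
    "asthma": "respiratory disease",
}

# Index entries hang off the concept pair they apply to.
INDEX = {
    ("beta blocker", "asthma"): "serious contraindication",
    ("cardioselective beta blocker", "asthma"): "mild contraindication",
}


def ancestors(concept):
    """The concept itself followed by its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def advice(drug, condition):
    """Return the entry indexed at the most specific matching pair, or None."""
    for d in ancestors(drug):
        for c in ancestors(condition):
            if (d, c) in INDEX:
                return INDEX[(d, c)]
    return None


print(advice("propranolol", "asthma"))   # serious contraindication
print(advice("atenolol", "asthma"))      # mild contraindication
print(advice("atenolol", "hypertension"))  # None
```

Only two index entries are written by hand; every drug classified under them inherits the right one, and the more specific entry overrides the general one.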

Example 2: Indexing data entry forms
Fractal tailoring forms for clinical trials
Hypertension
Idiopathic Hypertension
In our company’s studies
In Phase 2 studies
Idiopathic Hypertension in our co’s Phase 2 study

Example 3: PEN&PAD
Fractal Tailoring of ‘fail soft’ forms
What is it sensible to say about …?

Technical Barriers to linking ontologies
• Overlap
– Linking independent ontologies is easy
– Overlap ALWAYS brings differences in meaning
– To integrate, separate
• Appropriate levels of abstraction
– Genetics/Genomics is changing disease classification
    • “Anti-angina drugs”
    • “Ingredients conjugated in the liver”
• Feedback
– New biology, new clinical classifications …
– Discipline required to keep separations
• Views
– Anatomy – Tissues (developmental) vs Structures vs Functions

Nontechnical barriers to linking ontologies
• Organisational barriers
– How to keep separation and scope of individual ontologies
    • All enterprises tend to expand and encroach
• Discipline barriers
– Task barriers
    • Fit for one purpose is not fit for all purposes
• Language barriers
– Between communities as well as languages
• IP barriers
• Process
– Collaborative distributed vs Centralised
– Authority
– Life cycle and rate of change
    • GO runs at web speed – seconds - days
    • SNOMED runs at e-publishing speed – 6mo-3 years
    • ICD runs at print/committee speed – 10-20 years

“Good ontologies”
• Fitness for purpose
– What’s it for?
– Defined scope
• “Ownership” by users
– A language belongs to its community
• Human factors
– Understandability, Reliability!
• Evaluation criteria
– How do we know if it meets its purpose?
– Evolution
“Process not Product!”

“Good ontologies”
• Internal Structure
– Consistency
– Modularity & Normalisation
• Software engineering issues: Architecture & Tools
– It’s software! It evolves! It’s a standard!
– Conformance and regression testing matter
• Philosophical clarity
– Class-instance divide correct
    • “Instances” are different in ontologies and databases
    • Ontologies are about a view of the world, not about how to store information in a database
– Clear distinction between part-whole and kind-of
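The difference between part-whole and kind-of can be shown with two separate relations and two separate queries. The anatomy fragment and the function names below are assumptions for illustration, and the part-of rule is deliberately simplified (parts are inherited along kind-of links, nothing more).

```python
# Two distinct relations, kept apart on purpose (toy anatomy fragment).
KIND_OF = {"index finger": "finger", "finger": "digit"}
PART_OF = {"finger": "hand", "hand": "upper extremity", "upper extremity": "body"}


def up(relation, start):
    """All concepts reachable from `start` by following `relation` upward."""
    reached = []
    while start in relation:
        start = relation[start]
        reached.append(start)
    return reached


def is_kind_of(a, b):
    """a is b, or a is a (transitive) kind of b."""
    return a == b or b in up(KIND_OF, a)


def is_part_of(a, b):
    """a, or something a is a kind of, is (transitively) part of b."""
    for kind in [a] + up(KIND_OF, a):
        for whole in up(PART_OF, kind):
            if is_kind_of(whole, b):
                return True
    return False


print(is_kind_of("index finger", "finger"))  # True
print(is_part_of("index finger", "hand"))    # True
print(is_kind_of("index finger", "hand"))    # False
print(is_part_of("hand", "finger"))          # False
```

Mixing the two relations into one hierarchy would make “index finger” look like a kind of hand, which is exactly the confusion this slide warns against.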

Grounding cost vs Cleanup cost
• What do we need to share?
– What is broken?
• How much do we need to know to communicate?
– Easy to build too much
    • And very costly!
– “Just in time ontology”
    • Use logic
    • Use the web
    • Bio / OBO does well
    • Medicine so far doing badly

Important Ontologies & related standards
• OBO (Open Biomedical Ontologies)
– Gene Ontology
– MGED family
– …
• UMLS
– Massive resource for cross referencing
– Use CUIs & LUIs – “Concept Unique IDs” “Lexical Unique IDs”
• SNOMED-CT
– SNOMED-International
• Anatomy
– Digital Anatomist FMA, Mouse Developmental, Mouse Adult
– SAEL – Standard Anatomy Entry List
• NCICB
– CaCORE ontology
• National minimum data sets – controlled vocabularies
• HL7, LOINC, DICOM, CDISC, …
• OpenGALEN – source for experimentation and development
• Bio databases – at least implicit controlled vocabularies
– Swissprot, OMIM, Ensembl, PRINTS, …

Summary: Planning for Naming, Classifying, Indexing
• What is it for? Is there a gap? What is needed?
– What are the use cases? Criteria for success?
– Does it exist already?
– Is an ontology the answer? Is an ontology needed for the answer?
– What else is needed?
    • A reference knowledge source?
– What is the MINIMUM that one can do?
• Who will own it?
– Can we build it collaboratively?
– What is the authority?
• How will it evolve?
– What is the pace of change?
• Can we do it “just in time”?
– Can we evaluate and test it – again and again?