Slide 1

OpenGALEN

Ontologies, Clinical and Genomic Information
How to say what we mean and mean what we say

Opportunities & Pitfalls

Alan Rector, Jeremy Rogers, Chris Wroe

Information Management Group / Bio Health Informatics Group
Department of Computer Science, University of Manchester

[email protected]
www.clinical-escience.org
www.co-ode.org
www.opengalen.org
protege.stanford.org

Slide 2

What Is An “Ontology”?

• Ontology (Socrates & Aristotle 400-360 BC)
  – The study of being
• Word borrowed by computing for the explicit description of the conceptualisation of a domain:
  – concepts (“entities”)
  – properties and attributes of concepts
  – constraints on properties and attributes
  – individuals (often, but not always)
• An ontology defines
  – a common vocabulary
  – a shared understanding
  – a classification

Slide 3

Sharing info
Sharing meaning

Metadata
• Data describing the content and meaning of resources and services.
• But everyone must speak the same language…

Terminologies
• Shared and common vocabularies
• For search engines, agents, curators, authors and users
• But everyone must mean the same thing…

Ontologies
• Shared and common understanding of a domain
• Essential for search, exchange and discovery

[Figure: boxes labelled “Service provider”]

Slide 4

Measure the world… quantitative models
(not ontologies)

• Quantitative
  – Numerical data
    • 2mm, 2.4V, between 4 and 5 feet
  – Unambiguous tokens
  – Main problem is accuracy at initial capture
  – Numerical analysis (e.g. statistics) well understood
• Examples:
  – How big is this breast lump?
  – What is the average age of patients with cancer?
  – How much time elapsed between original referral and first appointment at the hospital?

Slide 5

Describe the world – ontologies

• Qualitative
  – Descriptive data
    • Cold, colder, blueish, not pink, drunk
  – Ambiguous tokens
    • What’s wrong with being drunk?
      – Ask a glass of water.
  – Accuracy poorly defined
• More examples
  – How pleomorphic are the cells in the biopsy?
  – What is a protein’s function?
  – What is the derivation of a tissue?

Slide 6

Why Develop an Ontology?
Naming, Classifying, Indexing

• To share common understanding of the structure of descriptive information
  – among people
  – among software agents
  – between people and software
• To enable reuse of domain knowledge
  – to introduce standards to allow interoperability
• To index and annotate other resources

Semantic Interoperability
Foundation of the Semantic Web/Grid

Slide 7

More Reasons

• To make domain assumptions explicit
  – easier to change domain assumptions (consider a genetics knowledge base)
  – easier to understand and update legacy data
• To separate domain knowledge from the operational knowledge
  – re-use domain and operational knowledge separately (e.g., configuration based on constraints)
• To manage the combinatorial explosion

Slide 8

A semantic continuum
[Mike Uschold, Boeing Corp]

• Implicit: shared human consensus
• Informal (explicit): text descriptions
  – Pump: “a device for moving a gas or liquid from one place or container to another”
• Formal (for humans): semantics hardwired; used at runtime
• Formal (for machines): semantics processed and used at runtime
  – (pump has (superclasses (…))

Further to the right:
• Less ambiguity
• Better inter-operation
• More robust – less hardwiring
• More difficult

Slide 9

An Ontology should be just the Beginning

[Figure: “Ontologies” linked to the following]
• Databases (declare structure)
• Knowledge bases (provide domain description)
• Software agents
• Problem-solving methods
• Domain-independent applications
• The “Semantic Web”

Slide 10

What an Ontology Isn’t
(“It won’t make the coffee”)

• A database
  – Ontologies are about categories/classes/types/concepts/entities, not instances
    • ABOUT diseases, genes, proteins, ...
    • NOT ABOUT specific patients, samples, studies, …
• A database/EHR schema
  – An ontology is about meaning rather than storage
    • Although ontology technologies are a means for merging schemas
• A decision support/protocol management system
  – The entities used in the rules, not the rules
• A metadata schema
  – The entities used in the metadata, not the schema itself
• A lexicon
  – Meaning rather than language
    • But every ontology needs language tools

Slide 11

Ontology Technologies

• Description logics (DLs), OWL
  – Designed to provide logical support for automatic classification and consistency checking
    • Designed for sharing and software engineering
    • Leverage off Semantic Web / Grid community
  – But not everything in OWL is an ontology
• RDF(S)
• Specialised for groups
  – DAGEdit and other OBO tools; FMA explorer, …
• UML
  – Carefully developed UML models convey much information for an ontology
    • But support only very simple inference and checking

Slide 12

Why it’s hard (1)

• Language is slippery & local; Rigour & logic are hard
  – Classification is too easy for people (to do badly)
    • But logical/computational properties unintuitive
  – Combinatorial explosions
  – Philosophical & “religious” differences
• Information capture
  – Data quality
  – Tools & environments
• Different points of view
  – Oncology, Cardiology, …
  – Adult, developmental, aetiological, …
  – Clinical, genetic, genomic,

Slide 13

Why it’s hard (2)

• Need a combined model of meaning
  – The EHR/Database holding the ontology PLUS the ontology held
• Hard to scope – easy to do too much
  – “Just in time” ontology
    • Better in the bio than the medical community
• Software engineering methods poorly understood

Slide 14

Classification is easy for people (to do badly)

“On those remote pages it is written that animals are divided into:

a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance"

From The Celestial Emporium of Benevolent Knowledge, Borges

Slide 15

Avoiding combinatorial explosions

• The “Exploding Bicycle”
  From “phrase book” to “dictionary + grammar”
  – 1980 - ICD-9 (E826) 8
  – 1990 - READ-2 (T30..) 81
  – 1995 - READ-3 87
  – 1996 - ICD-10 (V10-19 Australian) 587
    • V31.22 Injury or accident to the occupant of three-wheeled motor vehicle in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
  – and meanwhile elsewhere in ICD-10
    • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
    • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
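The move from “phrase book” to “dictionary + grammar” can be sketched numerically. The following is a minimal illustration, not taken from the slides: the axes and values in `AXES` and the helper names are hypothetical, chosen only to resemble the ICD-10 examples above. It compares how many entries a pre-enumerated phrase book needs with how many terms a compositional dictionary has to maintain.

```python
from math import prod

# Hypothetical axes for describing a transport accident. The values are
# illustrative assumptions, not taken from ICD, READ or any real scheme.
AXES = {
    "vehicle": ["pedal cycle", "motorcycle", "three-wheeled motor vehicle", "car"],
    "collides_with": ["pedestrian", "pedal cycle", "car", "heavy vehicle", "train"],
    "role": ["driver", "passenger", "person on outside of vehicle"],
    "setting": ["traffic accident", "nontraffic accident"],
    "activity": ["sports", "leisure", "working for income", "resting or eating"],
}


def phrase_book_size(axes):
    """Entries needed if every combination is enumerated in advance."""
    return prod(len(values) for values in axes.values())


def dictionary_size(axes):
    """Terms needed if combinations are composed on demand."""
    return sum(len(values) for values in axes.values())


def compose(axes, **choices):
    """The 'grammar': check each chosen value against its axis, then combine."""
    for axis, value in choices.items():
        if value not in axes[axis]:
            raise ValueError(f"{value!r} is not a known value for {axis}")
    return ", ".join(f"{axis}={value}" for axis, value in sorted(choices.items()))


print(phrase_book_size(AXES))  # 480 pre-enumerated phrases
print(dictionary_size(AXES))   # 18 dictionary terms
print(compose(AXES, vehicle="three-wheeled motor vehicle",
              collides_with="pedal cycle", setting="nontraffic accident"))
```

Under these assumptions, adding a sixth axis with three values would triple the phrase book but add only three terms to the dictionary.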

Slide 16

The ontology nested in the EHR

• The EHR (HL7 RIM): [moodCode=“Event” subject=“Relative” code={ } ]
• The ontology (SNOMED-CT): <family_hx (assoc_find Diabetes)>
• The combined meaning: diabetes (subject person_in_family)

What is legal? Required? Mandatory? …

Slide 17

Developing Software Engineering Methodologies for Ontologies:

• Building a life cycle
  – Use/test cases & exemplars
  – Identifying problems – alternative solutions - exploring consequences – deciding amongst alternatives
  – Specifying solutions
    • Human and machine readable form
  – Setting conformance tests for specifications
    • Building reference implementations
  – Monitoring for problems
    • Recording of problems and changes

Slide 18

Logic-based Ontologies: Conceptual Lego

[Figure labels: hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]

Slide 19

Logic-based Ontologies: Conceptual Lego

“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”

“Hand which is anatomically normal”

Slide 20

Logical Constructs build complex concepts from modularised primitives

Primitives:
• Genes
• Species
• Protein
• Function
• Disease

Composed concepts:
• Gene in humans
• Protein coded by gene in humans
• Function of Protein coded by gene in humans
• Disease caused by abnormality in Function of Protein coded by gene in humans
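One way to picture this composition is as nested expressions over a small set of primitives. The sketch below is only an illustration under stated assumptions: the `Concept` class, the `which` helper and the relation names are hypothetical, and the rendering is plain text, not the syntax of OWL or of any OpenGALEN tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A primitive, optionally refined by one relation to another concept."""
    base: str
    relation: Optional[str] = None
    filler: Optional["Concept"] = None

    def render(self) -> str:
        """Spell the expression out as a phrase."""
        if self.relation is None:
            return self.base
        return f"{self.base} {self.relation} {self.filler.render()}"

    def primitives(self) -> set:
        """All primitive names used anywhere in this expression."""
        found = {self.base}
        if self.filler is not None:
            found |= self.filler.primitives()
        return found


def which(base: str, relation: str, filler: Concept) -> Concept:
    """Hypothetical helper: refine a primitive by a relation to a concept."""
    return Concept(base, relation, filler)


# Each step reuses the previous expression instead of naming a new primitive.
humans = Concept("humans")
gene_in_humans = which("Gene", "in", humans)
protein = which("Protein", "coded by", gene_in_humans)
function = which("Function", "of", protein)
disease = which("Disease", "caused by abnormality in", function)

print(disease.render())
print(sorted(disease.primitives()))
```

The most complex concept here still uses only five primitives; everything else is structure.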

Slide 21

Normalising (untangling) Ontologies

[Figure labels: Structure, Function, Part-whole]

Slide 22

A simplified example: Build a simple tree
easy to maintain

Slide 23

Let the classifier organise it

Slide 24

If you want more abstractions, just add new definitions
(re-use existing data)

“Diseases linked to abnormal proteins”

Slide 25

And let the classifier work again

Slide 26

And again – even for a quite different category

“Diseases linked to genes described in the mouse”
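The workflow on these slides (assert a simple tree, describe each disease, add a definition, let the classifier place things) can be sketched with a toy structural subsumption test. This is a simplified illustration, not the classifier used by GALEN or by OWL reasoners; the names `PARENT`, `some`, `subsumes` and `classify`, the property names, and the disease descriptions are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical asserted tree of primitives (child -> parent); roots are absent.
PARENT = {
    "CFTRProtein": "Protein",
    "Haemoglobin": "Protein",
}


@dataclass(frozen=True)
class Concept:
    base: str
    restrictions: frozenset = frozenset()  # of (property, Concept) pairs


def some(base, **restrictions):
    """Build a concept: a primitive plus 'has some <property> <filler>' links."""
    return Concept(base, frozenset(restrictions.items()))


def is_a(name, ancestor):
    """True if `name` equals `ancestor` or lies below it in the asserted tree."""
    while name is not None:
        if name == ancestor:
            return True
        name = PARENT.get(name)
    return False


def subsumes(general, specific):
    """Structural test: every condition of `general` is met by `specific`."""
    if not is_a(specific.base, general.base):
        return False
    return all(
        any(p == q and subsumes(g_filler, s_filler)
            for q, s_filler in specific.restrictions)
        for p, g_filler in general.restrictions
    )


def classify(definitions, descriptions):
    """For each defined category, list the described concepts it subsumes."""
    return {
        name: sorted(d for d, desc in descriptions.items() if subsumes(defn, desc))
        for name, defn in definitions.items()
    }


abnormal = some("Abnormal")
# Illustrative descriptions; the property names are invented.
DESCRIPTIONS = {
    "CysticFibrosis": some("Disease", linked_to=some("CFTRProtein", state=abnormal)),
    "SickleCellDisease": some("Disease", linked_to=some("Haemoglobin", state=abnormal)),
    "Influenza": some("Disease", caused_by=some("Virus")),
}
DEFINITIONS = {
    "DiseaseLinkedToAbnormalProtein":
        some("Disease", linked_to=some("Protein", state=abnormal)),
}

print(classify(DEFINITIONS, DESCRIPTIONS))
```

Adding a further abstraction means adding one more entry to `DEFINITIONS`; the existing descriptions stay untouched and are simply classified again.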

Slide 28

Ontologies and Reference Information Resources

• An ontology is just one part
  – Naming - Definitions & necessary conditions
  – Classification
  – Indexing
• Knowledge bases
  – What we know about those entities – what is true in general
• Databases
  – What we know about individuals
  – Instance stores – specialised databases that link to ontologies
• Plus
  – Lexicons
  – Metadata
  – Mappings

Slide 29

[Figure labels: Data store (data on individuals); Knowledge base (prototypical knowledge); Ontology (definitional knowledge); Meta data; Annotation; Linguistic knowledge]

Slide 30

Example 1: Indexing Drug Contraindications
(or guidelines or information or…)

[Figure: index entries built from the concepts “beta blocker”, “cardioselective beta blocker” and “asthma”]

• use of beta blocker in asthma: serious contraindication
• use of cardioselective beta blocker in asthma: mild contraindication
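Indexing against an ontology means a query can be answered by the most specific entry that applies. The sketch below assumes a simple single-parent hierarchy; the `PARENT` links, the example drugs and the `lookup` function are illustrative assumptions, and only the two index entries come from the slide.

```python
# Hypothetical kind-of links (child -> parent); roots are simply absent.
PARENT = {
    "cardioselective beta blocker": "beta blocker",
    "atenolol": "cardioselective beta blocker",
    "propranolol": "beta blocker",
}

# Index entries from the slide, keyed by (drug concept, condition concept).
INDEX = {
    ("beta blocker", "asthma"): "serious contraindication",
    ("cardioselective beta blocker", "asthma"): "mild contraindication",
}


def ancestors(name):
    """Yield `name` and then its ancestors, most specific first."""
    while name is not None:
        yield name
        name = PARENT.get(name)


def lookup(drug, condition):
    """Return the entry indexed at the most specific matching concepts, if any."""
    for d in ancestors(drug):
        for c in ancestors(condition):
            if (d, c) in INDEX:
                return INDEX[(d, c)]
    return None


print(lookup("atenolol", "asthma"))     # mild contraindication
print(lookup("propranolol", "asthma"))  # serious contraindication
print(lookup("atenolol", "migraine"))   # None
```

The index holds two entries, yet any drug classified under either concept is covered without being listed individually.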

Slide 31

Example 2: Indexing data entry forms
Fractal tailoring forms for clinical trials

• Hypertension
• Idiopathic Hypertension
• In our company’s studies
• In Phase 2 studies
• Idiopathic Hypertension in our co’s Phase 2 study
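One plausible reading of this tailoring is that a form for a specific situation is assembled from everything indexed at the more general concepts it falls under. The sketch below is an assumption about how that could work, not a description of the authors' system: the rule table, the field names and `form_for` are all invented for illustration.

```python
# Hypothetical kind-of links between conditions (child -> parent).
PARENT = {"Idiopathic Hypertension": "Hypertension"}

# Each rule: (condition or None, required study context, fields it contributes).
# All field names are invented for illustration.
RULES = [
    ("Hypertension", frozenset(), ["blood pressure"]),
    ("Idiopathic Hypertension", frozenset(), ["secondary causes excluded"]),
    (None, frozenset({"our company"}), ["study site"]),
    (None, frozenset({"phase 2"}), ["dose level"]),
    ("Idiopathic Hypertension", frozenset({"our company", "phase 2"}),
     ["protocol checklist"]),
]


def is_a(name, ancestor):
    """True if `name` equals `ancestor` or lies below it in the hierarchy."""
    while name is not None:
        if name == ancestor:
            return True
        name = PARENT.get(name)
    return False


def form_for(condition, context=frozenset()):
    """Collect fields from every applicable rule, in the order rules are listed."""
    context = set(context)
    fields = []
    for rule_condition, rule_context, rule_fields in RULES:
        if rule_condition is not None and not is_a(condition, rule_condition):
            continue  # rule is about a different condition
        if not rule_context <= context:
            continue  # rule needs study context we do not have
        for field in rule_fields:
            if field not in fields:
                fields.append(field)
    return fields


print(form_for("Hypertension"))
print(form_for("Idiopathic Hypertension", {"our company", "phase 2"}))
```

With no matching specific rule, the function still returns the general form, so a new study starts from whatever is already known about the broader concepts.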

Slide 32

Example 3: PEN&PAD
Fractal Tailoring of ‘fail soft’ forms

What is it sensible to say about …?

Slide 33

Slide 34

Technical Barriers to linking ontologies

• Overlap
  – Linking independent ontologies easy
  – Overlap ALWAYS brings differences in meaning
  – To integrate, separate
• Appropriate levels of abstraction
  – Genetics/Genomics is changing disease classification
    • “Anti-angina drugs”
    • “Ingredients conjugated in the liver”
• Feedback
  – New biology → new clinical classifications …
  – Discipline required to keep separations
• Views
  – Anatomy – Tissues (developmental) vs Structures vs Functions

Slide 35

Nontechnical barriers to linking ontologies

• Organisational barriers
  – How to keep separation and scope of individual ontologies
    • All enterprises tend to expand and encroach
• Discipline barriers
  – Task barriers
    • Fit for one purpose is not fit for all purposes
• Language barriers
  – Between communities as well as languages
• IP barriers
• Process
  – Collaborative distributed vs Centralised
  – Authority
  – Life cycle and rate of change
    • GO runs at web speed – seconds to days
    • SNOMED runs at e-publishing speed – 6 months to 3 years
    • ICD runs at print/committee speed – 10 to 20 years

Slide 36

“Good ontologies”

• Fitness for purpose
  – What’s it for?
  – Defined scope
• “Ownership” by users
  – A language belongs to its community
• Human factors
  – Understandability, Reliability!
• Evaluation criteria
  – How do we know if it meets its purpose?
  – Evolution

“Process not Product!”

Slide 37

“Good ontologies”

• Internal Structure
  – Consistency
  – Modularity & Normalisation
• Software engineering issues ~ Architecture & Tools
  – It’s software! It evolves! It’s a standard!
  – Conformance and regression testing matter
• Philosophical clarity
  – Class-instance divide correct
    • “Instances” are different in ontologies and databases
    • Ontologies are about a view of the world, not about how to store information in a database
  – Clear distinction between part-whole and kind-of

Slide 38

Grounding cost vs Cleanup cost

• What do we need to share?
  – What is broken?
• How much do we need to know to communicate?
  – Easy to build too much
    • And very costly!
  – “Just in time ontology”
    • Use logic
    • Use the web
    • Bio / OBO does well
    • Medicine so far doing badly

Slide 39

Important Ontologies & related standards

• OBO (Open Biomedical Ontologies)
  – Gene Ontology
  – MGED family
  – …
• UMLS
  – Massive resource for cross referencing
  – Use CUIs & LUIs – “Concept Unique IDs”, “Lexical Unique IDs”
• SNOMED-CT
  – SNOMED-International
• Anatomy
  – Digital Anatomist FMA, Mouse Developmental, Mouse Adult
  – SAEL – Standard Anatomy Entry List
• NCICB
  – CaCORE ontology
• National minimum data sets – controlled vocabularies
• HL7, LOINC, DICOM, CDISC, …
• OpenGALEN – source for experimentation and development
• Bio databases – at least implicit controlled vocabularies
  – Swissprot, OMIM, ENSEMBLE, PRINTs, …

Slide 40

Summary: Planning for Naming, Classifying, Indexing

• What is it for? Is there a gap? What is needed?
  – What are the use cases? Criteria for success?
  – Does it exist already?
  – Is an ontology the answer? Is an ontology needed for the answer?
  – What else is needed?
    • A reference knowledge source?
  – What is the MINIMUM that one can do?
• Who will own it?
  – Can we build it collaboratively?
  – What is the authority?
• How will it evolve?
  – What is the pace of change?
• Can we do it “just in time”?
  – Can we evaluate and test it – again and again?