Ontologies in Biomedicine: The Good, The Bad and The Ugly

http://ncor.us

Barry Smith

http://ontology.buffalo.edu/smith

The Good

Foundational Model of Anatomy (FMA)

Pro

Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule

Powerful treatment of definitions, from which the entire FMA hierarchy is generated; can serve as a basis for formal reasoning

Con

Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)

Intermediate

GALEN

Pro

Allows formal representation of clinical information

Allows multiple views of relevant detail as needed

Uses a powerful Description Logic (DL)-based formal structure

Con

Remains only partially developed

Contains errors, e.g. ‘Vomitus contains carrot’, which the DL formalism did not prevent

Intermediate

The Gene Ontology

Con

Poor formal architecture

Full of errors, e.g. menopause part_of death

Poor support for automatic reasoning and error-checking

Poor treatment of definitions

Not trans-granular

No relation to time or instances

The Gene Ontology

Pro

Open Source

Cross-Species

... has recognized the need for reform, including explicit representation of granular levels

Problem of Circularity

GO:0042270:

Protection from natural killer cell mediated cytolysis

Definition: The process of protecting a cell from cytolysis by natural killer cells.

GO:0019836 hemolysis

Definition: The processes that cause hemolysis

X =def. the Y of X

this is worse than circular
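A crude screen for the pattern on the last two slides can be automated by comparing the content words of a term's label with those of its definition. The Python sketch below does this with a plain word-overlap score. The stop-word list, the 0.5 threshold and the hypothetical non-circular rewrite are assumptions made for illustration; none of them come from GO or from any GO tool. The check catches verbatim reuse of the term but not circularity hidden behind synonyms or inflections (it does not match ‘protection’ with ‘protecting’).

```python
import re

# Assumed, minimal stop-word list; a real screen would need a better one.
STOP_WORDS = {"a", "an", "the", "of", "from", "by", "to", "that", "which"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens of text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def overlap(label: str, definition: str) -> float:
    """Fraction of the label's content words that reappear in its definition."""
    label_words = content_words(label)
    if not label_words:
        return 0.0
    return len(label_words & content_words(definition)) / len(label_words)


def is_circular(label: str, definition: str, threshold: float = 0.5) -> bool:
    """Flag a definition as probably circular if it reuses enough of the label."""
    return overlap(label, definition) >= threshold


examples = [
    # GO:0042270, as quoted on the slide
    ("Protection from natural killer cell mediated cytolysis",
     "The process of protecting a cell from cytolysis by natural killer cells."),
    # GO:0019836, as quoted on the slide
    ("hemolysis", "The processes that cause hemolysis"),
    # Hypothetical non-circular rewrite, for contrast
    ("hemolysis", "The rupture of red blood cell membranes, releasing hemoglobin"),
]
for label, definition in examples:
    print(f"{label!r}: overlap={overlap(label, definition):.2f}, "
          f"circular={is_circular(label, definition)}")
```

Both quoted GO definitions are flagged (overlap 0.67 and 1.00), while the rewrite scores 0.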

The Bad

Reactome

Pro

Rich catalogue of biological processes

Con

Incoherent treatment of categories:

ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event.

The Bad

National Cancer Institute Thesaurus

Pro

Open source; ambitiously broad coverage; DL-based

Con

Poor realization of DL formalism

Full of mistakes (many inherited from its UMLS sources):

– three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant

– three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell

– Normal Cell is_a Microanatomy

See http://ontology.buffalo.edu/medo/NCIT_Smith.html
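The cell example can be checked mechanically. Suppose one adds what biology requires, namely that every Normal Cell and every Abnormal Cell is a Cell. Treating the three classes as mutually disjoint then leaves Normal Cell and Abnormal Cell with no possible instances, which is what a DL reasoner would report as unsatisfiable classes. The Python sketch below is a simplified stand-in for such a reasoner. The is_a edges are supplied by hand as an assumption and are not read from the NCIT.

```python
from itertools import combinations


def superclasses(cls: str, is_a: dict[str, list[str]]) -> set[str]:
    """cls together with everything reachable from it along is_a edges."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(is_a.get(c, []))
    return seen


def unsatisfiable(is_a: dict[str, list[str]],
                  disjoint: list[tuple[str, str]]) -> set[str]:
    """Classes falling under both members of a disjoint pair; they can have no instances."""
    classes = (set(is_a)
               | {p for parents in is_a.values() for p in parents}
               | {c for pair in disjoint for c in pair})
    result = set()
    for c in classes:
        sup = superclasses(c, is_a)
        if any(a in sup and b in sup for a, b in disjoint):
            result.add(c)
    return result


# Assumed (biologically obvious) subsumptions, not taken from the NCIT itself.
is_a = {"Normal Cell": ["Cell"], "Abnormal Cell": ["Cell"]}
# The three classes treated as mutually disjoint siblings, as on the slide.
disjoint = list(combinations(["Cell", "Normal Cell", "Abnormal Cell"], 2))

print(sorted(unsatisfiable(is_a, disjoint)))  # ['Abnormal Cell', 'Normal Cell']
```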

National Cancer Institute Thesaurus

Duratec, Lactobutyrin and Stilbene Aldehyde classified as: Unclassified Drugs and Chemicals

Pro

NCIT, too, has recognized the need for reform

(NCIT is part of the OBO library)

The Ugly

UMLS Semantic Network

Pros

Broad coverage; no multiple inheritance

Cons

Incoherent use of ‘conceptual entities’

(e.g. the digestive system as a conceptual part of the organism)

Full of errors

UMLS Semantic Network

Edges in the graph represent merely “possible significant relations”:

– Bacterium causes Experimental Model of Disease

– Experimental Model of Disease affects Fungus

– Experimental Model of Disease is_a Pathologic Function

UMLS Semantic Network

Unclear what the nodes of the graph are:

– Drug Delivery Device contains Clinical Drug

– Drug Delivery Device narrower_in_meaning_than Manufactured Object

The use-mention confusion: “Swimming is healthy and has 8 letters”

The Ugly

Clinical Terms Version 2 (The Read Codes)

Classifies chemicals into:

chemicals whose name begins with ‘A’,

chemicals whose name begins with ‘B’,

chemicals whose name begins with ‘C’, ...

The Astonishingly (Criminally?) Ugly

Health Level 7

HL7 is a UML-based standard for exchange of information between clinical information systems

HL7 has proved very crumbly as a standard

The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way

HL7-RIM

Animal

Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.

Person

A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents.

LivingSubject

Definition: A subtype of Entity representing an organism or complex animal, alive or not.

HL7 RIM: The Problem of Circularity

Person = Person with documents

This has the form ‘An A is an A which is B’. Such definitions

– are useless in practical terms, since neither we nor the machine can use them to find out what ‘A’ means

– incorporate a vicious infinite regress

– have the effect of making it impossible to refer to As which are not Bs, for example to an undocumented person
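The last point follows from the logical form alone. Whatever the definiens admits is B, so the defined class can never contain an A that lacks B. The toy Python sketch below makes this concrete by reading ‘human being’ for A and ‘documented’ for B. The `Subject` record, its `documented` flag and the two sample subjects are hypothetical and are not the HL7 RIM data model.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    name: str
    human: bool
    documented: bool  # hypothetical flag: identifiable through one or more legal documents


def is_person_rim_style(s: Subject) -> bool:
    """Person read as 'human being who is documented': documentation is built into the class."""
    return s.human and s.documented


subjects = [
    Subject("registered patient", human=True, documented=True),
    Subject("undocumented person", human=True, documented=False),
]

# Human beings that the RIM-style definition cannot call persons at all.
excluded = [s.name for s in subjects if s.human and not is_person_rim_style(s)]
print(excluded)  # ['undocumented person']
```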

HL7 Logically Incoherent

act = the record of an act

This has the form: An X is the Y of an X

again worse than circular

HL7-RIM: Logically Contradictory Definitions

Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen.

Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done.

HL7 RIM Ontologically Incoherent

The truth about the real world is constructed through a combination and arbitration of attributed statements ...

As such, there is no distinction between an activity and its documentation.

HL7 Incredibly Successful

• embraced as US federal standard;

• central part of $15 billion program to integrate all UK hospital information systems

• made mandatory by Canada Health Infoway

• adopted by Oracle as basis for its EHR support programs

HL7 Merchandizing

From molecules to diseases

A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data

good ontologies require:

Coherent upper-level taxonomy distinguishing:

• continuants (cells, molecules, organisms ...)

• occurrents (events, processes)

• dependent entities (qualities, functions ...)

• independent entities (their bearers)

• universals (types, kinds)

• instances (tokens, individuals)

Coherent relation ontology supporting inference both within and between ontologies.

good ontologies require:

Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats

Open Biomedical Ontologies (OBO) Upper Biomedical Ontology (UBO)

root UBO:0000001:top
subclass BFO:continuant:continuant
– subclass BFO:dependent_entity:dependent_entity
• subclass UBO:0000023:quality
– subclass UBO:0000026:phenotype
» subclass UBO:0000025:state
– subclass UBO:0000027:disease
» subclass UBO:0000005:function
– subclass GO:0003674:molecular_function
• subclass BFO:disposition:disposition
– subclass BFO:independent_entity:independent_entity
• subclass UBO:0000002:substance
– subclass UBO:0000019:protein
– subclass GO:0005575:cellular_component
– subclass UBO:0000006:anatomical_entity
» subclass UBO:0000008:gross_anatomical_entity
– subclass UBO:0000007:organism
» subclass UBO:0000015:microbe
» subclass UBO:0000014:plant
» subclass UBO:0000017:animal
• subclass BFO:fiat_part_of_substance:fiat_part_of_substance
• subclass BFO:boundary_of_substance:boundary_of_substance
• subclass BFO:aggregate_of_substances:aggregate_of_substances
subclass BFO:occurrent:occurrent
– subclass BFO:dependent_occurrent:dependent_occurrent
• subclass UBO:0000004:process
– subclass GO:0008150:biological_process
• subclass BFO:fiat_part_of_process:fiat_part_of_process
– subclass UBO:0000029:life_cycle_stage
• subclass BFO:aggregate_of_processes:aggregate_of_processes
– subclass EO:0007359:environment ontology
• subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process
– subclass BFO:independent_occurrent:independent_occurrent

OBO Relation Ontology (RO)

• Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens)

• Precise formal definitions of relations

• Automatic applicability to time-indexed instance data, e.g. in the Electronic Health Record

• Consistency with the Relation Ontology now a criterion for admission to the OBO ontology library

see Genome Biology, Apr. 2006

Three types of relations

between instances:

Mary’s heart part_of Mary

between an instance and a universal:

Mary instance_of homo sapiens

between universals:

gastrulation part_of embryonic development
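One way to keep the three kinds of relation apart in software is to give instances and universals distinct types. The kind of an assertion is then fixed by the types of its arguments, even when the same relation name (here part_of) appears at both levels. This is a minimal Python sketch; the class names are hypothetical and not part of any OBO library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    label: str


@dataclass(frozen=True)
class Universal:
    label: str


# The three kinds of relation listed on the slide, keyed by argument types.
KINDS = {
    (Instance, Instance): "between instances",
    (Instance, Universal): "between an instance and a universal",
    (Universal, Universal): "between universals",
}


@dataclass(frozen=True)
class Assertion:
    subject: Instance | Universal
    relation: str
    obj: Instance | Universal

    def kind(self) -> str:
        return KINDS.get((type(self.subject), type(self.obj)), "not one of the three kinds")


assertions = [
    Assertion(Instance("Mary's heart"), "part_of", Instance("Mary")),
    Assertion(Instance("Mary"), "instance_of", Universal("homo sapiens")),
    Assertion(Universal("gastrulation"), "part_of", Universal("embryonic development")),
]
for a in assertions:
    print(f"{a.subject.label} {a.relation} {a.obj.label}: {a.kind()}")
```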

A suite of primitive instance-level relations

identical_to

part_of

located_in

adjacent_to

earlier

derives_from

...

A suite of defined relations between universals

Foundational: is_a, part_of

Spatial: located_in, contained_in, adjacent_to

Temporal: transformation_of, derives_from, preceded_by

Participation: has_participant, has_agent

GALEN: Vomitus contains carrot

All portions of vomit contain all portions of carrot

All portions of vomit contain some portion of carrot

Some portions of vomit contain some portion of carrot

Some portions of vomit contain all portions of carrot
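The four readings differ only in the order and strength of their quantifiers. Evaluating them over a small model makes the difference visible. In the Python sketch below, the portions v1, v2, c1 and c2 and the contains facts are invented, and in this world only the third reading is true.

```python
from typing import Callable


def readings(vomit: list[str], carrot: list[str],
             contains: Callable[[str, str], bool]) -> dict[str, bool]:
    """Evaluate the four quantifier readings of 'vomitus contains carrot'."""
    return {
        "all-all": all(all(contains(v, c) for c in carrot) for v in vomit),
        "all-some": all(any(contains(v, c) for c in carrot) for v in vomit),
        "some-some": any(any(contains(v, c) for c in carrot) for v in vomit),
        "some-all": any(all(contains(v, c) for c in carrot) for v in vomit),
    }


# Toy world: two portions of vomit, two portions of carrot; only v1 contains c1.
vomit = ["v1", "v2"]
carrot = ["c1", "c2"]
contents = {("v1", "c1")}

result = readings(vomit, carrot, lambda v, c: (v, c) in contents)
for reading, value in result.items():
    print(f"{reading}: {value}")
```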

all-some structure

A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level
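The all-some definition translates directly into a check over instance data. In the Python sketch below, instance-level part_of is a set of pairs, and instance_of maps each instance to the universals it instantiates. The identifiers and data are invented: two nucleated cells, and one cell, c3, without a nucleus, as with a mature mammalian red blood cell. Note that the universal-level statement comes out vacuously true if A has no instances.

```python
Pair = tuple[str, str]


def instances_of(universal: str, instance_of: dict[str, set[str]]) -> set[str]:
    """All instances recorded as instantiating the given universal."""
    return {i for i, us in instance_of.items() if universal in us}


def holds_all_some(rel: set[Pair], instance_of: dict[str, set[str]],
                   a_univ: str, b_univ: str) -> bool:
    """A rel B on the universal level: every instance of A stands in rel to some instance of B."""
    bs = instances_of(b_univ, instance_of)
    return all(any((a, b) in rel for b in bs) for a in instances_of(a_univ, instance_of))


instance_of = {"n1": {"nucleus"}, "n2": {"nucleus"},
               "c1": {"cell"}, "c2": {"cell"}, "c3": {"cell"}}
part_of = {("n1", "c1"), ("n2", "c2")}                # instance level
has_part = {(whole, part) for part, whole in part_of}  # its converse

print(holds_all_some(part_of, instance_of, "nucleus", "cell"))   # True
print(holds_all_some(has_part, instance_of, "cell", "nucleus"))  # False: c3 has no nucleus
```

The second call shows a consequence of the definition: nucleus part_of cell does not license cell has_part nucleus.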

Allows automatic ontology integration via cascading reasoning:

A R1 B

B R2 C

A R3 C
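The cascading pattern amounts to closing a set of universal-level assertions under a composition table. The Python sketch below uses only two relations. The table entries follow from the all-some definitions of is_a and part_of. The example facts are toy assumptions meant to mimic assertions drawn from different ontologies.

```python
# (R1, R2) -> R3: from A R1 B and B R2 C infer A R3 C.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

Fact = tuple[str, str, str]  # (A, relation, B)


def closure(facts: set[Fact]) -> set[Fact]:
    """Apply the composition table repeatedly until no new facts are inferred."""
    known = set(facts)
    while True:
        new = {
            (a, COMPOSE[(r1, r2)], c)
            for (a, r1, b) in known
            for (b2, r2, c) in known
            if b == b2 and (r1, r2) in COMPOSE
        } - known
        if not new:
            return known
        known |= new


# Toy facts, for illustration only.
facts = {
    ("mitochondrial inner membrane", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("mitochondrion", "part_of", "cell"),
}
for fact in sorted(closure(facts) - facts):
    print(*fact)
```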

adjacent_to

cell wall adjacent_to cytoplasm

intron adjacent_to exon

Golgi apparatus adjacent_to endoplasmic reticulum

periplasm adjacent_to plasma membrane

presynaptic membrane adjacent_to synaptic cleft

A adjacent_to B

every instance of A stands in the instance-level adjacent_to relation to some instance of B

adjacent_to as a relation between universals is not symmetric

nucleus adjacent_to cytoplasm

Not: cytoplasm adjacent_to nucleus

seminal vesicle adjacent_to urinary bladder

Not: urinary bladder adjacent_to seminal vesicle
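The asymmetry falls out of the all-some definition. In the Python sketch below, instance-level adjacent_to is assumed to be symmetric. Even so, the universal-level relation holds in one direction only, because some portions of cytoplasm are adjacent to no nucleus at all. Here y2 stands for the cytoplasm of a bacterium, which has no nucleus. The instance names are invented.

```python
def all_some(rel: set[tuple[str, str]], instance_of: dict[str, str],
             a_univ: str, b_univ: str) -> bool:
    """A rel B: every instance of A stands in rel to some instance of B."""
    a_insts = [i for i, u in instance_of.items() if u == a_univ]
    b_insts = [i for i, u in instance_of.items() if u == b_univ]
    return all(any((a, b) in rel for b in b_insts) for a in a_insts)


# y1: cytoplasm of a nucleated cell; y2: cytoplasm of a bacterium (no nucleus).
instance_of = {"n1": "nucleus", "y1": "cytoplasm", "y2": "cytoplasm"}

adjacent = {("n1", "y1")}
adjacent |= {(b, a) for a, b in adjacent}  # assume instance-level adjacency is symmetric

print(all_some(adjacent, instance_of, "nucleus", "cytoplasm"))  # True
print(all_some(adjacent, instance_of, "cytoplasm", "nucleus"))  # False
```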

The Granularity Gulf

most existing data-sources are of fixed, single granularity

many (all?) clinical phenomena cross granularities

Main obstacle to integrating genetic and EHR data

No facility for dealing with time and instances (particulars, individuals) in current ontologies

Key idea

To define ontological relations like

part_of, develops_from

it is not enough to look just at universals / classes / types / ‘concepts’:

we need also to take account of instances and time

transformation_of

A transformation_of B

=def. any instance of A was at some earlier time an instance of B
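Because the definition quantifies over times, it can only be evaluated against time-indexed instance data, which is exactly what universal-only ontologies lack. This is a minimal Python sketch, assuming each instance is recorded as instantiating one universal at each listed time. The RNA identifiers and times are invented.

```python
# (instance, time) -> the universal the instance instantiates at that time.
# Simplifying assumption: one universal per instance per recorded time.
History = dict[tuple[str, int], str]


def transformation_of(a_univ: str, b_univ: str, history: History) -> bool:
    """Every instance of A, whenever it is an A, was an instance of B at some earlier time."""
    return all(
        any(i2 == i and t2 < t and u2 == b_univ for (i2, t2), u2 in history.items())
        for (i, t), u in history.items()
        if u == a_univ
    )


# Invented time-indexed instance data for two RNA molecules.
history: History = {
    ("r1", 0): "pre-RNA", ("r1", 1): "mature RNA",
    ("r2", 0): "pre-RNA", ("r2", 3): "mature RNA",
}
print(transformation_of("mature RNA", "pre-RNA", history))  # True
print(transformation_of("pre-RNA", "mature RNA", history))  # False
```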

transformation_of

[Figure: the same instance c, shown along a time axis at time t (as an instance of C) and at time t1 (as an instance of C1)]

mature RNA transformation_of pre-RNA

adult transformation_of child

carcinomatous colon transformation_of colon

transformation_of relations cross both time and granularity

[Figure: instance c at time t (instance of C) and at time t1 (instance of C1)]

Advantages of the methodology of enforcing commonly accepted coherent definitions:

promotes quality assurance (better coding)

guarantees automatic reasoning across ontologies and across data at different granularities

yields a direct connection to times and instances in the EHR