Practical Ontologies Lessons from the GO February 2011


Page 1: Practical Ontologies

Practical Ontologies

Lessons from the GO

February 2011

Page 2: Practical Ontologies

The time was 1998-99

None of the model organism databases used standard terminology to describe biological function

Drosophila sequence was imminent
Largest genome sequenced at that time
Two weeks, 3 dozen scientists, all new software
How could we organize the annotation?

Microarray technology was the latest research tool, and results needed to be described

AI folk and ontologists organized the first “bio-ontologies” workshop at ISMB

Page 3: Practical Ontologies

The Gene Ontology—the beginning

A handful of biologists (4) met in a bar in Montreal after the bio-ontologies workshop to share their frustrations and decided to just do it*…

Would demonstrate possibilities for data integration across the MODs (FlyBase, SGD, MGD)

Provided an organizing principle for the Drosophila genome annotation jamboree

* i.e. describe gene products in a biologically meaningful way.

Page 4: Practical Ontologies

AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG

GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAGTCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG

GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

Late summer 1999

Page 5: Practical Ontologies

reads sequence

Piles of data

Mountains of data

assemble analysis

filtering

First-pass predictions

converging

Tentative function

Love-at-first-sight

Functional knowns

‘GO’ directories

Page 6: Practical Ontologies
Page 7: Practical Ontologies

The Gene Ontology project

Annotated now
The importance of stress-testing
Don’t delay, use your ontology today

Do no harm (KISS)
i.e. target the low-hanging fruit, work on the obvious, high-confidence steps

Collaborate on concrete projects
Focusing the mind

Page 8: Practical Ontologies

Annotations

Have 3 primary components:
The ontology term(s)
The entity instance (e.g. gene product)
The evidence for that assertion

An annotation is an evidence-based assertion that this entity is best classified or described by this term (or these terms)
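The three components above can be sketched as a small record type. This is a minimal illustration, not the GO project's own data model: the class name `Annotation`, its field names, and the separate `reference` field are assumptions made for the example, and the values are the ones shown on the annotation-workflow slide of this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An evidence-based assertion that an entity is described by a term."""
    entity: str     # the entity instance, e.g. a gene product identifier
    term: str       # the ontology term identifier
    evidence: str   # evidence code backing the assertion
    reference: str  # where the evidence comes from (assumed extra field)

    def __post_init__(self):
        # An assertion that lacks any component is not a usable annotation.
        for name in ("entity", "term", "evidence", "reference"):
            if not getattr(self, name):
                raise ValueError(f"annotation is missing its {name}")


example = Annotation(
    entity="SPCC622.16c",
    term="GO:0005720",
    evidence="IDA",
    reference="PMID:17449867",
)
print(example)
```

Making the record immutable is a design choice for the sketch: an assertion whose evidence changes is treated as a new annotation rather than an edit of the old one.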

Page 9: Practical Ontologies

Read paper(s): PMID:17449867

Identify genes: SPCC622.16c

Identify GO terms: GO:0005720

What type of evidence? IDA

Identify GO terms associated with each gene:

SPCC622.16c GO:0005720 IDA PMID:17449867
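The resulting one-line record can be split back into its parts and sanity-checked. The sketch below assumes the slide's four-column, whitespace-separated shorthand (gene, GO term, evidence code, reference); it is not a parser for the GO annotation file format, and the function name `parse_annotation` is hypothetical.

```python
import re

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")
# Assumed shape for a PubMed reference: "PMID:" followed by digits.
PMID = re.compile(r"^PMID:\d+$")


def parse_annotation(line):
    """Split a 'gene term evidence reference' line into a dict."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    gene, term, evidence, reference = fields
    if not GO_ID.match(term):
        raise ValueError(f"not a GO identifier: {term}")
    if not PMID.match(reference):
        raise ValueError(f"not a PMID reference: {reference}")
    return {"gene": gene, "term": term,
            "evidence": evidence, "reference": reference}


record = parse_annotation("SPCC622.16c GO:0005720 IDA PMID:17449867")
print(record)
```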

Page 10: Practical Ontologies

= bud initiation

= bud initiation

= bud initiation

The same name can be used to describe different things.

Classification rule: Disambiguation

Page 11: Practical Ontologies

= tooth bud initiation

= cellular bud initiation

= flower bud initiation

Include plain “bud initiation” as a synonym for each of these terms

Classification rule: Disambiguation
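A minimal sketch of this rule: each specific term carries the ambiguous phrase as a synonym, so a search for plain "bud initiation" returns all three candidates and the curator picks the right one. The dictionary layout and the function name `build_synonym_index` are assumptions for illustration only.

```python
from collections import defaultdict

# Specific term name -> synonyms (the shared phrase is listed on each).
TERMS = {
    "tooth bud initiation": ["bud initiation"],
    "cellular bud initiation": ["bud initiation"],
    "flower bud initiation": ["bud initiation"],
}


def build_synonym_index(terms):
    """Map every name and synonym to the set of terms it may refer to."""
    index = defaultdict(set)
    for name, synonyms in terms.items():
        index[name.lower()].add(name)
        for synonym in synonyms:
            index[synonym.lower()].add(name)
    return index


index = build_synonym_index(TERMS)
candidates = sorted(index["bud initiation"])
print(candidates)
```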

Page 12: Practical Ontologies

Exactly the same thing can be described with different terms

Disambiguation

Glucose synthesis
Glucose biosynthesis
Glucose formation
Glucose anabolism
Gluconeogenesis

Comparison is difficult, especially across species or across databases that each use one of these different variants

Use a single term, and plenty of synonyms
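The "single term, plenty of synonyms" rule can be sketched as a normalizer that maps every variant to one primary label, so that two databases using different wording become comparable. Which variant serves as the primary label is an arbitrary assumption here, and `make_normalizer` is a hypothetical helper.

```python
# Assumption for the example: "glucose biosynthesis" is the primary label.
CANONICAL = "glucose biosynthesis"
SYNONYMS = [
    "glucose synthesis",
    "glucose formation",
    "glucose anabolism",
    "gluconeogenesis",
]


def make_normalizer(canonical, synonyms):
    """Return a function mapping any known variant to the canonical label."""
    lookup = {canonical.lower(): canonical}
    for synonym in synonyms:
        lookup[synonym.lower()] = canonical

    def normalize(label):
        # Ignore case and collapse repeated whitespace before the lookup.
        key = " ".join(label.lower().split())
        return lookup.get(key)  # None when the label is unknown

    return normalize


normalize = make_normalizer(CANONICAL, SYNONYMS)
db_a = normalize("Gluconeogenesis")
db_b = normalize("glucose  anabolism")
print(db_a == db_b)
```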

Page 13: Practical Ontologies

Annotation for a healthy ontology

Easier to find the most accurate term(s) to use
Avoids annotation errors
Easier for new curators to learn and understand
Develop annotation guidelines and training material
Enables automatic reasoning for searching & inference

Bottom line: Following basic construction rules makes more useful ontologies

Page 14: Practical Ontologies

Doh! I get it now, says the computer.

Typical ontology developer

Typical wet lab PI

annotating data

Improvement needed: Closing the loop

Page 15: Practical Ontologies

The Gene Ontology project

Annotated now
The importance of stress-testing
Don’t delay, use your ontology today

Do no harm (KISS)
i.e. target the low-hanging fruit, work on the obvious, high-confidence steps

Collaborate on concrete projects
Focusing the mind

Page 16: Practical Ontologies

GO in 2000-2008

Page 17: Practical Ontologies

Filling in annotation gaps

GO:0016301 kinase activity (F)

GO:0016310 phosphorylation (P)

|P| = 3640
|F| = 6053
|F ∩ P| = 2230
|F ∩ not P| = 3823

Venn diagram regions: 3823 (F only), 2230 (F and P), 1410 (P only)

July 2008
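The counts on this slide are mutually consistent, which a few lines of arithmetic confirm. The sketch reads F as the set annotated to kinase activity and P as the set annotated to phosphorylation; that reading, and the variable names, are assumptions, and the slide does not say exactly what is being counted.

```python
# Set sizes as given on the slide (July 2008).
size_f = 6053     # annotated to kinase activity (assumed meaning of F)
size_p = 3640     # annotated to phosphorylation (assumed meaning of P)
size_both = 2230  # annotated to both

# Region sizes of the two-set Venn diagram.
f_only = size_f - size_both  # has the function term, lacks the process term
p_only = size_p - size_both  # has the process term, lacks the function term
union = f_only + size_both + p_only

print(f_only, p_only, union)
```

The `f_only` region is the annotation gap the slide title refers to: entries that have the function term but are missing the corresponding process term.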

Page 18: Practical Ontologies

part_of

Page 19: Practical Ontologies

part_of

annotations propagate over part_of

KIC1 IDA

Page 20: Practical Ontologies

part_of

annotations propagate over part_of

KIC1 IDA

Page 21: Practical Ontologies

part_of

NDK1 IDA

annotations propagate over part_of

Page 22: Practical Ontologies

part_of

annotations propagate over part_of

NDK1 IDA
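Propagation over part_of can be sketched as a walk up the part_of links from each directly annotated term. The example assumes a single link, kinase activity part_of phosphorylation, and assumes that KIC1 and NDK1 carry direct kinase activity annotations; the slides show the gene names and the relation but not these details, and the function name `propagate` is hypothetical.

```python
# Assumed link for the example: kinase activity part_of phosphorylation.
PART_OF = {"GO:0016301": ["GO:0016310"]}


def propagate(direct, part_of):
    """Add every term reachable over part_of to each gene's annotations."""
    inferred = {}
    for gene, terms in direct.items():
        seen = set(terms)
        stack = list(terms)
        while stack:
            term = stack.pop()
            for whole in part_of.get(term, []):
                if whole not in seen:
                    seen.add(whole)
                    stack.append(whole)
        inferred[gene] = seen
    return inferred


# Assumed direct annotations (gene -> set of term identifiers).
direct = {"KIC1": {"GO:0016301"}, "NDK1": {"GO:0016301"}}
result = propagate(direct, PART_OF)
print(sorted(result["KIC1"]))
```

A real pipeline would also record that the added annotation is inferred rather than experimental; the sketch leaves evidence codes out to keep the traversal visible.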

Page 23: Practical Ontologies

Filling in annotation gaps

GO:0016301 kinase activity

GO:0016310 phosphorylation

2009

Page 24: Practical Ontologies

The H word—2011

Characters in common are due to inheritance
Allows inferences about common ancestor

time

divergence

Page 25: Practical Ontologies

Evolution of MSH2 subfamily biological process

DNA repair

Maintenance of DNA repeats

Homologous recombination

Apoptosis

Somatic hypermutation of immunoglobulin genes

Page 26: Practical Ontologies

Ancestral inference

• Integration at points of common ancestry
• Infer “hidden” character of living organisms
• Explicitly leverage evolutionary relationships

Tree leaves: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; D.m.; A.g.; S.p.; S.c. MET12; C.e.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.

divergence

Biochemistry: purification and assay

Genetics: mutant phenotypes
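One crude way to picture ancestral inference: find the most recent common ancestor of the leaves that have experimental evidence for a term, then treat the term as inherited by the other leaves under that ancestor. The sketch below is a stand-in under stated assumptions, not the method of any particular tool; the tree topology, the internal node names, the choice of annotated leaves, and the term `EX:0000001` are all hypothetical. Only the leaf labels come from the slide.

```python
# Hypothetical topology: internal node -> children; leaves have no entry.
TREE = {
    "root": ["fungi_animals", "A.t. MTHFR1"],
    "fungi_animals": ["S.c. MET13", "animals"],
    "animals": ["H.s. MTHFR", "M.m.", "D.m."],
}

# Hypothetical experimental annotations (leaf -> set of terms).
EXPERIMENTAL = {
    "S.c. MET13": {"EX:0000001"},
    "H.s. MTHFR": {"EX:0000001"},
}


def leaves(tree, node):
    """All leaves at or below node."""
    children = tree.get(node)
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves(tree, child))
    return found


def mrca(tree, node, targets):
    """Deepest node under `node` whose leaves include every target."""
    for child in tree.get(node, []):
        if targets <= set(leaves(tree, child)):
            return mrca(tree, child, targets)
    return node


def infer(tree, root, experimental, term):
    """Leaves that inherit `term` from the inferred common ancestor."""
    annotated = {leaf for leaf, terms in experimental.items() if term in terms}
    if not annotated:
        return set()
    ancestor = mrca(tree, root, annotated)
    return set(leaves(tree, ancestor)) - annotated


inherited = infer(TREE, "root", EXPERIMENTAL, "EX:0000001")
print(sorted(inherited))
```

In this toy tree the leaf outside the inferred ancestor's subtree ("A.t. MTHFR1") receives nothing, which is the conservative behaviour one would want from such an inference.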

Page 27: Practical Ontologies

Integrating different GO annotations

PAINT: Phylogenetic Annotation and Inference Tool

Page 28: Practical Ontologies

The Gene Ontology project

Annotated now
The importance of stress-testing
Don’t delay, use your ontology today

Do no harm (KISS)
i.e. target the low-hanging fruit, work on the obvious, high-confidence steps

Collaborate on concrete projects
Focusing the mind

Page 29: Practical Ontologies

SGD MGD

2009

FlyBase

GO

Scoping

The ontology has a clearly specified and clearly delineated content.

Page 30: Practical Ontologies

Decisions to make the work easier

Provide definitions for everything
Intelligible ontologies are more useful to humans (for annotation) and to machines (for searching, reasoning and error-checking)

Use content-free unique identifiers
Drive all semantics away from tracking

Don’t confuse the representational technology with the conceptual modeling
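The first two decisions can be sketched together: every term must arrive with a definition, and the identifier it receives is an opaque counter that says nothing about the term's meaning, so a term can be renamed without touching any annotation that points at it. The class `Ontology`, the `EX:` prefix, and the sample definition text are hypothetical.

```python
class Ontology:
    """Toy term store with opaque identifiers and mandatory definitions."""

    def __init__(self, prefix="EX"):
        self._prefix = prefix
        self._names = {}
        self._definitions = {}
        self._next = 1

    def add_term(self, name, definition):
        if not definition.strip():
            raise ValueError("every term needs a definition")
        # The identifier is a bare counter: no meaning is encoded in it.
        term_id = f"{self._prefix}:{self._next:07d}"
        self._next += 1
        self._names[term_id] = name
        self._definitions[term_id] = definition
        return term_id

    def rename(self, term_id, new_name):
        # Only the label changes; the identifier used for tracking does not.
        self._names[term_id] = new_name

    def name(self, term_id):
        return self._names[term_id]


onto = Ontology()
term_id = onto.add_term("glucose synthesis", "Illustrative definition text.")
annotations = [("geneA", term_id)]  # annotations store the ID, not the name

onto.rename(term_id, "glucose biosynthesis")
print(term_id, onto.name(annotations[0][1]))
```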

Page 31: Practical Ontologies

Implicit ontologies within the GO:

cysteine biosynthesis (ChEBI)
myoblast fusion (Cell Type Ontology)
hydrogen ion transporter activity (ChEBI)
snoRNA catabolism (Sequence Ontology)
wing disc pattern formation (Drosophila anatomy)
epidermal cell differentiation (Cell Type Ontology)
regulation of flower development (Plant anatomy)
B-cell differentiation (Cell Type Ontology)
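The implicit references can be made explicit by recording, for each term name, the embedded entity and the ontology that entity belongs to. In the sketch below the term names and source ontologies come from the list above, while the choice of which words form the embedded entity is an assumption, and `decompose` is a hypothetical helper.

```python
from collections import defaultdict

# (term name, embedded entity, ontology the entity belongs to)
IMPLICIT = [
    ("cysteine biosynthesis", "cysteine", "ChEBI"),
    ("myoblast fusion", "myoblast", "Cell Type Ontology"),
    ("hydrogen ion transporter activity", "hydrogen ion", "ChEBI"),
    ("snoRNA catabolism", "snoRNA", "Sequence Ontology"),
    ("wing disc pattern formation", "wing disc", "Drosophila anatomy"),
    ("epidermal cell differentiation", "epidermal cell", "Cell Type Ontology"),
    ("regulation of flower development", "flower", "Plant anatomy"),
    ("B-cell differentiation", "B-cell", "Cell Type Ontology"),
]


def decompose(term, entity):
    """Replace the embedded entity with a slot, leaving the term pattern."""
    if entity not in term:
        raise ValueError(f"{entity!r} does not occur in {term!r}")
    return term.replace(entity, "{entity}", 1)


templates = {term: decompose(term, entity) for term, entity, _ in IMPLICIT}

by_source = defaultdict(list)
for term, entity, source in IMPLICIT:
    by_source[source].append(entity)

print(templates["myoblast fusion"])
print(by_source["Cell Type Ontology"])
```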

Page 32: Practical Ontologies

Implicit anatomy ontology within the GO:

brain development

hindbrain development

metencephalon development

pons development

trigeminal motor nucleus development

Page 33: Practical Ontologies

Diagram node labels: Alpha-Synuclein Mouse, Ischemic Mouse, Substantia nigra, Lewy body, Nucleus, Lysosome, Golgi Apparatus, Orthodox Mitochondrion, Condensed Mitochondrion, Dark Material

Diagram relation labels: has part, is bearer of, number of

Page 34: Practical Ontologies

Common Interest

Sociology: to enlist the community, the ontology must meet each individual group’s immediate needs.
Too many people => Too many requirements

Outstanding problems
Closing the loop between ontology construction and ontology application
QC improvements
Prioritizing tasks
Visualization
…

Page 35: Practical Ontologies

A cast of thousands