Judith A. Blake, David P. Hill, Barry Smith
BioOntologies SIG: Vienna, July 20, 2007
Gene Ontology Annotations: What they mean and where they come from
GO Consortium Project Goals
1. We will maintain comprehensive, logically rigorous and biologically accurate ontologies.
*2. We will comprehensively annotate reference genomes in as complete detail as possible.
*3. We will support annotation across all organisms.
4. We will provide our annotations and tools to the research community.
GO terms are used for functional annotations
Brain development [GO:0007420] (141 genes, 207 annotations)
GO Stats:
GO Annotations
Total experimental GO annotations - 388,633
Total proteins with manual annotations – 80,402
Contributing Groups (including MGI): - 19
Total Pub Med References – 346,002
Total number predicted annotations – 17,029,553
Total number taxa – 129,318
Total number distinct proteins – 2,971,374
April 24, 2007
Annotations provide the connection between genomic information and the GO.
Experiments provide the data that enables us to annotate gene products with terms from the ontologies.
Annotations for App: amyloid beta (A4) precursor protein
Annotations are assertions
- IDA: Inferred from direct assay
- IPI: Inferred from physical interaction
- IMP: Inferred from mutant phenotype
- IGI: Inferred from genetic interaction
- IEP: Inferred from expression pattern
- ISS: Inferred from sequence or structural similarity
- TAS: Traceable author statement
- NAS: Non-traceable author statement
- IC: Inferred by curator
- RCA: Reviewed computational analysis
- IEA: Inferred from electronic annotation
- ND: No data available
NO Direct Experiment
Direct Experiment
We use evidence codes to describe the basis of the annotation
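As an illustration, the evidence codes listed on this slide can be held in a small lookup table. This is a minimal sketch, not GO Consortium software. The split into a "direct experiment" group is an assumption: the slide's own grouping did not survive extraction, so the sketch follows the common GO convention of treating IDA, IPI, IMP, IGI and IEP as experimental codes. The names `EVIDENCE_CODES`, `DIRECT_EXPERIMENT` and `is_direct_experiment` are hypothetical.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed computational analysis",
    "IEA": "Inferred from electronic annotation",
    "ND": "No data available",
}

# Assumption: these five codes make up the "Direct Experiment" group.
DIRECT_EXPERIMENT = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def is_direct_experiment(code: str) -> bool:
    """Return True if the code is in the assumed direct-experiment group."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    return code in DIRECT_EXPERIMENT


if __name__ == "__main__":
    for code in ("IDA", "IMP", "ISS", "IEA"):
        print(code, "|", EVIDENCE_CODES[code], "|", is_direct_experiment(code))
```

Keeping the grouping in one named set makes it easy to adjust if a project draws the line between experimental and non-experimental evidence differently.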
Examples of how we connect instances with knowledge representation in the GO
What follows are examples of annotation of the biomedical literature
using GO types, gene product types and evidence codes
Example #1:Molecular Function using IDA
Figure from
Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem. 2001 Nov 23;276(47):44083-90.
The Annotation:
The Observation
NAD+ → NADH + H+
What are the instances in this experiment?
Gene product instances
- Molecules of retinol dehydrogenase
Molecular function instances
- Instances of execution of the molecular function revealed by the assay
- Instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.
We are interested in understanding how gene products contribute to the biology of an organism.
What knowledge are we trying to capture?
They do experiments!
Experiments are designed to study the properties of gene product instances.
Experimental biologists take on “The Burden of Proof”.
How do wet-bench biologists learn about gene products?
We* make annotations!
Annotations connect what wet-bench biologists see in the lab with how we represent our current understanding of biological reality
How do we represent the accumulated knowledge?
* GO curators
The instances are in the lab. We use what people report about instances, but we never actually deal with them directly.
So, where are the instances?
Gene Product Type
- Stands proxy for the ‘gene’
- Genes are what we have in MODs
- Types = what instances have in common
Gene Product Instance
- A molecule of a gene product
- It can be physically isolated
- It takes up space
What do we mean by gene product?
An annotation
- Asserts that instances of molecules of a type of gene product have a propensity to act as designated by the terms in an ontology such as the GO
- Is created on the basis of observations of the instances of such types in experiments and of the inferences drawn from such observations
Note: comprehensive experimental details are embedded in biomedical publications and in specialized databases
What do we mean by annotations?
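The definition above can be sketched as a small record type: an annotation ties a gene product type to an ontology term, with an evidence code and a reference recording the observations it rests on. This is only an illustrative data structure; the class name `Annotation`, its field names, and the `as_assertion` helper are assumptions, not the GO Consortium's annotation file format. The example values come from Example #1 in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One assertion about a gene product type (hypothetical field layout)."""
    gene_product: str   # the gene product type, named as in the source paper
    term: str           # ontology term name
    ontology: str       # e.g. "molecular function" or "biological process"
    evidence_code: str  # basis of the assertion, e.g. "IDA" or "IMP"
    reference: str      # where the experimental details are published

    def as_assertion(self) -> str:
        """Spell the annotation out as the assertion it stands for."""
        return (
            f"Instances of {self.gene_product} have the propensity described by "
            f"the {self.ontology} term '{self.term}' "
            f"(evidence: {self.evidence_code}; source: {self.reference})"
        )


# Example #1 from the talk, expressed as a record.
example = Annotation(
    gene_product="mRDH1",
    term="retinol dehydrogenase activity",
    ontology="molecular function",
    evidence_code="IDA",
    reference="Zhang et al., J Biol Chem 2001;276(47):44083-90",
)

if __name__ == "__main__":
    print(example.as_assertion())
```

The record is frozen because an annotation is a statement made at a point in time; a revised interpretation would be a new annotation rather than an edit in place.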
Example #2: Molecular Function using IMP
Figure from
Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest. 1997 Sep 15;100(6):1590-5.
The Annotation:
The Observation
IMP
What are the instances in this experiment?
Gene product instances
- Molecules of GUCY2C protein
- The lack of functional molecules of GUCY2C in mutants
Molecular function instances
- The execution of the molecular function, measured by the accumulation of cGMP
- The potential of a molecule of GUCY2C to execute the molecular function, revealed by the correlation between a lack of molecules and a lack of executions of molecular function
The Curator Perspective: Annotation Process
1. Identification of relevant experimental data
- Biomedical literature as primary source
- Annotations inferred from experiments performed in other organisms or inferred from sequence or structure
2. Identification of the appropriate ontology annotation term
- The experimental assay limits the resolution/granularity of the term that can be assigned
- Differences in expertise among curators should result in close, but not necessarily exact, GO term annotations
3. Employment of annotation quality control processes
- Check for correct formal structure
- Evaluate annotation consistency
- Harvest emerging knowledge to refine and extend the GO
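The quality control step can be pictured with a small checker. This is a sketch under stated assumptions, not the Consortium's actual QC tooling: the record keys (`gene_product`, `go_id`, `evidence_code`, `reference`), the function names, and the choice of "duplicate records" as a stand-in for a consistency check are all hypothetical. The identifier pattern follows the form of the ID shown earlier in the talk, GO:0007420.

```python
import re
from collections import Counter

# GO identifiers have the form "GO:" followed by seven digits, e.g. GO:0007420.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# Evidence codes as listed earlier in the talk.
KNOWN_EVIDENCE_CODES = {
    "IDA", "IPI", "IMP", "IGI", "IEP", "ISS",
    "TAS", "NAS", "IC", "RCA", "IEA", "ND",
}


def check_structure(record: dict) -> list:
    """Return a list of formal problems found in one annotation record."""
    problems = []
    if not record.get("gene_product"):
        problems.append("missing gene_product")
    if not GO_ID_PATTERN.fullmatch(str(record.get("go_id") or "")):
        problems.append("go_id is not of the form GO:nnnnnnn")
    if record.get("evidence_code") not in KNOWN_EVIDENCE_CODES:
        problems.append("unknown evidence_code")
    if not record.get("reference"):
        problems.append("missing reference")
    return problems


def find_duplicates(records: list) -> list:
    """Return (gene_product, go_id, evidence_code, reference) keys seen more than once."""
    keys = Counter(
        (r.get("gene_product"), r.get("go_id"), r.get("evidence_code"), r.get("reference"))
        for r in records
    )
    return [key for key, count in keys.items() if count > 1]


# "GeneX" and the reference string are placeholders, not real data.
good = {"gene_product": "GeneX", "go_id": "GO:0007420",
        "evidence_code": "IMP", "reference": "placeholder reference"}
bad = {"gene_product": "GeneX", "go_id": "GO:7420",
       "evidence_code": "XYZ", "reference": "placeholder reference"}

if __name__ == "__main__":
    print(check_structure(good))
    print(check_structure(bad))
    print(find_duplicates([good, good, bad]))
```

Checks like these cover only the formal side of quality control; evaluating whether two curators chose consistent terms, or whether new results call for ontology changes, remains a matter of curator judgment.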
Example #3: Biological Process Using IMP
Washington Smoak I, Byrd NA, Abu-Issa R, Goddeeris MM, Anderson R, Morris J, Yamamura K, Klingensmith J, Meyers EN. Sonic hedgehog is required for cardiac outflow tract and neural crest cell development. Dev Biol. 2005 Jul 15;283(2):357-72.
The Annotation:
The Observation
IMP
What are the instances in this Experiment?
Gene product instances
- Molecules of the Shh gene product
- Non-functional molecules of the Shh gene product
Biological process instances
- The development of a mouse heart
Molecular function instances
- The execution of a molecular function by a molecule of the Shh gene product
So, when a biological process occurs, it is the result of molecules of a gene product(s) executing their molecular function(s)
How do wet-bench biologists learn about gene products?
They do experiments!
Experiments are designed to study the properties of gene product instances.
Experimental biologists take on “The Burden of Proof”.
They make conclusions about gene product types based on the accumulated experimental data!
If experiments show:
All instances of a gene product studied have the potential to execute the function tyrosine kinase
Instances of the same gene product are involved in the biological process limb development
All instances of the same gene product are found in instances of the cytoplasm
A wet-bench biologist would conclude:
The gene product of this gene is a tyrosine kinase that functions in the cytoplasm, and the tyrosine kinase functioning is used in limb development
If we comprehensively annotate genes, can we make the same conclusions?
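One way to picture this question is to group a gene product's annotations by ontology and read off the same kind of profile the wet-bench biologist arrives at. The sketch below is illustrative only: `GeneX` is a placeholder gene, the tuple layout and the function name `summarise` are assumptions, and the three terms are the ones used in the example above.

```python
from collections import defaultdict


def summarise(annotations):
    """Group (gene, ontology, term) annotations into a per-gene profile."""
    grouped = defaultdict(lambda: defaultdict(set))
    for gene, ontology, term in annotations:
        grouped[gene][ontology].add(term)
    # Convert to plain dicts with sorted term lists for stable output.
    return {
        gene: {ontology: sorted(terms) for ontology, terms in by_ontology.items()}
        for gene, by_ontology in grouped.items()
    }


# Placeholder gene with the three kinds of annotation from the example.
annotations = [
    ("GeneX", "molecular function", "tyrosine kinase"),
    ("GeneX", "biological process", "limb development"),
    ("GeneX", "cellular component", "cytoplasm"),
]

profile = summarise(annotations)

if __name__ == "__main__":
    for ontology, terms in profile["GeneX"].items():
        print(f"{ontology}: {', '.join(terms)}")
```

The grouping itself does not establish that the kinase activity is what drives limb development; that link is an inference a biologist, or a further experiment, still has to supply.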
Analysis of gene product annotations leads to new hypotheses for wet-bench biologists to test
This is the basis of biological discovery!
Development of GO depends on intersection of curation with ontology refinements
New results may stand in conflict with current version of ontology
Process of annotation brings new experimental results into perspective with existing scientific knowledge captured in the ontology
One of the strengths of the GO development paradigm is that it is primarily a task of biologist-curators who are experts in understanding the experimental systems
Experimental Literature
Hypothesis generation
Informatics Resources
Data mining, and prediction using ontologies
Experiments and data analysis using GO, etc
Improved annotations, in MODs, UniProt;
Refine bio-ontologies
Summary
Gene product annotation is an integral aspect of the work of the GO Consortium
Annotations reflect conclusions from experiments as interpreted by the biologist and reviewed by peers
The structure of the GO depends upon accumulated knowledge from many experiments resulting in a representation of current thought about biological reality
As experimental data changes our view of reality, the ontology must change as well