Introduction to GO Annotation

Eurie Hong (SGD), Michelle Gwinn (TIGR), Tanya Berardini (TAIR), Karen Pilcher (DictyBase), Russell Collins (FlyBase), Carol Bastiani (Wormbase), Doug Howe (ZFIN), Stacia Engel (SGD)

What is a GO annotation?

• Gene (protein-coding gene, functional RNA)

• GO term

• Reference

• Evidence code: IMP, IGI, IPI, ISS, IDA, IEP, TAS, NAS, ND, RCA, IC

• Qualifiers: NOT, contributes_to, colocalizes_with

• With/From: supporting evidence for certain evidence codes
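The parts of an annotation listed above can be held in a simple record. The following is a minimal Python sketch, not an official GO or GAF schema: the class name `GOAnnotation` and its field names are assumptions made for illustration. The evidence-code names are the standard GO ones, and IEA is included because it comes up later under genome-wide annotation.

```python
from dataclasses import dataclass
from typing import Optional

# Evidence codes from the slide (plus IEA, used later), with their GO names.
EVIDENCE_CODES = {
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "ISS": "Inferred from Sequence or structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "ND": "No biological Data available",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "IC": "Inferred by Curator",
    "IEA": "Inferred from Electronic Annotation",
}

QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


@dataclass(frozen=True)
class GOAnnotation:
    gene: str                        # protein-coding gene or functional RNA
    go_id: str                       # e.g. "GO:0004672"
    reference: str                   # e.g. "PMID:9584128"
    evidence: str                    # one of EVIDENCE_CODES
    qualifier: Optional[str] = None  # one of QUALIFIERS, if any
    with_from: Optional[str] = None  # supporting ID for codes that use it

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies shown on the slide.
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")
        if self.qualifier is not None and self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")
        if not self.go_id.startswith("GO:"):
            raise ValueError(f"not a GO ID: {self.go_id}")


print(GOAnnotation("yakA", "GO:0004672", "PMID:9584128", "IDA"))
```

The sketch checks only the vocabularies listed on the slide; it deliberately does not encode which evidence codes require a With/From value.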

• What is an annotation?

• Strategies for identifying literature to annotate

• Identifying the correct annotation

– Molecular Function

– Biological Process

– Cellular Component

• Extent of annotation for a single gene product

• Strategies for annotating a genome

Which type of literature is appropriate for annotation?

• Papers with experimental evidence for GO process, function or component annotation

– Mutant phenotype descriptions

– Enzymatic activity assays

– Localization studies

• Papers describing phylogenetic studies for GO function annotation (ISS)

• Reviews

• (Textbooks)

• (Meeting abstracts)

Strategies for reading a paper for annotation

• Abstract

• Results/Figures

• Materials and Methods

• Discussion

Which granularity of GO term is appropriate for annotation?

Molecular Function

Souza et al. (1998) YakA, a protein kinase required for the transition from growth to development in Dictyostelium. PMID: 9584128

Background

• YakA was identified as a developmental mutant

• YakA is an ortholog of the yeast Yak1p

• The protein kinase domain of YakA is similar to both serine/threonine kinases and tyrosine kinases

PMID: 9584128

YakA belongs to the DYRK family

YakA is a member of the DYRK family of protein kinases (dual-specificity tyrosine-regulated kinase)

The Experiment

• Assay for YakA protein kinase activity

• YakA + γ32P-ATP + MBP (substrate)

• Look for incorporation of 32P into the substrate in the presence of YakA

PMID: 9584128

The Result

PMID: 9584128

GO Term for Annotation

protein kinase activity ; GO:0004672

• MBP (myelin basic protein) is a generic substrate

• Kinase specificity not determined; no phosphotyrosine antibodies were used, for example

Definition: Catalysis of the transfer of a phosphate group, usually from ATP, to a protein substrate.

Searching for Terms in DAG-Edit

Search term name that contains:

• “kinase”: 359 results

• “protein kinase”: 60 results

• “protein kinase activity”: 20 results
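The narrowing above can be illustrated with a plain substring filter. This Python sketch assumes a "name contains" search behaves like a case-insensitive substring match, and it runs over a small hand-picked sample of term names (an assumption, not the full ontology), so its counts are not DAG-Edit's.

```python
from typing import List

# Small illustrative sample of GO term names; the real ontology is far larger.
TERM_NAMES = [
    "kinase activity",
    "kinase regulator activity",
    "protein kinase binding",
    "protein kinase activity",
    "protein serine/threonine kinase activity",
    "protein tyrosine kinase activity",
    "regulation of protein kinase activity",
]


def search_terms(names: List[str], query: str) -> List[str]:
    """Return the names that contain `query`, ignoring case."""
    q = query.lower()
    return [name for name in names if q in name.lower()]


for query in ["kinase", "protein kinase", "protein kinase activity"]:
    print(f"{query!r}: {len(search_terms(TERM_NAMES, query))} results")
```

Note that "protein kinase activity" does not match "protein serine/threonine kinase activity" because the words are not contiguous; more specific terms are better found by browsing the children of a candidate term than by text search alone.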

Search Output in DAG-Edit

Sibling Terms in DAG-Edit

Child Terms in DAG-Edit

Parent Terms in AmiGO
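Sibling, child and parent terms are neighbors in the ontology's directed acyclic graph. A minimal Python sketch over a hypothetical, heavily simplified is_a graph (real GO includes intermediate terms and other relationship types that are omitted here):

```python
from typing import Dict, List, Set

# Hypothetical, simplified is_a edges (child -> parents), for illustration only.
IS_A: Dict[str, List[str]] = {
    "kinase activity": ["catalytic activity"],
    "lipid kinase activity": ["kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "protein serine/threonine kinase activity": ["protein kinase activity"],
    "protein tyrosine kinase activity": ["protein kinase activity"],
}


def parents(term: str) -> Set[str]:
    return set(IS_A.get(term, []))


def children(term: str) -> Set[str]:
    return {child for child, ps in IS_A.items() if term in ps}


def siblings(term: str) -> Set[str]:
    # Terms that share at least one parent with `term`.
    return {s for p in parents(term) for s in children(p)} - {term}


def ancestors(term: str) -> Set[str]:
    # Every term reachable by following is_a edges upward.
    found: Set[str] = set()
    stack = list(parents(term))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(parents(t))
    return found


print(sorted(children("protein kinase activity")))
```

Because a GO term can have more than one parent, the structure is a graph rather than a tree, which is why `parents` returns a set.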

Evidence Code

• The evidence code for the protein kinase activity term is IDA (Inferred from Direct Assay)

• Although endogenous substrates were not tested, the authors clearly showed kinase activity with a direct assay

Granular Terms Using ISS

protein serine/threonine kinase activity ; GO:0004674

protein tyrosine kinase activity ; GO:0004713

(Inferred from Sequence or structural Similarity)
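Taken together, the YakA function annotations are one IDA annotation to the general term and two ISS annotations to the more granular terms. The Python sketch below writes them as tab-separated rows loosely modeled on the first nine columns of the GO Annotation File (GAF) format. The dictyBase object ID and the ISS With/From value are hypothetical placeholders, and citing PMID:9584128 for the ISS rows is an assumption; newer GAF versions also have stricter rules (for example, for the Qualifier column), so treat this as an illustration only.

```python
# Placeholder identifiers: not real database accessions.
DB = "dictyBase"
OBJECT_ID = "DDB_G_PLACEHOLDER"
SYMBOL = "yakA"
REF = "PMID:9584128"

# (GO ID, evidence code, with/from) for the molecular function annotations.
YAKA_MF = [
    ("GO:0004672", "IDA", ""),                            # protein kinase activity
    ("GO:0004674", "ISS", "PLACEHOLDER:similar_kinase"),  # serine/threonine kinase
    ("GO:0004713", "ISS", "PLACEHOLDER:similar_kinase"),  # tyrosine kinase
]


def gaf_row(go_id: str, evidence: str, with_from: str, qualifier: str = "") -> str:
    # Columns: DB, DB Object ID, DB Object Symbol, Qualifier, GO ID,
    # DB:Reference, Evidence Code, With/From, Aspect ("F" = molecular function).
    cols = [DB, OBJECT_ID, SYMBOL, qualifier, go_id, REF, evidence, with_from, "F"]
    return "\t".join(cols)


for go_id, evidence, with_from in YAKA_MF:
    print(gaf_row(go_id, evidence, with_from))
```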

How is Biological Process different from Molecular Function?

Molecular Function…

“Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.”

Biological Process…

“A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.”

Molecular function is about the protein; biological process is about the organism.

Molecular functions are the activities that a protein specifically and directly performs.

Biological processes are what the organism uses those activities for.

For example:

Rho1 has GTPase activity… and the organism uses that activity for gastrulation, axon guidance, germ cell migration, etc.

A hammer hammers nails… and builds houses.

Important points:

The process is the migration of germ (pole) cells.

It is the movement of cells from one side of the epithelium to the other.

It is one step in a three-step process.

Is a new term needed?

A new term might be appropriate because it would describe a discrete, separable process, thus providing additional useful information to the user.

New terms would also permit linking two similar processes that are currently separate in GO but are connected in the literature.

cell migration
   (is a) transepithelial cell migration
      (is a) pole cell transepithelial migration
      (is a) cellular extravasation

cell migration
   (is a) germ cell migration
      (is a) pole cell migration
         (part of) pole cell transepithelial migration
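Reading the diagram above as child terms listed beneath their parents, the proposed structure can be checked by walking upward through both is_a and part_of edges. This Python sketch encodes only the relationships on this slide; the edge list is an interpretation of the diagram, and GO's own reasoning rules for other relationship types are more involved. It shows how the new terms would give pole cell transepithelial migration and cellular extravasation a shared parent.

```python
from typing import List, Set, Tuple

# (child, relation, parent) edges, as read from the proposed diagram.
EDGES: List[Tuple[str, str, str]] = [
    ("transepithelial cell migration", "is_a", "cell migration"),
    ("pole cell transepithelial migration", "is_a", "transepithelial cell migration"),
    ("cellular extravasation", "is_a", "transepithelial cell migration"),
    ("germ cell migration", "is_a", "cell migration"),
    ("pole cell migration", "is_a", "germ cell migration"),
    ("pole cell transepithelial migration", "part_of", "pole cell migration"),
]


def parents_of(term: str) -> Set[str]:
    # Direct parents via either is_a or part_of.
    return {parent for child, _rel, parent in EDGES if child == term}


def ancestors(term: str) -> Set[str]:
    # All terms reachable upward; is_a and part_of both propagate annotations.
    seen: Set[str] = set()
    stack = list(parents_of(term))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents_of(t))
    return seen


shared = ancestors("pole cell transepithelial migration") & ancestors("cellular extravasation")
print(sorted(shared))
```

An annotation to pole cell transepithelial migration would therefore also count toward both transepithelial cell migration and pole cell migration.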

Annotating to the Cellular Component Ontology

Carol Bastiani, Caltech

Experiment: Immunolocalization of LIN-10 with a LIN-10 antibody.

Localization of LIN-10 by immunofluorescence:

Vulval epithelial cells can be distinguished from ventral cord neurons by their larger size and the presence of stained cell junctions (red).

Figure 7.   LIN-10 is expressed in neurons. (A-C) Wild-type, late L3 hermaphrodite stained with anti-LIN-10 antibodies (green). LIN-10 is present in ventral cord processes (A, *), lateral neural cell bodies and processes (A and B, arrowheads), and dorsal cord processes

Search MGI GO Browser for neuron:

Choosing the evidence code:

In neural cell bodies, a small amount of LIN-10 appears diffusely throughout the cytoplasm, whereas the majority of LIN-10 is concentrated in discrete perinuclear structures (Figure 7, D and E), similar to perinuclear structures observed in vulval epithelial cells. To determine whether these perinuclear structures correspond to Golgi, we used ST-GFP as a marker for the trans-cisterna of the Golgi (Jamora et al., 1997). We expressed ST-GFP in transgenic worms using a heat shock promoter and examined the subcellular localization of LIN-10 and ST-GFP using anti-LIN-10 and anti-GFP antibodies. In single neurons expressing both endogenous LIN-10 and transgenic ST-GFP, the subcellular pattern of LIN-10 staining is similar to that of ST-GFP staining. Deconvolution of images obtained in double-staining experiments revealed that LIN-10 staining is closely associated with ST-GFP staining (Figure 7, F-I), but LIN-10 staining is consistently offset (by 0.2-0.5 µm) from ST-GFP staining. These results indicate that LIN-10 is localized in the trans-cisterna of the Golgi or is localized in a compartment closely associated with the trans-cisterna, such as the trans-Golgi network.

Further subcellular localization of LIN-10:

LIN-10 is localized to:

1) Cytoplasm

2) Within, or closely associated with, part of the Golgi apparatus: the trans-cisterna or a closely associated compartment such as the trans-Golgi network

1) Annotate to cytoplasm:

2) Annotate to Golgi apparatus, evidence code IDA:

Qualifier colocalizes_with: to be used “when the resolution of the assay is not accurate enough to say that the gene product is a bona fide component member.”
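The two LIN-10 component annotations differ only in whether the assay resolves true membership. A minimal Python sketch of that decision; the helper `annotate_location` and the record layout are hypothetical, and treating the cytoplasm annotation as an unqualified IDA annotation is an assumption based on the diffuse cytoplasmic staining described above.

```python
from typing import NamedTuple, Optional


class CCAnnotation(NamedTuple):
    gene: str
    term: str
    evidence: str
    qualifier: Optional[str]


def annotate_location(gene: str, term: str, evidence: str,
                      resolution_sufficient: bool) -> CCAnnotation:
    """Add colocalizes_with when the assay cannot show bona fide membership."""
    qualifier = None if resolution_sufficient else "colocalizes_with"
    return CCAnnotation(gene, term, evidence, qualifier)


lin10 = [
    # Diffuse staining throughout the cytoplasm.
    annotate_location("lin-10", "cytoplasm", "IDA", resolution_sufficient=True),
    # Staining offset 0.2-0.5 um from the trans-Golgi marker: LIN-10 may sit in
    # the trans-cisterna or in a closely associated compartment.
    annotate_location("lin-10", "Golgi apparatus", "IDA", resolution_sufficient=False),
]

for annotation in lin10:
    print(annotation)
```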

Strategies for annotation of a genome

1. How to get a complete set of GO annotations

2. Updating GO annotations

3. Representative approaches

Strategies for annotation of a genome

• Complete a first pass

– For all 3 aspects (MF, BP, CC)

– For all genes that get GO annotations

• Proteins, RNAs, pseudogenes

• NOT centromeres, telomeres, LTRs, retrotransposons, ARSs

– Unknowns are allowed

How to get a complete set of GO annotations
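A first pass can be tracked mechanically: every annotatable feature should end up with at least one annotation in each aspect, with an explicit unknown where nothing is known. This Python sketch assumes unknowns are recorded as ND annotations to the root term of each aspect (molecular_function GO:0003674, biological_process GO:0008150, cellular_component GO:0005575); the feature-type labels and gene names are hypothetical.

```python
from typing import Dict, List, Tuple

# Root term for each aspect: F = function, P = process, C = component.
ROOTS = {"F": "GO:0003674", "P": "GO:0008150", "C": "GO:0005575"}

# Feature types that receive GO annotations (hypothetical labels).
ANNOTATABLE = {"protein", "RNA", "pseudogene"}

Annotation = Tuple[str, str, str, str]  # (gene, aspect, GO ID, evidence)


def first_pass_gaps(features: Dict[str, str],
                    annotations: List[Annotation]) -> List[Annotation]:
    """Return ND placeholder annotations for every missing (gene, aspect) pair."""
    covered = {(gene, aspect) for gene, aspect, _go, _ev in annotations}
    gaps: List[Annotation] = []
    for gene, ftype in sorted(features.items()):
        if ftype not in ANNOTATABLE:
            continue  # e.g. centromeres, telomeres, LTRs, retrotransposons, ARSs
        for aspect in ("F", "P", "C"):
            if (gene, aspect) not in covered:
                gaps.append((gene, aspect, ROOTS[aspect], "ND"))
    return gaps


features = {"geneA": "protein", "geneB": "RNA", "CEN_X": "centromere"}
annotations = [("geneA", "F", "GO:0004672", "IDA")]
for gap in first_pass_gaps(features, annotations):
    print(gap)
```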

Strategies for annotation of a genome

• Second pass

– Replace unknowns

– Update where IEA was used

• Replace with information that has a “better” evidence code, if available

– Update where other databases are referenced

• Primary literature is preferred

Updating the complete set of GO annotations
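The second-pass criteria can be turned into a worklist. This Python sketch flags unknowns (ND), IEA annotations, and annotations whose reference is not a PubMed ID; using the "PMID:" prefix as the test for primary literature is a simplifying assumption, and the example genes and non-PubMed references are hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str
    reference: str


def needs_review(ann: Annotation) -> List[str]:
    """List the second-pass reasons that apply to one annotation."""
    reasons = []
    if ann.evidence == "ND":
        reasons.append("replace unknown")
    if ann.evidence == "IEA":
        reasons.append("look for better evidence than IEA")
    if not ann.reference.startswith("PMID:"):
        reasons.append("prefer primary literature")
    return reasons


annotations = [
    Annotation("geneA", "GO:0008150", "ND", "GO_REF:placeholder"),
    Annotation("geneB", "GO:0004672", "IEA", "PLACEHOLDER_DB:123"),
    Annotation("yakA", "GO:0004672", "IDA", "PMID:9584128"),
]
for ann in annotations:
    print(ann.gene, needs_review(ann))
```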

Strategies for annotation of a genome

• GO annotations will never be “done”

• Part of normal curation process

– More specific information

– Better evidence code

• Replace obsolete terms

• “Last reviewed” date

Updating GO annotations - ongoing
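Two of the ongoing tasks, replacing obsolete terms and keeping a "last reviewed" date, can be sketched as follows. This Python sketch assumes a mapping from obsolete IDs to direct replacements is available (the OBO format records these with a `replaced_by` tag, while `consider` suggestions still need a curator's judgment); the GO IDs shown are hypothetical placeholders.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str
    last_reviewed: date


def replace_obsolete(annotations: List[Annotation], replaced_by: Dict[str, str],
                     today: date) -> List[Annotation]:
    """Swap obsolete GO IDs for their direct replacements and stamp the review date."""
    updated = []
    for ann in annotations:
        new_id = replaced_by.get(ann.go_id)
        if new_id is not None:
            ann = ann._replace(go_id=new_id, last_reviewed=today)
        updated.append(ann)
    return updated


def stale(annotations: List[Annotation], cutoff: date) -> List[Annotation]:
    """Annotations last reviewed before `cutoff`, to be revisited."""
    return [ann for ann in annotations if ann.last_reviewed < cutoff]


anns = [
    Annotation("geneA", "GO:OLD0001", "IDA", date(2003, 1, 15)),
    Annotation("geneB", "GO:0004672", "IDA", date(2004, 6, 1)),
]
refreshed = replace_obsolete(anns, {"GO:OLD0001": "GO:NEW0001"}, date(2005, 1, 1))
print(refreshed)
print(stale(refreshed, date(2004, 12, 31)))
```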

Strategies for annotation of a genome

Updating GO annotations - ongoing

Strategies for annotation of a genome

Representative approaches