Introduction to the Gene Ontology: A User’s Guide
COST Functional Modeling Workshop, 22-24 April, Helsinki




Introduction to GO
• The Gene Ontology Consortium
• The Gene Ontology
• A GO annotation example
• GO evidence codes
• No GO vs. ND
• Making annotations
• Multiple annotations: the gene association (ga) file
• Sources of GO


THE GENE ONTOLOGY CONSORTIUM


http://www.geneontology.org/


The GO Consortium provides:
• a central repository for ontology updates and annotations
• a central mechanism for changing GO terms (adding, editing, deleting)
• quality checking for annotations
• consistency checks for how annotations are made by different groups
• a central source of information for users
• co-ordination of the annotation effort


GO Consortium and GO groups:
• groups decide which gene products to annotate
• biocurator training
• tool development, mostly by groups
• many non-consortium groups
• education and training by groups
• outreach to biocurators and databases by the GOC


Annotation Strategy
• Experimental data
  • many species have a body of published experimental data
  • detailed, species-specific annotation: ‘depth’
  • requires manual annotation of the literature, which is slow
• Computational analysis
  • can be automated, so it is faster
  • gives ‘breadth’ of coverage across the genome
  • annotations are general
  • relatively few annotation pipelines


Releasing GO Annotations
• GO annotations are stored at individual databases
• sanity checks are run as data is entered: is all the required data filled in?
• databases run quality control (QC) checks and submit to GO
• the GO Consortium runs additional QC and collates the annotations
• checked annotations are picked up by GO users, e.g. public databases, genome browsers, array vendors, GO expression analysis tools


AgBase Quality Checks & Releases

[Figure: AgBase quality checks and releases. Stages shown: AgBase biocurators, the AgBase biocuration interface, the AgBase database, the EBI GOA Project and the GO Consortium database, with a ‘sanity’ check at each stage and GOC QC at EBI GOA and the GO Consortium. Users shown: UniProt db, the QuickGO browser, public databases, the AmiGO browser, GO analysis tools and microarray developers.]

‘sanity’ check: checks to ensure all appropriate information is captured, no obsolete GO:IDs are used, etc.
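The ‘sanity’ check can be sketched as a small function. This is a minimal illustration under stated assumptions, not AgBase’s actual checker: the `REQUIRED_FIELDS` tuple, the dictionary-shaped record and the `obsolete_ids` set are hypothetical stand-ins. The GO:ID pattern assumes the usual form of `GO:` followed by seven digits.

```python
import re

# Hypothetical list of fields a curation interface might require (not AgBase's real rule set)
REQUIRED_FIELDS = ("db", "db_object_id", "go_id", "reference", "evidence_code")

# GO:IDs are written as "GO:" followed by seven digits, e.g. GO:0005739
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


def sanity_check(annotation, obsolete_ids):
    """Return a list of problems found in one annotation record (empty if it passes)."""
    problems = []
    # 1. Is all the required data filled in?
    for field in REQUIRED_FIELDS:
        if not annotation.get(field):
            problems.append(f"missing {field}")
    go_id = annotation.get("go_id", "")
    # 2. Is the GO:ID well formed?
    if go_id and not GO_ID_PATTERN.match(go_id):
        problems.append(f"malformed GO:ID {go_id}")
    # 3. Is the GO:ID obsolete? (obsolete_ids would come from the current ontology release)
    if go_id in obsolete_ids:
        problems.append(f"obsolete GO:ID {go_id}")
    return problems


# Record based on the NDUFAB1 annotation to mitochondrial matrix; reference left empty on purpose
record = {
    "db": "UniProtKB",
    "db_object_id": "P52505",
    "go_id": "GO:0005759",
    "reference": "",
    "evidence_code": "IDA",
}
print(sanity_check(record, obsolete_ids=set()))  # ['missing reference']
```

Returning a list of problems, rather than stopping at the first failure, lets a curator fix everything in one pass.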


THE GENE ONTOLOGY


Gene Ontology (GO)
• Not about genes!
  • gene products: genes, transcripts, ncRNA, proteins
• The GO describes gene product function
• Not a single ontology:
  • Biological Process (BP or P)
  • Molecular Function (MF or F)
  • Cellular Component (CC or C)
• de facto method for functional annotation
• widely used for functional genomics (high throughput)


What the GO doesn’t do:
• does not describe individual gene products
  • e.g. cytochrome c is not in the GO, but oxidoreductase activity is
• does not describe mutants or diseases, e.g. oncogenesis
• does not include sequence attributes, e.g. exons, introns, protein domains
• is not a database of sequences


What is the Gene Ontology?
• assigns functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• is structured to be queried at different levels, e.g.:
  • find all the chicken gene products in the genome that are involved in signal transduction
  • zoom in on all the receptor tyrosine kinases
• each human-readable GO function has a digital tag to allow computational analysis of large datasets

“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
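Querying “at different levels” works because an annotation to a specific term also counts for the more general terms above it. Here is a minimal sketch, assuming a tiny hand-made hierarchy (a simplified illustration, not the real GO graph) and hypothetical gene product names.

```python
from collections import defaultdict

# Simplified, illustrative child -> parents links (NOT the real GO structure)
PARENTS = {
    "receptor tyrosine kinase signaling": ["enzyme-linked receptor signaling"],
    "enzyme-linked receptor signaling": ["signal transduction"],
    "G protein-coupled receptor signaling": ["signal transduction"],
    "signal transduction": [],
}

# Hypothetical chicken gene products and the most specific terms they are annotated to
ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["G protein-coupled receptor signaling"],
    "geneC": ["signal transduction"],
}


def descendants(term):
    """All terms below `term` in the hierarchy, including `term` itself."""
    children = defaultdict(list)
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def gene_products_for(term):
    """Gene products annotated to `term` or to any more specific term below it."""
    wanted = descendants(term)
    return sorted(g for g, terms in ANNOTATIONS.items() if wanted.intersection(terms))


print(gene_products_for("signal transduction"))                 # ['geneA', 'geneB', 'geneC']
print(gene_products_for("receptor tyrosine kinase signaling"))  # ['geneA']
```

The broad query collects everything under signal transduction. Narrowing the query term “zooms in” on a subset.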


Ontologies
• digital identifier (computers)
• description (humans)
• relationships between terms
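The three parts of a term (digital identifier, human-readable description, relationships) are visible in the OBO flat-file format in which GO has been distributed. This is a minimal sketch that reads one abbreviated stanza. Real stanzas carry more tags, such as definitions, synonyms and cross-references. This parser keeps only `id`, `name`, `namespace`, `is_a` and `relationship`, and it strips the trailing `! name` comments that OBO uses for readability.

```python
# Abbreviated stanza: real GO stanzas for this term contain further tags
STANZA = """\
[Term]
id: GO:0005759
name: mitochondrial matrix
namespace: cellular_component
relationship: part_of GO:0005739 ! mitochondrion
"""


def parse_term(stanza):
    """Parse one OBO [Term] stanza into id, name, namespace and relationships."""
    term = {"relationships": []}
    for line in stanza.splitlines():
        if ":" not in line or line.startswith("["):
            continue  # skip the [Term] header and blank lines
        tag, value = (part.strip() for part in line.split(":", 1))
        value = value.split("!", 1)[0].strip()  # drop the human-readable "! name" comment
        if tag in ("id", "name", "namespace"):
            term[tag] = value
        elif tag == "is_a":
            term["relationships"].append(("is_a", value))
        elif tag == "relationship":
            rel_type, target = value.split(maxsplit=1)
            term["relationships"].append((rel_type, target))
    return term


term = parse_term(STANZA)
print(term["id"], "|", term["name"])  # GO:0005759 | mitochondrial matrix
print(term["relationships"])          # [('part_of', 'GO:0005739')]
```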


A GO ANNOTATION EXAMPLE


A GO Annotation Example

NDUFAB1 (UniProt P52505): Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
• GO:0006633 fatty acid biosynthetic process TAS
• GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
• GO:0008610 lipid biosynthetic process IEA

Cellular Component (CC or C)
• GO:0005759 mitochondrial matrix IDA
• GO:0005747 mitochondrial respiratory chain complex I IDA
• GO:0005739 mitochondrion IEA

Molecular Function (MF or F)
• GO:0005504 fatty acid binding IDA
• GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
• GO:0016491 oxidoreductase activity TAS
• GO:0000036 acyl carrier activity IEA


A GO Annotation Example

Labels on the example: aspect or ontology; GO:ID (unique); GO term name; GO evidence code.

NDUFAB1 (UniProt P52505): Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
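Each line of the NDUFAB1 example pairs an aspect, a GO:ID, a term name and an evidence code. This minimal sketch models that record as a Python data class and fills it with the NDUFAB1 annotations from the example. The class and field names are illustrative and do not come from any GO tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    aspect: str         # P (biological process), F (molecular function) or C (cellular component)
    go_id: str          # unique GO:ID
    term_name: str      # human-readable GO term name
    evidence_code: str  # e.g. IDA, TAS, IEA


# NDUFAB1 (UniProt P52505) annotations as listed in the example
NDUFAB1 = [
    Annotation("P", "GO:0006633", "fatty acid biosynthetic process", "TAS"),
    Annotation("P", "GO:0006120", "mitochondrial electron transport, NADH to ubiquinone", "TAS"),
    Annotation("P", "GO:0008610", "lipid biosynthetic process", "IEA"),
    Annotation("C", "GO:0005759", "mitochondrial matrix", "IDA"),
    Annotation("C", "GO:0005747", "mitochondrial respiratory chain complex I", "IDA"),
    Annotation("C", "GO:0005739", "mitochondrion", "IEA"),
    Annotation("F", "GO:0005504", "fatty acid binding", "IDA"),
    Annotation("F", "GO:0008137", "NADH dehydrogenase (ubiquinone) activity", "TAS"),
    Annotation("F", "GO:0016491", "oxidoreductase activity", "TAS"),
    Annotation("F", "GO:0000036", "acyl carrier activity", "IEA"),
]


def by_aspect(annotations):
    """Group GO:IDs by aspect (P, F, C), keeping their original order."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.aspect].append(ann.go_id)
    return dict(groups)


for aspect, ids in by_aspect(NDUFAB1).items():
    print(aspect, len(ids))
# P 3
# C 3
# F 4
```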


GO EVIDENCE CODES & MAKING ANNOTATIONS


Why record the GO evidence code?
• GO did not initially record evidence for functional assertions:
  • NR: Not Recorded
• “inferred from…”: to deduce or conclude (information) from evidence and reasoning
• provides information about the support for associating a gene product with a function
• different experiments allow us to draw different conclusions
• reliability


Types of GO Evidence Codes

1. Experimental Evidence Codes
2. Computational Analysis Evidence Codes
3. Author Statement Evidence Codes
4. Curator Statement Evidence Codes
5. Automatically-assigned Evidence Codes
6. Obsolete Evidence Codes


GO EVIDENCE CODES

Direct evidence codes
• IDA: inferred from direct assay
• IEP: inferred from expression pattern
• IGI: inferred from genetic interaction
• IMP: inferred from mutant phenotype
• IPI: inferred from physical interaction

Indirect evidence codes
• IGC: inferred from genomic context
• inferred from literature:
  • TAS: traceable author statement
  • NAS: non-traceable author statement
  • IC: inferred by curator
• inferred by sequence analysis:
  • RCA: inferred from reviewed computational analysis
  • IS*: inferred from sequence*
  • IEA: inferred from electronic annotation

Other
• NR: not recorded (historical)
• ND: no biological data available

* IS* covers:
• ISS: inferred from sequence or structural similarity
• ISA: inferred from sequence alignment
• ISO: inferred from sequence orthology
• ISM: inferred from sequence model

Guide to GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
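Because the evidence code records how an annotation was made, analyses often group or filter annotations by code category. The sketch below maps the codes named on these slides onto the six categories listed earlier. The grouping follows the GO evidence code categories of this period and should be checked against the current Guide to GO Evidence Codes. Codes not named on the slides fall through to "unknown". `keep_manual` is a hypothetical helper that only drops IEA annotations.

```python
# Category assignments for the codes named on these slides (check against the current GO guide)
EVIDENCE_CATEGORIES = {
    "experimental": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "computational analysis": {"ISS", "ISA", "ISO", "ISM", "IGC", "RCA",
                               "IBA", "IBD", "IKR", "IRD"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "automatically assigned": {"IEA"},
    "obsolete": {"NR"},
}


def category_of(code):
    """Return the evidence code category, or 'unknown' for codes not listed here."""
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code in codes:
            return category
    return "unknown"


def keep_manual(annotations):
    """Drop automatically assigned (IEA) annotations from (go_id, code) pairs."""
    return [(go_id, code) for go_id, code in annotations
            if category_of(code) != "automatically assigned"]


# NDUFAB1 cellular component annotations from the example
ndufab1_cc = [("GO:0005759", "IDA"), ("GO:0005747", "IDA"), ("GO:0005739", "IEA")]
print([category_of(code) for _, code in ndufab1_cc])
# ['experimental', 'experimental', 'automatically assigned']
print(keep_manual(ndufab1_cc))  # [('GO:0005759', 'IDA'), ('GO:0005747', 'IDA')]
```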


GO Mapping Example

NDUFAB1


Biocuration of literature
• detailed function
• “depth”
• slower (manual)


Biocuration of Literature: detailed gene function

P05147
PMID: 2976880
Find a paper about the protein.


Read the paper to get experimental evidence of function.

Use the most specific term possible.

The experiment assayed kinase activity: use the IDA evidence code.
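“Use the most specific term possible” can be checked mechanically: among candidate terms, discard any term that is an ancestor of another candidate. This minimal sketch assumes a small hypothetical is_a chain written as child → parent links. The chain is simplified for illustration: it is not the real GO graph, and it gives each term a single parent, whereas GO terms can have several.

```python
# Simplified, illustrative child -> parent links (NOT the real GO graph; one parent per term)
PARENT = {
    "protein tyrosine kinase activity": "protein kinase activity",
    "protein kinase activity": "kinase activity",
    "kinase activity": "catalytic activity",
}


def ancestors(term):
    """Walk up the parent links and collect every more general term."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def most_specific(candidates):
    """Keep only candidates that are not ancestors of another candidate."""
    general = set()
    for term in candidates:
        general |= ancestors(term)
    return [t for t in candidates if t not in general]


print(most_specific(["kinase activity", "protein kinase activity"]))
# ['protein kinase activity']
```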


Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed


Computational Analysis Evidence

In the beginning:
• IGC: Inferred from Genomic Context
  • e.g. operons
• RCA: inferred from Reviewed Computational Analysis
  • computational analyses that integrate datasets of several types
• ISS: Inferred from Sequence or Structural Similarity


Computational Analysis Evidence

• Then different types of sequence analysis were added:
  • ISS: Inferred from Sequence or Structural Similarity
  • ISO: Inferred from Sequence Orthology
  • ISA: Inferred from Sequence Alignment
  • ISM: Inferred from Sequence Model


Computational Analysis Evidence

• Phylogenetic analysis codes were added:
  • IBA: Inferred from Biological aspect of Ancestor
  • IBD: Inferred from Biological aspect of Descendant
  • IKR: Inferred from Key Residues
    • characterized by the loss of key sequence residues; implies a NOT annotation
  • IRD: Inferred from Rapid Divergence
    • characterized by rapid divergence from the ancestral sequence; implies a NOT annotation


Unknown Function vs. No GO
• ND: no biological data available
  • biocurators have tried to add GO, but there is no functional data available
  • previously annotated to “process_unknown”, “function_unknown”, “component_unknown”
  • now annotated to the root terms “biological process”, “molecular function”, “cellular component”
• No annotations (not even ND): biocurators have not yet annotated the gene product
  • this matters for your dataset: what % of it has GO?
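The difference between ND and “no annotation” matters when you report how much of a dataset has GO. This minimal sketch assumes the annotations have already been loaded as (gene product, evidence code) pairs. The gene product IDs are hypothetical. A product with ND plus any other code is counted as having GO.

```python
def go_coverage(dataset, annotations):
    """Summarise GO coverage for a list of gene product IDs.

    annotations: iterable of (gene_product_id, evidence_code) pairs.
    Counts products with informative GO, with only ND annotations,
    and with no annotation at all, plus the % with informative GO.
    """
    codes = {}
    for product, code in annotations:
        codes.setdefault(product, set()).add(code)

    with_go = nd_only = unannotated = 0
    for product in dataset:
        found = codes.get(product)
        if not found:
            unannotated += 1  # biocurators have not annotated it
        elif found == {"ND"}:
            nd_only += 1      # looked at, but no functional data available
        else:
            with_go += 1
    percent = 100 * with_go / len(dataset) if dataset else 0.0
    return {"with_go": with_go, "nd_only": nd_only,
            "unannotated": unannotated, "percent_with_go": percent}


dataset = ["prot1", "prot2", "prot3", "prot4"]  # hypothetical IDs
annotations = [("prot1", "IDA"), ("prot1", "IEA"), ("prot2", "ND"), ("prot3", "ISS")]
print(go_coverage(dataset, annotations))
# {'with_go': 2, 'nd_only': 1, 'unannotated': 1, 'percent_with_go': 50.0}
```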


MULTIPLE ANNOTATIONS: GENE ASSOCIATION FILES


The gene association (ga) file
• standard file format used to capture GO annotation data
• tab-delimited file containing 17* fields of information:
  • information about the gene product (database, accession, name, symbol, synonyms, species)
  • information about the function: GO ID, ontology, reference, evidence, qualifiers, context (with/from)
  • data about the functional annotation: date, annotator

* GO Annotation File Format 2.0 has two additional columns compared to GAF 1.0: annotation extension (column 16) and gene product form ID (column 17).
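A GAF 2.0 line can be read by splitting it on tabs and naming the 17 columns. This is a minimal sketch. The column order follows the GAF 2.0 specification. The example line is built from the NDUFAB1 annotation to mitochondrial matrix, and its reference, date and assigned-by values are placeholders the slides do not give. Header lines start with `!` and are skipped. Only the columns that use `|` for multiple values are split. The annotation extension column is left as a raw string.

```python
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]

# Columns whose values may hold several entries separated by "|"
MULTI_VALUED = {"qualifier", "db_reference", "with_from", "db_object_synonym", "taxon"}


def parse_gaf2_line(line):
    """Parse one GAF 2.0 annotation line into a dict; return None for header/blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF2_COLUMNS):
        raise ValueError(f"expected 17 columns, found {len(fields)}")
    record = dict(zip(GAF2_COLUMNS, fields))
    for name in MULTI_VALUED:
        record[name] = record[name].split("|") if record[name] else []
    return record


example = "\t".join([
    "UniProtKB", "P52505", "NDUFAB1", "", "GO:0005759",
    "PMID:0000000",  # placeholder reference
    "IDA", "", "C",
    "NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa", "",
    "protein", "taxon:9913",  # Bos taurus
    "20100101", "ExampleDB",  # placeholder date and assigned-by
    "", "",
])
rec = parse_gaf2_line(example)
print(rec["go_id"], rec["evidence_code"], rec["aspect"])  # GO:0005759 IDA C
print(rec["qualifier"])  # []
```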


http://www.geneontology.org/GO.format.gaf-2_0.shtml


(additional column added to this example)


gene product information


metadata: when & who


function information


With/from column: used to give more specific information about the evidence code (not always displayed)


Qualifier column: used to qualify the annotation (not always displayed)
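The qualifier changes how an annotation should be read. In particular, NOT (implied by the IKR and IRD codes described earlier) states that the gene product does not have that function, so NOT rows should not be counted as positive annotations. This minimal sketch assumes the rows are already parsed into (term, qualifiers) pairs, with qualifiers split on `|`. The term labels are hypothetical placeholders.

```python
def split_by_qualifier(rows):
    """Separate positive annotations from NOT annotations.

    rows: iterable of (term, qualifiers) where qualifiers is a list such as
    [], ["NOT"] or ["NOT", "contributes_to"].
    """
    positive, negative = [], []
    for term, qualifiers in rows:
        (negative if "NOT" in qualifiers else positive).append(term)
    return positive, negative


# Hypothetical term labels
rows = [("term_A", []), ("term_B", ["NOT"]), ("term_C", ["contributes_to"])]
positive, negative = split_by_qualifier(rows)
print(positive)  # ['term_A', 'term_C']
print(negative)  # ['term_B']
```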


Gene association files
• GO Consortium ga files
  • many organism-specific files
  • also includes the EBI GOA files
• EBI GOA ga files
  • the UniProt file contains GO annotation for all species represented in UniProtKB
• AgBase ga files
  • organism-specific files
  • AgBase GOC file: submitted to the GO Consortium & EBI GOA
  • AgBase Community file: GO annotations not yet submitted or not supported, and annotations provided by researchers
• all files are quality checked


http://www.geneontology.org


http://www.ebi.ac.uk/GOA/


http://www.agbase.msstate.edu/


Sources of GO

1. Primary sources of GO: from the GO Consortium (GOC) & GOC members
  • most up to date
  • most comprehensive

2. Secondary sources: other resources that use GO provided by GOC members
  • public databases (e.g. NCBI, UniProtKB)
  • genome browsers (e.g. Ensembl)
  • array vendors (e.g. Affymetrix)
  • GO expression analysis tools


Sources of GO annotation
• Different tools and databases display GO annotations differently.
• Because GO terms change continually and GO annotations are continually added, you need to know when the GO annotations were last updated.
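One rough indicator of how current a set of annotations is comes from the date column of a gene association file (column 14, written as YYYYMMDD). This is a minimal sketch over date strings that have already been extracted. The dates are made up for illustration. The one-year staleness threshold is an arbitrary assumption, not a GO rule.

```python
from datetime import date, datetime


def latest_annotation_date(date_strings):
    """Return the most recent date from GAF-style YYYYMMDD strings."""
    return max(datetime.strptime(s, "%Y%m%d").date() for s in date_strings)


def is_stale(date_strings, today, max_age_days=365):
    """True if the newest annotation is older than max_age_days (arbitrary threshold)."""
    return (today - latest_annotation_date(date_strings)).days > max_age_days


dates = ["20080312", "20090105", "20081120"]  # made-up example dates
print(latest_annotation_date(dates))             # 2009-01-05
print(is_stale(dates, today=date(2010, 4, 22)))  # True
```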


Secondary Sources of GO annotation

EXAMPLES:
• public databases (e.g. NCBI, UniProtKB)
• genome browsers (e.g. Ensembl)
• array vendors (e.g. Affymetrix)

CONSIDERATIONS:
• What is the original source?
• When was it last updated?
• Are evidence codes displayed?


Differences in displaying GO annotations: secondary/tertiary sources.


For more information about GO
• GO evidence codes: http://www.geneontology.org/GO.evidence.shtml
• gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
• tools that use the GO: http://www.geneontology.org/GO.tools.shtml
• GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page

All websites are listed on the AgBase workshop website.