25th June 2007 Jane Lomax

Using the Gene Ontology (GO) for analysis of expression data

Jane Lomax, EMBL-EBI

What is the Gene Ontology?

• Set of standard biological phrases (terms) which are applied to genes/proteins:
  – protein kinase
  – apoptosis
  – membrane

What is the Gene Ontology?

• Genes are linked, or associated, with GO terms by trained curators at genome databases
  – known as ‘gene associations’ or GO annotations

• Some GO annotations are created automatically

[Figure: genome and protein databases associate genes with GO terms (gene -> GO term); these GO annotations feed the GO database]

What is the Gene Ontology?

• Allows biologists to make queries across large numbers of genes without researching each one individually

Copyright ©1998 by the National Academy of Sciences

Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868

GO structure

• GO isn’t just a flat list of biological terms

• terms are related within a hierarchy

GO structure

• This means genes can be grouped according to user-defined levels

• Allows broad overview of gene set or genome

How does GO work?

• GO is species independent
  – some terms, especially lower-level, detailed terms, may be specific to a certain group
    • e.g. photosynthesis
  – but when collapsed up to the higher levels, terms are not dependent on species

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?

GO structure

• GO terms are divided into three parts:
  – cellular component
  – molecular function
  – biological process

Cellular Component

• where a gene product acts

• Enzyme complexes in the component ontology refer to places, not activities.

Molecular Function

• activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity

insulin binding

insulin receptor activity

drug transporter activity

Molecular Function

• A gene product may have several functions

• Sets of functions make up a biological process.

Biological Process

a commonly recognized series of events

cell division

transcription

regulation of gluconeogenesis

limb development

courtship behavior

Ontology Structure

• Terms are linked by two relationships:
  – is-a
  – part-of

Ontology Structure

[Figure: example ontology fragment with the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]

Ontology Structure

• Ontologies are structured as a hierarchical directed acyclic graph (DAG)

• Terms can have more than one parent and zero, one or more children
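A minimal Python sketch of such a DAG, using the example terms from the neighbouring slides. The edges chosen here are an assumption based on general cell biology (the slide drawing is not reproduced in this text), and `PARENTS`, `ancestors` and `children` are hypothetical names used only for illustration.

```python
# Each term maps to its parents as (relationship, parent) pairs.
# The edge choices are illustrative assumptions, not read off the slide.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def children(term, parents=PARENTS):
    """Return the direct children of a term (zero, one or more)."""
    return {child for child, links in parents.items()
            if any(parent == term for _rel, parent in links)}


print(sorted(ancestors("chloroplast membrane")))
print(sorted(children("membrane")))
```

Storing parent links rather than child links keeps the "collapse up to a higher level" operation a simple upward walk.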

Ontology Structure

[Figure: ontology fragment (cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane) drawn as a Directed Acyclic Graph (DAG); multiple parentage allowed]

Anatomy of a GO term

id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092

Fields: id = unique GO ID; name = term name; namespace = ontology; def = definition; exact_synonym = synonym; xref_analog = database ref; is_a = parentage
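As a rough illustration of how such a term record can be read by a program, the sketch below splits the stanza shown on this slide into tag/value pairs. `parse_stanza` is a hypothetical helper written for this example, not part of any GO software, and it only handles the simple `tag: value` layout shown here.

```python
# The term record from the slide, one "tag: value" pair per line.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Collect tag/value lines into a dict of lists (tags such as is_a repeat)."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ": " only, so IDs like GO:0006094 stay intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip())
    return term


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], "parents:", term["is_a"])
```

Keeping every value in a list means the two `is_a` lines (multiple parentage) are preserved rather than overwritten.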

GO terms

• Where do GO terms come from?
  – GO terms are added by editors at EBI and annotating databases
  – new terms are usually only added when they are asked for by annotators
  – GO editors work with experts to make major ontology developments
    • metabolism
    • pathogenesis
    • cell cycle

GO stats

• over 23,000 GO terms:
  – 13593 biological_process
  – 1980 cellular_component
  – 7700 molecular_function

GO annotations

• Where do the links between genes and GO terms come from?

GO annotations

• Contributing databases:
  – Berkeley Drosophila Genome Project (BDGP)
  – dictyBase (Dictyostelium discoideum)
  – FlyBase (Drosophila melanogaster)
  – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
  – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
  – Gramene (grains, including rice, Oryza)
  – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
  – Rat Genome Database (RGD) (Rattus norvegicus)
  – Reactome
  – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
  – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
  – The Institute for Genomic Research (TIGR): databases on several bacterial species
  – WormBase (Caenorhabditis elegans)
  – Zebrafish Information Network (ZFIN) (Danio rerio)

Species coverage

• All major eukaryotic model organism species
• Human via GOA group at UniProt
• Several bacterial and parasite species through TIGR and GeneDB at Sanger
  – many more in pipeline

Annotation coverage

Anatomy of a GO annotation

• Three key parts:
  – gene name/id
  – GO term(s)
  – evidence for association

Example annotation

• Breast cancer type 1 susceptibility protein gene in humans

Types of GO annotation:

Electronic Annotation

Manual Annotation

Manual annotation

• Created by scientific curators

• High quality

• Small number

Manual annotation

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
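To make the three key parts of an annotation concrete, here is a small sketch of how a curator's reading of this abstract might be recorded. The choice of terms and evidence codes is a hypothetical illustration, not the actual curated annotations for PERK1, and the GO IDs should be checked against the current ontology before reuse. `Annotation` is an illustrative class name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str      # gene name/id
    go_id: str     # GO term identifier
    go_name: str   # GO term name, kept for readability
    evidence: str  # evidence code for the association


# Hypothetical annotations a curator might draw from the abstract above.
perk1 = [
    Annotation("PERK1", "GO:0004674",
               "protein serine/threonine kinase activity", "IDA"),
    Annotation("PERK1", "GO:0005886", "plasma membrane", "IDA"),
    Annotation("PERK1", "GO:0009611", "response to wounding", "NAS"),
]

for a in perk1:
    print(f"{a.gene}\t{a.go_id}\t{a.evidence}\t{a.go_name}")
```

One gene product ends up with one annotation per ontology here: a molecular function, a cellular component and a biological process.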

Electronic Annotation

• Annotation derived without human validation
  – mappings file e.g. interpro2go, ec2go
  – BLAST search ‘hits’

• Lower ‘quality’ than manual codes

Mappings files

Fatty acid biosynthesis (Swiss-Prot keyword) -> GO: fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) -> GO: acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO: acetyl-CoA carboxylase activity (GO:0003989)
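The idea behind a mappings file can be sketched as a simple lookup: any gene product carrying a mapped external identifier inherits the corresponding GO term with the IEA evidence code. The three mappings are the ones shown on this slide; the key prefixes (`SP_KW:`, `InterPro:`), the gene name `geneX` and the function name are hypothetical and do not reproduce the real file format.

```python
# Mappings from the slide; real files such as interpro2go and ec2go
# contain many such lines. The key format here is an assumption.
MAPPINGS = {
    "SP_KW:Fatty acid biosynthesis": "GO:0006633",
    "EC:6.4.1.2": "GO:0003989",
    "InterPro:IPR000438": "GO:0003989",
}


def electronic_annotations(gene, external_ids, mappings=MAPPINGS):
    """Return sorted (gene, GO id, evidence) tuples inferred from mappings."""
    # A set removes duplicates when several identifiers map to one term.
    go_ids = {mappings[x] for x in external_ids if x in mappings}
    return [(gene, go_id, "IEA") for go_id in sorted(go_ids)]


for row in electronic_annotations(
        "geneX", ["EC:6.4.1.2", "InterPro:IPR000438",
                  "SP_KW:Fatty acid biosynthesis"]):
    print(row)
```

Identifiers with no mapping are silently skipped, which mirrors the fact that electronic annotation only covers what the mapping files know about.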

Evidence types

• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available

• IEA: Inferred from electronic annotation
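Because the evidence code travels with every annotation, an analysis can choose which annotations to trust. The sketch below keeps the codes from this slide in a lookup table and filters out electronic annotations. The sample annotation tuples and the helper name `manual_only` are hypothetical.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from electronic annotation",
}


def manual_only(annotations):
    """Keep (gene, GO id, evidence) tuples whose evidence code is not IEA."""
    for _gene, _go_id, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
    return [a for a in annotations if a[2] != "IEA"]


# Hypothetical annotations, for illustration only.
sample = [
    ("geneA", "GO:0006915", "IDA"),
    ("geneB", "GO:0006915", "IEA"),
    ("geneC", "GO:0006633", "TAS"),
]
print(manual_only(sample))
```

Whether to drop IEA annotations is a trade-off: they are lower ‘quality’ but far more numerous than manual ones.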

GO tools

• GO resources are freely available to anyone to use without restriction
  – includes the ontologies, gene associations and tools developed by GO

• Other groups have used GO to create tools for many purposes:

http://www.geneontology.org/GO.tools

GO tools

• Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets

GO tools

• Many tools exist that use GO to find common biological functions from a list of genes:

http://www.geneontology.org/GO.tools.microarray.shtml

GO tools

• Most of these tools work in a similar way:
  – input a gene list and a subset of ‘interesting’ genes
  – tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
  – tool provides a statistical measure to determine whether enrichment is significant

Microarray process

• Treat samples
• Collect mRNA
• Label
• Hybridize
• Scan
• Normalize
• Select differentially regulated genes
• Understand the biological phenomena involved

Traditional analysis

Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …

Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …

Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …

Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …

Gene 100: Positive ctrl. of cell prolif., Mitosis, Oncogenesis, Glucose transport, …

Traditional analysis

• gene by gene basis

• requires literature searching

• time-consuming

Using GO annotations

• But by using GO annotations, this work has already been done for you!

GO:0006915 : apoptosis

Grouping by process

Apoptosis: Gene 1, Gene 53

Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …

Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12, …

Growth: Gene 5, Gene 2, Gene 6, …

Glucose transport: Gene 7, Gene 3, Gene 6, …
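Grouping by process amounts to inverting a gene-to-terms table into a term-to-genes table. The sketch below does this for the first four genes of the "Traditional analysis" slide (elided entries are left out), so its output will not match the illustrative groups listed on this slide. `group_by_process` is a hypothetical helper name.

```python
from collections import defaultdict

# Per-gene labels as listed on the "Traditional analysis" slide.
GENE_TO_PROCESSES = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling",
               "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis",
               "Protein phosphorylation"],
    "Gene 3": ["Growth control", "Mitosis", "Oncogenesis",
               "Protein phosphorylation"],
    "Gene 4": ["Nervous system", "Pregnancy", "Oncogenesis", "Mitosis"],
}


def group_by_process(gene_to_processes):
    """Invert gene -> processes into process -> genes."""
    grouped = defaultdict(list)
    for gene, processes in gene_to_processes.items():
        for process in processes:
            grouped[process].append(gene)
    return dict(grouped)


grouped = group_by_process(GENE_TO_PROCESSES)
for process, genes in grouped.items():
    print(f"{process}: {', '.join(genes)}")
```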

GO for microarray analysis

• Annotations give ‘function’ label to genes

• Ask meaningful questions of microarray data e.g.
  – genes involved in the same process: same/different expression patterns?

Using GO in practice

• statistical measure
  – how likely is it that your differentially regulated genes fall into that category by chance?

microarray experiment: 1000 genes

100 genes differentially regulated

• mitosis: 80/100
• apoptosis: 40/100
• p. ctrl. cell prol.: 30/100
• glucose transp.: 20/100

[Figure: bar chart of the number of differentially regulated genes (axis 0 to 80) for mitosis, apoptosis, positive control of cell proliferation and glucose transport]

Using GO in practice

• However, when you look at the distribution of all genes on the microarray:

Process               Genes on array   # genes expected in 100 random genes   occurred
mitosis               800/1000         80                                     80
apoptosis             400/1000         40                                     40
p. ctrl. cell prol.   100/1000         10                                     30
glucose transp.       50/1000          5                                      20
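The comparison in this table can be reproduced with a few lines of Python. The expected count is simply the category's share of the array scaled to the 100 selected genes. The slides do not say which statistical test a given tool uses, so the hypergeometric tail probability below is an assumption: it is one common choice for this kind of enrichment question. `expected` and `enrichment_p` are illustrative names.

```python
from math import comb

ARRAY_SIZE = 1000  # genes on the array
SELECTED = 100     # differentially regulated genes

# process -> (genes on array in category, selected genes in category)
COUNTS = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "p. ctrl. cell prol.": (100, 30),
    "glucose transp.": (50, 20),
}


def expected(in_category, array_size=ARRAY_SIZE, selected=SELECTED):
    """Genes expected in the category if the selection were random."""
    return selected * in_category / array_size


def enrichment_p(in_category, observed,
                 array_size=ARRAY_SIZE, selected=SELECTED):
    """P(X >= observed) when drawing `selected` genes without replacement."""
    total = comb(array_size, selected)
    upper = min(in_category, selected)
    tail = sum(
        comb(in_category, k) * comb(array_size - in_category, selected - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


for process, (on_array, observed) in COUNTS.items():
    print(f"{process}: expected {expected(on_array):.0f}, "
          f"observed {observed}, p = {enrichment_p(on_array, observed):.3g}")
```

Mitosis and apoptosis occur exactly as often as chance predicts, so only the last two categories stand out even though they contain fewer genes in absolute terms.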

Enrichment tools

• GO is developing its own enrichment tool as part of the GO browser AmiGO

• Currently in testing phase, should be released next month

Onto-Express walkthrough

http://vortex.cs.wayne.edu/projects.htm#Onto-Express
