Using the Gene Ontology (GO) for analysis of expression data

Jane Lomax, EMBL-EBI

25th June 2007



What is the Gene Ontology?

• Set of standard biological phrases (terms) which are applied to genes/proteins:
  – protein kinase
  – apoptosis
  – membrane


What is the Gene Ontology?

• Genes are linked, or associated, with GO terms by trained curators at genome databases
  – known as ‘gene associations’ or GO annotations

• Some GO annotations are created automatically


[Diagram labels: genome and protein databases; GO annotations (gene -> GO term); GO database; associated genes]


What is the Gene Ontology?

• Allows biologists to make queries across large numbers of genes without researching each one individually


Copyright ©1998 by the National Academy of Sciences

Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868


GO structure

• GO isn’t just a flat list of biological terms

• terms are related within a hierarchy


GO structure

[Figure: position of gene A in the GO hierarchy]


GO structure

• This means genes can be grouped according to user-defined levels

• Allows broad overview of gene set or genome


How does GO work?

• GO is species independent
  – some terms, especially lower-level, detailed terms, may be specific to a certain group
    • e.g. photosynthesis
  – but when collapsed up to the higher levels, terms are not dependent on species


How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?


GO structure

• GO terms divided into three parts:
  – cellular component
  – molecular function
  – biological process


Cellular Component

• where a gene product acts


Cellular Component

• Enzyme complexes in the component ontology refer to places, not activities.


Molecular Function

• activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity


Molecular Function

insulin binding

insulin receptor activity


Molecular Function

drug transporter activity


Molecular Function

• A gene product may have several functions

• Sets of functions make up a biological process.


Biological Process

a commonly recognized series of events

cell division


Biological Process

transcription


Biological Process

regulation of gluconeogenesis


Biological Process

limb development


Biological Process

courtship behavior


Ontology Structure

• Terms are linked by two relationships
  – is-a
  – part-of


Ontology Structure

[Diagram: cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]


Ontology Structure

• Ontologies are structured as a hierarchical directed acyclic graph (DAG)

• Terms can have more than one parent and zero, one or more children


Ontology Structure

[Diagram: cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane arranged as a Directed Acyclic Graph (DAG); multiple parentage allowed]
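A minimal Python sketch of what multiple parentage means in practice. The term names come from the diagram, but the individual is-a and part-of edges are assumptions, since the arrows are not preserved in the slide text. The helper `ancestors` is a hypothetical name used only for illustration.

```python
# Toy fragment of the cellular component ontology.
# ASSUMPTION: the edges below are illustrative; the slide text does not
# preserve which arrow connects which pair of terms.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("chloroplast membrane")))
# prints ['cell', 'chloroplast', 'membrane']
```

Because a term inherits from all of its parents, a gene annotated to chloroplast membrane is implicitly also a membrane gene and a chloroplast gene.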


Anatomy of a GO term

id: GO:0006094                          <- unique GO ID
name: gluconeogenesis                   <- term name
namespace: process                      <- ontology
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]   <- definition
exact_synonym: glucose biosynthesis     <- synonym
xref_analog: MetaCyc:GLUCONEO-PWY       <- database ref
is_a: GO:0006006                        <- parentage
is_a: GO:0006092                        <- parentage
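The term record above is a list of `tag: value` lines, so it can be read with a few lines of Python. This is a rough sketch only: `parse_stanza` is a hypothetical helper, the `def` line is left out for brevity, and a real ontology file has headers and escaping rules that this does not handle.

```python
from collections import defaultdict

# The gluconeogenesis record from the slide, minus the definition line.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Collect 'tag: value' lines into a dict of tag -> list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ': ' only, so IDs such as GO:0006094 stay whole.
        tag, _, value = line.partition(": ")
        fields[tag].append(value.strip())
    return dict(fields)


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], term["is_a"])
```

Values are kept as lists because tags such as `is_a` can repeat, which is how a term records more than one parent.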


GO terms

• Where do GO terms come from?
  – GO terms are added by editors at EBI and annotating databases
  – new terms are usually only added when they are asked for by annotators
  – GO editors work with experts to make major ontology developments
    • metabolism
    • pathogenesis
    • cell cycle


GO stats

• over 23,000 GO terms:
  – 13593 biological_process
  – 1980 cellular_component
  – 7700 molecular_function


GO annotations

• Where do the links between genes and GO terms come from?


GO annotations

• Contributing databases:
  – Berkeley Drosophila Genome Project (BDGP)
  – dictyBase (Dictyostelium discoideum)
  – FlyBase (Drosophila melanogaster)
  – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
  – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
  – Gramene (grains, including rice, Oryza)
  – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
  – Rat Genome Database (RGD) (Rattus norvegicus)
  – Reactome
  – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
  – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
  – The Institute for Genomic Research (TIGR): databases on several bacterial species
  – WormBase (Caenorhabditis elegans)
  – Zebrafish Information Network (ZFIN) (Danio rerio)


Species coverage

• All major eukaryotic model organism species

• Human via GOA group at UniProt

• Several bacterial and parasite species through TIGR and GeneDB at Sanger
  – many more in pipeline


Annotation coverage


Anatomy of a GO annotation

• Three key parts:
  – gene name/id
  – GO term(s)
  – evidence for association


Example annotation

• Breast cancer type 1 susceptibility protein gene in humans


Types of GO annotation:

Electronic Annotation

Manual Annotation


Manual annotation

• Created by scientific curators

• High quality

• Small number


Manual annotation

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein … these kinases have been implicated in early stages of wound response …


Electronic Annotation

• Annotation derived without human validation
  – mappings file e.g. interpro2go, ec2go
  – Blast search ‘hits’

• Lower ‘quality’ than manual codes


Mappings files

Fatty acid biosynthesis (Swiss-Prot Keyword) -> GO: fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) -> GO: acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO: acetyl-CoA carboxylase activity (GO:0003989)
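One way to picture a mappings file is as a lookup table from an external identifier to one or more GO IDs. The sketch below assumes the pairing suggested by the slide layout (keyword to GO:0006633, EC number and InterPro entry to GO:0003989). The key prefixes, the gene name `geneX` and the function `electronic_annotations` are hypothetical, and the real interpro2go and ec2go files have their own line format, which is not parsed here.

```python
# External identifier -> GO IDs, using the three examples from the slide.
# ASSUMPTION: which source maps to which GO term is inferred from the
# slide layout; the "SP_KW:" and "InterPro:" prefixes are made up.
EXTERNAL_TO_GO = {
    "SP_KW:Fatty acid biosynthesis": ["GO:0006633"],
    "EC:6.4.1.2": ["GO:0003989"],
    "InterPro:IPR000438": ["GO:0003989"],
}


def electronic_annotations(gene_xrefs, mapping=EXTERNAL_TO_GO):
    """Turn each gene's external identifiers into (gene, GO ID, 'IEA') triples."""
    triples = set()
    for gene, xrefs in gene_xrefs.items():
        for xref in xrefs:
            for go_id in mapping.get(xref, []):
                triples.add((gene, go_id, "IEA"))
    return sorted(triples)


# Hypothetical gene that carries all three external identifiers.
result = electronic_annotations({"geneX": list(EXTERNAL_TO_GO)})
print(result)
# prints [('geneX', 'GO:0003989', 'IEA'), ('geneX', 'GO:0006633', 'IEA')]
```

The EC number and the InterPro entry both lead to the same GO term, so the set removes the duplicate.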


Evidence types

• ISS: Inferred from Sequence/Structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available

• IEA: Inferred from Electronic Annotation
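Since every annotation carries an evidence code, an analysis can choose to keep only curator-assigned annotations. A small sketch, assuming annotations are held as (gene, GO ID, evidence) records; the `Annotation` type, the gene names and the `manual_only` helper are hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str


# Every code listed above except IEA, the only one assigned without a curator.
MANUAL_CODES = {"ISS", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "NAS", "IC", "ND"}


def manual_only(annotations):
    """Keep annotations whose evidence code was assigned by a curator."""
    return [a for a in annotations if a.evidence in MANUAL_CODES]


# Hypothetical annotations for illustration.
annotations = [
    Annotation("gene1", "GO:0006915", "IDA"),
    Annotation("gene2", "GO:0006915", "IEA"),
    Annotation("gene3", "GO:0006094", "TAS"),
]
print([a.gene for a in manual_only(annotations)])
# prints ['gene1', 'gene3']
```

Dropping IEA trades coverage for quality, so whether to filter depends on how well annotated the species is.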


GO tools

• GO resources are freely available to anyone to use without restriction
  – Includes the ontologies, gene associations and tools developed by GO

• Other groups have used GO to create tools for many purposes:

http://www.geneontology.org/GO.tools


GO tools

• Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets


GO tools

• Many tools exist that use GO to find common biological functions from a list of genes:

http://www.geneontology.org/GO.tools.microarray.shtml


GO tools

• Most of these tools work in a similar way:
  – input a gene list and a subset of ‘interesting’ genes
  – tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
  – tool provides a statistical measure to determine whether enrichment is significant


Microarray process

• Treat samples
• Collect mRNA
• Label
• Hybridize
• Scan
• Normalize
• Select differentially regulated genes
• Understand the biological phenomena involved


Traditional analysis

Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …

Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …

Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …

Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …

Gene 100: Positive ctrl. of cell prolif., Mitosis, Oncogenesis, Glucose transport, …


Traditional analysis

• gene by gene basis

• requires literature searching

• time-consuming


Using GO annotations

• But by using GO annotations, this work has already been done for you!

GO:0006915 : apoptosis


Grouping by process

Apoptosis: Gene 1, Gene 53

Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …

Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12, …

Growth: Gene 5, Gene 2, Gene 6, …

Glucose transport: Gene 7, Gene 3, Gene 6, …
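Going from a gene-by-gene view to a grouping by process amounts to inverting the gene-to-term table. A minimal sketch with made-up annotations that only echo the slide; `group_by_process` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical gene -> process annotations, loosely echoing the slide.
GENE_TO_PROCESSES = {
    "Gene 1": ["apoptosis", "mitosis"],
    "Gene 2": ["growth", "mitosis"],
    "Gene 53": ["apoptosis"],
}


def group_by_process(gene_to_processes):
    """Invert gene -> processes into process -> genes."""
    grouped = defaultdict(list)
    for gene, processes in gene_to_processes.items():
        for process in processes:
            grouped[process].append(gene)
    return dict(grouped)


groups = group_by_process(GENE_TO_PROCESSES)
print(groups["apoptosis"])
# prints ['Gene 1', 'Gene 53']
```

A gene with several annotations appears under each of its processes, which matches the slide, where Gene 7 is listed under more than one heading.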


GO for microarray analysis

• Annotations give ‘function’ label to genes

• Ask meaningful questions of microarray data e.g.
  – genes involved in the same process, same/different expression patterns?


Using GO in practice

• statistical measure
  – how likely it is that your differentially regulated genes fall into that category by chance

microarray: 1000 genes
experiment: 100 genes differentially regulated

mitosis – 80/100
apoptosis – 40/100
p. ctrl. cell prol. – 30/100
glucose transp. – 20/100

[Bar chart: number of differentially regulated genes (axis 0 to 80) for mitosis, apoptosis, positive control of cell proliferation and glucose transport]


Using GO in practice

• However, when you look at the distribution of all genes on the microarray:

Process               Genes on array   # genes expected in 100 random genes   # genes occurred
mitosis               800/1000         80                                     80
apoptosis             400/1000         40                                     40
p. ctrl. cell prol.   100/1000         10                                     30
glucose transp.       50/1000          5                                      20
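The expected counts in the table are just (genes on array with the annotation) × 100 / 1000. One common way to turn "observed versus expected" into a statistical measure is a hypergeometric upper-tail probability, sketched below with the numbers from the table. This is an illustrative assumption, not necessarily the test used by any particular tool, and it applies no correction for testing many categories at once. The function names are hypothetical.

```python
from math import comb

ARRAY_SIZE = 1000  # genes on the array
SELECTED = 100     # differentially regulated genes

# process: (genes on the array with this annotation, of which selected)
COUNTS = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "positive control of cell proliferation": (100, 30),
    "glucose transport": (50, 20),
}


def expected(annotated):
    """Annotated genes expected among SELECTED genes drawn at random."""
    return annotated * SELECTED / ARRAY_SIZE


def p_enriched(annotated, observed):
    """Hypergeometric upper tail: P(at least `observed` annotated genes)."""
    total = comb(ARRAY_SIZE, SELECTED)
    upper = min(annotated, SELECTED)
    hits = sum(
        comb(annotated, k) * comb(ARRAY_SIZE - annotated, SELECTED - k)
        for k in range(observed, upper + 1)
    )
    return hits / total


for process, (annotated, observed) in COUNTS.items():
    print(f"{process}: expected {expected(annotated):.0f}, observed {observed}, "
          f"p = {p_enriched(annotated, observed):.3g}")
```

Mitosis and apoptosis have the most genes in the selected set, but only as many as chance predicts; the two smaller categories are the ones that stand out.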


Enrichment tools

• GO is developing its own enrichment tool as part of the GO browser AmiGO

• Currently in testing phase, should be released next month


Onto-Express walkthrough

http://vortex.cs.wayne.edu/projects.htm#Onto-Express