microarray.no Gene Ontology and overrepresentation analysis Kjell Petersen J-Express Microarray analysis course Bergen March 2010 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim microarray.no


Gene Ontology and overrepresentation analysis

Kjell Petersen

J-Express Microarray analysis course

Bergen March 2010

Presentation adapted from Endre Anderssen and Vidar Beisvåg

NMC Trondheim

Overview

• How can ontology (and pathway) information help us?

• What is an ontology?

• The Gene Ontology and how it's structured

• Using the GO as an annotation resource

• How to use GO interactively in J-Express

• Overrepresentation analysis

– In general

– How to perform in J-Express

• Additional things to think about

So here you are

• [Figure: differentially expressed genes]

Gene lists

• Long list of differentially expressed genes

• Possibly hundreds of papers describing the functions of the genes

• Misleading names

• Different names in different organisms

• Need a tool for working with groups of genes to help biological interpretation

What’s in a name?

• The same name can be used to describe different concepts

• What is a cell?

Cell


Image from http://microscopy.fsu.edu

Gene Ontology (GO)

Why Gene Ontology?

– Produce a controlled vocabulary describing aspects of molecular biology, that can be applied to all organisms.

– Facilitate communication between people and organizations.

– Improve interoperability between systems.

In essence: terms with full definition and relations between them

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?

• Why does it perform these activities?

• Where does it act?

The Gene Ontology (GO)

– Molecular function:

• Gene product at biochemical level.

– Biological process:

• Cellular events to which the gene product contributes.

– Cellular component:

• Location or complex of gene/protein.

Molecular Function

• activities or “jobs” of a gene product

insulin binding

insulin transport activity

Biological Process

• a commonly recognized series of events

cell division

Cellular Component

• where a gene product acts

Content of GO

                        October 2005     April 2010
• Molecular Function    7,309 terms      8,704 terms
• Biological Process    10,041 terms     18,868 terms
• Cellular Component    1,629 terms      2,734 terms
• Total                 18,975 terms     30,306 terms
• Obsolete terms        992 terms        1,434 terms

Ontology Structure

• Directed acyclic graphs (DAGs)

• Relationships

– “is a”

• a is a type of b

(e.g. truck is a car, or mitochondrion is an organelle)

– Regulates

• Positively regulates

• Negatively regulates

– “part of”

• sub-process of (process)

• physical part of (component)

(e.g. an engine is part of a car, or a mitochondrial membrane is part of a mitochondrion)
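
A DAG differs from a simple tree in that a term may have more than one parent. The minimal Python sketch below illustrates this with a toy ontology loosely based on the examples above. The term names, the extra "organelle membrane" term and the `PARENTS` / `ancestors` names are assumptions made for illustration; they are not taken from the real GO files.

```python
# Toy ontology (illustrative only). Each term maps to a list of
# (relationship, parent) pairs, so edges point from child to parent.
PARENTS = {
    "mitochondrial membrane": [
        ("part_of", "mitochondrion"),
        ("is_a", "organelle membrane"),   # second parent: this is what makes it a DAG
    ],
    "organelle membrane": [("part_of", "organelle")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable from `term` via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:        # a term can be reached by several paths
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial membrane")))
```

A gene annotated to a term is usually also counted for all of that term's ancestors, which is why this kind of traversal matters when counting genes per term.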

GO Annotation

• Association between gene product and applicable GO terms

• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.

• Made by manual or automated methods.

• A GO annotation consists of:

• Database object: gene or gene product

• GO term ID

• Evidence supporting annotation

• Reference

– publication or computational method
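
The four parts of an annotation listed above can be pictured as a small record. The sketch below is only an illustration: the class name, the field names, the gene names and the reference strings are hypothetical. `IDA` (inferred from direct assay) and `IEA` (inferred from electronic annotation) are GO evidence codes, used here to separate manual from automated annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    db_object: str   # gene or gene product
    go_id: str       # GO term ID
    evidence: str    # evidence code supporting the annotation
    reference: str   # publication or computational method


# Made-up example records (not real annotations).
annotations = [
    GOAnnotation("geneA", "GO:0005739", "IDA", "hypothetical_publication"),
    GOAnnotation("geneA", "GO:0005739", "IEA", "hypothetical_pipeline"),
    GOAnnotation("geneB", "GO:0043226", "IEA", "hypothetical_pipeline"),
]


def without_electronic(records):
    """Keep only annotations that are not purely electronic (IEA)."""
    return [r for r in records if r.evidence != "IEA"]


for record in without_electronic(annotations):
    print(record.db_object, record.go_id, record.evidence)
```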

Browsing GO in J-Express

Gene Ontology and Microarrays

• Hypothesis: Functionally related, differentially expressed genes should accumulate in the corresponding GO-group.

• Problem: Find a method that scores the accumulation of differentially expressed genes in a node of the Gene Ontology.

• GO tools can help answer questions such as:

– “Are genes involved in process P overrepresented among the differentially expressed genes in an experiment?” or

– “Does treatment A induce more genes involved in process P than treatment B?”

Overrepresentation of GO terms

• We have a subset of genes

– List of differentially expressed genes

– List of genes that cluster together

• Which biological processes do these genes take part in?

• Are genes belonging to a particular biological process over-represented in the subset, compared to what would be expected by chance?

Question

• Suppose we look at the dataset containing all of our genes and see that 10% of them belong to cell cycle. We then run a differential expression analysis and get a list of genes we believe are significantly changed.

• How many of the genes in the gene list do you expect to belong to cell cycle just by chance?
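
If the gene list were a random draw from the full dataset, the same proportion would be expected in the list as in the dataset. A minimal sketch of that arithmetic follows; the function name and the example counts (10,000 genes, 1,000 in cell cycle, a list of 200) are made up for illustration.

```python
def expected_hits(list_size, term_genes_in_reference, reference_size):
    """Expected number of term genes in a random list of `list_size` genes."""
    return list_size * term_genes_in_reference / reference_size


# 1,000 of 10,000 genes (10%) belong to cell cycle; the gene list has 200 genes.
print(expected_hits(200, 1000, 10000))
```

Seeing clearly more cell cycle genes in the list than this expected count is what suggests overrepresentation.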

Setup

• We call our subset of interesting genes the test data

• The dataset containing all of our genes, from which we extracted the interesting genes and against which we want to compare the test data, is called the reference data

Test data

Reference data

GO Overrepresentation Analysis

Test data

Reference data

Statistical comparison between the two GO components
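
The slides do not state which statistical test J-Express applies. One common choice for this kind of comparison is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below under that assumption. The function and parameter names are illustrative.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from N reference
    genes, of which K are annotated to the term.

    k: term genes in the test data     n: size of the test data
    K: term genes in the reference     N: size of the reference data
    """
    total = comb(N, n)
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / total


# Example with made-up numbers: 35 of 200 test genes are in the term,
# against 1,000 of 10,000 reference genes.
p = overrepresentation_p(35, 200, 1000, 10000)
print(f"p = {p:.3g}")
```

A small p-value means that finding at least this many term genes in the test data would be unlikely if the test data were a random sample of the reference data.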

Some gotchas

• P-value not corrected for multiple testing

– Bonferroni correction: multiply by number of terms mapped to your dataset (sum of 3 blue numbers at the 3 top terms)

• Results depend on

– Selected cut-off for “test data” (check for consistency over several cut-offs)

– Version of GO obo and association files (keep track of which versions were used)

• Browsing the terms in the “neighbourhood” of high-scoring terms will often reveal more context.
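
The Bonferroni correction mentioned above can be sketched as follows: each raw p-value is multiplied by the number of terms tested and capped at 1. The term names, p-values and the count of 500 tested terms are invented for the example.

```python
def bonferroni(p_values, n_tests):
    """Multiply each raw p-value by the number of tests, capped at 1.0."""
    return {term: min(1.0, p * n_tests) for term, p in p_values.items()}


raw = {"cell cycle": 0.0004, "DNA repair": 0.03}   # made-up raw p-values
corrected = bonferroni(raw, n_tests=500)           # 500 terms mapped to the dataset
for term, p in corrected.items():
    print(f"{term}: {p:.3g}")
```

With 500 terms tested, a raw p-value of 0.03 is no longer noteworthy after correction, while 0.0004 becomes roughly 0.2.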

Nice to know

• Alternative term hierarchies / tools

– GO slim

– Panther DB (www.pantherdb.org)

– DAVID (http://david.abcc.ncifcrf.gov/)


• C:\Program Files\Molmine AS\J-Express 2009\resources\GO

– \gene_ontology.obo

– \goassociations\gene_association.rgd
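
The .obo file is plain text organised in `[Term]` stanzas of `key: value` lines. The sketch below shows how such stanzas could be read, assuming a heavily simplified version of the format. The sample text and the `EX:` identifiers are invented, and `parse_obo` is a hypothetical helper, not part of J-Express.

```python
# Invented, simplified sample in OBO style (not copied from the real file).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: organelle
namespace: cellular_component

[Term]
id: EX:0002
name: mitochondrion
namespace: cellular_component
is_a: EX:0001 ! organelle

[Term]
id: EX:0003
name: mitochondrial membrane
namespace: cellular_component
relationship: part_of EX:0002 ! mitochondrion
"""


def parse_obo(text):
    """Collect id, name, namespace and parent links for each [Term] stanza."""
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"parents": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-term stanzas
        key, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop the trailing "! comment"
        if key == "id":
            terms[value] = current
        elif key in ("name", "namespace"):
            current[key] = value
        elif key == "is_a":
            current["parents"].append(("is_a", value))
        elif key == "relationship":
            relation, target = value.split()[:2]
            current["parents"].append((relation, target))
    return terms


for term_id, info in parse_obo(SAMPLE_OBO).items():
    print(term_id, info["name"], info["parents"])
```

Real OBO files contain more tags (definitions, synonyms, obsolete flags) that this sketch ignores.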