
microarray.no

Gene Ontology and overrepresentation analysis

Kjell Petersen

J-Express Microarray analysis course

Bergen March 2010

Presentation adapted from Endre Anderssen and Vidar Beisvåg

NMC Trondheim

Overview

• How can ontology (and pathway) information help us?

• What is an ontology?

• The Gene Ontology and how it's structured

• Using the GO as an annotation resource

• How to use GO interactively in J-Express

• Overrepresentation analysis

– In general

– How to perform it in J-Express

• Additional things to think about

So here you are

[Figure: differential expression results]

Gene lists

• Long list of differentially expressed genes

• Possibly hundreds of papers describing the functions of the genes

• Misleading names

• Different names in different organisms

• We need a tool that works with groups of genes to help biological interpretation

What’s in a name?

• The same name can be used to describe different concepts

• What is a cell?

Cell

[Images: examples of things called a “cell”]

Image from http://microscopy.fsu.edu

Gene Ontology (GO)

Why Gene Ontology?

– Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.

– Facilitate communication between people and organizations.

– Improve interoperability between systems.

In essence: terms with full definitions, and the relations between them.

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?

• Why does it perform these activities?

• Where does it act?

The Gene Ontology (GO)

– Molecular function:

• What the gene product does at the biochemical level.

– Biological process:

• Cellular events to which the gene product contributes.

– Cellular component:

• The location of the gene product, or the complex it belongs to.

Molecular Function

• activities or “jobs” of a gene product

Examples: insulin binding; insulin transport activity

Biological Process

• a commonly recognized series of events

Example: cell division

Cellular Component

• where a gene product acts

Content of GO

Number of terms (October 2005 / April 2010):

• Molecular Function: 7,309 / 8,704

• Biological Process: 10,041 / 18,868

• Cellular Component: 1,629 / 2,734

• Total: 18,975 / 30,306

• Obsolete terms: 992 / 1,434

Ontology Structure

• Directed acyclic graphs (DAGs)

• Relationships

– “is a”

• a is a type of b

(e.g. a truck is a vehicle, or a mitochondrion is an organelle)

– “regulates”

• Positively regulates

• Negatively regulates

– “part of”

• sub-process of (process)

• physical part of (component)

(e.g. an engine is part of a car, or the mitochondrial membrane is part of a mitochondrion)
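Because the structure is a DAG, a gene annotated to a term is implicitly annotated to every ancestor of that term as well (often called the true path rule). The minimal Python sketch below walks up a toy hierarchy built from the examples above. The term names, the relation labels `is_a`/`part_of`, and the choice to follow only those two relations (not "regulates") are assumptions for illustration, not the real GO graph or J-Express internals.

```python
from collections import deque

# Toy "is a" / "part of" hierarchy (hypothetical and heavily simplified; real GO terms
# carry IDs of the form GO:nnnnnnn and usually have several parents).
# Maps child term -> list of (parent term, relation type).
PARENTS = {
    "mitochondrial membrane": [("mitochondrion", "part_of")],
    "mitochondrion": [("organelle", "is_a")],
    "organelle": [("cellular component", "is_a")],
    "cellular component": [],
}


def ancestors(term, parents=PARENTS, follow=("is_a", "part_of")):
    """Return all terms reachable from `term` by walking up the given relation types."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent, relation in parents.get(current, []):
            if relation in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations, parents=PARENTS):
    """Extend each gene's direct annotations with every ancestor term."""
    propagated = {}
    for gene, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        propagated[gene] = full
    return propagated


# Hypothetical gene annotated directly to "mitochondrial membrane" only.
print(sorted(propagate({"geneA": {"mitochondrial membrane"}})["geneA"]))
# ['cellular component', 'mitochondrial membrane', 'mitochondrion', 'organelle']
```

A breadth-first walk with a `seen` set visits each ancestor once even when several paths lead to it, which is common in a DAG.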

GO Annotation

• An association between a gene product and the applicable GO terms

• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.

• Made by manual or automated methods.

• A GO annotation consists of:

– Database object: gene or gene product

– GO term ID

– Evidence supporting the annotation

– Reference: publication or computational method
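Association files such as `gene_association.rgd` use the tab-separated GO Annotation File (GAF) format, with one annotation per line carrying the fields listed above. Below is a minimal sketch of reading such a file. It assumes GAF 2.x column order (column 2 = database object ID, 3 = symbol, 4 = qualifier, 5 = GO ID, 6 = reference, 7 = evidence code, 9 = aspect). The names `read_gaf` and `genes_per_term` are hypothetical, and the sample lines use placeholder IDs, not real annotations.

```python
import io
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    db_object_id: str  # database object: gene or gene product
    symbol: str
    go_id: str         # GO term ID
    reference: str     # publication or computational method
    evidence: str      # evidence code, e.g. IDA (direct assay) or IEA (electronic)
    aspect: str        # F = molecular function, P = biological process, C = cellular component


def read_gaf(handle):
    """Yield positive annotations from a GAF text stream (assumes GAF 2.x column order)."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # header and comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[3].split("|"):
            continue  # a NOT qualifier states that the gene product is NOT associated
        yield Annotation(cols[1], cols[2], cols[4], cols[5], cols[6], cols[8])


def genes_per_term(annotations):
    """Map each GO term ID to the set of gene IDs directly annotated to it."""
    table = defaultdict(set)
    for ann in annotations:
        table[ann.go_id].add(ann.db_object_id)
    return dict(table)


# Made-up example lines (placeholder IDs, not real annotations).
SAMPLE = (
    "!gaf-version: 2.1\n"
    "RGD\tX1\tGeneX\t\tGO:0000000\tPMID:0\tIDA\t\tF\tgene X\t\tgene\ttaxon:10116\t20100401\tRGD\n"
    "RGD\tX2\tGeneY\tNOT\tGO:0000000\tPMID:0\tIEA\t\tF\tgene Y\t\tgene\ttaxon:10116\t20100401\tRGD\n"
)
print(genes_per_term(read_gaf(io.StringIO(SAMPLE))))  # {'GO:0000000': {'X1'}}
```

Keeping the evidence code on each record makes it easy to rerun an analysis with or without electronic (IEA) annotations.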

Browsing GO in J-Express

Gene Ontology and Microarrays

• Hypothesis: functionally related, differentially expressed genes should accumulate in the corresponding GO group.

• Problem: find a method that scores the accumulation of differentially expressed genes in a node of the Gene Ontology.

• GO tools can help answer questions such as:

– “Are genes involved in process P overrepresented among the differentially expressed genes in an experiment?” or

– “Does treatment A induce more genes involved in process P than treatment B?”

Overrepresentation of GO terms

• We have a subset of genes, for example:

– A list of differentially expressed genes

– A list of genes that cluster together

• Which biological processes do these genes take part in?

• Are genes belonging to a particular biological process overrepresented compared to what we would expect by chance?

Question

• Suppose we look at the dataset containing all of our genes and see that 10% of them belong to cell cycle. We then do a differential expression analysis and get a list of genes we believe are significantly changed.

• How many of the genes in the gene list would you expect to belong to cell cycle just by chance?
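If the list were a random draw, you would expect about 10% of it to be cell-cycle genes. More precisely, drawing n genes without replacement from N genes, K of which carry the term, gives a hypergeometric count with mean n·K/N. Its standard deviation shows how far from that mean chance alone can easily go. The numbers below are hypothetical, not from the course data.

```python
from math import sqrt


def expected_in_list(n_list, n_term, n_total):
    """Expected number of term genes in a randomly drawn list (hypergeometric mean n*K/N)."""
    return n_list * n_term / n_total


def sd_in_list(n_list, n_term, n_total):
    """Standard deviation of that count under random sampling without replacement."""
    p = n_term / n_total
    var = n_list * p * (1 - p) * (n_total - n_list) / (n_total - 1)
    return sqrt(var)


# Hypothetical numbers: 10,000 genes in total, 1,000 (10%) annotated to cell cycle,
# and 200 genes in the differentially expressed list.
print(expected_in_list(200, 1000, 10000))      # 20.0
print(round(sd_in_list(200, 1000, 10000), 1))  # 4.2
```

So a list of 200 genes would be expected to contain about 20 cell-cycle genes by chance, give or take about 4.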

Setup

• We call our subset of interesting genes the test data

• We call the dataset containing all of our genes (the dataset the interesting genes were extracted from, and the one we compare the test data to) the reference data

[Figure: test data and reference data]
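For each GO term, the comparison needs four numbers: test genes annotated to the term (k), test genes (n), reference genes annotated to the term (K), and reference genes (N). The minimal sketch below assumes that the test data is a subset of the reference data and that annotations have already been propagated to ancestor terms. The function name `term_counts` and the gene and term names are hypothetical.

```python
from collections import Counter


def term_counts(test_genes, reference_genes, annotations):
    """Per GO term, count annotated genes in the test and reference sets.

    annotations: dict gene -> set of GO terms, assumed already propagated to ancestors.
    Returns (counts, n_test, n_ref) with counts[term] = (k_test, k_ref).
    """
    reference = set(reference_genes)
    test = set(test_genes) & reference  # test data is drawn from the reference data
    test_counts = Counter(t for g in test for t in annotations.get(g, ()))
    ref_counts = Counter(t for g in reference for t in annotations.get(g, ()))
    counts = {term: (test_counts[term], ref_counts[term]) for term in ref_counts}
    return counts, len(test), len(reference)


# Hypothetical genes and terms.
ANNOTATIONS = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle", "cell division"},
    "g3": {"transport"},
    "g4": set(),  # unannotated gene
}
counts, n_test, n_ref = term_counts(["g1", "g2"], ANNOTATIONS.keys(), ANNOTATIONS)
print(counts["cell cycle"], counts["transport"], n_test, n_ref)  # (2, 2) (0, 1) 2 4
```

Here unannotated genes still count towards the list sizes. Some tools drop them before testing, which changes the results, so it is worth checking what your tool does.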

GO Overrepresentation Analysis

[Figure: test data and reference data, each mapped onto GO]

The GO annotations of the two sets are compared statistically.
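One common choice for this statistical comparison is a one-sided hypergeometric test. This is an assumption: the slides do not say which test J-Express uses. The test is equivalent to a one-sided Fisher's exact test comparing the test genes with the remaining reference genes. The p-value is the probability of seeing at least k annotated genes among n test genes if they were drawn at random from N reference genes, K of which carry the term. The minimal sketch below needs Python 3.8+ for `math.comb`. With SciPy, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` gives the same tail probability.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K carry the term.

    k: test genes annotated to the term, n: test genes,
    K: reference genes annotated to the term, N: reference genes.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer ratio, converted to float once


# Toy check: 3 of 10 genes carry the term, 5 are drawn, all 3 term genes land in the draw.
print(hypergeom_sf(3, 5, 3, 10))  # 21/252, about 0.0833
```

Summing exact integer binomial coefficients avoids underflow problems for small p-values, at the cost of speed for very large gene sets.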

Some gotchas

• P-values are not corrected for multiple testing

– Bonferroni correction: multiply each p-value by the number of terms mapped to your dataset (the sum of the three blue numbers at the three top-level terms)
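The Bonferroni correction can be written as a one-line function. Each raw p-value is multiplied by the number of tests m and capped at 1. In this sketch, `bonferroni` is a hypothetical helper name, and the raw p-values and the count of 1,200 mapped terms are made up.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjust raw p-values: min(1, p * m).

    m is the number of tests. It defaults to len(p_values), but should be set to the
    number of GO terms actually tested (e.g. all terms mapped to the dataset), which
    is usually far larger than the handful of p-values you are looking at.
    """
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values, with 1,200 terms mapped to the dataset.
print(bonferroni([1e-6, 1e-4, 0.01], m=1200))
```

With these numbers, only the first p-value stays below 0.05 after correction (about 0.0012). The second becomes about 0.12, and the third is capped at 1.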

• Results depend on

– Selected cut-off for “test data” (check for consistency over several cut-offs)

– Version of the GO obo and association files (keep track of which versions you used)

• Browsing the terms in the “neighbourhood” of high-scoring terms will often reveal more context.

Nice to know

• Alternative term hierarchies / tools

– GO slim

– Panther DB (www.pantherdb.org)

– DAVID (http://david.abcc.ncifcrf.gov/)


• C:\Program Files\Molmine AS\J-Express 2009\resources\GO

– \gene_ontology.obo

– \goassociations\gene_association.rgd

Recommended