Gene Ontology Enrichment Network Analysis -Tutorial

Step-by-step tutorial for conducting GO enrichment analysis and then creating a network from the results. Material from the UC Davis 2014 Proteomics Workshop. See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/

Dmitry Grapov, PhD

Gene Ontology Network Enrichment Analysis

• decrease
• increase

Use functional analysis to determine whether the changed variables are enriched (over-represented compared to random chance) in a biological pathway, domain or ontological category.

Enrichment or Overrepresentation analysis

• Biochemical Pathway
• Biochemical Ontology

Major Tasks

Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’, worksheet ‘protein IDs’:

1. Conduct Gene Ontology (GO) Enrichment Analysis using DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp

2. Investigate enriched terms using Quick GO http://www.ebi.ac.uk/QuickGO/

3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/

4. Create and modify GO network using Cytoscape http://www.cytoscape.org/

Protein IDs

Common protein identifier: UniProt/SwissProt accession (default in Scaffold) http://www.uniprot.org/

Use BioMart to translate to other database IDs, e.g. gene symbols: http://www.biomart.org/
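Before uploading, it can help to screen the list for entries that are not formatted like UniProt accessions, since those are likely to go unrecognized. A minimal sketch in base R, assuming the accession pattern published in the UniProt help pages; the helper name `is_uniprot_accession` and the `ids` vector are hypothetical and not taken from the workshop workbook.

```r
# Pattern for UniProt accessions (6 or 10 characters), as documented by UniProt.
uniprot_pattern <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"

# Hypothetical helper: TRUE where an ID is formatted like a UniProt accession.
is_uniprot_accession <- function(ids) {
  grepl(uniprot_pattern, trimws(ids))
}

# Illustrative input only: three accession-like strings, a gene symbol,
# and a lower-case entry with trailing whitespace.
ids <- c("P12345", "Q9Y6K9", "A0A024R161", "TP53", "p12345 ")
ok <- is_uniprot_accession(ids)

ids[!ok]  # entries to inspect or translate before upload
```

A format check only says that an ID looks like an accession; whether DAVID recognizes it still has to be confirmed after upload.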

DAVID Bioinformatics Resources

1. Upload list

2. Choose ID type

3. Select list type

4. Submit

DAVID Bioinformatics Resources

• Check the organism
• Make sure all IDs were recognized

List of biochemical databases tested for enrichment

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Choose GO

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Overview BP: Biological process

2. Select

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

DAVID Bioinformatics Resources

1. Overview most enriched term

QuickGO http://www.ebi.ac.uk/QuickGO/

1. View children (lower hierarchy subsets) of this term

DAVID Bioinformatics Resources / QuickGO

1. Can you identify any enriched children of this term in our DAVID output?

2. Download results

Overview and Format Results in Excel

1. Save results

2. Open in MS Excel

Overview Results

Modified Fisher’s Exact Test p-value

Optionally, check in R:

x <- data.frame(user = c(13, 47), genome = c(690, 13528)) # term hits and total: user list vs. genome
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment
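The modified Fisher's exact test named on this slide is DAVID's EASE score. As I read the DAVID documentation, it removes one hit from the user list before running a one-sided Fisher's exact test, which makes the p-value more conservative. The sketch below reuses the counts from this slide (13 of 47 list proteins and 690 of 13528 background proteins annotated with the term). The helper name `enrichment_p` and the 2x2 layout are my assumptions, and the values are not expected to reproduce DAVID's output exactly.

```r
list_hits  <- 13     # list proteins annotated with the term
list_total <- 47     # proteins in the uploaded list
pop_hits   <- 690    # background proteins annotated with the term
pop_total  <- 13528  # proteins in the background

# Hypothetical helper: one-sided Fisher's exact test on a 2x2 table,
# optionally with the EASE-style penalty of one list hit.
enrichment_p <- function(hits, penalty = 0) {
  tab <- matrix(c(hits - penalty, list_total - hits,    # column 1: user list
                  pop_hits,       pop_total - pop_hits), # column 2: background
                nrow = 2,
                dimnames = list(c("in term", "not in term"),
                                c("list", "background")))
  fisher.test(tab, alternative = "greater")$p.value
}

p_fisher <- enrichment_p(list_hits)               # standard one-sided test
p_ease   <- enrichment_p(list_hits, penalty = 1)  # EASE-style, more conservative
fold     <- (list_hits / list_total) / (pop_hits / pop_total)
```

Comparing `p_ease` with `p_fisher` shows how much the one-hit penalty matters; the effect is largest for terms supported by only a few proteins.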

Alternative to Fisher’s Exact Test: Hypergeometric Test

How to calculate statistics to determine enrichment?

hit.num = 51 # number of significantly changed variables in the pathway
set.num = 1455 # number of variables in pathway
full = 3358 # all possible variables in organism
q.size = 72 # number of significantly changed variables
phyper(hit.num-1, set.num, full-set.num, q.size, lower.tail=FALSE) # P(X >= hit.num)

enrichment p-value = 1.717553e-06
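The hypergeometric tail and a one-sided Fisher's exact test are the same calculation when the 2x2 table is built from the same four counts. A short sketch to check this with the numbers above; the table layout and the names `p_hyper`, `p_sum` and `p_fisher` are mine.

```r
hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all variables in the organism
q.size  <- 72    # significantly changed variables overall

# Upper tail P(X >= hit.num) from the hypergeometric distribution.
p_hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)

# Same tail as an explicit sum of point probabilities.
p_sum <- sum(dhyper(hit.num:q.size, set.num, full - set.num, q.size))

# Same quantity from a one-sided Fisher's exact test.
# Columns: changed / unchanged; rows: in pathway / not in pathway.
tab <- matrix(c(hit.num,           q.size - hit.num,
                set.num - hit.num, (full - set.num) - (q.size - hit.num)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_sum)
all.equal(p_hyper, p_fisher)
```

Subtracting 1 from `hit.num` matters: with `lower.tail = FALSE`, `phyper(q, ...)` returns P(X > q), so `hit.num - 1` is needed to include the observed count itself.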

Visualization Options

Challenges:
• Removal of redundant information
• Visualizing term relationships (term-term, term-protein)

Use REVIGO to filter redundant terms http://revigo.irb.hr/

Prepare input (term, p-value)

1. Upload to REVIGO

2. Run

Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800
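The input preparation (term, p-value) can also be scripted instead of done by hand in Excel. A minimal sketch, assuming the DAVID export has a `Term` column formatted as "GO:id~name" and a `PValue` column; check both against your own download. The rows below use placeholder identifiers and made-up values, and the 0.05 cutoff is an arbitrary choice for illustration.

```r
# Illustrative rows mimicking a DAVID functional annotation chart export.
# Identifiers, names and p-values are placeholders, not workshop results.
david <- data.frame(
  Term   = c("GO:0000001~placeholder term A", "GO:0000002~placeholder term B",
             "GO:0000003~placeholder term C"),
  PValue = c(2e-6, 0.03, 0.4),
  stringsAsFactors = FALSE
)

# Keep the GO identifier (text before "~") and the p-value.
revigo_input <- data.frame(
  term    = sub("~.*$", "", david$Term),
  p_value = david$PValue,
  stringsAsFactors = FALSE
)

# Optional filter on the p-value.
revigo_input <- revigo_input[revigo_input$p_value < 0.05, ]

# Tab-separated, no header or quotes, one term per line.
out_file <- file.path(tempdir(), "revigo_input.txt")
write.table(revigo_input, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

For a real analysis, the `david` data frame would come from the saved results file, for example with `read.delim()`, rather than being typed in.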

REVIGO: overview scatterplot

Position is defined by semantic similarity between terms (MDS, multidimensional scaling)

REVIGO: overview table

Cluster leaders prioritized based on enrichment p-value

REVIGO: network

• Edges: the strongest 3% of pairwise GO term similarities

• Node size: generality of term (small = specific)

• Node color: p-value

Download network
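To make the edge rule concrete, the sketch below keeps the strongest 3% of pairwise similarities from a similarity matrix and returns them as an edge list. It illustrates the thresholding idea only and is not REVIGO's own code; the random matrix stands in for real semantic similarities, and all names are hypothetical.

```r
# Hypothetical symmetric similarity matrix for 10 GO terms.
set.seed(1)
n_terms <- 10
terms <- sprintf("term_%02d", seq_len(n_terms))
sim <- matrix(runif(n_terms^2), n_terms, n_terms, dimnames = list(terms, terms))
sim[lower.tri(sim)] <- t(sim)[lower.tri(sim)]  # mirror the upper triangle
diag(sim) <- 1

# One row per unordered term pair (upper triangle only).
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs <- data.frame(source = terms[idx[, "row"]],
                    target = terms[idx[, "col"]],
                    similarity = sim[idx],
                    stringsAsFactors = FALSE)

# Keep the strongest 3% of pairs (at least one edge).
n_keep <- max(1, ceiling(0.03 * nrow(pairs)))
edges <- head(pairs[order(pairs$similarity, decreasing = TRUE), ], n_keep)
```

With 10 terms there are 45 pairs, so only 2 edges survive; this is why the REVIGO network looks sparse compared with the full similarity matrix.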

Cytoscape

1. Open Cytoscape

Steps 2 to 4: Import the REVIGO network into Cytoscape

Cytoscape: set layout and defaults

Steps 1 to 2: Set layout

Steps 3 to 5: Set network defaults

Cytoscape: map data to network properties

1. Set edge width and color

2. Set node labels, size and color

Cytoscape: overview network components

Download edge information (steps 1 and 2 on the slide), then:

3. View in Excel

Download node information (steps 1 and 2 on the slide), then:

3. View in Excel

Bonus: Modify Edge and Node Attributes to show term to protein connections

See the files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of upload formats

See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
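Term-to-protein connections can be derived from the enrichment table itself. A minimal sketch, assuming the DAVID export lists the proteins for each term as a comma-separated `Genes` column; the column names in the output tables (`source`, `target`, `interaction`, `id`, `type`) are hypothetical and may differ from the formats in the example workbooks. All identifiers below are placeholders.

```r
# Illustrative enrichment rows with placeholder terms and protein IDs.
david <- data.frame(
  Term  = c("GO:0000001~placeholder term A", "GO:0000002~placeholder term B"),
  Genes = c("P00001, P00002, P00003", "P00002, P00004"),
  stringsAsFactors = FALSE
)

# Split each Genes string into individual protein IDs.
proteins <- strsplit(david$Genes, ",\\s*")

# Edge table: one row per term-to-protein link.
edges <- data.frame(
  source      = rep(sub("~.*$", "", david$Term), lengths(proteins)),
  target      = unlist(proteins),
  interaction = "annotates",
  stringsAsFactors = FALSE
)

# Node table: a "type" attribute to map to node shape or color in Cytoscape.
nodes <- data.frame(id = unique(c(edges$source, edges$target)),
                    stringsAsFactors = FALSE)
nodes$type <- ifelse(nodes$id %in% edges$source, "GO term", "protein")

write.csv(edges, file.path(tempdir(), "term_protein_edges.csv"), row.names = FALSE)
write.csv(nodes, file.path(tempdir(), "term_protein_nodes.csv"), row.names = FALSE)
```

Proteins annotated with several terms (here `P00002`) appear once in the node table and once per term in the edge table, which is what makes shared proteins visible as bridges between terms in the network.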

See more Statistical and Multivariate Analysis Examples at http://imdevsoftware.wordpress.com/tutorials/

Questions?

[email protected]

This research was supported in part by NIH 1 U24 DK097154