BIOS6660 shRNAseq Gene Set Enrichment Analysis Tzu L Phang PhD Robert Stearman PhD April 16, 2014


Page 1: BIOS6660 shRNAseq Gene Set Enrichment Analysis

BIOS6660 shRNAseq Gene Set Enrichment Analysis

Tzu L Phang PhD

Robert Stearman PhD

April 16, 2014

Page 2: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Stearman Assessment

1. Genome annotation used was mm9 (from 7/2007). There's a more recent annotation, mm10 (12/2011). Was the chr19.fa sequence file derived from mm9 or mm10?

2. It would be nice to show a mapping table that included:

Chr19 reads 100%

Reads mapped to exons X%

Reads not mapping to exons Y%

One report, I think, had the numbers for reads mapped to exons but didn't do anything with them.

Of course, what do RNA-seq reads not mapping to exons mean?

3. Four methods were used to get the overlap gene list. No one had a final table that summarized the values and ranges of fold-change and adjusted p-values. (Some of the FC values were the inverse of others, so this needs to be consistent.) No one considered a 3 out of 4 overlap list. No one had a heatmap of the summarized expression values.

4. No one had a supervised cluster based on the overlap genes.

5. Not everyone had R session recorded for reproducible research.

6. Most had a final heatmap of the TOP 15 genes from just the edgeglm method rather than the 14 gene overlap.

7. Several people didn't get the message that the spliceAlignment argument needs to be on to maximize the reads mapped (~60% vs 90%).

8. QC reports were run and graphs shown, but with not much in the way of interpretation.

9. Some of the heatmap coloring schemes were hard to read and non-standard.

10. Not everyone included a workflow chart to show the analysis path.
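The 3 out of 4 overlap list raised in point 3 can be sketched as a simple vote count across methods. This is a minimal illustration, not the course's code: the method names and gene IDs below are hypothetical placeholders, and the function name overlap_at_least is an assumption.

```python
from collections import Counter

def overlap_at_least(gene_lists, min_methods):
    """Return the sorted genes reported by at least `min_methods` of the lists."""
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # set() so a gene counts once per method
    return sorted(g for g, n in counts.items() if n >= min_methods)

# Hypothetical method names and gene IDs, for illustration only.
calls = {
    "method_A": ["g1", "g2", "g3", "g4"],
    "method_B": ["g1", "g2", "g3"],
    "method_C": ["g1", "g2", "g5"],
    "method_D": ["g1", "g3", "g4"],
}

strict = overlap_at_least(calls, 4)   # genes called by all four methods
relaxed = overlap_at_least(calls, 3)  # genes called by at least three methods
```

Relaxing the threshold from 4 to 3 trades some stringency for a longer list, which is why both lists are worth reporting side by side.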

Page 3: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Tzu Assessment

• PDF can not be “knitr”

• Try to give more description of what you observe in the plot results …

Page 8: BIOS6660 shRNAseq Gene Set Enrichment Analysis

GSEA

Page 9: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Problems with the SuperStar approach

Case 1: No significant genes, because the relevant biological differences are modest relative to the noise inherent in the microarray technology.

Case 2: Too many significant genes. The list is difficult to interpret, and an ad hoc approach depends on the biologist's area of expertise.

Case 3: Single-gene analysis may miss important effects on pathways, which are normally composed of sets of genes acting in concert.

Case 4: Gene lists produced by different labs seldom show concordance.

Page 10: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Gene Set Enrichment Analysis (GSEA)

Considers an a priori defined GeneSet (e.g., members of a metabolic pathway), and determines whether these members are significantly over-represented or enriched at the top (or bottom) of a list of markers ranked by the degree of correlation with a specific phenotype or class distinction.

Page 11: BIOS6660 shRNAseq Gene Set Enrichment Analysis

The rows represent the samples or chips, and the columns represent the genes

[Figure: expression heatmap; rows are Samples, columns are Genes]

Page 12: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Genes on the left side are highly expressed in the top half (indicated by red color) and lowly expressed in the bottom half (indicated by blue color). The reverse holds for the right-most genes.

This creates a gradient, or ranked list, corresponding to the degree of correlation with the two phenotypes.

[Figure: heatmap with sample groups Diseased and Normal; gene labels: Highly expressed in diseased, Lowly expressed in diseased]
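One way to build such a ranked list is to score each gene by a signal-to-noise ratio between the two phenotypes and sort by that score. The sketch below is only an illustration under stated assumptions: the gene names and expression values are made up, the helper names are hypothetical, and each group is assumed to have at least two samples with non-zero spread.

```python
from statistics import mean, stdev

def signal_to_noise(diseased, normal):
    """Difference of group means divided by the sum of group standard deviations."""
    return (mean(diseased) - mean(normal)) / (stdev(diseased) + stdev(normal))

def rank_genes(expression, labels):
    """Return (gene, score) pairs, most Diseased-correlated first."""
    scores = {}
    for gene, values in expression.items():
        d = [v for v, lab in zip(values, labels) if lab == "Diseased"]
        n = [v for v, lab in zip(values, labels) if lab == "Normal"]
        scores[gene] = signal_to_noise(d, n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up toy data: three Diseased samples followed by three Normal samples.
labels = ["Diseased"] * 3 + ["Normal"] * 3
expression = {
    "geneA": [9.0, 10.0, 11.0, 2.0, 3.0, 4.0],  # higher in Diseased
    "geneB": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],    # no difference
    "geneC": [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],   # higher in Normal
}

ranked = rank_genes(expression, labels)
```

Genes with positive scores land at the Diseased end of the list and genes with negative scores at the Normal end, which is the gradient the heatmap depicts.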

Page 13: BIOS6660 shRNAseq Gene Set Enrichment Analysis

This is depicted nicely by the graph at the bottom of the figure, where the positive ranks on the left represent correlation with the Disease phenotype and the negative ranks on the right signify correlation with the Normal phenotype.

The graph also shows a rank gradient, with the most up-regulated genes for the Disease samples at the far left and the most up-regulated genes for the Normal samples at the far right.

[Figure: rank gradient running from Diseased (left) to Normal (right)]

Page 14: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Now, let's hide the heatmap and replace the middle part of the figure with genes from a specific geneset, say genes from the Glycolysis pathway.

Each vertical blue bar represents a gene from the pathway, mapped to the same location as in the whole dataset.

Again, genes located on the left side are highly expressed in the Disease samples, and the opposite is true for the right-most genes.

Page 15: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Now, we are ready to demonstrate the GSEA algorithm.

The walk-down algorithm scans the ranked gene list L from top to bottom, and each time a member of S is encountered, the Enrichment Score (ES) increases. This is illustrated in the top part of the figure below, where the ES starts to build as more genes from the GeneSet S are encountered.

Page 16: BIOS6660 shRNAseq Gene Set Enrichment Analysis

The more S genes are found, the higher the ES.

Page 17: BIOS6660 shRNAseq Gene Set Enrichment Analysis

But when no S genes are encountered for a long stretch of the walk, as indicated in the middle section of the middle plot, the ES decreases accordingly. In other words, a high ES depends heavily on S genes clustering in close proximity. In this example, we would conclude that the S genes have a high degree of correlation with the Disease phenotype, since most of the ES was gained in the left portion of the plot.
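The walk down the ranked list can be sketched as a running sum that steps up on each S gene and steps down on every other gene, with the ES taken as the largest deviation from zero. This is a simplified, unweighted illustration (every hit contributes equally); the published GSEA method can also weight each hit by the gene's correlation with the phenotype. The gene names and the function name enrichment_score are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted walk down a ranked list; returns (ES, running-sum profile)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")

    running = 0.0
    profile = []
    for is_hit in hits:
        # Step up by 1/n_hit on a set member, down by 1/n_miss otherwise,
        # so the walk always returns to zero at the end of the list.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        profile.append(running)

    es = max(profile, key=abs)  # largest deviation from zero, sign preserved
    return es, profile

# Made-up ranked list (most Disease-correlated first) and gene set S.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
S = {"g1", "g2", "g4"}

es, profile = enrichment_score(ranked, S)
```

Because the three S genes sit near the top of the list, the running sum peaks early and then decays, which mirrors the shape described for the left portion of the plot.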

Page 18: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Advantages of GSEA

• Agnostic to the type of gene set and the source of annotation

• Operates on any ordered gene list

• Does not require the choice of a gene selection threshold or the explicit definition of a statistically significant marker set

• Uses distribution-free, non-parametric, permutation-based test procedures with increased statistical power

• Incorporates the permutation of phenotype labels, thereby preserving the "biological" correlation structure of the markers

• Takes into account multiple hypotheses testing of multiple gene sets
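The phenotype-label permutation listed above can be sketched as follows: shuffle the sample labels, re-rank the genes, recompute the ES, and see how often the shuffled ES is at least as extreme as the observed one. This is a rough illustration under several assumptions: genes are ranked by a plain difference of group means, the ES is unweighted, the comparison is two-sided, and the data and helper names are made up. It does not include the multiple-testing correction across gene sets.

```python
import random
from statistics import mean

def rank_by_mean_difference(expression, labels):
    """Order genes by mean(Diseased) - mean(Normal), largest first."""
    def score(values):
        d = [v for v, lab in zip(values, labels) if lab == "Diseased"]
        n = [v for v, lab in zip(values, labels) if lab == "Normal"]
        return mean(d) - mean(n)
    return sorted(expression, key=lambda g: score(expression[g]), reverse=True)

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum ES: largest deviation from zero, sign kept."""
    n_hit = sum(g in gene_set for g in ranked_genes)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += 1.0 / n_hit if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(expression, labels, gene_set, n_perm=1000, seed=0):
    """Shuffle phenotype labels (not genes), so gene-gene correlation is kept."""
    rng = random.Random(seed)
    observed = enrichment_score(rank_by_mean_difference(expression, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_es = enrichment_score(rank_by_mean_difference(expression, shuffled), gene_set)
        if abs(null_es) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up toy data: three Diseased samples followed by three Normal samples.
labels = ["Diseased"] * 3 + ["Normal"] * 3
expression = {
    "s1": [8, 9, 10, 1, 2, 3],
    "s2": [7, 9, 8, 2, 1, 3],
    "b1": [5, 4, 6, 5, 6, 4],
    "b2": [3, 5, 4, 4, 3, 5],
    "b3": [6, 5, 4, 5, 4, 6],
    "b4": [2, 3, 4, 3, 4, 2],
}
observed, p_value = permutation_p_value(expression, labels, {"s1", "s2"}, n_perm=200)
```

With only six samples there are few distinct label arrangements, so the smallest attainable p-value is limited; real datasets need enough samples per phenotype for label permutation to be informative.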

Page 19: BIOS6660 shRNAseq Gene Set Enrichment Analysis

References

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. & Mesirov, J. P. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550.

Page 20: BIOS6660 shRNAseq Gene Set Enrichment Analysis

GSEA Broad Institute (MIT)

http://www.broadinstitute.org/gsea/index.jsp

Page 21: BIOS6660 shRNAseq Gene Set Enrichment Analysis

GSEA download

Page 24: BIOS6660 shRNAseq Gene Set Enrichment Analysis

BioC: gage package

• BIOS6660_Share/Week12_13_shRNAseq

– Week12_13_shRNAseq_Day2.R

– Gage.pdf

– We will be using a built-in dataset.

• Direct Download

Page 25: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Now, a Demo

Page 26: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Mark’s data

• BIOS6660_Share/Week12_13_shRNAseq

– cep701_AllshRNA_readCounts.txt

– Jihye_shRNA_lib_ALL_new.txt

Page 27: BIOS6660 shRNAseq Gene Set Enrichment Analysis

cep701_AllshRNA_readCounts.txt

Page 28: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Jihye_shRNA_lib_ALL_new.txt

Page 30: BIOS6660 shRNAseq Gene Set Enrichment Analysis

Convert Symbol to Entrez
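The conversion step might look roughly like the sketch below, which maps gene symbols to Entrez Gene IDs through a lookup table and keeps track of symbols that fail to map. The tiny hand-written table is only a stand-in for a real annotation resource, and the function name symbols_to_entrez and the input symbols are assumptions, not taken from the course files.

```python
def symbols_to_entrez(symbols, lookup):
    """Split symbols into a {symbol: entrez_id} map and a list of unmapped symbols."""
    mapped, unmapped = {}, []
    for symbol in symbols:
        key = symbol.strip()  # tolerate stray whitespace from text files
        if key in lookup:
            mapped[key] = lookup[key]
        else:
            unmapped.append(key)
    return mapped, unmapped

# Stand-in lookup with a few human gene symbols and their Entrez Gene IDs.
# A real analysis would load this mapping from an annotation resource.
toy_lookup = {
    "TP53": "7157",
    "BRCA1": "672",
    "EGFR": "1956",
}

mapped, unmapped = symbols_to_entrez(["TP53", " EGFR", "NOT_A_GENE"], toy_lookup)
```

Reporting the unmapped symbols explicitly matters because gene sets are usually keyed by Entrez ID, and silently dropped genes would shrink the input without any warning.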