CSCE555 Bioinformatics
Lecture 16: Identifying Differentially Expressed Genes from Microarray Data
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering
2008 www.cse.sc.edu



Outline

The problem: identifying differentially expressed genes
Statistical methods: t-test
Non-parametric: rank product
Summary

The Biological Problem: Identify Differentially Expressed Genes

[Figure: "No treatment" vs. "Treatment"]

Which pathways will be affected?

Which genes are involved?

Identify differentially expressed genes

One of the core goals of microarray data analysis is to identify which genes show good evidence of being differentially expressed (DE). This goal has two parts.

1. The first is to select a statistic that ranks the genes in order of evidence for differential expression, from strongest to weakest.

2. The second is to choose a critical value for the ranking statistic, above which any value is considered significant.
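The two parts above can be sketched in a few lines of Python. This is only an illustration: the function name `select_de_genes`, the gene names, and the scores are hypothetical, and ranking by absolute value (so that up- and down-regulated genes are treated alike) is an assumption not stated on the slide.

```python
def select_de_genes(scores, critical_value):
    """Rank genes by evidence and keep those beyond the critical value.

    scores: dict mapping gene name -> ranking statistic (hypothetical input).
    Assumption: evidence is measured by the absolute value of the statistic.
    """
    # Part 1: order genes from strongest to weakest evidence.
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    # Part 2: apply the critical value to the ranking statistic.
    significant = [gene for gene, s in ranked if abs(s) > critical_value]
    return ranked, significant


# Made-up per-gene statistics, for illustration only.
scores = {"geneA": 5.2, "geneB": -0.4, "geneC": -3.1, "geneD": 1.0}
ranked, significant = select_de_genes(scores, critical_value=3.0)
print([gene for gene, _ in ranked])
print(significant)
```

The rest of the lecture is about how to pick the statistic and how to justify the critical value.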

k-fold change

1. Differential expression is measured by the ratio of expression levels between two samples.

2. Genes with ratios above a fixed cut-off k, that is, those whose expression underwent a k-fold change, are said to be differentially expressed.

3. This is not a statistical test: there is no associated value that indicates the level of confidence in designating a gene as differentially expressed or not differentially expressed.

k-fold change

4. Replication is essential in experimental design because it allows an estimate of variability.

5. The ability to assess such variability allows identification of biologically reproducible changes in gene expression levels.
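A minimal sketch of the k-fold rule in Python, assuming one intensity per gene per sample. The function name `fold_change_calls` and the intensities are hypothetical. Treating a k-fold decrease the same as a k-fold increase, by comparing log2 ratios, is an assumption added here.

```python
import math


def fold_change_calls(treated, control, k=2.0):
    """Return genes whose expression changed at least k-fold, up or down.

    treated, control: dicts mapping gene name -> expression level (> 0).
    """
    calls = {}
    for gene in treated:
        ratio = treated[gene] / control[gene]
        # Work on the log2 scale so 2-fold up (+1) and 2-fold down (-1)
        # are symmetric around zero.
        if abs(math.log2(ratio)) >= math.log2(k):
            calls[gene] = ratio
    return calls


# Made-up intensities, for illustration only.
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 50.0}
treated = {"geneA": 400.0, "geneB": 210.0, "geneC": 20.0}
calls = fold_change_calls(treated, control, k=2.0)
print(calls)
```

The function returns a yes/no call per gene and nothing about confidence. With a single measurement per condition there is no way to tell a reproducible change from noise, which is why replication matters.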

Standard statistical tests

1. More typically, researchers now rely on variants of common statistical tests.

2. These generally involve two parts: calculating a test statistic and determining the significance of the observed statistic.

3. A standard statistical test for detecting significant change between repeated measurements of a variable in two groups is the t-test.

4. This can be generalized to multiple groups via the ANOVA F statistic.
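As a rough illustration of the first part (calculating the test statistic), the sketch below computes the equal-variance two-sample t statistic and the one-way ANOVA F statistic for a single gene, using only the Python standard library. The function names and the numbers are hypothetical. With exactly two groups, F equals t squared, which is one way to see that F generalizes t.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def f_statistic(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up replicate measurements of one gene, for illustration only.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t = t_statistic(treated, control)
f = f_statistic([treated, control])
print(t, f, t ** 2)
```

This covers only the statistic. Determining its significance is a separate step.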

Standard statistical tests

1. For most practical cases, computing a standard t or F statistic is appropriate, although referring to the t or F distributions to determine significance is often not.

2. The main hazard in using such methods occurs when there are too few replicates to obtain an accurate estimate of experimental variances. In such cases, modeling methods that use pooled variance estimates may be helpful.
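One simple way to borrow variance information across genes is sketched below. It is an assumption chosen for illustration, not the specific modeling method the slide has in mind. Each gene's variance is shrunk toward the average variance over all genes before forming a t-like statistic. The function names and the `weight` parameter are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_var(a, b):
    """Equal-variance pooled estimate for one gene."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)


def shrunken_t(data, weight=0.5):
    """t-like statistic with per-gene variance shrunk toward the gene average.

    data: dict mapping gene -> (group_a, group_b) replicate lists.
    weight: hypothetical tuning value; 0 gives the ordinary t statistic.
    """
    gene_vars = {g: pooled_var(a, b) for g, (a, b) in data.items()}
    common = mean(list(gene_vars.values()))
    out = {}
    for g, (a, b) in data.items():
        var = (1 - weight) * gene_vars[g] + weight * common
        se = math.sqrt(var * (1 / len(a) + 1 / len(b)))
        out[g] = (mean(a) - mean(b)) / se
    return out


# Made-up data with two replicates per group, for illustration only.
# gene1 has a tiny difference but, by chance, almost no variance.
data = {
    "gene1": ([5.00, 5.01], [5.10, 5.11]),
    "gene2": ([2.0, 3.0], [6.0, 7.0]),
}
ordinary = shrunken_t(data, weight=0.0)
moderated = shrunken_t(data, weight=0.5)
print(ordinary)
print(moderated)
```

With only two replicates, the ordinary statistic ranks gene1 first because its variance estimate happens to be tiny. After shrinkage, gene2, with the much larger difference, ranks first.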

Standard statistical tests

1. Regardless of the test statistic used, one must determine its significance.

2. Standard interpretations of t-like tests assume that the data are sampled from normal populations with equal variances.

3. Expression data may fail to satisfy either or both of these constraints.

Standard statistical tests

Use of non-parametric rank-based statistics is also common, via both:

1. traditional statistical methods, and

2. ad hoc methods designed specifically for microarray data.
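As an example of a traditional rank-based statistic, the sketch below computes the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum test) for one gene. The slide does not name a specific test, so this choice is an assumption, and the data are made up. Because U depends only on the ordering of the values, a single extreme measurement does not change it.

```python
def mann_whitney_u(a, b):
    """Count pairs (x in a, y in b) with x > y; ties count as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Made-up log-expression values for one gene, for illustration only.
control = [5.1, 4.9, 5.3, 5.0]
treated = [6.2, 6.0, 6.4, 6.1]
treated_with_outlier = [6.2, 6.0, 64.0, 6.1]  # hypothetical artifact

u = mann_whitney_u(treated, control)
u_outlier = mann_whitney_u(treated_with_outlier, control)
print(u, u_outlier)
```

Here U reaches its maximum, len(a) * len(b), in both cases, since every treated value exceeds every control value. A t statistic on the same data would be noticeably affected by the outlier.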

RankProd: a non-parametric method to detect differentially regulated genes in replicated experiments

• What does it do? What is the method implemented in the package?
RankProd uses the so-called rank product non-parametric method (Breitling et al., 2004) to identify genes that are up-regulated or down-regulated under one condition relative to another. The rank product is a non-parametric statistic that detects items that are consistently highly ranked in a number of lists, for example genes that are consistently found among the most strongly up-regulated genes in a number of replicate experiments.

• How does it compare to other methods for a similar purpose?
(1) It originates from an analysis of biological reasoning and is easy to understand.
(2) It is fast, simple, and robust to outliers (suitable for noisy data).
(3) It provides statistical significance for each gene and allows control of the overall significance (e.g., false discovery rate).
(4) It provides a straightforward way to do cross-platform meta-analysis (integrating data generated at different laboratories or under different environments into one study, which achieves increased power).

Rank Product

Calculate RP:

Calculate significance
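The formulas on this slide did not survive extraction, so the sketch below rests on assumptions. It takes the rank product of a gene to be the geometric mean of its ranks across replicates (rank 1 = most strongly up-regulated), and it estimates significance by comparing against rank products of randomly shuffled rank lists. Function names and data are hypothetical, and this is not the RankProd package's own code.

```python
import math
import random


def rank_products(replicates):
    """replicates: list of dicts mapping gene -> log fold change.
    Returns gene -> geometric mean of its ranks over the replicates."""
    genes = list(replicates[0])
    k = len(replicates)
    log_sum = {g: 0.0 for g in genes}
    for rep in replicates:
        ordered = sorted(genes, key=lambda g: rep[g], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)
    return {g: math.exp(log_sum[g] / k) for g in genes}


def rank_product_pvalues(replicates, n_perm=500, seed=0):
    """Fraction of simulated null rank products at or below the observed one."""
    rng = random.Random(seed)
    observed = rank_products(replicates)
    n, k = len(observed), len(replicates)
    counts = {g: 0 for g in observed}
    for _ in range(n_perm):
        # Under the null, each replicate assigns ranks at random.
        log_sum = [0.0] * n
        for _ in range(k):
            ranks = list(range(1, n + 1))
            rng.shuffle(ranks)
            for i, r in enumerate(ranks):
                log_sum[i] += math.log(r)
        null_rps = [math.exp(s / k) for s in log_sum]
        for g in observed:
            counts[g] += sum(1 for v in null_rps if v <= observed[g] + 1e-12)
    return {g: counts[g] / (n_perm * n) for g in observed}


# Made-up log fold changes from three replicates, for illustration only.
replicates = [
    {"geneA": 3.0, "geneB": 1.0, "geneC": 0.2, "geneD": -0.5, "geneE": -1.0},
    {"geneA": 2.5, "geneB": 0.1, "geneC": 1.2, "geneD": -0.2, "geneE": -2.0},
    {"geneA": 2.8, "geneB": 0.9, "geneC": 0.3, "geneD": -1.5, "geneE": -0.7},
]
rp = rank_products(replicates)
pvals = rank_product_pvalues(replicates)
print(rp)
print(pvals)
```

geneA is ranked first in every replicate, so its rank product is 1, the smallest possible value. Down-regulated genes can be handled the same way by ranking in the opposite direction.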

Permutation tests for calculating significance levels

Permutation tests, generally carried out by repeatedly scrambling the samples' class labels and computing t statistics for all genes in the scrambled data, best capture the unknown structure of the data.

Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116-5121 (2001).

Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537 (1999).

Dudoit, S., Yang, Y.-H., Callow, M.J. & Speed, T.P. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report 578 (Department of Statistics, University of California at Berkeley, Berkeley, CA, 2000).
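A minimal sketch of the label-scrambling procedure described on this slide, using only the Python standard library. The function names and data are hypothetical. Using the unequal-variance form of the t statistic and the (count + 1) / (n_perm + 1) p-value estimate are assumptions, and the cited papers each use their own variants.

```python
import math
import random
from statistics import mean, variance


def t_stat(a, b):
    """Two-sample t statistic (unequal-variance form)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Degenerate case: no variability within either group.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def all_t_stats(expr, labels):
    """expr: gene -> list of values, one per sample; labels: 1/0 per sample."""
    out = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 0]
        out[gene] = t_stat(a, b)
    return out


def permutation_pvalues(expr, labels, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = all_t_stats(expr, labels)
    exceed = {g: 0 for g in expr}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # scramble the class labels
        perm = all_t_stats(expr, shuffled)  # t statistics for all genes
        for g in expr:
            if abs(perm[g]) >= abs(observed[g]):
                exceed[g] += 1
    return {g: (exceed[g] + 1) / (n_perm + 1) for g in expr}


# Made-up data: four treated samples (label 1), then four controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "geneX": [8.1, 7.9, 8.3, 8.0, 5.0, 5.2, 4.9, 5.1],
    "geneY": [5.0, 5.3, 4.8, 5.1, 5.2, 4.9, 5.1, 5.1],
}
pvals = permutation_pvalues(expr, labels)
print(pvals)
```

The same shuffled labels are applied to every gene in each round, so the scrambled data keep whatever dependence exists between genes. With four samples per group there are only 70 distinct label assignments, so the smallest achievable two-sided p-value is about 2/70. This is another reason replication matters.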

Summary

The problem: identify differentially expressed genes from microarray data

How to identify: t-test and rank product

How to evaluate the significance of identified genes