CSCE555 Bioinformatics
Lecture 16: Identifying Differentially Expressed Genes from Microarray Data
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering
2008, www.cse.sc.edu
Outline
The problem: identifying differentially expressed genes
Statistical methods: t-test
Non-parametric: rank product
Summary
The Biological Problem: Identify Differentially Expressed Genes
[Figure: no treatment vs. treatment]
Which pathways will be affected?
Which genes are involved?
Identify differentially expressed genes
One of the core goals of microarray data analysis is to identify which genes show good evidence of being differentially expressed (DE). This goal has two parts.
1. The first is to select a statistic that ranks the genes in order of evidence for differential expression, from strongest to weakest.
2. The second is to choose a critical value for the ranking statistic, above which any value is considered significant.
k-fold change
1. A measure of differential expression: the ratio of expression levels between two samples.
2. Genes with ratios above a fixed cut-off k, that is, those whose expression underwent a k-fold change, were said to be differentially expressed.
3. This is not a statistical test: there is no associated value that indicates the level of confidence in designating genes as differentially expressed or not differentially expressed.
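The k-fold rule above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the gene names, expression values, and the choice k = 2 are hypothetical, and the rule is applied symmetrically on the log2 scale so that k-fold decreases are caught as well as increases.

```python
import math

def log2_fold_changes(treated, control):
    """Return log2(treated / control) for every gene."""
    return {gene: math.log2(treated[gene] / control[gene]) for gene in treated}

def k_fold_genes(treated, control, k=2.0):
    """Return genes whose expression changed at least k-fold, up or down."""
    cutoff = math.log2(k)
    lfc = log2_fold_changes(treated, control)
    return sorted(gene for gene, value in lfc.items() if abs(value) >= cutoff)

# Hypothetical expression levels (arbitrary units) for four genes.
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 50.0, "geneD": 80.0}
treated = {"geneA": 400.0, "geneB": 210.0, "geneC": 10.0, "geneD": 120.0}

selected = k_fold_genes(treated, control, k=2.0)
print(selected)
```

Note that the result depends only on a single ratio per gene, so it carries no information about how reproducible the change is.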
k-fold change
4. Replication is essential in experimental design because it allows an estimate of variability.
5. The ability to assess such variability allows identification of biologically reproducible changes in gene expression levels.
Standard statistical tests
1. More typically, researchers now rely on variants of common statistical tests.
2. These generally involve two parts: calculating a test statistic and determining the significance of the observed statistic.
3. A standard statistical test for detecting significant change between repeated measurements of a variable in two groups is the t-test.
4. This can be generalized to multiple groups via the ANOVA F statistic.
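As an illustration of the first part (calculating a test statistic), here is a minimal sketch of the equal-variance two-sample t statistic for one gene, t = (mean1 - mean2) / sqrt(sp^2 (1/n1 + 1/n2)), where sp^2 is the pooled sample variance. The replicate values are made up for the example, and determining significance is deliberately left out.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Equal-variance two-sample t statistic for two lists of replicates."""
    n1, n2 = len(x), len(y)
    # Pooled sample variance across the two groups.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Hypothetical log-expression values for one gene, four replicates per group.
treated = [8.1, 7.9, 8.4, 8.0]
control = [6.0, 6.3, 5.8, 6.1]

t = pooled_t_statistic(treated, control)
print(round(t, 3))
```

Applying the function to every gene yields the ranking statistic; swapping the two groups only flips the sign.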
Standard statistical tests
1. For most practical cases, computing a standard t or F statistic is appropriate, although referring to the t or F distributions to determine significance is often not.
2. The main hazard in using such methods occurs when there are too few replicates to obtain an accurate estimate of experimental variances. In such cases, modeling methods that use pooled variance estimates may be helpful.
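One simple way to picture a pooled variance estimate is to shrink each gene's variance toward the average variance across all genes before forming the t-like statistic. The sketch below is an assumption made for illustration, not a method the lecture specifies: the weight `prior_df`, the function names, and the data are all hypothetical.

```python
import math
from statistics import mean, variance

def pooled_variance(x, y):
    """Within-gene pooled sample variance of two replicate groups."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)

def shrunken_t_statistics(data, prior_df=4.0):
    """data maps gene -> (treated_values, control_values).

    Each gene's variance is a weighted average of its own variance and the
    mean variance over all genes; prior_df (hypothetical) sets the weight.
    """
    per_gene_var = {g: pooled_variance(x, y) for g, (x, y) in data.items()}
    common_var = mean(list(per_gene_var.values()))
    scores = {}
    for g, (x, y) in data.items():
        df = len(x) + len(y) - 2
        var = (prior_df * common_var + df * per_gene_var[g]) / (prior_df + df)
        se = math.sqrt(var * (1.0 / len(x) + 1.0 / len(y)))
        scores[g] = (mean(x) - mean(y)) / se
    return scores

# Two replicates per group: geneA has a tiny change and, by chance, a tiny variance.
data = {
    "geneA": ([5.01, 5.00], [4.90, 4.91]),
    "geneB": ([8.0, 9.0], [5.0, 6.2]),
}
raw = shrunken_t_statistics(data, prior_df=0.0)   # ordinary t statistics
scores = shrunken_t_statistics(data)              # variance-shrunken version
print(raw, scores)
```

With only two replicates, the ordinary statistic ranks geneA first purely because its variance estimate happens to be small; after shrinkage the larger change in geneB ranks higher.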
Standard statistical tests
1. Regardless of the test statistic used, one must determine its significance.
2. Standard interpretations of t-like tests assume that the data are sampled from normal populations with equal variances.
3. Expression data may fail to satisfy either or both of these constraints.
Standard statistical tests
1. Use of non-parametric rank-based statistics is also common, via both traditional statistical methods and ad hoc ones designed specifically for microarray data.
RankProd: a non-parametric method to detect differentially regulated genes in replicated experiments
(1) originates from an analysis of biological reasoning; easy to understand
(2) fast, simple, and robust to outliers (suitable for noisy data)
(3) provides statistical significance for each gene and allows for the control of the overall significance (e.g., false discovery rate)
(4) provides a straightforward way for cross-platform meta-analysis (integrates data generated at different laboratories or under different environments into one study, and achieves increased power)
• What does it do? What is the method implemented in the package?
RankProd uses the so-called rank product non-parametric method (Breitling et al., 2004) to identify genes that are up-regulated or down-regulated under one condition against another. Rank product is a non-parametric statistic that detects items that are consistently highly ranked in a number of lists, for example genes that are consistently found among the most strongly upregulated genes in a number of replicate experiments.
• How does it compare to other methods for a similar purpose?
Rank Product
Calculate RP:
Calculate significance
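The formulas on this slide did not survive extraction. As a hedged sketch, one common form of the rank product for gene g over k replicates is the geometric mean of its ranks, RP_g = (r_g,1 × r_g,2 × ... × r_g,k)^(1/k), where r_g,i is the rank of gene g in replicate i (rank 1 = most strongly up-regulated). The code below assumes that form; the data, gene names, and function names are hypothetical, and ties are broken by input order for simplicity.

```python
import math

def rank_descending(values):
    """Rank genes within one replicate: 1 = largest log fold change."""
    order = sorted(values, key=values.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(order)}

def rank_products(replicates):
    """replicates: list of dicts mapping gene -> log fold change.

    Returns gene -> geometric mean of its ranks across replicates.
    """
    ranks = [rank_descending(rep) for rep in replicates]
    k = len(replicates)
    return {
        gene: math.prod(r[gene] for r in ranks) ** (1.0 / k)
        for gene in replicates[0]
    }

# Hypothetical log fold changes for four genes in three replicate experiments.
replicates = [
    {"geneA": 2.1, "geneB": 0.3, "geneC": -0.5, "geneD": 1.0},
    {"geneA": 1.8, "geneB": 0.1, "geneC": 0.4, "geneD": 1.9},
    {"geneA": 2.5, "geneB": -0.2, "geneC": 0.0, "geneD": 0.9},
]

rp = rank_products(replicates)
ranking = sorted(rp, key=rp.get)  # smallest RP = most consistently up-regulated
print(ranking)
```

Small RP values indicate genes that rank near the top in every replicate. Significance is typically assessed by comparing each observed RP with values obtained after randomly permuting the ranks within each replicate.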
Permutation tests for calculating significance levels
Permutation tests, generally carried out by repeatedly scrambling the samples’ class labels and computing t statistics for all genes in the scrambled data, best capture the unknown structure of the data.
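The label-scrambling procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular paper: it uses the equal-variance t statistic, a two-sided comparison on |t|, 1000 random permutations, and made-up data. The same shuffled labels are applied to all genes in each round, so the relationships between genes are preserved.

```python
import math
import random
from statistics import mean, variance

def t_statistic(x, y):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

def abs_t_per_gene(expr, labels):
    """Compute |t| for every gene given 0/1 class labels per sample."""
    out = {}
    for gene, values in expr.items():
        x = [v for v, lab in zip(values, labels) if lab == 1]
        y = [v for v, lab in zip(values, labels) if lab == 0]
        out[gene] = abs(t_statistic(x, y))
    return out

def permutation_p_values(expr, labels, n_perm=1000, seed=0):
    """Estimate per-gene p-values by repeatedly scrambling the class labels."""
    rng = random.Random(seed)
    observed = abs_t_per_gene(expr, labels)
    exceed = {gene: 0 for gene in expr}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one scrambling, shared by all genes
        permuted = abs_t_per_gene(expr, shuffled)
        for gene in expr:
            if permuted[gene] >= observed[gene]:
                exceed[gene] += 1
    # Add one to numerator and denominator so no p-value is exactly zero.
    return {gene: (exceed[gene] + 1) / (n_perm + 1) for gene in expr}

# Hypothetical data: 4 treated samples (label 1) followed by 4 controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "geneA": [8.1, 7.9, 8.4, 8.0, 6.0, 6.3, 5.8, 6.1],  # clear shift
    "geneB": [5.0, 6.1, 5.5, 5.8, 5.9, 5.2, 6.0, 5.4],  # no shift
}
p = permutation_p_values(expr, labels)
print(p)
```

With only four samples per group there are just 70 distinct label assignments, so the smallest achievable p-value is limited; this is one reason replication matters for permutation-based significance.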
Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116-5121 (2001).
Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537 (1999).
Dudoit, S., Yang, Y.-H., Callow, M.J. & Speed, T.P. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report 578 (Department of Statistics, University of California at Berkeley, Berkeley, CA, 2000).
Summary
The problem: identify differentially expressed genes from microarray data
How to identify: t-test and rank product
How to evaluate the significance of identified genes