
DESeq, voom and vst

Qiang Kou

[email protected]

April 28, 2014

Qiang Kou ([email protected]) DESeq, voom and vst April 28, 2014 1 / 31

Background

Advantages of RNA-seq Compared to Microarray

Detecting novel transcripts and isoforms

High reproducibility, low background

Detection of gene fusions and SNPs

Background

Differential Expression Analysis

Steps

Normalization

Dispersion estimation

Statistical testing

Methods to be presented

DESeq: negative binomial distribution [1]

voom: variance modelling at the observational level [2]

vst: variance-stabilizing transformation [1, 3]

Background

Timeline

[Figure: timeline on a 2002 to 2016 axis placing vst, limma, cufflinks, DESeq, edgeR, baySeq and voom]

Background

Why different models?

Background

RNA-seq is Discrete

Garber et al. (2011) Nature Methods 8:469-477

Background

Length Normalization

Within sample: gene length

Between samples: library size

RPKM and FPKM

Reads/fragments per kilobase per million mapped reads

Normalization for gene length and library size
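The RPKM definition above can be sketched in a few lines of base R. The toy count matrix, the gene lengths and the helper name `rpkm_sketch` are invented for illustration and do not come from the slides.

```r
# Hypothetical toy data: 3 genes x 2 samples of raw read counts
counts <- matrix(c(100, 200, 300,
                   200, 400, 600),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length_bp <- c(g1 = 1000, g2 = 2000, g3 = 500)

# RPKM = reads / (gene length in kb) / (mapped reads in millions)
rpkm_sketch <- function(counts, gene_length_bp) {
  lib_size_millions <- colSums(counts) / 1e6
  per_kb <- counts / (gene_length_bp / 1e3)   # within sample: gene length
  sweep(per_kb, 2, lib_size_millions, "/")    # between samples: library size
}

rpkm <- rpkm_sketch(counts, gene_length_bp)
```

In this toy example sample s2 is simply sequenced twice as deeply as s1, so the two RPKM columns coincide even though the raw counts differ.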

Background

Different Distribution

[Figure: density plots. (a) Microarray: density against expression. (b) RNA-seq: density against log10(fpkm) of genes, by condition (Untreated, CG8144_RNAi)]

Background

Differential Expression as a Function of Transcript Length

[Figure: % DE against transcript length (bp), panels a to g: Sequencing Data (Sultan), Array Data (Sultan), Sequencing Data (Cloonan), Sequencing Data (Marioni), Array Data (Marioni), Sequencing Data (Marioni), Array Data (Marioni)]

Oshlack et al. (2009) Biology Direct 4:14

Background

Poisson and Negative Binomial Distribution

Background

Poisson Distribution

Graph from Wikipedia

$\Pr(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$

$E(X) = \operatorname{Var}(X) = \lambda$

A list of genes $g_1, g_2, \ldots, g_n$

$X \sim \mathrm{Poisson}(\lambda)$, a random variable representing the number of reads falling in $g_i$

Likelihood ratio test
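A minimal sketch of such a likelihood ratio test for a single gene, assuming a Poisson model and equal library sizes in both conditions. The helper name `poisson_lrt` and the read counts are hypothetical.

```r
# H0: both conditions share one rate; H1: each condition has its own rate.
poisson_lrt <- function(x_a, x_b) {
  loglik <- function(x, lambda) sum(dpois(x, lambda, log = TRUE))
  ll_null <- loglik(c(x_a, x_b), mean(c(x_a, x_b)))            # MLE under H0
  ll_alt  <- loglik(x_a, mean(x_a)) + loglik(x_b, mean(x_b))   # MLEs under H1
  stat <- max(0, 2 * (ll_alt - ll_null))
  # The statistic is compared with a chi-square distribution with 1 df
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Hypothetical read counts for one gene, three replicates per condition
res_same <- poisson_lrt(c(10, 12, 11), c(11, 10, 12))
res_diff <- poisson_lrt(c(10, 12, 11), c(30, 28, 33))
```

Because the Poisson model forces the variance to equal the mean, this test tends to be too optimistic when replicates vary more than that.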

Background

Negative Binomial Distribution

Graph from Wikipedia

$X \sim \mathrm{NB}(r; p)$

$\Pr(X = k) = \binom{k+r-1}{k} p^k (1 - p)^r$

p: probability of success

r: predefined number of failures

X: number of successes until r failures
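As a quick check of the formula above, the sketch below evaluates it directly and compares it with R's built-in `dnbinom()`. Note that `dnbinom()` counts failures before a fixed number of successes, so the roles of success and failure are swapped (`prob = 1 - p`). The values of `r` and `p` are arbitrary.

```r
# Slide's convention: X = number of successes before the r-th failure,
# with success probability p.
nb_pmf <- function(k, r, p) choose(k + r - 1, k) * p^k * (1 - p)^r

r <- 3
p <- 0.7
k <- 0:500                                  # support truncated; the tail is negligible here
pmf <- nb_pmf(k, r, p)
ref <- dnbinom(k, size = r, prob = 1 - p)   # same distribution in R's convention

nb_mean <- sum(k * pmf)                     # theory: r * p / (1 - p)
nb_var  <- sum(k^2 * pmf) - nb_mean^2       # theory: r * p / (1 - p)^2
```

Unlike the Poisson distribution, the variance here exceeds the mean, which is what makes the negative binomial attractive for RNA-seq counts.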

Background

DESeq, voom and vst

DESeq, voom and vst

Normalization in DESeq

Assumption

Most genes not expressed differentially

Differentially expressed genes divided equally between up- and down-regulation

Steps

Geometric mean of gene's counts across all samples

Divide gene's counts by the geometric mean

Normalization factor: median of ratios
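The three steps above can be sketched in base R as follows. This is an illustration of the median-of-ratios idea, not the package's own code; the helper name `size_factors_sketch` and the toy counts are hypothetical.

```r
size_factors_sketch <- function(counts) {
  log_counts <- log(counts)
  log_geo_mean <- rowMeans(log_counts)   # step 1: log geometric mean per gene
  usable <- is.finite(log_geo_mean)      # genes with a zero count are skipped
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    # steps 2 and 3: ratio to the geometric mean, then the median over genes
    exp(median(col - log_geo_mean[usable]))
  })
}

# Hypothetical counts: sample s2 is sequenced twice as deeply as s1
counts <- matrix(c(10, 20, 30, 40, 0,
                   20, 40, 60, 80, 5),
                 ncol = 2,
                 dimnames = list(paste0("g", 1:5), c("s1", "s2")))
sf <- size_factors_sketch(counts)
normalized <- sweep(counts, 2, sf, "/")
```

Taking the median makes the factor robust to a minority of differentially expressed genes, which is where the assumption that most genes are not differentially expressed comes in.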

Model in DESeq

Model in DESeq

Read counts for gene i in sample j follow a negative binomial distribution: $K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma_{ij}^2)$

Why not a Poisson distribution?

In RNA-seq, the variance is larger than the mean

$\mu_{ij}$ and $\sigma_{ij}^2$ are very difficult to estimate

Parameter estimation is the main difference between methods based on the NB distribution
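For reference, the DESeq paper [1] links the two parameters to a per-sample size factor and a per-condition expression level. The symbols $q$, $s_j$, $v$ and $\rho(j)$ below follow that paper's notation and do not appear on the slide:

```latex
\begin{align*}
\mu_{ij}      &= q_{i,\rho(j)}\, s_j, \\
\sigma_{ij}^2 &= \underbrace{\mu_{ij}}_{\text{shot noise}}
               + \underbrace{s_j^2\, v_{i,\rho(j)}}_{\text{raw variance}}, \\
v_{i,\rho(j)} &= v_\rho\!\left(q_{i,\rho(j)}\right).
\end{align*}
```

Here $\rho(j)$ is the condition of sample $j$, $s_j$ its size factor, $q_{i,\rho(j)}$ the expression strength of gene $i$ in that condition, and $v_\rho$ a smooth function fitted across genes. Since the raw variance term is non-negative, the variance is never smaller than the mean.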

Model in DESeq

Model in DESeq

Count sum for gene i in condition A: a

Count sum for gene i in condition B: b

Sum: κ = a + b

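Probabilities p(a), p(b) and, assuming independence under the null hypothesis, p(a, b) = p(a) p(b)

p-value:

$$p = \frac{\sum_{i+j=\kappa,\; p(i,j) \le p(a,b)} p(i, j)}{\sum_{i+j=\kappa} p(i, j)}$$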
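A minimal sketch of this p-value in base R, assuming the negative binomial means and dispersions of the two count sums under the null hypothesis are already known. The helper name `nb_exact_pvalue` and all parameter values are hypothetical; in practice these parameters are estimated from the data.

```r
nb_exact_pvalue <- function(a, b, mu_a, mu_b, size_a, size_b) {
  kappa <- a + b
  i <- 0:kappa          # every split (i, j) with i + j = kappa
  j <- kappa - i
  # p(i, j) = p(i) * p(j), assuming the two count sums are independent
  p_ij <- dnbinom(i, mu = mu_a, size = size_a) *
          dnbinom(j, mu = mu_b, size = size_b)
  p_obs <- p_ij[a + 1]
  # Outcomes at most as likely as the observed one (small tolerance for ties)
  as_extreme <- p_ij <= p_obs * (1 + 1e-7)
  sum(p_ij[as_extreme]) / sum(p_ij)
}

p_balanced <- nb_exact_pvalue(10, 10, mu_a = 10, mu_b = 10, size_a = 5, size_b = 5)
p_extreme  <- nb_exact_pvalue(0, 20, mu_a = 10, mu_b = 10, size_a = 5, size_b = 5)
```

The construction mirrors Fisher's exact test: it conditions on the total count and sums the probabilities of all splits that are no more likely than the observed one.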

Model in DESeq

R code for DESeq

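library(DESeq)

# data.sim is assumed to hold a count matrix ($counts) and group labels ($treatment)
DESeq.cds <- newCountDataSet(countData = data.sim$counts,
                             conditions = factor(data.sim$treatment))
# Normalization: median-of-ratios size factors
DESeq.cds <- estimateSizeFactors(DESeq.cds)
# Dispersion estimation with a local regression fit
DESeq.cds <- estimateDispersions(DESeq.cds, fitType = "local")
# Negative binomial test between conditions "1" and "2"
DESeq.test <- nbinomTest(DESeq.cds, "1", "2")
DESeq.pvalues <- DESeq.test$pval
# Benjamini-Hochberg adjustment for multiple testing
DESeq.adjpvalues <- p.adjust(DESeq.pvalues, method = "BH")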

Model in limma

Model in limma

Linear Models for Microarray Data: lmFit()

Classical t-test: $t_j = \dfrac{\mu_{1j} - \mu_{2j}}{\sqrt{\sigma_j^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$

$\sigma_j^2$ is very hard to estimate from a small sample

limma: moderated t-test

Use information from other genes

$\sigma_j^2 \sim \text{Inverse Gamma}(\alpha, \beta)$

Empirical Bayes for parameter estimation: eBayes()
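To illustrate how borrowing information from other genes changes the test, here is a base R sketch of a moderated t-statistic for one gene. It follows the shrinkage form in [3], where the prior is expressed through a prior degrees of freedom `d0` and a prior variance `s0_sq` (equivalent to an inverse gamma prior on the variance). In limma these two values are estimated from all genes by `eBayes()`; here they are simply supplied as hypothetical inputs, as are the helper name and the data.

```r
moderated_t <- function(x1, x2, d0, s0_sq) {
  n1 <- length(x1)
  n2 <- length(x2)
  d <- n1 + n2 - 2
  s_sq <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / d   # pooled sample variance
  # Posterior variance: weighted average of the prior and the sample variance
  s_tilde_sq <- (d0 * s0_sq + d * s_sq) / (d0 + d)
  t_mod <- (mean(x1) - mean(x2)) / sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
  list(t = t_mod, df = d0 + d, s_sq = s_sq, s_tilde_sq = s_tilde_sq)
}

# Hypothetical log-expression values for one gene
x1 <- c(5.1, 4.8, 5.6)
x2 <- c(6.9, 7.4, 6.5)
classical <- moderated_t(x1, x2, d0 = 0, s0_sq = 1)    # d0 = 0: ordinary t-test
shrunk    <- moderated_t(x1, x2, d0 = 4, s0_sq = 0.5)  # variance pulled towards 0.5
```

With `d0 = 0` the statistic reduces to the classical pooled-variance t-test; a larger `d0` pulls the gene's variance towards the prior value and adds degrees of freedom.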

Model in voom

Model in voom

voom: variance modelling at the observational level

Locally weighted regression to get the relation between count and variance

Moderated t-test in limma
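The steps behind voom, as described in [2], can be sketched in base R. This is a simplified illustration on simulated counts, not the limma implementation. The data, the two-group design and all variable names are assumptions made for the example.

```r
set.seed(1)
n_genes <- 500
group <- factor(rep(c("A", "B"), each = 3))
n_samples <- length(group)
mu <- 5 + rexp(n_genes, rate = 1 / 200)   # hypothetical mean count per gene
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes)

# Step 1: log2 counts per million, with small offsets to avoid log(0)
lib_size <- colSums(counts)
logcpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Step 2: per-gene linear model; keep the residual standard deviations
design <- model.matrix(~group)
fit <- lm.fit(design, t(logcpm))
resid_sd <- sqrt(colSums(fit$residuals^2) / fit$df.residual)

# Step 3: locally weighted regression of sqrt(sd) on the average log2 count
log_offset <- log2(lib_size + 1) - log2(1e6)
avg_logcount <- rowMeans(logcpm) + mean(log_offset)
trend <- lowess(avg_logcount, sqrt(resid_sd), f = 0.5)
predict_sqrt_sd <- approxfun(trend$x, trend$y, rule = 2, ties = mean)

# Step 4: precision weight of each observation = 1 / (predicted sd)^2
fitted_logcount <- sweep(t(fit$fitted.values), 2, log_offset, "+")
weights <- 1 / predict_sqrt_sd(fitted_logcount)^4
dim(weights) <- dim(fitted_logcount)
```

The resulting `weights` matrix has one entry per gene and sample; passing such weights to a weighted linear model is what lets the moderated t-test be applied to log counts.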

Model in voom

Model in voom

[Figure: voom mean-variance trend, panels a to c. x-axis: average log2(count size + 0.5) in panels a and b, fitted log2(count size + 0.5) in panel c; y-axis: sqrt(standard deviation)]

Law et al. Genome Biology 2014, 15:R29

Model in voom

R code for voom

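library(limma)
library(edgeR)  # provides calcNormFactors()

# data.matrix (genes x samples counts) and conditions are assumed to exist
group <- factor(conditions)
design <- model.matrix(~group)
# TMM normalization factors, used to scale the library sizes
nf <- calcNormFactors(data.matrix, method = "TMM")
# log-cpm values with observation-level precision weights
voom.data <- voom(data.matrix, design = design,
                  lib.size = colSums(data.matrix) * nf)
voom.data$genes <- rownames(data.matrix)
# Linear model fit followed by the moderated t-test
voom.fitlimma <- lmFit(voom.data, design = design)
voom.fitbayes <- eBayes(voom.fitlimma)
voom.pvalues <- voom.fitbayes$p.value[, 2]  # column 2: group effect
voom.adjpvalues <- p.adjust(voom.pvalues, method = "BH")
# Genes called differentially expressed at an adjusted p-value of 0.05
voom.genes <- data.matrix[which(voom.adjpvalues <= 0.05), ]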

Model in vst

Model in vst

Variance-stabilizing transformation

Find a simple function f such that the transformed values y = f(x) have variability that does not depend on the mean

A method used in microarray data analysis [4]

Moderated t-test in limma
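A small simulation illustrates the idea. If the variance is a function v(μ) of the mean, the delta method gives Var f(X) ≈ f'(μ)² v(μ), so choosing f'(μ) proportional to 1/sqrt(v(μ)) makes the variance roughly constant. The sketch below uses Poisson counts, where v(μ) = μ and f is the square root, purely because it is the simplest case; the transformation used for RNA-seq in [1] is instead derived from the fitted dispersion trend.

```r
set.seed(42)
lambdas <- c(20, 100, 500)   # arbitrary Poisson means
n <- 1e5

# Variance of the raw counts grows with the mean...
raw_var <- sapply(lambdas, function(l) var(rpois(n, l)))
# ...while the variance of sqrt(counts) stays close to 1/4 for all three means
vst_var <- sapply(lambdas, function(l) var(sqrt(rpois(n, l))))
```

After such a transformation the data behave more like microarray intensities, which is why the moderated t-test in limma can be applied directly.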

Model in vst

R code for vst

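library(DESeq)
library(limma)

# data.matrix (genes x samples counts) and conditions are assumed to exist
group <- factor(conditions)
design <- model.matrix(~group)
DESeq.cds <- newCountDataSet(countData = data.matrix, conditions = group)
DESeq.cds <- estimateSizeFactors(DESeq.cds)
# "blind" estimates dispersions without using the condition labels
DESeq.cds <- estimateDispersions(DESeq.cds, method = "blind",
                                 fitType = "local")
# Variance-stabilized expression values
DESeq.vst <- getVarianceStabilizedData(DESeq.cds)
# Linear model fit followed by the moderated t-test
DESeq.vst.fitlimma <- lmFit(DESeq.vst, design = design)
DESeq.vst.fitbayes <- eBayes(DESeq.vst.fitlimma)
DESeq.vst.pvalues <- DESeq.vst.fitbayes$p.value[, 2]  # column 2: group effect
DESeq.vst.adjpvalues <- p.adjust(DESeq.vst.pvalues, method = "BH")
# Genes called differentially expressed at an adjusted p-value of 0.05
DESeq.vst.genes <- data.matrix[which(DESeq.vst.adjpvalues <= 0.05), ]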

Results from Simulation

AUC Results

[Figure: AUC against number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst]

Results from Simulation

Differential Expression Gene Number

[Figure: number of genes called differentially expressed by each software (baySeq, DESeq, NBPSeq, voom, vst, edgeR, ShrinkSeq, TSPM, EBSeq, SAMseq), split into correct and incorrect calls]

Results from Simulation

Running Time

[Figure: running time (sec) against number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst]

Results from Simulation

Running Time with 15 Samples per Condition

Software    AUC     Time (sec)
edgeR       0.810     0.630
DESeq       0.652    48.388
NBPSeq      0.767    24.942
baySeq      0.495   210.781
EBSeq       0.769    12.666
TSPM        0.836     7.486
SAMseq      0.827     1.801
voom        0.835     0.264
vst         0.830     0.138
ShrinkSeq   0.796   343.260

Results from Simulation

Venn Diagram for Drosophila melanogaster

[Figure: Venn diagram of genes called differentially expressed by DESeq, voom and vst; region counts in extraction order: 47, 13, 11, 310, 178, 17]

Some Conclusions

Some Conclusions

Each method rests on many assumptions

Negative binomial models have relatively better specificity and sensitivity

voom and vst perform well in both accuracy and running time, with no difference between them

All methods perform better with larger samples; however, sample size is very limited in practice

cuffdiff normalizes differently: it accounts for both alternative isoforms and transcript length

Some Conclusions

References

[1] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology, 11:R106, 2010.

[2] Charity W Law, Yunshun Chen, Wei Shi, and Gordon K Smyth. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2):R29, 2014.

[3] Gordon K Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Article 3, 2004.

[4] Blythe P Durbin, Johanna S Hardin, Douglas M Hawkins, and David M Rocke. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, pages S105–S110, 2002.

Thanks

Thanks

Thank you for your time!

Qiang Kou

[email protected]
