

Page 1: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Fei Ye, PhD

Vanderbilt Center for Quantitative Sciences

Department of Biostatistics, Vanderbilt School of Medicine

CQS Summer Institute 2014

Statistical Considerations in High-Throughput RNAi Screens for Identifying Genes Mediating Sensitivity to Chemotherapeutic Drugs

Page 2: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Background

• RNA interference (RNAi): a valuable tool for modulating gene expression through the introduction of short RNAs, including small interfering RNAs (siRNAs) and short hairpin RNAs (shRNAs).

• RNAi has emerged as a powerful technology to knock down specific genes for functional analysis and for therapeutic purposes, particularly because we now know much more about specific genes involved in disease processes.

• RNAi studies conducted with human tumor cell lines using synthetic siRNAs/shRNAs targeting defined gene families or genome-wide libraries have identified modulators of drug sensitivity (hits).

• Large-scale systematic RNAi screens aim to test hundreds, or even thousands, of siRNAs/shRNAs to identify hits.

Page 3: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

What is RNA interference (RNAi)?

Page 4: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Challenges in analyzing high-throughput RNAi data

• Goal: efficiently and accurately identify genes whose loss significantly reduces or increases cell viability in response to chemotherapy.

• Reduce false positives and false negatives:

- in the wet lab: (1) technical and procedural improvements; (2) more replicate measurements;

- in the statistical analysis: (1) control the type I error; (2) ensure sufficient power; (3) account for the combined effects of RNAi and the chemotherapeutic drug.

Page 5: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

RNAi screening approach identifies genes that increase sensitivity to paclitaxel in breast cancer cells

Page 6: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Real data analysis - background

• Paclitaxel is a potent anti-microtubule agent used in the treatment of patients with locally advanced and metastatic breast cancer. Despite its wide use, paclitaxel-based chemotherapy results in full response in only a small portion of patients; many patients have an incomplete response or are resistant to treatment.

Page 7: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Real data analysis – material and methods

• Performed a loss-of-function RNAi screen to identify genes that modulate paclitaxel sensitivity.

• Targeted a subset of genes (n=428) frequently found to be "deregulated" in breast cancers and known to be associated with a targeted pharmacological agent.

• Selected 36 candidate genes by screening, and designed two independent siRNAs for each of these genes to validate in two breast cancer cell lines, MDA-MB-231 and MDA-MB-468.

Page 8: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

shRNA screen

~20,000 cells

Transfect with a subset of the human genome pGIPZ shRNAmir plasmid library (n=1078)

Divide into two arms: vehicle control (DMSO) and 5 nM paclitaxel

Incubate 72-96 h

Page 9: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

siRNA screen

Breast cancer cell lines: MDA-MB-231, MDA-MB-468

Reverse-transfect

Divide into two arms: vehicle control (DMSO) and 5 nM paclitaxel

The entire experiment was performed in triplicate.

Page 10: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Real data analysis – preprocessing and normalization

• Sources of noise.

• Ideally, mixed-effects models should be used to estimate both fixed effects (RNAi and treatment) and random effects (batch, plate, day, etc.).

• Baseline correction.

• Within-plate normalization: data from each plate were normalized to a non-silencing (NS) control, to give a relative measurement of target-gene knockdown effect and to control for the effects of siRNA transfection.

Other approaches: ‘%control’, ‘normalized %inhibition’, Z score, B score...

• Across-plate normalization: make measurements comparable across culture plates by removing systematic plate-to-plate variation.

Approaches: median/mean centering, standardization methods, etc.
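The two normalization steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the NS control is summarized by its mean (the slides do not say whether the mean or the median was used), and across-plate normalization is shown as median centering only.

```python
from statistics import mean, median

def normalize_to_ns(plate, ns_wells):
    """Within-plate: express each well relative to the plate's NS control.

    Assumption: the NS control level is the mean of the NS wells.
    """
    ns_level = mean(ns_wells)
    return [value / ns_level for value in plate]

def median_center(plates):
    """Across-plate: subtract each plate's median so plates share a center."""
    centered = []
    for plate in plates:
        plate_median = median(plate)
        centered.append([value - plate_median for value in plate])
    return centered

# Hypothetical readings for illustration only.
plate = [0.50, 0.75, 1.00, 1.25]
ns = [1.00, 1.50]  # NS control wells, mean 1.25
relative = normalize_to_ns(plate, ns)
centered = median_center([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
```

After median centering, both hypothetical plates have the same values even though their raw levels differed by a constant offset, which is the kind of systematic plate-to-plate variation the step is meant to remove.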

Page 11: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Robustness? Well-to-well variation

[Figure: control and shRNA/siRNA wells on replicate plates]

Within-plate normalization?

Page 12: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Replicability? Plate-to-plate variation

[Figure: control and shRNA/siRNA wells on replicate plates]

Global normalization?

Page 13: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Reproducibility? - Experimental variation

%cell growth = (siRNA for gene x) / (non-targeting siRNA*)

*Non-targeting siRNA control: does not target any mRNA in the genome

Spearman ρ = 0.79-0.89

Page 14: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Sensitivity Index

• To identify genes that, when targeted, promote paclitaxel sensitivity or resistance, we calculated a sensitivity index (SI) score for each shRNA. The SI score accounts for the individual and combined effects of shRNAs and the effect of drug on cell viability.

Effect of the siRNA or shRNA: Rc/Cc

Effect of the drug: Cd/Cc

Expected Combined Effect (ECE): Rc/Cc × Cd/Cc

Observed Combined Effect (OCE): Rd/Cc

SI = ECE - OCE; SI > 0: sensitive; SI < 0: resistant

Range: from -1 to 1.

Rc: RNAi, untreated; Cc: control, untreated; Cd: control, drugged; Rd: RNAi, drugged.

Swanton et al., Cancer Cell 11, 498-512 (2007)
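The SI definition can be written out in a few lines. The sketch below assumes that replicate wells are averaged before the ratios are taken, which the slide does not specify; the function name and the example readings are hypothetical.

```python
from statistics import mean

def sensitivity_index(rc, rd, cc, cd):
    """SI = ECE - OCE for one siRNA/shRNA.

    rc, rd: RNAi wells without and with drug; cc, cd: control wells
    without and with drug. Assumption: replicates are averaged first.
    """
    cc_mean = mean(cc)
    rnai_effect = mean(rc) / cc_mean      # Rc/Cc
    drug_effect = mean(cd) / cc_mean      # Cd/Cc
    expected = rnai_effect * drug_effect  # ECE
    observed = mean(rd) / cc_mean         # OCE = Rd/Cc
    return expected - observed

# Hypothetical viabilities: knockdown alone costs 10%, drug alone 20%,
# but the combination leaves only 30% viability.
si = sensitivity_index(rc=[0.9, 0.9], rd=[0.3, 0.3], cc=[1.0, 1.0], cd=[0.8, 0.8])
```

Here ECE is 0.72 and OCE is 0.3, so SI is about 0.42 and the knockdown would be read as sensitizing. Because the replicates collapse to means, the score carries no information about their spread.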

Page 15: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Selection of hits from shRNA screen

• Each gene has 2-11 shRNAs/clones.

• A bootstrap algorithm was used to estimate the variability of the mean SI level for genes with > 3 shRNAs by randomly sampling from all shRNAs of that gene with replacement. The corresponding 95% bootstrap C.I. was calculated for each of these genes.

• The mean SI value was calculated for the genes with ≤ 3 shRNAs. Hits represented by these genes were selected with a more stringent cutoff.
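A bootstrap of the per-gene mean SI might look like the sketch below. The slides do not say which kind of bootstrap interval was used, so the percentile interval here is an assumption, as are the function name, the number of resamples, and the example SI values.

```python
import random
from statistics import mean

def bootstrap_mean_ci(si_values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap C.I. for the mean SI of one gene.

    Resamples the gene's shRNA-level SI scores with replacement.
    Assumption: a percentile interval; the source does not name the type.
    """
    rng = random.Random(seed)
    k = len(si_values)
    boot_means = sorted(mean(rng.choices(si_values, k=k)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(si_values), lower, upper

# Hypothetical SI scores for a gene with five shRNAs.
gene_si = [0.20, 0.30, 0.25, 0.35, 0.40]
estimate, ci_low, ci_high = bootstrap_mean_ci(gene_si)
```

One plausible selection rule is to call a gene a sensitizing hit when the whole interval lies above zero, but the cutoff actually applied in the study is not given on the slide.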

Page 16: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Plot of SI scores for all shRNAs

Page 17: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Paclitaxel sensitivity index for indicated genes from shRNA screen

Page 18: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Top sensitizing gene targets from siRNA screen and the corresponding chemical inhibitors

Page 19: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Statistical approaches

Methods used to find the genes that are sensitive/resistant to the drug include:

• fold-change

• t test, Z-factor and their variants

• Wilcoxon rank-sum

• Sensitivity Index (SI)

It is unclear whether:

(1) drug effect, RNAi effect, and the interaction effect are all considered;

(2) variation among replicates is taken into account in the estimation;

(3) decision error rates (false-positive and false-negative) are appropriately controlled.

Page 20: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

A Linear Model with an Interaction Term

Disadvantage of SI: it ignores the variation among replicates.

Assuming a normal distribution, we can model cell viability (Y) for each siRNA with the explanatory variables treatment (x1, yes/no), RNAi (x2, yes/no), and their interaction term (x1x2):

D1: Y = a + b1*x1 + b2*x2 + b3*x1*x2 + err

D2: Y = a' + b'1*x1 + err

Using a goodness-of-fit test, a chi-square statistic with 2 degrees of freedom can be calculated from the difference between the deviance of the reduced model D2 (drug effect only) and the residual deviance of the fitted model D1.
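The model comparison can be sketched without a statistics library. Because x1 and x2 are both yes/no, the full model D1 fits one mean per cell of the 2×2 design and the reduced model D2 fits one mean per drug arm, so both deviances reduce to residual sums of squares around group means. Scaling the deviance difference by a dispersion estimated from D1 is an assumption about how the test was carried out, and the function names are hypothetical. For 2 degrees of freedom the chi-square upper tail probability is exactly exp(-x/2).

```python
import math
from statistics import mean

def rss(groups):
    """Residual sum of squares when each group is fitted by its own mean."""
    return sum((y - mean(g)) ** 2 for g in groups for y in g)

def interaction_model_test(cc, cd, rc, rd):
    """Compare D1 (drug, RNAi, interaction) with D2 (drug only).

    cc, cd: control wells without/with drug; rc, rd: RNAi wells
    without/with drug. Needs replicates so that D1 has residual df.
    """
    cc, cd, rc, rd = list(cc), list(cd), list(rc), list(rd)
    n = len(cc) + len(cd) + len(rc) + len(rd)
    rss_full = rss([cc, cd, rc, rd])       # D1: four cell means
    rss_reduced = rss([cc + rc, cd + rd])  # D2: one mean per drug arm
    dispersion = rss_full / (n - 4)        # assumption: estimated from D1
    chi2 = (rss_reduced - rss_full) / dispersion
    p_value = math.exp(-chi2 / 2)          # chi-square tail, 2 df
    return chi2, p_value

# Hypothetical triplicate viabilities for one siRNA.
chi2, p = interaction_model_test(
    cc=[1.0, 1.1, 0.9], cd=[0.8, 0.85, 0.75],
    rc=[0.9, 0.95, 0.85], rd=[0.3, 0.35, 0.25],
)
```

Unlike SI, the statistic shrinks as replicate scatter grows, since the dispersion sits in the denominator.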

Page 21: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

A simulation study

Number of true hits: Uniform{10, 11, …, 60} out of 900+ genes

The viability measurements of non-hits: N(µNH, σ²), with σ ∈ {0.2, 0.4, 0.6, 0.8}.

The distribution of true hits, with a shifted mean relative to the non-hits: N(µNH*C, σ²), with C>1 for an antagonizing effect and C<1 for a sensitizing effect.

The parameter D was used to tune the strength of the treatment effect.

Parameter K (>1) was defined such that non-silencing control wells have a distribution with mean µctl = µrna*K, where µrna = µNH / µNH*D.
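The hit and non-hit part of this setup can be sketched as below. The sketch covers only the draw of the number of hits and the viability replicates; the parameters D and K are left out because their exact role is not recoverable from the slide. The function name, the gene count of exactly 900, µNH = 0.8, and the replicate count are illustrative assumptions.

```python
import random

def simulate_viability(n_genes=900, n_reps=3, mu_nh=0.8, sigma=0.4, c=0.5, seed=1):
    """Draw replicate viabilities for hits and non-hits.

    Non-hits: N(mu_nh, sigma^2); hits: N(mu_nh * c, sigma^2).
    Assumptions: n_genes, n_reps and mu_nh are illustrative values.
    """
    rng = random.Random(seed)
    n_hits = rng.randint(10, 60)  # Uniform{10, ..., 60}, both ends included
    hit_ids = set(rng.sample(range(n_genes), n_hits))
    data = {}
    for gene in range(n_genes):
        mu = mu_nh * c if gene in hit_ids else mu_nh
        data[gene] = [rng.gauss(mu, sigma) for _ in range(n_reps)]
    return hit_ids, data

hit_ids, data = simulate_viability()
```

Repeating the draw over a grid of σ and C values, and scoring each method against `hit_ids`, gives the kind of comparison summarized on the following slides.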

Page 22: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

FPR & FNR

                Truth +                      Truth -                        Total
Test (SI) +     TP = #Agreed                 FP = #Claimed Pos - #Agreed    # claimed positives
Test (SI) -     FN = #True Hits - #Agreed    TN = N - TP - FP - FN
Total           # True hits (10~60)                                         N (900+)

FNR = FN/(TP+FN) = 1 - sensitivity

FPR = FP/(FP+TN) = 1 - specificity
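The bookkeeping in the table translates directly into set operations. In this sketch the function name and the example gene identifiers are hypothetical; the formulas are the ones on the slide.

```python
def error_rates(true_hits, claimed, n_genes):
    """Confusion counts and error rates for one method in one simulation."""
    true_hits, claimed = set(true_hits), set(claimed)
    tp = len(true_hits & claimed)   # #Agreed
    fp = len(claimed) - tp          # #Claimed Pos - #Agreed
    fn = len(true_hits) - tp        # #True Hits - #Agreed
    tn = n_genes - tp - fp - fn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "FNR": fn / (tp + fn),      # 1 - sensitivity
        "FPR": fp / (fp + tn),      # 1 - specificity
    }

# Hypothetical run: 20 true hits, 25 claimed positives, 15 in common.
rates = error_rates(true_hits=range(20), claimed=range(5, 30), n_genes=900)
```

With these inputs FNR is 5/20 = 0.25 and FPR is 10/880, roughly 0.011.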

Page 23: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Power analysis: weak drug effect (low concentration)

[Figure panels: low σ, low D, high C; moderate σ, low D, high C; high σ, low D, high C]

Page 24: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Power analysis: strong drug effect (high concentration)

[Figure panels: low σ, high D, high C; moderate σ, high D, high C; high σ, high D, high C]

WHY?

Page 25: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

In the Case of Skewed Data

• Gamma distributions Ga(r, λ) were used instead of the normal distribution.

• The shape (r) and scale (λ) parameters of the gamma distributions were calculated by solving µ = rλ and σ² = rλ².

• The skewness value (2/√r) is taken to be (0.5, 1, 1.5, 2).
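Solving the two moment equations gives r = µ²/σ² and λ = σ²/µ, which the sketch below implements. The value µ = 0.8 used in the example is an inference: it is the product of scale and shape in each of the settings quoted on the following slides, not a number stated on this one. Function names are hypothetical.

```python
import random

def gamma_from_moments(mu, sigma):
    """Shape r, scale lam and skewness of the gamma with mean mu and sd sigma."""
    shape = (mu / sigma) ** 2    # r = mu^2 / sigma^2
    scale = sigma ** 2 / mu      # lam = sigma^2 / mu
    skewness = 2 / shape ** 0.5  # 2 / sqrt(r)
    return shape, scale, skewness

def draw_skewed_viability(mu, sigma, n, seed=0):
    """Draw n gamma-distributed viabilities with the requested mean and sd."""
    shape, scale, _ = gamma_from_moments(mu, sigma)
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale) in that order.
    return [rng.gammavariate(shape, scale) for _ in range(n)]

shape, scale, skew = gamma_from_moments(mu=0.8, sigma=0.4)
sample = draw_skewed_viability(mu=0.8, sigma=0.4, n=5)
```

With µ = 0.8, the four σ values 0.2, 0.4, 0.6 and 0.8 give skewness 0.5, 1, 1.5 and 2, matching the list above.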

Page 26: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Skewness=1

Scale=0.2; shape=4; skewness=1 (for untreated non-hits)

[Figure panels: unskewed; skewed]

Page 27: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Skewness=1.5

Scale=0.45; shape=1.78; skewness=1.5 (for untreated non-hits)

[Figure panels: unskewed; skewed]

Page 28: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Skewness=2

Scale=0.8; shape=1; skewness=2 (for untreated non-hits)

[Figure panels: unskewed; skewed]

Page 29: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Summary points of skewed data

• Ratio-based methods can be unstable: the SI method is affected by the skewness the most.

• The t-test is affected more by the skewness when there is a small number of replicates.

• The linear model (LM) is quite stable.

• Fold-change (FC) is very unstable.

• Transform the data if they are heavily skewed.

Page 30: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

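Recommendations

Number of Replicates | Noise (a) | Drug effect (b) | siRNA effect (c) | Recommended method(s)
3 | Low | High | High | LM
3 | Moderate | High | High | LM
3 | High | High | High | LM
3 | Low | Moderate | High | LM
3 | Low | Low | High | LM
3 | Moderate | Low | High | LM
3 | High | Low | High | LM
3 | Low | High | Moderate | LM
3 | Low | High | Low | LM
6 | Low | High | High | LM
6 | Moderate | High | High | LM
6 | High | High | High | LM
6 | Low | Moderate | High | LM
6 | Low | Low | High | LM
6 | Moderate | Low | High | LM
6 | High | Low | High | LM
6 | Low | High | Moderate | LM
6 | Low | High | Low | LM
9 | Low | High | High | LM
9 | Moderate | High | High | LM, SI
9 | High | High | High | LM, SI
9 | Low | Moderate | High | LM
9 | Low | Low | High | LM
9 | Moderate | Low | High | LM
9 | High | Low | High | LM
9 | Low | High | Moderate | LM
9 | Low | High | Low | LM
12 | Low | High | High | SI
12 | Moderate | High | High | SI
12 | High | High | High | SI
12 | Low | Moderate | High | LM
12 | Low | Low | High | LM
12 | Moderate | Low | High | LM
12 | High | Low | High | LM
12 | Low | High | Moderate | LM
12 | Low | High | Low | LM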

a Noise can be measured by coefficient of variation (CV) or variance-to-mean ratio (VMR). VMR<0.2: low noise, 0.2≤VMR<0.5: moderate noise; VMR≥0.5 high noise.

b Drug effect can be estimated by Cd/Cc.

c RNAi effect can be estimated by Rc/Cc.
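The noise categories in footnote a can be computed from replicate measurements as sketched below. The thresholds are the ones in the footnote; the use of the sample variance (n - 1 denominator), the function name, and the example readings are assumptions.

```python
from statistics import mean, variance

def noise_level(values):
    """Classify noise by the variance-to-mean ratio (VMR) of replicate wells.

    Assumption: sample variance with an n - 1 denominator.
    """
    vmr = variance(values) / mean(values)
    if vmr < 0.2:
        return "low"
    if vmr < 0.5:
        return "moderate"
    return "high"

# Hypothetical replicate readings.
levels = [noise_level(v) for v in ([1.0, 1.0, 1.0], [0.6, 1.4], [0.5, 1.5])]
```

The three hypothetical examples have VMR 0, 0.32 and 0.5, and so fall in the low, moderate and high rows of the table respectively.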

Page 31: Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Increase the power of your study

Possible ways to improve the power of your study:

• Choose a proper statistical method that is most powerful for your study.

• Reduce the variability.

• Increase #replicates.

• Use a relatively lower concentration of the drug if appropriate.

“…Here we show that several of these targets sensitize lung cancer cells to paclitaxel concentrations 1,000-fold lower than otherwise required for a significant response, and we identify mechanistic relationships …” (Whitehurst et al., Nature 2007)