Experimental design for microarrays Presented by Alex Sánchez and Carmen Ruíz de Villa Departament d’Estadística. Universitat de Barcelona


Experimental design for microarrays

Presented by Alex Sánchez and Carmen Ruíz de Villa

Departament d’Estadística.

Universitat de Barcelona

2

Outline

• Introduction

• Design issues in microarray experiments

• Applying experimental design principles

• The choice of experimental layout

• Hints and conclusions

• Acknowledgments & Disclaimer

3

And so said the master…

To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.

Sir Ronald A. Fisher

Geneticist, Experimentalist, Statistician

Indian Statistical Congress, 1938

4

Why experimental design?

• The objective of experimental design is to make the analysis of the data and the interpretation of the results
  – As simple and as powerful as possible,
  – Given the purpose of the experiment and
  – The constraints of the experimental material.

5

First things first

• Experimental design must
  – Look backward
    • What's the relevant question to be answered?
  – Consider the current situation
    • Which factors influence the experiment's result?
    • Which ones are limiting?
    • Which ones can be controlled?
  – Look ahead
    • How are the results to be processed and analyzed?

6

[Figure: workflow of a microarray study. Box labels: Biological question; Experimental design; Microarray experiment; Image analysis; Normalization; Analysis (Estimation, Testing, Clustering, Discrimination); Biological verification and interpretation. A Quality Measurement step is marked Pass / Failed.]

Design issues for microarray experiments

8

Designing microarray experiments

• The appropriate design of a microarray experiment must consider
  – Design of the array
  – Allocation of mRNA samples to the slides

• Both aspects are influenced by different sets of parameters.

• Ultimately the decisions must be guided by the questions that have to be answered.

9

10

Some aspects of design (1): Layout of the array

• Which sequences to print?
  – cDNAs: selection of cDNA from a library
    • Riken, NIA, etc.
  – Affymetrix PMs and MMs
    • Oligo probe selection (from Operon, Agilent, etc.)
  – Control probes
    • What percentage? Where should controls be put?

• How many sequences to print?
  – Duplicate or replicate spots within a slide

Ask a statistician…

11

Some aspects of design (2): Allocation of samples to the slides

• Types of samples
  – Replication: technical vs biological
  – Pooled vs individual samples
  – Pooled vs amplified samples

• Different design layout / data analysis:
  – Scientific aim of the experiment.
  – Efficiency, robustness, extensibility

• Physical limitations (cost):
  – Number of slides.
  – Amount of material.

12

Scientific aims and design choice

• Different studies have different objectives
  – To identify differentially expressed genes (class comparison)
  – To search for specific gene-expression patterns (class discovery)
  – To identify phenotypic subclasses (class prediction)

• So they may require different designs
  – Sometimes only one option is available
  – Sometimes a choice must be made

13

Principles of experimental design

• To attain the objectives of experimental design, the following principles are usually applied
  – Randomization
  – Replication
  – Local control

• The next slides give some hints on their application in this context

14

Randomization

• Where to print duplicate spots?
  – Printing them together gives high concordance, but a higher risk of missing values if a local problem occurs
  – Printing them at random may be a better choice

• How to assign…
  – Treatments to individuals?
  – Samples to arrays?

  At random.
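A minimal sketch of the "at random" advice, assuming one sample per array. The sample and array labels, the function name and the seed are all hypothetical; a seeded generator is used only so that the layout can be reproduced and documented.

```python
import random


def randomize_samples_to_arrays(samples, arrays, seed=None):
    """Return a dict mapping each array to one randomly chosen sample."""
    if len(samples) != len(arrays):
        raise ValueError("this sketch assumes exactly one sample per array")
    rng = random.Random(seed)   # seeded so the layout can be reproduced
    shuffled = list(samples)    # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    return dict(zip(arrays, shuffled))


samples = ["T1", "T2", "T3", "C1", "C2", "C3"]   # hypothetical sample labels
arrays = [f"array{i}" for i in range(1, 7)]       # hypothetical array labels
layout = randomize_samples_to_arrays(samples, arrays, seed=2024)
for array, sample in layout.items():
    print(array, "<-", sample)
```

The same idea applies to assigning treatments to individuals: shuffle the list of individuals and assign treatments in order.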

15

Replication

• It's important
  – To increase precision: $\operatorname{var}(\bar{X}) = \sigma^2 / n$
  – As a formal basis for inferential procedures

• Different types of replicates
  – Technical
    • Duplicate spots
    • Multiple hybridizations from the same sample
  – Biological

• ["If I had to replicate all my experiments I could only do half as much" (Botstein 1999)]

16

The 3 layers of experimental design

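[Figure: the three layers of variability in a microarray experiment: biological variance ($\sigma_B^2$), technical variance ($\sigma_A^2$) and measurement error ($\sigma_e^2$). © Nature Reviews & G. Churchill (2002)]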

17

Replication (1): Duplicate spots

• In general good practice, but…
  – Good for quality control
    • But: don't use internal controls to normalize!
  – Worse for statistical inference
    • Highly correlated, so not really i.i.d. samples
    • Ideally: spot duplicates at random

• We want to have duplicates. How many?
  – A minimum of 3 is reasonable
    • Helps detect outliers
    • Decreases the number of false positives and false negatives (Lee 2000)
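One way to see why three spots help detect outliers is a median-based check, sketched below. The threshold `max_dev`, the function name and the log-ratio values are assumptions made for illustration, not part of the original slides.

```python
from statistics import median


def flag_outlier_spots(log_ratios, max_dev=1.0):
    """Return the indices of replicate spots that lie far from the group median."""
    centre = median(log_ratios)
    return [i for i, value in enumerate(log_ratios) if abs(value - centre) > max_dev]


# Hypothetical log2 ratios for one gene spotted three times on a slide.
print(flag_outlier_spots([1.1, 0.9, 3.5]))   # [2]

# With only two spots the median sits halfway, so both look equally suspect.
print(flag_outlier_spots([1.0, 3.5]))        # [0, 1]
```

With three replicates the median is anchored by the two concordant spots, so the discordant one stands out; with two there is no way to tell which spot is wrong.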

18

Replication (2): Technical replicates

• The goal is not to measure biological variability
  – $X$: expression level, $E(X) = \mu_x$, $\operatorname{Var}(X) = \sigma_x^2$

• Good for assessing measurement precision (intra-individual variability)
  – $Y$: measure of $X$, $Y_i = X + \varepsilon_i$, $E(\varepsilon_i) = 0$, $\operatorname{Var}(\varepsilon_i) = \sigma^2$

• Sometimes yields valuable information
  – If interest is in individual mRNAs (diagnostics)
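A short worked consequence of this model, writing $\operatorname{Var}(X) = \sigma_x^2$ for the biological variance and $\operatorname{Var}(\varepsilon_i) = \sigma^2$ for the measurement variance. It rests on an added assumption, not stated on the slide, that $X$ and the errors $\varepsilon_i$ are mutually independent. Averaging $m$ technical replicates from one individual then shrinks only the measurement term:

```latex
\bar{Y} = \frac{1}{m}\sum_{i=1}^{m} Y_i = X + \frac{1}{m}\sum_{i=1}^{m}\varepsilon_i ,
\qquad
\operatorname{Var}(\bar{Y}) = \sigma_x^2 + \frac{\sigma^2}{m} .
```

However large $m$ becomes, the variance cannot fall below $\sigma_x^2$, which is why technical replicates say nothing about biological variability.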

19

Replication (3): Biological replicates

• Hybridizations involving mRNA from different extractions (individuals, cell lines…)

• Their main use is to assess and account for biological variability
  – In very homogeneous populations a single individual is sometimes taken as representative
  – This relies on strong, unverifiable assumptions
  – We can't assess the error committed

Don't substitute biological replicates with technical ones!
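The warning can be illustrated numerically. The sketch below assumes a simple nested model in which each measurement is an individual's expression level plus independent measurement error, so the variance of the overall mean is the biological variance divided by the number of individuals, plus the technical variance divided by the total number of hybridizations. The variance values and the function name are hypothetical.

```python
def variance_of_mean(sigma2_bio, sigma2_tech, n_individuals, m_replicates):
    """Variance of the grand mean under Y_ij = X_i + e_ij with independent terms."""
    return sigma2_bio / n_individuals + sigma2_tech / (n_individuals * m_replicates)


sigma2_bio, sigma2_tech = 1.0, 0.25   # hypothetical variance components

# Eight hybridizations in both plans, allocated differently.
one_individual_eight_arrays = variance_of_mean(sigma2_bio, sigma2_tech, 1, 8)
eight_individuals_one_array = variance_of_mean(sigma2_bio, sigma2_tech, 8, 1)

print(one_individual_eight_arrays)   # 1.03125
print(eight_individuals_one_array)   # 0.15625
```

Under these assumed values, spending all eight arrays on one individual leaves the biological variance untouched, while spreading them over eight individuals reduces both terms.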

20

Pooling: To pool or not to pool?

• mRNA from different samples is combined to form a pool. Why?
  – If each sample doesn't yield enough mRNA
    • But… one can also amplify
  – To compensate for an excess of variability
    • But we can't estimate it when pooling

• Pooling should in general be avoided, but…
  – If the goal of the study is to test for differential expression
    • Under certain restrictions it may still be used
  – If the goal of the study requires information on individuals
    • It can't be used

21

Experimental layout

• Local control = Experimental layout = How mRNA samples are assigned to arrays

• The experimental layout has to be chosen so that the resulting analysis is as efficient and robust as possible
  – Sometimes there is only one reasonable choice
  – Sometimes several choices are available

22

Example 1: Only one design choice

Case 1: Meaningful biological control (C)
Samples: Liver tissue from 4 mice treated with cholesterol-modifying drugs.
Question 1: Genes that respond differently between the treatment (T) and the control (C).
Question 2: Genes that responded similarly across two or more treatments relative to control.

[Diagram: treatments T1, T2, T3 and T4, each compared directly with the control C]

Case 2: Use of universal reference.
Samples: Different tumor samples.
Question: To discover tumor subtypes.

[Diagram: tumor samples T1, T2, …, Tn-1, Tn, each hybridized against a common reference (Ref)]

23

Example 2: a number of different designs are suitable for use (2)

• Direct comparison between two treatments

24

Dye swap experiment

[Diagram: Sample1 (A) and Sample2 (B) are hybridized together on Array1 and again on Array2, with the dye assignment reversed on the second array]
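A minimal sketch of why the dye swap helps, assuming an additive dye bias that is the same on both arrays (a simplification, and not a model stated in the slides). Each array reports the log ratio of the red channel over the green channel; the function name and numeric values are hypothetical.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Combine the two log ratios (red/green) of a dye-swap pair.

    m_forward: log ratio on the array where sample A is labelled red
    m_swapped: log ratio on the array where sample B is labelled red
    Returns (estimated log(A/B), estimated dye bias).
    """
    effect = (m_forward - m_swapped) / 2     # dye bias cancels in the difference
    dye_bias = (m_forward + m_swapped) / 2   # treatment effect cancels in the sum
    return effect, dye_bias


# Hypothetical values: true log(A/B) = 1.0, additive dye bias = 0.5.
true_effect, true_bias = 1.0, 0.5
m_forward = true_effect + true_bias      # A red, B green
m_swapped = -true_effect + true_bias     # B red, A green

print(dye_swap_estimate(m_forward, m_swapped))   # (1.0, 0.5)
```

Without the swap, the single observed value 1.5 could not be split into treatment effect and dye bias.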

25

Repeated dye-swaps

• Useful for reducing technical variation

• Conclusions limited to the samples

[Diagram: the same two samples, Sample1 (A) and Sample2 (B), hybridized on several dye-swap array pairs]

26

Replicated dye swap

[Diagram: biological replicates A1, A2 and B1, B2; each pair (A1 with B1, A2 with B2) is hybridized as a dye swap]

• Accounts for biological and technical variation

• Significance may be harder to achieve

• Conclusions apply to population

27

Reference design

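• Widely used

• Dye effect confounded with treatments (use dye swaps to avoid this)

• Poor efficiency: ½ of the measurements are taken on the reference

• Path between any 2 samples: short

• Easy to extend

[Diagram: samples V1, V2 and V3, each hybridized against a common reference R]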

28

Loop design

• Efficient alternative to reference design

• Large loops are inefficient
  – Interweave several loops

[Diagram: samples V1, V2 and V3 connected in a loop by arrays A1, A2 and A3]

29

How can we decide?

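• A-optimality: choose the design that minimizes the variance of the estimates of the effects of interest

• A simple example: direct vs indirect estimates of log(A/B)
  – Direct: A and B are hybridized together
    • Estimate: average(log(A/B))
    • Variance: $\sigma^2 / 2$
  – Indirect: A and B are each hybridized against a reference R
    • Estimate: log(A/R) - log(B/R)
    • Variance: $2\sigma^2$

These calculations assume independence of replicates: the reality is not so simple.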
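The two variances can be checked with a small sketch. It assumes two arrays in total, independent log-ratio measurements with common variance σ² on every array, and normally distributed noise for the simulation. The function names, the seed and the true log ratio are hypothetical.

```python
import random
from statistics import pvariance


def var_direct(sigma2, n_arrays):
    """Variance of the average of n_arrays direct log(A/B) measurements."""
    return sigma2 / n_arrays


def var_indirect(sigma2, n_arrays):
    """Variance of mean log(A/R) minus mean log(B/R), arrays split evenly."""
    if n_arrays % 2:
        raise ValueError("n_arrays must be even to split between A/R and B/R")
    per_arm = n_arrays // 2
    return 2 * sigma2 / per_arm


print(var_direct(1.0, 2), var_indirect(1.0, 2))   # 0.5 2.0

# Simulation check under the same independence assumption.
rng = random.Random(1)
sigma, true_log_ratio, trials = 1.0, 1.0, 20000
direct = [
    (rng.gauss(true_log_ratio, sigma) + rng.gauss(true_log_ratio, sigma)) / 2
    for _ in range(trials)
]
indirect = [
    rng.gauss(true_log_ratio, sigma) - rng.gauss(0.0, sigma)   # log(A/R) - log(B/R)
    for _ in range(trials)
]
emp_direct, emp_indirect = pvariance(direct), pvariance(indirect)
print(round(emp_direct, 2), round(emp_indirect, 2))
```

With the same two arrays, the indirect estimate is four times as variable as the direct one under these assumptions. As the slide notes, real replicates are not independent, so the ratio in practice may differ.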

30

How can we reduce variation?

• Replicated spots: increased precision and quality control

• Multiple arrays per sample: estimate measurement error

• Multiple samples per treatment group: estimate biological variation

• Pooling: reduce biological variation

31

About Resource Allocation in a Microarray Experiment

• Measurement error: $\sigma_e^2$
• Technical variance: $\sigma_A^2$
• Biological variance: $\sigma_B^2$

• Effect of pooling

• Total variance
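The formulas from this slide did not survive extraction, so what follows is only a sketch of one common way to combine the three components. It assumes a fully nested design (spots within arrays within biological samples) with independent components, and an idealised pool that averages the expression of its members exactly. The function name, parameter names and variance values are hypothetical.

```python
def variance_of_treatment_mean(s2_bio, s2_array, s2_error,
                               n_samples, arrays_per_sample, spots_per_array,
                               pool_size=1):
    """Variance of a treatment-group mean under an assumed nested model.

    n_samples: biological samples hybridized (individuals or pools)
    pool_size: individuals per pool (1 means no pooling)
    """
    n_arrays = n_samples * arrays_per_sample
    biological = s2_bio / (n_samples * pool_size)   # ideal pooling averages individuals
    technical = s2_array / n_arrays
    measurement = s2_error / (n_arrays * spots_per_array)
    return biological + technical + measurement


s2_bio, s2_array, s2_error = 1.0, 0.5, 0.25   # hypothetical variance components

# Three plans that all use 8 arrays with 2 spots per gene.
print(variance_of_treatment_mean(s2_bio, s2_array, s2_error, 4, 2, 2))               # 0.328125
print(variance_of_treatment_mean(s2_bio, s2_array, s2_error, 8, 1, 2))               # 0.203125
print(variance_of_treatment_mean(s2_bio, s2_array, s2_error, 4, 2, 2, pool_size=2))  # 0.203125
```

Under these assumed values, more biological samples or pooling lowers the variance for the same number of arrays, but pooling gives up the ability to estimate the biological variance from the data.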

32

Tips on experimental design

• This page from WEHI offers some ideas, and a case study, on what information should be collected to arrive at an adequate experimental design

http://bioinf.wehi.edu.au/marray/design.html

33

Summary

• Two important issues
  – Selection of mRNA samples
    • Most important: biological replicates
    • Technical replicates are also useful, but different
    • Try to avoid making a big pool
  – Choice of experimental layout
    • May be guided by the scientific question
    • Also by efficiency and robustness considerations
      – In general, direct comparisons are better than indirect ones
      – Loop designs may be preferred to reference designs
      – Robust variants of each design yield similar conclusions

34

References

• Churchill (2002) Fundamentals of cDNA microarray design. Nature Genetics 32 (suppl. 2): 490-5.
• Draghici, S. (2003) Data analysis tools for microarrays. Chapman & Hall.
• Kerr (2003) Design considerations for effective and efficient microarray studies. Biometrics 59: 822-828.
• Lee, M.L.T. et al. (2000) Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. PNAS 97(18): 9834-9839.
• Speed, T. (2003) Statistical analysis of gene expression microarray data. CRC Press.
• Yang & Speed (2002) Design Issues for cDNA Microarrays. Nature Reviews Genetics 3: 579-588.

35

Acknowledgments

• Special thanks to Yee Hwa Yang (UCSF) for allowing me to use some of her materials

• G. Churchill and Kathleen Kerr, for writing their papers and making their slides available

• Sandrine Dudoit & Terry Speed, U.C. Berkeley

• M. Carme Ruíz de Villa, U. Barcelona

• Sara Marsal, U. Reumatología, HVH Barcelona

36

Disclaimer

• The goal of this presentation is to discuss the contents of the paper indicated in the title

• Copyrighted images have been taken from the corresponding journals or from slide shows found on the internet, with the sole aim of facilitating the discussion

• All credit for them belongs to the authors of the papers or slide shows, and we wish to thank them for making them available