Complex Experimental Design and Simple Data Analysis: A Pharmaceutical Example


Joseph G Pigeon

Villanova University

Introduction

• Designs with restricted randomization have multiple error measures

• Pharmaceutical example where the split plot structure is even more complex
  – Whole plot structure in two dimensions
  – Correlation structure in two dimensions

• Caveats
  – Limited understanding of the biology involved
  – No originality of statistical methods claimed

Split Plot Designs

• Originated in agricultural experiments where
  – Levels of some factors are applied to whole plots
  – Levels of other factors are applied to sub plots

• Separate randomizations to whole plots and sub plots
  – Two types of experimental units
  – Two types of error measures
  – Correlation among the observations

Split Plot Designs

• Also common in industrial experiments when
  – Complete randomization does not occur
  – Some factor levels may be impractical, inconvenient or too costly to change

• This restriction on randomization results in some whole plot factors and some sub plot factors

• Data analysis needs to account for this restricted randomization or split plot structure

Split Plot Example

• Consider a paper manufacturer who wants to study
  – Effects of 3 pulp preparation methods
  – Effects of 4 temperatures
  – Response is tensile strength

• Pilot plant is capable of 12 runs per day

• One replicate on each of three days

Split Plot Example

                              Temperature
Rep   Pulp Prep Method   200   225   250   275
1     1                   30    35    37    36
1     2                   34    41    38    42
1     3                   29    26    33    36
2     1                   28    32    40    41
2     2                   31    36    42    40
2     3                   31    30    32    40
3     1                   31    37    41    40
3     2                   35    40    39    44
3     3                   32    34    39    45

Split Plot Example

• Initially, we might consider this to be a 4 x 3 factorial in a randomized block design

• If true, then the order of experimentation within a block should have been completely randomized

• However, this was not feasible; data were not collected this way

Split Plot Example

• Experiment was conducted as follows:
  – A batch of pulp was produced by one of the three methods
  – The batch was divided into four samples
  – Each sample was cooked at one of the four temperatures

• Split plot design with
  – Pulp preparation method as whole plot treatment
  – Temperature as sub (split) plot treatment


Split Plot Example

Source of Variation      Sum of Squares   Degrees of Freedom   Mean Square   F
Reps (A)                  77.55            2                     38.78
Prep Method (B)          128.39            2                     64.20        7.08
AB (whole plot error)     36.28            4                      9.07
Temp (C)                 434.08            3                    144.69       41.94
AC                        20.67            6                      3.45
BC                        75.17            6                     12.53        2.96
ABC (subplot error)       50.83           12                      4.24
Total                    822.97           35

• Subplot error mean square (4.24) is less than whole plot error mean square (9.07), which is typical
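The sums of squares in the table can be reproduced directly from the tensile strength data. The following is a minimal sketch, assuming NumPy is available; the array layout `y[rep, method, temperature]` and the helper name `ss_between` are illustrative choices, not part of the original presentation. The F ratios use the same denominators as the table (B against AB, C against AC, BC against ABC).

```python
import numpy as np

# Tensile strength indexed as y[rep, prep_method, temperature]
# (temperatures 200, 225, 250, 275), copied from the data table.
y = np.array([
    [[30, 35, 37, 36], [34, 41, 38, 42], [29, 26, 33, 36]],
    [[28, 32, 40, 41], [31, 36, 42, 40], [31, 30, 32, 40]],
    [[31, 37, 41, 40], [35, 40, 39, 44], [32, 34, 39, 45]],
], dtype=float)

grand_mean = y.mean()


def ss_between(axes):
    """Sum of squares among the cell means obtained by keeping `axes`."""
    dropped = tuple(a for a in range(y.ndim) if a not in axes)
    cell_means = y.mean(axis=dropped)
    obs_per_cell = y.size // cell_means.size
    return obs_per_cell * ((cell_means - grand_mean) ** 2).sum()


# A = reps, B = prep method, C = temperature
ss = {}
ss["A"] = ss_between((0,))
ss["B"] = ss_between((1,))
ss["C"] = ss_between((2,))
ss["AB"] = ss_between((0, 1)) - ss["A"] - ss["B"]   # whole plot error
ss["AC"] = ss_between((0, 2)) - ss["A"] - ss["C"]
ss["BC"] = ss_between((1, 2)) - ss["B"] - ss["C"]
ss_total = ((y - grand_mean) ** 2).sum()
ss["ABC"] = ss_total - sum(ss.values())             # subplot error

df = {"A": 2, "B": 2, "C": 3, "AB": 4, "AC": 6, "BC": 6, "ABC": 12}
ms = {term: ss[term] / df[term] for term in ss}

f_prep = ms["B"] / ms["AB"]    # whole plot factor vs whole plot error
f_temp = ms["C"] / ms["AC"]
f_inter = ms["BC"] / ms["ABC"]

for term in df:
    print(f"{term:4s} SS={ss[term]:8.2f} df={df[term]:2d} MS={ms[term]:7.2f}")
print(f"F(prep)={f_prep:.2f}  F(temp)={f_temp:.2f}  F(BC)={f_inter:.2f}")
```

The F ratio for temperature computed from unrounded mean squares comes out slightly above the tabled 41.94, which appears to have been computed from the rounded values 144.69 and 3.45.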

Split Plot Example Lessons

• We must carefully consider how the data were collected and incorporate all randomization restrictions into the analysis
  – Whole plot effects measured against whole plot error
  – Sub plot effects measured against sub plot error
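To see why the choice of error term matters, the sketch below compares the split plot test for preparation method with a naive test. The naive analysis is an assumption made here purely for contrast: it treats the experiment as a 3 x 4 factorial in randomized blocks and pools AB, AC and ABC into a single error term. Only the sums of squares and degrees of freedom from the ANOVA table are used.

```python
# Sums of squares and degrees of freedom taken from the ANOVA table.
ss = {"B": 128.39, "AB": 36.28, "AC": 20.67, "ABC": 50.83}
df = {"B": 2, "AB": 4, "AC": 6, "ABC": 12}

ms_prep = ss["B"] / df["B"]

# Split plot analysis: prep method is tested against whole plot error (AB).
f_split_plot = ms_prep / (ss["AB"] / df["AB"])

# Naive analysis (assumed for contrast): one pooled error term with
# 4 + 6 + 12 = 22 degrees of freedom.
pooled_ss = ss["AB"] + ss["AC"] + ss["ABC"]
pooled_df = df["AB"] + df["AC"] + df["ABC"]
f_naive = ms_prep / (pooled_ss / pooled_df)

print(f"F against whole plot error: {f_split_plot:.2f} on (2, {df['AB']}) df")
print(f"F against pooled error:     {f_naive:.2f} on (2, {pooled_df}) df")
```

Under these assumptions the pooled analysis roughly doubles the F ratio for the whole plot factor and credits it with far more error degrees of freedom than the randomization supports.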

Description of Example – MQPA Assay

• Multivalent Q-PCR based Potency Assay

• Used to assign potencies (independently) to each of five reassortants of a pentavalent vaccine

• Relies on the quantitation of viral nucleic acid generated in 24 hours

• Two major components
  – Biological component (infection of the standard and sample viruses)
  – Biochemical component (quantitative PCR reaction, where PCR = Polymerase Chain Reaction)

Polymerase Chain Reaction (PCR)

Description of Example-Biological Component

• Vero cell maintenance and set up

• Serial dilutions of known standard and unknown sample are incubated with trypsin

• Each dilution is used to infect 4 replicate wells of Vero cell monolayers seeded in a 96 well plate

• Infection proceeds for 24 hours and is then halted by the addition of a detergent and storage at –70 °C

Description of Example-Biochemical Component

• Lysate is thawed and diluted

• Preparation of a “master mix”

• Preparation of Q-PCR plate (master mix + diluted lysates)

• Configuration of the Q-PCR detection system

• Potency is determined by parallel line analysis of standard and test samples

• Specific interest is on optimization of the PCR portion of the assay

PCR Optimization Design

• Discussions with Biologists identified 13 factors
  – 8 factors associated with preparation of master mix
  – 5 factors associated with configuration of PCR detection system (instrument)

• Discussions with Biologists identified 3 responses
  – Lowest cycle time (range: 1 – 40)
  – Least variability between replicates
  – Valid amplification plot (range: 0 – 4)

• Completion of experiments and analysis immediately!

PCR Optimization Design

Factor            Current          Range               Imp
FOR Primer        400 nM           200 – 900 nM        3
REV Primer        400 nM           200 – 900 nM        3
Probe             200 nM           100 – 400 nM        3
dNTPs             0.30, 0.60 mM    1/2x – 2x           4
MgCl2             5.5 mM           3 – 9 mM            1
Tween             0.01%            0.005 – 0.020 %     2
Taq Gold          0.02 U/ul        0.01 – 0.04 U/ul    3
MULV Rt           0.25 U/ul        0.125 – 0.5 U/ul    5
Annealing Time    1 min            45 – 60 sec         2
Annealing Temp    60 C             55 – 65 C           1
Rt Temp           45 C             40 – 50 C           1
Denaturing Temp   95 C             90 – 97 C           2
Denaturing Time   15 sec           10 – 20 sec         2

PCR Optimization Design Considerations

• Interactions not expected to exist

• Experiments performed in a 96 well plate

• Each plate can accommodate at most 15 master mix combinations
  – 12 run Plackett-Burman (PB) design for 8 factors

PCR Optimization Design Considerations

• Time constraints imply at most 16 plates (instrument settings)
  – 2^(5–1) fractional factorial for 5 factors (5 = 1234)

• Concern about using only 12 of 2^8 combinations
  – Half of the plates use a 12 run PB design (123 = 45 = +1)
  – Half of the plates use the foldover PB design (123 = 45 = –1)

Plackett-Burman Design

Factors: 8    Replicates: 1           Design: 12
Runs: 12      Center pts (total): 0

Data Matrix (randomized)

Run   A  B  C  D  E  F  G  H
 1    -  +  +  +  -  +  +  -
 2    +  +  -  +  -  -  -  +
 3    +  -  +  -  -  -  +  +
 4    -  +  +  -  +  -  -  -
 5    +  +  -  +  +  -  +  -
 6    +  -  +  +  -  +  -  -
 7    -  +  -  -  -  +  +  +
 8    +  -  -  -  +  +  +  -
 9    -  -  +  +  +  -  +  +
10    -  -  -  -  -  -  -  -
11    +  +  +  -  +  +  -  +
12    -  -  -  +  +  +  -  +
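The balance and orthogonality of this matrix, and the effect of adding its foldover, can be checked in a few lines. This is a minimal sketch assuming NumPy; the string encoding of the runs and the variable names are illustrative only.

```python
import numpy as np

# The 12 runs (rows) by 8 factors A..H (columns) of the data matrix,
# written as strings of + and - signs.
runs = [
    "-+++-++-",
    "++-+---+",
    "+-+---++",
    "-++-+---",
    "++-++-+-",
    "+-++-+--",
    "-+---+++",
    "+---+++-",
    "--+++-++",
    "--------",
    "+++-++-+",
    "---+++-+",
]
X = np.array([[1 if sign == "+" else -1 for sign in run] for run in runs])

# Balance: every factor is run six times at each level.
is_balanced = bool((X.sum(axis=0) == 0).all())

# Orthogonality: X'X equals 12 times the identity matrix.
is_orthogonal = np.array_equal(X.T @ X, 12 * np.eye(8, dtype=int))

# The foldover design reverses every sign; together the two designs
# give 24 master mix combinations.
foldover = -X
combined = np.vstack([X, foldover])
n_distinct = np.unique(combined, axis=0).shape[0]

print(is_balanced, is_orthogonal, n_distinct)
```

Because the columns are mutually orthogonal, each main effect can be estimated as a simple difference of two averages, which is what makes the later tally and main effects analyses reasonable.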

Half Fraction Design

Factors: 5       Base Design: 5, 16    Resolution: V
Runs: 16         Replicates: 1         Fraction: 1/2
Blocks: none     Center pts (total): 0

Design Generators: E = ABCD

Row   StdOrder   RunOrder    A    B    C    D    E
 1     1          7         -1   -1   -1   -1    1
 2     2          8          1   -1   -1   -1   -1
 3     3          3         -1    1   -1   -1   -1
 4     4         15          1    1   -1   -1    1
 5     5         13         -1   -1    1   -1   -1
 6     6          9          1   -1    1   -1    1
 7     7         10         -1    1    1   -1    1
 8     8          6          1    1    1   -1   -1
 9     9         16         -1   -1   -1    1   -1
10    10          2          1   -1   -1    1    1
11    11          4         -1    1   -1    1    1
12    12         12          1    1   -1    1   -1
13    13          5         -1   -1    1    1    1
14    14         11          1   -1    1    1   -1
15    15         14         -1    1    1    1   -1
16    16          1          1    1    1    1    1
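The half fraction can be generated from a full 2^4 factorial in A to D with the generator E = ABCD (written 5 = 1234 in the design considerations). The sketch below uses only the standard library; it also checks the alias ABC = DE (123 = 45) and counts the plates on each side of that split, following the split as stated in the design considerations. Variable names are illustrative.

```python
from itertools import product

# Full 2^4 factorial in A, B, C, D in standard order (A changes fastest),
# with the fifth column from the generator E = ABCD.
design = []
for d, c, b, a in product((-1, 1), repeat=4):
    e = a * b * c * d
    design.append((a, b, c, d, e))

# The defining relation I = ABCDE implies ABC = DE for every run.
alias_holds = all(a * b * c == d * e for a, b, c, d, e in design)

# Split the 16 plates by the sign of ABC, as in "123 = 45 = +1".
plus_half = [run for run in design if run[0] * run[1] * run[2] == 1]
minus_half = [run for run in design if run[0] * run[1] * run[2] == -1]

for std_order, run in enumerate(design, start=1):
    print(std_order, run)
print(len(plus_half), len(minus_half))
```

The printed rows follow the StdOrder column of the table, not the randomized RunOrder.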

PCR Optimization Design Layout

• Each plate (row of the layout) represents a 12 run PB design
• 16 × 12 = 192 observations
• Whole plot structure in two dimensions

[Layout grid: plates 1 to 16 (rows) by master mixes 1 to 24 (columns); an X marks each master mix run on a plate]

PCR Optimization Results

• Biologists provided this summary of the 21 runs with an amplification plot rating of 4

Factor            Number of –1's   Number of +1's   Sig
FOR Primer        11               10
REV Primer        15                6
Probe              7               14
dNTPs             16                5
MgCl2              7               14
Tween              9               12
Taq Gold           0               21               *
MULV Rt           14                7
Annealing Time    13                8
Annealing Temp    11               10
Rt Temp            8               13
Denaturing Temp   19                2               *
Denaturing Time   13                8
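The presentation does not say how the asterisks in the Sig column were assigned. One possible screening rule, shown here only as an assumption, is an exact two-sided binomial test of each split against an even 50/50 split, with a Bonferroni adjustment for the 13 factors. The sketch uses only the standard library, and the function name is hypothetical.

```python
from math import comb

# (number of -1's, number of +1's) among the 21 rating-4 runs.
counts = {
    "FOR Primer": (11, 10), "REV Primer": (15, 6), "Probe": (7, 14),
    "dNTPs": (16, 5), "MgCl2": (7, 14), "Tween": (9, 12),
    "Taq Gold": (0, 21), "MULV Rt": (14, 7), "Annealing Time": (13, 8),
    "Annealing Temp": (11, 10), "Rt Temp": (8, 13),
    "Denaturing Temp": (19, 2), "Denaturing Time": (13, 8),
}


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n trials with p = 0.5."""
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


p_values = {name: two_sided_binomial_p(low, low + high)
            for name, (low, high) in counts.items()}

# Assumed criterion: Bonferroni-adjusted 5% level over the 13 factors.
alpha = 0.05 / len(counts)
flagged = [name for name, p in p_values.items() if p < alpha]

for name, p in p_values.items():
    print(f"{name:16s} p = {p:.2g}")
print("flagged:", flagged)
```

Under this assumed rule the flagged factors are Taq Gold and Denaturing Temp, the same two marked in the table, although the original criterion may have been different.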

PCR Optimization Results

Tally by plate (N = 21)

plate   Count
 3      3
 4      4
 5      1
 7      2
10      1
11      2
12      3
14      1
15      3
16      1

Tally by master mix (N = 21)

mm   Count
 5   2
 6   3
 8   5
 9   3
14   2
19   5
22   1

Tally by factor level (N = 21 for each factor)

Factor   Count at -1   Count at 1
mm1      11            10
mm2      16             5
mm3       6            15
mm4      16             5
mm5       7            14
mm6       9            12
mm7       0            21
mm8      14             7
instr1   12             9
instr2   10            11
instr3    8            13
instr4   19             2
instr5   13             8

PCR Optimization Analysis Log

• mm7 = 1; instr4 = –1

PCR Optimization Results

Tally by plate (N = 63)

plate   Count
 1      4
 2      4
 4      3
 5      5
 6      6
 7      5
 8      8
 9      4
10      6
11      3
12      1
13      3
14      5
15      3
16      3

Tally by master mix (N = 63)

mm   Count
 1   6
 2   6
 3   3
 4   4
 7   7
11   5
13   5
15   2
16   6
17   5
18   3
20   2
21   4
22   3
23   2

Tally by factor level (N = 63 for each factor)

Factor   Count at -1   Count at 1
mm1      31            32
mm2      26            37
mm3      47            16
mm4      28            35
mm5      30            33
mm6      31            32
mm7      41            22
mm8      21            42
instr1   34            29
instr2   31            32
instr3   42            21
instr4   26            37
instr5   29            34

PCR Optimization Analysis Log

• mm7 = 1; instr4 = –1

• mm3 = 1; mm7 = 1; mm8 = –1; instr3 = 1

PCR Optimization Results

Fractional Factorial Fit: ctgm

Estimated Effects and Coefficients for ctgm (coded units)

Term             Effect    Coef     SE Coef   T       P
Constant                   33.919   0.3852    88.06   0.000
instr1           -1.264    -0.632   0.3852    -1.64   0.103
instr2            0.596     0.298   0.3852     0.77   0.440
instr3           -2.157    -1.078   0.3852    -2.80   0.006
instr4            1.152     0.576   0.3852     1.50   0.137
instr5            0.667     0.333   0.3852     0.87   0.388
instr1*instr2     0.892     0.446   0.3852     1.16   0.249
instr1*instr3     0.424     0.212   0.3852     0.55   0.582
instr1*instr4    -0.221    -0.110   0.3852    -0.29   0.775
instr1*instr5    -0.276    -0.138   0.3852    -0.36   0.721
instr2*instr3    -1.110    -0.555   0.3852    -1.44   0.151
instr2*instr4     0.240     0.120   0.3852     0.31   0.756
instr2*instr5     1.522     0.761   0.3852     1.98   0.050
instr3*instr4     0.484     0.242   0.3852     0.63   0.531
instr3*instr5     0.182     0.091   0.3852     0.24   0.814
instr4*instr5     0.027     0.014   0.3852     0.04   0.972
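The columns of this output are linked by two simple relations: in coded (-1, +1) units the coefficient is half the effect, and T is the coefficient divided by its standard error. A short sketch checking both for three of the tabled terms (the dictionary layout is illustrative):

```python
# (Effect, Coef, reported T) for three terms of the fitted model.
se_coef = 0.3852
terms = {
    "instr1": (-1.264, -0.632, -1.64),
    "instr3": (-2.157, -1.078, -2.80),
    "instr2*instr5": (1.522, 0.761, 1.98),
}

t_values = {}
for name, (effect, coef, t_reported) in terms.items():
    half_effect = effect / 2          # coefficient in coded units
    t_values[name] = coef / se_coef   # T statistic
    print(f"{name:14s} effect/2 = {half_effect:7.4f}  coef = {coef:7.3f}  "
          f"T = {t_values[name]:6.2f} (reported {t_reported:5.2f})")
```

Small differences in the last digit come from rounding in the printed output.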

PCR Optimization Results

[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05); labeled effects: C, BE]

A: instr1, B: instr2, C: instr3, D: instr4, E: instr5

PCR Optimization Results

[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05); effects in decreasing order: C, BE, A, D, BC, AB, E, B, CD, AC, AE, BD, AD, CE, DE]

A: instr1, B: instr2, C: instr3, D: instr4, E: instr5

PCR Optimization Results

[Figure: Main Effects Plot (data means) for ctgm; panels for instr1 to instr5, ctgm axis from 33.0 to 35.0]

PCR Optimization Results

[Figure: Interaction Plot (data means) for ctgm; instr2 by instr5, mean axis from 33.2 to 35.2]

PCR Optimization Analysis Log

• mm7 = 1; instr4 = -1

• mm3 = 1; mm7 = 1; mm8 = -1; instr3 = 1

• instr3 = 1; instr2 and instr5 should have opposite signs?

PCR Optimization Results

[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05); labeled effects: G, C, AF, BD, H]

A: mm1, B: mm2, C: mm3, D: mm4, E: mm5, F: mm6, G: mm7, H: mm8

PCR Optimization Results

[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05); largest effects: G, C, H]

A: mm1, B: mm2, C: mm3, D: mm4, E: mm5, F: mm6, G: mm7, H: mm8

PCR Optimization Results

Estimated Effects and Coefficients for ctgm (coded units)

Term       Effect    Coef     SE Coef   T        P
Constant             33.947   0.3206    105.90   0.000
mm1        -0.304    -0.152   0.3206     -0.47   0.636
mm2         0.699     0.350   0.3206      1.09   0.277
mm3        -4.070    -2.035   0.3206     -6.35   0.000
mm4         0.222     0.111   0.3206      0.35   0.730
mm5        -0.341    -0.171   0.3206     -0.53   0.595
mm6        -0.027    -0.013   0.3206     -0.04   0.967
mm7        -4.525    -2.263   0.3207     -7.06   0.000
mm8         2.061     1.030   0.3206      3.21   0.002
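Since a low cycle time is the goal, the fitted coefficients suggest setting mm3 and mm7 high and mm8 low. The sketch below predicts ctgm at those settings from a main effects model that keeps only the three significant master mix terms; dropping the remaining terms and ignoring the instrument factors is a simplifying assumption made for this illustration.

```python
# Coefficients (coded units) for the constant and the three significant
# master mix factors in the fitted main effects model.
coef = {"Constant": 33.947, "mm3": -2.035, "mm7": -2.263, "mm8": 1.030}


def predict(settings):
    """Predicted ctgm for coded factor settings (-1 or +1)."""
    return coef["Constant"] + sum(coef[f] * x for f, x in settings.items())


best = predict({"mm3": 1, "mm7": 1, "mm8": -1})
worst = predict({"mm3": -1, "mm7": -1, "mm8": 1})

print(f"predicted ctgm at mm3 = 1, mm7 = 1, mm8 = -1: {best:.3f}")
print(f"predicted ctgm at the opposite settings:      {worst:.3f}")
```

The prediction at the preferred settings is about 28.6 cycles, against about 39.3 at the opposite settings.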

PCR Optimization Results

[Figure: Normal Probability Plot of the Standardized Effects (response is ctgm, Alpha = .05); labeled effects: mm7, mm3, mm8]

PCR Optimization Results

[Figure: Pareto Chart of the Standardized Effects (response is ctgm, Alpha = .05); effects in decreasing order: mm7, mm3, mm8, mm2, mm5, mm1, mm4, mm6]

PCR Optimization Results

[Figure: Main Effects Plot (data means) for ctgm; panels for mm1 to mm8 at levels -1 and 1, ctgm axis from 32 to 36]

PCR Optimization Analysis Log

• mm7 = 1; instr4 = –1

• mm3 = 1; mm7 = 1; mm8 = –1; instr3 = 1

• instr3 = 1; instr2 and instr5 should have opposite signs?

• mm3 = 1; mm7 = 1; mm8 = –1

PCR Optimization Results

Row   plate   mm   ct1     ct2     ct3     ct4     ctgm    well1   well2   amprating
1      3      14   26.88   27.33   27.25   27.13   27.15   37.98   40      4
2      3      19   27.62   28.10   28.02   27.40   27.78   40.00   40      4
3      4       5   29.20   29.04   29.39   28.70   29.08   40.00   40      4
4     11      14   27.53   26.97   28.04   27.90   27.61   40.00   40      4
5     11      19   28.25   28.57   28.64   28.09   28.39   40.00   40      4
6     12       5   28.13   28.93   28.39   28.51   28.49   40.00   40      3

Row   mm1   mm2   mm3   mm4   mm5   mm6   mm7   mm8   instr1   instr2   instr3   instr4   instr5
1      1     1     1    -1    -1    -1     1    -1     1        1        1       -1       -1
2     -1    -1     1    -1     1     1     1    -1     1        1        1       -1       -1
3     -1     1     1     1    -1     1     1    -1    -1       -1        1       -1       -1
4      1     1     1    -1    -1    -1     1    -1    -1        1        1       -1        1
5     -1    -1     1    -1     1     1     1    -1    -1        1        1       -1        1
6     -1     1     1     1    -1     1     1    -1     1       -1        1       -1        1

PCR Optimization Results

[Figure: Main Effects Plot (data means) for ctgm; panels for mm3, mm7, mm8, instr3 and instr4, ctgm axis from 32 to 36]

PCR Optimization Results

[Figure: Interaction Plot (data means) for ctgm; matrix of panels for mm3, mm7, mm8, instr3 and instr4 at levels -1 and 1, mean axis from 28 to 38]

PCR Optimization Summary

• No complex models – all simple analyses

• 5 factors were found to be significant (mm3, mm7, mm8, instr3 and instr4)

• These factors were further studied using response surface experiments

• Scientists seem quite happy with the results of the PCR optimization experiments

Concluding Remarks

• Many industrial experiments do have a split or strip plot structure which means multiple and possibly complex error measures

• Arises from the conduct of an experiment and/or any restrictions on the randomization

• We need to incorporate these considerations into a proper analysis and interpretation of experimental data

Concluding Remarks

• Experimental designs with balance, symmetry and orthogonality permit simple but effective graphical analyses (even with some missing data)

• Much can be learned from simple analyses following suitable experimental design
  – All models are wrong, but some models are useful
  – All models are wrong, but some models are more wrong than others
