A Practical Guide to

Design of Experiments (DOE)

for Assay Developers

Daniel Joelsson

© 2010 Daniel Joelsson All rights reserved

Introduction

Why write this book? Those of you who know me probably aren't surprised. I've been a strong advocate for a more systematic approach to assay development for many years, to the point of annoyance for many of my colleagues. I feel that using Design of Experiments (DOE) in assay development is the most efficient way to develop quality assays in the least amount of time possible. Then why aren't scientists using DOE more widely? I think part of the problem is a lack of understanding of what DOE really is and why it's useful. I looked for a good introduction to DOE for assay development scientists and I couldn't find one. Most of the texts out there are engineering and process focused. Therefore, I'm writing my own guide for us scientists and assay developers.

I should state up front that I'm not a statistician, I'm a scientist. This guide is not intended for statisticians and I will not try to turn you into one either. However, some basic understanding of statistics is important to fully appreciate DOE. I will also assume that you are familiar with basic biology and biochemistry techniques. My background is primarily in the design of bioassays and immunoassays for vaccines and biotherapeutics. Most of my examples will come out of those disciplines.

I will do my best to explain any statistical topics in a relevant and pragmatic way, but in case you want more, these are two resources that I have found to be excellent texts on statistics for scientists:

The Biostatistics Cookbook - The Most User-Friendly Guide for the Bio/Medical Scientist by Seth Michelson and Timothy Schofield

Biometry: The Principles and Practices of Statistics in Biological Research by Robert Sokal and F. James Rohlf

This book is meant to be a guide for beginning to intermediate DOE users, slanted completely towards scientists who do assay development. If you would like a more in-depth discussion of DOE and the statistics behind it, Design & Analysis of Experiments by Douglas C. Montgomery is a great resource to get you started.

While I will try to teach you enough about DOE to allow you to design and analyze your own experiments, a statistician familiar with assay development can be a valuable resource. What this book will give you is an understanding of the concepts and language of DOE so that you can easily communicate with your statistics friends. One of my goals is to increase the overlap in understanding on both sides. You might want to give a copy of this book to your statistician as well, especially if they are not familiar with the pitfalls of assay design.

Before we start, a few thanks are in order. Thanks to Tim Schofield for schooling me on statistics from the first day we met. Thanks to Edith Senderak for all of the mentoring and collaborations over the years. Thanks to Joe Pigeon for sparking my initial interest in DOE as well as introducing me to split-plot designs (which you will see are perfect for developing assays in 96 well plates). Thanks to all of my colleagues for trusting me enough to allow me to help with their DOE designs.

This book is being published under a Creative Commons license. You are free to distribute this work to anyone you think would be interested, free of charge. You may not use any portion of this book for any commercial purpose without prior permission. You may create derivatives of this work, as long as these derivatives adhere to the same license restrictions.

For complete license information, please visit: http://creativecommons.org/licenses/by-nc-sa/3.0/

Chapter 1 - Screening Designs

1.1 One Factor At a Time (OFAT) and Interactions

Think back to your early years of scientific training for a moment. If your experience was anything like mine, you were first taught to do experiments using the scientific method. It went something like this: first generate a hypothesis based on past knowledge, design an experiment to test the hypothesis, analyze the data and then refine the hypothesis. Repeat until we get to the answer.

This is the correct way to do science. Unfortunately, it is in the second step where we usually learn some bad habits. We were told to vary one factor at a time (OFAT) and hold everything else constant. At first glance, OFAT makes a lot of sense, but it can be misleading. Let's look at a simple assay development experiment.

Imagine that you're trying to develop an ELISA to detect a bacterial contaminant in a sample from a process development study for a new biotherapeutic. Two of the variables you assume to be important are the amount of capture antibody you add to your plate and the time you incubate your sample. Consistent with the OFAT approach, you start out by testing two different antibody concentrations, 0.2 ug/ml and 1.0 ug/ml, while keeping the incubation time constant at one hour. You then run the assay and faithfully record the results in your lab notebook (Table 1.1).

Table 1.1

Antibody concentration   Incubation time   Response (O.D.)
0.2 ug/ml                1 hour            0.31
1.0 ug/ml                1 hour            0.97

Great! Obviously, adding more antibody increases the response (it increased from 0.31 to 0.97). Your initial hypothesis that adding more antibody would be beneficial was correct. Good to know! What if you increased it even further? Let's find out what happens (Table 1.2).

Table 1.2

Antibody concentration   Incubation time   Response (O.D.)
0.2 ug/ml                1 hour            0.31
1.0 ug/ml                1 hour            0.97
1.5 ug/ml                1 hour            0.95

OK, it appears that adding even more antibody does not further increase the signal. So you conclude that the response has been saturated somewhere around 1.0 ug/ml.

Since the conditions for the first variable have been optimized, let's look at the second variable, incubation time. Again, you do a simple experiment increasing the incubation time to two hours. Since you already know that 1.0 ug/ml of antibody is ideal, you only need to run one experiment with 1.0 ug/ml of antibody for two hours of incubation. The results are shown in Table 1.3.

Table 1.3

Antibody concentration   Incubation time   Response (O.D.)
0.2 ug/ml                1 hour            0.31
1.0 ug/ml                1 hour            0.97
1.5 ug/ml                1 hour            0.95
1.0 ug/ml                2 hours           1.72

Even better! Clearly two hours is better than one. What if you increase the incubation time even further? You try three hours and see what happens (Table 1.4).

Table 1.4

Antibody concentration   Incubation time   Response (O.D.)
0.2 ug/ml                1 hour            0.31
1.0 ug/ml                1 hour            0.97
1.5 ug/ml                1 hour            0.95
1.0 ug/ml                2 hours           1.72
1.0 ug/ml                3 hours           1.74

Again, increasing the variable did not have a further effect on the response. So you conclude that the optimum settings for this assay are 1.0 ug/ml of antibody and a two-hour incubation.

Since you've now tested and optimized both variables, you're done, right? Well, maybe not. There is one experiment that was not performed: 0.2 ug/ml of antibody incubated for two hours. Let's run that experiment and see what happens (Table 1.5).

Table 1.5

Antibody concentration   Incubation time   Response (O.D.)
0.2 ug/ml                1 hour            0.31
1.0 ug/ml                1 hour            0.97
1.5 ug/ml                1 hour            0.95
1.0 ug/ml                2 hours           1.72
1.0 ug/ml                3 hours           1.74
0.2 ug/ml                2 hours           2.62

Whoa! The response is now much higher than what we previously considered optimal. There are now all kinds of experiments we should probably run to make sure we have all the answers we need. What happens if we use less than 0.2 ug/ml? What about testing other incubation times with 0.2 ug/ml?

This example highlights one of the greatest weaknesses of OFAT experiments. You are very likely to miss an interaction between two or more of the variables. An interaction occurs when the effects of two variables are not completely independent of each other; i.e. the response to one variable depends on the level of the other. Since the OFAT approach (and traditional scientific training) assumes that all variables are additive, it doesn't encourage us to test for interactions. The only way to find them is by luck. Unfortunately, interactions like these happen all the time in the real world.

For scientists trained in linear thinking, interactions can sometimes be hard to visualize. There's a specific graph called an interaction plot that makes it slightly easier (Figures 1.1 and 1.2). In an interaction plot where the two variables have no interaction, the two response lines will be perfectly parallel (Figure 1.1). The two lines represent the response due to Factor A at the two different levels of Factor B. Since the lines are perfectly parallel, the effect of A and B is completely additive. When you increase Factor B it shifts the entire Factor A line upwards by the same amount. This is the kind of relationship we assume is always in place in an OFAT experimental design.

If the lines of the interaction plot intersect (or are simply non-parallel), there's an interaction between the two variables (Figure 1.2). Increasing Factor B no longer just moves the Factor A line higher. Instead, at the lower level of B, the response across Factor A actually decreases from low to high. Clearly the effects of the two variables are not simply additive. This is exactly the situation we observed in the assay development example above.

Figure 1.1 - Interaction plot for two factors without an interaction

Figure 1.2 - Interaction plot for two factors with an interaction
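
If you want to draw an interaction plot yourself, here is a minimal sketch in Python (assuming the matplotlib package is available) that recreates the crossing pattern of Figure 1.2 from the O.D. values in Table 1.5:

    import matplotlib.pyplot as plt

    # Antibody concentration (Factor A) at its two levels
    antibody = [0.2, 1.0]          # ug/ml

    # Response at each incubation time (Factor B)
    od_1_hour = [0.31, 0.97]       # from Table 1.1
    od_2_hours = [2.62, 1.72]      # from Table 1.5

    plt.plot(antibody, od_1_hour, marker="o", label="1 hour")
    plt.plot(antibody, od_2_hours, marker="o", label="2 hours")
    plt.xlabel("Antibody concentration (ug/ml)")
    plt.ylabel("Response (O.D.)")
    plt.title("Interaction plot: antibody concentration x incubation time")
    plt.legend(title="Incubation time")
    plt.show()

Because the two lines cross, the plot makes the interaction that the OFAT sequence missed immediately visible.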

1.2 Factorial experiments - foundations of DOE

I hope I convinced you in the previous section that interactions between variables can be critical to the success of your experiments and that the OFAT approach makes it hard to find them unless you're willing to do a lot of work.

This section will introduce some of the basics of Design of Experiments (DOE). DOE is a more systematic approach to experimentation than OFAT. The goal is to test all of the variables in a system in a set of multi-factorial experiments, allowing us to optimize all of them and find interactions at the same time.

The assay development example in the last section made it clear that it would have been a good idea to run all four combinations of the two variables right away. This type of experiment is called a factorial experiment. A picture of the four different experiments (or "runs") shows how we start exploring the "design space" of our experimental system (Figure 1.3). As you can see, we have covered the four corners of the space.

The runs can also be shown in a table such as Table 1.6. During each run, a factor can take on a low value (depicted as "-") or a high value (depicted as "+"). With two factors there are four such combinations possible. By performing all the possible combinations of the factors, we will be able to tell not only whether each of the factors is important (i.e. changing it has an effect on the response), but also whether any interactions are present.

Figure 1.3 Pictorial depiction of a 2^2 factorial experiment.

Table 1.6

Run #   Factor A   Factor B
Run 1   +          +
Run 2   -          +
Run 3   +          -
Run 4   -          -

Let's expand the same thinking to three factors (Figure 1.4 and Table 1.7). With three factors we have to complete eight runs to cover every combination of high and low for all three factors. But again, you get a lot of information out of those runs: you will know whether any or all of the factors influence the response, whether any two-factor interactions exist (AxB, AxC, BxC), and whether there's a three-factor interaction (AxBxC).

Figure 1.4 Pictorial depiction of a 2^3 factorial experiment

Table 1.7

Run #   Factor A   Factor B   Factor C
1       +          +          +
2       -          +          +
3       +          -          +
4       -          -          +
5       +          +          -
6       -          +          -
7       +          -          -
8       -          -          -
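
Run tables like Table 1.7 don't have to be written out by hand. As a small illustration (standard-library Python only, not part of any DOE package), every high/low combination for n factors can be enumerated like this:

    from itertools import product

    def full_factorial(n_factors):
        """Return all 2^n combinations of +1/-1 levels for n factors."""
        return list(product([+1, -1], repeat=n_factors))

    runs = full_factorial(3)
    for i, levels in enumerate(runs, start=1):
        print(f"Run {i}: " + " ".join("+" if x > 0 else "-" for x in levels))
    print(f"Total runs: {len(runs)}")   # 2^3 = 8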

This exercise can be continued for four factors and so on. Performing eight runs for three factors might seem reasonable, but as you may have figured out already, as we add factors, the number of runs increases exponentially (Table 1.8). Since most assay systems contain more than three or four factors, factorial experiments quickly become too large to feasibly perform. With this in mind, it's easy to see why OFAT is still widely practiced: factorial experiments rapidly become unwieldy as the number of factors goes up.

Table 1.8

Number of factors   Number of runs
1                   2
2                   4
3                   8
4                   16
5                   32
6                   64
7                   128
8                   256
9                   512
10                  1024

But, since OFAT experiments make it very difficult to find those elusive interactions, we need an alternative approach. Luckily, DOE provides one in the form of fractional factorials, the topic of the next section.

However, before I get to that topic, I want to address a common question. Why do we only use two levels (high and low) for each factor? Obviously, it would be risky to assume that any response is perfectly linear between two points. It might be that the response has a complex shape in that space.

In the example in Figure 1.5, the optimum response is actually somewhere between our low and high settings. This is where DOE differs from traditional experimentation. The goal of these factorial experiments is not to optimize the response completely, but to screen for the few factors that actually affect the response. These factors are then carried forward in a set of optimization experiments.

Not all factors are likely to be important in every system; therefore, we should do fairly low-resolution experiments to identify the ones that truly matter. As you will see, screening experiments have relatively few runs per factor compared to optimization experiments. We can afford to include more factors in the initial experiments to make sure that we don't miss any of the critical few.

Figure 1.5 - Non-linear response

1.3 Fractional Factorials

In the last section, we learned that factorial experiments are useful designs to pick up both main and interaction effects in our experimental system. But for more than a few factors, they quickly become too large to be feasible. In most assay development experiments, we can easily identify ten or more factors that could be important (reagent concentrations, incubation temperatures, incubation times, etc.). We need a different approach. There is a set of DOE designs called fractional factorials that meet this need nicely.

Let's go back to a simple three-factor factorial experiment (Figure 1.6). In this experiment, we have three factors. Since we are doing screening experiments to identify the important factors, we will only be testing two levels per factor. Now imagine taking the three-dimensional space in Figure 1.6 and condensing it down to two dimensions (Figure 1.7). In effect, we are now looking at the front of the cube so that we can't see the different levels of factor C any longer. We can take this thinking one step further and compress the resulting square into a single dimension (Figure 1.8).

Figure 1.6 Three factor screening experiment.

Figure 1.7 Three factor screening experiment compressed into two dimensions.

Figure 1.8 Three factor screening experiment compressed into one dimension.

When we look at our experimental space this way, something interesting happens. It becomes clear that we have actually run four replicates of each of the two levels of factor A. This property of factorial designs is called hidden replication. While each of the eight experimental runs has a different combination of levels of all three factors, each factor's level is actually replicated four times.

Note that it was completely arbitrary that we chose factor A for this analysis. The same thing happens if we compress the design for either factor B or C. This is a useful characteristic of factorial designs (and fractional factorial designs, as we will see in a little bit) called rotatability. Because the design is perfectly symmetrical, we can in effect assign any factor to be A, B, or C and it doesn't matter. We can get away with this because instead of using the actual numbers for the setting of each factor, we are going to use coded numbers of -1 or +1. This might seem confusing for now, but bear with me. It will make more sense when we discuss how we analyze these designs. What it does for us now is to scale the levels of each factor to the same distance, thus making the design rotatable.
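
The coding itself is just a linear rescaling of each factor onto the -1 to +1 scale. As a quick sketch (my own helper function, written for illustration):

    def code_level(x, low, high):
        """Map an actual factor setting onto the coded -1..+1 scale."""
        center = (low + high) / 2
        half_range = (high - low) / 2
        return (x - center) / half_range

    # Example: antibody concentration studied at 0.2 (low) and 1.0 (high) ug/ml
    print(code_level(0.2, 0.2, 1.0))   # -1.0
    print(code_level(1.0, 0.2, 1.0))   # +1.0
    print(code_level(0.6, 0.2, 1.0))   #  0.0 (a center point)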

There is also a second characteristic of these designs worth noting. The reason we could compress the figure down into a single dimension is a property called orthogonality. Notice how the axes for the three factors in Figure 1.6 are at 90 degree angles to each other? This means that when we compress the design down to a single dimension, the effects of all the other factors cancel out, allowing us to estimate just the effect of the one variable we want. This is what we mean by a multi-variable experiment. Unlike in an OFAT study, we truly can vary all of our variables at the same time and still be able to distinguish the effects of each.

Looking for hidden replication in our experimental space showed us that each factor was actually replicated four times. Since we probably don't need that many replicates to tell if the response changes from the low to the high setting, let's eliminate some of these extra runs and see what happens.

One of the possible ways of doing this is shown in Figure 1.9. As you can see, we have eliminated half of the runs. The amazing thing is that we are still doing two replicates of each level for all three factors. With only four runs we can estimate the effects of three variables with two replicates at each setting. That's a pretty good return for your experimental investment.

Figure 1.9 Three factor fractional factorial experiment (half-fraction).
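
One way to construct this half-fraction, sketched below in Python, is to enumerate factors A and B in full and derive the level of C as the product of the other two; as we will see in the next section, this corresponds to the defining relation of the design:

    from itertools import product

    # Enumerate A and B in full, then derive C = A * B
    half_fraction = [(a, b, a * b) for a, b in product([+1, -1], repeat=2)]

    for a, b, c in half_fraction:
        print(f"A={a:+d}  B={b:+d}  C={c:+d}")

    # Hidden replication survives: each factor still runs twice at each level
    for col, name in enumerate("ABC"):
        highs = sum(1 for run in half_fraction if run[col] == +1)
        print(f"Factor {name}: {highs} runs at +1, {len(half_fraction) - highs} at -1")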

We could try to eliminate another two runs (Figure 1.10), but this time we have pushed things too far. Now we can't estimate factor C any longer. For three factors we can only eliminate half of the runs and still have a viable design. However, for designs with more than three factors we can eliminate more than half of the runs and still estimate all of the main effects. In fact, the more factors we have, the more runs we can eliminate.

Figure 1.10 Three factor fractional factorial (quarter fraction).

1.4 Statistical Power and Aliasing in Fractional Factorials

You're probably thinking that if we just eliminated half of our work, there has to be a catch. You're right. An immediate impact is that we have lowered the number of replicates per data point from four to two, which lowers the statistical power of our design. We would need a larger effect in the response when going from low to high in order to see it. But that might be a trade-off you are willing to make, especially if you only care about large (relative to the experimental system) effects.

How do you know how much of a trade-off you are making? That's a function of the underlying variability of the system you are looking at. If the variability is large and the effect you expect is small, you will need more replicates. Power calculations are described thoroughly in most statistics books (see the introduction to this book for sources), but you probably won't have to do them by hand. All of the specialized DOE software on the market will do this calculation for you. I'll talk more about these packages when I discuss the analysis of screening designs in the next chapter. For now, let's assume that you have enough power to estimate the effects you're expecting to see.
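
To give a feel for what a power calculation looks like, here is a rough sketch using the statsmodels Python package. It treats the comparison of the four low runs against the four high runs of one factor in a 2^3 factorial as a simple two-group t-test, which is a simplification of what dedicated DOE software does, and the effect size is an assumed value for illustration:

    from statsmodels.stats.power import TTestIndPower

    # Assumption: the expected effect is 2 standard deviations of assay noise
    power = TTestIndPower().solve_power(
        effect_size=2.0,   # effect / standard deviation (assumed for illustration)
        nobs1=4,           # 4 runs at the low level of the factor
        ratio=1.0,         # and 4 runs at the high level
        alpha=0.05,        # significance level
    )
    print(f"Approximate power: {power:.2f}")

Halving the design to a fractional factorial drops nobs1 from 4 to 2, and rerunning the calculation shows how much power you give up.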

The other penalty we take when eliminating runs from our design is to create what is termed aliasing of effects. In the discussion of full factorial experiments, I explained the concept of an interaction between two factors. A consequence of fractional factorials is the confounding of interactions with main effects, or with each other.

Let me explain what that means in a little more detail. Let's look back at our three-factor fractional factorial in Figure 1.9 again. Recall from our discussion in Section 1.1 that if we wanted to estimate an interaction between factors A and B we would have to run all four of the "corners" of the square diagram. Unfortunately, in our fractional factorial, two of those corners are run at the lower level of factor C and the other two are run at the higher level of factor C. The design is no longer orthogonal with respect to interaction AB and factor C. The same is the case for the interactions BC and AC. In the nomenclature of DOE we would say that the main effect of A is aliased with the interaction BC. The notation we use to describe this relationship is as follows:

A = A + BC
B = B + AC
C = C + AB

The effect we observe due to factor A is a combination of A and the interaction BC, and we can't tell how much is contributed by each.

This relationship can also be described using the notation in Table 1.9. Since this is not a statistics book, I will take some liberties with explaining this table. For a more complete discussion please see the book by Montgomery listed in the introduction or any other statistics-based DOE book.

Table 1.9 is similar to the run tables in previous sections, with some differences. Instead of run # in the first column, it is now labeled "treatment combination". I have also added columns for all of the possible interactions and one labeled "I", which stands for identity. While this table may not make a lot of sense right now, there are a few things we can take away from it.

In Chapter 2, we will learn how to build a mathematical model that relates the settings of the individual factors to the level of the output response. In that model, each factor will be preceded by a constant. The value of that constant will be solved for by using the + and - signs in each column. For now, look carefully at the table and you will notice that each column has a unique pattern. We can therefore solve for the constant for each term in the model.

The bottom half of the table shows the combinations that we eliminated in Figure 1.9. Let's take a closer look at the remaining pluses and minuses in the top half of the table. If we say that these signs represent coefficients in an equation for estimating the effects, you will notice that the coefficients in column A are the same as those in column BC (in the top half of the table). The same goes for column B with AC and column C with AB. Column ABC is identical to column I, which means that we are no longer able to estimate the three-factor interaction.

Table 1.9

Treatment combination   I   A   B   C   AB   AC   BC   ABC
a                       +   +   -   -   -    -    +    +
b                       +   -   +   -   -    +    -    +
c                       +   -   -   +   +    -    -    +
abc                     +   +   +   +   +    +    +    +
ab                      +   +   +   -   +    -    -    -
ac                      +   +   -   +   -    +    -    -
bc                      +   -   +   +   -    -    +    -
(1)                     +   -   -   -   +    +    +    -

(The bottom four rows, ab through (1), are the runs eliminated in Figure 1.9.)
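
The aliasing in Table 1.9 can be verified in a few lines of Python. This sketch rebuilds the full 2^3 table, keeps the top half (the runs where the ABC column is +), and confirms that column A is identical to column BC:

    from itertools import product

    full = list(product([+1, -1], repeat=3))

    # Keep the half-fraction where the ABC column equals +1 (runs a, b, c, abc)
    half = [(a, b, c) for a, b, c in full if a * b * c == +1]

    col_A = [a for a, b, c in half]
    col_BC = [b * c for a, b, c in half]
    print(col_A == col_BC)   # True: A is aliased with BC in this fraction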

Whether you're more comfortable thinking about aliases graphically or by using the table, the end result is the same. When we reduce the number of runs in a factorial, the number of aliases increases, and with it we lose the ability to estimate some of the effects in the experiment independently.

As you might have noticed, we could just as easily have eliminated the other half of the runs in the experiment. The aliasing structure then takes on a negative relationship (i.e. A = A - BC, etc.), but the principle is the same. We can no longer estimate the effects independently.

1.5 Other Screening Designs

While fractional factorial designs are the easiest screening designs to understand due to their simple aliasing structures, they are not the only designs available if you are looking to identify important factors. In fact, there are whole families of designs that aim to minimize the number of runs while still being able to identify main effects. I'll discuss one of the more popular ones briefly here, and get to the more advanced designs in Chapter 4.

Plackett-Burman designs are a family of two-level screening designs that allow you to use the smallest number of runs possible for situations where you have 11, 15, 19, 23, 27, or 31 factors. In fact, you only need one more run than the number of factors you have. These designs have extremely complex aliasing structures. Here's an example of the aliasing structure for the main effect of factor A in an 11-factor design:

[A] = A - 0.333 * BC - 0.333 * BD - 0.333 * BE + 0.333 * BF - 0.333 * BG
- 0.333 * BH + 0.333 * BJ + 0.333 * BK - 0.333 * BL + 0.333 * CD
- 0.333 * CE - 0.333 * CF + 0.333 * CG - 0.333 * CH + 0.333 * CJ
- 0.333 * CK - 0.333 * CL + 0.333 * DE + 0.333 * DF - 0.333 * DG
- 0.333 * DH - 0.333 * DJ - 0.333 * DK - 0.333 * DL - 0.333 * EF
- 0.333 * EG - 0.333 * EH - 0.333 * EJ + 0.333 * EK + 0.333 * EL
- 0.333 * FG + 0.333 * FH - 0.333 * FJ - 0.333 * FK - 0.333 * FL
+ 0.333 * GH - 0.333 * GJ + 0.333 * GK - 0.333 * GL - 0.333 * HJ
- 0.333 * HK + 0.333 * HL - 0.333 * JK + 0.333 * JL - 0.333 * KL
- 0.333 * BCD + 0.333 * BCE - 0.333 * BCF + 0.333 * BCG + 0.333 * BCH
+ 0.333 * BCJ + 0.333 * BCK - 0.333 * BCL + 0.333 * BDE + 0.333 * BDF
+ 0.333 * BDG - 0.333 * BDH - 0.333 * BDJ + 0.333 * BDK + 0.333 * BDL
+ 0.333 * BEF - 0.333 * BEG + 0.333 * BEH - 0.333 * BEJ + 0.333 * BEK
- 0.333 * BEL - 0.333 * BFG + 0.333 * BFH + 0.333 * BFJ + 0.333 * BFK
+ 0.333 * BFL - 0.333 * BGH + 0.333 * BGJ + 0.333 * BGK + 0.333 * BGL
+ 0.333 * BHJ - 0.333 * BHK + 0.333 * BHL + 0.333 * BJK + 0.333 * BJL
- 0.333 * BKL + 0.333 * CDE + 0.333 * CDF + 0.333 * CDG + 0.333 * CDH
+ 0.333 * CDJ - 0.333 * CDK + 0.333 * CDL - 0.333 * CEF - 0.333 * CEG
+ 0.333 * CEH + 0.333 * CEJ - 0.333 * CEK + 0.333 * CEL + 0.333 * CFG
+ 0.333 * CFH - 0.333 * CFJ + 0.333 * CFK + 0.333 * CFL + 0.333 * CGH
+ 0.333 * CGJ + 0.333 * CGK - 0.333 * CGL - 0.333 * CHJ - 0.333 * CHK
- 0.333 * CHL + 0.333 * CJK + 0.333 * CJL + 0.333 * CKL + 0.333 * DEF
- 0.333 * DEG - 0.333 * DEH + 0.333 * DEJ + 0.333 * DEK + 0.333 * DEL
+ 0.333 * DFG + 0.333 * DFH - 0.333 * DFJ + 0.333 * DFK - 0.333 * DFL
+ 0.333 * DGH - 0.333 * DGJ - 0.333 * DGK + 0.333 * DGL + 0.333 * DHJ
+ 0.333 * DHK - 0.333 * DHL + 0.333 * DJK + 0.333 * DJL - 0.333 * DKL
+ 0.333 * EFG - 0.333 * EFH + 0.333 * EFJ + 0.333 * EFK - 0.333 * EFL
+ 0.333 * EGH + 0.333 * EGJ + 0.333 * EGK + 0.333 * EGL - 0.333 * EHJ
+ 0.333 * EHK + 0.333 * EHL - 0.333 * EJK + 0.333 * EJL + 0.333 * EKL
+ 0.333 * FGH + 0.333 * FGJ - 0.333 * FGK - 0.333 * FGL + 0.333 * FHJ
- 0.333 * FHK + 0.333 * FHL - 0.333 * FJK + 0.333 * FJL + 0.333 * FKL
- 0.333 * GHJ + 0.333 * GHK + 0.333 * GHL + 0.333 * GJK - 0.333 * GJL
+ 0.333 * GKL + 0.333 * HJK + 0.333 * HJL + 0.333 * HKL - 0.333 * JKL

As you can see, you really don't want to try to figure out by hand whether any of these interactions are confounded with the main effect. But, if you look closely, you can also see that the main effect of A is only aliased with interactions of other factors. Thus, you can use these designs to estimate all of the main effects without a problem. That being said, if you think you will have any interactions at all, you're better off running a few more runs in a fractional factorial.
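
For the curious, the 12-run Plackett-Burman design itself is simple to construct: take the published generating row, rotate it cyclically to produce eleven runs, and append a final run with every factor at its low level. A sketch in plain Python, including a check of the balance and orthogonality properties that make the design work:

    # Published generating row for the 12-run Plackett-Burman design
    gen = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

    # Eleven cyclic shifts of the generator, plus a final all-minus run
    design = [gen[i:] + gen[:i] for i in range(11)] + [[-1] * 11]

    # Each column is balanced (6 highs, 6 lows) and orthogonal to the others
    cols = list(zip(*design))
    assert all(sum(col) == 0 for col in cols)
    assert all(sum(x * y for x, y in zip(cols[i], cols[j])) == 0
               for i in range(11) for j in range(i + 1, 11))

    for run in design:
        print(" ".join("+" if x > 0 else "-" for x in run))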

During assay development, it's rare to not have any interactions. In my experience, these types of designs are most often used in robustness experiments, where you expect very few of the factors to be significant. In those cases, a follow-up experiment can be used to further investigate the presence of interactions after the Plackett-Burman design has been used to eliminate most of the factors from consideration.

1.6 How to pick a design - blocking, resolution, and power

Let's say you have an experimental system in mind. You have identified the factors you want to investigate. How do you get started?

First, you have to decide on a low and a high level setting for each of your factors. This is where some of the "art" of DOE comes into play and why you always need a subject matter expert involved in the design phase. The best guidance I can give you is to set the levels of your factors aggressively, but not too aggressively. Helpful, huh?

Let's break down that statement. What exactly does it mean to set your levels aggressively? Imagine that you have a response that increases as you move from the low to the high setting of one of the factors (Figure 1.11).

Figure 1.11. Picking the correct levels for a factor.

In a screening design, you will usually run just two levels of a factor (and sometimes a center point). If you picked the two levels shown in red in Figure 1.11, you may not be able to detect a change in the response. Instead, it makes more sense to pick the two green levels. You would probably be able to detect the difference between them. Remember that the point is not to optimize the settings of the factor, just to identify the factors that are actually impacting your system. Also notice how the response continues outside of the green levels. You don't want to pick levels that are on an edge of the area you have explored in the past. Select levels aggressively, but not too aggressively.

Another problem that often occurs in DOE designs for assay development is that you have to spread the testing across several operators, days, lots of reagents, etc., and you're worried that these changes will affect your responses. DOE designs can take care of those problems for us using a concept called blocking. Blocking essentially adds another factor for each of these "nuisance variables" to your model. The difference between these factors and your "regular" ones is that these factors are not analyzed for significance. Instead, any effects due to them are subtracted from the other responses so that they don't mask the effects of the factors you are really interested in. Each blocking variable uses up one available factor that you can estimate, so you need to make sure you have enough power in your design. If your design has a complicated aliasing structure, blocking makes it even worse, since you've now added yet another factor. Be judicious in your design to keep your number of blocks low.
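
One classic blocking trick, sketched below for a 2^3 factorial split across two days, is to assign runs to blocks using the sign of the highest-order interaction column; the day-to-day difference is then confounded with the effect you usually care about least. (This is an illustration of the idea, not the only way to block a design.)

    from itertools import product

    runs = list(product([+1, -1], repeat=3))

    # Confound the block (day) with the ABC interaction column
    day1 = [r for r in runs if r[0] * r[1] * r[2] == +1]
    day2 = [r for r in runs if r[0] * r[1] * r[2] == -1]

    print("Day 1:", day1)   # any day-to-day shift only biases the ABC estimate
    print("Day 2:", day2)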

Resolution is another concept that is important to understand when picking a screening design. Common factorials can be categorized into one of four types: resolution III, resolution IV, resolution V and higher, and full factorials. These categories tell you what the aliasing structures of the designs are. In a resolution III design, main effects are aliased with two-factor interactions. In a resolution IV design, two-factor interactions are aliased with other two-factor interactions and main effects are aliased with three-factor interactions. In a resolution V design, two-factor interactions are aliased with three-factor interactions, and main effects are aliased with four-factor interactions. A full factorial has no aliasing at all (i.e. all interactions can be estimated). [1]

How then do you use this information? If you're interested in estimating the main effects and you're worried about aliasing them with two-factor interactions, you would pick a resolution IV design. However, if you expect that two-factor interactions are rare, you may be able to get away with a resolution III design. Most DOE software has a handy table to help you select designs. Figure 1.12 is an example of such a table.

Figure 1.12 - Table with possible fractional factorial designs

1. There's an easy way to remember these aliasing relationships called the finger rule. If you hold up the same number of fingers as the resolution (i.e. three fingers for a resolution III design), you will be able to split them into groups that show the aliasing structure. Three fingers can be divided into a pair and a single finger. Therefore, the main effects (the single finger) are aliased with two-factor interactions (the pair). This also works with resolution IV and V designs. Try it and see for yourself.

If you study the table, you'll see that the lower the resolution, the fewer runs you need to perform to estimate the effects you care about. Intuitively, this makes sense. The more information you have (runs), the less confounding (aliasing) you have in the results.

The table is also useful in determining how fractionated each design is. For example, a 2^5 full factorial has 32 runs. The half fraction of this factorial has 16 runs, is resolution V, and is denoted 2^(5-1). A quarter fraction is denoted 2^(5-2), has 8 runs, and is a resolution III design. By studying the table you can quickly familiarize yourself with how this nomenclature works.
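
The nomenclature mirrors how the designs are constructed. For the 2^(5-1) half fraction, for example, four factors are enumerated in full and the fifth is generated as their product, which is what gives the design its resolution. A small sketch:

    from itertools import product

    # 2^(5-1): enumerate A-D in full, generate E = ABCD
    design = [(a, b, c, d, a * b * c * d)
              for a, b, c, d in product([+1, -1], repeat=4)]

    print(len(design))   # 16 runs instead of the full factorial's 32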

You might also notice that some designs have the same resolution and number of factors, yet one has only half as many runs as the other (2^(7-2) and 2^(7-3) are one such example). How is that possible, and why would you ever run the design with more runs? It comes down to statistical power. Power is what gives you the confidence that you actually detected an effect. The way the analysis works, the more power you have, the more confident you are that you're not missing the significance of an effect. In a low-power design, the effect of a factor has to be much higher than the variability in the measurement in order to be detected.

To complicate matters, another way to increase power is to replicate the design. Replication decreases the uncertainty, and thus the variability, of your response estimates. It's not always clear whether it's better to run a less fractionated design or just replicate a more fractionated one. If you find yourself in this conundrum, your best bet is to talk to a statistician. In most cases, I would personally choose the less fractionated design without replication, since it includes runs with more combinations of all the different factors. I therefore have more confidence in the analysis of interactions.

To summarize, when picking a design, you have to consider how many factors you have, whether you need to add blocks to the design, what resolution you need in your aliasing structure, and how many runs you are willing to perform. Once you have made those decisions, you can use a DOE software package to decide whether you will have enough power to detect the differences you expect in your factors. If you don't, you can either switch to a higher-resolution design or add more replicates to your current design.

1.7 Response Variables - what to measure?

Once you have your design, there's one last decision to make before you actually execute your runs: what to measure. Usually, you have at least one measure in mind when you start thinking about your experiments. In assay development, some common examples include signal strength, dynamic range, curve parameters (such as slope), replicate variability, background, signal-to-noise ratio, etc.

It may seem like it would be a lot of work to optimize all of those responses, but DOE makes it extremely easy. As you will see in the next chapter, once you do your runs, analysis is essentially free. So I would encourage you to think up front about measuring as many things as possible, even if you don't intend to analyze them right away. I've seen many examples where a response you didn't think would matter suddenly becomes important. If you have the data, the analysis is usually much less painful than having to go back and generate more data.

Another reason for measuring as many parameters as possible is that some responses may contradict others. For example, optimizing only for signal strength may also increase background. The best setting for both together may be a compromise. You would not know that unless you analyze both responses at the same time. As you will see in the next chapter, most DOE software packages have optimization algorithms that will help you find the optimum settings across all your responses.
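
As a taste of how such algorithms work, one common approach is a desirability function in the spirit of Derringer and Suich: each response is rescaled to a 0-1 scale where 1 is ideal, and the overall score is the geometric mean. A toy sketch with made-up signal and background numbers:

    def desirability_max(y, low, high):
        """Scale a response to 0-1 when larger is better."""
        return min(max((y - low) / (high - low), 0.0), 1.0)

    def desirability_min(y, low, high):
        """Scale a response to 0-1 when smaller is better."""
        return 1.0 - desirability_max(y, low, high)

    # Hypothetical run: strong signal but somewhat elevated background
    d_signal = desirability_max(1.8, low=0.5, high=2.5)         # 0.65
    d_background = desirability_min(0.20, low=0.05, high=0.40)  # ~0.57

    overall = (d_signal * d_background) ** 0.5   # geometric mean
    print(f"Overall desirability: {overall:.2f}")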
