Lecture 9, CS 239, Spring 2007
More Experiment Design
Experimental Methodologies for System Software
Peter Reiher
May 8, 2007
Outline
• Multiplicative experiment design models
• 2^k r factorial experiment designs
• 2^(k-p) fractional factorial designs
• Confounding in fractional factorial designs
Multiplicative Models for 2^2 r Experiments
• Assumptions of additive models
• Example of a multiplicative situation
• Handling a multiplicative model
• When to choose a multiplicative model
• Multiplicative example
Assumptions of Additive Models
• Last time’s analysis used additive model:
– y_ij = q_0 + q_A x_A + q_B x_B + q_AB x_A x_B + e_ij
• Assumes all effects are additive:
– Factors
– Interactions
– Errors
• This assumption must be validated!
Example of a Multiplicative Situation
• Testing processors with different workloads
• Most common multiplicative case
• Consider 2 processors, 2 workloads
– Use 2^2 r design
• Response is time to execute w_j instructions on processor that takes v_i seconds/instruction
• w_j and v_i sound like good factors to test
• Without interactions, time is y_ij = v_i w_j
Handling a Multiplicative Model
• Take logarithm of both sides: y_ij = v_i w_j, so log(y_ij) = log(v_i) + log(w_j)
• Use additive model on logarithms
• x_A is log(v_i), x_B is log(w_j)
– Choose your high and low levels for each
• Resulting model is:
– log(y_ij) = q_0 + q_A x_A + q_B x_B + q_AB x_A x_B + e_ij
• But we care about y_ij, not log(y_ij)
Converting Back to y_ij
• Take antilog of both sides of equation
• U_A = 10^q_A
• U_B = 10^q_B
• U_AB = 10^q_AB
• y_ij = 10^q_0 · 10^(q_A x_A) · 10^(q_B x_B) · 10^(q_AB x_A x_B) · 10^e_ij
Meaning of a Multiplicative Model
• Model is y_ij = 10^q_0 · 10^(q_A x_A) · 10^(q_B x_B) · 10^(q_AB x_A x_B) · 10^e_ij
• Here, 10^q_A is ratio of MIPS ratings of processors, 10^q_B is ratio of workload sizes
• Antilog of q_0 is geometric mean of responses:
  10^q_0 = (y_1 y_2 ... y_n)^(1/n), where n = 2^2 r
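As a quick check of this relationship, the sketch below (Python, with made-up v and w levels, not the lecture's data) averages the log responses and confirms that 10^q_0 equals the geometric mean of the raw responses:

```python
import math
from itertools import product

# Hypothetical levels (illustrative only): v = seconds/instruction,
# w = instructions in the workload; response y = v * w.
v_levels = [1.0, 10.0]
w_levels = [100.0, 1000.0]

ys = [v * w for v, w in product(v_levels, w_levels)]
logs = [math.log10(y) for y in ys]

# q_0 is the mean of the log responses, so 10^q_0 is the geometric mean of y.
q0 = sum(logs) / len(logs)
geo_mean = math.prod(ys) ** (1 / len(ys))
assert abs(10 ** q0 - geo_mean) < 1e-9
```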
When to Choose a Multiplicative Model?
• Physical considerations (see previous slides)
• Range of y is large
– Making arithmetic mean unreasonable
– Calling for log transformation
• Plot of residuals shows large values and increasing spread
• Quantile-quantile plot doesn’t look like normal distribution
Multiplicative Example
• Consider additive model of processors A1 and A2 running benchmarks B1 and B2:

  y1       y2       y3      Mean      I        A        B        AB
  85.1     79.5     147.9   104.167   1        -1       -1       1
  0.891    1.047    1.072   1.003     1        1        -1       -1
  0.955    0.933    1.122   1.003     1        -1       1        -1
  0.0148   0.0126   0.012   0.013     1        1        1        1
                            Total     106.19   -104.15  -104.15  102.17
                            Total/4   26.55    -26.04   -26.04   25.54

• Note large range of y values
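The Total/4 row of effects can be reproduced directly from the row data; a sketch in Python:

```python
# Row means from the table above (3 replications per factor combination).
means = [
    (85.1 + 79.5 + 147.9) / 3,
    (0.891 + 1.047 + 1.072) / 3,
    (0.955 + 0.933 + 1.122) / 3,
    (0.0148 + 0.0126 + 0.012) / 3,
]
# Sign columns I, A, B, AB in the table's row order.
signs = {
    "I":  [1, 1, 1, 1],
    "A":  [-1, 1, -1, 1],
    "B":  [-1, -1, 1, 1],
    "AB": [1, -1, -1, 1],
}
# Each effect is the sign-weighted sum of the row means, divided by 4;
# matches the table's Total/4 row: 26.55, -26.04, -26.04, 25.54.
effects = {name: sum(s * m for s, m in zip(col, means)) / 4
           for name, col in signs.items()}
print({name: round(q, 2) for name, q in effects.items()})
```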
Error Scatter of Additive Model
[Figure omitted: scatter plot of residuals (-50 to 50) vs. predicted response (0 to 120)]
Quantile-Quantile Plot of Additive Model
[Figure omitted: Q-Q plot of residual quantiles (-50 to 50) vs. normal quantiles (-2 to 2)]
Multiplicative Model
• Taking logs of everything, the model is:

  y1       y2       y3       Mean     I      A      B      AB
  1.9299   1.9004   2.17     2.000    1      -1     -1     1
  -0.05    0.0199   0.0302   0.000    1      1      -1     -1
  -0.02    -0.03    0.05     0.000    1      -1     1      -1
  -1.83    -1.9     -1.9281  -1.886   1      1      1      1
                             Total    0.11   -3.89  -3.89  0.11
                             Total/4  0.03   -0.97  -0.97  0.03
Error Residuals of Multiplicative Model
[Figure omitted: scatter plot of residuals (-0.15 to 0.20) vs. predicted response (-3 to 3)]
Quantile-Quantile Plot for Multiplicative Model
[Figure omitted: Q-Q plot of residual quantiles (-0.15 to 0.20) vs. normal quantiles (-2 to 2)]
Summary of the Two Models

                  Additive Model                      Multiplicative Model
  Factor  Effect   % Variation  Conf. Interval     Effect  % Variation  Conf. Interval
  I       26.55                 (16.35, 36.74)     0.03                 (-0.02, 0.07)
  A       -26.04   30.15        (-36.23, -15.85)   -0.97   49.85        (-1.02, -0.93)
  B       -26.04   30.15        (-36.23, -15.85)   -0.97   49.86        (-1.02, -0.93)
  AB      25.54    29.01        (15.35, 35.74)     0.03    0.04         (-0.02, 0.07)
  e                10.69                                   0.25
• Which suggests the time to run a benchmark depends only on the processor speed and benchmark size
• Sounds about right
General 2^k r Factorial Design
• Simple extension of 2^2 r
• Just k factors, not 2
• See Box 18.1 for summary
• Always do visual tests
• Remember to consider multiplicative model as alternative
Example of 2^k r Factorial Design
• Consider a 2^3 3 design
• 3 factors
• 2 levels for each
• 3 replications of each combination
• There will be more factor interaction terms, of course
Sign Table for a Sample 2^k r Factorial Design

  y1  y2  y3  Mean  I      A     B     C      AB    AC    BC    ABC
  14  16  12  14    1      -1    -1    -1     1     1     1     -1
  22  18  20  20    1      1     -1    -1     -1    -1    1     1
  11  15  19  15    1      -1    1     -1     -1    1     -1    1
  34  30  35  33    1      1     1     -1     1     -1    -1    -1
  46  42  44  44    1      -1    -1    1      1     -1    -1    1
  58  62  60  60    1      1     -1    1      -1    1     -1    -1
  50  55  54  53    1      -1    1     1      -1    -1    1     -1
  86  80  74  80    1      1     1     1      1     1     1     1
          Total     319    67    43    155    23    19    15    -1
          Total/8   39.88  8.38  5.38  19.38  2.88  2.38  1.88  -0.13
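The Total/8 effects row can be reproduced by dotting each sign column with the row means; a sketch in Python, regenerating the sign columns rather than typing them in (assuming A varies fastest, then B, then C, matching the table's pattern):

```python
from itertools import product

# Mean responses from the table above, in run order.
means = [14, 20, 15, 33, 44, 60, 53, 80]
# A varies fastest, then B, then C.
rows = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

def effect(col):
    """Dot a sign column with the means and divide by 2^3 = 8."""
    return sum(s * y for s, y in zip(col, means)) / 8

q0 = sum(means) / 8                              # 39.875
qA = effect([a for a, b, c in rows])             # 8.375
qC = effect([c for a, b, c in rows])             # 19.375
qAB = effect([a * b for a, b, c in rows])        # 2.875
```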
Allocation of Variation for 2^3 3 Design
• Percent variation explained:

  A     B    C     AB   AC   BC   ABC  Errors
  14.1  5.8  75.3  1.7  1.1  0.7  0    1.37

• 90% confidence intervals:

         I     A    B    C     AB   AC   BC   ABC
  Lower  38.7  7.2  4.2  18.2  1.7  1.2  0.7  -1.3
  Upper  41.0  9.5  6.5  20.5  4.0  3.5  3.0  1.0
Error Residuals for 2^3 3 Design
[Figure omitted: scatter plot of residuals (-8 to 8) vs. predicted response (0 to 100)]
Quantile-Quantile Plot for All Points for 2^3 3 Design
[Figure omitted: Q-Q plot of residual quantiles (-8 to 8) vs. normal quantiles (-3 to 3)]
Quantile-Quantile Plot for Means, 2^3 3 Design
[Figure omitted: Q-Q plot of mean residual quantiles (-4 to 8) vs. normal quantiles (-2 to 2)]
• R^2 for this one is .94
Concerns With These Kinds of Designs
• They don’t test all possible levels
– Only test two, in fact
– Solved by full factorial designs
• Which we’ll cover later
• They are a lot of work
– Especially if there are many factors
– Solved by fractional factorial design
Fractional Designs
• What if there are many factors?
• You can’t afford to test all combinations
• Well, then, test only some of them
• How should you determine which combinations to test?
– Losing least information
2^(k-p) Fractional Factorial Designs
• Introductory example of a 2^(k-p) design
• Preparing the sign table for a 2^(k-p) design
• Confounding
• Algebra of confounding
• Design resolution
What Is a 2^(k-p) Fractional Factorial Design?
• As before, test only two levels of each factor
• But instead of testing all 2^k combinations,
• Only test 2^(k-p) of them
– The larger p is, the fewer combinations tested
– E.g., for k = 5 and p = 2, reduces tests from 32 to 8
Introductory Example of a 2^(k-p) Design
• Exploring 7 factors in only 8 experiments
– k = 7, p = 4
• Full factorial design would take 128 experiments
• Won’t we save time!
• Would be nice to know what price we paid
– We can’t know everything
– But we can get some control
Sign Table for Example

  Run  A   B   C   D   E   F   G
  1    -1  -1  -1  1   1   1   -1
  2    1   -1  -1  -1  -1  1   1
  3    -1  1   -1  -1  1   -1  1
  4    1   1   -1  1   -1  -1  -1
  5    -1  -1  1   1   -1  -1  1
  6    1   -1  1   -1  1   -1  -1
  7    -1  1   1   -1  -1  1   -1
  8    1   1   1   1   1   1   1
Analysis of 2^(7-4) Design
• Column sums are zero: Σ_i x_ij = 0
• Sum of 2-column product is zero: Σ_i x_ij x_il = 0 for j ≠ l
• Sum of column squares is 2^(7-4) = 8
• Orthogonality allows easy calculation of effects, e.g.:
  q_A = (-y_1 + y_2 - y_3 + y_4 - y_5 + y_6 - y_7 + y_8) / 8
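These orthogonality properties are easy to verify programmatically. The sketch below regenerates the same eight rows (one way to produce the table above is to take A, B, C as free columns and fill D, E, F, G with the products AB, AC, BC, ABC; this matches the signs shown) and checks all three properties:

```python
from itertools import product

# Rebuild the 2^(7-4) sign table: A, B, C free; D, E, F, G filled with
# the products AB, AC, BC, ABC (consistent with the table above).
rows = []
for c, b, a in product((-1, 1), repeat=3):
    rows.append((a, b, c, a * b, a * c, b * c, a * b * c))

cols = list(zip(*rows))
# Each column sums to zero...
assert all(sum(col) == 0 for col in cols)
# ...distinct columns are orthogonal (pairwise dot product zero)...
assert all(sum(x * y for x, y in zip(cols[j], cols[l])) == 0
           for j in range(7) for l in range(j + 1, 7))
# ...and squares sum to 2^(7-4) = 8.
assert all(sum(x * x for x in col) == 8 for col in cols)
```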
Effects and Confidence Intervals for 2^(k-p) Designs
• Effects are as in 2^k designs:
  q_j = (1 / 2^(k-p)) Σ_i y_i x_ij
• % variation proportional to squared effects
• For standard deviations and confidence intervals:
– Use formulas from full factorial designs
– Replace 2^k with 2^(k-p)
Preparing the Sign Table for a 2^(k-p) Design
• Start by preparing a sign table for k-p factors
– Assign first k-p factors as before
• Then assign remaining factors
– In the place of some (or all) of the combined effects columns
Sign Table for k-p Factors
• Same as table for experiment with k-p factors
– I.e., a 2^(k-p) table
– 2^(k-p) rows and 2^(k-p) columns
– First column is I, contains all 1’s
– Next k-p columns get k-p selected factors
– Rest are products of factors
Assigning Remaining Factors
• 2^(k-p) - (k-p) - 1 product columns remain
• Choose any p columns
– Assign remaining p factors to them
– Any others stay as-is, measuring interactions
An Example
• Let’s build a 2^(5-2) table
• So there are five factors A, B, C, D, and E
• But we only want to run 8 experiments
– p = 2
– 5 - 2 = 3
Start With a 2^3 Table
Exp # I A B C AB AC BC ABC
1 1 -1 -1 -1 1 1 1 -1
2 1 1 -1 -1 -1 -1 1 1
3 1 -1 1 -1 -1 1 -1 1
4 1 1 1 -1 1 -1 -1 -1
5 1 -1 -1 1 1 -1 -1 1
6 1 1 -1 1 -1 1 -1 -1
7 1 -1 1 1 -1 -1 1 -1
8 1 1 1 1 1 1 1 1
Now Add the Remaining Factors D and E
• Put D in the ABC column and E in the BC column:

Exp # I  A   B   C   AB  AC  E   D
1     1  -1  -1  -1  1   1   1   -1
2     1  1   -1  -1  -1  -1  1   1
3     1  -1  1   -1  -1  1   -1  1
4     1  1   1   -1  1   -1  -1  -1
5     1  -1  -1  1   1   -1  -1  1
6     1  1   -1  1   -1  1   -1  -1
7     1  -1  1   1   -1  -1  1   -1
8     1  1   1   1   1   1   1   1
Our Final Sign Table
Exp # I A B C AB AC E D
1 1 -1 -1 -1 1 1 1 -1
2 1 1 -1 -1 -1 -1 1 1
3 1 -1 1 -1 -1 1 -1 1
4 1 1 1 -1 1 -1 -1 -1
5 1 -1 -1 1 1 -1 -1 1
6 1 1 -1 1 -1 1 -1 -1
7 1 -1 1 1 -1 -1 1 -1
8 1 1 1 1 1 1 1 1
Running Experiments With This Sign Table
• Use it just as before
• Run the set of experiments the table indicates
– E.g., run A,B,C,D at low level, E at high level
– Then A, D, and E at high, B and C at low
– And so on
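Decoding a sign-table row into level settings can be sketched as follows (Python; the table is regenerated from the column assignments above, with E in the BC slot and D in the ABC slot):

```python
from itertools import product

# Rebuild the final 2^(5-2) table: A, B, C are the base factors;
# E takes the BC column and D takes the ABC column.
runs = [{"A": a, "B": b, "C": c, "E": b * c, "D": a * b * c}
        for c, b, a in product((-1, 1), repeat=3)]

def settings(run):
    """Translate one sign-table row into high/low level assignments."""
    return {f: "high" if s == 1 else "low" for f, s in sorted(run.items())}

# Run 1: A, B, C, D at low level, E at high level.
assert settings(runs[0]) == {"A": "low", "B": "low", "C": "low",
                             "D": "low", "E": "high"}
```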
Calculating Effects With the Sign Table
• Just like before
• Multiply experiment results by columns
• Add up results
• Divide by number of experiments
• There are your q values
What Have We Paid?
• What about all the other effect combinations?
• The fourth column shows the combined effects of A and B
• The fifth column shows the combined effects of A and C
Exp # I A B C AB AC E D
1 1 -1 -1 -1 1 1 1 -1
2 1 1 -1 -1 -1 -1 1 1
3 1 -1 1 -1 -1 1 -1 1
4 1 1 1 -1 1 -1 -1 -1
5 1 -1 -1 1 1 -1 -1 1
6 1 1 -1 1 -1 1 -1 -1
7 1 -1 1 1 -1 -1 1 -1
8 1 1 1 1 1 1 1 1
Confounding
• The other combined effects were confounded
• The confounding problem
• An example of confounding
• Confounding notation
• Choices in fractional factorial design
The Confounding Problem
• Fundamental to fractional factorial designs
• Some effects produce combined influences
– Limited experiments mean only some combinations can be calculated
• Problem of combined influence is confounding
– Inseparable effects called confounded effects
An Example of Confounding
• Consider this 2^(3-1) table:

  I  A   B   C
  1  -1  -1  1
  1  1   -1  -1
  1  -1  1   -1
  1  1   1   1

• Extend it with an AB column:

  I  A   B   C   AB
  1  -1  -1  1   1
  1  1   -1  -1  -1
  1  -1  1   -1  -1
  1  1   1   1   1
Analyzing the Confounding Example
• Effect of C is same as that of AB:
  q_C = (y_1 - y_2 - y_3 + y_4)/4
  q_AB = (y_1 - y_2 - y_3 + y_4)/4
• Formula for q_C really gives combined effect:
  q_C + q_AB = (y_1 - y_2 - y_3 + y_4)/4
• No way to separate q_C from q_AB
– Not a problem if q_AB is known to be small
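This can be seen concretely: in the 2^(3-1) table, the C column and the AB product column are literally the same vector, so any set of responses yields identical estimates for the two effects. A sketch (the response values are arbitrary illustrative numbers):

```python
from itertools import product

# The four runs of the 2^(3-1) table above, with C assigned to the AB column.
rows = [(a, b, a * b) for b, a in product((-1, 1), repeat=2)]

c_col = [c for a, b, c in rows]
ab_col = [a * b for a, b, c in rows]
# The C and AB sign columns are identical, so their effects are confounded.
assert c_col == ab_col

ys = [5.0, 3.0, 2.0, 8.0]  # arbitrary illustrative responses
qC = sum(s * y for s, y in zip(c_col, ys)) / 4
qAB = sum(s * y for s, y in zip(ab_col, ys)) / 4
assert qC == qAB  # no way to tell the two apart
```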
Let’s Go Back to Our Example
• Where are combined effects AD, AE, BC, BD, BE, CD, CE, and DE?
• Not to mention ABC, ABD, ABE, ACD, ACE, ADE, BCD, BDE, BCE, CDE, ABCD, ABCE, ACDE, ABDE, BCDE, and ABCDE?
Exp # I A B C AB AC E D
1 1 -1 -1 -1 1 1 1 -1
2 1 1 -1 -1 -1 -1 1 1
3 1 -1 1 -1 -1 1 -1 1
4 1 1 1 -1 1 -1 -1 -1
5 1 -1 -1 1 1 -1 -1 1
6 1 1 -1 1 -1 1 -1 -1
7 1 -1 1 1 -1 -1 1 -1
8 1 1 1 1 1 1 1 1
Confounding Notation
• Previous 2^(3-1) confounding is denoted by equating confounded effects: C = AB
• Other effects are also confounded in this design: A = BC, B = AC, I = ABC
– Last entry indicates ABC is confounded with overall mean, or q_0
What Does Confounding Really Mean?
• Each effect is a combination of several effects from a full experiment
• Impossible to pull out one from the other
– Unless you change design and make more runs
• Must be aware of what’s getting confounded
Getting Concrete on the Meaning of Confounding
• Consider our generic 2^(3-1) fractional factorial experiment
• What if we’re measuring computer performance?
• With three factors:
1. CPU speed (A)
2. Memory size (B)
3. Disk speed (C)
Using Our Fractional Design,
• We have combined the effect of disk speed with the interaction of CPU speed and memory size (C=AB)
• And the effect of CPU speed with the combined effects of disk speed and memory size (A=BC)
• Among several others
Does This Matter?
• Maybe
• What if disk speed had little effect on system performance?
• But the interaction between memory size and CPU speed had a lot
• Might conclude that you can speed up the system by increasing disk speed
• When actually that makes no difference
Choosing a Fractional Factorial Design
• Many fractional factorial designs possible
– Chosen when assigning remaining p signs
– 2^p different designs exist for 2^(k-p) experiments
• Some designs better than others
– Desirable to confound significant effects with insignificant ones
– Usually means low-order with high-order
Algebra of Confounding
• Rules of the algebra
• Generator polynomials
Rules ofConfounding Algebra
• Particular design can be characterized by single confounding
– Traditionally, the I = wxyz... confounding
• Others can be found by multiplying by various terms
– I acts as unity (e.g., I times A is A)
– Squared terms disappear (AB^2C becomes AC)
Example: 2^(3-1) Confoundings
• Design is characterized by I = ABC
• Multiplying by A gives A = A^2BC = BC
• Multiplying by B, C, AB, AC, BC, and ABC:
  B = AB^2C = AC, C = ABC^2 = AB,
  AB = A^2B^2C = C, AC = A^2BC^2 = B,
  BC = AB^2C^2 = A, ABC = A^2B^2C^2 = I
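These rules amount to symmetric difference on sets of factor letters: multiplying two effects cancels any squared factor, which is exactly set XOR. A minimal sketch:

```python
# Effects as frozensets of factor letters; multiplication cancels squared
# terms, which is exactly symmetric difference (XOR) of the sets.
def multiply(e1, e2):
    return frozenset(e1) ^ frozenset(e2)

I = frozenset()  # the identity effect (empty product)

# Multiplying the generator I = ABC through by various effects:
assert multiply("A", "ABC") == frozenset("BC")   # A = BC
assert multiply("AB", "ABC") == frozenset("C")   # AB = C
assert multiply("ABC", "ABC") == I               # ABC = I
```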
Generator Polynomials
• Polynomial I = wxyz... is called generator polynomial for the confounding
• A 2^(k-p) design confounds 2^p effects together
– So generator polynomial has 2^p terms
– Can be found by considering interactions replaced in sign table
Example of Finding a Generator Polynomial
• Consider a 2^(7-4) design
• Sign table has 2^3 = 8 rows and columns
• First 3 columns represent A, B, and C
• Columns for D, E, F, and G replace AB, AC, BC, and ABC columns respectively
– So confoundings are necessarily:
  D = AB, E = AC, F = BC, and G = ABC
The 2^(7-4) Design
Exp # I A B C D E F G
1 1 -1 -1 -1 1 1 1 -1
2 1 1 -1 -1 -1 -1 1 1
3 1 -1 1 -1 -1 1 -1 1
4 1 1 1 -1 1 -1 -1 -1
5 1 -1 -1 1 1 -1 -1 1
6 1 1 -1 1 -1 1 -1 -1
7 1 -1 1 1 -1 -1 1 -1
8 1 1 1 1 1 1 1 1
(The D, E, F, and G columns replace AB, AC, BC, and ABC.)
Turning Basic Terms into Generator Polynomial
• Basic confoundings are D = AB, E = AC, F = BC, and G = ABC
• Multiply each equation by its left side:
  I = ABD, I = ACE, I = BCF, and I = ABCG
  or
  I = ABD = ACE = BCF = ABCG
Finishing the Generator Polynomial
• Any subset of above terms also multiplies out to I
– E.g., ABD times ACE = A^2BCDE = BCDE
• Expanding all possible combinations gives the 16-term generator (book is wrong, ABDG isn’t one of them):
  I = ABD = ACE = BCF = ABCG = BCDE = ACDF = CDG = ABEF = BEG = AFG = DEF = ADEG = BDFG = CEFG = ABCDEFG
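The expansion can be checked mechanically by XOR-ing every subset of the four base terms (squared factors cancel); a sketch confirming the 16 terms, and that ABDG is not among them:

```python
from itertools import combinations

# Effects as frozensets of factor letters; products cancel squares (XOR).
base = [frozenset(t) for t in ("ABD", "ACE", "BCF", "ABCG")]

terms = set()
for r in range(len(base) + 1):
    for combo in combinations(base, r):
        prod = frozenset()
        for t in combo:
            prod ^= t  # multiply, cancelling squared factors
        terms.add(prod)

# 2^4 = 16 distinct terms, including I (the empty set).
assert len(terms) == 16
assert frozenset("CDG") in terms and frozenset("ABCDEFG") in terms
# Confirms the slide's correction: ABDG is not a term.
assert frozenset("ABDG") not in terms
```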
Fractional Design Resolution
• What is resolution?
• Finding resolution
• Choosing a resolution
What Is Resolution?
• Design is characterized by its resolution
• Resolution measured by order of confounded effects
• Order of effect is number of factors in it
– E.g., I is order 0, and ABCD is order 4
• Order of confounding is sum of effect orders
– E.g., AB = CDE would be of order 5
Definition of Resolution
• Resolution is minimum order of any confounding in design
• Denoted by uppercase Roman numerals
– E.g., a 2^(5-1) design with resolution of 3 is called R_III
– Or more compactly, 2^(5-1)_III
Finding Resolution
• Find minimum order of effects confounded with mean
– I.e., search generator polynomial
• Consider earlier example: I = ABD = ACE = BCF = ABCG = BCDE = ACDF = CDG = ABEF = BEG = AFG = DEF = ADEG = BDFG = CEFG = ABCDEFG
• Minimum order among the terms (e.g., ABD or CDG) is 3
• So it’s an R_III design
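Resolution can be computed directly from the base terms; a sketch for this example:

```python
from itertools import combinations

# Generator base terms for the 2^(7-4) example, as sets of factor letters.
base = [frozenset(t) for t in ("ABD", "ACE", "BCF", "ABCG")]

# All non-empty products of base terms (squared factors cancel via XOR).
terms = set()
for r in range(1, len(base) + 1):
    for combo in combinations(base, r):
        prod = frozenset()
        for t in combo:
            prod ^= t
        terms.add(prod)

# Resolution = minimum order among effects confounded with the mean I.
resolution = min(len(t) for t in terms)
assert resolution == 3  # an R_III design
```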
Choosing a Resolution
• Generally, higher resolution is better
• Because usually higher-order interactions are smaller
• Exception: when low-order interactions are known to be small
– Then choose design that confounds those with important interactions
– Even if resolution is lower
Fractional Designs With Replications
• You can (of course) run multiple experiments for each fractional setting
– Formally, a 2^(k-p) r design
• Multiplicative factor in number of runs
– Rather than exponential
• Is this a better use of your experiment time?
• Depends
Depends on What?
• Are confounded interactions more important than statistical variation?
• Running 2 replications is same work as decreasing p by 1
• Maybe better to reconsider your factors
– Do you need all of them?
– Decreasing by 1 factor allows less confounding for same experiments
So, More Runs or Less Confounding?
• 2x runs are often used for preliminary experiments, anyway
• To identify important factors to look at more deeply
• How likely is it that an “unlucky” single replication will misidentify important factor?