Stat 502 Design and Analysis of Experiments Factorial Design

Stat 502Design and Analysis of Experiments

Factorial Design

Fritz Scholz

Department of Statistics, University of Washington

December 2, 2013

1

Factorial Design

We have looked at 1-sample, 2-sample, and t-sample problems.

We dealt with a treatment at t levels or with t treatments.

A treatment with t levels could also be viewed as a factor.

Now address experiments where several factors come into play.

First we will do this for two such factors.

How should we go about this?

2

Insecticide Data

> poison=read.csv("poison.csv",header=T)

> poisony type delivery

1 3.1 I A2 4.5 I A3 4.6 I A4 4.3 I A5 3.6 II A6 2.9 II A7 4.0 II A8 2.3 II A9 2.2 III A.....44 3.8 II D45 3.0 III D46 3.6 III D47 3.1 III D48 3.3 III D

3

Insecticide Example

We have 3 types of insecticides (I, II, and III) and 4 methods(A,B,C,D) of delivering the insecticide.

(I, II, and III) and (A,B,C,D) are the levels of the respectivefactors type of insecticide and insecticide delivery method.

The response Y is the time to death in minutes.

Find the best insecticide and the best delivery method.

We have 48 experimental insects to experiment with.

Randomly divide the 48 insects into 12 = 3× 4 groups of 4insects each, assigning the respective groups to the 12treatment combinations(I,A), (I,B), (I,C), (I,D), . . . , (III,C), (III,D).

Randomize the order of all 48 runs to eliminate order biases.

This is a factorial design,speci�cally a 3× 4 factor design with 4 replications.

4

Response Table

Visualize responses in relation to the factor levels as follows:

Insecticide

Type

Delivery Method

A B C D

I YI,A YI,B YI,C YI,D

II YII,A YII,B YII,C YII,D

III YIII,A YIII,B YIII,C YIII,D

YI,A short for (YI,A,1, . . . ,YI,A,4) (replication depth), etc.

More generically we would denote the kth response under leveli from factor 1 and under level j from factor 2 by Yijk .

Triplet notation (i , j , k) is more useful than ` = 1, 2, . . . , 48.

Useful in Σ summation notation and also for identifying thefactor levels, i.e., the factor level/replication coordinates.

5

First Look at Insecticide Boxplots

●

I II III

24

68

1012

type of poison

time

to d

eath

(m

inut

es)

●

A B C D

24

68

1012

delivery method

time

to d

eath

(m

inut

es)

6

First Impressions and Questions

Insecticide type III and delivery method A seem to give thebest combination.

Is combination the right word here?

Are the e�ects of delivery consistent across types, i.e., is thedelivery e�ectiveness order (in terms of faster response time)the same from one insecticide type to another?

It could be that delivery type A is not the fastest actingamong all four when applied to insecticide type III.

Method C could actually be better in combination with III.

7

Insecticide Responses by Delivery Methodtim

e to

dea

th (

min

utes

)

delivery method A

I II III

05

1015

●

●●●

●

●

●

● ●●●●

time

to d

eath

(m

inut

es)

delivery method B

I II III

05

1015

●

●

●

●

●

●

●

●

●

●●

●

time

to d

eath

(m

inut

es)

delivery method C

I II III

05

1015

●●

●

●

●

●●

●

●●●●

time

to d

eath

(m

inut

es)

delivery method D

I II III

05

1015

●

●●●

●

●

●

●

●●●●

8

Full Comparison of Insecticide Responsestim

e to

dea

th (

min

utes

)

24

68

1012

●

●●

●

I.A

●

●

●

●

II.A

●●

●

●

III.A

●

●

●

●

I.B

●

●

●

●

II.B

●

●●

●

III.B

●●

●

●

I.C

●

●

●

●

II.C

●●●●

III.C

●

●

●

●

I.D

●

●

●

●

II.D

●

●

●●

III.D

type.delivery combinations

9

Comments

Insecticide type III seems to have lowest response with all 4delivery methods.

The mean levels for each (III, delivery method) combinationare ≈ consistent.

The scatter within each (III, delivery method) combination isquite tight.

Delivery appears to have an e�ect on the response under typeI and II, both in absolute terms and relative to each other.

It appears: scatter ↗ as mean ↗ across all combinations.

=⇒ variance stabilizing transformation. Deal with that �rst.

10

Linear Fit log(si) = a × log(µi) + b

●●

●

●

●

●

●

●

●

●

●

●

2 3 4 5 6 7 8 9

0.0

0.5

1.0

1.5

2.0

2.5

3.0

µi

s i

●●

●

●

●

●

●

●

●

●

●

●

0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2

−2.

0−

1.5

−1.

0−

0.5

0.0

0.5

1.0

log(µi)

log(

s i)

intercept = −3.2

slope = 1.977

11

Reciprocal Transform

According to guideline: ⇒ α = 2 or λ = 1− α = −1, i.e.,Yijk = Y−1ijk = 1/Yijk a reciprocal transform for our response.

A rationalization attempt: Suppose the absorption rateR = d/t (of dose d over time t) under any given combinationis the most variable process aspect from insect to insect.

Assume that this absorption variability (ingestion variabilityfrom insect to insect) is constant across all (insecticide type,delivery method) combinations.

Assume lethal dose D is ≈ constant for each type, ∀ insects.Then the time to reach lethal dose is T = D/R .

The transformed 1/T = R/D has ≈ constant variability.

A 1-term Taylor expansion around µR with D ≈ constant

T =D

R≈

D

µR− (R − µR)

D

µ2R⇒ µT ≈

D

µR, var(T ) ≈

D2σ2Rµ4R

⇒ σT ∝ µ2T

12

Reciprocal Time Boxplots

I II III

0.1

0.2

0.3

0.4

0.5

type of poison

reci

proc

al ti

me

to d

eath

(1/

min

ute)

A B C D0.

10.

20.

30.

40.

5

delivery method

reci

proc

al ti

me

to d

eath

(1/

min

ute)

The variability is much more stable now.

13

Reciprocal Time by Delivery Methodre

cipr

ocal

tim

e to

dea

th (

1/m

inut

es)

delivery method A

I II III

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●●

●

●

●

●●●

●

●

reci

proc

al ti

me

to d

eath

(1/

min

utes

)

delivery method B

I II III

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●●

●

●

●

●

●

●●

●

reci

proc

al ti

me

to d

eath

(1/

min

utes

)

delivery method C

I II III

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●●

●

●

●

●

●

●

●

●●

●

reci

proc

al ti

me

to d

eath

(1/

min

utes

)

delivery method D

I II III

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●●

●

●

●

●

●

●

●●

14

Full Comparison of Reciprocal Timesre

cipr

ocal

tim

e to

dea

th (

1/m

inut

e)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●

●

I.A

●

●

●

●

II.A

●

●

●

●

III.A

●

●

●

●

I.B

●

●

●

●

II.B

●

●●

●

III.B

●●

●

●

I.C

●

●

●

●

II.C

●

●

●

●

III.C

●

●●●

I.D

●

●

●

●

II.D

●

●

●

●

III.D

type.delivery combinations

Note the consistent variability across all 12 treatment combinations!

15

log(si) = a × log(µi) + b for Reciprocal Times

●

●

●

●

●

●

●

●

●

●

●

●

0.2 0.3 0.4

0.02

0.03

0.04

0.05

0.06

0.07

0.08

µi

s i

●

●

●

●

●

●

●

●

●

●

●

●

−2.2 −2.0 −1.8 −1.6 −1.4 −1.2 −1.0 −0.8−

3.8

−3.

6−

3.4

−3.

2−

3.0

−2.

8−

2.6

log(µi)

log(

s i)

No strong linearity remains!

16

ANOVA of Reciprocal Times vs Type

Assume recip.time,type,delivery are in the workspace.type and delivery are factorsor write as.factor(type) in place of type!> anova(lm(recip.time ∼ type))or> anova(lm(recip.time ∼ type,data=poison))

Analysis of Variance Table

Response: recip.timeDf Sum Sq Mean Sq F value Pr(>F)

type 2 0.34877 0.17439 25.621 3.728e-08 ***Residuals 45 0.30628 0.00681---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

17

ANOVA of Reciprocal Times vs Delivery

> anova(lm(recip.time ∼ delivery))or> anova(lm(recip.time ∼ delivery,data=poison))

Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)delivery 3 0.20414 0.06805 6.6401 0.0008496 ***Residuals 44 0.45091 0.01025

18

ANOVA for Reciprocal Timesvs All Type:Delivery Combinations

> anova(lm(recip.time ∼ type:delivery))or> anova(lm(recip.time ∼

type:delivery,data=poison))

Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)type:delivery 11 0.56862 0.05169 21.531 1.289e-12 ***Residuals 36 0.08643 0.00240

This is like a one-way ANOVA with 12 treatments.

Legitimate but not enlightening.

19

All Three ANOVAs for Reciprocal Times

$ANOVA.type Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)type 2 0.34877 0.17439 25.621 3.728e-08 ***Residuals 45 0.30628 0.00681

$ANOVA.delivery Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)delivery 3 0.20414 0.06805 6.6401 0.0008496 ***Residuals 44 0.45091 0.01025

$ANOVA.type.delivery Analysis of Variance TableResponse: recip.time


20

Are these Analyses Appropriate?

What does MSE represent in the �rst two ANOVAs?

Compare these values to MSE in the third ANOVA.

The MSE 's in the �rst two ANOVAs are in�ated because meanvariation in the ignored factor is part of the error variation.

On slide 14 note how the variation within each of the 4delivery groups also re�ects the response variation due topoison type (color).

The variation within each of the three colors (poison type) alsore�ects the variation due to delivery method.

Note that the p-value in the third ANOVA is very muchsmaller than in the other two ANOVAs.

This is due to appropriately smaller MSE here.

21

The Third ANOVA

The third ANOVA is technically correct in stating that themeans change across factor level combinations.

View the 3 · 4 combinations as t = 12 treatments/samples.

Does the third ANOVA give any insight on the separatecontributions of the type factor and the delivery factor?

No! It is insu�cient in that regard.

It is easy to conceive of situations where the 1st and 2nd

ANOVA produce insigni�cant F -values but the 3rd ANOVAproduces a highly signi�cant F -value.

This can result when the MSE in the �rst two ANOVAs areunduly in�ated, i.e., � MSE in the third ANOVA.

The next 5 slides illustrate this with doctored data.

22

View by Typere

cipr

ocal

tim

e to

dea

th (

1/m

inut

e)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●

●

●

●

●

●

●●

●

●

●

●●●

I

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

II

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

III

insecticide type

The delivery variation is a good part of the �error� variation within type.

23

View by Deliveryre

cipr

ocal

tim

e to

dea

th (

1/m

inut

e)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●

●

●

●

●

●

●

●

●

●

A

●

●

●

●

●

●

●

●

●

●●

●

B

●●

●

●

●

●

●

●

●

●

●

●

C

●

●●●

●

●

●

●

●

●

●

●

D

delivery method

The type variation is a good part of the �error� variation within delivery.

24

View by Typere

cipr

ocal

tim

e to

dea

th (

1/m

inut

e)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●

●

●

●

●

●

●●

●

●

●

●●●

I

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

II

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

III

insecticide type

(III,A) lowered by .4, (I,D) raised by .15

3 type data sets seem well meshed when ignoring the 4 delivery methods.

25

View by Deliveryre

cipr

ocal

tim

e to

dea

th (

1/m

inut

e)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

●

●●

●

●

●

●

●

●

●

●

●

A

●

●

●

●

●

●

●

●

●

●●

●

B

●●

●

●

●

●

●

●

●

●

●

●

C

●

●●●

●

●

●

●

●

●

●

●

D

delivery method

(III,A) lowered by .4, (I,D) raised by .15

4 delivery methods seem well meshed when ignoring the 3 poison types.

26

Three ANOVAs for Modi�ed Data

$ANOVA.type Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)type 2 0.03592 0.01796 1.5582 0.2217Residuals 45 0.51868 0.01153

$ANOVA.delivery Analysis of Variance TableResponse: recip.time

Df Sum Sq Mean Sq F value Pr(>F)delivery 3 0.08425 0.02808 2.6271 0.06208 .Residuals 44 0.47035 0.01069

$ANOVA.type.delivery Analysis of Variance TableResponse: recip.time


27

Three ANOVAs for Modi�ed Data

The �rst two ANOVAs con�rm that neither factor alone (typeor delivery) shows a signi�cant e�ect (p-value ≤ α = .05).

This con�rms the meshing comments made below the last twoplots.

By the data changes we aligned the means of all compareddata sets more closely, but the variation from the ignoredfactor still in�ates the MSE , leading to non-signi�cant results.

The third ANOVA shows a highly signi�cant e�ect oftype:delivery combination. This is e�ected by the muchreduced MSE here (0.00240 as compared to 0.01153 or0.01069).

28

Additive E�ects Model

For now we will deal only with the balanced model.

Same number n of replications per factor level combination.

Unbalanced cases can get quite messy.1

If time permits, may touch back on that later.

Additive E�ects Model:

Yijk = µ+ai+bj+εijk , i = 1, . . . , t1 , j = 1, . . . , t2 , k = 1, . . . , n

with independent error terms εijk ∼ N (0, σ2)

1For an extensive treatment of the unbalanced case see S.R. Searle (1987),Linear Models for Unbalanced Data, John Wiley & Sons, New York

29

Dealing with Identi�ability Issues

The parameters µ, a1, . . . , at1 , b1, . . . , bt2 , are unidenti�able.

Adding a constant c to µ and subtracting it from the ai (bj)give the same means µij = E (Yijk) for all (i , j).

µij = µ+ ai + bj = µ+ c + ai − c + bj = µ′ + a′i + bj

As in the 1-way ANOVA we only need t1 − 1 parameters todistinguish between t1 row levels, and similarly t2 − 1parameters to distinguish between t2 column levels.

There are two customary ways of imposing side conditions thatdeal with this, i.e., to render all parameters as identi�able.∑

i ai = 0 and∑

j bj = 0 sum-to-zero side conditions

a1 = 0 and b1 = 0 set-to-zero side conditions

30

Sum-to-zero Side Conditions

Assume µij = µ+ ai + bj with∑

i ai =∑

j bj = 0 or

a. = b. = 0.

Here we identify µ with the average mean over all levelcombinations, because µ.. = µ+ a. + b.= µ.

Since µ i. = µ+ ai + b. = µ+ ai we can interpretai = µ i. − µ = µ i. − µ.. as the average change from µ = µ..due to level i of factor one when averaged over all levels offactor two.

Similarly, bj = µ.j − µ = µ.j − µ...The parameters µ, ai , bj with a. = b. = 0 de�ne the means µijwith the above additive structure.

They are uniquely identi�ed via the µij , as shown above.

Changes in both levels are additive µij = µ+ ai + bj

31

Set-to-zero Side Conditions

Assume µij = µ? + a?i + b?j with a?1 = b?1 = 0.

Here we identify the parameter µ? with the mean under thefactor level combination (1, 1): µ11 = µ? + a?1 + b?1 = µ?.

We express each change from µ? due to other levels ( 6= 1) infactor one via a?i , i.e., µi1 = µ? + a?i + b?1 = µ? + a?iand thus a?i = µi1 − µ? = µi1 − µ11.Each change from µ? due to other levels ( 6= 1) in factor two isexpressed via b?j , i.e., µ1j = µ? + a?1 + b?j = µ? + b?jand thus b?j = µ1j − µ? = µ1j − µ11.The parameters µ?, a?i , b

?j with a?1 = b?1 = 0 de�ne the means

µij with the above additive structure.

They are uniquely identi�ed via the µij , as shown above.

Changes in both levels are additive µij = µ? + a?i + b?j

32

How to Create Such Mean Structures

Suppose you want to simulate such data.

How would you create such mean structures?

Pick any t1 − 1 + t2 − 1 + 1 = t1 + t2 − 1 numbers a2, . . . , at1 ,b2, . . . , bt2 , and µ.

In the sum-to-zero case take

a1 = −t1∑i=2

ai and b1 = −t2∑j=2

bj =⇒t1∑i=1

ai =

t2∑j=1

bj = 0

In the set-to-zero case take a1 = 0 and b1 = 0

In either case de�ne

µij = µ+ ai + bj

33

Additive Model as Reduced Model

The additive model is a reduced model since in the full modeleach factor level combination (i , j) has its own mean µij .

There are t1 · t2 such means µij which can vary freely.

In the additive model with identi�ability restrictions we onlyhave 1 + (t1 − 1) + (t2 − 1) = t1 + t2 − 1 free parameters.

Note that t1 · t2 − [t1 + t2 − 1] = (t1 − 1) · (t2 − 1) can besubstantially greater than zero.

We get zero only when one of the factors has just one level.

In that case we are back in the 1-way (1-factor) ANOVA.

The second factor only has one level, i.e., does not change.

34

A Tabular View of the Additive Model

t1 · t2 = 3 · 4 Factorial Design

Factor 2

b1 b2 b3 b4

a3 µ+ a3 + b1 µ+ a3 + b2 µ+ a3 + b3 µ+ a3 + b4

Factor 1 a2 µ+ a2 + b1 µ+ a2 + b2 µ+ a2 + b3 µ+ a2 + b4

a1 µ+ a1 + b1 µ+ a1 + b2 µ+ a1 + b3 µ+ a1 + b4

Rows i and i ′ di�er by ai − ai ′ = µij − µi ′j = a?i − a?i ′ ,the same amount across all columns j = 1 . . . , t2.

Columns j and j ′ di�er by bj − bj ′ = µij − µij ′ = b?j − b?j ′ ,the same amount across all rows i = 1, . . . , t1.

Such di�erences are meaningful regardless of additive modelparametrization, i.e., regardless of constraints (sum-to-zero orset-to-zero).

35

Additivity in Factor 1 ⇐⇒ Additivity in Factor 2

If we take 5 numbers, say 5, 7, 9, 2, 3 in a row, and create fournew rows by adding 2, or 4, or 5 we get the following tableau

a?1 = 0 5 7 9 2 3

a?2 = 2 7 9 11 4 5

a?3 = 4 9 11 13 6 7

a?4 = 5 10 12 14 7 8

b?j 0 2 4 −3 −2

The columns di�er automatically by constant amounts b?j .

=⇒ additivity in factor 2.

The column di�erences are set in the �rst row.

They are not a�ected by translating that �rst row to variouslevels via the a?i (additivity in factor 1).

36

Additive Model Decomposition

Yijk = Y... + (Yi.. − Y...) + (Y.j. − Y...)+(Yijk − Y... − [Yi.. − Y...]− [Y.j. − Y...])

= Y... + (Yi.. − Y...) + (Y.j. − Y...) + (Yijk − Yi.. − Y.j. + Y...)= µ+ ai + bj + εijk∑

i

ai = t11

t1

∑i

(Yi.. − Y...) = t1(Y... − Y...) = 0

∑j

bj = t21

t2

∑j

(Y.j. − Y...) = t2(Y... − Y...) = 0

∑ik

εijk = nt11

nt1

∑ik

(Yijk − Yi.. − Y.j. + Y...) = 0

∑jk

εijk = nt21

nt2

∑jk

(Yijk − Yi.. − Y.j. + Y...) = 0

∑ijk

εijk = 0

37

Least Squares Estimates

µ = Y..., ai = Yi.. − Y..., and bj = Y.j. − Y...are the least squares estimates of µ, ai , and bjsubject to the conditions

∑i ai = 0 and

∑j bj = 0.

This follows without calculus from

Q(µ, a,b) = Q(µ, a1, . . . , at1 , b1, . . . , bt2) =∑ijk

(Yijk − µ− ai − bj)2

=∑ijk

{Y... − µ+ [(Yi.. − Y...)− ai ]

+ [(Y.j. − Y...)− bj ] + (Yijk − Yi.. − Y.j. + Y...)}2

=∑ijk

{(Y... − µ)2 + [(Yi.. − Y...)− ai ]

2

+ [(Y.j. − Y...)− bj ]2 + (Yijk − Yi.. − Y.j. + Y...)2

}.

All cross product terms disappear (see next slide).

Note that∑

i ai =∑

j bj = 0, solutions satisfy constraints.

38

Cross Product Terms = 0

∑ijk

(Y... − µ)[(Yi.. − Y...)− ai ] = (Y... − µ)t2n

{∑i

(Yi.. − Y...)−∑i

ai

}= 0

∑ijk

(Y... − µ)[(Y.j. − Y...)− bj ] = (Y... − µ)t1n

∑j

(Y.j. − Y...)−∑j

bj

= 0

Yijk − Yi.. − Y.j. + Y... = εijk

∑ijk

(Y... − µ)εijk = (Y... − µ)∑ijk

εijk = 0

∑ijk

[(Yi.. − Y...)− ai ][(Y.j. − Y...)− bj ] = n∑i

[ai − ai ]∑j

[bj − bj ] = 0

∑ijk

[(Yi.. − Y...)− ai ]εijk =∑i

[(Yi.. − Y...)− ai ]∑jk

εijk = 0

∑ijk

[(Y.j. − Y...)− bj ]εijk =∑j

[(Y.j. − Y...)− bj ]∑ik

εijk = 0

39

Estimates: Sum-to-Zero −→ Set-to-Zero

As sum-to-zero estimates we have:µ = Y..., ai = Yi.. − Y..., and bj = Y.j. − Y...,

=⇒ µij = µ+ ai + bj .

Sum-to-zero estimates −→ set-to-zero estimates via

µ? = µ11 = µ+a1+b1, a?i = µi1−µ11 = ai−a1, b?j = µ1j−µ11 = bj−b1

Note that a?1 = b?1 = 0.

Furthermore, µij = µ? + a?i + b?j = µ+ ai + bj with

µ? = Y1.. + Y.1. − Y...a?i = Yi.. − Y1..b?j = Y.j. − Y.1.

40

Estimates: Set-to-Zero −→ Sum-to-Zero

The set-to-zero estimates µ?, a?i , b?j with a?1 = 0, b?1 = 0,

de�ne µij = µ? + a?i + b?j .

With these we get the sum-to-zero equivalent representation

µ = µ.. = µ? + a?. + b?.ai = µi. − µ.. = a?i − a?.bj = µ.j − µ.. = b?j − b?.

=⇒ µij = µ? + a?i + b?j = µ+ ai + bj

=⇒ µ = Y..., ai = Yi.. − Y..., and bj = Y.j. − Y...Set-to-zero is what lm in R gives as coe�cients.

41

Fitted Models

The �tted mean per treatment combination under eitherparametrization (sum-to-zero or set-to-zero) are the same, i.e.,

µij = µ+ ai + bj = µ? + a?i + b?j

Components of the �tted values have di�erent interpretations.

This is completely analogous to the previous parameter version

µij = µ+ ai + bj = µ? + a?i + b?j .

Explicitly, in terms of the data

µij = Y... + (Yi.. − Y...) + (Y.j. − Y...) = Yi.. + Y.j. − Y... .

Note

εijk = Yijk − Yi.. − Y.j. + Y... = Yijk − µij .

42

Orthogonal Decomposition of the Data Vector

Y =

Y111

.

.

.

Y11n

.

.

.

Y1t21

.

.

.

Y1t2n

.

.

.

.

.

.

Yt111

.

.

.

Yt11n

.

.

.

Yt1t21

.

.

.

Yt1t2n

=

µ...

µ...

µ...

µ...

.

.

.

µ...

µ...

µ...

µ

⊥+

a1...

a1...

a1...

a1...

.

.

.

at1...

at1...

at1...

at1

⊥+

b1...

b1...

bt2...

bt2...

.

.

.

b1...

b1...

bt2...

bt2

⊥+

ε111...

ε11n...

ε1t21...

ε1t2n...

.

.

.

εt111...

εt11n...

εt1t21...

εt1t2n

= µ1+a+b+ε

43

Orthogonalities

∑ijk

µ ai = µ t2n∑i

ai = 0

∑ijk

µ bj = µ t1n∑j

bj = 0

∑ijk

µ εijk = µ∑ijk

εijk = 0

∑ijk

ai bj = n∑i

ai∑j

bj = 0

∑ijk

ai εijk =∑i

ai∑jk

εijk

= 0

∑ijk

bj εijk =∑j

(bj∑ik

εijk

)= 0

44

Sum of Squares (SS) Decomposition

Orthogonality =⇒ following SS decomposition (Pythagoras again)∑ijk

Y 2

ijk =∑ijk

Y 2... +∑ijk

(Yi.. − Y...)2 +∑ijk

(Y.j. − Y...)2

+∑ijk

(Yijk − Yi.. − Y.j. + Y...)2

=∑ijk

µ2 +∑ijk

a2i +∑ijk

b2j +∑ijk

ε2ijk

=⇒∑ijk

(Yijk − Y...)2 =∑ijk

Y 2

ijk −∑ijk

Y 2... =∑ijk

Y 2

ijk −∑ijk

µ2

=∑ijk

a2i +∑ijk

b2j +∑ijk

ε2ijk

=∑ijk

(Yi.. − Y...)2 +∑ijk

(Y.j. − Y...)2

+∑ijk

(Yijk − Yi.. − Y.j. + Y...)2

SST = SSA + SSB + SSE45

Interpretation of SS Decomposition

SST = SSA + SSB + SSE

SST =∑

ijk(Yijk − Y...)2:Total variation of data around the grand or overall meanSSA =

∑ijk(Yi.. − Y...)2:

Variation of means around the grand mean(by factor 1 level, averaged over the levels of factor 2)SSB =

∑ijk(Y.j. − Y...)2:

Variation of means around the grand mean(by factor 2 level, averaged over the levels of factor 1)SSE =

∑ijk(Yijk − µij)2:

Variation of data around the �tted additive model value.

SSE =∑ijk

(Yijk −

µij︷︸︸︷[Y... + (Yi.. − Y...) + (Y.j. − Y...)])2

=∑ijk

(Yijk − Yi.. − Y.j. + Y...)2 =∑ijk

ε2ijk

46

Degrees of Freedom

In Y = µ1 + a + b + ε the component vectors areorthogonal to each other.

There is 1 degree of freedom in µ1 and thus there are areN − 1 = t1t2n − 1 degrees of freedom (df) in Y − µ1 ⊥ µ1and thus in SST .

Although the vector a contains t1 distinct values, only t1 − 1can vary freely, due to sum-to-zero or set-to-zero constraints.

There are t1 − 1 df in that vector and thus in SSA.

Similarly, there are t2 − 1 df in b and thus in SSB .

By orthogonal complement there are

t1t2n−1−(t1−1)−(t2−1) = (t1−1)(t2−1)+t1t2(n−1) = dfE

df in the residual error vector ε and thus in SSE .

47

ANOVA Table for the Additive Model

Source SS df MS F

A SSA t1 − 1 MSA = SSA/(t1 − 1) MSA/MSE

B SSB t2 − 1 MSB = SSB/(t2 − 1) MSB/MSE

Error SSE dfE MSE = SSE/dfE

Total SST t1t2n − 1

dfE = t1t2n−1− (t1−1)− (t2−1) = (t1−1)(t2−1) + t1t2(n−1)

View (t1 − 1)(t2 − 1) = t1t2 − [1 + (t1 − 1) + (t2 − 1)] as thenumber of means µij left unexplained by the additive model

View t1t2(n − 1) as the degrees of freedom of within cellvariation (n − 1 per cell) totaled over all t1t2 cells.

48

lm on Reciprocal Time to Death

out <- lm(recip.time ~ type + delivery)

out$coef

Coefficients:(Intercept) typeII typeIII deliveryB deliveryC deliveryD

0.26977 0.04686 0.19964 -0.16574 -0.05721 -0.13583

µ? a?2 a?3 b?2 b?3 b?4Note the implicit set-to-zero form of the parameter estimatesin out$coef! intercept = µ? with a?1 = b?1 = 0.

µ? represents the mean under the treatment combination(typeI,deliveryA)

a?i , b?j represent additive mean deviation e�ects from this

baseline µ?.

49

Sum-to-Zero Estimates

Below are the sum-to-zero estimates corresponding to the previousslide, using the conversion formulas from slide 41.

$mu.hat0.2622376

µ

$a.hattypeI typeII typeIII

-0.08216887 -0.03530475 0.11747362

a1 a2 a3

$b.hatdeliveryA deliveryB deliveryC deliveryD0.08969690 -0.07604334 0.03248336 -0.04613693

b1 b2 b3 b450

anova(lm(recip.time ∼ type + delivery))



type 2 0.34877 0.17439 71.708 2.865e-14 ***delivery 3 0.20414 0.06805 27.982 4.192e-10 ***Residuals 42 0.10214 0.00243---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

51

Distributional Facts

The ANOVA table states p-values based on null-distributionsderived from the model and εijk i .i .d . ∼ N (0, σ2).Under the additive model

⇒ SSE ∼ σ2χ2dfE and E(MSE ) = E(SSE/dfE ) = σ2

SSA ∼ σ2χ2t1−1,λ1 with ncp

λ1 =∑ijk

(µ i. − µ..)2/σ2 =∑ijk

a2i /σ2 = n t2

∑i

a2i /σ2

E(MSA) = E

(SSA

t1 − 1

)= σ2 + σ2

λ1

t1 − 1= σ2 +

nt2

t1 − 1

∑i

a2i

SSB ∼ σ2χ2t2−1,λ2 with ncp

λ2 =∑ijk

(µ.j − µ..)2/σ2 =∑ijk

b2j /σ2 = n t1

∑j

b2j /σ2

E(MSB) = E

(SSB

t2 − 1

)= σ2 + σ2

λ2

t2 − 1= σ2 +

nt1

t2 − 1

∑j

b2j

nt2 and nt1 act as multipliers in the noncentrality parameters!Looking at both factors jointly, we bene�t from the commonσ2 assumption.

52

Distributional Facts (continued)

SSA, SSB and SSE are independent due to orthogonality ofcomponent vectors.

Under HA : a1 = . . . = at1 = 0 we have λ1 = 0 and thusFA = MSA/MSE ∼ Ft1−1,dfE .

Under HB : b1 = . . . = bt2 = 0 we have λ2 = 0 and thusFB = MSB/MSE ∼ Ft2−1,dfE .

These F -distributions are the basis for the p-values in theprevious ANOVA table.

These p-values correspond to testing HA and HB , respectively.

53

How Well Does the Additive Model Fit?

In the additive model we have: µij = µ+ ai + bjCould compare the full model estimate of µij , namely theaverage Yij. over the n observations in cell (i , j), with theadditive model �tted value for that same cell, i.e., with

µij = µ+ ai + bj = Y... + (Yi.. − Y...) + (Y.j. − Y...)= Yi.. + Y.j. − Y... .

Yij. depends only on data from cell (i , j), averaging n values.

µij depends on data from cells other than cell (i , j) and ismore strongly averaged.

54

Full Model Representation

The full model Yijk = µij + εijk can be written in the followingequivalent form:

Yijk = µ+ ai + bj + (ab)ij + εijk = µ+ ai + bj + cij + εijk

Here it is assumed that εijki.i.d.∼ N (0, σ2).

The equivalent full model form decomposes the meanstructure in µij into two components, namely the previouslyconsidered additive model µ+ ai + bj and the extentcij = (ab)ij to which this additive model does not explain µij ,i.e., cij = µij − (µ+ ai + bj).

These parameters cij = (ab)ij are called interaction terms.

The use of the notational device (ab) is just a mnemonic toindicate the inseparable or joint action of the factors A and B ,i.e., their interaction.

55

Identi�ability Issues

While there are t1t2 mean parameters µij , the alternateparametrization has 1 + t1 + t2 + t1t2 parameters

µ, a1, . . . , at1 , b1, . . . , bt2 , c11, . . . , ct1t2

To make these latter parameters identi�able we need toimpose again certain side conditions.

There are two customary ways which parallel the previousidenti�ability resolution in the case of the additive model.

1) a1 = b1 = c1j = ci1 = 0 ∀i , jset-to-zero side condition (lm output in R)

2)∑

i ai =∑

j bj =∑

i cij =∑

j cij = 0 ∀i , jsum-to-zero side condition

56

µij =⇒ Sum-to-Zero Parametrization

De�ne

µ = µ.. =1

t1t2

∑i

∑j

µij , µi. =1

t2

∑j

µij , µ.j =1

t1

∑i

µij

and then all parameters µ, ai , bj , cij are determined from the µij via

ai = µi. − µ.., bj = µ.j − µ..

cij = µij − µi. − µ.j + µ..

= µij − (µi. − µ..)− (µ.j − µ..)− µ.. = µij − ai − bj − µ

=⇒ µij = µ+ai+bj+cij with∑i

ai =∑j

bj =∑i

cij =∑j

cij = 0

satisfying the sum-to-zero side conditions.57

µij =⇒ Set-to-Zero Parametrization

De�ne all parameters µ?, a?i , b?j , c

?ij from the µij via

µ? = µ11, a?i = µi1−µ11 = µi1−µ?, b?j = µ1j−µ11 = µ1j−µ?,

c?ij = µij − µi1 − µ1j + µ11

= µij − (µi1 − µ11)− (µ1j − µ11)− µ11

= µij − a?i − b?j − µ? .

=⇒ µij = µ?+a?i +b?j +c?ij with a?1 = 0, b?1 = 0, c?i1 = c?1j = 0 ∀i , j

satisfying the set-to-zero side conditions.

Whatever the parametrization, we can easily go from one to theother via the previous de�nitions in terms of the µij .

58

Decomposition and Least Squares Estimation

Extend decomposition as follows (with orthogonal components)

Yijk = Y... + (Yi.. − Y...) + (Y.j. − Y...)+(Yij. − Yi.. − Y.j. + Y...) + (Yijk − Yij.)

= µ+ ai + bj + cij + εijk∑ijk

(Yijk−µ−ai−bj−cij)2

=∑ijk

[(Y... − µ) + (Yi.. − Y... − ai ) + (Y.j. − Y... − bj)

+(Yij. − Yi.. − Y.j. + Y... − cij) + (Yijk − Yij.)]2

=∑ijk

[(Y... − µ)2 + (Yi.. − Y... − ai )

2 + (Y.j. − Y... − bj)2

+ (Yij. − Yi.. − Y.j. + Y... − cij)2 + (Yijk − Yij.)2

]59

Full Model Least Squares Estimates (LSEs)

Cross product terms = 0 in the previous quadratic expansion,because of the component orthogonality in the decomposition.

The least squares estimates (LSEs, in sum-to-zero form) are

µ = Y... , ai = Yi.. − Y... , bj = Y.j. − Y...cij = Yij. − Yi.. − Y.j. + Y...

with E (µ) = µ, E (ai ) = ai , E (bj) = bj and E (cij) = cij ,

The LSEs are unbiased.

The �tted values for the µij are

ˆµij = µ+ ai + bj + cij = µij + cij

= Y... + (Yi.. − Y...) + (Y.j. − Y...)+(Yij. − Yi.. − Y.j. + Y...)

= Yij. i.e., the cell means.

with residuals εijk = Yijk − Yij..60

Sum of Squares Decomposition

Using µ = ai = bj = cij = 0 in previous decomposition∑ijk

Y 2

ijk =∑ijk

[Y 2... + (Yi.. − Y...)2 + (Y.j. − Y...)2

+(Yij. − Yi.. − Y.j. + Y...)2 + (Yijk − Yij.)2]

∑ijk

(Yijk − Y...)2 =∑ijk

[(Yi.. − Y...)2 + (Y.j. − Y...)2

+ (Yij. − Yi.. − Y.j. + Y...)2 + (Yijk − Yij.)2]

SST = SSA + SSB + SSAB + SSE

SST =∑ijk

(Yijk − Y...)2

SSA =∑ijk

(Yi.. − Y...)2 =∑ijk

a2i , SSB =∑ijk

(Y.j. − Y...)2 =∑ijk

b2j

SSAB =∑ijk

(Yij. − Yi.. − Y.j. + Y...)2 =∑ijk

c2ij , SSE =∑ijk

(Yijk − Yij.)2 =∑ijk

ε2ijk

61

ANOVA Table for the Full Model

Source SS df MS F

A SSA t1 − 1 MSA = SSA/(t1 − 1) MSA/MSE

B SSB t2 − 1 MSB = SSB/(t2 − 1) MSB/MSE

AB SSAB (t1 − 1)(t2 − 1) MSAB = SSAB/[(t1 − 1)(t2 − 1)] MSAB/MSE

Error SSE t1t2(n − 1) MSE = SSE/[t1t2(n − 1)]


62

Distributional Facts for the Full Model

SSAσ2∼ χ2t1−1, λA , λA =

∑ijk a

2i

σ2,

SSBσ2∼ χ2t2−1, λB , λB =

∑ijk b

2j

σ2

SSABσ2

∼ χ2(t1−1)(t2−1), λAB, λAB =

∑ijk c

2ij

σ2,

SSEσ2∼ χ2t1t2(n−1)

SSA, SSB , SSAB , SSE are statistically independent (orthogonality).

FA = MSA/MSE ∼ Ft1−1, t1t2(n−1), λA

FB = MSB/MSE ∼ Ft2−1, t1t2(n−1), λB

FAB = MSAB/MSE ∼ F(t1−1)(t2−1), t1t2(n−1), λAB

63

Expected MS for the Full Model

E (MSA) = E

(SSAt1 − 1

)= σ2 +

∑ijk a

2i

t1 − 1= σ2

(1 +

λAt1 − 1

)

E (MSB) = E

(SSBt2 − 1

)= σ2 +

∑ijk b

2j

t2 − 1= σ2

(1 +

λBt2 − 1

)

E (MSAB) = E

(SSAB

(t1 − 1)(t2 − 1)

)= σ2 +

∑ijk c

2ij

(t1 − 1)(t2 − 1)

= σ2(1 +

λAB(t1 − 1)(t2 − 1)

)

E (MSE ) = σ2.

64

F -Tests for Full Model

Reject H0A : a1 = . . . = at1 = 0 whenever FA is too large.

For a level α test reject H0A wheneverFA ≥ qf(1− α,t1 − 1,t1t2(n− 1)).

Reject H0B : b1 = . . . = bt2 = 0 whenever FB is too large.

For a level α test reject H0B wheneverFB ≥ qf(1− α,t2 − 1,t1t2(n− 1)).

Reject H0AB : cij = 0 ∀i , j whenever FAB is too large.

For a level α test reject H0AB wheneverFAB ≥ qf(1− α, (t1 − 1)(t2 − 1),t1t2(n− 1)).

H0AB : cij = 0 ∀i , j means that the additive modelµij = µ+ ai + bj is su�cient to explain the mean structure.

Rejecting H0AB =⇒ the additive model will not provide asu�cient explanation.

65

Comments on the Full Model ANOVA Table

SSadditive modelE =

∑ijk

(Yijk−Yi..−Y.j.+Y...)2 6= SS full modelE =

∑ijk

(Yijk−Yij.)2

∑ijk

(Yijk−Yi..−Y.j.+Y...)2 =∑ijk

(Yijk−Yij.)2+∑ijk

(Yij.−Yi..−Y.j.+Y...)2

MSadditive modelE =

∑ijk(Yijk − Yi.. − Y.j. + Y...)2

(t1 − 1)(t2 − 1) + t1t2(n − 1)

6= MS full modelE =

∑ijk(Yijk − Yij.)2t1t2(n − 1)

F additive modelA = MSA/MSadditive model

E

6= F full modelA = MSA/MS full model

E

F additive modelB = MSB/MSadditive model

E

6= F full modelB = MSB/MS full model

E

66

Reciprocal Time to Death (Insecticide Data)

> round(recip.time,5)[1] 0.32258 0.22222 0.21739 0.23256 0.27778 0.34483 0.25000[8] 0.43478 0.45455 0.47619 0.55556 0.43478 0.12195 0.09091

[15] 0.11364 0.13889 0.10870 0.16393 0.20408 0.08065 0.33333[22] 0.27027 0.26316 0.34483 0.23256 0.22222 0.15873 0.13158[29] 0.22727 0.28571 0.32258 0.25000 0.43478 0.40000 0.41667[36] 0.45455 0.22222 0.14085 0.15152 0.16129 0.17857 0.09804[43] 0.14085 0.26316 0.33333 0.27778 0.32258 0.30303

67

Factors of Insecticide Data

> type[1] I I I I II II II II III III III III

[13] I I I I II II II II III III III III[25] I I I I II II II II III III III III[37] I I I I II II II II III III III IIILevels: I II III

> delivery[1] A A A A A A A A A A A A B B B B B B B B B B B B

[25] C C C C C C C C C C C C D D D D D D D D D D D DLevels: A B C D

Both type and delivery are in factor form.

Don't need to invoke as.factor(type) andas.factor(delivery) in the call of lm.

68

out.lmFULL <- lm(recip.time ∼ type*delivery)

recip.time ∼ type*delivery

compare←→ recip.time ∼ type+delivery

> out.lmFULLCall:lm(formula = recip.time ~ type * delivery)

Coefficients:(Intercept) typeII typeIII

0.248688 0.078159 0.231580deliveryB deliveryC deliveryD-0.132342 -0.062416 -0.079720

typeII:deliveryB typeIII:deliveryB typeII:deliveryC-0.055166 -0.045030 0.006961

typeIII:deliveryC typeII:deliveryD typeIII:deliveryD0.008646 -0.076974 -0.091368

Note the set-to-zero form of parameter estimates.

69

What else is in out.lmFULL?

> names(out.lmFULL)[1] "coefficients" "residuals" "effects" "rank"[5] "fitted.values" "assign" "qr" "df.residual"[9] "contrasts" "xlevels" "call" "terms"

[13] "model"

70

Full ANOVA: anova(out.lmFULL)



type 2 0.34877 0.17439 72.6347 2.310e-13 ***delivery 3 0.20414 0.06805 28.3431 1.376e-09 ***type:delivery 6 0.01571 0.00262 1.0904 0.3867Residuals 36 0.08643 0.00240---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Thus it appears that the additive model is quite acceptable.

Both factors play strongly in the additive model.

71

Graphical View of No Interaction E�ect

If one view looks parallel so will the other.

●

●

●●

●

●

●

●

●

●

●

●

0.0

0.1

0.2

0.3

0.4

0.5

0.6

A B C D

●

●

●

IIIIII

●

●

●

●●

●

●

●

●

● ●

●

0.0

0.1

0.2

0.3

0.4

0.5

0.6

I II III

●

●

●

●

ABCD

72

If One View Looks Parallel so Will the Other.

µ11 µ12 µ13 µ14µ11 + d2 µ12 + d2 µ13 + d2 µ14 + d2µ11 + d3 µ12 + d3 µ13 + d3 µ14 + d3

Row di�erences are constant!

=⇒ Column di�erences are constant as well, i.e.,∆2 = µ12 − µ11 is the di�erence between column 2 and column 1∆3 = µ13 − µ11 is the di�erence between column 3 and column 1∆4 = µ14 − µ11 is the di�erence between column 4 and column 1.

µ11 µ11 + ∆2 µ11 + ∆3 µ11 + ∆4

µ21 µ21 + ∆2 µ21 + ∆3 µ21 + ∆4

µ31 µ31 + ∆2 µ31 + ∆3 µ31 + ∆4

Used µ11 = µ11, µ21 = µ11 + d2, and µ31 = µ11 + d3.

Similarly one argues going the other direction.

73

Examining Factor Level Di�erences

Strong evidence of factor level di�erences.

Examine them individually to see which di�erences matter.

A naive approach: Perform a 2-sample t-test.

Or look at the corresponding con�dence intervals.

Comparing type I with type II means we gett.test(recip.time[type == "I"],recip.time[type == "II"],var.equal=T)

Two Sample t-test

data: recip.time[type == "I"] and recip.time[type == "II"]t = -1.6383, df = 30, p-value = 0.1118alternative hypothesis: true difference in means

is not equal to 095 percent confidence interval:-0.10528550 0.01155725sample estimates:mean of x mean of y0.1800688 0.2269329

74

Comparing Types I and II of Insecticide

reciprocal time to death (1/minute)

● ●● ●● ● ●

● ● ● ●●●●

●●

●●● ●●● ● ● ●●

●● ●● ● ●I

II

type

of p

oiso

n

0.0 0.1 0.2 0.3 0.4 0.5

points are jittered vertically and horizontally

75

Getting MSE in Types I and II Comparison

> anova(lm(recip.time[type=="I" | type=="II"]~+ type[type=="I" | type=="II"]))Analysis of Variance Table

Response: recip.time[type == "I" | type == "II"]Df Sum Sq Mean Sq F value Pr(>F)

type[type == "I" | type == "II"] 1 0.017570 0.017570 2.6839 0.1118Residuals 30 0.196394 0.006546

Note the same p-value 0.1118 as in previous t-test.

We are doing the same test, since t2f = F1,f .

From this table we get s =√MSE =

√0.006546 = .08091.

This could also have been backed out from previous t-basedcon�dence interval.

type == "I" | type == "II" = logical OR.

76

What is Wrong?

In the previous 2-sample t-test/interval we treated theobservations as i.i.d. from two populations.

Ignored the known variations due to the delivery method.

The pooled sample standard deviation from these 2 �samples�confounds variation between delivery method means withvariation (σ) of within (delivery,type) combination.

Our �reference distribution� will thus be too dispersed.

Our test will be less discriminating.

Our con�dence interval will be too wide.

77

Closer Look in Comparing Types I and II

reciprocal time to death (1/minute)

0.0 0.1 0.2 0.3 0.4 0.5

I

II

type

of p

oiso

n

ABCD

delivery method

Note the reduced variability within same color clusters.

78

Correct Approach

According to our (accepted) additive model we have

Y1jk = µ+ a1 + bj + ε1jk and

Y2jk = µ+ a2 + bj + ε2jk with εijki.i.d.∼ N (0, σ2)

The di�erence due to type I and type II is captured by a1 − a2.

This can be interpreted as the di�erence between the meanresponse under type I and the mean response under type II:

µ1. − µ2. =

∑j µ1j

4−∑

j µ2j

4

=

∑j(µ+ a1 + bj)−

∑j(µ+ a2 + bj)

4= a1 − a2

The e�ect of type = I vs. type = II can be interpreted as acontrast in cell means.

Think of it as contrasting the e�ects of interest whilecanceling/averaging out other e�ects.

79

Contrast in Full Model

Even in the full model with interactions the previous contrast inmeans stays the same since

µ1. − µ2. =

∑j µ1j

4−∑

j µ2j

4

=

∑j(µ+ a1 + bj + c1j)−

∑j(µ+ a2 + bj + c2j)

4

= µ+ a1 +

∑j(bj + c1j)

4− µ− a2 −

∑j(bj + c2j)

4

= a1 − a2

µ, bj , c1j and c2j are canceled out.

80

Estimated Contrast

The natural estimate of µ1. − µ2. = a1 − a2 is

µ1.−µ2. = a1−a2 = (Y1..−Y...)−(Y2 ..−Y...) = Y1..−Y2 ..

Same as in previous (naive) 2-sample t-test/interval.Can also be viewed as the contrast of estimated cell averages

a1 − a2 = Y1.. − Y2 ..

=Y11. + Y12. + Y13. + Y14.

4− Y21. + Y22. + Y23. + Y24.

4

=ˆµ11 + ˆµ12 + ˆµ13 + ˆµ14

4−

ˆµ21 + ˆµ22 + ˆµ23 + ˆµ244

=µ11 + µ12 + µ13 + µ14

4− µ21 + µ22 + µ23 + µ24

4

because ci1 + ci2 + ci3 + ci4 = 0 in∑

jˆµij =

∑j(µij + cij).

81

Correct Approach (continued)

However, var(Y1..−Y2..) = var(Y1..)+var(Y2..) =σ2

nt2+σ2

nt2=

2σ2

nt2

(Y1.. − Y2.. − (a1 − a2))/(σ√

2nt2

)s/σ

=(Y1.. − Y2.. − (a1 − a2))

s√2/(nt2)

=a1 − a2 − (a1 − a2)

s√2/(nt2)

∼ tf .

We have two options in choosing s and the corresponding f :

s2 = s2× = MS full modelE =

∑ijk(Yijk − Yij.)2t1t2(n − 1)

and thus f = t1t2(n − 1) .

or s2 = s2+ = MSadditive modelE =

∑ijk(Yijk − Yi.. − Y.j. + Y...)2

t1t2(n − 1) + (t1 − 1)(t2 − 1)

and thus f = t1t2(n − 1) + (t1 − 1)(t2 − 1) = t1t2n − t1 − t2 + 1.

82

t-Test for H0 : a1 = a2

Reject H0 : a1 = a2 when∣∣∣∣∣ a1 − a2√2MSE/(nt2)

∣∣∣∣∣ > t1−α/2,f

or |a1 − a2| >√2MSE/(nt2)t1−α/2,f

or |a1 − a2| > SE (a1 − a2)t1−α/2,f = LSDA ,

LSDA = least signi�cant di�erence with which any estimateddi�erences ai − ai ′ (i 6= i ′) in factor A levels can be compared.

Parallel with our previous use of LSD in the ANOVA situation.

83

t-Test for H0 : b1 = b2

Reject H0 : b1 = b2 when∣∣∣∣∣ b1 − b2√2MSE/(nt1)

∣∣∣∣∣ > t1−α/2,f

or |b1 − b2| >√2MSE/(nt1)t1−α/2,f

or |b1 − b2| > SE (b1 − b2)t1−α/2,f = LSDB ,

LSDB = least signi�cant di�erence with which any estimateddi�erences bj − bj ′ (j 6= j ′) in factor B levels can be compared.

Note the change (t2 ←→ t1) between LSDA and LSDB .

84

Factor Level Means

> mean(recip.time[type=="I"])[1] 0.1800688> mean(recip.time[type=="II"])[1] 0.2269329> mean(recip.time[type=="III"])[1] 0.3797112

> mean(recip.time[delivery=="A"])[1] 0.3519345> mean(recip.time[delivery=="B"])[1] 0.1861943> mean(recip.time[delivery=="C"])[1] 0.294721> mean(recip.time[delivery=="D"])[1] 0.2161007

85

LSDA and LSDB

For the Type Factor

LSDA = t.975,f SE (a1 − a2) = 2.028094√.00240 2

4·4 = 0.03513

for the full model with f = 36.

LSDA = t.975,f SE (a1 − a2) = 2.018082√.00243 2

4·4 = 0.03517

for the additive model with f = 42.

For the Delivery Factor

LSDB = t.975,f SE (b1 − b2) = 2.028094√.00240 2

3·4 = 0.04056

for the full model with f = 36.

LSDB = t.975,f SE (b1 − b2) = 2.018082√.00243 2

3·4 = 0.04061

for the additive model with f = 42.

86

LSD Groupings

Type

LSDA = .0352

Poison Mean LSDType µi. Grouping

I 0.180 1

II 0.227 2

III 0.380 3

Note di�erenceto naive 2-sample t-test.

Delivery

LSDB = .0406

Poison Mean LSDDelivery µ.j Grouping

B 0.186 1

D 0.216 1

C 0.295 2

A 0.352 3

87

Looking Back

0.2

0.3

0.4

type

mea

n of

rec

ip.ti

me

I II III

delivery

ACDB

0.2

0.3

0.4

delivery

mea

n of

rec

ip.ti

me

A B C D

type

IIIIII

88

interaction.plot

The plots on the previous slide were produced by:

> par(mfrow=c(2,1),mar=c(4,4,1,1)+.1)> interaction.plot(type,delivery,recip.time,

col=c("green","red","blue","cyan"))> interaction.plot(delivery,type,recip.time,

col=c("green","red","blue"))

Here mar=c(4,4,1,1)+.1 inside par sets plot margins.

mfrow=c(2,1) sets up plotting for two plots per page,one above the other.

89

The Full Model Revisited

The full model is Yijk = µij + εijk .

We can reparametrize it as

Yijk = µ+ai+bj+cij+εijk with replications k = 1, . . . , n, where

1) µ =∑

ij µij/(t1t2) = µ.. is the grand mean.2) ai =

∑j(µij − µ..)/t2 = µi. − µ.. with

∑i ai = 0.

3) bj =∑

i (µij − µ..)/t2 = µ.j − µ.. with∑

j bj = 0.

ai and bj are also referred to as main e�ects.4) cij = µij − µi. − µ.j + µ.. = µij − (µ+ ai + bj) with∑

i cij =∑

j cij = 0.

In the additive model the only change is 4) cij = 0 ∀ i , j .

90

The Interpretation of ai − ai ′

In the additive or main e�ects model we have

ai−ai ′ = µij−µi ′j = µ+ai +bj−(µ+ai ′+bj) ∀j = 1, . . . , t2 ,

ai − ai ′ = di�erence in mean response between levels i and i ′

of factor A, and it is the same for each level j of factor B .

In the full or interaction model (since b. = ci. = ci ′. = 0)

ai−ai ′ = µi.−µi ′. = µ+ai+b.+ci.−(µ+ai ′+b.+ci ′.) ∀j = 1, . . . , t2 ,

ai − ai ′ = di�erence in mean response between levels i and i ′

of factor A, when averaged over all levels j of factor B .

Corresponding interpretations hold for bj − bj ′ .

91

Understanding Interactions

Main e�ects ai and bj are called that way because theiradditive e�ects on the mean µij are easily understood.

Interactions can be more complicated and may need morescrutiny in order to develop some understanding.

We will just give a few example situations that illustrate somedistinct and very di�erent situations.

There are certainly many other possibilities.

For simplicity we consider a 2× 4 two-factor experiment.

92

Interaction Pattern 1

●

●

A1.B1 A2.B1 A1.B2 A2.B2 A1.B3 A2.B3 A1.B4 A2.B4

24

68

93



24

68

94


●

●

●


24

68

95

Comments on Interaction Patterns

Pattern 1 shows a linear interaction trend along the levels offactor B .

Factor B seems to have no additive or main e�ect.

Pattern 2 seems to have no additive e�ect from factor Band almost no additive e�ect from factor A.

Only for B4 of factor B is there a clear change in factor A.

Hence B4 acts as an interaction switch.

Pattern 3 shows no additive or main e�ect due to factor B .

If the colors of the last two boxes were switched we wouldhave a clear additive or main e�ect due to factor A (±2.5).This would give a far simpler data explanation and it suggeststhe possibility of a labeling error.

96

Randomized Complete Block Designs

When we did our ANOVA to examine the e�ects of a factor Aof interest, we saw that the power of the F -test is anincreasing function of the noncentrality parameterλ =

∑i ni (µi − µ)2/σ2.

Cannot in�uence the size of treatment e�ects, |µi − µ|,We can in�uence the sample sizes ni and possibly σ.

Note σ −→ σ/2 ⇐⇒ ni −→ 4 · ni !!

How can we in�uence σ itself?

We need to understand what may a�ect σ.

Often σ is due to variability of hidden or ignored factors.

Consider the delivery factor in our insecticide experiment.

It had a de�nite e�ect on the measured response times.

Had we ignored it or left it to happenstance which delivery wasused for each experimental unit (insect), we would haveconfounded the variability due to delivery with the remainingvariability within (delivery,type).

97

Ignoring a Factor

Assuming an additive model (similar reasoning under fullmodel):

Source SS df MS F

A SSA t1 − 1 SSA/(t1 − 1) MSA/MSE

B SSB t2 − 1 SSB/(t2 − 1) MSB/MSE

Error SSE dfE SSE/dfE


If we ignored factor B , we would treat SS ′E = SSB + SSE asour error sum of squares with df

′E = dfE + t2 − 1.

SS ′E/df′E = MS ′E would be a legitimate estimate of σ2

if b1 = . . . = bt2 = 0.

If not, MS ′E would be in�ated,

it would estimate σ′2 > σ2 ⇒ loss of power.

98

Blocking

Blocking consists of stratifying experimental units into groupsthat will have more homogeneous responses within groups.

Possibly quite inhomogeneous responses between groups.

Such grouping/blocking can be accomplished by anappropriately chosen factor, where the levels of that factorde�ne the di�erent groups.

Blocking will be bene�cial if the factor used for blocking causesvariation in the response as the levels of that factor change.

Bene�cial in the sense that this variation can be eliminated.

Thus experimental units within a level of that factor (i.e.,within a block) will not experience that change and will thusbe more homogeneous in their response.

Treatment comparisons within blocks, block by block.

99

Typical Blocking Criteria

Location: Experiments are conducted over varying locationsand location is judged to have an e�ect on the response.

Time: If time of day, month, or year are likely to a�ectresponse. Results of the study are to stand regardless of time.

Litters: If animals in the same litter are likely to produce morehomogeneous responses. Useful in medical experiments.

Batches of Material: Variations in the process for creatingexperimental material are likely to show up in the responses.

Any variation inducing aspect of an experiment (education,income, . . .) that is not considered a treatment of interest.

100

Sir Ronald Aylmer Fisher

The statistician Sir Ronald Aylmer Fisher(1890-1962)developed experimental design in an agricultural setting atRothamstead Experimental Station.

See http://www.bookrags.com/Ronald_Fisher onhis great in�uence as one of the founding fathers of statistics.

To some he is more famous for his work in Genetics.

�He bred poultry, mice, snails, and other creatures andpublished his �ndings in several papers that contributed toscientists' overall understanding of genetic dominance.�

Apparently he had much to do with the fact that thesigni�cance level of .05 is so entrenched until today, see:

http://www.jerrydallal.com/LHSP/p05.htm

Thus it is only �tting to consider an agricultural experiment toillustrate blocking.

101

Nitrogen Fertilizer Timing

A nitrogen fertilizer can be administered according to 6di�erent timing schedules (treatments).

The response is the nitrogen uptake (ppm×10−2).The experimental material: One irrigated �eld.

Subdividing the �eld into di�erent experimental units for usewith di�erent treatments could be a�ected by soil moisturevariation, caused by a sloping �eld gradient.

This moisture gradient is mainly across the width of the �eld.

This suggest rows along the �eld length as blocks.

102

Field Moisture Gradient

wet

dry

soil

wet

ness

103

The Data

Field subdivided into 4 rows (or blocks) with 6 plots each.

The 6 treatments were randomly assigned to each row.

Remaining moisture variability within rows a�ects MSE .

rowtreatment

response

12

40.895

37.994

37.181

34.986

34.893

42.07

21

41.223

49.424

45.856

50.155

41.992

46.69

36

44.573

52.685

37.611

36.942

46.654

40.23

42

41.904

39.206

43.295

40.453

42.911

39.97

Also available as fertilizerdata.csv on web site.

104

Randomized Complete Block (RCB) Design

Experimental units are blocked into presumably morehomogeneous groups.

The blocks are complete, i.e., each treatment in each block.

The blocks are balanced

t1 = 4 observations for each treatment level.t2 = 6 treatments for each block level (row).n = 1 observation per (block,treatment) combination.

105

Nitrogen Fertilizer Box Plots

1 2 3 4 5 6

3540

4550

treatment

nitr

ogen

upt

ake

(ppm

×

10−2

)

1 2 3 4

3540

4550

row

nitr

ogen

upt

ake

(ppm

×

10−2

)

106

ANOVA Table

Source SS df MS F

Block SSBlock t1 − 1 SSBlock/(t1 − 1) MSBlock/MSE

Treatment SSTreat t2 − 1 SSTreat/(t2 − 1) MSTreat/MSE

Error SSE dfE SSE/dfE


Note that here n = 1, thus t1t2n − 1 = t1t2 − 1, anddfE = (t1 − 1)(t2 − 1) + t1t2(n − 1) = (t1 − 1)(t2 − 1).

The table is the same as in the additive 2-factor ANOVA.

Just relabel factors A and B to Block and Treatment.

107

Nitrogen Fertilizer Results

> anova(lm(response~as.factor(row)+as.factor(treatment),data=fertilizerdata))


Response: responseDf Sum Sq Mean Sq F value Pr(>F)

as.factor(row) 3 197.004 65.668 9.1198 0.001116 **as.factor(treatment) 5 201.316 40.263 5.5917 0.004191 **Residuals 15 108.008 7.201---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Both treatment and blocking factor are signi�cant at .005.

108

Nitrogen Fertilizer Results (Ignoring the Blocking)

> anova(lm(response~as.factor(treatment),data=fertilizerdata))


Response: responseDf Sum Sq Mean Sq F value Pr(>F)

as.factor(treatment) 5 201.316 40.263 2.3761 0.08024 .Residuals 18 305.012 16.945---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Not signi�cant at .05.

A�ected by extra variation induced by wetness gradient.

Less discrimination power to see the fertilizer timing e�ect.

109

F -Statistics in the Last Two ANOVAs

Compare the treatment F -statistics for the last two ANOVAs.

F(1)Treat =

∑ij(Y.j − Y..)2/5∑

ij(Yij − Y.j − [Yi. − Y..])2/15

and F(2)Treat =

∑ij(Y.j − Y..)2/5∑ij(Yij − Y.j)2/18

In F(1)Treat the variation of the Yij around Y.j is corrected for the

variation due to Yi. − Y.., i.e., the row variation.

In F(2)Treat such a correction is omitted, i.e., the row variation is

ignored and absorbed as part of the MSE :∑ij

(Yij−Y.j)2 =∑ij

(Yij−Y.j−[Yi.−Y..])2+∑ij

(Yi. − Y..)2

110

The Last Identity

The previous decomposition could use some elaboration:

∑ij

(Yij − Y.j − [Yi. − Y..])2 =∑ij

(Yij − Y.j )2 +∑ij

(Yi. − Y..)2

−2∑ij

(Yij − Y.j )(Yi. − Y..)

=∑ij

(Yij − Y.j )2 +∑ij

(Yi. − Y..)2

−2∑ij

(Yi. − Y..)2

=∑ij

(Yij − Y.j )2 −∑ij

(Yi. − Y..)2

=⇒∑ij

(Yij − Y.j )2 =∑ij

(Yij − Y.j − [Yi. − Y..])2 +∑ij

(Yi. − Y..)2

111

A Look at the Block Adjustment

Model: Yij = µ+ bi + τj + εij

Estimated block e�ect bi = Yi. − Y..Consider the block adjusted observationsZij = Yij − bi = Yij − (Yi. − Y..)These Zij , when plotted on separate levels for each blockidenti�er i , will look more aligned, more like �replicates.�Thus it makes sense to look at the average Z.j − Z.. overthose blocks as estimate for the j th treatment e�ect.

Z.j − Z.. = Y.j − 0− (Y.. − 0) = Y.j − Y..Zij − Z.j = Yij − (Yi. − Y..)− (Y.j − 0) = Yij − Yi. − Y.j + Y..Compare the treatment e�ect dispersion∑

ij(Z.j − Z..)2 =∑

ij(Y.j − Y..)2against the within �replication� dispersion:∑

ij(Zij − Z.j)2 =∑

ij(Yij − Yi. − Y.j + Y..)2which leads back to our F

(1)Treat

.

112

Visual Block Adjustment

8 9 10 11 12

●

block 1

●

block 2

●block 3

●block 4

response

9.6 9.8 10.0 10.2 10.4

●

block 1

●

block 2

●block 3

●block 4

response adjusted for block effect

113

Comments on the Visual Block Adjustments

Clear block e�ect.

Compare variability between and within blocks.

After the blocks are shifted to remove the individual blocke�ects the data sets appear aligned on top of each other.

Pattern relationship within each block is undisturbed.

We are just using a magni�ed scale.

Easy to see di�erences between symbol groups (treatments)compared to variation within symbol groups.

Not easy to see without prior alignment.

114

Randomization Tests Revisited

Assign 6 treatments randomly within each row.The number of possible assignments is

(6!)4 = 7204 = 268, 738, 560, 000 ≈ 2.7 · 1011

This is roughly 1/10, 000 the number of patterns assigning 6treatments in groups of 4 without row blocking restriction(

24

4

)(20

4

)(16

4

)(12

4

)(8

4

)(4

4

)= 3.246671 · 1015

If the treatment e�ects were identical, then the treatmentassignment would not have a�ected the responses.

Any other of the 2.7 · 1011 treatment assignments would havegiven us the same results Yij .Test H0 : τ1 = . . . = τ6(= 0) using as test statistics FTreat andits resulting randomization reference distribution

FTreat =

∑ij(Y.j − Y..)2/(6− 1)∑

ij(Yij − Yi. − Y.j + Y..)2/((6− 1)× (4− 1))

115

A Large Value of FTreat

According to the premise of no treatment e�ect, whencalculating FTreat for all these 2.7 · 1011 treatmentassignments, we get 2.7 · 1011 equally likely values of FTreat ,i.e., the reference distribution of FTreat .

This distribution by theory ≈ F5,15-distribution.

F obs

Treat , would then be like one of these equally likely values.

If F obs

Treat is in the far upper tail of the reference distribution,judge the result signi�cant evidence against H0.

Strength of conviction ⇔ p-value = P(FTreat ≥ F obs

Treat)

116

Randomization Test for Treatment E�ect

Randomization Distribution (F−test for Treatment Effect)

randomization F−statistic

Den

sity

0 2 4 6 8 10

0.0

0.2

0.4

0.6

p−va

lue

= 0

.001

27

with F5 , 15 −density superimposed

Nsim = 1e+05

117

Comparing Results

Randomization reference distribution=⇒ p-value = .00127 = 127/100000.

Normal theory test for the same hypothesis and using the sametest statistic came up with a p-value = .004191 from F5,15.

Question: Is this explainable by statistical variation alone?

Such statistical variation would come from the fact that weapproximated the true randomization reference distribution bysimulation from Nsim = 100, 000 treatment allocations.

A 95% upper con�dence bound for the p-value resulting fromthe full true reference distribution can be obtained from R viaqgamma(.95,127+1)/100000=0.001471603 or viaqbeta(.95,127+1,100000-127)=0.001471455.

That still leaves a wide gap to .004191.

F5,15 approximation is not good enough out in the far tail.

The superimposed F -density looks mostly like a good �t.

118

Unbalanced Designs

Deliberately undo the balanced design in poison/delivery data.

> recip.time0 <- recip.time[c(-1,-5,-15)]> delivery0 <- delivery[c(-1,-5,-15)]> type0 <- type[c(-1,-5,-15)]

> table(type,delivery)delivery

type A B C DI 4 4 4 4II 4 4 4 4III 4 4 4 4

> table(type0,delivery0)delivery0

type0 A B C DI 3 3 4 4II 3 4 4 4III 4 4 4 4

119

ANOVA in Unbalanced Designs

delivery0type0 <- lm(recip.time0 ~ delivery0*type0)> coef(delivery0type0)

(Intercept) delivery0B delivery0C delivery0D0.22405722 -0.10680749 -0.03778486 -0.05508903

type0II type0III delivery0B:type0II delivery0C:type0II0.11914618 0.25621130 -0.09705669 -0.03402663

delivery0D:type0II delivery0B:type0III delivery0C:type0III delivery0D:type0III-0.11796097 -0.07056376 -0.01598499 -0.11599898

> anova(delivery0type0)


Response: recip.time0Df Sum Sq Mean Sq F value Pr(>F)

delivery0 3 0.19124 0.063745 27.7042 3.873e-09 ***type0 2 0.34039 0.170193 73.9673 6.403e-13 ***delivery0:type0 6 0.02142 0.003570 1.5514 0.1926Residuals 33 0.07593 0.002301

120

ANOVA in Unbalanced Design (order reversed)

> type0delivery0 <- lm(recip.time0 ~ type0*delivery0)> coef(type0delivery0)

(Intercept) type0II type0III delivery0B0.22405722 0.11914618 0.25621130 -0.10680749delivery0C delivery0D type0II:delivery0B type0III:delivery0B

-0.03778486 -0.05508903 -0.09705669 -0.07056376type0II:delivery0C type0III:delivery0C type0II:delivery0D type0III:delivery0D

-0.03402663 -0.01598499 -0.11796097 -0.11599898

> anova(type0delivery0)


Response: recip.time0Df Sum Sq Mean Sq F value Pr(>F)

type0 2 0.35058 0.175291 76.1830 4.295e-13 ***delivery0 3 0.18104 0.060347 26.2271 7.307e-09 ***type0:delivery0 6 0.02142 0.003570 1.5514 0.1926Residuals 33 0.07593 0.002301

121

Comments on ANOVA in Unbalanced Designs

The type and delivery data subspaces are no longerorthogonal.

Each x ∈ delivery subspace can be decomposed into

x = y⊥+ z with y ∈ type subspace.

The projection onto the type subspace will re�ect acomponent within the delivery subspace.

The subsequent projection onto the delivery subspace willonly account for the part not yet explained by the previousprojection.

Reversing type and delivery has a corresponding e�ect.

The order of projection matters for the distances (or SS's).

The parameter estimates are the same (same model).

122

Documents

Stat 502 Design and Analysis of Experiments Factorial Design