
One-Way Analysis of Variance

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)

Updated 04-Jan-2017

Nathaniel E. Helwig (U of Minnesota) One-Way Analysis of Variance Updated 04-Jan-2017 : Slide 1


Copyright

Copyright © 2017 by Nathaniel E. Helwig


Outline of Notes

1) Categorical Predictors:
     Overview
     Dummy coding
     Effect coding

2) One-Way ANOVA:
     Model form & assumptions
     Estimation & basic inference
     Memory example (part 1)

3) Multiple Comparisons:
     Comparing means
     Bonferroni correction
     Tukey correction
     Scheffé correction
     Summary of corrections
     Memory example (part 2)


Categorical Predictors


Categorical Predictors Overview

Categorical Variables (revisited)

Suppose that X ∈ {x_1, . . . , x_g} is a categorical variable with g levels. Categorical variables are also called factors in the ANOVA context.
     Example: sex ∈ {female, male} has two levels
     Example: drug ∈ {A, B, C} has three levels

To code a categorical variable (with g levels) in a regression model, we need to include g − 1 different variables in the model.

If we know µ = (1/g) ∑_{j=1}^{g} µ_j, where µ_j is the mean for the j-th factor level,

then we know µ_g = g(µ − (1/g) ∑_{j=1}^{g−1} µ_j) by definition.

There are only g total free parameters, so we cannot estimate µ and {µ_j}_{j=1}^{g} simultaneously.


Categorical Predictors Dummy Coding

Dummy Coding: Definition

Dummy coding uses g − 1 binary variables to code a factor:

x_ij = 1 if the i-th observation is in the j-th level, and 0 otherwise,

for i ∈ {1, . . . , n_j} and j ∈ {1, . . . , g − 1}.

The regression model becomes

y_ij = b_0 + ∑_{j=1}^{g−1} b_j x_ij + e_ij

where b_0 = µ_g and b_j = µ_j − µ_g for j ∈ {1, . . . , g − 1}.
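As a concrete sketch of this scheme (in Python rather than the R used later in these notes; dummy_code is a hypothetical helper, not a standard function), each non-reference level gets its own 0/1 column and the reference level is coded as all zeros:

```python
def dummy_code(labels, reference):
    """Build g-1 binary (0/1) columns, one per non-reference level."""
    levels = [lv for lv in sorted(set(labels)) if lv != reference]
    return [[1 if lab == lv else 0 for lv in levels] for lab in labels]

cond = ["fast", "fast", "normal", "normal", "slow", "slow"]
X = dummy_code(cond, reference="fast")  # columns: normal, slow
print(X)  # reference-group rows are all zeros
```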


Categorical Predictors Dummy Coding

Dummy Coding: Considerations

Dummy coding is useful for. . .
     the one-way ANOVA model (unique parameter for each factor level)
     comparing treatment groups to a clearly defined "control" group

Dummy coding is less useful when. . .
     we have g > 2 levels and/or the model is more complicated
     we do NOT have a clearly defined "control" or "reference" group


Categorical Predictors Dummy Coding

Dummy Coding: R Syntax

The contrasts function controls the coding scheme for a factor.

Use the contr.treatment option for dummy coding.

> x = factor(rep(letters[1:3], each=5))
> x
 [1] a a a a a b b b b b c c c c c
Levels: a b c
> contrasts(x) <- contr.treatment(nlevels(x))
> contrasts(x)
  2 3
a 0 0
b 1 0
c 0 1


Categorical Predictors Effect Coding

Effect Coding: Definition

Effect coding also uses g − 1 variables to code a factor:

x_ij = 1 if the i-th observation is in the j-th level, −1 if the i-th observation is in the g-th level, and 0 otherwise,

for i ∈ {1, . . . , n_j} and j ∈ {1, . . . , g − 1}.

The regression model becomes

y_ij = b_0 + ∑_{j=1}^{g−1} b_j x_ij + e_ij

where b_0 = µ and b_j = µ_j − µ for j ∈ {1, . . . , g}; note that b_g = −∑_{j=1}^{g−1} b_j.
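A matching sketch for effect coding (again illustrative Python; effect_code is a hypothetical helper): the dropped g-th level receives −1 in every column, which is what forces the level effects to sum to zero:

```python
def effect_code(labels, dropped):
    """Build g-1 effect (+1/0/-1) columns; the dropped level gets -1 in every column."""
    levels = [lv for lv in sorted(set(labels)) if lv != dropped]
    rows = []
    for lab in labels:
        if lab == dropped:
            rows.append([-1] * len(levels))
        else:
            rows.append([1 if lab == lv else 0 for lv in levels])
    return rows

print(effect_code(["a", "b", "c"], dropped="c"))  # mirrors R's contr.sum pattern
```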


Categorical Predictors Effect Coding

Effect Coding: Considerations

Effect coding is useful for. . .
     a simple interpretation of b_0 as the overall mean
     comparing each group's effect to the overall mean

Effect coding is less useful when. . .
     we have g = 2 levels for a factor
     we DO have a clearly defined "control" or "reference" group


Categorical Predictors Effect Coding

Effect Coding: R Syntax

The contrasts function controls the coding scheme for a factor.

Use the contr.sum option for effect (deviation) coding.

> x = factor(rep(letters[1:3], each=5))
> x
 [1] a a a a a b b b b b c c c c c
Levels: a b c
> contrasts(x) <- contr.sum(nlevels(x))
> contrasts(x)
  [,1] [,2]
a    1    0
b    0    1
c   -1   -1


One-Way ANOVA Model


One-Way ANOVA Model Model Form and Assumptions

One-Way ANOVA Model (cell means form)

The One-Way Analysis of Variance (ANOVA) model has the form

y_ij = µ_j + e_ij

for i ∈ {1, . . . , n_j} and j ∈ {1, . . . , g} where
     y_ij ∈ ℝ is the real-valued response for the i-th subject in the j-th factor level
     µ_j ∈ ℝ is the real-valued population mean for the j-th factor level
     e_ij ~ iid N(0, σ²) is a Gaussian error term
     n_j is the number of subjects in the j-th factor level and n = ∑_{j=1}^{g} n_j
     g is the number of factor levels

This implies that y_ij ~ ind N(µ_j, σ²).


One-Way ANOVA Model Model Form and Assumptions

One-Way ANOVA Model (dummy coding)

Using dummy coding, the one-way ANOVA becomes

y_ij = b_0 + ∑_{j=1}^{g−1} b_j x_ij + e_ij

for i ∈ {1, . . . , n_j} and j ∈ {1, . . . , g} where
     x_ij = 1 if the i-th observation is in the j-th level, and 0 otherwise
     b_0 = µ_g is the reference group mean
     b_j = µ_j − µ_g for j ∈ {1, . . . , g − 1}


One-Way ANOVA Model Model Form and Assumptions

One-Way ANOVA Model (effect coding)

Using effect coding, the one-way ANOVA becomes

y_ij = b_0 + ∑_{j=1}^{g−1} b_j x_ij + e_ij

for i ∈ {1, . . . , n_j} and j ∈ {1, . . . , g} where
     x_ij = 1 if the i-th observation is in the j-th level, −1 if the i-th observation is in the g-th level, and 0 otherwise
     b_0 = µ is the overall mean
     b_j = µ_j − µ for j ∈ {1, . . . , g}

Note that b_g = −∑_{j=1}^{g−1} b_j by definition.


One-Way ANOVA Model Model Form and Assumptions

One-Way ANOVA Model (matrix form)

In matrix form, the one-way ANOVA model is

y = Xb + e

where y = (y_1, y_2, . . . , y_n)′ is the response vector, e = (e_1, e_2, . . . , e_n)′ is the error vector, b = (b_0, b_1, . . . , b_{g−1})′ is the coefficient vector, and X is the n × g design matrix whose i-th row is (1, x_i1, x_i2, . . . , x_i(g−1)).

The definition of x_ij and {b_j}_{j=1}^{g−1} depends on the coding scheme.

Here i ∈ {1, . . . , n}, and the second subscript on y and e is dropped.

This implies that y ~ N(Xb, σ²I_n).


One-Way ANOVA Model Model Form and Assumptions

One-Way ANOVA Model (assumptions)

The fundamental assumptions of the one-way ANOVA model are:
     1. x_ij and y_i are observed random variables (known constants)
     2. e_i ~ iid N(0, σ²) is an unobserved random variable
     3. b_0, b_1, . . . , b_{g−1} are unknown constants
     4. (y_i | x_i1, . . . , x_i(g−1)) ~ ind N(b_0 + ∑_{j=1}^{g−1} b_j x_ij, σ²)
        note: homogeneity of variance

The interpretation of b_j depends on the coding scheme:
     Dummy: b_j is the difference between the j-th mean and the reference mean
     Effect: b_j is the difference between the j-th mean and the overall mean


One-Way ANOVA Model Estimation and Basic Inference

Ordinary Least Squares (cell means form)

We want to find the factor level mean estimates (i.e., the µ_j terms) that minimize the ordinary least squares criterion

SSE = ∑_{j=1}^{g} ∑_{i=1}^{n_j} (y_ij − µ_j)²

The least-squares estimates are the factor level sample means

µ̂_j = ȳ_j

where ȳ_j = (1/n_j) ∑_{i=1}^{n_j} y_ij is the sample mean of Y for the j-th factor level.
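For instance, using the memory-example data that appears later in these notes, the cell-means estimates are just the per-group sample means (illustrative Python, not part of the original notes):

```python
# the memory-example data from later in the notes
groups = {
    "fast":   [23, 22, 18, 15, 29, 30, 23, 16, 19, 17],
    "normal": [27, 28, 33, 19, 25, 29, 36, 30, 26, 21],
    "slow":   [23, 24, 21, 25, 19, 24, 22, 17, 20, 23],
}

# OLS estimate of each factor-level mean is the sample mean of that level
mu_hat = {lvl: sum(vals) / len(vals) for lvl, vals in groups.items()}
print(mu_hat)  # {'fast': 21.2, 'normal': 27.4, 'slow': 21.8}
```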


One-Way ANOVA Model Estimation and Basic Inference

Ordinary Least Squares (simple proof)

Note that we want to minimize

(SSE)_j = ∑_{i=1}^{n_j} (y_ij − µ_j)² = ∑_{i=1}^{n_j} y²_ij − 2µ_j ∑_{i=1}^{n_j} y_ij + n_j µ²_j

separately for each j ∈ {1, . . . , g}.

Taking the derivative with respect to µ_j we have

d(SSE)_j / dµ_j = −2 ∑_{i=1}^{n_j} y_ij + 2 n_j µ_j

and setting to zero and solving for µ_j gives µ̂_j = (1/n_j) ∑_{i=1}^{n_j} y_ij = ȳ_j


One-Way ANOVA Model Estimation and Basic Inference

Ordinary Least Squares (general case)

In general, we can use the regression approach

SSE = ∑_{i=1}^{n} (y_i − b_0 − ∑_{j=1}^{g−1} b_j x_ij)² = ‖y − Xb‖²

where i ∈ {1, . . . , n} and n = ∑_{j=1}^{g} n_j; note that the second subscript on Y is now dropped because there is only one summation.

The OLS solution has the form

b̂ = (X′X)⁻¹X′y

which is the same as in the MLR model; note that ANOVA is MLR with categorical predictors!
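The normal-equations solution b̂ = (X′X)⁻¹X′y can be checked numerically. The sketch below (illustrative Python; the small linear-algebra helpers are hand-rolled for self-containment, not a library API) dummy-codes the memory data from later in the notes and recovers the coefficients reported in the lm() output there:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

# memory-example data, dummy coded with "fast" as the reference level
y = ([23, 22, 18, 15, 29, 30, 23, 16, 19, 17]      # fast
     + [27, 28, 33, 19, 25, 29, 36, 30, 26, 21]    # normal
     + [23, 24, 21, 25, 19, 24, 22, 17, 20, 23])   # slow
X = [[1, 0, 0]] * 10 + [[1, 1, 0]] * 10 + [[1, 0, 1]] * 10

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
b = solve(XtX, Xty)
print([round(v, 4) for v in b])  # [21.2, 6.2, 0.6]
```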


One-Way ANOVA Model Estimation and Basic Inference

Fitted Values and Residuals

SCALAR FORM:

Fitted values are given by

ŷ_i = b̂_0 + ∑_{j=1}^{g−1} b̂_j x_ij

and residuals are given by

ê_i = y_i − ŷ_i

MATRIX FORM:

Fitted values are given by

ŷ = Xb̂

and residuals are given by

ê = y − ŷ


One-Way ANOVA Model Estimation and Basic Inference

ANOVA Sums-of-Squares: Scalar Form

In the one-way ANOVA model, the relevant sums-of-squares are

Total:   SST = ∑_{i=1}^{n} (y_i − ȳ)² = ∑_{j=1}^{g} ∑_{i=1}^{n_j} (y_ij − ȳ)²
Between: SSB = ∑_{i=1}^{n} (ŷ_i − ȳ)² = ∑_{j=1}^{g} n_j (ȳ_j − ȳ)²
Within:  SSW = ∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{j=1}^{g} ∑_{i=1}^{n_j} (y_ij − ȳ_j)²

The corresponding degrees of freedom are
     SST: df_T = n − 1
     SSB: df_B = g − 1
     SSW: df_W = n − g
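These three sums-of-squares, and the partition SST = SSB + SSW, can be verified numerically on the memory-example data from later in these notes (illustrative Python):

```python
groups = [
    [23, 22, 18, 15, 29, 30, 23, 16, 19, 17],   # fast
    [27, 28, 33, 19, 25, 29, 36, 30, 26, 21],   # normal
    [23, 24, 21, 25, 19, 24, 22, 17, 20, 23],   # slow
]
y = [v for grp in groups for v in grp]
n = len(y)
grand = sum(y) / n  # grand mean

sst = sum((v - grand) ** 2 for v in y)
ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

print(round(sst, 4), round(ssb, 4), round(ssw, 4))  # partition: SST = SSB + SSW
```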


One-Way ANOVA Model Estimation and Basic Inference

ANOVA Sums-of-Squares: Matrix Form

In MLR models, the relevant sums-of-squares are

SST = ∑_{i=1}^{n} (y_i − ȳ)² = y′[I_n − (1/n)J]y

SSB = ∑_{i=1}^{n} (ŷ_i − ȳ)² = y′[H − (1/n)J]y

SSW = ∑_{i=1}^{n} (y_i − ŷ_i)² = y′[I_n − H]y

Note: H = X(X′X)⁻¹X′ and J is an n × n matrix of ones.

One-Way ANOVA Model Estimation and Basic Inference

Partitioning the Variance (same as MLR model)

We can partition the total variation in y_i as

SST = ∑_{i=1}^{n} (y_i − ȳ)²
    = ∑_{i=1}^{n} (y_i − ŷ_i + ŷ_i − ȳ)²
    = ∑_{i=1}^{n} (ŷ_i − ȳ)² + ∑_{i=1}^{n} (y_i − ŷ_i)² + 2 ∑_{i=1}^{n} (ŷ_i − ȳ)(y_i − ŷ_i)
    = SSB + SSW + 2 ∑_{i=1}^{n} (ŷ_i − ȳ) ê_i
    = SSB + SSW

See the MLR notes for the proof.


One-Way ANOVA Model Estimation and Basic Inference

Estimated Error Variance (Mean Squared Error)

An unbiased estimate of the error variance σ² is

σ̂² = SSW/(n − g)
   = ∑_{i=1}^{n} (y_i − ŷ_i)²/(n − g)
   = (1/(n − g)) ∑_{j=1}^{g} (n_j − 1) s²_j

where s²_j = (1/(n_j − 1)) ∑_{i=1}^{n_j} (y_ij − ȳ_j)² is the j-th factor level's sample variance.

The estimate σ̂² is the mean squared error (MSE) of the model, and is the pooled estimate of the sample variance.


One-Way ANOVA Model Estimation and Basic Inference

ANOVA Table and Overall F Test

We typically organize the SS information into an ANOVA table:

Source   SS                          df      MS    F    p-value
SSB      ∑_{i=1}^{n} (ŷ_i − ȳ)²     g − 1   MSB   F∗   p∗
SSW      ∑_{i=1}^{n} (y_i − ŷ_i)²    n − g   MSW
SST      ∑_{i=1}^{n} (y_i − ȳ)²     n − 1

where MSB = SSB/(g − 1), MSW = SSW/(n − g), F∗ = MSB/MSW ~ F_{g−1,n−g}, and p∗ = P(F_{g−1,n−g} > F∗).

The F∗-statistic and p∗-value test H0: b_1 = · · · = b_{g−1} = 0 versus H1: b_k ≠ 0 for some k ∈ {1, . . . , g − 1}.

This is equivalent to testing H0: µ_j = µ ∀j versus H1: not all µ_j are equal.
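Putting the pieces together for the memory data from the next section (illustrative Python; the p-value itself requires the F CDF, so only the statistic and degrees of freedom are computed here):

```python
groups = [
    [23, 22, 18, 15, 29, 30, 23, 16, 19, 17],   # fast
    [27, 28, 33, 19, 25, 29, 36, 30, 26, 21],   # normal
    [23, 24, 21, 25, 19, 24, 22, 17, 20, 23],   # slow
]
y = [v for grp in groups for v in grp]
n, g = len(y), len(groups)
grand = sum(y) / n

ssb = sum(len(grp) * (sum(grp) / len(grp) - grand) ** 2 for grp in groups)
ssw = sum((v - sum(grp) / len(grp)) ** 2 for grp in groups for v in grp)
msb, msw = ssb / (g - 1), ssw / (n - g)
f_stat = msb / msw  # compare to the F distribution with (g-1, n-g) df
print(round(f_stat, 4), (g - 1, n - g))
```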


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: Data Description

Visual and auditory cues example from Hays (1994) Statistics.

Does lack of visual/auditory synchrony affect memory?

A total of n = 30 college students participate in the memory experiment.
     Watch a video of a person reciting 50 words
     Try to remember the 50 words (record the number correct)

Randomly assign n_j = 10 subjects to one of g = 3 video conditions:
     fast: sound precedes lip movements in video
     normal: sound synced with lip movements in video
     slow: lip movements in video precede sound


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: Descriptive Statistics

Number of correctly remembered words (yij ):

Subject (i)          Fast (j = 1)   Normal (j = 2)   Slow (j = 3)
 1                        23             27               23
 2                        22             28               24
 3                        18             33               21
 4                        15             19               25
 5                        29             25               19
 6                        30             29               24
 7                        23             36               22
 8                        16             30               17
 9                        19             26               20
10                        17             21               23

∑_{i=1}^{10} y_ij        212            274              218
∑_{i=1}^{10} y²_ij      4738           7742             4810


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: OLS Estimation (by hand)

The least-squares estimates of the µ_j are the sample means:

µ̂_1 = ȳ_1 = (1/10) ∑_{i=1}^{10} y_i1 = 212/10 = 21.2

µ̂_2 = ȳ_2 = (1/10) ∑_{i=1}^{10} y_i2 = 274/10 = 27.4

µ̂_3 = ȳ_3 = (1/10) ∑_{i=1}^{10} y_i3 = 218/10 = 21.8


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: OLS Estimation (in R: by hand)

# define response and factor vectors
> sync = c(23,27,23,22,28,24,18,33,21,15,
           19,25,29,25,19,30,29,24,23,36,
           22,16,30,17,19,26,20,17,21,23)
> cond = factor(rep(c("fast","normal","slow"), 10))

# sum of sync for each level of cond
> tapply(sync, cond, sum)
  fast normal   slow
   212    274    218

# sum-of-squares of sync for each level of cond
> sumsq = function(x){ sum(x^2) }
> tapply(sync, cond, sumsq)
  fast normal   slow
  4738   7742   4810


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: OLS Estimation (in R: dummy pt. 1)

> smod = lm(sync ~ cond)
> summary(smod)$coef
            Estimate Std. Error    t value     Pr(>|t|)
(Intercept)     21.2   1.408440 15.0521126 1.183308e-14
condnormal       6.2   1.991835  3.1127073 4.350803e-03
condslow         0.6   1.991835  0.3012297 7.655470e-01

Note that. . .
     b̂_0 = ȳ_1 = 21.2 is the mean of the fast condition
     b̂_1 = ȳ_2 − ȳ_1 = 27.4 − 21.2 = 6.2 is the difference between the means of the normal and fast conditions
     b̂_2 = ȳ_3 − ȳ_1 = 21.8 − 21.2 = 0.6 is the difference between the means of the slow and fast conditions


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: OLS Estimation (in R: dummy pt. 2)

> contrasts(cond)
       normal slow
fast        0    0
normal      1    0
slow        0    1
> contrasts(cond) <- contr.treatment(3, base=2)
> contrasts(cond)
       1 3
fast   1 0
normal 0 0
slow   0 1
> smod = lm(sync ~ cond)
> summary(smod)$coef
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     27.4   1.408440 19.454146 2.052094e-17
cond1           -6.2   1.991835 -3.112707 4.350803e-03
cond3           -5.6   1.991835 -2.811478 9.072381e-03

Note that. . .
     b̂_0 = ȳ_2 = 27.4 is the mean of the normal condition
     b̂_1 = ȳ_1 − ȳ_2 = 21.2 − 27.4 = −6.2 is the difference between the means of the fast and normal conditions
     b̂_2 = ȳ_3 − ȳ_2 = 21.8 − 27.4 = −5.6 is the difference between the means of the slow and normal conditions


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: OLS Estimation (in R: effect)

> contrasts(cond) <- contr.sum(3)
> contrasts(cond)
       [,1] [,2]
fast      1    0
normal    0    1
slow     -1   -1
> smod = lm(sync ~ cond)
> summary(smod)$coef
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 23.466667  0.8131633 28.858492 7.900265e-22
cond1       -2.266667  1.1499866 -1.971037 5.904989e-02
cond2        3.933333  1.1499866  3.420330 2.003601e-03

Note that. . .
     b̂_0 = ȳ = 23.467 is the grand mean (ȳ = (212 + 274 + 218)/30 = 704/30 = 23.467)
     b̂_1 = ȳ_1 − ȳ = 21.2 − 23.467 = −2.267 is the difference between the mean of the fast condition and the overall mean
     b̂_2 = ȳ_2 − ȳ = 27.4 − 23.467 = 3.933 is the difference between the mean of the normal condition and the overall mean
     Implicitly, b̂_3 = −(b̂_1 + b̂_2) = −(3.933 − 2.267) = ȳ_3 − ȳ = 21.8 − 23.467 = −1.667 is the difference between the mean of the slow condition and the overall mean


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: Sums-of-Squares (by hand)

Defining n = ∑_{j=1}^{g} n_j = 30, the relevant sums-of-squares are

SST = ∑_{j=1}^{g} ∑_{i=1}^{n_j} (y_ij − ȳ)² = ∑_{j=1}^{g} ∑_{i=1}^{n_j} y²_ij − (1/n)(∑_{j=1}^{g} ∑_{i=1}^{n_j} y_ij)²
    = (4738 + 7742 + 4810) − (704²/30) = 769.4667

SSW = ∑_{j=1}^{g} ∑_{i=1}^{n_j} (y_ij − ȳ_j)² = ∑_{j=1}^{g} ∑_{i=1}^{n_j} y²_ij − ∑_{j=1}^{g} (∑_{i=1}^{n_j} y_ij)²/n_j
    = (4738 + 7742 + 4810) − ([212² + 274² + 218²]/10) = 535.6

SSB = SST − SSW = 769.4667 − 535.6 = 233.8667


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: ANOVA Table (by hand)

Putting the sums-of-squares from the previous slide into an ANOVA table:

Source   SS         df    MS         F        p-value
SSB      233.8667    2    116.9333   5.8947   0.0075
SSW      535.6000   27     19.8370
SST      769.4667   29

Note that F∗ = 5.8947 ~ F_{2,27} and P(F_{2,27} > F∗) = 0.0075.

Assuming a typical α level (e.g., α = 0.01 or α = 0.05), we would reject the null hypothesis H0: µ_j = µ ∀j.

We conclude that there is some mean difference in the response variable (number of remembered words) between the different conditions.


One-Way ANOVA Model Memory Example (Part 1)

Memory Example: ANOVA Table (in R)

> sync = c(23,27,23,22,28,24,18,33,21,15,
           19,25,29,25,19,30,29,24,23,36,
           22,16,30,17,19,26,20,17,21,23)
> cond = factor(rep(c("fast","normal","slow"), 10))
> smod = lm(sync ~ cond)
> anova(smod)
Analysis of Variance Table

Response: sync
          Df Sum Sq Mean Sq F value   Pr(>F)
cond       2 233.87 116.933  5.8947 0.007513 **
Residuals 27 535.60  19.837
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Multiple Comparisons


Multiple Comparisons Comparing Means

Limitations of Overall F Test

If we reject H0: µ_j = µ ∀j, then we know that there are some mean differences, but we do not know where the mean differences exist.

This assumes that g > 2; note that if g = 2 we have an independent samples t test.

If g > 2, we need to perform follow-up tests (multiple comparisons) to determine where the mean differences are occurring in the data.


Multiple Comparisons Comparing Means

Linear Combinations of Factor Level Means

A linear combination L of the factor level means has the form

L = ∑_{j=1}^{g} c_j µ_j

where the c_j are the coefficients defining the particular linear combination.

In practice we never know the µ_j, so we define

L̂ = ∑_{j=1}^{g} c_j µ̂_j

where µ̂_j is our least-squares estimate of µ_j.


Multiple Comparisons Comparing Means

Testing Linear Combinations

Remember that µ̂_j = ȳ_j, which implies that

V(µ̂_j) = V((1/n_j) ∑_{i=1}^{n_j} y_ij) = (1/n²_j) V(∑_{i=1}^{n_j} y_ij) = (1/n²_j) ∑_{i=1}^{n_j} V(y_ij) = n_j σ²/n²_j = σ²/n_j

is the variance of µ̂_j, given that V(y_ij) = σ² from the assumptions.

This implies that

V(L̂) = V(∑_{j=1}^{g} c_j µ̂_j) = ∑_{j=1}^{g} c²_j V(µ̂_j) = σ² ∑_{j=1}^{g} c²_j/n_j

is the variance of any linear combination L̂.
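As a numeric check (illustrative Python), plugging the memory example's MSE into this formula for the pairwise contrast c = (1, −1, 0) reproduces the standard error reported for the dummy-coded coefficients in the lm() output:

```python
import math

mse = 535.6 / 27            # MSE (sigma-hat squared) from the memory example
n_j = [10, 10, 10]          # per-group sample sizes
c = [1, -1, 0]              # coefficients for the pairwise comparison mu_1 - mu_2

var_L = mse * sum(cj ** 2 / nj for cj, nj in zip(c, n_j))
se_L = math.sqrt(var_L)
print(round(se_L, 4))       # matches the 1.991835 std. errors in the lm() output
```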


Multiple Comparisons Comparing Means

Testing Linear Combinations (continued)

To test H0: L = L∗ versus H1: L ≠ L∗, use

t∗ = (L̂ − L∗)/√V̂(L̂) ~ t_{n−g}

where V̂(L̂) = σ̂² ∑_{j=1}^{g} c²_j/n_j uses the MSE to estimate σ².

If the population σ² is known, then use

Z∗ = (L̂ − L∗)/√V(L̂) ~ N(0, 1)


Multiple Comparisons Comparing Means

Contrasts and Pairwise Comparisons

A contrast is a linear combination of the factor level means such that the coefficients sum to zero, i.e., ∑_{j=1}^{g} c_j = 0.
     µ_1 − µ_2 is the mean difference of the first two levels
     (µ_1 + µ_2)/2 − µ_3 is the mean of the first two levels minus the third level

A pairwise comparison is a contrast involving two factor level means:
     µ_1 − µ_2 is a pairwise comparison of the first two levels
     µ_1 − µ_3 is a pairwise comparison of the first and third levels
     µ_2 − µ_3 is a pairwise comparison of the second and third levels


Multiple Comparisons Comparing Means

Multiple Comparison Problem

If we test multiple linear combinations of the factor level means, we need to worry about the Familywise Type I Error Rate (FWER).

The FWER is the probability of making at least one Type I Error among all tested linear combinations.

Single test Type I Error = P(Reject H0 | H0 true) = α

For q independent tests with level α: FWER = 1 − (1 − α)^q

In general, the FWER depends on the number of tests and on whether or not the tests are independent of one another.


Multiple Comparisons Bonferroni Correction

Bonferroni’s Correction: Definition

Suppose we want to test f linear combinations of factor level means.

According to Boole's inequality, for f tests each with level α∗,

FWER ≤ ∑_{k=1}^{f} P(Reject H0k | H0k true) = ∑_{k=1}^{f} α∗ = f α∗

regardless of whether or not the tests are independent of one another.

Bonferroni's correction sets α∗ = α/f to ensure that FWER ≤ α.


Multiple Comparisons Bonferroni Correction

Bonferroni’s Correction: Properties

Major strength: applicable to many situations (no assumptions)

Major weakness: overly conservative in some cases

Suppose we have f = 3 independent tests and want FWER ≤ 0.05
     FWER = 1 − (1 − α∗)³
     Bonferroni: α∗ = 0.05/3 = 0.0167
     FWER = 1 − (1 − 0.0167)³ = 0.04917

Suppose we have f = 10 independent tests and want FWER ≤ 0.05
     FWER = 1 − (1 − α∗)^10
     Bonferroni: α∗ = 0.05/10 = 0.005
     FWER = 1 − (1 − 0.005)^10 = 0.0489
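The two numeric examples above can be reproduced directly (illustrative Python; fwer_independent is a hypothetical helper implementing the independent-tests formula):

```python
def fwer_independent(alpha_star, f):
    """FWER for f independent tests, each run at level alpha_star."""
    return 1 - (1 - alpha_star) ** f

for f in (3, 10):
    a_star = 0.05 / f  # Bonferroni-adjusted per-test level
    print(f, round(fwer_independent(a_star, f), 5))  # stays just below 0.05
```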


Multiple Comparisons Tukey Correction

Detour: Studentized Range Distribution

Assume the following. . .
     z_k ~ iid N(µ, σ²) for k ∈ {1, . . . , r}
     σ̂² is an estimate of σ² based on ν degrees of freedom
     σ̂² is independent of the z_k for k ∈ {1, . . . , r}

The studentized range statistic is defined as

q_{r,ν} = range(z_k)/σ̂

where range(z_k) = max(z_k) − min(z_k), and (r, ν) are the numerator and denominator degrees of freedom.

It has a one-sided probability distribution (similar to F), so we only reject if the observed q > q^{(α)}_{r,ν}, where P(q_{r,ν} > q^{(α)}_{r,ν}) = α.


Multiple Comparisons Tukey Correction

Testing All Possible Pairwise Comparisons

Want to test all possible pairwise comparisons between means:
H0: µj − µk = 0 ∀ j, k versus H1: not all µj − µk = 0
There are g(g − 1)/2 unique pairwise comparisons.

Note that the variance of a pairwise comparison L = µj − µk is

V(L̂) = σ^2 (1/nj + 1/nk)

where nj and nk are the sample sizes of factor levels j and k.

If nj = n∗ ∀ j, this simplifies to V(L̂) = σ^2 (2/n∗) for any pairwise comparison.


Tukey’s Honest Significant Difference (HSD) Test

Proposed by John Tukey (1953) for balanced ANOVA, i.e., nj = n∗ ∀ j.

The test statistic is defined as

q∗ = √2 L̂ / √V(L̂) = (ȳj − ȳk) / √(σ̂^2/n∗)

where L̂ = ȳj − ȳk, V(L̂) = σ̂^2 (2/n∗), and σ̂^2 is the MSE of the model.

Considering all pairwise comparisons, q∗ ∼ q_{g,n−g}, where q_{g,n−g} is the studentized range distribution with (g, n − g) degrees of freedom.

Note that ȳj ∼ N(µj, σ^2/n∗) for all j ∈ {1, …, g}.
Under H0: µj − µk = 0 ∀ j, k, we have ȳj ∼ N(µ, σ^2/n∗) ∀ j.


Tukey’s HSD Test (continued)

To form a 100(1 − α)% CI around the mean difference ȳj − ȳk, use

(ȳj − ȳk) ± (q^{(α)}_{g,n−g}/√2) √V(L̂)

where q^{(α)}_{g,n−g} is the critical value from the studentized range distribution.

If you form all possible CIs around pairwise mean differences, you will control the FWER exactly at level α using Tukey's HSD test.

More conservative than forming all CIs using t_{n−g} critical values.
Example 95% CI: q^{(0.95)}_{3,27}/√2 = 2.48 and t^{(0.975)}_{27} = 2.05.


Tukey-Kramer Test

In unbalanced ANOVA, i.e., nj ≠ nk for at least one pair (j, k), use the HSD extension proposed by John Tukey (1953) and Clyde Kramer (1956).

Called the Tukey-Kramer test or Tukey-Kramer procedure.

To form a 100(1 − α)% CI around the mean difference ȳj − ȳk, use

(ȳj − ȳk) ± (q^{(α)}_{g,n−g}/√2) √V(L̂)

where V(L̂) = σ̂^2 (1/nj + 1/nk) is the estimated variance of L = µj − µk.

If you form all possible CIs around pairwise mean differences, you will control the FWER below (but not exactly at) level α using Tukey-Kramer.

See Hayter (1984) for a formal proof of the Tukey-Kramer procedure's conservativeness.
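A sketch of the Tukey-Kramer interval for one pair of unbalanced groups. The sample sizes below (8 and 10) are hypothetical, chosen only to illustrate the unequal-n variance formula; the means and MSE are borrowed from the memory example later in the notes:

```python
from math import sqrt
from scipy.stats import studentized_range

def tukey_kramer_ci(mean_j, mean_k, n_j, n_k, mse, g, n, alpha=0.05):
    """100(1-alpha)% Tukey-Kramer CI for mu_j - mu_k with unequal group sizes."""
    diff = mean_j - mean_k
    var_l = mse * (1 / n_j + 1 / n_k)            # estimated Var(L-hat)
    q = studentized_range.ppf(1 - alpha, g, n - g)
    half = (q / sqrt(2)) * sqrt(var_l)           # Tukey-Kramer half-width
    return diff - half, diff + half

# Hypothetical unbalanced layout: g = 3 groups, n = 27 total observations.
lo, hi = tukey_kramer_ci(21.2, 27.4, 8, 10, 19.837, g=3, n=27)
print(round(lo, 3), round(hi, 3))
```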


Multiple Comparisons Scheffé Correction

Testing All Possible Contrasts

Want to test all possible contrasts between the factor level means:
H0: Σ_{j=1}^g cj µj = 0 for all c ∈ C, where C = {c = (c1, …, cg) : Σ_{j=1}^g cj = 0} is the set of all contrasts
H1: Σ_{j=1}^g cj µj ≠ 0 for some c ∈ C

Note that µj = µ ∀ j ⟺ Σ_{j=1}^g cj µj = 0 for all c ∈ C.

Proof that µj = µ ∀ j =⇒ Σ_{j=1}^g cj µj = 0 for all c ∈ C:
If µj = µ ∀ j, then Σ_{j=1}^g cj µj = µ Σ_{j=1}^g cj = µ(0) = 0 for all c ∈ C.

Proof that Σ_{j=1}^g cj µj = 0 for all c ∈ C =⇒ µj = µ ∀ j:
If Σ_{j=1}^g cj µj = 0 for all c ∈ C, then µj − µk = 0 for all j, k, because each pairwise difference is itself a contrast in C (take cj = 1, ck = −1, and all other coefficients zero).


Contrasts and Overall ANOVA F Test

Remember that the overall F test (associated with the ANOVA table) is testing
H0: µj = µ ∀ j versus H1: not all µj are equal.

This is equivalent to testing H0: Σ_{j=1}^g cj µj = 0 for all c ∈ C versus H1: Σ_{j=1}^g cj µj ≠ 0 for some c ∈ C.

See the previous slide for proof of the equivalence.

Point: if we want to test all possible contrasts, we can use the F_{g−1,n−g} distribution to control the FWER at level α.


Scheffé’s Method

Want to test all possible contrasts between the factor level means:
H0: Σ_{j=1}^g cj µj = 0 for all c ∈ C, where C = {c = (c1, …, cg) : Σ_{j=1}^g cj = 0} is the set of all contrasts
H1: Σ_{j=1}^g cj µj ≠ 0 for some c ∈ C

To form a 100(1 − α)% CI for a contrast L = Σ_{j=1}^g cj µj, use

L̂ ± √((g − 1) F^{(α)}_{g−1,n−g}) √V(L̂)

where V(L̂) = σ̂^2 Σ_{j=1}^g cj^2/nj is the estimated variance of L̂ = Σ_{j=1}^g cj ȳj.

If you form CIs around all possible contrasts, you will control the FWER exactly at level α using Scheffé's method (see the Appendix for a proof).


Multiple Comparisons Summary of Corrections

Summary of Multiple Comparisons

If we want to form 100(1 − α)% CIs around all L = µj − µk, use

L̂ ± C √V(L̂)

where V(L̂) = σ̂^2 (1/nj + 1/nk) and C is some critical value.

Each procedure uses a different critical value:

No correction: C = t^{(α/2)}_{n−g}
Bonferroni: C = t^{(α∗/2)}_{n−g} with α∗ = α/[g(g − 1)/2]
Tukey: C = q^{(α)}_{g,n−g}/√2
Scheffé: C = √((g − 1) F^{(α)}_{g−1,n−g})

Scheffé's critical value is ALWAYS larger than Tukey's, because the set of all pairwise comparisons is a subset of the set of all contrasts.


Choosing Between Multiple Comparisons

You want CIs that are as narrow as possible and control FWER.

If you are interested in all pairwise comparisons, use Tukey-Kramer.

If you are interested in all possible contrasts, use Scheffé.

If you are interested in some subset of all pairwise comparisons (or of all contrasts), Bonferroni may be the most efficient approach.


Multiple Comparisons Memory Example (Part 2)

Memory Example: Pairwise Comparison Estimates

Suppose we want to test all g(g − 1)/2 = 3 unique pairwise comparisons between factor level means:
L1 = µ1 − µ2 = µfast − µnormal
L2 = µ2 − µ3 = µnormal − µslow
L3 = µ3 − µ1 = µslow − µfast

The estimated pairwise comparisons are given by
L̂1 = µ̂1 − µ̂2 = 21.2 − 27.4 = −6.2
L̂2 = µ̂2 − µ̂3 = 27.4 − 21.8 = 5.6
L̂3 = µ̂3 − µ̂1 = 21.8 − 21.2 = 0.6

and we know that V(L̂) = σ̂^2 (2/n∗) = (19.8370)(2/10) = 3.9674.
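The estimates above can be reproduced in a few lines; the group means and MSE come from the slides, while the code itself is only an illustrative sketch:

```python
# Group means (fast, normal, slow), MSE (sigma-hat^2), and common group
# size n* from the memory example.
means = {"fast": 21.2, "normal": 27.4, "slow": 21.8}
mse, n_star = 19.8370, 10

L1 = means["fast"] - means["normal"]   # -6.2
L2 = means["normal"] - means["slow"]   #  5.6
L3 = means["slow"] - means["fast"]     #  0.6
var_L = mse * 2 / n_star               # balanced-design Var(L-hat) = 3.9674
print(round(L1, 1), round(L2, 1), round(L3, 1), round(var_L, 4))
```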


Memory Example: Pairwise Comparison CI Values

If we want to form 95% CIs around all three L = µj − µk, use

L̂ ± C √V(L̂) = L̂ ± C √3.9674

where
C = t^{(.025)}_{27} = 2.0518 with no correction
C = t^{(.008)}_{27} = 2.5525 with the Bonferroni correction (α∗ = .05/3, so α∗/2 ≈ .008)
C = q^{(.05)}_{3,27}/√2 = 2.4794 with the Tukey correction
C = √(2 F^{(.05)}_{2,27}) = 2.5900 with the Scheffé correction

Note that Tukey is best here (i.e., produces the narrowest CIs), followed by Bonferroni, and then Scheffé.
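All four critical values can be checked with SciPy; a sketch (slide values shown in comments, SciPy usage added here for illustration):

```python
from math import sqrt
from scipy.stats import t, f, studentized_range

g, n = 3, 30                 # three speed conditions, 10 subjects each
df = n - g                   # error degrees of freedom = 27
n_tests = g * (g - 1) // 2   # 3 unique pairwise comparisons

C_none = t.ppf(1 - 0.05 / 2, df)                        # slide: 2.0518
C_bonf = t.ppf(1 - 0.05 / (2 * n_tests), df)            # slide: 2.5525
C_tukey = studentized_range.ppf(0.95, g, df) / sqrt(2)  # slide: 2.4794
C_schef = sqrt((g - 1) * f.ppf(0.95, g - 1, df))        # slide: 2.5900

# Uncorrected 95% CI for L1-hat = -6.2 with Var(L-hat) = 3.9674:
lo = -6.2 - C_none * sqrt(3.9674)   # slide: -10.2869
hi = -6.2 + C_none * sqrt(3.9674)   # slide:  -2.1131
print([round(c, 4) for c in (C_none, C_bonf, C_tukey, C_schef)])
```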


Memory Example: Pairwise CIs (no correction)

Using no correction the CI estimates are:

L̂1 ± t^{(.025)}_{27} √V(L̂1) = −6.2 ± 2.0518 √3.9674 = [−10.2869; −2.1131]
L̂2 ± t^{(.025)}_{27} √V(L̂2) = 5.6 ± 2.0518 √3.9674 = [1.5131; 9.6869]
L̂3 ± t^{(.025)}_{27} √V(L̂3) = 0.6 ± 2.0518 √3.9674 = [−3.4869; 4.6869]


Memory Example: Pairwise CIs (Bonferroni)

Using Bonferroni correction the CI estimates are:

L̂1 ± t^{(.008)}_{27} √V(L̂1) = −6.2 ± 2.5525 √3.9674 = [−11.2841; −1.1159]
L̂2 ± t^{(.008)}_{27} √V(L̂2) = 5.6 ± 2.5525 √3.9674 = [0.5159; 10.6841]
L̂3 ± t^{(.008)}_{27} √V(L̂3) = 0.6 ± 2.5525 √3.9674 = [−4.4841; 5.6841]


Memory Example: Pairwise CIs (Tukey)

Using Tukey correction the CI estimates are:

L̂1 ± (q^{(.05)}_{3,27}/√2) √V(L̂1) = −6.2 ± 2.4794 √3.9674 = [−11.1386; −1.2614]
L̂2 ± (q^{(.05)}_{3,27}/√2) √V(L̂2) = 5.6 ± 2.4794 √3.9674 = [0.6614; 10.5386]
L̂3 ± (q^{(.05)}_{3,27}/√2) √V(L̂3) = 0.6 ± 2.4794 √3.9674 = [−4.3386; 5.5386]


Memory Example: Pairwise CIs (Scheffé)

Using Scheffé correction the CI estimates are:

L̂1 ± √(2 F^{(.05)}_{2,27}) √V(L̂1) = −6.2 ± 2.59 √3.9674 = [−11.3589; −1.0411]
L̂2 ± √(2 F^{(.05)}_{2,27}) √V(L̂2) = 5.6 ± 2.59 √3.9674 = [0.4411; 10.7589]
L̂3 ± √(2 F^{(.05)}_{2,27}) √V(L̂3) = 0.6 ± 2.59 √3.9674 = [−4.5589; 5.7589]


Memory Example: Good use of Scheffé

Suppose we want to test four contrasts:
L1 = µ1 − µ2 = µfast − µnormal
L2 = µ2 − µ3 = µnormal − µslow
L3 = µ3 − µ1 = µslow − µfast
L4 = µ2 − (µ1 + µ3)/2 = µnormal − (µslow + µfast)/2

If we want to form 95% CIs around all four Lj, use

L̂j ± C √V(L̂j)

where
C = t^{(.006)}_{27} = 2.6763 using Bonferroni (α∗ = .05/4 = .0125, so α∗/2 ≈ .006)
C = √(2 F^{(.05)}_{2,27}) = 2.59 using Scheffé


Memory Example: Good use of Scheffé (continued)

Note that L̂4 = 27.4 − (21.8 + 21.2)/2 = 5.9 and

V(L̂4) = σ̂^2 Σ_{j=1}^3 cj^2/nj = (19.8370)(1/10 + (−1/2)^2/10 + (−1/2)^2/10) = 2.9756

Using the Scheffé correction, the CI estimates are:

L̂1 ± √(2 F^{(.05)}_{2,27}) √V(L̂1) = −6.2 ± 2.59 √3.9674 = [−11.3589; −1.0411]
L̂2 ± √(2 F^{(.05)}_{2,27}) √V(L̂2) = 5.6 ± 2.59 √3.9674 = [0.4411; 10.7589]
L̂3 ± √(2 F^{(.05)}_{2,27}) √V(L̂3) = 0.6 ± 2.59 √3.9674 = [−4.5589; 5.7589]
L̂4 ± √(2 F^{(.05)}_{2,27}) √V(L̂4) = 5.9 ± 2.59 √2.9756 = [1.4322; 10.3678]
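The L4 interval follows the same template as the pairwise intervals; a short check of the contrast estimate, its variance, and the Scheffé CI (an illustrative sketch using the slide's means and MSE):

```python
from math import sqrt
from scipy.stats import f

mse = 19.8370
means = [21.2, 27.4, 21.8]   # fast, normal, slow
c = [-0.5, 1.0, -0.5]        # contrast: normal vs. average of slow and fast
n = [10, 10, 10]

L4 = sum(cj * m for cj, m in zip(c, means))              # slide: 5.9
var_L4 = mse * sum(cj**2 / nj for cj, nj in zip(c, n))   # slide: 2.9756
C = sqrt(2 * f.ppf(0.95, 2, 27))                         # slide: 2.59

lo, hi = L4 - C * sqrt(var_L4), L4 + C * sqrt(var_L4)    # slide: [1.4322; 10.3678]
print(round(lo, 4), round(hi, 4))
```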


Appendix


Scheffé’s Method: Logic

Remember that the overall ANOVA F test has the form

F∗ = MSB/MSW = [Σ_{j=1}^g nj(ȳj − ȳ)^2 / (g − 1)] / σ̂^2 ∼ F_{g−1,n−g} under H0

which implies that

S^2 = SSB/MSW = [Σ_{j=1}^g nj(ȳj − ȳ)^2] / σ̂^2 ∼ (g − 1) F_{g−1,n−g} under H0

Defining the test of a single contrast as T_c = L̂/√V(L̂), note that

sup_{c∈C} T_c^2 = S^2

where sup denotes the supremum (i.e., the least upper bound).


Scheffé’s Method: Proof (part 1)

To prove the claim sup_{c∈C} T_c^2 = S^2, define the n × 1 vector a via

a′ = ((c1/n1) 1′_{n1}, (c2/n2) 1′_{n2}, …, (cg/ng) 1′_{ng})

where the cj are contrast coefficients and 1_{nj} is an nj × 1 vector of ones.

Define y′ = (y′1, …, y′g) where y′j = (y_{1j}, …, y_{nj,j}), and note that

a′y = Σ_{j=1}^g (cj/nj) 1′_{nj} yj = Σ_{j=1}^g cj ȳj = L̂
‖a‖^2 = a′a = Σ_{j=1}^g (cj/nj)^2 1′_{nj} 1_{nj} = Σ_{j=1}^g cj^2/nj

which implies that

T_c^2 = (a′y)^2 / (σ̂^2 ‖a‖^2)

for any contrast c ∈ C.


Scheffé’s Method: Proof (part 2)

Now note that a = Xb, where X is the n × g indicator matrix

X =
[ 1_{n1} 0_{n1} · · · 0_{n1} ]
[ 0_{n2} 1_{n2} · · · 0_{n2} ]
[   ⋮      ⋮    ⋱     ⋮    ]
[ 0_{ng} 0_{ng} · · · 1_{ng} ]

and b = (c1/n1, c2/n2, …, cg/ng)′, which implies that a = Ha, where H = X(X′X)^{−1}X′ is the hat matrix.

Also note that a′1n = Σ_{j=1}^g (cj/nj) 1′_{nj} 1_{nj} = Σ_{j=1}^g cj = 0, which implies that

a = (H − H0)a

where H0 = (1/n) 1n 1′n is the projection matrix for the constant space.


Scheffé’s Method: Proof (part 3)

Putting things together, we can write the single contrast test as

T_c^2 = (a′y)^2 / (σ̂^2 ‖a‖^2) = [a′(H − H0)y]^2 / (σ̂^2 ‖a‖^2)

for any contrast c ∈ C, given that a′ = a′(H − H0).

By the Cauchy-Schwarz inequality, we know that

(u′v)^2 ≤ ‖u‖^2 ‖v‖^2

with equality holding only when u = wv for some w ≠ 0.


Scheffé’s Method: Proof (part 4)

Letting u = a and v = (H − H0)y, we have

T_c^2 = [a′(H − H0)y]^2 / (σ̂^2 ‖a‖^2) ≤ ‖a‖^2 ‖(H − H0)y‖^2 / (σ̂^2 ‖a‖^2) = ‖(H − H0)y‖^2 / σ̂^2 = S^2

If we define a = (H − H0)y, then T_c^2 reaches its upper bound of S^2.

To control the FWER at level α, note that

P(T_c^2 ≤ S^2 ∀ c ∈ C) = P(sup_{c∈C} T_c^2 ≤ S^2)

and we know that S^2 ∼ (g − 1) F_{g−1,n−g} under H0.

Use S = √((g − 1) F^{(α)}_{g−1,n−g}) to form the 100(1 − α)% CI for T_c.