1

ANALYSIS OF VARIANCE (ANOVA)

Heibatollah Baghi and Mastee Badii

2

Purpose of ANOVA

• Use one-way analysis of variance to test whether the mean of a variable (the dependent variable) differs among three or more groups

– For example, compare whether systolic blood pressure differs between a control group and two treatment groups

3

Purpose of ANOVA

• One-way ANOVA compares three or more groups defined by a single factor.

– For example, you might compare a control group with a drug treatment group and a drug treatment plus antagonist group. Or you might compare a control group with five different treatments.

• Some experiments involve more than one factor. These data need to be analyzed by two-way ANOVA or Factorial ANOVA.

– For example, you might compare the effects of three different drugs administered at two times. There are two factors in that experiment: Drug treatment and time.

4

Why not do repeated t-tests?

• Rather than using one-way ANOVA, you might be tempted to use a series of t-tests, comparing two groups each time. Don’t do it.

• Repeated t-tests increase the chance of a type I error (the multiple comparison problem)

• If you are comparing 5 groups, you will need 10 comparisons of means

• When the null hypothesis is true, the probability that at least 1 of the 10 observed significance levels is less than 0.05 is about 0.29

5

Why not do repeated t-tests?

• With 10 means (45 comparisons), the probability of finding at least one significant difference is about 0.63

• In other words, when the level of significance is .05, there is a 1 in 20 chance that a single t-test will yield a significant result even when the null hypothesis is true.

• The more t-tests you run, the more that probability increases
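The growth in the number of comparisons can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the error-rate formula assumes the tests are independent, which pairwise t-tests on shared groups are not. Under that assumption the rate comes out near 0.40 for 10 comparisons and 0.90 for 45, so the figures of 0.29 and 0.63 quoted above are lower, presumably because they account for the dependence among the comparisons.

```python
from math import comb


def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k group means."""
    return comb(k, 2)


def familywise_error_if_independent(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) over m tests, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** m


for k in (5, 10):
    m = n_pairwise_comparisons(k)
    rate = familywise_error_if_independent(m)
    print(f"{k} groups -> {m} comparisons, independence bound {rate:.2f}")
```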

6

What Does ANOVA Do?

• ANOVA involves the partitioning of variance of the dependent variable into different components:

– A. Between Group Variability

– B. Within Group Variability

• More specifically, the analysis of variance is a method for partitioning the total sum of squares into two additive and independent parts.

7

Definition of Total Sum of Squares or Variance

$$\sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{..})^2 = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^2 + \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{.j})^2 \qquad (i=1,\ldots,n;\ j=1,\ldots,p)$$

• The left-hand side is the total sum of squares: the squared differences between each observation $X_{ij}$ (case $i$ in group $j$) and the grand average $\bar{X}_{..}$, summed across all n times p observations

Case   Group 1   Group 2   …   Group p
1      X11       X12       …   X1p
2      X21       X22       …   X2p
3      X31       X32       …   X3p
…      …         …         …   …
n      Xn1       Xn2       …   Xnp

8

Definition of Between Sum of Squares

$$SS_{Between} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^2 \qquad (j=1,\ldots,p)$$

• $\bar{X}_{.j}$ is the average of group $j$ and $\bar{X}_{..}$ is the grand average

• The sum of squared differences of the group means from the grand mean, weighted by the n cases in each group, is SSB

Case   Group 1   Group 2   …   Group p
1      X11       X12       …   X1p
2      X21       X22       …   X2p
3      X31       X32       …   X3p
…      …         …         …   …
n      Xn1       Xn2       …   Xnp

9

Definition of Within Sum of Squares

$$SS_{Within} = \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{.j})^2 \qquad (i=1,\ldots,n;\ j=1,\ldots,p)$$

• $X_{ij}$ are the observations and $\bar{X}_{.j}$ is the mean of group $j$

• The within sum of squares is the sum of squared differences of the observations from their group means

Case   Group 1   Group 2   …   Group p
1      X11       X12       …   X1p
2      X21       X22       …   X2p
3      X31       X32       …   X3p
…      …         …         …   …
n      Xn1       Xn2       …   Xnp

10

Partitioning of Variance into Different Components

$$\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{..})^2}_{\text{Total sum of squares}} = \underbrace{n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^2}_{\text{Between groups sum of squares}} + \underbrace{\sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{.j})^2}_{\text{Within groups sum of squares}}$$

11

Test Statistic in ANOVA

The test statistic for ANOVA is based on the between-groups and within-groups sums of squares:

$$SS_{Between} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^2 \qquad SS_{Within} = \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{.j})^2$$

12

Test Statistic in ANOVA

• F = Between-group variability / Within-group variability

– The source of within-group variability is individual differences.

– The source of between-group variability is the effect of the independent or grouping variables.

– Within-group variability is sampling error across the cases

– Between-group variability is the effect of the independent groups or variables

13

Steps in Test of Hypothesis

1. Determine the appropriate test

2. Establish the level of significance: α

3. Determine whether to use a one tail or two tail test

4. Calculate the test statistic

5. Determine the degrees of freedom

6. Compare computed test statistic against a tabled/critical value

Same as Before

14

1. Determine the Appropriate Test

• Independent random samples have been taken from each population

• The dependent variable is normally distributed in each population (ANOVA is robust with regard to this assumption)

• Population variances are equal (ANOVA is robust with regard to this assumption)

• Subjects in each group have been independently sampled

15

2. Establish Level of Significance

• α is a predetermined value

• The convention:

– α = .05

– α = .01

– α = .001

16

3. Use a Two Tailed Test

• Ho: µ1 = µ2 = µ3 = µ4

Where
µ1 = population mean for group 1
µ2 = population mean for group 2
µ3 = population mean for group 3
µ4 = population mean for group 4

• Ha: not Ho

17

3. Use a Two Tailed Test

• Ha: not Ho

• The alternative hypothesis does not specify whether µ1 ≠ µ2, or µ2 ≠ µ3, or µ1 ≠ µ3

18

4. Calculating Test Statistics

• F = (SSb / dfB) / (SSw / dfw)

– SSb = sum of squares between

– dfB = degrees of freedom between

– SSw = sum of squares within

– dfw = degrees of freedom within

19

4. Calculating Test Statistics

• By dividing the sum of the squared deviations by degrees of freedom, we are essentially computing an “average” (or mean) amount of variation

• The specific name for the numerator of the F statistic is the mean square between (the average amount of between-group variation)

• The specific name for the denominator of the F statistic is the mean square within (the average amount of within- group variation)

20

5. Determine Degrees of Freedom

• Degrees of freedom between

– dfB = k – 1

– k = number of groups

• Degrees of freedom within

– dfw = N – k

– N = total number of subjects in the study

21

6. Compare the Computed Test Statistic Against a Tabled Value

• α = .05

• If Fc > Fα, reject H0

• If Fc ≤ Fα, cannot reject H0

22

Example

• Suppose we had patients with myocardial infarction in the following groups:

– Group 1: A music therapy group

– Group 2: A relaxation therapy group

– Group 3: A control group

• 15 patients are randomly assigned to the 3 groups and then their stress levels are measured to determine if the interventions were effective in minimizing stress.

23

Example

• Dependent Variable

– The stress scores. The ranges are from zero (no stress) to 10 (extreme stress)

• Independent Variable or Factor

– Treatment Conditions(3 levels)

24

Observations

        Group 1   Group 2   Group 3
        0         1         5
        6         4         6
        2         3         10
        4         2         8
        3         0         6

Mean    3         2         7

25

Sum of Squares for Each Group

        Group 1     Group 2     Group 3
        0           1           5
        6           4           6
        2           3           10
        4           2           8
        3           0           6

SS      SS1 = 20    SS2 = 10    SS3 = 16
n       n1 = 5      n2 = 5      n3 = 5
Mean    X̄.1 = 3.0   X̄.2 = 2.0   X̄.3 = 7.0
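As a quick check on the table, the per-group sums of squares can be recomputed directly from the scores. The sketch below is illustrative only; the dictionary layout and the helper name `sum_of_squares` are assumptions, not part of the original slides.

```python
# Stress scores from the example, keyed by group label.
groups = {
    "Group 1": [0, 6, 2, 4, 3],
    "Group 2": [1, 4, 3, 2, 0],
    "Group 3": [5, 6, 10, 8, 6],
}


def sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


group_ss = {name: sum_of_squares(scores) for name, scores in groups.items()}
ss_within = sum(group_ss.values())  # pooled within-group sum of squares

print(group_ss)
print(ss_within)
```

Adding the three group values gives the within sum of squares used in the rest of the example.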

26

SS Within

$$SS_1 = \sum_{i=1}^{5}(X_{i1}-\bar{X}_{.1})^2 = (0-3)^2+(6-3)^2+(2-3)^2+(4-3)^2+(3-3)^2 = 20$$

$$SS_2 = \sum_{i=1}^{5}(X_{i2}-\bar{X}_{.2})^2 = (1-2)^2+(4-2)^2+(3-2)^2+(2-2)^2+(0-2)^2 = 10$$

$$SS_3 = \sum_{i=1}^{5}(X_{i3}-\bar{X}_{.3})^2 = (5-7)^2+(6-7)^2+(10-7)^2+(8-7)^2+(6-7)^2 = 16$$

$$SS_{Within} = SS_1 + SS_2 + SS_3 = 20 + 10 + 16 = 46$$

27

SS Between

$$SS_{Between} = 5(3-4)^2 + 5(2-4)^2 + 5(7-4)^2 = 70$$

• 5 = number of cases in each group

• 3, 2 and 7 = averages of Group 1, Group 2 and Group 3

• 4 = grand average

28

Sum of Squares Total

$$\begin{aligned}
SS_{Total} &= (0-4)^2+(6-4)^2+(2-4)^2+(4-4)^2+(3-4)^2 \\
&\quad + (1-4)^2+(4-4)^2+(3-4)^2+(2-4)^2+(0-4)^2 \\
&\quad + (5-4)^2+(6-4)^2+(10-4)^2+(8-4)^2+(6-4)^2 = 116
\end{aligned}$$

29

Components of Variance

SSTotal = SSBetween + SSWithin

116 = 70 + 46

$$\sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{..})^2 = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^2 + \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\bar{X}_{.j})^2$$
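The additive partition can be verified numerically on the example data. The following is a minimal sketch, assuming the scores are held in plain Python lists; the variable names are illustrative and not taken from the slides.

```python
# Stress scores for the three groups in the example.
groups = [
    [0, 6, 2, 4, 3],    # Group 1: music therapy
    [1, 4, 3, 2, 0],    # Group 2: relaxation therapy
    [5, 6, 10, 8, 6],   # Group 3: control
]


def mean(values):
    return sum(values) / len(values)


all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# Each observation against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Each group mean against the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Each observation against its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(ss_total, ss_between, ss_within)
print(ss_total == ss_between + ss_within)
```

Weighting by `len(g)` rather than a fixed n means the same sketch would also handle unequal group sizes, although the slides only cover the balanced case.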

30

Degrees of Freedom

• df between = 3 – 1 = 2

• df within = 15 – 3 = 12

dfB = k – 1

dfw = N – k

31

Test Statistic

MSBetween= 70 / 2 = 35

MSWithin= 46 / 12 = 3.83

Fc = MSBetween / MSWithin

Fc = 35 / 3.833 = 9.13
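The same arithmetic can be written as a small helper. This is a sketch only; the function name `f_statistic` and its argument names are hypothetical.

```python
def f_statistic(ss_between, ss_within, k, n_total):
    """Return (MS between, MS within, F) for a one-way ANOVA.

    k is the number of groups and n_total the total number of subjects.
    """
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, f_c = f_statistic(70, 46, k=3, n_total=15)
print(round(ms_b, 2), round(ms_w, 2), round(f_c, 2))
```

Note that the F of 9.13 relies on the unrounded mean square within (46 / 12); dividing 35 by the rounded value 3.83 would give about 9.14.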

32

Lookup Critical Value

• Fα = 3.88 (α = .05, with 2 and 12 degrees of freedom)

33

Conclusions

• Fc = 9.13 > Fα = 3.88

• Fc > Fα Therefore Reject H0
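The tabled value can be cross-checked without a statistics package in this particular example, because the numerator has 2 degrees of freedom. In that special case the upper-tail probability of the F distribution has the closed form (1 + 2F/dfw)^(-dfw/2). The sketch below relies on that special case only; the function names are hypothetical, and a general F table or statistical software is still needed when dfB is not 2.

```python
def f_upper_tail_df1_is_2(f_value: float, df_within: int) -> float:
    """P(F > f_value) for an F distribution with (2, df_within) df.

    Closed form valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)


def f_critical_df1_is_2(alpha: float, df_within: int) -> float:
    """Critical value c with P(F > c) = alpha, by inverting the formula above."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


f_c = 35.0 / (46.0 / 12.0)               # computed F from the example
f_alpha = f_critical_df1_is_2(0.05, 12)  # about 3.885, the tabled 3.88
p_value = f_upper_tail_df1_is_2(f_c, 12)
reject_h0 = f_c > f_alpha

print(round(f_alpha, 3), round(p_value, 4), reject_h0)
```

The resulting p-value is roughly .004, which agrees with the significance level reported in the SPSS output for this example.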

34

One-way ANOVA Summary

Source     SS     DF    MS      Fc      Fα
-------    ---    --    ----    ----    ----
Between    70     2     35      9.13    3.88
Within     46     12    3.83
-------    ---    --    ----    ----    ----
Total      116    14

35

Multiple Comparison Groups

• The F test does not tell which pairs of means are not equal

• Additional analysis is necessary to determine which pairs are not equal

36

Fisher’s LSD Test

• These are the null and alternative hypotheses being tested

– Ho1: µ1 = µ2    Ha1: µ1 ≠ µ2

– Ho2: µ1 = µ3    Ha2: µ1 ≠ µ3

– Ho3: µ2 = µ3    Ha3: µ2 ≠ µ3

37

Fisher’s LSD Test

• Known as the protected t-test

• LSD is the least difference between two means needed for significance

• df = N – k

• Use the following formula, where n is the number of cases in each group:

$$LSD = t_{.05}\sqrt{MS_w\,(2/n)}$$

38

Calculation of LSD

• All pairs of means differing by at least 2.70 points on the stress scale would be significantly different from one another.

$$LSD = 2.18\sqrt{3.83\,(.40)} = 2.70$$
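The LSD calculation is short enough to sketch in code. The helper name `fisher_lsd` is hypothetical, the formula assumes equal group sizes as in this example, and the critical t of 2.18 (12 degrees of freedom) is taken from the slide rather than computed.

```python
from math import sqrt


def fisher_lsd(t_critical: float, ms_within: float, n_per_group: int) -> float:
    """Least significant difference between two means, equal group sizes."""
    return t_critical * sqrt(ms_within * 2.0 / n_per_group)


lsd = fisher_lsd(t_critical=2.18, ms_within=3.83, n_per_group=5)
print(round(lsd, 2))
```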

39

Application to Three Samples

Mean 1 – Mean 2 = 1

Mean 3 – Mean 1 = 4

Mean 3 – Mean 2 = 5

Null Hypotheses:

Ho1: µ1 = µ2 Not rejected

Ho2: µ1 = µ3 Rejected

Ho3: µ2 = µ3 Rejected
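The decisions above amount to comparing each absolute mean difference with the LSD of 2.70. A minimal sketch, with illustrative variable names and the LSD value taken from the preceding slide:

```python
from itertools import combinations

group_means = {"Group 1": 3.0, "Group 2": 2.0, "Group 3": 7.0}
LSD = 2.70  # least significant difference from the preceding slide

# True means the pair differs by at least the LSD, so its null is rejected.
decisions = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(group_means.items(), 2):
    difference = abs(mean_a - mean_b)
    decisions[(name_a, name_b)] = difference >= LSD
    verdict = "rejected" if difference >= LSD else "not rejected"
    print(f"{name_a} vs {name_b}: difference {difference:.0f}, null {verdict}")
```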

40

Use of SPSS in ANOVA

41

Data in SPSS Input Format

Stress Score Groups

0 1

6 1

2 1

4 1

3 1

1 2

4 2

3 2

2 2

0 2

5 3

6 3

10 3

8 3

6 3

42

SPSS Output for ANOVA

Descriptives

Stress Levels

                                                                 95% Confidence Interval for Mean
                      N    Mean   Std. Deviation   Std. Error    Lower Bound    Upper Bound    Minimum   Maximum
Music Therapy         5    3.00   2.236            1.000         .22            5.78           0         6
Relaxation Therapy    5    2.00   1.581            .707          .04            3.96           0         4
Control Group         5    7.00   2.000            .894          4.52           9.48           5         10
Total                 15   4.00   2.878            .743          2.41           5.59           0         10
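The descriptive columns can be reproduced with Python's standard library as a sanity check. This is a sketch under stated assumptions: the helper `describe` is hypothetical, and the critical t values (2.776 for 4 degrees of freedom, 2.145 for 14) are standard tabled values supplied by hand rather than taken from the SPSS output.

```python
from math import sqrt
from statistics import mean, stdev


def describe(scores, t_critical):
    """N, mean, sample SD, standard error and 95% CI for one set of scores."""
    n = len(scores)
    m = mean(scores)
    sd = stdev(scores)          # sample standard deviation (n - 1 denominator)
    se = sd / sqrt(n)
    half_width = t_critical * se
    return {"n": n, "mean": m, "sd": sd, "se": se,
            "ci": (m - half_width, m + half_width)}


music = [0, 6, 2, 4, 3]
relaxation = [1, 4, 3, 2, 0]
control = [5, 6, 10, 8, 6]

music_row = describe(music, t_critical=2.776)
total_row = describe(music + relaxation + control, t_critical=2.145)
print(music_row)
print(total_row)
```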

43

SPSS Output for ANOVA

Test of Homogeneity of Variances

Stress Levels

Levene Statistic   df1   df2   Sig. level or p-value
.242               2     12    .788

P > .05, therefore, the assumption of homogeneity of variance is met.

ANOVA

Stress Levels

                  Sum of Squares   df   Mean Square   F       Sig. level or p-value
Between Groups    70.000           2    35.000        9.130   .004
Within Groups     46.000           12   3.833
Total             116.000          14

P < .05, therefore, we reject the null hypothesis and continue with the Multiple Comparison table.

44

SPSS Output for ANOVA

Multiple Comparisons

Dependent Variable: Stress Levels
LSD

                                          Mean Difference                              95% Confidence Interval
(I) Groups           (J) Groups           (I-J)             Std. Error   Sig. Level    Lower     Upper
Music Therapy        Relaxation Therapy   1.000             1.238        .435          -1.70     3.70
                     Control Group        -4.000(*)         1.238        .007          -6.70     -1.30
Relaxation Therapy   Music Therapy        -1.000            1.238        .435          -3.70     1.70
                     Control Group        -5.000(*)         1.238        .002          -7.70     -2.30
Control Group        Music Therapy        4.000(*)          1.238        .007          1.30      6.70
                     Relaxation Therapy   5.000(*)          1.238        .002          2.30      7.70

* The mean difference is significant at the .05 level.
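The standard error and confidence limits in this table follow from the mean square within and the group size. The sketch below reproduces the first row; the helper `lsd_interval` is hypothetical, and the critical t of 2.179 (12 degrees of freedom, two-tailed, α = .05) is a tabled value assumed here rather than read from the output.

```python
from math import sqrt


def lsd_interval(mean_i, mean_j, ms_within, n_per_group, t_critical):
    """Mean difference (I-J), its standard error, and the 95% interval."""
    difference = mean_i - mean_j
    std_error = sqrt(ms_within * 2.0 / n_per_group)
    half_width = t_critical * std_error
    return difference, std_error, (difference - half_width, difference + half_width)


# Music Therapy (mean 3) versus Relaxation Therapy (mean 2).
diff, se, ci = lsd_interval(3.0, 2.0, ms_within=46 / 12, n_per_group=5,
                            t_critical=2.179)
print(round(diff, 3), round(se, 3), tuple(round(b, 2) for b in ci))
```

An interval that contains zero, as this one does, corresponds to a pair whose null hypothesis is not rejected.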

45

Take home lesson

How to compare means of three or more samples
