ModuleII Anova Outline

Module #2: Analysis of Variance

Table of Contents

Overview
Learning Outcomes
Required Resources
Key Terms and Concepts
Learning Material
Learning Activities
References


Overview

Quite often we wish to draw conclusions based on how a nominal independent variable

affects a continuous dependent variable. In this module we will learn how to determine this

effect using data collected from completely-randomized designs, randomized-block designs,

and repeated measures designs. In addition to analyzing the data collected using these

three designs, we will also learn which post-hoc analyses (both qualitative and quantitative)

can be used to further explore relationships within our data sets. We then use the results of

these analyses to address the initial research question.


Learning Outcomes

At the completion of this module, you will be able to

Distinguish between a completely randomized design, a randomized block design, and a repeated measures design;

Determine graphically whether the means associated with several independent groups differ;

Use data collected from several independent populations to investigate the average effect that different factors have on a dependent variable;

Graphically and through inference determine whether two factors interact;

Distinguish between factors which are fixed-effects factors and those which are random-effects factors;

Remove the effect of a factor from an analysis by treating the factor as a blocking factor;

Use data collected from several dependent populations to investigate the average effect that different factors have on a dependent variable; and

Draw public health conclusions based on the discussed statistical inference topics.


Required Resources

In this section, list all resources – readings, texts, web sites, videos, audio casts, etc.

You may also wish to include a “recommended resources” or “for further interest” section here,

but be sure to separate these from the required readings.


Key Terms and Concepts

Analysis of Variance

ANOVA

One-way ANOVA

Factor

Treatment

Multiple Testing Problem

Between Treatments Variance

Within Treatment Variance

Mean Square Error

Completely Randomized Design

Confidence Interval Plots

Post-hoc Analysis

Scheffé Test

Tukey Test

Two-way ANOVA

Main Effect

Interaction Effect

Fixed-effects Factor

Profile of a Factor

Profile Analysis

Additive Factors

Interacting Factors

Syntax Analysis

Random-effects Factor

Randomized Block Design


Blocking Factors

Repeated Measures Design

Sphericity

Within-Subjects Main/Interaction Effects

Between-Subjects Main/Interaction Effects

Within-Between Subjects Interaction Effects


Learning Material

Analysis of Variance (ANOVA)

The material we have reviewed thus far relied on having samples from one or two different populations. Quite often we wish to draw conclusions based on how a nominal independent variable (called a factor) affects a continuous dependent variable. To study the effect of the factor on the continuous dependent variable, the factor is divided into several different categories called levels (or treatments).

Example: Suppose we were interested in studying how aspirin, propranolol, captopril, and diltiazem affect systolic blood pressure. The factor would be the category “medication” and the levels associated with the factor are the four different drugs.

For Discussion: What issues/problems might arise when we attempt to analyze the data collected for the study in the above example?


Answer: If we had only one or two levels, we could use a t-test to compare the means of the data grouped by the levels, BUT if we have three or more levels, then a number of problems arise:

(1) We have trouble determining/interpreting the overall significance level.

(2) As the number of levels increases, the number of t-tests we would have to implement increases dramatically. For example, if we had only four levels, we might have to implement six two-sample t-tests. With each additional test we must complete, the probability of making at least one Type I error also increases: if the $n$ tests to be implemented were independent, Pr(at least one Type I error) $= 1-(1-\alpha)^n$, which exceeds $\alpha$ whenever $n > 1$. This is referred to as the Multiple Testing Problem.

To avoid the above problems, we can use Analysis of Variance (ANOVA). We will begin our ANOVA discussion by looking at one-way ANOVA, where we investigate how one factor affects the dependent variable. (Two-way ANOVA would be used if two factors were believed to influence the dependent variable.)
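To make the Multiple Testing Problem concrete, the following minimal sketch counts the pairwise t-tests for four levels and evaluates $1-(1-\alpha)^n$. It assumes the tests are independent, which is only an approximation for pairwise tests that share samples; the function name and the choice $\alpha = 0.05$ are illustrative and not part of the module.

```python
from math import comb


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


levels = 4                      # e.g. the four medications in the example
n_tests = comb(levels, 2)       # number of pairwise two-sample t-tests
alpha = 0.05                    # illustrative per-test significance level
rate = familywise_error_rate(alpha, n_tests)

print(f"{n_tests} tests, familywise error rate = {rate:.4f}")
```

Even with only six tests, the chance of at least one false rejection is several times the per-test level, which is the motivation for a single overall F-test.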


One-Way Analysis of Variance

F-Test For Independent Samples

For Discussion: How can looking at the variances yield any information about the population means of the individual treatments of data?

Answer: Note that any particular observation (data point) can be decomposed as follows:

Observation = grand mean + (treatment mean - grand mean) + (observation - treatment mean)

The formal model is, for the $j$'th observation from the $i$'th treatment:

$$x_{i,j} = \mu + (\mu_i - \mu) + (x_{i,j} - \mu_i).$$

If we expect there to be no difference in the population treatment means $\mu_i$, then we would expect the treatment mean minus grand mean to be essentially zero. The decomposition above illustrates how two types of variance can explain the deviation of the observation from the grand mean.

(1) The first type of variation is represented by “treatment mean minus grand mean”, which is related to the variance between the treatments of data. Some refer to this variance as the “Between Treatment Variance $s_B^2$”. Most statistical packages do not directly compute the Between Treatment Variance. Packages usually compute a quantity called the “Between Treatment Sum of Squares ($SS_B$)” and present its degrees of freedom ($k-1$), where $k$ is the number of treatments. Note that the “Mean Square Between Treatment Sum of Squares ($MS_B = MS_T$)” is the Between Treatment Variance $s_B^2$ and is calculated using

$$MS_T = MS_B = \frac{SS_B}{k-1} = s_B^2.$$

(2) The second type of variation is represented by “observation minus treatment mean”, which is related to the variance within the treatments of data. Some refer to this variance as the “Within Treatment Variance $s_W^2$”. Most statistical packages do not directly compute the Within Treatment Variance. Packages usually compute a quantity called the “Total Residual Sum of Squares” or the “Error Sum of Squares ($SS_W$)” and present its degrees of freedom ($N-k$), where $N$ is the total number of observations and $k$ is the number of treatments. Note that the “Mean Square Error Sum of Squares ($MS_W = MS_E$)” is the Within Treatment Variance $s_W^2$ and is calculated using

$$MS_E = MS_W = \frac{SS_W}{N-k} = s_W^2.$$


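The question becomes, “How does one calculate $SS_B$ and $SS_W$?”

We could compute $SS_B$ and $SS_W$ as follows:

1) First compute the total for each of the $k$ samples using $T_i = \sum_{j=1}^{n_i} x_{ij}$.

2) Now compute the total of all the observations from all the treatments, i.e. compute the grand total of the observations using $G = \sum_{i=1}^{k} T_i$.

3) Determine $n_i$ (the number of observations in the $i$'th sample) and $N$ (the total number of observations taken over all the treatments).

4) Compute the sum of the squares of all the observations using $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$.

5) Compute $\sum_{i=1}^{k} \frac{T_i^2}{n_i}$.

6) Then

$$SS_B = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{G^2}{N} \qquad \text{and} \qquad SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k}\frac{T_i^2}{n_i}.$$

7) Note the Total Sum of Squares ($SS_T$) is computed using

$$SS_T = SS_B + SS_W.$$

The above calculations can be summarized in a table called an ANOVA table:

| Source of Variation | Sum of Squares | df | Mean Square | F |
|---|---|---|---|---|
| Between Treatments | $SS_B$ | $k-1$ | $MS_B = SS_B/(k-1)$ | $MS_B/MS_W$ |
| Within Treatments | $SS_W$ | $N-k$ | $MS_W = SS_W/(N-k)$ | |
| Total | $SS_T$ | $N-1$ | | |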
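The seven steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical and the three small samples are made-up numbers, not data from the module.

```python
def one_way_sums_of_squares(samples):
    """Return (SS_B, SS_W, SS_T) using treatment totals T_i and grand total G."""
    totals = [sum(s) for s in samples]                   # step 1: T_i
    grand_total = sum(totals)                            # step 2: G
    sizes = [len(s) for s in samples]                    # step 3: n_i
    n_obs = sum(sizes)                                   # step 3: N
    sum_sq = sum(x * x for s in samples for x in s)      # step 4: sum of x_ij^2
    sum_t2_n = sum(t * t / n for t, n in zip(totals, sizes))  # step 5
    ss_b = sum_t2_n - grand_total ** 2 / n_obs           # step 6
    ss_w = sum_sq - sum_t2_n                             # step 6
    return ss_b, ss_w, ss_b + ss_w                       # step 7: SS_T


# Hypothetical data: three treatments with three observations each.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_b, ss_w, ss_t = one_way_sums_of_squares(samples)

k = len(samples)
n_obs = sum(len(s) for s in samples)
ms_b = ss_b / (k - 1)        # between-treatment variance
ms_w = ss_w / (n_obs - k)    # within-treatment variance
f_stat = ms_b / ms_w

print(ss_b, ss_w, ss_t, f_stat)
```

The mean squares and the ratio $MS_B/MS_W$ computed at the end are the entries that fill the ANOVA table.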


The One-Way ANOVA F-test compares $MS_B$ and $MS_W$. If $MS_B$ is much larger than $MS_W$, then we should conclude that at least one of the population treatment means differs from the other population treatment means.

Sometimes the original data for each treatment are not available. The original data have been summarized, that is, the sample size, the sample mean, and the sample standard deviation are provided. All is not lost! We can still compute $SS_B$ and $SS_W$ as follows:

$$SS_B = \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}_{grand}\right)^2, \qquad SS_W = \sum_{i=1}^{k} (n_i - 1)\,s_i^2, \qquad \text{and} \qquad \bar{x}_{grand} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{N},$$

where $\bar{x}_i$ is the sample mean of the $i$'th treatment, $s_i^2$ is the sample variance of the $i$'th treatment, and $n_i$ is the number of observations in the $i$'th treatment.
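As a rough sketch of the summary-statistics route, the function below applies the three formulas directly. The function name and the input numbers are hypothetical; the inputs are chosen as the sizes, means, and variances of three made-up samples so the result can be checked by hand.

```python
def sums_of_squares_from_summary(sizes, means, variances):
    """Return (SS_B, SS_W) from per-treatment n_i, sample means, and sample variances."""
    n_obs = sum(sizes)
    # Grand mean is the size-weighted average of the treatment means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_obs
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))
    return ss_b, ss_w


# Hypothetical summaries: three treatments, three observations each.
sizes = [3, 3, 3]
means = [2.0, 3.0, 6.0]
variances = [1.0, 1.0, 1.0]   # sample variances s_i^2 (square the standard deviations if those are given)

ss_b, ss_w = sums_of_squares_from_summary(sizes, means, variances)
print(ss_b, ss_w)
```

If only standard deviations are reported, they must be squared before being passed in, since the formula for $SS_W$ uses the sample variances.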


When data are collected via an experiment in which the treatments are assigned randomly to the experimental units, we can analyze the data using ANOVA. This type of experimental design is referred to as the completely randomized experimental design. When our completely randomized experimental design assigns individuals to different levels/treatments for a single factor, we can analyze the corresponding collected data using a One-Way ANOVA F-Test. We will present our One-Way ANOVA F-Tests using the same format that we learned in the previous module.

Research Question:

Population Declarations:

The Hypotheses to be tested are:

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$.

$H_a$: at least two of the population treatment means differ.

The underlying assumptions are:

1) the populations from which each of the random samples was taken must be normal;
2) the populations must have the same variances;
3) the samples must be independent of one another; and
4) the data must be collected via an experiment in which the treatments are assigned randomly to the experimental units.

The Significance level is: $\alpha$

The test statistic is:

$$F(v,d) = \frac{MS_B}{MS_W}$$

where $v = k-1$ is the degrees of freedom in the numerator, $d = N-k$ is the degrees of freedom in the denominator, $N$ is the total number of observations taken across all the treatments, and $k$ is the number of treatments.

The p-value is calculated using a software package.

The critical value $F_\alpha(v,d)$ can be found in an appropriate F-table.

Decision Rule is: If $F(v,d) > F_\alpha(v,d)$, Reject $H_0$. OTHERWISE do not reject $H_0$.

or equivalently, if p-value $< \alpha$, Reject $H_0$. OTHERWISE do not reject $H_0$.

Conclusion:


For practice: BFAHS, p. 331, q. 8.2.4. Gold et al. (A-5) investigated the effectiveness on smoking cessation of a nicotine patch, bupropion SR, or both, when co-administered with cognitive-behavioural therapy. Consecutive consenting patients (N=164) assigned themselves to one of three treatments according to personal preference: nicotine patch (NTP, n=13), bupropion SR (B, n=92), and bupropion SR plus nicotine patch (BNTP, n=59). At their first smoking cessation class, patients estimated the number of packs of cigarettes they currently smoked per day and the number of years they smoked. The “pack years” is the average number of packs the subject smoked per day multiplied by the number of years the subject had smoked. Using the 10% level of significance, analyze the data collected for this problem. The data can be downloaded from the Student Companion Sites link that appears on the website: http://ca.wiley.com/WileyCDA/WileyTitle/productCd-EHEP000107.html. The example is Question 4 of Section 2 of Chapter 8. For the sake of this example, assume that all the assumptions required to fully analyze the data are true.

Solution:

Research Question:

Is there a difference in the average number of pack years based on the smoking cessation technique?

Population Declarations:

Let Population 1 be the people who use the nicotine patch (NTP) to assist in smoking cessation and $\mu_{NTP}$ be the mean number of pack years associated with this population.

Let Population 2 be the people who use bupropion SR (B) to assist in smoking cessation and $\mu_B$ be the mean number of pack years associated with this population.

Let Population 3 be the people who use bupropion SR plus nicotine patch (BNTP) to assist in smoking cessation and $\mu_{BNTP}$ be the mean number of pack years associated with this population.

Hypothesis to be tested:

$H_0$: $\mu_{NTP} = \mu_B = \mu_{BNTP}$, i.e. the true mean numbers of pack years of smokers for the three smoking cessation groups are equal.

$H_A$: not $H_0$, that is, at least two of the true mean numbers of pack years of smokers differ.

Hypothesis Test to be used: One-Way ANOVA F-Test

Assumptions required to implement the test:

1. Randomness: We assume that the sample was randomly selected.
2. Independence: By the design of the experiment, the data are sampled from independent populations. At this point, we will assume that the data sampled from within the same population are independent.
3. Normality: Each of the three populations must be normally distributed. We are told to assume this is true.
4. Equality of Variances: The three populations must have the same variances. We are told to assume this is true.

The Significance Level: $\alpha = 0.10$


The Test Statistic and corresponding p-value: From the ANOVA table below, the value of the test statistic is F(2,161)=5.878 and the associated p-value=0.003.

ANOVA

Dependent variable: years

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 14489.627 | 2 | 7244.814 | 5.878 | .003 |
| Within Groups | 198442.245 | 161 | 1232.561 | | |
| Total | 212931.872 | 163 | | | |

The Decision Rule: Since the p-value = 0.003 < 0.10 = α, we reject $H_0$.

The Conclusion: At the 10% level of significance, with a p-value = 0.003, we have evidence to conclude that the true mean pack years for at least two of the populations differ. Because we rejected the null hypothesis, we would have to do a post-hoc analysis to determine how the means differed. We will learn how to implement this analysis in a few moments.
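As a check on the ANOVA table, the sketch below recomputes the mean squares and the F statistic from the reported sums of squares and degrees of freedom. The p-value uses a closed form, $\Pr(F > f) = (1 + 2f/d)^{-d/2}$, that holds only when the numerator degrees of freedom equal 2 (as here, with three groups); for other designs a statistical package or F-table is needed. The function name is hypothetical.

```python
# Values copied from the ANOVA table in the example.
ss_between, df_between = 14489.627, 2
ss_within, df_within = 198442.245, 161

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within


def f_upper_tail_numerator_df_2(f_value: float, df_denominator: int) -> float:
    """Upper-tail probability of an F(2, d) variable.

    Closed form valid ONLY for numerator df = 2; it is an assumption of this
    sketch, not a general replacement for an F-table or software.
    """
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


alpha = 0.10
p_value = f_upper_tail_numerator_df_2(f_stat, df_within)
reject_h0 = p_value < alpha

print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The recomputed statistic and p-value should agree with the table's F and Sig. columns up to rounding.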

■

For Discussion: How would you actually test the normality assumption in the previous example? What hypotheses would you have to test? At what conclusions would you arrive after you implement the requisite hypothesis tests? Is the normality assumption actually true?

Answer: To determine whether the data support the normality assumption, we must perform a Test for Normality on each of the three populations individually. Because there are at least three observations sampled from each population, we can use the Shapiro-Wilk Test for Normality. To this end, we use the following table from SPSS.


Tests of Normality

Dependent variable: Years

| Group | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| nicotine patch | .217 | 13 | .094 | .789 | 13 | .005 |
| bupropion SR | .067 | 92 | .200* | .981 | 92 | .204 |
| nicotine patch and bupropion SR | .086 | 59 | .200* | .978 | 59 | .360 |

a. Lilliefors Significance Correction

*. This is a lower bound of the true significance.

The three sets of hypotheses to be tested are

- H0,NTP: The number of pack years in Population 1 is normally distributed. HA,NTP: The number of pack years in Population 1 is not normally distributed.

- H0,B: The number of pack years in Population 2 is normally distributed. HA,B: The number of pack years in Population 2 is not normally distributed.

- H0,BNTP: The number of pack years in Population 3 is normally distributed. HA,BNTP: The number of pack years in Population 3 is not normally distributed.

Referring to the Test of Normality Table above, regarding Population 1 (the NTP group), since p-value = 0.005 < 0.10 = α, we reject H0,NTP; regarding Population 2 (the B group), since p-value = 0.204 > 0.10 = α, we do not reject H0,B; and regarding Population 3 (the BNTP group), since p-value = 0.360 > 0.10 = α, we do not reject H0,BNTP. At α = 0.10 level of significance, there is evidence to conclude that the number of pack years in Population 1 (the NTP group) is not normally distributed (p-value=0.005). At the same level of significance, there is not enough evidence to reject the assumptions that the number of pack years in Populations 2 (the B group) and 3 (the BNTP group) are normally distributed (with p-values of 0.204 and 0.360 respectively). Because one of the populations is not normally distributed, the assumption that all three of the populations are normally distributed has been violated.

■

For Discussion: How would you actually test the “equality of variances” assumption in the previous example? What hypotheses would you have to test? At what conclusions would you arrive after you implement the requisite hypothesis tests? Is the “equality of variances” assumption actually true?

Answer: To determine whether the data support the “equality of variances” assumption, because at least three observations were sampled from each population, we can perform a Levene’s Test for Equality of Variances to see if the variances in the number of pack years in the three populations are not all equal. To this end, we use the following table from SPSS.

Test of Homogeneity of Variances

Dependent variable: Years

| Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|
| .690 | 2 | 161 | .503 |

The hypotheses to be tested are

$H_0$: $\sigma^2_{NTP} = \sigma^2_B = \sigma^2_{BNTP}$, that is, the three population variances are equal.

$H_A$: not $H_0$, that is, at least two of the three population variances differ.

From the Test of Homogeneity of Variances Table above, since the p-value=0.503 > 0.10 = α, we do not reject $H_0$. Hence, at the 10% level of significance, with a p-value=0.503, we do not have enough evidence to reject the assumption that all three populations have the same variance.
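The reported Levene p-value can be checked by hand because the Levene statistic is referred to an F distribution with (df1, df2) degrees of freedom, and df1 = 2 here. The short sketch below uses the closed-form upper tail $(1 + 2f/d)^{-d/2}$, which is valid only for df1 = 2; the variable names are illustrative.

```python
# Values copied from the Test of Homogeneity of Variances table.
levene_statistic, df1, df2 = 0.690, 2, 161
alpha = 0.10

# Closed-form upper tail of F(2, df2); an assumption that holds only because df1 == 2.
assert df1 == 2
levene_p_value = (1.0 + 2.0 * levene_statistic / df2) ** (-df2 / 2.0)
reject_equal_variances = levene_p_value < alpha

print(f"p-value = {levene_p_value:.3f}, reject H0: {reject_equal_variances}")
```

The result should match the Sig. column of the table to about three decimal places, leading to the same decision not to reject the equality of variances.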

■

For Discussion: When an F-Test for ANOVA rejects the null hypothesis (as in the previous example), how does one determine which pairs of means significantly differ? Two solutions to this question are presented next.


A graphical method

Confidence intervals can be used to visualize which pairs of means differ significantly. When forming the confidence intervals, be sure to use the confidence level associated with the significance level from the hypothesis test. Consider the following graph, which displays the 90% confidence intervals computed based on the sample data for each of the three different smoking cessation techniques in the previous example.

For Discussion: How do we use the above graph to help determine the relationship between the different treatment means?... the different variances?


Answer:

To determine whether it is reasonable that the population means are equal, we look at the corresponding confidence interval plot to identify whether the confidence intervals overlap. If all the plotted intervals overlap, you would not reject the assumption that all the true means are equal. If two of the intervals do not overlap, you would have evidence to conclude that those two true means differ.

Referring to the previous 90% confidence interval plot for the average number of pack years associated with the three different smoking cessation techniques, the confidence interval for the NTP group (Population 1) does not overlap the confidence intervals for the B group (Population 2) and the BNTP group (Population 3). Therefore it would be reasonable to conclude that the true mean number of pack years for the NTP group differs from the true mean number of pack years for both the B and BNTP groups. Because the confidence intervals for the B and BNTP groups overlap, we do not have evidence to reject that the true mean numbers of pack years for the B and BNTP groups are equal.

To determine whether it is reasonable that the population variances are equal, we look at the widths of the confidence intervals in the confidence interval plot. If the widths are approximately equal, then we do not have any evidence to reject that the true variances are all equal. For two of the intervals, if the larger width divided by the smaller width is greater than two, then we have evidence to conclude that those two variances differ (provided each sample size is reasonable).

Referring to the previous 90% confidence interval plot based on the sample data for each of the three different smoking cessation techniques, the confidence interval width for the NTP group (Population 1) is not two times larger than the widths of the B-group (Population 2) and the BNTP-group (Population 3) confidence intervals, and the B-group confidence interval width is not two times larger than the width of the BNTP-group confidence interval. Hence we would not have evidence to reject the assumption that the variances of the three groups are equal.

■

The method we just discussed for determining which means differ is rather “nebulous”. We will discuss two quantitative methods for determining which of the means presented in a One-Way ANOVA F-Test, if any, differ significantly. The first method we are going to discuss is called the “Scheffé Test”. The second method we will discuss is called the “Tukey Test”. The Scheffé Test and the Tukey Test are examples of “post hoc” analyses (i.e. analyses that were not planned ahead of time). Both tests can be used whenever the assumptions for the F-Test for One-Way ANOVA are true.

The Scheffé Test

To implement the Scheffé Test, we must compare the means, two at a time. For the smoking cessation example, we would have to compare the sample means $\bar{x}_{NTP}$ with $\bar{x}_B$; $\bar{x}_{NTP}$ with $\bar{x}_{BNTP}$; and $\bar{x}_B$ with $\bar{x}_{BNTP}$. The value of the test statistic for the Scheffé Test based on Treatment $i$ and Treatment $j$ (for $i \neq j$) is:

$$F_{S_{i,j}}(k-1, N-k) = \frac{(\bar{x}_i - \bar{x}_j)^2}{s_W^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where $\bar{x}_i$ and $\bar{x}_j$ are the sample means of Treatment $i$ and Treatment $j$ respectively, $n_i$ and $n_j$ are the sizes of the samples for Treatment $i$ and Treatment $j$ respectively, and $s_W^2$ is the “Within-the-treatment variance $MS_W$” that we computed in the One-Way ANOVA F-Test. The critical value for the Scheffé Test is

$$F_S(k-1, N-k) = (k-1)\,F_\alpha(k-1, N-k)$$

where $N$ is the total number of observations across all the samples, $k$ is the number of treatments, and $\alpha$ is the significance level used in the One-Way ANOVA F-Test. Then there is a significant difference (at the $\alpha$ level of significance) between the means of Treatment $i$ and Treatment $j$ (for $i \neq j$) if

$$F_{S_{i,j}}(k-1, N-k) > F_S(k-1, N-k).$$

For practice: Use the Scheffé Test to determine which (if any) of the pairs of means in the smoking cessation example differ significantly at the α = 0.10 level of significance.

Solution:

The hypotheses to be tested are:

- $H_{0,NTPxB}$: $\mu_{NTP} = \mu_B$
  $H_{A,NTPxB}$: $\mu_{NTP} \neq \mu_B$

- $H_{0,NTPxBNTP}$: $\mu_{NTP} = \mu_{BNTP}$
  $H_{A,NTPxBNTP}$: $\mu_{NTP} \neq \mu_{BNTP}$

- $H_{0,BxBNTP}$: $\mu_B = \mu_{BNTP}$
  $H_{A,BxBNTP}$: $\mu_B \neq \mu_{BNTP}$

Multiple Comparisons

Dependent Variable: years (Scheffe)

| (I) group | (J) group | Mean Difference (I-J) | Std. Error | Sig. | 90% CI Lower Bound | 90% CI Upper Bound |
|---|---|---|---|---|---|---|
| NTP | B | -33.31939799 | 10.40239133 | .007 | -55.80316221 | -10.83563378 |
| NTP | BNTP | -36.18122555 | 10.75654241 | .004 | -59.43045314 | -12.93199797 |
| B | NTP | 33.31939799 | 10.40239133 | .007 | 10.83563378 | 55.80316221 |
| B | BNTP | -2.861827561 | 5.855617255 | .888 | -15.51817874 | 9.79452361 |
| BNTP | NTP | 36.18122555 | 10.75654241 | .004 | 12.93199797 | 59.43045314 |
| BNTP | B | 2.861827561 | 5.855617255 | .888 | -9.79452361 | 15.51817874 |

*. The mean difference is significant at the .1 level.

Referring to the Multiple Comparisons Table above:

regarding the NTP and B groups, because the p-value = 0.007 < 0.10 = α, we reject $H_{0,NTPxB}$;

regarding the NTP and BNTP groups, because the p-value = 0.004 < 0.10 = α, we reject $H_{0,NTPxBNTP}$; and

regarding the B and BNTP groups, because the p-value = 0.888 > 0.10 = α, we do not reject $H_{0,BxBNTP}$.

Consequently, at the 0.10 level of significance, with a p-value=0.007, we have evidence to conclude that the true mean pack years for the bupropion and nicotine patch groups differ and, with a p-value=0.004, we have evidence to conclude that the true mean pack years for the nicotine patch and the nicotine patch/bupropion combination groups differ. At the same level of significance, with a p-value=0.888, we cannot reject the assumption that the true mean pack years for the bupropion and the nicotine patch/bupropion combination groups are equal.
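The Scheffé computations for this example can be sketched from quantities already reported: the group sizes, $MS_W$ from the ANOVA table, and the mean differences from the Multiple Comparisons table. Because $k - 1 = 2$ here, both the critical value and the p-values have closed forms; these closed forms are an assumption that applies only when the numerator degrees of freedom equal 2, and the helper names are hypothetical.

```python
def f_upper_tail_df1_2(f_value: float, d: int) -> float:
    """Pr(F(2, d) > f_value); closed form valid only for numerator df = 2."""
    return (1.0 + 2.0 * f_value / d) ** (-d / 2.0)


def f_critical_df1_2(alpha: float, d: int) -> float:
    """Upper-alpha critical value of F(2, d); inverse of the closed form above."""
    return (d / 2.0) * (alpha ** (-2.0 / d) - 1.0)


k, n_total, alpha = 3, 164, 0.10
ms_within = 1232.561                          # MS_W from the ANOVA table
sizes = {"NTP": 13, "B": 92, "BNTP": 59}      # group sizes from the problem
mean_diffs = {                                # from the Multiple Comparisons table
    ("NTP", "B"): -33.31939799,
    ("NTP", "BNTP"): -36.18122555,
    ("B", "BNTP"): -2.861827561,
}

df_within = n_total - k
scheffe_critical = (k - 1) * f_critical_df1_2(alpha, df_within)

results = {}
for (g1, g2), diff in mean_diffs.items():
    f_s = diff ** 2 / (ms_within * (1.0 / sizes[g1] + 1.0 / sizes[g2]))
    p = f_upper_tail_df1_2(f_s / (k - 1), df_within)
    results[(g1, g2)] = (f_s, p, f_s > scheffe_critical)

for pair, (f_s, p, significant) in results.items():
    print(pair, f"F_S = {f_s:.3f}", f"p = {p:.3f}", "significant" if significant else "not significant")
```

Comparing each $F_S$ with the critical value and comparing each p-value with α are equivalent decision rules, so both should flag the same two pairs.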

■

NOTE: There are situations in which the F-Test ANOVA indicates that there is a significant difference between at least two of the means BUT the Scheffé Test fails to identify any significant differences in the pairs of means.


The Tukey Test

The Tukey Test can also be used after the One-Way ANOVA F-Test has been completed to determine any pairwise differences between the means of the groups. The value for the Tukey test statistic for Population $i$ and Population $j$ is given by

$$q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{s_W^2 / n_h}},$$

where $n_h = k / (1/n_1 + 1/n_2 + \dots + 1/n_k)$. When the absolute value of the Tukey test statistic is greater than the Tukey critical value (from a standard table of values), there is a significant difference between the means corresponding to Population $i$ and Population $j$.

For practice: Use the Tukey Test to determine which (if any) of the pairs of means in the smoking cessation example differ significantly at the α = 0.10 level of significance.

Solution:

The hypotheses to be tested are:

- $H_{0,NTPxB}$: $\mu_{NTP} = \mu_B$
  $H_{A,NTPxB}$: $\mu_{NTP} \neq \mu_B$

- $H_{0,NTPxBNTP}$: $\mu_{NTP} = \mu_{BNTP}$
  $H_{A,NTPxBNTP}$: $\mu_{NTP} \neq \mu_{BNTP}$

- $H_{0,BxBNTP}$: $\mu_B = \mu_{BNTP}$
  $H_{A,BxBNTP}$: $\mu_B \neq \mu_{BNTP}$

Multiple Comparisons

Dependent Variable: years (Tukey HSD)

| (I) group | (J) group | Mean Difference (I-J) | Std. Error | Sig. | 90% CI Lower Bound | 90% CI Upper Bound |
|---|---|---|---|---|---|---|
| NTP | B | -33.31939799 | 10.40239133 | .005 | -54.82150371 | -11.81729228 |
| NTP | BNTP | -36.18122555 | 10.75654241 | .003 | -58.41537392 | -13.94707719 |
| B | NTP | 33.31939799 | 10.40239133 | .005 | 11.81729228 | 54.82150371 |
| B | BNTP | -2.861827561 | 5.855617255 | .877 | -14.96559268 | 9.24193755 |
| BNTP | NTP | 36.18122555 | 10.75654241 | .003 | 13.94707719 | 58.41537392 |
| BNTP | B | 2.861827561 | 5.855617255 | .877 | -9.24193755 | 14.96559268 |

*. The mean difference is significant at the .1 level.

Referring to the Multiple Comparisons table above,

regarding the NTP and B groups, because the p-value = 0.005 < 0.10 = α, we reject $H_{0,NTPxB}$;

regarding the NTP and BNTP groups, because the p-value = 0.003 < 0.10 = α, we reject $H_{0,NTPxBNTP}$; and

regarding the B and BNTP groups, because the p-value = 0.877 > 0.10 = α, we do not reject $H_{0,BxBNTP}$.

Consequently, at the 0.10 level of significance, with a p-value=0.005, we have evidence to conclude that the true mean pack years for the bupropion and nicotine patch groups differ and, with a p-value=0.003, we have evidence to conclude that the true mean pack years for the nicotine patch and the nicotine patch/bupropion combination groups differ. At the same level of significance, with a p-value=0.877, we cannot reject the assumption that the true mean pack years for the bupropion and the nicotine patch/bupropion combination groups are equal.
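For illustration, the Tukey test statistic $q$ defined earlier can be evaluated for this example from the group sizes, $MS_W$, and the reported mean differences. This sketch stops at the statistic: the critical value comes from a studentized range table, which is not reproduced here, and the p-values in the SPSS table may be based on a slightly different treatment of unequal sample sizes, so no agreement with the Sig. column is claimed. Variable names are illustrative.

```python
from math import sqrt

sizes = [13, 92, 59]          # n_NTP, n_B, n_BNTP from the problem
k = len(sizes)
ms_within = 1232.561          # s_W^2 = MS_W from the ANOVA table

# Harmonic mean of the sample sizes, n_h = k / (1/n_1 + ... + 1/n_k).
n_h = k / sum(1.0 / n for n in sizes)

mean_diffs = {                # from the Multiple Comparisons table
    ("NTP", "B"): -33.31939799,
    ("NTP", "BNTP"): -36.18122555,
    ("B", "BNTP"): -2.861827561,
}

standard_error = sqrt(ms_within / n_h)
q_stats = {pair: diff / standard_error for pair, diff in mean_diffs.items()}

for pair, q in q_stats.items():
    # |q| is what gets compared with the tabulated Tukey critical value.
    print(pair, f"q = {q:.3f}", f"|q| = {abs(q):.3f}")
```

Each $|q|$ would then be compared with the tabulated critical value for $k = 3$ groups and $N - k = 161$ error degrees of freedom at the chosen significance level.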

■

NOTE: In the situation where we are only making pairwise comparisons, the Tukey Test is preferred to the Scheffé Test.

NOTE: There are other tests that possibly could be used.

Now Your Turn: BFAHS, p. 329, q. 8.2.2. Patients suffering from rheumatic diseases or osteoporosis often suffer critical losses in bone mineral density (BMD). Alendronate is one medication prescribed to build or prevent further loss of BMD. Holcomb and Rothenberg (A-3) looked at 96 women taking alendronate to determine if a difference existed in the mean % change in BMD among five different primary diagnosis classifications. Group 1 patients were diagnosed with rheumatoid arthritis (RA). Group 2 patients were a mixed collection of patients with diseases including lupus, Wegener granulomatosis and polyarteritis, and other vascular diseases (LUPUS). Group 3 patients had polymyalgia rheumatica or temporal arthritis (PMRTA). Group 4 patients had osteoarthritis (OA), and Group 5 patients had osteoporosis (O) with no other rheumatic diseases identified in the medical record. Completely analyze the above data at the 10% level of significance. The data can be found on the textbook website.


Two-Way ANOVA with Interaction

Suppose we are interested in studying the effects that two independent variables (or two factors) have on a single dependent variable. When we use a completely randomized experimental design to assign individuals to the different levels/treatments of the two factors, we can analyze the corresponding collected data using a Two-Way ANOVA F-Test. A Two-Way ANOVA allows a researcher to test whether each of the factors and their interaction have a statistically significant effect on the dependent variable.

All fixed effects factors

Suppose, when designing an experiment, the levels of each factor are identified and fixed and the conclusions of any analysis are in relation to these levels. Then the factors are fixed effects factors and we need to perform a “Fixed-effects Factors Two-Way ANOVA”. The model for two fixed-effects factors is

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where, for $i = 1,\dots,a$ and $j = 1,\dots,b$, $\mu$, $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ are fixed unknown constants and $\varepsilon_{ijk}$ is a random, normally distributed variable with mean 0 and variance $\sigma^2$. Note

$$\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0.$$

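If we were to implement the test by hand, we would need to calculate the following, where $x_{i,j,k}$ denotes the observation for subject $i$ at level $j$ of Factor A and level $k$ of Factor B:

1) The means

$$\bar{x}_{\cdot,\cdot,k} = \frac{1}{na}\sum_{i=1}^{n}\sum_{j=1}^{a} x_{i,j,k}, \qquad \bar{x}_{\cdot,j,k} = \frac{1}{n}\sum_{i=1}^{n} x_{i,j,k}, \qquad \bar{x}_{\cdot,\cdot,\cdot} = \frac{1}{nab}\sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} x_{i,j,k}$$

2) The sum of the squares for Factor A:

$$SS_A = nb\sum_{j=1}^{a}\left(\bar{x}_{\cdot,j,\cdot} - \bar{x}_{\cdot,\cdot,\cdot}\right)^2, \qquad \text{where} \qquad \bar{x}_{\cdot,j,\cdot} = \frac{1}{nb}\sum_{i=1}^{n}\sum_{k=1}^{b} x_{i,j,k}$$

3) The sum of the squares for Factor B:

$$SS_B = na\sum_{k=1}^{b}\left(\bar{x}_{\cdot,\cdot,k} - \bar{x}_{\cdot,\cdot,\cdot}\right)^2$$

4) The sum of the squares for the interaction:

$$SS_{A\times B} = n\sum_{j=1}^{a}\sum_{k=1}^{b}\left(\bar{x}_{\cdot,j,k} - \bar{x}_{\cdot,j,\cdot} - \bar{x}_{\cdot,\cdot,k} + \bar{x}_{\cdot,\cdot,\cdot}\right)^2$$

5) The sum of the squares for the within-group error term:

$$SS_W = \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b}\left(x_{i,j,k} - \bar{x}_{\cdot,j,k}\right)^2$$

6) $a$ is the number of levels of Factor A

7) $b$ is the number of levels of Factor B

8) $n$ is the number of subjects in each group

9) $MS_A = \dfrac{SS_A}{a-1}$

10) $MS_B = \dfrac{SS_B}{b-1}$

11) $MS_{A\times B} = \dfrac{SS_{A\times B}}{(a-1)(b-1)}$

12) $MS_W = \dfrac{SS_W}{ab(n-1)}$

13) $F_A = \dfrac{MS_A}{MS_W}$ with $a-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator

14) $F_B = \dfrac{MS_B}{MS_W}$ with $b-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator

15) $F_{A\times B} = \dfrac{MS_{A\times B}}{MS_W}$ with $(a-1)(b-1)$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator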


In order to use ANOVA, there are three sets of hypotheses to be tested. We will present our Two-Way ANOVA $F$-Tests in the same format that we learned in the previous section, using the following template.

Research Question:

Population Declarations:

Hypotheses to be tested:

$H_{0,A}$: there is no difference between the true means of the dependent variable based on the different levels of Factor A.

$H_{a,A}$: not $H_{0,A}$, that is, there is a difference between at least two of the true means of the dependent variable based on the different levels of Factor A.

$H_{0,B}$: there is no difference between the true means of the dependent variable based on the different levels of Factor B.

$H_{a,B}$: there is a difference between at least two of the true means of the dependent variable based on the different levels of Factor B.

$H_{0,A \times B}$: there is no interaction effect between Factor A and Factor B on the true means of the dependent variable.

$H_{a,A \times B}$: there is an interaction effect between Factor A and Factor B on the true means of the dependent variable.

The assumptions required to implement the test:

1) The populations from which each of the random samples was taken must be normal.

2) The populations must have the same variances.

3) The samples must be independent of one another.

4) The groups must be equal in sample size.

The Significance level: $\alpha$

The test statistics:

1) $F_A = \dfrac{MS_A}{MS_W}$ with $a-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator


2) $F_B = \dfrac{MS_B}{MS_W}$ with $b-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator

3) $F_{A \times B} = \dfrac{MS_{A \times B}}{MS_W}$ with $(a-1)(b-1)$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator,

where $n$ is the number of subjects in each group, $a$ is the number of levels of Factor A, and $b$ is the number of levels of Factor B.

Calculate the p-value using technology.

There will be a critical value $F_\alpha(v, d)$ for each of the above three test statistics, which can be found in an appropriate $F$-table.

The Decision Rule:

With respect to Factor A: If $F_A > F_\alpha(a-1,\ ab(n-1))$, reject $H_{0,A}$; otherwise do not reject $H_{0,A}$.

With respect to Factor B: If $F_B > F_\alpha(b-1,\ ab(n-1))$, reject $H_{0,B}$; otherwise do not reject $H_{0,B}$.

With respect to the interaction of Factors A and B: If $F_{A \times B} > F_\alpha((a-1)(b-1),\ ab(n-1))$, reject $H_{0,A \times B}$; otherwise do not reject $H_{0,A \times B}$.

Equivalently:

With respect to Factor A: If p-value $= \Pr[F \ge F_A] < \alpha$, reject $H_{0,A}$; otherwise do not reject $H_{0,A}$.

With respect to Factor B: If p-value $= \Pr[F \ge F_B] < \alpha$, reject $H_{0,B}$; otherwise do not reject $H_{0,B}$.

With respect to the interaction of Factors A and B: If p-value $= \Pr[F \ge F_{A \times B}] < \alpha$, reject $H_{0,A \times B}$; otherwise do not reject $H_{0,A \times B}$.

Conclusion: We write our conclusion here.


For Practice: A medical researcher wishes to test the effects of two different diets and two different exercise programs on the glucose level in a person's blood. The glucose is measured in milligrams per decilitre (mg/dl). Three subjects are randomly assigned to each group and the glucose levels are summarized in the table below. Analyze the researcher's data at the $\alpha = 0.05$ level of significance.

              Diet A   Diet B
Exercise 1    62       58
              64       62
              66       53
Exercise 2    65       83
              68       85
              72       91

Solution:

Research Question: Do one's diet and exercise program affect the glucose level in a person's blood?

Population Declarations:

Let Factor A be the Diet of the individuals. Let Level 1 of Factor A be Diet A and Level 2 of Factor A be Diet B.

Let Factor B be the Exercise Program followed by the individuals. Let Level 1 of Factor B be Exercise Program 1 and Level 2 of Factor B be Exercise Program 2.

Let Population 1 be the individuals who have Diet A and are on Exercise Program 1.
Let Population 2 be the individuals who have Diet A and are on Exercise Program 2.
Let Population 3 be the individuals who have Diet B and are on Exercise Program 1.
Let Population 4 be the individuals who have Diet B and are on Exercise Program 2.

Hypotheses to be tested:

$H_{0,Diet}$: there is no difference between the true mean glucose blood levels associated with the two different diets.

$H_{a,Diet}$: there is a difference between the true mean glucose blood levels associated with the two different diets.

$H_{0,Exercise}$: there is no difference between the true mean glucose blood levels associated with the two different exercise programs.

$H_{a,Exercise}$: there is a difference between the true mean glucose blood levels associated with the two different exercise programs.

$H_{0,Diet \times Exercise}$: there is no interaction effect between one's diet and exercise program on the true mean glucose levels in the blood.

$H_{a,Diet \times Exercise}$: there is an interaction effect between one's diet and exercise program on the true mean glucose levels in the blood.


Hypothesis Test to be used: Two-Way ANOVA

Assumptions required to implement the hypothesis test:

1) The populations from which each of the random samples was taken must be normal.

Tests of Normality

                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                   Statistic   df   Sig.        Statistic   df   Sig.
Exercise1_dietA    .175        3    .           1.000       3    1.000
Exercise1_dietB    .196        3    .           .996        3    .878
Exercise2_dietA    .204        3    .           .993        3    .843
Exercise2_dietB    .292        3    .           .923        3    .463

a. Lilliefors Significance Correction

Based on the Shapiro-Wilk p-values in the Tests of Normality table above:

- regarding the Exercise1/diet A group (Population 1), since the p-value > 0.999 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 1 are normally distributed;
- regarding the Exercise1/diet B group (Population 3), since the p-value = 0.878 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 3 are normally distributed;
- regarding the Exercise2/diet A group (Population 2), since the p-value = 0.843 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 2 are normally distributed; and
- regarding the Exercise2/diet B group (Population 4), since the p-value = 0.463 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 4 are normally distributed.

Therefore, at the α = 0.05 level of significance, there is no evidence to reject the assumptions that the glucose levels in each of Populations 1 through 4 are normally distributed (with p-values of approximately 1.0, 0.843, 0.878, and 0.463 respectively). Hence we can continue with our analysis.

2) The populations must have the same variances.

Based on the following Test of Homogeneity of Variances table, the test statistic is L(3, 8) = 0.633 with an associated p-value = 0.614.

Levene's Test of Homogeneity of Variances(a)
Dependent Variable: glucose

F       df1   df2   Sig.
.633    3     8     .614

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + exercise + diet + exercise * diet

Because the p-value = 0.614 > 0.05 = α, there is no evidence to reject the assumption that the four populations all have the same variance. Therefore, we can still carry out the two-way ANOVA F-test.
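To see where the Levene statistic comes from, the sketch below recomputes it from the practice data using only the Python standard library. It assumes the mean-centred form of Levene's test (a one-way ANOVA F statistic on the absolute deviations of each observation from its group mean); the function name `levene_mean_centered` is illustrative and not part of any package. The p-value is not computed here, since that requires the F distribution from statistical software.

```python
# Glucose levels (mg/dl) for the four diet/exercise groups in the practice problem.
groups = {
    "Exercise1_dietA": [62, 64, 66],
    "Exercise1_dietB": [58, 62, 53],
    "Exercise2_dietA": [65, 68, 72],
    "Exercise2_dietB": [83, 85, 91],
}


def mean(values):
    return sum(values) / len(values)


def levene_mean_centered(samples):
    """Levene statistic: one-way ANOVA F on |x - group mean| (assumed mean-centred form)."""
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in samples]
    k = len(z)                               # number of groups
    total_n = sum(len(g) for g in z)         # total number of observations
    z_grand = mean([v for g in z for v in g])
    ss_between = sum(len(g) * (mean(g) - z_grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df1, df2 = k - 1, total_n - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


statistic, df1, df2 = levene_mean_centered(list(groups.values()))
print(f"L({df1}, {df2}) = {statistic:.3f}")
```

Under these assumptions the printed statistic agrees with the value in the table above, with 3 and 8 degrees of freedom.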


3) The samples must be independent of one another.

4) The groups must be equal in sample size.

The Significance Level: α = 0.05

The Test Statistic and corresponding p-value:

From the Tests of Between-Subjects Effects table below, the value of the test statistic $F_{Diet}(1, 8) = 7.562$ and its associated p-value = 0.025, the value of the test statistic $F_{Exercise}(1, 8) = 60.500$ and its associated p-value < 0.001, and the value of the test statistic $F_{Diet \times Exercise}(1, 8) = 32.895$ and its associated p-value < 0.001.

Tests of Between-Subjects Effects
Dependent Variable: Glucose

Source             Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    1362.917(a)               3    454.306       33.652     .000
Intercept          57270.083                 1    57270.083     4242.228   .000
Diet               102.083                   1    102.083       7.562      .025
Exercise           816.750                   1    816.750       60.500     .000
Diet * Exercise    444.083                   1    444.083       32.895     .000
Error              108.000                   8    13.500
Total              58741.000                 12
Corrected Total    1470.917                  11

a. R Squared = .927 (Adjusted R Squared = .899)
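The sums of squares and F statistics in this table can be reproduced directly from the by-hand formulas for a balanced design. The following is a minimal sketch using only the Python standard library; the function name `two_way_anova` and the dictionary layout of the data are illustrative choices, not part of the module. It does not compute p-values.

```python
# Glucose levels (mg/dl), keyed by (diet, exercise program).
cells = {
    ("A", 1): [62, 64, 66],
    ("B", 1): [58, 62, 53],
    ("A", 2): [65, 68, 72],
    ("B", 2): [83, 85, 91],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction; returns {source: (SS, df, F)}."""
    a_levels = sorted({key[0] for key in data})
    b_levels = sorted({key[1] for key in data})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(data.values())))            # subjects per group
    assert all(len(v) == n for v in data.values()), "design must be balanced"

    grand = mean([x for v in data.values() for x in v])
    mean_a = {i: mean([x for (ai, _), v in data.items() if ai == i for x in v])
              for i in a_levels}
    mean_b = {j: mean([x for (_, bj), v in data.items() if bj == j for x in v])
              for j in b_levels}
    mean_cell = {key: mean(v) for key, v in data.items()}

    ss_a = n * b * sum((mean_a[i] - grand) ** 2 for i in a_levels)
    ss_b = n * a * sum((mean_b[j] - grand) ** 2 for j in b_levels)
    ss_ab = n * sum((mean_cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in a_levels for j in b_levels)
    ss_w = sum((x - mean_cell[key]) ** 2 for key, v in data.items() for x in v)

    df_a, df_b, df_ab, df_w = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_w = ss_w / df_w
    return {
        "Diet": (ss_a, df_a, (ss_a / df_a) / ms_w),
        "Exercise": (ss_b, df_b, (ss_b / df_b) / ms_w),
        "Diet*Exercise": (ss_ab, df_ab, (ss_ab / df_ab) / ms_w),
        "Error": (ss_w, df_w, None),
    }


result = two_way_anova(cells)
for source, (ss, df, f) in result.items():
    f_text = "" if f is None else f"{f:.3f}"
    print(f"{source:14s} {ss:9.3f} {df:3d} {f_text}")
```

Each of the three F statistics can then be compared with the tabulated critical value for 1 and 8 degrees of freedom at α = 0.05 (approximately 5.32 in standard F-tables); all three exceed it, which matches the p-values reported by SPSS.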

The Decision Rule:

With respect to the interaction between the Diet and Exercise factors, since the p-value < 0.001 < 0.05 = α, we reject $H_{0,Diet \times Exercise}$.

With respect to the Diet factor, since the p-value = 0.025 < 0.05 = α, we reject $H_{0,Diet}$.

With respect to the Exercise factor, since the p-value < 0.001 < 0.05 = α, we reject $H_{0,Exercise}$.

The Conclusion:

At the 5% level of significance, we have evidence to conclude that there is a difference between the true mean glucose blood levels associated with the two different diets (with a p-value=0.025), that there is a difference between the true mean glucose blood levels associated with the two different exercise programs (p-value<0.001), and that there is an interaction effect between one's diet and exercise program on the true mean glucose levels in the blood (p-value<0.001).

Normally we would now have to do a post-hoc analysis to determine how the means differ and what exactly is the interaction effect. We can compare the sample means and the corresponding confidence intervals to determine how the means differ but how do we determine the interaction effect? To help determine the interaction effect, we can carry out a profile analysis.


The Profile of a Factor

To see graphically how two factors interact (if at all), one can plot the means and corresponding confidence intervals for each level of one factor, with the means connected by a line, against the levels of a second factor. Based on the relationship between the resulting lines, one can determine how one factor affects the other factor. The interpretation of this plot is referred to as a profile analysis.

A factor does not affect the response variable if the profile of the factor is horizontal for all combinations of levels of the other factors; that is, there is no change in the response variable when you change the levels of the factor (true for all combinations of levels of the other factors). Otherwise the factor is said to affect the response variable. If the graph looks as follows, then Factor A has no effect on the response variable:

[Figure: profile plot titled "Factor A has no effect", with series A and B]

Two factors are additive if the change in the response variable (for the different levels of one factor) is statistically the same for each of the levels of the other factor. If the graph looks as follows, then Factor A and Factor B are additive.

[Figure: profile plot titled "Additive Factors", with series A and B]

Two factors interact if the change in the response variable (for different levels of one factor) is not statistically the same for some of the levels of the other factor. In order to conclude that two factors interact, the profiles of the first factor for different levels of the second factor cannot be statistically parallel. For example, if the profile looks as follows, we would conclude Factor A and Factor B interact.

[Figure: profile plot titled "Interacting Factors", with series A and B]

NOTE: If an interaction is present there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact. The testing continues for the lower order interactions and main effects of the factors that have not yet been determined to affect the response.

For Practice: For the Diet and Exercise effect on glucose levels example, plot the profiles with the two diet levels on the horizontal axis and the individual exercise levels as individual lines.

Solution: The profile plot with the two diet levels on the horizontal axis and the individual exercise levels represented by individual lines is below.

To interpret the above plot, we need to know if the marginal means for each exercise program with respect to a particular diet differ, but from the above plot, we cannot determine this. A more detailed profile plot combines the information in the above plot with the associated confidence intervals for each true marginal mean. The plot below includes the profiles as depicted in the above plot with the estimated 95% confidence intervals for the true mean associated with each diet/exercise program combination.


In the above plot, we can see that the 95% confidence interval for the true mean glucose level of individuals on Diet 1 who participated in Exercise Program 1 significantly overlaps the 95% confidence interval for the true mean glucose level of individuals on Diet 1 who participated in Exercise Program 2. Hence statistically we cannot distinguish between these two means. BUT, the 95% confidence interval for the true mean glucose level of individuals on Diet 2 who participated in Exercise Program 1 does not overlap the 95% confidence interval for the true mean glucose level of individuals on Diet 2 who participated in Exercise Program 2. Hence statistically we would conclude that these two means differ. We therefore have evidence to believe that the slope of the line representing the impact of Exercise Program 1 as a function of Diet on the mean glucose level differs from the slope of the line representing the impact of Exercise Program 2 as a function of Diet on the mean glucose level. The upshot is we have evidence supporting that the diet and exercise program have an interaction effect on the true mean glucose level.
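The overlap comparisons described above can be sketched numerically. The example below assumes that the intervals in the plot are ordinary per-group t intervals, mean ± t × s/√n with n − 1 = 2 degrees of freedom, and takes the critical value 4.303 from a standard t-table; the plot in the module may have been produced differently, so treat the printed numbers as illustrative. Diet 1 and Diet 2 in the discussion correspond to the keys "A" and "B" here.

```python
import math
import statistics

# 97.5th percentile of the t distribution with 2 degrees of freedom (from a t-table).
T_CRIT_DF2 = 4.303

# Glucose levels (mg/dl), keyed by (diet, exercise program).
cells = {
    ("A", 1): [62, 64, 66],
    ("A", 2): [65, 68, 72],
    ("B", 1): [58, 62, 53],
    ("B", 2): [83, 85, 91],
}


def t_interval(sample, t_crit):
    """Two-sided confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    centre = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width


def overlaps(first, second):
    """True when two closed intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


intervals = {key: t_interval(values, T_CRIT_DF2) for key, values in cells.items()}
for (diet, exercise), (low, high) in intervals.items():
    print(f"Diet {diet}, Exercise {exercise}: ({low:.2f}, {high:.2f})")

overlap_diet_a = overlaps(intervals[("A", 1)], intervals[("A", 2)])
overlap_diet_b = overlaps(intervals[("B", 1)], intervals[("B", 2)])
print("Exercise intervals overlap within Diet A:", overlap_diet_a)
print("Exercise intervals overlap within Diet B:", overlap_diet_b)
```

Under these assumptions the two exercise intervals overlap for the first diet and do not overlap for the second, which is the pattern the paragraph above reads off the plot.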

■ For Discussion: With reference to the above Profile Plot for the Diet-Exercise-Glucose example, describe the interaction effect between the diets and exercise programs on the true mean glucose level. Based on the statistical evidence displayed in the above plot, what diet/exercise combination would you recommend if the goal was to minimize blood-glucose levels?

Answer: Because, with respect to Diet 1, the 95% confidence intervals for the true mean glucose levels of individuals on Exercise Programs 1 and 2 overlap, statistically we cannot distinguish between these two means, and hence we have no evidence that the choice between Exercise Programs 1 and 2 affects the true mean glucose level of individuals on Diet 1. With respect to Diet 2, the 95% confidence intervals for the true mean glucose levels of individuals on Exercise Programs 1 and 2 do not overlap; statistically these two means differ. Hence Exercise Programs 1 and 2 do impact the true mean glucose levels of individuals on Diet 2. In fact, it appears that the true mean glucose level of individuals on Diet 2 and Exercise Program 1 is lower than the true mean glucose level of individuals on Diet 2 and Exercise Program 2.

The above discussion eliminates the Diet 2/Exercise Program 2 combination as a candidate for minimizing the mean glucose level. Statistically (because the confidence intervals in the above plot overlap), we cannot distinguish the true mean glucose levels of individuals on Diet 1/Exercise Program 1, Diet 1/Exercise Program 2, and Diet 2/Exercise Program 1 from one another. As a result, we might be tempted to recommend any of the three combinations.

BUT, note the widths of the confidence intervals. There was the least variation in the Diet 1/Exercise Program 1 blood glucose levels. The question becomes: is the variation in the Diet 1/Exercise Program 1 data statistically smaller than the variation in the data associated with Diet 1/Exercise Program 2 and Diet 2/Exercise Program 1? Since it appears that the confidence interval associated with Diet 2/Exercise Program 1 is more than twice as wide as the confidence interval associated with Diet 1/Exercise Program 1, the blood glucose levels of individuals in the Diet 2/Exercise Program 1 group statistically vary more than the blood glucose levels of individuals in the Diet 1/Exercise Program 1 group. Hence we would recommend Diet 1/Exercise Program 1 over Diet 2/Exercise Program 1 if we want to minimize blood-glucose levels. Similarly, one can argue that Diet 1/Exercise Program 1 should be chosen over Diet 1/Exercise Program 2 if we want to minimize blood-glucose levels.


Understanding and Interpreting Two-Way ANOVA

If one or more of the overall effects is significant, several “post hoc” procedures can be conducted. Which procedure to conduct depends on which effects were significant. A researcher may want to look at the interaction effects in place of, or possibly in addition to, the simple main effects. The simplest interaction effects analyses involve four means and are referred to as tetrad contrasts. Tetrad contrasts test whether the differences in population means between two levels of one factor are the same across two levels of a second factor.

If the interaction effect is not significant, the focus of the analysis turns to the main effects. Depending on which effects were significant, a researcher may want to compare the differences in the population means among levels of the first factor for each level of the second factor, the differences in the population means among the levels of the second factor for each level of the first factor, or both.

To illustrate the techniques described above, we will use the contrived data sets taken from Using SPSS for Windows and Macintosh.

Now Your Turn: Suppose a researcher is interested in two methods of note-taking strategies and the effect of these methods on the overall GPAs of first year college students. After randomly selecting 30 men and 30 women to participate, 10 women and 10 men are randomly assigned to Method 1; 10 men and 10 women are randomly assigned to Method 2; and the remaining 10 men and 10 women are assigned to Method 3 (the control method). During the first term, individuals in the Method 1 and 2 groups were given daily instruction on the corresponding note-taking method while the Method 3 group received no note-taking instructions. The GPAs for all the participants were recorded at the ends of the second and third term. Analyze the data collected at the 5% level of significance. Use Lesson 25 Data File 1. At this point, do not test the underlying assumptions for the Two-Way ANOVA.

For Discussion: We were told to not test the assumptions required for the above conclusion to be valid, but:

(1) How would we test the normality assumption?

(2) How would we test the equality of variances assumption?

Understanding how the Main Effect Influences the Mean

Suppose we are interested in exploring further the average effect of one’s gender and note-taking method. We could use Syntax Programming to assist us. We illustrate Syntax Programming in the answer to the following discussion question.

For Discussion: Based on the analysis of the Gender/Note-taking Method/GPA example, one's gender and the note-taking method individually impact the mean GPA, but the interaction between gender and the method does not impact the mean GPA. Which simple main effects (the effects of the levels within a factor) should be analyzed?

Answer:


The simple main effects that need to be analyzed can be determined from the following SPSS syntax. To get SPSS to carry out an analysis of the simple main effects, follow these instructions:

1) Analyze->General Linear Model->Univariate

2) Paste (you are now in the SPSS syntax editor)

3) Delete everything you see EXCEPT THE FIRST THREE LINES

4) On the fourth line begin typing:

    /lmatrix 'men vs women within Method 1' gender*method 1 0 0 -1 0 0 gender 1 -1
    /lmatrix 'men vs women within Method 2' gender*method 0 1 0 0 -1 0 gender 1 -1
    /lmatrix 'men vs women within Control' gender*method 0 0 1 0 0 -1 gender 1 -1
    /lmatrix 'method within men' gender*method 1 -1 0 0 0 0 method 1 -1 0;
        gender*method 0 1 -1 0 0 0 method 0 1 -1;
        gender*method 1 0 -1 0 0 0 method 1 0 -1
    /lmatrix 'method within women' gender*method 0 0 0 1 -1 0 method 1 -1 0;
        gender*method 0 0 0 0 1 -1 method 0 1 -1;
        gender*method 0 0 0 1 0 -1 method 1 0 -1.

5) Highlight all the syntax, click RUN and then click SELECTION

Each /lmatrix subcommand generates the p-value for one test:

- 'men vs women within Method 1' tests $H_0: \mu_{Male,Method1} = \mu_{Female,Method1}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Female,Method1}$.
- 'men vs women within Method 2' tests $H_0: \mu_{Male,Method2} = \mu_{Female,Method2}$ against $H_A: \mu_{Male,Method2} \neq \mu_{Female,Method2}$.
- 'men vs women within Control' tests $H_0: \mu_{Male,Control} = \mu_{Female,Control}$ against $H_A: \mu_{Male,Control} \neq \mu_{Female,Control}$.
- 'method within men' tests $H_0: \mu_{Male,Method1} = \mu_{Male,Method2} = \mu_{Male,Control}$ against $H_A$: not $H_0$.
- 'method within women' tests $H_0: \mu_{Female,Method1} = \mu_{Female,Method2} = \mu_{Female,Control}$ against $H_A$: not $H_0$.

Note: The three lines of the 'method within men' subcommand also generate the p-values to individually test the following sets of hypotheses:

(1) The first line of syntax generates the p-value to test $H_0: \mu_{Male,Method1} = \mu_{Male,Method2}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Male,Method2}$;

(2) The second line of syntax generates the p-value to test $H_0: \mu_{Male,Method2} = \mu_{Male,Control}$ against $H_A: \mu_{Male,Method2} \neq \mu_{Male,Control}$; and

(3) The third line of syntax generates the p-value to test $H_0: \mu_{Male,Method1} = \mu_{Male,Control}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Male,Control}$.

Further note: The three lines of the 'method within women' subcommand also generate the p-values to individually test the following sets of hypotheses:

(1) The first line of syntax generates the p-value to test $H_0: \mu_{Female,Method1} = \mu_{Female,Method2}$ against $H_A: \mu_{Female,Method1} \neq \mu_{Female,Method2}$;

(2) The second line of syntax generates the p-value to test $H_0: \mu_{Female,Method2} = \mu_{Female,Control}$ against $H_A: \mu_{Female,Method2} \neq \mu_{Female,Control}$; and

(3) The third line of syntax generates the p-value to test $H_0: \mu_{Female,Method1} = \mu_{Female,Control}$ against $H_A: \mu_{Female,Method1} \neq \mu_{Female,Control}$.


The results of the above syntax are:

Custom Hypothesis Tests #1

Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA

Contrast L1
  Contrast Estimate                                 .165
  Hypothesized Value                                0
  Difference (Estimate - Hypothesized)              .165
  Std. Error                                        .081
  Sig.                                              .047
  95% Confidence Interval for Difference
    Lower Bound                                     .002
    Upper Bound                                     .328

a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Method 1

Test Results
Dependent Variable: Change in GPA

Source      Sum of Squares   df   Mean Square   F       Sig.
Contrast    .136             1    .136          4.130   .047
Error       1.780            54   .033

Note: the Sig. value is the same in both tables; it is the p-value to test $H_0: \mu_{Male,Method1} = \mu_{Female,Method1}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Female,Method1}$.

Custom Hypothesis Tests #2

Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA

Contrast L1
  Contrast Estimate                                 .335
  Hypothesized Value                                0
  Difference (Estimate - Hypothesized)              .335
  Std. Error                                        .081
  Sig.                                              .000
  95% Confidence Interval for Difference
    Lower Bound                                     .172
    Upper Bound                                     .498

a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Method 2

Test Results
Dependent Variable: Change in GPA

Source      Sum of Squares   df   Mean Square   F        Sig.
Contrast    .561             1    .561          17.023   .000
Error       1.780            54   .033

Note: the Sig. value is the same in both tables; it is the p-value to test $H_0: \mu_{Male,Method2} = \mu_{Female,Method2}$ against $H_A: \mu_{Male,Method2} \neq \mu_{Female,Method2}$.

Custom Hypothesis Tests #3

Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA

Contrast L1
  Contrast Estimate                                 .060
  Hypothesized Value                                0
  Difference (Estimate - Hypothesized)              .060
  Std. Error                                        .081
  Sig.                                              .463
  95% Confidence Interval for Difference
    Lower Bound                                     -.103
    Upper Bound                                     .223

a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Control

Test Results
Dependent Variable: Change in GPA

Source      Sum of Squares   df   Mean Square   F      Sig.
Contrast    .018             1    .018          .546   .463
Error       1.780            54   .033

Note: the Sig. value is the same in both tables; it is the p-value to test $H_0: \mu_{Male,Control} = \mu_{Female,Control}$ against $H_A: \mu_{Male,Control} \neq \mu_{Female,Control}$.

Custom Hypothesis Tests #4

Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA

                                          L1       L2       L3
Contrast Estimate                         -.305    .475     .170
Hypothesized Value                        0        0        0
Difference (Estimate - Hypothesized)      -.305    .475     .170
Std. Error                                .081     .081     .081
Sig.                                      .000     .000     .041
95% Confidence Interval for Difference
  Lower Bound                             -.468    .312     .007
  Upper Bound                             -.142    .638     .333

a. Based on the user-specified contrast coefficients (L') matrix: method within men

Note: the Sig. value in column L1 is the p-value to test $H_0: \mu_{Male,Method1} = \mu_{Male,Method2}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Male,Method2}$; in column L2, to test $H_0: \mu_{Male,Method2} = \mu_{Male,Control}$ against $H_A: \mu_{Male,Method2} \neq \mu_{Male,Control}$; and in column L3, to test $H_0: \mu_{Male,Method1} = \mu_{Male,Control}$ against $H_A: \mu_{Male,Method1} \neq \mu_{Male,Control}$.

Test Results
Dependent Variable: Change in GPA

Source      Sum of Squares   df   Mean Square   F        Sig.
Contrast    1.158            2    .579          17.573   .000
Error       1.780            54   .033

Note: the Sig. value in the Test Results table is the p-value to test $H_0: \mu_{Male,Method1} = \mu_{Male,Method2} = \mu_{Male,Control}$ against $H_A$: not $H_0$.

Custom Hypothesis Tests #5

Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA

                                          L1       L2       L3
Contrast Estimate                         -.135    .200     .065
Hypothesized Value                        0        0        0
Difference (Estimate - Hypothesized)      -.135    .200     .065
Std. Error                                .081     .081     .081
Sig.                                      .102     .017     .427
95% Confidence Interval for Difference
  Lower Bound                             -.298    .037     -.098
  Upper Bound                             .028     .363     .228

a. Based on the user-specified contrast coefficients (L') matrix: method within women

Note: the Sig. value in column L1 is the p-value to test $H_0: \mu_{Female,Method1} = \mu_{Female,Method2}$ against $H_A: \mu_{Female,Method1} \neq \mu_{Female,Method2}$; in column L2, to test $H_0: \mu_{Female,Method2} = \mu_{Female,Control}$ against $H_A: \mu_{Female,Method2} \neq \mu_{Female,Control}$; and in column L3, to test $H_0: \mu_{Female,Method1} = \mu_{Female,Control}$ against $H_A: \mu_{Female,Method1} \neq \mu_{Female,Control}$.

Test Results
Dependent Variable: Change in GPA

Source      Sum of Squares   df   Mean Square   F       Sig.
Contrast    .208             2    .104          3.158   .050
Error       1.780            54   .033

Note: the Sig. value in the Test Results table is the p-value to test $H_0: \mu_{Female,Method1} = \mu_{Female,Method2} = \mu_{Female,Control}$ against $H_A$: not $H_0$.
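The standard errors and single-degree-of-freedom F statistics in these tables are tied together by the error mean square. As a rough check, the sketch below rebuilds the standard error and F for the "men vs women within Method 1" contrast from the reported error sum of squares, assuming a balanced design with 10 students per gender/method cell and the usual formula SE = sqrt(MSE × Σc² / n) for a contrast of cell means. The function name `contrast_test` is illustrative. Because the inputs are the rounded values printed by SPSS, the results agree with the tables only up to rounding.

```python
import math

N_PER_CELL = 10                 # 60 students spread evenly over 6 gender/method cells
SS_ERROR, DF_ERROR = 1.780, 54  # Error row of the Test Results tables
mse = SS_ERROR / DF_ERROR       # error mean square, about .033


def contrast_test(estimate, coefficients, n, error_mean_square):
    """Standard error and F statistic for one contrast of cell means (balanced design)."""
    se = math.sqrt(error_mean_square * sum(c ** 2 for c in coefficients) / n)
    return se, (estimate / se) ** 2


# Cell coefficients taken from the gender*method part of the first /lmatrix subcommand.
se, f_stat = contrast_test(0.165, [1, 0, 0, -1, 0, 0], N_PER_CELL, mse)
print(f"Std. Error = {se:.3f}, F(1, {DF_ERROR}) = {f_stat:.2f}")
```

The same function applied to the estimate .335 from Custom Hypothesis Tests #2 gives an F statistic close to the 17.023 reported there, since every pairwise contrast in this design shares the same standard error.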

For Discussion: What conclusions would we make using the test statistic values and p-values presented in the tables associated with the above Custom Hypothesis Tests 1 through 5? Answer: From the Test Results table in the Custom Hypothesis Test #1, since the p-

value=0.047<0.05=we reject the hypothesis that Male,Method1 Female,Method1. Therefore at

the 5% level of significance, we have evidence to conclude that the true mean GPA of male students who used note-taking method 1 differs from the true mean GPA of female students who used note-taking method 1 (p-value=0.047). In fact, from the hypothesized difference reported in the Custom Hypothesis Test #1, we would conclude that the true mean GPA of male students who used note-taking method 1 is greater than the true mean GPA of female students who used note-taking method 1. From the Test Results table in the Custom Hypothesis Test #2, since the p-

value<0.001<0.05=we reject the hypothesis that Male,Method2 Female,Method2. Therefore at the 5% level of significance, we have evidence to conclude that the true mean GPA of male students who used note-taking method 2 differs from the true mean GPA of female students

Comment [MLS17]: p-value to test

H0: Female,Method2 Female,Control against HA: Female,Method2 Female,Control

Comment [MLS18]: p-value to test

H0: Female,Method1 Female,Control against HA: Female,Method1 Female,Control

Comment [MLS19]: p-value to test

H0: Female,Method1 Female,Method2 Female,Control against HA: not H0

Page 42: ModuleII Anova Outline

Module #2: Analysis of Variance

42

who used note-taking method 2 (p-value<0.001). In fact, from the hypothesized difference reported in the Custom Hypothesis Test #2, we would conclude that the true mean GPA of male students who used note-taking method 2 is greater than the true mean GPA of female students who used note-taking method 2. From the Test Results table in the Custom Hypothesis Test #3, since the p-

value=0.463>0.05=we do not reject the hypothesis that Male,Control Female,Control. Therefore

at the 5% level of significance, we do not have evidence to reject the hypothesis that the true mean GPA of male students who used the control note-taking method equals the true mean GPA of female students who used the control note-taking method (p-value=0.463). From the Test Results table in the Custom Hypothesis Test #4, since the p-

value<0.001<0.05=we reject the hypothesis that Male,Method1 Male,Method2 Male,Control.

Therefore at the 5% level of significance, we have evidence to conclude that at least two of the true mean GPAs of male students associated with the three note-taking methods differ (p-value<0.001). To determine how the true mean male GPA differs with respect to the note-taking methods, we refer to the rows in the Custom Hypothesis Test #4 table labelled L1, L2, and L3.

- The row labelled L1 displays the information used to test the null hypothesis $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}}$. Because the p-value < 0.001 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 1 differs from the true mean GPA of male students who use note-taking Method 2 (p-value < 0.001). In fact, from the difference (estimate minus hypothesized value) reported in row L1, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 1 is less than the true mean GPA of male students who use note-taking Method 2.

- The row labelled L2 displays the information used to test the null hypothesis $\mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$. Because the p-value < 0.001 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 2 differs from the true mean GPA of male students who use the control note-taking method (p-value < 0.001). In fact, from the difference reported in row L2, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 2 is greater than the true mean GPA of male students who use the control note-taking method.

- The row labelled L3 displays the information used to test the null hypothesis $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Control}}$. Because the p-value = 0.041 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 1 differs from the true mean GPA of male students who use the control note-taking method (p-value = 0.041). In fact, from the difference reported in row L3, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 1 is greater than the true mean GPA of male students who use the control note-taking method.

From the Test Results table in the Custom Hypothesis Test #5, since the p-value = 0.05 = α (that is, the p-value is not less than α), we do not reject the hypothesis that $\mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$. Therefore, at the 5% level of significance, we do not have evidence to refute that the true mean GPAs of female students associated with the three note-taking methods are all equal (p-value = 0.05). Consequently there is no need to consider the information presented in rows L1, L2, and L3 of this table.

■ For Discussion: Based on the above analysis, which note-taking method (if any) would you recommend to male students and which note-taking method (if any) would you recommend to female students? Be sure to justify your response.

Answer: Because the true mean GPA of male students who used note-taking Method 2 is statistically greater than the true mean GPAs of male students who used either the control note-taking method or note-taking Method 1, on average the GPAs of male students are greater when note-taking Method 2 is used. Consequently I would recommend note-taking Method 2 to the male students.

From the discussion relevant to the Custom Hypothesis Test #5 table, it might appear that it does not matter statistically which note-taking method we recommend for female students. But, because the least (statistically significant) difference between the true mean GPAs of male and female students occurs for note-taking Method 2, I also would recommend note-taking Method 2 to the female students.

For Discussion: Suppose we wanted to investigate, pairwise, how the means associated with two levels of one factor differ within each level of another factor. For example, how would we compare the different note-taking methods within a specific gender?

Answer: To get SPSS to generate the required output for this analysis, follow these instructions:

1) Analyze->General Linear Model->Univariate
2) Paste (you are now in the SPSS syntax editor)
3) Delete everything you see EXCEPT THE FIRST THREE LINES
4) On the fourth line begin typing:

/lmatrix 'Method 1 vs. Method 2 within men' gender*method 1 -1 0 0 0 0 method 1 -1 0
/lmatrix 'Method 1 vs. Control within men' gender*method 1 0 -1 0 0 0 method 1 0 -1
/lmatrix 'Method 2 vs. Control within men' gender*method 0 1 -1 0 0 0 method 0 1 -1
/lmatrix 'Method 1 vs. Method 2 within women' gender*method 0 0 0 1 -1 0 method 1 -1 0
/lmatrix 'Method 1 vs. Control within women' gender*method 0 0 0 1 0 -1 method 1 0 -1
/lmatrix 'Method 2 vs. Control within women' gender*method 0 0 0 0 1 -1 method 0 1 -1.

5) Highlight all the syntax, click RUN and then click SELECTION

Each /lmatrix line generates the p-value for one pairwise test:

- 'Method 1 vs. Method 2 within men' tests H0: $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}}$ against HA: $\mu_{\text{Male,Method1}} \neq \mu_{\text{Male,Method2}}$.
- 'Method 1 vs. Control within men' tests H0: $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Control}}$ against HA: $\mu_{\text{Male,Method1}} \neq \mu_{\text{Male,Control}}$.
- 'Method 2 vs. Control within men' tests H0: $\mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$ against HA: $\mu_{\text{Male,Method2}} \neq \mu_{\text{Male,Control}}$.
- 'Method 1 vs. Method 2 within women' tests H0: $\mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}}$ against HA: $\mu_{\text{Female,Method1}} \neq \mu_{\text{Female,Method2}}$.
- 'Method 1 vs. Control within women' tests H0: $\mu_{\text{Female,Method1}} = \mu_{\text{Female,Control}}$ against HA: $\mu_{\text{Female,Method1}} \neq \mu_{\text{Female,Control}}$.
- 'Method 2 vs. Control within women' tests H0: $\mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$ against HA: $\mu_{\text{Female,Method2}} \neq \mu_{\text{Female,Control}}$.

The results of the above code are:

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | -.305 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | -.305 |
| Std. Error | .081 |
| Sig. | .000 |
| 95% Confidence Interval for Difference, Lower Bound | -.468 |
| 95% Confidence Interval for Difference, Upper Bound | -.142 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Method 2 within men

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .465 | 1 | .465 | 14.111 | .000 |
| Error | 1.780 | 54 | .033 | | |
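The numbers in a Contrast Results table and its Test Results table are linked by simple arithmetic, which the following minimal Python sketch tries to make visible. It is not SPSS output and not the author's method; the function name `contrast_summary` is hypothetical, and the sketch assumes a balanced design with 10 students in each of the six gender-by-method cells (consistent with the error df of 54 = 60 - 6, but an assumption here). Inputs are the rounded values printed in the tables, so results agree with SPSS only up to rounding.

```python
import math

def contrast_summary(estimate, coeffs, n_per_cell, sse, df_error):
    """Return (std_error, ss_contrast, f_stat) for a one-df contrast among cell means.

    estimate   : estimated value of the contrast (sum of c_i * cell mean_i)
    coeffs     : contrast coefficients c_i, one per cell
    n_per_cell : common sample size per cell (balanced design assumed)
    sse        : error sum of squares from the ANOVA table
    df_error   : error degrees of freedom
    """
    mse = sse / df_error                              # mean square error
    scale = sum(c * c for c in coeffs) / n_per_cell   # sum(c_i^2) / n
    std_error = math.sqrt(mse * scale)
    ss_contrast = estimate ** 2 / scale
    f_stat = ss_contrast / mse                        # one numerator df
    return std_error, ss_contrast, f_stat

# Assumed cell order: (Male, M1), (Male, M2), (Male, C), (Female, M1), (Female, M2), (Female, C)
m1_vs_m2_men = [1, -1, 0, 0, 0, 0]
se, ss, f = contrast_summary(-0.305, m1_vs_m2_men, n_per_cell=10, sse=1.780, df_error=54)
print(round(se, 3), round(ss, 3), round(f, 3))
```

With these inputs the three values land close to the Std. Error (.081), the contrast Sum of Squares (.465), and the F statistic (14.111) reported for "Method 1 vs. Method 2 within men". The same call with a different estimate reproduces the other pairwise tables approximately.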

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .170 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .170 |
| Std. Error | .081 |
| Sig. | .041 |
| 95% Confidence Interval for Difference, Lower Bound | .007 |
| 95% Confidence Interval for Difference, Upper Bound | .333 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Control within men

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .144 | 1 | .144 | 4.384 | .041 |
| Error | 1.780 | 54 | .033 | | |

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .475 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .475 |
| Std. Error | .081 |
| Sig. | .000 |
| 95% Confidence Interval for Difference, Lower Bound | .312 |
| 95% Confidence Interval for Difference, Upper Bound | .638 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 2 vs. Control within men

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | 1.128 | 1 | 1.128 | 34.224 | .000 |
| Error | 1.780 | 54 | .033 | | |

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | -.135 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | -.135 |
| Std. Error | .081 |
| Sig. | .102 |
| 95% Confidence Interval for Difference, Lower Bound | -.298 |
| 95% Confidence Interval for Difference, Upper Bound | .028 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Method 2 within women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .091 | 1 | .091 | 2.764 | .102 |
| Error | 1.780 | 54 | .033 | | |

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .065 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .065 |
| Std. Error | .081 |
| Sig. | .427 |
| 95% Confidence Interval for Difference, Lower Bound | -.098 |
| 95% Confidence Interval for Difference, Upper Bound | .228 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Control within women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .021 | 1 | .021 | .641 | .427 |
| Error | 1.780 | 54 | .033 | | |

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .200 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .200 |
| Std. Error | .081 |
| Sig. | .017 |
| 95% Confidence Interval for Difference, Lower Bound | .037 |
| 95% Confidence Interval for Difference, Upper Bound | .363 |

a. Based on the user-specified contrast coefficients (L') matrix: Method 2 vs. Control within women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .200 | 1 | .200 | 6.067 | .017 |
| Error | 1.780 | 54 | .033 | | |

For Discussion: Compare the output generated by this syntax to the corresponding output generated by the previous syntax. Do you notice any similarities amongst the output?

Answer:

The information presented in the output generated by the syntax in this section is also included as some of the output generated by the syntax in the previous section.

Understanding how an interaction affects the mean

We will use the following example to demonstrate how the interaction between two factors affects the mean. For Practice: Now, let us redo the previous example but with Lesson 25 Data File 2.

Solution:


Research Question: Do one's gender and note-taking method affect the GPA of a student?

Population Declarations:

Let Factor A be the gender of the individuals. Let Level 1 of Factor A be male. Let Level 2 of Factor A be female.

Let Factor B be the note-taking method used by the individuals. Let Level 1 of Factor B be Method 1. Let Level 2 of Factor B be Method 2. Let Level 3 of Factor B be the Control group.

Let Population 1 be the male students who use note-taking Method 1. Let $\mu_{\text{Male,Method1}}$ be the true mean GPA of Population 1.

Let Population 2 be the male students who use note-taking Method 2. Let $\mu_{\text{Male,Method2}}$ be the true mean GPA of Population 2.

Let Population 3 be the male students who use the control note-taking method. Let $\mu_{\text{Male,Control}}$ be the true mean GPA of Population 3.

Let Population 4 be the female students who use note-taking Method 1. Let $\mu_{\text{Female,Method1}}$ be the true mean GPA of Population 4.

Let Population 5 be the female students who use note-taking Method 2. Let $\mu_{\text{Female,Method2}}$ be the true mean GPA of Population 5.

Let Population 6 be the female students who use the control note-taking method. Let $\mu_{\text{Female,Control}}$ be the true mean GPA of Population 6.

Hypotheses to be tested:

H0,Gender: there is no difference between the true mean GPAs based on gender.

Ha,Gender: there is a difference between the true mean GPAs based on gender.

H0,Method: there is no difference between the true mean GPAs based on the three note-taking methods.

Ha,Method: there is a difference between at least two of the true mean GPAs based on the three note-taking methods.

H0,GenderxMethod: there is no interaction effect between one's gender and note-taking method on the true mean GPAs.

Ha,GenderxMethod: there is an interaction effect between one's gender and note-taking method on the true mean GPAs.

Hypothesis Test to be used: Two-Way ANOVA for Two Fixed Effects Factors

Assumptions required to implement the hypothesis test: We are told to not test the assumptions. (Recall the assumptions are:

1) The populations from which each of the random samples was taken must be normal.
2) The populations must have the same variances.
3) The samples must be independent of one another.
4) The groups must be equal in sample size.)

The Significance Level: α = 0.05

The Test Statistic and corresponding p-value:

From the Tests of Between-Subjects Effects Table below, the value of the test statistic FGender(1,54) = 0.612 and its associated p-value = 0.438, the value of the test statistic FMethod(2,54) = 17.809 and its associated p-value < 0.001, and the value of the test statistic FGenderxMethod(2,54) = 10.543 and its associated p-value < 0.001.

Tests of Between-Subjects Effects

Dependent Variable: Change in GPA

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 1.889 (a) | 5 | .378 | 11.463 | .000 |
| Intercept | 4.931 | 1 | 4.931 | 149.582 | .000 |
| gender | .020 | 1 | .020 | .612 | .438 |
| method | 1.174 | 2 | .587 | 17.809 | .000 |
| gender * method | .695 | 2 | .348 | 10.543 | .000 |
| Error | 1.780 | 54 | .033 | | |
| Total | 8.600 | 60 | | | |
| Corrected Total | 3.669 | 59 | | | |

a. R Squared = .515 (Adjusted R Squared = .470)
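As a reading aid for the Tests of Between-Subjects Effects table, the short Python sketch below recomputes two of the F ratios and the R Squared footnote from the printed sums of squares. It is only an illustration under the usual fixed-effects definitions (F = effect mean square divided by error mean square); the helper name `f_ratio` is hypothetical, and because the table entries are rounded to three decimals the results match SPSS only approximately.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic for one row of a fixed-effects ANOVA table: MS_effect / MS_error."""
    return (ss_effect / df_effect) / (ss_error / df_error)

ss_error, df_error = 1.780, 54

f_method = f_ratio(1.174, 2, ss_error, df_error)        # table reports 17.809
f_interaction = f_ratio(0.695, 2, ss_error, df_error)   # table reports 10.543

# R Squared compares the Corrected Model SS with the Corrected Total SS.
r_squared = 1.889 / 3.669
n_obs, model_df = 60, 5
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - 1 - model_df)

print(round(f_method, 2), round(f_interaction, 2))
print(round(r_squared, 3), round(adj_r_squared, 3))
```

The gender row is left out on purpose: its sum of squares is printed as .020, which is too coarse to recover the reported F of .612 to three decimals.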

The Decision Rule:

Regarding Gender, since the p-value = 0.438 > 0.05 = α, we do not reject H0,Gender.

Regarding the Note-taking Method, since the p-value < 0.001 < 0.05 = α, we reject H0,Method.

Regarding the interaction between Gender and Note-taking Method, since the p-value < 0.001 < 0.05 = α, we reject H0,GenderxMethod.

The Conclusion:

At the 5% level of significance, we have evidence to conclude that there is a difference between at least two of the true mean GPAs based on the three note-taking methods (p-value < 0.001) and that there is an interaction effect between one’s gender and his/her note-taking method on the true mean GPAs (p-value < 0.001). At the same level of significance, there is no evidence to conclude that there is a difference between the true mean GPAs based on one’s gender (p-value = 0.438).

For Discussion: We were told to not test the assumptions required for the above conclusion to be valid, but:

(1) How would we test the normality assumption?

Solution: We have to test that each of the six populations is normally distributed; that is, we have to implement six normality tests. The required Tests of Normality Table generated by SPSS is included below.

Tests of Normality

| Group | Kolmogorov-Smirnov (a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| M1 | .161 | 10 | .200* | .964 | 10 | .830 |
| M2 | .145 | 10 | .200* | .967 | 10 | .857 |
| MC | .132 | 10 | .200* | .979 | 10 | .958 |
| F1 | .224 | 10 | .170 | .869 | 10 | .096 |
| F2 | .289 | 10 | .018 | .906 | 10 | .254 |
| FC | .214 | 10 | .200* | .936 | 10 | .512 |

a. Lilliefors Significance Correction

*. This is a lower bound of the true significance.

Referring to Population 1 (Line M1 in the Tests of Normality Table above): since the p-value = 0.830 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use note-taking method 1 are normally distributed.

Referring to Population 2 (Line M2 in the Tests of Normality Table above): since the p-value = 0.857 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use note-taking method 2 are normally distributed.

Referring to Population 3 (Line MC in the Tests of Normality Table above): since the p-value = 0.958 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use the control note-taking method are normally distributed.

Referring to Population 4 (Line F1 in the Tests of Normality Table above): since the p-value = 0.096 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use note-taking method 1 are normally distributed.

Referring to Population 5 (Line F2 in the Tests of Normality Table above): since the p-value = 0.254 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use note-taking method 2 are normally distributed.

Referring to Population 6 (Line FC in the Tests of Normality Table above): since the p-value = 0.512 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use the control note-taking method are normally distributed.

Since we do not have evidence to reject the hypothesis that any one of the six populations is normally distributed, we cannot conclude that the normality assumption has been violated.
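The six normality decisions above all apply the same rule (reject normality only when the p-value is below α), so they can be sketched as a single loop. The Python snippet below is a minimal illustration that takes the Shapiro-Wilk p-values from the Tests of Normality table as given; it does not compute the Shapiro-Wilk statistic itself, and the names `shapiro_wilk_p` and `rejects_normality` are hypothetical.

```python
ALPHA = 0.05

# Shapiro-Wilk Sig. values copied from the Tests of Normality table.
shapiro_wilk_p = {
    "M1": 0.830, "M2": 0.857, "MC": 0.958,
    "F1": 0.096, "F2": 0.254, "FC": 0.512,
}

def rejects_normality(p_value, alpha=ALPHA):
    """Decision rule used in the text: reject H0 (normality) only if p-value < alpha."""
    return p_value < alpha

rejected = [group for group, p in shapiro_wilk_p.items() if rejects_normality(p)]
print(rejected)  # prints [] because every p-value exceeds 0.05
```

Note that the decisions in the text use the Shapiro-Wilk column. The Kolmogorov-Smirnov column for group F2 reports Sig. = .018, which would fall below α if that column were used instead.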

(2) How would we test the equality of variances assumption?

Solution:

We have to test the hypotheses:

H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_6^2$ (all six populations have the same variance)

HA: at least two of the six population variances differ,

where $\sigma_i^2$ denotes the variance of the GPAs in Population i.

The required Test for the Homogeneity of Variances Table is included below.

Test of Homogeneity of Variances

Change in GPA

| Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|
| .575 | 5 | 54 | .719 |

Referring to the Test of Homogeneity of Variances Table above, the value of the test statistic computed using Levene’s test for the equality of variances is L(5, 54) = 0.575 and its associated p-value = 0.719. Since p-value = 0.719 > 0.05 = α, there is no evidence to reject the assumption that all six populations have the same variance.

For Discussion: In this past example, we saw that the interaction between gender and the note-taking method significantly impacts the mean GPA. Just how do they impact the mean GPA? How would one conduct an interaction comparison after finding a significant interaction?

Answer: We use syntax programming. The relevant instructions are given below.

1) Analyze->General Linear Model->Univariate
2) Paste (you are now in the SPSS syntax editor)
3) Delete everything you see EXCEPT THE FIRST THREE LINES
4) On the fourth line begin typing:

/lmatrix '(Method 1 vs. Method 2) for men vs (Method 1 vs. Method 2) for women' gender*method 1 -1 0 -1 1 0
/lmatrix '(Method 1 vs. Control) for men vs (Method 1 vs. Control) for women' gender*method 1 0 -1 -1 0 1
/lmatrix '(Method 2 vs. Control) for men vs (Method 2 vs. Control) for women' gender*method 0 1 -1 0 -1 1.

5) Highlight all the syntax, click RUN and then click SELECTION

Each /lmatrix line generates the p-value for one test:

- The first line tests H0: $\mu_{\text{Male,Method1}} - \mu_{\text{Male,Method2}} = \mu_{\text{Female,Method1}} - \mu_{\text{Female,Method2}}$ against HA: $\mu_{\text{Male,Method1}} - \mu_{\text{Male,Method2}} \neq \mu_{\text{Female,Method1}} - \mu_{\text{Female,Method2}}$.
- The second line tests H0: $\mu_{\text{Male,Method1}} - \mu_{\text{Male,Control}} = \mu_{\text{Female,Method1}} - \mu_{\text{Female,Control}}$ against HA: $\mu_{\text{Male,Method1}} - \mu_{\text{Male,Control}} \neq \mu_{\text{Female,Method1}} - \mu_{\text{Female,Control}}$.
- The third line tests H0: $\mu_{\text{Male,Method2}} - \mu_{\text{Male,Control}} = \mu_{\text{Female,Method2}} - \mu_{\text{Female,Control}}$ against HA: $\mu_{\text{Male,Method2}} - \mu_{\text{Male,Control}} \neq \mu_{\text{Female,Method2}} - \mu_{\text{Female,Control}}$.

The results of the above code are:

Custom Hypothesis Tests #1

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | -.170 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | -.170 |
| Std. Error | .115 |
| Sig. | .145 |
| 95% Confidence Interval for Difference, Lower Bound | -.400 |
| 95% Confidence Interval for Difference, Upper Bound | .060 |

a. Based on the user-specified contrast coefficients (L') matrix: (Method 1 vs. Method 2) for men vs (Method 1 vs. Method 2) for women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .072 | 1 | .072 | 2.192 | .145 |
| Error | 1.780 | 54 | .033 | | |

Custom Hypothesis Tests #2

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .105 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .105 |
| Std. Error | .115 |
| Sig. | .365 |
| 95% Confidence Interval for Difference, Lower Bound | -.125 |
| 95% Confidence Interval for Difference, Upper Bound | .335 |

a. Based on the user-specified contrast coefficients (L') matrix: (Method 1 vs. Control) for men vs (Method 1 vs. Control) for women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .028 | 1 | .028 | .836 | .365 |
| Error | 1.780 | 54 | .033 | | |

Custom Hypothesis Tests #3

Contrast Results (K Matrix) (a)

| Contrast L1 | Dependent Variable: Change in GPA |
|---|---|
| Contrast Estimate | .275 |
| Hypothesized Value | 0 |
| Difference (Estimate - Hypothesized) | .275 |
| Std. Error | .115 |
| Sig. | .020 |
| 95% Confidence Interval for Difference, Lower Bound | .045 |
| 95% Confidence Interval for Difference, Upper Bound | .505 |

a. Based on the user-specified contrast coefficients (L') matrix: (Method 2 vs. Control) for men vs (Method 2 vs. Control) for women

Test Results

Dependent Variable: Change in GPA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | .189 | 1 | .189 | 5.736 | .020 |
| Error | 1.780 | 54 | .033 | | |

■ NOTE: The F-tests implemented in these tetrad comparisons do not control Type I Error.
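Each interaction (tetrad) contrast is the difference between the same pairwise comparison made within men and within women. The Python sketch below checks that relationship using the contrast estimates printed in the earlier within-gender tables. It is only an illustration: the dictionary layout and names are hypothetical, and the standard error line assumes a balanced design with 10 students per cell, which is an assumption rather than something stated with this output.

```python
import math

# Pairwise contrast estimates copied from the within-gender Contrast Results tables.
simple_contrasts = {
    "Method 1 vs. Method 2": {"men": -0.305, "women": -0.135},
    "Method 1 vs. Control": {"men": 0.170, "women": 0.065},
    "Method 2 vs. Control": {"men": 0.475, "women": 0.200},
}

# Tetrad estimate = (comparison within men) - (comparison within women).
tetrads = {
    label: est["men"] - est["women"]
    for label, est in simple_contrasts.items()
}

# Standard error of a tetrad: four cell means, each with coefficient +1 or -1.
mse = 1.780 / 54          # Error SS / error df from the Test Results tables
n_per_cell = 10           # assumed balanced cell size
tetrad_se = math.sqrt(mse * 4 / n_per_cell)

for label, value in tetrads.items():
    print(label, round(value, 3))
print("std. error", round(tetrad_se, 3))
```

The three differences agree with the Contrast Estimates in Custom Hypothesis Tests #1 to #3 (-.170, .105, .275), and the standard error comes out close to the reported .115.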

For Discussion: What conclusions would you make based on the p-values presented in the above three custom hypothesis test tables?

NOTE: Although learning syntax programming is part of the course, it is not the focus of the course. If a student chooses to “skip” the syntax programming, a student can still attain an excellent (high 80’s) to exceptional (low 90’s) grade in the course. If a student chooses to “skip” the syntax programming portion of an analysis when it is warranted, s/he must state in his/her solution that s/he recognizes that syntax programming is required to complete the analysis but s/he has chosen not to do the analysis. It is expected that a student will know when syntax programming should be used.

The One-Way and Two-Way ANOVA discussions were based on an experimental design referred to as the Completely Randomized Design. We formed a set of treatment combinations (based on the k factors) and randomly assigned a fixed number n of experimental units to each treatment combination. There are other experimental designs. We are now going to study two of them: first the Randomized Block Design and then the Repeated Measures Design.

The Randomized Block Design

Suppose a researcher is interested in how several treatments affect a continuous response variable. The treatments may be levels of a single factor or they may be combinations of levels of several factors. Suppose we have a fixed number of experimental units available to which we need to apply the different treatments (say t treatments). A Randomized Block Design divides the group of experimental units into a fixed number of homogeneous groups (say b groups), each of the same size t. These groups are called the blocks. The treatments are then randomly assigned to the experimental units in each block so that each experimental unit in a block receives a different treatment.

The Model for a Randomized Block Experiment is

$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},$$

where $i = 1, \dots, t$; $j = 1, \dots, b$; $y_{ij}$ represents the observation receiving the i'th treatment in the j'th block; $\mu$ is the grand mean; $\tau_i$ is the effect of the i'th treatment; $\beta_j$ is the effect of the j'th block; and $\varepsilon_{ij}$ is the random error.

A randomized block experiment is assumed to be a two-factor experiment. The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction are used to estimate error. If the treatments are defined in terms of two or more factors, the treatment Sum of Squares can be partitioned into a component due to the Main Effects and a component due to the Interaction of the Treatment Factors.

In SPSS, we must implement a custom model in which we do not include the interaction of the treatment and blocking factors. In the resulting ANOVA table we are only concerned with the information presented regarding the treatment factor.

The assumptions underlying this test are:

1. each observation is an independent random sample of size one from each of the tb populations;

2. each of these tb populations is normally distributed;

3. all tb populations have the same variance (but possibly different means); and

4. the block and treatment effects are additive (i.e. there is no interaction effect between the blocking and treatment factors). Violation of this assumption can provide misleading results if the largest mean is more than 50% greater than the smallest mean. As long as the largest mean is less than 50% greater than the smallest mean, the results of this test are still valid.

In essence, we assign one factor as a blocking factor when we wish to eliminate the effect of this factor in our analysis.

For Practice: BFAHS, P.345, q 3.2.4. The nursing supervisor in a local health department wished to study the influence of the time of day on the length of home visits by the nursing staff. It was thought that individual differences among nurses might be large so the supervisor wished to eliminate the effect of the nurse in her analysis. The nursing supervisor collected the following data.

Length of Home Visit by Time of Day

| Nurse | Early Morning | Late Morning | Early Afternoon | Late Afternoon |
|---|---|---|---|---|
| A | 27 | 28 | 30 | 23 |
| B | 31 | 30 | 27 | 20 |
| C | 35 | 38 | 34 | 30 |
| D | 20 | 18 | 20 | 14 |

Analyze the supervisor’s data at the α = 0.05 level of significance. For each nurse, assume that the four different lengths of home visits are independent. Assume all required assumptions are valid. The data can be found in SPSS format at http://bcs.wiley.com/he-bcs/Books?action=index&itemId=0470105828&bcsId=5023.

Solution:

Research Question: Does the length of a home visit depend on the time of day after eliminating the effect of the specific nurse?

Population Declarations:

Let Factor B, the blocking factor, be the specific nurse studied, which we will denote “nurse”. Let Block 1 be Nurse A. Let Block 2 be Nurse B. Let Block 3 be Nurse C. Let Block 4 be Nurse D.

Let Factor A be the time of day a home was visited, which we will denote “t.o.d.” Let Level 1 of Factor A be the Early Morning visit. Let Level 2 of Factor A be the Late Morning visit. Let Level 3 of Factor A be the Early Afternoon visit. Let Level 4 of Factor A be the Late Afternoon visit.

Let Population 1 be the lengths of Nurse A’s early morning visits. Let Population 2 be the lengths of Nurse A’s late morning visits. Let Population 3 be the lengths of Nurse A’s early afternoon visits. Let Population 4 be the lengths of Nurse A’s late afternoon visits.

Let Population 5 be the lengths of Nurse B’s early morning visits. Let Population 6 be the lengths of Nurse B’s late morning visits. Let Population 7 be the lengths of Nurse B’s early afternoon visits. Let Population 8 be the lengths of Nurse B’s late afternoon visits.

Let Population 9 be the lengths of Nurse C’s early morning visits. Let Population 10 be the lengths of Nurse C’s late morning visits. Let Population 11 be the lengths of Nurse C’s early afternoon visits. Let Population 12 be the lengths of Nurse C’s late afternoon visits.

Let Population 13 be the lengths of Nurse D’s early morning visits. Let Population 14 be the lengths of Nurse D’s late morning visits. Let Population 15 be the lengths of Nurse D’s early afternoon visits. Let Population 16 be the lengths of Nurse D’s late afternoon visits.

Hypothesis to be tested:

H0,t.o.d.: there are no differences among the true mean lengths of home visits based on the time of day.

HA,t.o.d.: there is a difference between at least two of the true mean lengths of home visits based on the time of day.

Hypothesis Test to be used: Two-way ANOVA for a randomized block design

Assumptions required to implement the hypothesis test:

1) Each observation is an independent random sample of size one from each of the 16 populations.
2) Each of these 16 populations is normally distributed.
3) All 16 populations have the same variance.
4) The blocking factor (i.e. the nurse) and the treatment factor (i.e. the time of day) are additive.

We are told that we can assume all the above hold.

The Significance Level: α = 0.05

The value of the test statistic and the p-value: From the Tests of Between-Subjects Effects Table below, the required value of the test statistic Ft.o.d.(3,9) = 11.667 and its associated p-value = 0.002.

Tests of Between-Subjects Effects

Dependent Variable: length

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 655.875 (a) | 6 | 109.313 | 30.684 | .000 |
| Intercept | 11289.063 | 1 | 11289.063 | 3168.860 | .000 |
| t.o.d. | 124.688 | 3 | 41.563 | 11.667 | .002 |
| nurse | 531.188 | 3 | 177.063 | 49.702 | .000 |
| Error | 32.063 | 9 | 3.563 | | |
| Total | 11977.000 | 16 | | | |
| Corrected Total | 687.938 | 15 | | | |

a. R Squared = .953 (Adjusted R Squared = .922)
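To see where the entries of this table come from, the following Python sketch computes the randomized block sums of squares directly from the nurse data given in the problem statement. It is a hand calculation under the additive model (one observation per cell, error df = (t - 1)(b - 1)), not the SPSS procedure; the function name `randomized_block_anova` and the dictionary layout are hypothetical.

```python
# Length of home visit; columns are Early Morning, Late Morning,
# Early Afternoon, Late Afternoon; each key is one nurse (block).
visits = {
    "A": [27, 28, 30, 23],
    "B": [31, 30, 27, 20],
    "C": [35, 38, 34, 30],
    "D": [20, 18, 20, 14],
}

def randomized_block_anova(blocks):
    """Sums of squares and F ratios for a randomized block design with one observation per cell."""
    rows = list(blocks.values())
    b, t = len(rows), len(rows[0])            # number of blocks, number of treatments
    grand_mean = sum(sum(row) for row in rows) / (b * t)
    block_means = [sum(row) / t for row in rows]
    treatment_means = [sum(row[i] for row in rows) / b for i in range(t)]

    ss_total = sum((y - grand_mean) ** 2 for row in rows for y in row)
    ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
    ss_treatment = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_error = ss_total - ss_block - ss_treatment   # interaction df are used for error

    df_treatment, df_block = t - 1, b - 1
    df_error = df_treatment * df_block
    mse = ss_error / df_error
    return {
        "ss_treatment": ss_treatment,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "df_error": df_error,
        "f_treatment": (ss_treatment / df_treatment) / mse,
        "f_block": (ss_block / df_block) / mse,
    }

result = randomized_block_anova(visits)
for name, value in result.items():
    print(name, round(value, 3))
```

The treatment (t.o.d.) and block (nurse) sums of squares, the error sum of squares on 9 degrees of freedom, and the two F ratios agree with the corresponding rows of the SPSS table after rounding.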

The Statistical Decision: Since p-value = 0.002 < 0.05 = α, we reject H0,t.o.d.

Conclusion: At the 0.05 level of significance, we have evidence to conclude that, based on the time of day of a visit, there is a difference between at least two of the true mean lengths of time of a home visit (p-value = 0.002).

■ For Discussion: Based on our conclusion in the above example, we would need to implement a post-hoc analysis to provide a meaningful answer for the example’s research question.

(1) What post-hoc analysis should we perform?

(2) Based on this post-hoc analysis, what would be an appropriate answer for the research question?

■ For Discussion: In the above example, we were told to assume that the relevant populations were normally distributed. In practice, how would we test the requisite normality assumptions?

Solution: For Populations 1 through 16, we would need to individually test the hypothesis

H0,i: Population i is normally distributed.

against

HA,i: Population i is not normally distributed.

where i iterates through the set of integers {1, 2, …, 16}. Because we have only sampled one data point from each of the sixteen populations (i.e. one data point per nurse/time of day combination), we are unable to test whether each of the populations from which we have sampled is normally distributed. The best we can test is whether the associated marginal distributions are normally distributed.

Note that all the marginal distributions being normally distributed does not imply that the underlying joint distributions are also normally distributed. However, if we have evidence that a marginal distribution is not normally distributed, then we have evidence that the underlying joint distributions are also not normally distributed. The resulting eight sets of hypotheses that we would need to individually test are:

1) H0,NurseA: The distribution of the possible lengths of Nurse A’s home visits (independent of the time of day) is normal. HA,NurseA: The distribution of the possible lengths of Nurse A’s home visits (independent of the time of day) is not normal.

2) H0,NurseB: The distribution of the possible lengths of Nurse B’s home visits (independent of the time of day) is normal. HA,NurseB: The distribution of the possible lengths of Nurse B’s home visits (independent of the time of day) is not normal.

3) H0,NurseC: The distribution of the possible lengths of Nurse C’s home visits (independent of the time of day) is normal. HA,NurseC: The distribution of the possible lengths of Nurse C’s home visits (independent of the time of day) is not normal.

4) H0,NurseD: The distribution of the possible lengths of Nurse D’s home visits (independent of the time of day) is normal. HA,NurseD: The distribution of the possible lengths of Nurse D’s home visits (independent of the time of day) is not normal.

5) H0,EarlyMorning: The distribution of the possible lengths of early morning home visits (independent of the nurse) is normal. HA,EarlyMorning: The distribution of the possible lengths of early morning home visits (independent of the nurse) is not normal.

6) H0,LateMorning: The distribution of the possible lengths of late morning home visits (independent of the nurse) is normal. HA,LateMorning: The distribution of the possible lengths of late morning home visits (independent of the nurse) is not normal.

7) H0,EarlyAfternoon: The distribution of the possible lengths of early afternoon home visits (independent of the nurse) is normal. HA,EarlyAfternoon: The distribution of the possible lengths of early afternoon home visits (independent of the nurse) is not normal.

8) H0,LateAfternoon: The distribution of the possible lengths of late afternoon home visits (independent of the nurse) is normal. HA, LateAfternoon: The distribution of the possible lengths of late afternoon home visits (independent of the nurse) is not normal.

The Tests of Normality table below contains the values of the Shapiro-Wilk Test Statistic and its

associated p-values required to test the hypotheses sets labelled 1 through 4.

Tests of Normality (length, by nurse)

         Kolmogorov-Smirnov(a)        Shapiro-Wilk
nurse    Statistic   df   Sig.        Statistic   df   Sig.
A        .250        4    .           .953        4    .734
B        .250        4    .           .878        4    .329
C        .220        4    .           .980        4    .900
D        .260        4    .           .827        4    .161

a. Lilliefors Significance Correction

At the α = 0.05 level of significance, there is no evidence to reject the assumption that the distributions of the possible lengths of home visits for Nurse A (p-value =0.734), Nurse B (p-value =0.329), Nurse C (p-value=0.900), and Nurse D (p-value=0.161) are respectively normal.

The Tests of Normality table below contains the values of the Shapiro-Wilk Test Statistic and its

associated p-values required to test the hypotheses sets labelled 5 through 8.

Tests of Normality

                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                   Statistic   df   Sig.        Statistic   df   Sig.
early_morning      .173        4    .           .981        4    .909
late_morning       .226        4    .           .976        4    .880
early_afternoon    .200        4    .           .978        4    .889
late_afternoon     .175        4    .           .995        4    .983

a. Lilliefors Significance Correction

At the α = 0.05 level of significance, there is no evidence to reject the assumption that the distributions of the possible lengths of home visits for the early morning shift (p-value =0.909), the late morning shift (p-value =0.880), the early afternoon shift (p-value=0.889), and the late afternoon shift (p-value=0.983) are respectively normal.

Because, at the α = 0.05 level of significance, we do not have evidence to conclude that any of the marginal distributions are not normally distributed, we do not have evidence to reject the assumptions that each of the original sixteen populations is normally distributed.

■ For Discussion: In the above example, we were told to assume that the data was sampled from populations with the same variance. In practice, how would we test this “equality of variances” assumption?

Solution: Because there is only one observation for each nurse/time of day combination, we cannot directly test whether the data was sampled from populations with the same variance. The best we can do is test whether the marginal variances for the nurses are equal and, separately, whether the marginal variances for the times of day are equal.


Suppose we wish to determine whether the variances of the lengths of visits associated with

each of the Nurses (independent of the time of day) are equal. The table for the corresponding

test is given below.

Test of Homogeneity of Variances

Length

Levene Statistic df1 df2 Sig.

.446 3 12 .725

Because p-value = 0.725 > 0.05 = α, we do not have evidence to reject the assumption that the variances of the lengths of visits associated with each Nurse (independent of the time of day) are equal.

Suppose we wish to determine whether the variances of the lengths of visits associated with each of the times of day (independent of the nurse) are equal. The table for the corresponding test is given below.

Test of Homogeneity of Variances

Length

Levene Statistic df1 df2 Sig.

.067 3 12 .976

Because p-value = 0.976 > 0.05 = α, we do not have evidence to reject the assumption that the variances of the lengths of visits associated with each of the times of day (independent of the nurse) are equal.

Because we do not have evidence to reject the assumption that the marginal variances of the lengths of visits associated with each of the times of day (independent of the nurse) are equal, nor the assumption that the marginal variances of the lengths of visits associated with each Nurse (independent of the time of day) are equal, we do not have evidence to reject the assumption that the data was drawn from populations with the same variance.
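The Levene statistic and its degrees of freedom (df1 = number of groups minus 1, df2 = number of observations minus number of groups, which gives 3 and 12 for four groups of four visits) can be sketched as follows. This is a minimal illustration that assumes the mean-centred form of Levene's test; the two small groups are hypothetical values chosen to keep the arithmetic easy to follow, and they are not the nurse data, which are not reproduced in this section.

```python
def levene_statistic(groups):
    """Return (W, df1, df2) for Levene's test based on absolute deviations from group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA on the absolute deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    df1, df2 = k - 1, n_total - k
    w = (between / df1) / (within / df2)
    return w, df1, df2


# Hypothetical data (not from the module): two groups of three observations.
hypothetical_groups = [[1, 2, 3], [2, 4, 6]]
w, df1, df2 = levene_statistic(hypothetical_groups)
print(f"Levene W = {w:.3f} on ({df1}, {df2}) degrees of freedom")
```

The p-value is the upper tail of an F distribution with (df1, df2) degrees of freedom evaluated at W; a statistics package is needed for that last step.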

■ For Discussion: In the above example, we were told to assume that the blocking factor (i.e. the nurse) and the treatment factor (i.e. the time of day) are additive. In practice, how would we test this “additivity” assumption? Solution:


We can use Tukey’s Test for Non-Additivity. Note that this test can only be used if there is a single observation from each population. If there is more than one observation from each population, you can implement the test for an interaction effect that we talked about in the previous section.

For Tukey’s Test for Non-Additivity, the null hypothesis is H0: the two factors are additive. The alternative hypothesis is HA: the two factors are non-additive.

To test at the 5% level of significance whether the Nurse and Time of Day factors in the previous example are additive, we need to generate the ANOVA with Friedman’s Test and Tukey’s Test for Non-additivity table using SPSS. The hypotheses we are testing are:

H0: The Nurse and Time of Day factors are additive.

HA: The Nurse and Time of Day factors are non-additive.

From the ANOVA table below, since p-value = 0.777 > 0.05 = α, we do not have evidence to reject H0. Hence, at the 5% level of significance, we do not have evidence to reject the assumption that the Nurse and Time of Day factors are additive (p-value = 0.777).

ANOVA with Friedman's Test and Tukey's Test for Nonadditivity

Source                                     Sum of Squares   df   Mean Square   Friedman's Chi-Square   Sig
Between People                             531.188          3    177.063
Within People   Between Items              124.688          3    41.563        9.545                   .023
                Residual   Nonadditivity   .340(a)          1    .340          .086                    .777
                           Balance         31.722           8    3.965
                           Total           32.063           9    3.563
                Total                      156.750          12   13.063
Total                                      687.938          15   45.863

Grand Mean = 26.5625

a. Tukey's estimate of power to which observations must be raised to achieve additivity = .759.

■ Suppose after completing the above analysis, we found that the additivity assumption was violated. This does not necessarily imply that the results of our Two-Way ANOVA for a randomized block design are invalid. To determine whether a violation of the additivity assumption will lead to misleading results, we compute the means for each block (nurse) and for each treatment (time of day). These means are presented in the following two tables: the first presents the mean length of visit for each nurse and the second presents the mean length of visit for each time of day.

Descriptive Statistics

Dependent Variable: length

nurse    Mean           Std. Deviation   N
A        27.00000000    2.943920289      4
B        27.00000000    4.966554809      4
C        34.25000000    3.304037934      4
D        18.00000000    2.828427125      4
Total    26.56250000    6.772185762      16

Descriptive Statistics

Dependent Variable: length

tod      Mean           Std. Deviation   N
EA       27.75000000    5.909032634      4
EM       28.25000000    6.396613687      4
LA       21.75000000    6.652067348      4
LM       28.50000000    8.225975120      4
Total    26.56250000    6.772185762      16

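Note, in the above two tables, that the smallest mean is 18.00 and the largest mean is 34.25. Since 34.25 > 18 + 18/2 = 27, that is, since the largest mean exceeds the smallest mean by more than half of the smallest mean, we would conclude that a violation of the additivity assumption would lead to misleading results. Consequently, if the additivity assumption were violated, we would have to analyze this data using some other technique.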
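The comparison above is simple enough to check by hand, but a short sketch makes the rule explicit. It assumes the rule is exactly the one used in the text (compare the largest marginal mean with the smallest mean plus half of the smallest mean); the dictionary and variable names are illustrative only.

```python
# Marginal means copied from the two Descriptive Statistics tables.
nurse_means = {"A": 27.00, "B": 27.00, "C": 34.25, "D": 18.00}
time_of_day_means = {"EA": 27.75, "EM": 28.25, "LA": 21.75, "LM": 28.50}

all_means = list(nurse_means.values()) + list(time_of_day_means.values())
smallest = min(all_means)
largest = max(all_means)

# Threshold used in the text: the smallest mean plus half of the smallest mean.
threshold = smallest + smallest / 2
nonadditivity_would_mislead = largest > threshold

print(f"smallest = {smallest}, largest = {largest}, threshold = {threshold}")
print("A violation of additivity would be a concern:", nonadditivity_would_mislead)
```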

■ Now Your Turn: A study is made to determine the impact of the humidity level on the growth of different molds. Three species of mold commonly found in homes were grown under four assigned humidity levels. The percentages of the surface area covered by mold one week after inoculation have been recorded in the table below.

Mold       Humidity                           Average
           30%     50%     70%     90%
A          39.0    33.1    33.8    33.0       34.7
B          36.9    27.2    29.7    28.5       30.6
C          27.4    29.2    26.7    30.9       28.6
Average    34.4    29.8    30.1    30.8       31.3


(1) At the 5% level of significance, determine the average effect the humidity level has on the percentage of a container’s surface area covered by mold, controlling for the type of mold. You may assume all required assumptions hold.

(2) We were told to assume that the relevant populations were normally distributed. In practice, how would we test the requisite normality assumptions?

(3) We were told to assume that the data was sampled from populations with the same variance. In practice, how would we test this “equality of variances” assumption?

(4) We were told to assume that the blocking factor (i.e. the type of mold) and the treatment factor (i.e. the humidity level) are additive. In practice, how would we test this “additivity” assumption?

(5) Suppose after completing the above analysis in (4), we found that the additivity assumption was violated. This does not necessarily immediately imply that the results of our Two-Way ANOVA for a randomized block design are invalid.

What happens if the observations are somehow correlated? All is not lost. If, on independent objects, we take different measurements on each object as the objects are exposed to different conditions, then we can analyze the data using the technique in the next section.


One-Way ANOVA F-Test for Dependent Samples (Repeated Measures)

In a Repeated Measures Design, we have experimental units that may be grouped according to one or several factors (i.e. the grouping factors). Then, on each experimental unit, we have several measurements (the repeated measures), not just a single measurement. The repeated measures may be taken at combinations of levels for one or several factors (the repeated measures factors).

The assumptions for the one-way repeated measures design we will use are:

1. the subjects are a simple random sample;

2. each observation is an independent simple random sample of size one from tn (t is the number of treatments and n is the number of subjects) normal populations;

3. the tn populations have the same variance;

4. the t treatments are fixed;

5. there is no interaction between the treatments and the subjects; and

6. there is a correlation among the repeated measures and these correlations are all equal.

Note that (3) and (6) combined are referred to as sphericity. A set of populations satisfying (3) and (6) is said to be spherical.

To test the sphericity assumption, we will use Mauchly’s Test for Sphericity. For this test, the null hypothesis is H0: the tn populations are spherical, and the alternative hypothesis is HA: the tn populations are not spherical. For the results of Mauchly’s Test for Sphericity to be valid, each of the tn populations must be normally distributed. Consequently one should check that each of the populations is normally distributed prior to implementing Mauchly’s Test for Sphericity.

For Practice: An experimenter was interested in how the level of a certain enzyme changed in 15 randomly selected cardiac patients after open heart surgery. For each patient, the enzyme was measured immediately after surgery (Day 0); one day after surgery (Day 1); two days after surgery (Day 2); and one week after surgery (Day 7). The data is summarized in the table below.

Subject   Day 0   Day 1   Day 2   Day 7
1         108     63      45      42
2         112     75      56      52
3         114     75      51      46
4         129     87      69      69
5         115     71      52      54
6         122     80      68      68
7         105     71      52      54
8         117     77      54      61
9         106     65      49      49
10        110     70      46      47
11        120     85      60      62
12        118     78      51      56
13        110     65      46      47
14        132     92      73      63
15        127     90      73      68

At the 5% level of significance, determine if the underlying populations are spherical. Assume the necessary populations are normally distributed. Solution:


The population of interest is the set of cardiac patients who have open-heart surgery. On this

population, we take four measurements. Let population 1 be the possible enzyme levels of the

cardiac patients immediately after their open-heart surgery, population 2 be the possible

enzyme levels of the cardiac patients 24 hours after their Day 0 enzyme level was measured,

population 3 be the possible enzyme levels of the cardiac patients 48 hours after their Day 0

enzyme level was measured, and population 4 be the possible enzyme levels of the cardiac

patients 7 days after their Day 0 enzyme level was measured. Then the hypotheses to be tested are

H0: The set of the four populations is spherical.

HA: The set of the four populations is not spherical.

From the Mauchly’s Test of Sphericity table below, since the p-value=0.687>0.05=α, we do not

reject H0. Consequently, at the 5% level of significance, we do not have enough evidence to

reject that the four populations are spherical (p-value=0.687). Hence we do not have evidence

to conclude that the sphericity assumption does not hold.

Mauchly's Test of Sphericity(b)

Measure: MEASURE_1

                                                                            Epsilon(a)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.      Greenhouse-Geisser   Huynh-Feldt   Lower-bound
days                     .784          3.089                5    .687      .863                 1.000         .333

Now that we know how to test the sphericity assumption associated with a One-Way ANOVA for Repeated Measures test, we will demonstrate an analysis of Repeated Measures data.

For Practice: An experimenter was interested in how the level of a certain enzyme changed in 15 randomly selected cardiac patients after open heart surgery. For each patient, the enzyme was measured immediately after surgery (Day 0); one day after surgery (Day 1); two days after surgery (Day 2); and one week after surgery (Day 7). The data is summarized in the table below.

Subject   Day 0   Day 1   Day 2   Day 7
1         108     63      45      42
2         112     75      56      52
3         114     75      51      46
4         129     87      69      69
5         115     71      52      54
6         122     80      68      68
7         105     71      52      54
8         117     77      54      61
9         106     65      49      49
10        110     70      46      47
11        120     85      60      62
12        118     78      51      56
13        110     65      46      47
14        132     92      73      63
15        127     90      73      68

At the 5% level of significance, analyze the above data. Assume all the assumptions required to implement the hypothesis test are true.

Solution:

Research Question:

Is there a difference in the true mean enzyme levels of subjects based on the amount of time that has passed after open-heart surgery?

Population Declarations:

The population of interest is the set of cardiac patients who have open-heart surgery. Let µday 0 be the true mean enzyme level of the cardiac patients immediately after their open-heart surgery, µday 1 be the true mean enzyme level of the cardiac patients 24 hours after their Day 0 enzyme level was measured, µday 2 be the true mean enzyme level of the cardiac patients 48 hours after their Day 0 enzyme level was measured, and µday 7 be the true mean enzyme level of the cardiac patients 7 days after their Day 0 enzyme level was measured.

Hypothesis to be tested:

H0: The true mean enzyme levels are equal based on the amount of time passed after open-

heart surgery. (i.e. µday 0 = µday 1= µday 2= µday 7)

HA: At least two of the true mean enzyme levels based on the amount of time passed after

open-heart surgery differ.

Hypothesis Test to be used: One-Way ANOVA for Repeated Measures.

Assumptions required to implement the hypothesis test:

1. the subjects are a simple random sample;

2. each observation is an independent simple random sample of size one from tn (t is the

number of treatments and n is the number of subjects) normal populations;

3. the tn populations have the same variance;


4. the t treatments are fixed;

5. there is no interaction between the treatments and the subjects;

6. there is a correlation among the repeated measures and these correlations are all equal.

We are told to assume that all the above assumptions are true.

The Significance Level: α=0.05

The Test Statistic and corresponding p-value:

Tests of Within-Subjects Effects

Measure: MEASURE_1

Source                               Type III Sum of Squares   df       Mean Square   F          Sig.
days          Sphericity Assumed     36282.267                 3        12094.089     1301.662   .000
              Greenhouse-Geisser     36282.267                 2.588    14021.994     1301.662   .000
              Huynh-Feldt            36282.267                 3.000    12094.089     1301.662   .000
              Lower-bound            36282.267                 1.000    36282.267     1301.662   .000
Error(days)   Sphericity Assumed     390.233                   42       9.291
              Greenhouse-Geisser     390.233                   36.225   10.772
              Huynh-Feldt            390.233                   42.000   9.291
              Lower-bound            390.233                   14.000   27.874
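Each row of the Tests of Within-Subjects Effects table uses the same sums of squares and differs only in its degrees of freedom, which are the sphericity-assumed values multiplied by the corresponding epsilon from Mauchly's table. The following is a minimal sketch of that arithmetic. It uses the epsilon values as printed (rounded to three decimals), so the products agree with the table only approximately; the variable names are illustrative.

```python
# Epsilon values as printed in the Mauchly's Test of Sphericity table.
epsilon = {
    "Sphericity Assumed": 1.000,
    "Greenhouse-Geisser": 0.863,
    "Huynh-Feldt": 1.000,
    "Lower-bound": 0.333,
}

# Sphericity-assumed degrees of freedom: t - 1 = 3 and (n - 1)(t - 1) = 42.
df_effect, df_error = 3, 42

# Multiply both degrees of freedom by epsilon; the F ratio itself is unchanged.
adjusted_df = {
    name: (eps * df_effect, eps * df_error) for name, eps in epsilon.items()
}

for name, (df1, df2) in adjusted_df.items():
    print(f"{name:20s} df = ({df1:.3f}, {df2:.3f})")
```

Because both mean squares are divided by the same epsilon, the F statistic is identical in every row; only the reference distribution used for the p-value changes.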

From the Sphericity Assumed row in the above Tests of Within-Subjects Effects table, the test statistic is F(3, 42) = 1301.662 with an associated p-value < 0.001.

The Decision Rule: Since the p-value <0.001 <0.05=α, we reject H0 , i.e. we reject µday 0 = µday 1=

µday 2= µday 7.

Conclusion: At the 5% level of significance, assuming that all assumptions to implement the One-Way ANOVA for Repeated Measures hold, there is evidence to conclude that at least two of the true mean enzyme levels based on the amount of time that has passed after open-heart surgery are different (p-value < 0.001).
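The sums of squares and the F statistic in the Sphericity Assumed row can be reproduced directly from the enzyme data. The sketch below is one way to do the arithmetic in plain Python, assuming the usual partition of the total sum of squares into subjects, treatments (days), and error; the function and variable names are illustrative and are not part of the SPSS output.

```python
# Enzyme levels from the data table: one row per subject, columns Day 0, 1, 2, 7.
ENZYME = [
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
]


def repeated_measures_anova(data):
    """Partition the total sum of squares for a one-way repeated-measures design."""
    n = len(data)       # number of subjects
    t = len(data[0])    # number of treatments (days)
    grand_mean = sum(sum(row) for row in data) / (n * t)
    subject_means = [sum(row) / t for row in data]
    treatment_means = [sum(row[j] for row in data) / n for j in range(t)]

    ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
    ss_subjects = t * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_treatments = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_error = ss_total - ss_subjects - ss_treatments

    df_treatments = t - 1
    df_error = (n - 1) * (t - 1)
    f_stat = (ss_treatments / df_treatments) / (ss_error / df_error)
    return {
        "ss_treatments": ss_treatments,
        "ss_error": ss_error,
        "df_treatments": df_treatments,
        "df_error": df_error,
        "f": f_stat,
    }


result = repeated_measures_anova(ENZYME)
print(f"SS(days) = {result['ss_treatments']:.3f}, SS(error) = {result['ss_error']:.3f}")
print(f"F({result['df_treatments']}, {result['df_error']}) = {result['f']:.3f}")
```

The p-value is the upper tail of the F distribution with (3, 42) degrees of freedom at the computed statistic, which requires a statistics package or an F table.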


For Discussion: In the above example, because we rejected the null hypothesis, we need to

conduct a post-hoc analysis to determine which means actually differed. What post-hoc

analysis would we complete?

Answer:

To determine which true mean enzyme levels pairwise differ, we need to implement several

Paired–Sample t-tests. The following Pairwise Comparisons table contains the test statistics

and corresponding p-values for the six paired-sample t-tests.

Pairwise Comparisons

Measure: MEASURE_1

                                                                       95% Confidence Interval for Difference(a)
(I) days   (J) days   Mean Difference (I-J)   Std. Error   Sig.(a)     Lower Bound   Upper Bound
0          1          40.067*                 .859         .000        38.224        41.909
           2          60.000*                 1.121        .000        57.595        62.405
           7          60.467*                 1.287        .000        57.707        63.227
1          0          -40.067*                .859         .000        -41.909       -38.224
           2          19.933*                 1.016        .000        17.753        22.113
           7          20.400*                 1.230        .000        17.762        23.038
2          0          -60.000*                1.121        .000        -62.405       -57.595
           1          -19.933*                1.016        .000        -22.113       -17.753
           7          .467                    1.112        .681        -1.919        2.852
7          0          -60.467*                1.287        .000        -63.227       -57.707
           1          -20.400*                1.230        .000        -23.038       -17.762
           2          -.467                   1.112        .681        -2.852        1.919

Based on estimated marginal means

*. The mean difference is significant at the .05 level.

a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Conclusion: At the 5% level of significance there is evidence to conclude that the true mean

enzyme levels measured on day 0 and day 1 (p-value < 0.001), day 0 and day 2 (p-value <

0.001), day 0 and day 7(p-value < 0.001), day 1 and day 2 (p-value < 0.001), and day 1 and

day 7 (p-value < 0.001) differ. At the 5% level of significance, there is no evidence to conclude

that the true mean enzyme levels measured on day 2 and day 7 (p-value =0.681) differ.
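The mean differences and standard errors in the Pairwise Comparisons table are those of ordinary paired-sample t-tests with no adjustment for multiple comparisons. As a minimal sketch, the following computes the mean difference, its standard error, and the t statistic (on n - 1 = 14 degrees of freedom) for each of the six pairs of days; the helper name `paired_summary` is illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Enzyme levels from the data table: one row per subject, columns Day 0, 1, 2, 7.
ENZYME = [
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
]
DAY_LABELS = [0, 1, 2, 7]


def paired_summary(first, second):
    """Return (mean difference, standard error, t statistic) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, std_error, mean_diff / std_error


columns = [[row[j] for row in ENZYME] for j in range(len(DAY_LABELS))]
pairwise = {}
for i, j in combinations(range(len(DAY_LABELS)), 2):
    pairwise[(DAY_LABELS[i], DAY_LABELS[j])] = paired_summary(columns[i], columns[j])

for (day_i, day_j), (md, se, t) in pairwise.items():
    print(f"day {day_i} vs day {day_j}: mean diff = {md:.3f}, SE = {se:.3f}, t = {t:.2f}")
```

With six unadjusted comparisons, the chance of at least one false positive is larger than 5%; a Bonferroni-type adjustment would compare each p-value with 0.05/6 instead.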


NOTE: When the populations are spherical, we can use a confidence interval plot to visualize

how the true means might differ. If the populations are not spherical, then a confidence interval

plot does not necessarily represent how the true means may differ.

Because we cannot reject that the four populations from which our enzyme levels were sampled are spherical, we will include below a 95% confidence interval plot illustrating our estimated 95% confidence intervals for µday 0, µday 1, µday 2, and µday 7.

Based on the 95% confidence interval plot above for the mean enzyme levels, we can see that the 95% confidence intervals for the mean enzyme levels for day 2 and day 7 overlap. Therefore, it is reasonable to conclude that there is no evidence of a difference between the mean enzyme levels for day 2 and day 7. Because the 95% confidence intervals for the true mean enzyme levels for day 0 and day 1 do not overlap, we conclude that these two means differ. Further, because neither of the 95% confidence intervals for day 0 and day 1 overlaps either of the 95% confidence intervals for day 2 and day 7, we conclude that the true mean enzyme levels for day 0 and day 1 each differ from the true mean enzyme levels for both day 2 and day 7.


The upshot of the above analysis is there is a statistically significant reduction in the true mean

enzyme levels from day 0 to day 1 and day 1 to day 2 and there is no statistically significant

change in the true mean enzyme levels from day 2 to day 7.

■ For Discussion: In the above post open-heart surgery enzyme level example, one of the

assumptions we need to verify was that each observation was drawn from a normally distributed

population. Is this assumption reasonable?

Answer:

Because there is only one observation from each patient/time combination, we cannot directly

test whether the observations from the population associated with each patient/time

combination are normally distributed. The best we can test is whether the associated marginal

distributions are normally distributed.

To test whether the enzyme-level measurements taken on each of Day 0, Day 1, Day 2, and

Day 7 are normally distributed, we refer to the p-values (based on the Shapiro-Wilk test for

normality) in the following Tests of Normality table. At the 5% level of significance, there is no

evidence to reject the assumptions that the enzyme-level measurements taken respectively on

Day 0 (p-value=0.537), Day 1 (p-value=0.566), and Day 7 (p-value=0.339) are normally

distributed. At the same level of significance, there is evidence to conclude that the enzyme-

level measurements taken on Day 2 are not normally distributed (p-value=0.033).

Tests of Normality

         Kolmogorov-Smirnov(a)        Shapiro-Wilk
         Statistic   df   Sig.        Statistic   df   Sig.
Day0     .109        15   .200*       .951        15   .537
Day1     .117        15   .200*       .953        15   .566
Day2     .203        15   .096        .869        15   .033
Day7     .119        15   .200*       .936        15   .339

a. Lilliefors Significance Correction

*. This is a lower bound of the true significance.

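To test whether the enzyme-level measurements taken for a specific patient are normally distributed, we refer to the p-values (based on the Shapiro-Wilk test for normality) in the following Tests of Normality table. At the 5% level of significance, there is no evidence to reject the assumptions that the enzyme-level measurements for each of the fifteen patients respectively are normally distributed (p-value = 0.240 for each patient). Note, however, that the table lists Day as the tested variable and reports identical statistics for all fifteen patients, which is what we would obtain by testing the day values 0, 1, 2, and 7 rather than each patient's enzyme levels; this output should therefore be regenerated with the enzyme level as the tested variable before relying on it.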

Tests of Normality

Tested variable: Day

           Kolmogorov-Smirnov(a)        Shapiro-Wilk
Patient    Statistic   df   Sig.        Statistic   df   Sig.
1.00       .314        4    .           .854        4    .240
2.00       .314        4    .           .854        4    .240
3.00       .314        4    .           .854        4    .240
4.00       .314        4    .           .854        4    .240
5.00       .314        4    .           .854        4    .240
6.00       .314        4    .           .854        4    .240
7.00       .314        4    .           .854        4    .240
8.00       .314        4    .           .854        4    .240
9.00       .314        4    .           .854        4    .240
10.00      .314        4    .           .854        4    .240
11.00      .314        4    .           .854        4    .240
12.00      .314        4    .           .854        4    .240
13.00      .314        4    .           .854        4    .240
14.00      .314        4    .           .854        4    .240
15.00      .314        4    .           .854        4    .240

a. Lilliefors Significance Correction
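Every row of the table above shows the same Shapiro-Wilk statistic, which is worth a quick numerical check. The sketch below computes the Shapiro-Wilk W for a sample of size four using the published tabled coefficients for n = 4 (0.6872 and 0.1677); it is a hand calculation for illustration, and statistical software may use slightly different coefficient approximations. It compares the day values 0, 1, 2, 7 with the enzyme levels of Subject 1.

```python
# Tabled Shapiro-Wilk coefficients for a sample of size n = 4.
A4 = (0.6872, 0.1677)


def shapiro_wilk_w_n4(values):
    """Return the Shapiro-Wilk W statistic for exactly four observations."""
    if len(values) != 4:
        raise ValueError("this sketch only handles samples of size four")
    x = sorted(values)
    # Weighted sum of the differences between symmetric order statistics.
    b = A4[0] * (x[3] - x[0]) + A4[1] * (x[2] - x[1])
    sample_mean = sum(x) / 4
    ss = sum((v - sample_mean) ** 2 for v in x)
    return b * b / ss


w_days = shapiro_wilk_w_n4([0, 1, 2, 7])              # the measurement days
w_subject1 = shapiro_wilk_w_n4([108, 63, 45, 42])     # Subject 1's enzyme levels

print(f"W for the day values 0, 1, 2, 7:    {w_days:.4f}")
print(f"W for Subject 1's enzyme levels:    {w_subject1:.4f}")
```

The first value is close to the .854 printed in every row, while Subject 1's enzyme levels give a different statistic, which supports reading the table as a test of the Day variable rather than of the enzyme levels.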

Because there was evidence that one of the marginal distributions was not normally distributed,

technically the “normality assumption” underlying our initial repeated-measures ANOVA test has

been violated.

For Discussion: In the above post open-heart surgery enzyme level example, one of the assumptions we needed to verify was that the patients and the times for the enzyme-level measurement are additive. Are the patient and the time of the enzyme-level measurement really additive factors?

Answer:


Referring to the below ANOVA with Tukey’s Test for Nonadditivity table, since the p-value = 0.345 > 0.05 = α, we do not have evidence to reject the assumption that the patient and time for the enzyme-level measurement factors are additive.

ANOVA with Tukey's Test for Nonadditivity

Source                                     Sum of Squares   df   Mean Square   F          Sig
Between People                             4221.100         14   301.507
Within People   Between Items              36282.267        3    12094.089     1301.662   .000
                Residual   Nonadditivity   8.482(a)         1    8.482         .911       .345
                           Balance         381.751          41   9.311
                           Total           390.233          42   9.291
                Total                      36672.500        45   814.944
Total                                      40893.600        59   693.112

Grand Mean = 76.2000

a. Tukey's estimate of power to which observations must be raised to achieve additivity = 1.139.
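The Nonadditivity and Balance rows above can be reproduced from the enzyme data with Tukey's one-degree-of-freedom formula. The following is a minimal sketch, assuming the standard form of the test in which the residual sum of squares is split into a nonadditivity component (1 degree of freedom) and a balance component; the function and key names are illustrative.

```python
# Enzyme levels from the data table: one row per subject, columns Day 0, 1, 2, 7.
ENZYME = [
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
]


def tukey_nonadditivity(data):
    """Tukey's one-degree-of-freedom test for a two-way layout with one value per cell."""
    n, t = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * t)
    row_eff = [sum(row) / t - grand for row in data]
    col_eff = [sum(row[j] for row in data) / n - grand for j in range(t)]
    sum_row_sq = sum(a * a for a in row_eff)
    sum_col_sq = sum(b * b for b in col_eff)

    # Nonadditivity sum of squares: squared cross-product over the product of effect sums.
    cross = sum(data[i][j] * row_eff[i] * col_eff[j] for i in range(n) for j in range(t))
    ss_nonadd = cross ** 2 / (sum_row_sq * sum_col_sq)

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_residual = ss_total - t * sum_row_sq - n * sum_col_sq
    ss_balance = ss_residual - ss_nonadd
    df_balance = (n - 1) * (t - 1) - 1
    f_stat = ss_nonadd / (ss_balance / df_balance)
    return {"ss_nonadd": ss_nonadd, "ss_balance": ss_balance,
            "df_balance": df_balance, "f": f_stat}


tukey = tukey_nonadditivity(ENZYME)
print(f"SS(nonadditivity) = {tukey['ss_nonadd']:.3f}")
print(f"SS(balance) = {tukey['ss_balance']:.3f} on {tukey['df_balance']} df")
print(f"F(1, {tukey['df_balance']}) = {tukey['f']:.3f}")
```

The p-value reported in the table is the upper tail of the F distribution with (1, 41) degrees of freedom at this statistic.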

Note that in the above example the subjects are not grouped, i.e. there is only one group. There is one repeated measures factor, i.e. the Time, with four levels (Day 0, 1, 2, and 7).

Now Your Turn: Starch et al. (A-17) wanted to show the effectiveness of a central four-quadrant sleeve and screw in anterior cruciate ligament reconstruction. The researchers performed a series of reconstructions on eight randomly selected cadaveric knees. The loads (in newtons) required to achieve different graft laxities (mm) for seven specimens (data not available for one specimen) using five different load weights were collected. The graft laxities of Loads A through E were consecutively measured. Graft laxity is the separation (in mm) of the femur and the tibia at the points of graft fixation.

1. Is there sufficient evidence to conclude that different loads are required to produce different levels of graft laxity? Refer to Exercise 8.4.2 (pg 352) in the textbook for the data for this question. Work at the α = 0.05 level of significance. You may assume that all the assumptions required to implement the analysis hold.

2. One of the assumptions we need to verify was that each observation was drawn from a normally distributed population. Is this assumption reasonable?

3. One of the assumptions we needed to verify was that the five graft laxity populations are spherical. Are the five graft laxity populations spherical?

4. One of the assumptions we needed to verify was that the knee and load factors are additive. Are the knee and load factors really additive factors?


Random Effects Factor

Suppose the levels of a factor have been selected at random from a population of levels. Then the factor is referred to as a random effects factor. The conclusions of the analysis will be directed at the population of levels, not just the levels selected for the experiment. The model for one fixed-effects factor and one random-effects factor is

$$y_{ijk} = \mu + \alpha_i + b_j + (\alpha b)_{ij} + \varepsilon_{ijk},$$

where $\mu$ and, for $i = 1, \ldots, a$, $\alpha_i$ are fixed unknown constants and $\varepsilon_{ijk}$ is a random, normally distributed variable with mean 0 and variance $\sigma^2$; for $j = 1, \ldots, n$, $b_j$ is normally distributed with mean 0 and variance $\sigma_B^2$; and for $i = 1, \ldots, a$ and for $j = 1, \ldots, n$, $(\alpha b)_{ij}$ is normally distributed with mean 0 and variance $\sigma_{AB}^2$. Note

$$\sum_{i=1}^{a} \alpha_i = 0.$$

The assumptions to implement a random-effects Two-Way ANOVA are the same as those for the fixed-effects Two-Way ANOVA with the additional assumption that the levels for each random-effects factor were randomly selected. You also need at least two observations for every block-treatment combination.

For Practice: In a study of the length of time spent on individual home visits by public health nurses, data were reported on length of home visit, in minutes, by a sample of 80 nurses (five nurses were randomly selected from each age/type of patient combination). The ages of the nurses were subdivided into four categories: 20-29, 30-39, 40-49, and 50+. Of all the different types of patients, the four types that were randomly selected were cardiac, cancer, c.v.a., and tuberculosis. The supervisor wants to know if one’s age causes a different length of time to be spent on individual home visits for an arbitrary patient type. Assuming the assumptions required to analyze the data found in Table 8.5.5 in BFAHS, p. 360 hold, with α = 0.05, analyze the above scenario.

Solution:

Note: By now, you should be able to verify the assumptions for this hypothesis test. Consequently their verification is not shown.

Research Question: Does one’s age cause a different length of time to be spent on individual home visits for an arbitrary patient type?

Let Factor A be the Type of Patient.
Let Level 1 of Factor A be cardiac.
Let Level 2 of Factor A be cancer.
Let Level 3 of Factor A be c.v.a.
Let Level 4 of Factor A be tuberculosis.

Let Factor B be the age group of a nurse.
Let Level 1 of Factor B be 20-29 years of age.
Let Level 2 of Factor B be 30-39 years of age.
Let Level 3 of Factor B be 40-49 years of age.
Let Level 4 of Factor B be 50 years of age or older.

Let Population 1 be the set of nurses who are between 20 and 29 years of age and attend to cardiac patients.
Let Population 2 be the set of nurses who are between 30 and 39 years of age and attend to cardiac patients.
Let Population 3 be the set of nurses who are between 40 and 49 years of age and attend to cardiac patients.
Let Population 4 be the set of nurses who are 50 years of age or older and attend to cardiac patients.
Let Population 5 be the set of nurses who are between 20 and 29 years of age and attend to cancer patients.
Let Population 6 be the set of nurses who are between 30 and 39 years of age and attend to cancer patients.
Let Population 7 be the set of nurses who are between 40 and 49 years of age and attend to cancer patients.
Let Population 8 be the set of nurses who are 50 years of age or older and attend to cancer patients.
Let Population 9 be the set of nurses who are between 20 and 29 years of age and attend to c.v.a. patients.
Let Population 10 be the set of nurses who are between 30 and 39 years of age and attend to c.v.a. patients.
Let Population 11 be the set of nurses who are between 40 and 49 years of age and attend to c.v.a. patients.
Let Population 12 be the set of nurses who are 50 years of age or older and attend to c.v.a. patients.
Let Population 13 be the set of nurses who are between 20 and 29 years of age and attend to tuberculosis patients.
Let Population 14 be the set of nurses who are between 30 and 39 years of age and attend to tuberculosis patients.
Let Population 15 be the set of nurses who are between 40 and 49 years of age and attend to tuberculosis patients.
Let Population 16 be the set of nurses who are 50 years of age or older and attend to tuberculosis patients.

Hypothesis Test to be used: Because each of the patient types was randomly selected from all available patient types, we are going to use a one fixed-effects factor and one random-effects factor Two-Way ANOVA.

Hypotheses to be tested:

H0,Age: the true mean lengths of home visit times for each of the age categories are all equal.
HA,Age: at least two of the true mean lengths of home visit times for each of the age categories differ.

H0,Patient Type: the true mean lengths of home visit times for each patient type are all equal.
HA,Patient Type: at least two of the true mean lengths of home visit times for each patient type differ.

H0,Age x Patient Type: there is no interaction effect between one’s age category and the patient type attended on the true mean lengths of home visit times.
HA,Age x Patient Type: there is an interaction effect between one’s age category and the patient type attended on the true mean lengths of home visit times.


Hypothesis Test to be used: Two-Way ANOVA with one fixed and one random effects factor

Assumptions required to implement the hypothesis test:
1) The populations from which each of the random samples was taken must be normal.
2) The populations must have the same variances.
3) The samples must be independent of one another.
4) The groups must be equal in sample size.
We are told to assume all the assumptions hold.

The Significance Level: 0.05

The Test Statistic and corresponding p-value: The following table contains the values of the test statistics and the corresponding p-values that are required to test each of our three sets of hypotheses.

Tests of Between-Subjects Effects
Dependent Variable: Time

Source                           Type III Sum of Squares   df   Mean Square   F         Sig.
Intercept           Hypothesis   82818.450                 1    82818.450     206.865   .001
                    Error        1201.050                  3    400.350
Age                 Hypothesis   1201.050                  3    400.350       5.922     .016
                    Error        608.450                   9    67.606 (b)
PatientType         Hypothesis   2992.450                  3    997.483       14.754    .001
                    Error        608.450                   9    67.606 (b)
Age * PatientType   Hypothesis   608.450                   9    67.606        4.605     .000
                    Error        939.600                   64   14.681 (c)

a. MS(PatientType)
b. MS(Age * PatientType)
c. MS(Error)
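The F column can be reproduced from the sums of squares and degrees of freedom, which makes the role of footnotes b and c explicit: each main effect is divided by the interaction mean square, and the interaction is divided by the error mean square. The sketch below is only an arithmetic check on the three tested effects; the dictionary names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the table above.
ss = {"Age": 1201.050, "PatientType": 2992.450,
      "Age*PatientType": 608.450, "Error": 939.600}
df = {"Age": 3, "PatientType": 3, "Age*PatientType": 9, "Error": 64}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# Denominator (error term) for each test, as given by footnotes b and c.
error_term = {
    "Age": "Age*PatientType",
    "PatientType": "Age*PatientType",
    "Age*PatientType": "Error",
}

# F = MS(effect) / MS(error term for that effect).
f_ratio = {effect: ms[effect] / ms[denom]
           for effect, denom in error_term.items()}

for effect, f in f_ratio.items():
    print(f"{effect}: F = {f:.3f} on ({df[effect]}, {df[error_term[effect]]}) df")
```

The 64 error degrees of freedom are consistent with 16 cells of 5 observations each, since 16 x (5 - 1) = 64; the cell size of 5 is inferred from the table rather than stated in it.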

The Decision Rule:

Since, for the interaction term, p-value < 0.001 < 0.05, we reject H0,Age x Patient Type.

Since, for the main effect associated with the age category, p-value = 0.016 < 0.05, we reject H0,Age.

Since, for the main effect associated with the type of patient, p-value = 0.001 < 0.05, we reject H0,Patient Type.
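The decision rule is the same comparison applied three times: reject the null hypothesis when the p-value falls below the significance level. A minimal sketch follows, assuming the p-values are read from the "Sig." column; the function and variable names are illustrative.

```python
ALPHA = 0.05  # significance level used in this example

# p-values as reported in the "Sig." column. SPSS prints ".000" for the
# interaction, so 0.001 is used here as an upper bound (an assumption).
reported_p = {
    "Age x Patient Type": 0.001,
    "Age": 0.016,
    "Patient Type": 0.001,
}

def decide(p_value, alpha=ALPHA):
    """Return the decision for a single test at the given significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decisions = {effect: decide(p) for effect, p in reported_p.items()}

for effect, decision in decisions.items():
    print(f"{effect}: {decision}")
```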

Conclusion: If all the assumptions required to implement the analysis are valid, at the 0.05 level of significance, we have evidence to conclude that at least two of the true mean lengths of home visit times based on the four different age categories differ (p-value = 0.016), and we have evidence to conclude that at least two of the true mean lengths of home visit times based on the type of patient selected differ (p-value = 0.001). At the same level of significance, we also have evidence to conclude that there is an interaction effect between a nurse’s age category and the type of patient selected on the true mean lengths of home visit times (p-value < 0.001).

NOTE: You would now complete a post-hoc analysis to explore how the mean lengths of home visit times differ based on the age category and the type of patient, and how these factors interact to influence the mean lengths of home visit times. For the sake of brevity, we do not present this post-hoc analysis.

Now Your Turn: A health district owns 36 ambulances of identical make and model. The district CEO is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). The district installs each brand on 12 of its ambulances, i.e. twelve have tire brand A installed, twelve have tire brand B installed, and the remaining twelve have tire brand C installed. The CEO realizes that, in addition to the tire brand, the driver will also affect the mileage. Consequently, the CEO randomly selects 4 drivers from the district's pool of drivers and randomly assigns the drivers to the ambulances in such a manner that each driver drives three ambulances with each tire brand. The resulting mileages are summarized below:

Driver   Tire Brand   Mileage   |   Driver   Tire Brand   Mileage
1        A            39.6      |   3        A            33.9
1        A            38.6      |   3        A            43.2
1        A            41.9      |   3        A            41.3
1        B            18.1      |   3        B            17.8
1        B            20.4      |   3        B            21.3
1        B            19.0      |   3        B            22.3
1        C            31.1      |   3        C            31.3
1        C            29.8      |   3        C            28.7
1        C            26.6      |   3        C            29.7
2        A            38.1      |   4        A            36.9
2        A            35.4      |   4        A            30.3
2        A            38.8      |   4        A            35.0
2        B            18.2      |   4        B            17.8
2        B            14.0      |   4        B            21.2
2        B            15.6      |   4        B            24.3
2        C            30.2      |   4        C            27.4
2        C            27.9      |   4        C            26.6
2        C            27.2      |   4        C            21.0


The CEO wishes to generalize the findings regarding the impact of the tire brand on mileage to all ambulance drivers within the district. At the 5% level of significance, analyze the CEO’s data. You may assume that all the assumptions for your analysis hold.
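As a starting point for the computations, the sketch below partitions the total variation in a balanced two-way layout into tire brand, driver, interaction, and error components. It assumes tire brand is treated as the fixed factor and driver as the random factor, and it follows the convention of the SPSS table in the previous example, where both main effects are tested against the interaction mean square. All names are illustrative, and the p-values, decisions, and conclusion are left to you.

```python
from statistics import mean

# (driver, tire brand) -> three mileage readings, transcribed from the table.
mileage = {
    (1, "A"): [39.6, 38.6, 41.9], (1, "B"): [18.1, 20.4, 19.0], (1, "C"): [31.1, 29.8, 26.6],
    (2, "A"): [38.1, 35.4, 38.8], (2, "B"): [18.2, 14.0, 15.6], (2, "C"): [30.2, 27.9, 27.2],
    (3, "A"): [33.9, 43.2, 41.3], (3, "B"): [17.8, 21.3, 22.3], (3, "C"): [31.3, 28.7, 29.7],
    (4, "A"): [36.9, 30.3, 35.0], (4, "B"): [17.8, 21.2, 24.3], (4, "C"): [27.4, 26.6, 21.0],
}

drivers = sorted({d for d, _ in mileage})
brands = sorted({t for _, t in mileage})
n = 3  # ambulances per driver-by-brand cell (balanced design)

all_values = [y for cell in mileage.values() for y in cell]
grand = mean(all_values)
cell_mean = {key: mean(cell) for key, cell in mileage.items()}
driver_mean = {d: mean([y for (dd, _), c in mileage.items() if dd == d for y in c])
               for d in drivers}
brand_mean = {t: mean([y for (_, tt), c in mileage.items() if tt == t for y in c])
              for t in brands}

# Sums of squares for a balanced two-way layout with replication.
ss_brand = n * len(drivers) * sum((brand_mean[t] - grand) ** 2 for t in brands)
ss_driver = n * len(brands) * sum((driver_mean[d] - grand) ** 2 for d in drivers)
ss_inter = n * sum((cell_mean[(d, t)] - driver_mean[d] - brand_mean[t] + grand) ** 2
                   for d in drivers for t in brands)
ss_error = sum((y - cell_mean[key]) ** 2 for key, cell in mileage.items() for y in cell)
ss_total = sum((y - grand) ** 2 for y in all_values)

# Degrees of freedom.
df_brand = len(brands) - 1
df_driver = len(drivers) - 1
df_inter = df_brand * df_driver
df_error = len(brands) * len(drivers) * (n - 1)

# Mean squares and F ratios (main effects tested against the interaction).
ms_brand, ms_driver = ss_brand / df_brand, ss_driver / df_driver
ms_inter, ms_error = ss_inter / df_inter, ss_error / df_error
f_brand = ms_brand / ms_inter
f_driver = ms_driver / ms_inter
f_inter = ms_inter / ms_error

print(f"Tire brand:  F = {f_brand:.3f} on ({df_brand}, {df_inter}) df")
print(f"Driver:      F = {f_driver:.3f} on ({df_driver}, {df_inter}) df")
print(f"Interaction: F = {f_inter:.3f} on ({df_inter}, {df_error}) df")
```

A quick sanity check is that the four components add up to the total sum of squares and that the degrees of freedom add up to 36 - 1 = 35.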


Learning Activities

Discussion Questions

Critical Thinking Questions

Assignments/Activities


References

Cite any references used in the learning material.