Module #2: Analysis of Variance
Table of Contents
Overview
Learning Outcomes
Required Resources
Key Terms and Concepts
Learning Material
Learning Activities
References
Overview
Quite often we wish to draw conclusions based on how a nominal independent variable
affects a continuous dependent variable. In this module we will learn how to determine this
effect using data collected from completely-randomized designs, randomized-block designs,
and repeated measures designs. In addition to analyzing the data collected using these
three designs, we will also learn which post-hoc analyses (both qualitative and quantitative)
can be used to further explore relationships within our data sets. We then use the results of
these analyses to address the initial research question.
Learning Outcomes
At the completion of this module, you will be able to
- Distinguish between a completely randomized design, a randomized block design, and a repeated measures design;
- Determine graphically whether the means associated with several independent groups differ;
- Use data collected from several independent populations to investigate the average effect that different factors have on a dependent variable;
- Graphically and through inference determine whether two factors interact;
- Distinguish between factors which are fixed-effects factors and those which are random-effects factors;
- Remove the effect of a factor from an analysis by treating the factor as a blocking factor;
- Use data collected from several dependent populations to investigate the average effect that different factors have on a dependent variable; and
- Draw public health conclusions based on the discussed statistical inference topics.
Required Resources
In this section, list all resources (readings, texts, web sites, videos, audio casts, etc.). You may also wish to include a “recommended resources” or “for further interest” section here, but be sure to separate these from the required readings.
Key Terms and Concepts
Analysis of Variance
ANOVA
One-way ANOVA
Factor
Treatment
Multiple Testing Problem
Between Treatments Variance
Within Treatment Variance
Mean Square Error
Completely Randomized Design
Confidence Interval Plots
Post-hoc Analysis
Scheffé Test
Tukey Test
Two-way ANOVA
Main Effect
Interaction Effect
Fixed-effects Factor
Profile of a Factor
Profile Analysis
Additive Factors
Interacting Factors
Syntax Analysis
Random-effects Factor
Randomized Block Design
Blocking Factors
Repeated Measures Design
Sphericity
Within-Subjects Main/Interaction Effects
Between-Subjects Main/Interaction Effects
Within-Between Subjects Interaction Effects
Learning Material
Analysis of Variance (ANOVA)
The material we have reviewed thus far relied on the fact that we had samples from one or two different populations. Quite often we wish to draw conclusions based on how a nominal independent variable (called a factor) affects a continuous dependent variable. To study the effect of the factor on the continuous dependent variable, the factor is divided into several different categories called levels (or treatments).
e.g. Suppose we were interested in studying how aspirin, propranolol, captopril, and diltiazem affect systolic blood pressure. The factor would be the category “medication” and the levels associated with the factor are the four different drugs.
For Discussion: What issues/problems might arise when we attempt to analyze the data collected in the study in the above example?
Answer: If we had only one or two levels, we could use a t-test to compare the means of the data grouped by the levels BUT if we have three or more levels, then a number of problems arise:
(1) We have trouble determining/interpreting the overall significance level;
(2) As the number of levels increases, the number of t-tests we would have to implement increases dramatically. For example, if we had only four levels, we might have to implement six two-sample t-tests (one for each pair of levels). With each additional test we must complete, the probability of making at least one Type I error also increases: if the $n$ tests were independent and each used significance level $\alpha$, then Pr(at least one Type I error) $= 1-(1-\alpha)^n$, where $n$ is the number of tests to be implemented. This is referred to as the Multiple Testing Problem.
■
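The growth in the number of tests and in the overall error rate can be illustrated with a short Python sketch. The function names are hypothetical, α = 0.05 is an illustrative choice that is not taken from the module, and the error-rate formula assumes the tests are independent, which pairwise t-tests on shared groups are not, so the result is only a rough guide.

```python
from math import comb


def pairwise_test_count(levels: int) -> int:
    """Number of two-sample t-tests needed to compare every pair of levels."""
    return comb(levels, 2)


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Pr(at least one Type I error) for n_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Four levels, as in the blood pressure example; alpha = 0.05 is an assumption.
n_tests = pairwise_test_count(4)
fwer = familywise_error_rate(0.05, n_tests)
print(n_tests, round(fwer, 3))
```

With four levels the six tests push the chance of at least one false rejection well above the nominal α of each individual test.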
To avoid the above problems, we can use Analysis of Variance (ANOVA). We will begin our ANOVA discussion by looking at one-way ANOVA, where we investigate how one factor affects the dependent variable. (Two-way ANOVA would be used if two factors were believed to influence the dependent variable).
One-Way Analysis of Variance
F-Test For Independent Samples
For Discussion: How can looking at the variances yield any information about the population means of the individual treatments of data?
Answer: Note that any particular observation (data point) can be decomposed as follows:
Observation = grand mean + (treatment mean - grand mean) + (observation - treatment mean)
The formal model is, for the $j$'th observation from the $i$'th treatment:
$$x_{i,j} = \mu + (\mu_{i,\cdot} - \mu) + (x_{i,j} - \mu_{i,\cdot}).$$
If we expect there to be no difference in the population treatment means $\mu_{i,\cdot}$, then we would expect the treatment mean minus grand mean to be essentially zero. The “formula” above illustrates how two types of variance can explain the deviation of the observation from the grand mean.
(1) The first type of variation is represented by “treatment mean minus grand mean”, which is related to the variance between the treatments of data. Some refer to this variance as the “Between Treatment Variance $s_B^2$”. Most statistical packages do not directly compute the Between Treatment Variance. Packages usually compute a quantity called the “Between Treatment Sum of Squares ($SS_B$)” and present its degrees of freedom ($k-1$), where $k$ is the number of treatments. Note that the “Mean Square Between Treatment Sum of Squares ($MS_B = MS_T$)” is the Between Treatment Variance $s_B^2$ and is calculated using
$$MS_T = MS_B = \frac{SS_B}{k-1} = s_B^2.$$
(2) The second type of variation is represented by “observation minus treatment mean”, which is related to the variance within the treatments of data. Some refer to this variance as the “Within Treatment Variance $s_W^2$”. Most statistical packages do not directly compute the Within Treatment Variance. Packages usually compute a quantity called the “Total Residual Sum of Squares” or the “Error Sum of Squares ($SS_W$)” and present its degrees of freedom ($N-k$), where $N$ is the total number of observations and $k$ is the number of treatments. Note that the “Mean Square Error Sum of Squares ($MS_W = MS_E$)” is the Within Treatment Variance $s_W^2$ and is calculated using
$$MS_E = MS_W = \frac{SS_W}{N-k} = s_W^2.$$
■
The question becomes, “How does one calculate $SS_B$ and $SS_W$?”
We could compute $SS_B$ and $SS_W$ as follows:
1) First compute the total for each of the samples $i = 1, \dots, k$ using
$$T_i = \sum_{j=1}^{n_i} x_{ij}.$$
2) Now compute the total of all the observations from all the treatments, i.e. compute the grand total of the observations using
$$G = \sum_{i=1}^{k} T_i.$$
3) Determine $n_i$ (the number of observations in the $i$'th sample) and $N$ (the total number of observations taken over all the treatments).
4) Compute the sum of the squares of all the observations using
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2.$$
5) Compute
$$\sum_{i=1}^{k} \frac{T_i^2}{n_i}.$$
6) Then
$$SS_B = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N}$$
and
$$SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i}.$$
7) Note the Total Sum of Squares ($SS_T$) is computed using
$$SS_T = SS_B + SS_W.$$
The above calculations can be summarized in a table called an ANOVA table:
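The seven steps above can be sketched in Python as follows. This is a minimal illustration, not the output of any statistical package: the function name `one_way_anova` and the three small samples are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples (one list per treatment)."""
    k = len(groups)                                   # number of treatments
    sizes = [len(g) for g in groups]                  # n_i
    n_total = sum(sizes)                              # N
    totals = [sum(g) for g in groups]                 # step 1: T_i
    grand_total = sum(totals)                         # step 2: G
    sum_of_squares = sum(x * x for g in groups for x in g)          # step 4
    scaled_totals = sum(t * t / n for t, n in zip(totals, sizes))   # step 5
    ss_b = scaled_totals - grand_total ** 2 / n_total                # step 6
    ss_w = sum_of_squares - scaled_totals
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n_total - k)
    return {"SSB": ss_b, "SSW": ss_w, "SST": ss_b + ss_w,            # step 7
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


# Hypothetical data: three treatments with three observations each.
samples = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
table = one_way_anova(samples)
print(table)
```

The returned dictionary holds the entries that an ANOVA table would list: the two sums of squares, their total, the two mean squares, and the F ratio.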
The One-Way ANOVA F-test compares $MS_B$ and $MS_W$. If the $MS_B$ is much larger than the $MS_W$, then we should conclude that at least one of the population treatment means differs from the other population treatment means.
Sometimes the original data for each treatment is not available. The original data has been summarized, that is, the sample size, the sample mean, and the sample standard deviation are provided. All is not lost! We can still compute $SS_B$ and $SS_W$ as follows:
$$SS_B = \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}_{grand}\right)^2,$$
$$SS_W = \sum_{i=1}^{k} (n_i - 1)\, s_i^2,$$
and
$$\bar{x}_{grand} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{N},$$
where $\bar{x}_i$ is the sample mean of the $i$'th treatment, $s_i^2$ is the sample variance of the $i$'th treatment, and $n_i$ is the number of observations in the $i$'th treatment.
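A short Python sketch of the summary-statistics route is given below. The function name and the summary values (sizes, means, and variances for three treatments) are hypothetical; if only standard deviations are reported, they would need to be squared first.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA quantities from per-treatment n_i, sample means, and sample variances."""
    k = len(sizes)
    n_total = sum(sizes)
    # Grand mean is the size-weighted average of the treatment means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n_total - k)
    return {"grand_mean": grand_mean, "SSB": ss_b, "SSW": ss_w, "F": ms_b / ms_w}


# Hypothetical summaries for three treatments.
summary = anova_from_summary(sizes=[3, 3, 3], means=[5.0, 8.0, 11.0], variances=[1.0, 1.0, 1.0])
print(summary)
```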
When data is collected via an experiment in which the treatments are assigned randomly to the experimental units, we can analyze this data using ANOVA. This type of experimental design is referred to as the completely randomized experimental design. When our completely randomized experimental design assigns individuals to different levels/treatments for a single factor, we can analyze the corresponding collected data using a One-Way ANOVA F-Test. We will present our One-Way ANOVA F-Tests using the same format that we learned in the previous module.
Research Question:
Population Declarations:
The Hypotheses to be tested are:
$H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$.
$H_a$: at least two of the population treatment means differ.
The underlying assumptions are:
1) the populations from which each of the random samples was taken must be normal;
2) the populations must have the same variances;
3) the samples must be independent of one another; and
4) the data must be collected via an experiment in which the treatments are assigned randomly to the experimental units.
The Significance level is: $\alpha$
The test statistic is:
$$F(v,d) = \frac{MS_B}{MS_W}$$
where $v = k-1$ degrees of freedom in the numerator, $d = N-k$ degrees of freedom in the denominator, $N$ is the total number of observations taken across all the treatments, and $k$ is the number of treatments.
The p-value is calculated using a software package.
The critical value $F_\alpha(v,d)$ can be found in an appropriate F-table.
Decision Rule is: If $F(v,d) > F_\alpha(v,d)$, reject $H_0$. OTHERWISE do not reject $H_0$.
or equivalently, if p-value $< \alpha$, reject $H_0$. OTHERWISE do not reject $H_0$.
Conclusion:
For practice: BFAHS, p. 331, q. 8.2.4. Gold et al. (A-5) investigated the effectiveness on smoking cessation of a nicotine patch, bupropion SR, or both, when co-administered with cognitive-behavioural therapy. Consecutive consenting patients (N=164) assigned themselves to one of three treatments according to personal preference: nicotine patch (NTP, n=13), bupropion SR (B, n=92), and bupropion SR plus nicotine patch (BNTP, n=59). At their first smoking cessation class, patients estimated the number of packs of cigarettes they currently smoked per day and the number of years they had smoked. The “pack years” is the average number of packs the subject smoked per day multiplied by the number of years the subject had smoked. Using the 10% level of significance, analyze the data collected for this problem.
The data can be downloaded from the Student Companion Sites link that appears on the website: http://ca.wiley.com/WileyCDA/WileyTitle/productCd-EHEP000107.html. The example is Question 4 of Section 2 of Chapter 8. For the sake of this example, assume that all the assumptions required to fully analyze the data are true.
Solution:
Research Question:
Is there a difference in the average number of pack years based on the smoking cessation technique?
Population Declarations:
Let Population 1 be the people who use the nicotine patch (NTP) to assist in smoking cessation and $\mu_{NTP}$ be the mean number of pack years associated with this population.
Let Population 2 be the people who use bupropion SR (B) to assist in smoking cessation and $\mu_{B}$ be the mean number of pack years associated with this population.
Let Population 3 be the people who use bupropion SR plus nicotine patch (BNTP) to assist in smoking cessation and $\mu_{BNTP}$ be the mean number of pack years associated with this population.
Hypotheses to be tested:
$H_0$: $\mu_{NTP} = \mu_{B} = \mu_{BNTP}$, i.e. the true mean numbers of pack years of smokers for the three smoking cessation groups are equal.
$H_A$: not $H_0$, that is, at least two of the true mean numbers of pack years of smokers differ.
Hypothesis Test to be used: One-Way ANOVA F-Test
Assumptions required to implement the test:
1. Randomness: We assume that the sample was randomly selected.
2. Independence: By the design of the experiment, the data are sampled from independent populations. At this point, we will assume that the data sampled from within the same population are independent.
3. Normality: Each of the three populations must be normally distributed. We are told to assume this is true.
4. Equality of Variances: The three populations must have the same variances. We are told to assume this is true.
The Significance Level: $\alpha = 0.10$
The Test Statistic and corresponding p-value: From the ANOVA table below, the value of the test statistic is F(2,161)=5.878 and the associated p-value=0.003.
ANOVA (dependent variable: years)
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   14489.627        2     7244.814      5.878   .003
Within Groups    198442.245       161   1232.561
Total            212931.872       163
The Decision Rule: Since the p-value = 0.003 < 0.10 = α, we reject H0.
The Conclusion: At the 10% level of significance, with a p-value = 0.003, we have evidence to conclude that the true mean pack years for at least two of the populations differ. Because we rejected the null hypothesis, we would have to do a post-hoc analysis to determine how the means differed. We will learn how to implement this analysis in a few moments.
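The arithmetic behind the ANOVA table can be checked with a small Python sketch. The sums of squares and degrees of freedom are copied from the table; the helper `f_sf_num_df2` is a hypothetical name, and it relies on the fact that the upper-tail probability of an F distribution has the closed form $(1 + 2f/d)^{-d/2}$ only when the numerator has exactly 2 degrees of freedom, as it does here. For other degrees of freedom a statistical package would be needed.

```python
def f_sf_num_df2(f: float, d2: int) -> float:
    """P(F > f) for an F distribution with 2 numerator and d2 denominator degrees of freedom."""
    return (1 + 2 * f / d2) ** (-d2 / 2)


# Values taken from the ANOVA table for the smoking cessation example.
ss_b, df_b = 14489.627, 2
ss_w, df_w = 198442.245, 161

ms_b = ss_b / df_b          # Between Groups mean square
ms_w = ss_w / df_w          # Within Groups mean square
f_stat = ms_b / ms_w        # test statistic F(2, 161)
p_value = f_sf_num_df2(f_stat, df_w)

alpha = 0.10
reject_h0 = p_value < alpha
print(round(ms_b, 3), round(ms_w, 3), round(f_stat, 3), round(p_value, 3), reject_h0)
```

The computed mean squares, F ratio, and p-value agree with the table to the displayed precision, and the decision matches the one reached above.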
■
For Discussion: How would you actually test the normality assumption in the previous example? What hypotheses would you have to test? At what conclusions would you arrive after you implement the requisite hypothesis tests? Is the normality assumption actually true?
Answer: To determine whether the data supports the normality assumption, we must perform a Test for Normality on each of the three populations individually. Because there are at least three observations sampled from each population, we can use the Shapiro-Wilk Test for Normality. To this end, we use the following table from SPSS.
Tests of Normality (dependent variable: Years)
Group                             Kolmogorov-Smirnov (a)     Shapiro-Wilk
                                  Statistic  df   Sig.       Statistic  df   Sig.
nicotine patch                    .217       13   .094       .789       13   .005
bupropion SR                      .067       92   .200*      .981       92   .204
nicotine patch and bupropion SR   .086       59   .200*      .978       59   .360
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
The three sets of hypotheses to be tested are
- H0,NTP: The number of pack years in Population 1 is normally distributed.
  HA,NTP: The number of pack years in Population 1 is not normally distributed.
- H0,B: The number of pack years in Population 2 is normally distributed.
  HA,B: The number of pack years in Population 2 is not normally distributed.
- H0,BNTP: The number of pack years in Population 3 is normally distributed.
  HA,BNTP: The number of pack years in Population 3 is not normally distributed.
Referring to the Shapiro-Wilk columns of the Tests of Normality table above:
regarding Population 1 (the NTP group), since p-value = 0.005 < 0.10 = α, we reject H0,NTP;
regarding Population 2 (the B group), since p-value = 0.204 > 0.10 = α, we do not reject H0,B; and
regarding Population 3 (the BNTP group), since p-value = 0.360 > 0.10 = α, we do not reject H0,BNTP.
At the α = 0.10 level of significance, there is evidence to conclude that the number of pack years in Population 1 (the NTP group) is not normally distributed (p-value=0.005). At the same level of significance, there is not enough evidence to reject the assumptions that the numbers of pack years in Populations 2 (the B group) and 3 (the BNTP group) are normally distributed (with p-values of 0.204 and 0.360 respectively). Because one of the populations is not normally distributed, the assumption that all three of the populations are normally distributed has been violated.
■
For Discussion: How would you actually test the “equality of variances” assumption in the previous example? What hypotheses would you have to test? At what conclusions would you arrive after you implement the requisite hypothesis tests? Is the “equality of variances” assumption actually true?
Answer:
To determine whether the data supports the “equality of variances” assumption, because at least three observations were sampled from each population, we can perform a Levene’s Test for Equality of Variances to see if the variances in the number of pack years in the three populations are not all equal. To this end, we use the following table from SPSS.
Test of Homogeneity of Variances (dependent variable: Years)
Levene Statistic   df1   df2   Sig.
.690               2     161   .503
The hypotheses to be tested are
H0: $\sigma^2_{NTP} = \sigma^2_{B} = \sigma^2_{BNTP}$, i.e. the three population variances are equal.
HA: not H0, that is at least two of the three population variances differ.
From the Test of Homogeneity of Variances Table above, since the p-value=0.503 > 0.10 = α, we do not reject H0. Hence, at the 10% level of significance, with a p-value=0.503, we do not have enough evidence to reject the assumption that all three populations have the same variance.
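To show what a Levene statistic measures, here is a minimal Python sketch of the classical mean-centred form: a one-way ANOVA F ratio computed on the absolute deviations of each observation from its own group mean. Whether this matches the exact variant used by the software output above is an assumption, the function name is hypothetical, and the data are made up for illustration (the pack-year data are not reproduced in this module).

```python
from statistics import mean


def levene_statistic(groups):
    """Mean-centred Levene statistic: ANOVA F ratio on |x - group mean|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups with the same spread and with clearly different spreads.
same_spread = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
different_spread = [[10, 12, 14, 16], [9, 13, 17, 21], [11.0, 11.5, 12.0, 12.5]]

w_same = levene_statistic(same_spread)
w_diff = levene_statistic(different_spread)
print(w_same, round(w_diff, 3))
```

The statistic would then be compared with an F distribution on $k-1$ and $N-k$ degrees of freedom; a value near zero, as for the first data set, gives no evidence against equal variances.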
■
For Discussion: When an F-Test for ANOVA rejects the null hypothesis (as in the previous example), how does one determine which pairs of means significantly differ? Two solutions to this question are presented next.
A graphical method
Confidence intervals can be used to visualize which pairs of means differ significantly. When forming the confidence intervals, be sure to use the confidence level associated with the significance level from the hypothesis test. Consider the following graph, which displays the 90% confidence intervals computed based on the sample data for each of the three different smoking cessation techniques in the previous example.
For Discussion: How do we use the above graph to help determine the relationship between the different treatment means? ... the different variances?
Answer:
To determine whether it is reasonable that the population means are equal, we look at the corresponding confidence interval plot to identify whether the confidence intervals overlap. If all the plotted intervals overlap, you would not reject the assumption that all the true means are equal. If two of the intervals do not overlap, you would have evidence to conclude that those two true means differ.
Referring to the previous 90% confidence interval plot for the average number of pack years associated with the three different smoking cessation techniques, the confidence interval for the NTP group (Population 1) does not overlap the confidence intervals for the B group (Population 2) and the BNTP group (Population 3). Therefore it would be reasonable to conclude that the true mean number of pack years for the NTP group differs from the true mean number of pack years for both the B and BNTP groups. Because the confidence intervals for the B and BNTP groups overlap, we do not have evidence to reject that the true mean numbers of pack years for the B and BNTP groups are equal.
To determine whether it is reasonable that the population variances are equal, we look at the widths of the confidence intervals in the confidence interval plot. If the widths are approximately equal, then we do not have any evidence to reject that the true variances are all equal. For two of the intervals, if the larger width divided by the smaller width is greater than two, then we have evidence to conclude that those two variances differ (provided each sample size is reasonable).
Referring to the previous 90% confidence interval plot based on the sample data for each of the three different smoking cessation techniques, because the confidence interval width for the NTP group (Population 1) is not two times larger than the widths of the B-group (Population 2) and the BNTP-group (Population 3) confidence intervals, and the B-group confidence interval width is not two times larger than the width of the BNTP-group confidence interval, we would not have evidence to reject the assumption that the variances of the three groups are equal.
■
The method we just discussed for determining which means differ is rather “nebulous”. We will discuss two quantitative methods for determining which of the means presented in a One-Way ANOVA F-Test, if any, differ significantly. The first method we are going to discuss is called the “Scheffé Test”. The second method we will discuss is called the “Tukey Test”. The Scheffé Test and the Tukey Test are examples of “post hoc” analyses (i.e. analyses that were not planned ahead of time). Both tests can be used whenever the assumptions for the F-Test for One-Way ANOVA are true.
The Scheffé Test
To implement the Scheffé Test, we must compare the means, two at a time. For the smoking cessation example, we would have to compare the sample means $\bar{x}_{NTP}$ with $\bar{x}_{B}$; $\bar{x}_{NTP}$ with $\bar{x}_{BNTP}$; and $\bar{x}_{B}$ with $\bar{x}_{BNTP}$. The value of the test statistic for the Scheffé Test based on Treatment $i$ and Treatment $j$ (for $i \neq j$) is:
$$F_S^{i,j}(k-1, N-k) = \frac{(\bar{x}_i - \bar{x}_j)^2}{s_W^2\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$$
where $\bar{x}_i$ and $\bar{x}_j$ are the sample means of Treatment $i$ and Treatment $j$ respectively, $n_i$ and $n_j$ are the sizes of the samples for Treatment $i$ and Treatment $j$ respectively, and $s_W^2$ is the “Within-the-treatment variance $MS_W$” that we computed in the One-Way ANOVA F-Test. The critical value for the Scheffé Test is
$$F_S^{\alpha}(k-1, N-k) = (k-1)\,F_{\alpha}(k-1, N-k)$$
where $N$ is the total number of observations across all the samples, $k$ is the number of treatments, and $\alpha$ is the significance level used in the One-Way ANOVA F-Test. Then there is a significant difference (at the $\alpha$ level of significance) between the means of Treatment $i$ and Treatment $j$ (for $i \neq j$) if
$$F_S^{i,j}(k-1, N-k) > F_S^{\alpha}(k-1, N-k).$$
For practice: Use the Scheffé Test to determine which (if any) of the pairs of means in the smoking cessation example differ significantly at the $\alpha = 0.10$ level of significance.
Solution:
The hypotheses to be tested are:
- H0,NTPxB: $\mu_{NTP} = \mu_{B}$
  HA,NTPxB: $\mu_{NTP} \neq \mu_{B}$
- H0,NTPxBNTP: $\mu_{NTP} = \mu_{BNTP}$
  HA,NTPxBNTP: $\mu_{NTP} \neq \mu_{BNTP}$
- H0,BxBNTP: $\mu_{B} = \mu_{BNTP}$
  HA,BxBNTP: $\mu_{B} \neq \mu_{BNTP}$
Multiple Comparisons (dependent variable: years; method: Scheffe)
(I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.   90% CI Lower Bound   90% CI Upper Bound
NTP         B           -33.31939799            10.40239133   .007   -55.80316221         -10.83563378
NTP         BNTP        -36.18122555            10.75654241   .004   -59.43045314         -12.93199797
B           NTP         33.31939799             10.40239133   .007   10.83563378          55.80316221
B           BNTP        -2.861827561            5.855617255   .888   -15.51817874         9.79452361
BNTP        NTP         36.18122555             10.75654241   .004   12.93199797          59.43045314
BNTP        B           2.861827561             5.855617255   .888   -9.79452361          15.51817874
*. The mean difference is significant at the .1 level.
Referring to the Multiple Comparisons Table above:
regarding the NTP and B groups, because the p-value = 0.007 < 0.10 = α, we reject H0,NTPxB;
regarding the NTP and BNTP groups, because the p-value = 0.004 < 0.10 = α, we reject H0,NTPxBNTP; and
regarding the B and BNTP groups, because the p-value = 0.888 > 0.10 = α, we do not reject H0,BxBNTP.
Consequently, at the 0.10 level of significance, with a p-value=0.007, we have evidence to conclude that the true mean pack years for the bupropion and nicotine patch groups differ and, with a p-value=0.004, we have evidence to conclude that the true mean pack years for the nicotine patch and the nicotine patch/bupropion combination groups differ. At the same level of significance, with a p-value=0.888, we cannot reject the assumption that the true mean pack years for the bupropion and the nicotine patch/bupropion combination groups are equal.
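The Scheffé results can be reproduced by hand with a short Python sketch. The group sizes, $MS_W$ = 1232.561, and the mean differences come from the example and its software output; the variable and function names are hypothetical. Because $k-1 = 2$ here, the sketch uses the closed forms that exist for an F distribution with 2 numerator degrees of freedom (upper tail $(1 + 2f/d)^{-d/2}$ and critical value $(d/2)(\alpha^{-2/d} - 1)$); other numbers of treatments would need an F-table or a statistical package.

```python
# Quantities from the smoking cessation example.
ms_w, n_total, k = 1232.561, 164, 3
sizes = {"NTP": 13, "B": 92, "BNTP": 59}
mean_diffs = {("NTP", "B"): -33.31939799,
              ("NTP", "BNTP"): -36.18122555,
              ("B", "BNTP"): -2.861827561}
alpha = 0.10
d2 = n_total - k  # denominator degrees of freedom, N - k


def f_sf_num_df2(f, d):
    """P(F > f) when the numerator has exactly 2 degrees of freedom."""
    return (1 + 2 * f / d) ** (-d / 2)


# Upper-alpha critical value of F(2, d2), then the Scheffe critical value.
f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)
scheffe_crit = (k - 1) * f_crit

results = {}
for (i, j), diff in mean_diffs.items():
    f_s = diff ** 2 / (ms_w * (1 / sizes[i] + 1 / sizes[j]))   # Scheffe test statistic
    p = f_sf_num_df2(f_s / (k - 1), d2)                        # Scheffe p-value
    results[(i, j)] = (f_s, p, f_s > scheffe_crit)

for pair, (f_s, p, significant) in results.items():
    print(pair, round(f_s, 3), round(p, 3), significant)
```

Comparing each statistic with the critical value gives the same three decisions as comparing the reported p-values with α.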
■
NOTE: There are situations that arise in which the F-Test ANOVA indicates that there is a significant difference between at least two of the means BUT the Scheffé Test fails to identify any significant differences in the pairs of means.
The Tukey Test
The Tukey Test can also be used after the One-Way ANOVA F-Test has been completed to determine any pairwise differences between the means of the groups. The value for the Tukey test statistic for Population $i$ and Population $j$ is given by
$$q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{s_W^2 / n_h}},$$
where $n_h = k / (1/n_1 + 1/n_2 + \dots + 1/n_k)$. When the absolute value of the Tukey test statistic is greater than the Tukey critical value (from a standard table of critical values), there is a significant difference between the means corresponding to Population $i$ and Population $j$.
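As a rough illustration of the formula, the Python sketch below computes $n_h$ and $q$ using the group sizes (13, 92, 59) and $MS_W$ = 1232.561 from the smoking cessation example, together with the NTP minus B mean difference reported in the software output for that example. The function names are hypothetical. The Tukey critical value comes from a studentized range table and is not computed here, and statistical packages may handle unequal sample sizes differently, so this value should not be expected to reproduce software output.

```python
from math import sqrt


def harmonic_mean_size(sizes):
    """n_h = k / (1/n_1 + ... + 1/n_k)."""
    return len(sizes) / sum(1 / n for n in sizes)


def tukey_q(mean_difference, ms_w, n_h):
    """Tukey test statistic q for one pair of group means."""
    return mean_difference / sqrt(ms_w / n_h)


sizes = [13, 92, 59]     # NTP, B, BNTP
ms_w = 1232.561          # within-treatment variance from the ANOVA table
n_h = harmonic_mean_size(sizes)
q_ntp_b = tukey_q(-33.31939799, ms_w, n_h)
print(round(n_h, 2), round(q_ntp_b, 2))
```

The absolute value of `q_ntp_b` would then be compared with the tabulated critical value for $k = 3$ groups and $N - k = 161$ degrees of freedom.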
For practice: Use the Tukey Test to determine which (if any) of the pairs of means in the smoking cessation example differ significantly at the $\alpha = 0.10$ level of significance.
Solution:
The hypotheses to be tested are:
- H0,NTPxB: $\mu_{NTP} = \mu_{B}$
  HA,NTPxB: $\mu_{NTP} \neq \mu_{B}$
- H0,NTPxBNTP: $\mu_{NTP} = \mu_{BNTP}$
  HA,NTPxBNTP: $\mu_{NTP} \neq \mu_{BNTP}$
- H0,BxBNTP: $\mu_{B} = \mu_{BNTP}$
  HA,BxBNTP: $\mu_{B} \neq \mu_{BNTP}$
Multiple Comparisons (dependent variable: years; method: Tukey HSD)
(I) group   (J) group   Mean Difference (I-J)   Std. Error    Sig.   90% CI Lower Bound   90% CI Upper Bound
NTP         B           -33.31939799            10.40239133   .005   -54.82150371         -11.81729228
NTP         BNTP        -36.18122555            10.75654241   .003   -58.41537392         -13.94707719
B           NTP         33.31939799             10.40239133   .005   11.81729228          54.82150371
B           BNTP        -2.861827561            5.855617255   .877   -14.96559268         9.24193755
BNTP        NTP         36.18122555             10.75654241   .003   13.94707719          58.41537392
BNTP        B           2.861827561             5.855617255   .877   -9.24193755          14.96559268
*. The mean difference is significant at the .1 level.
Referring to the Multiple Comparisons table above:
regarding the NTP and B groups, because the p-value = 0.005 < 0.10 = α, we reject H0,NTPxB;
regarding the NTP and BNTP groups, because the p-value = 0.003 < 0.10 = α, we reject H0,NTPxBNTP; and
regarding the B and BNTP groups, because the p-value = 0.877 > 0.10 = α, we do not reject H0,BxBNTP.
Consequently, at the 0.10 level of significance, with a p-value=0.005, we have evidence to conclude that the true mean pack years for the bupropion and nicotine patch groups differ and, with a p-value=0.003, we have evidence to conclude that the true mean pack years for the nicotine patch and the nicotine patch/bupropion combination groups differ. At the same level of significance, with a p-value=0.877, we cannot reject the assumption that the true mean pack years for the bupropion and the nicotine patch/bupropion combination groups are equal.
■
NOTE: In the situation where we are only making pairwise comparisons, the Tukey Test is preferred to the Scheffé Test.
NOTE: There are other tests that possibly could be used.
Now Your Turn: BFAHS, p. 329, q. 8.2.2. Patients suffering from rheumatic diseases or osteoporosis often suffer critical losses in bone mineral density (BMD). Alendronate is one medication prescribed to build or prevent further loss of BMD. Holcomb and Rothenberg (A-3) looked at 96 women taking alendronate to determine if a difference existed in the mean % change in BMD among five different primary diagnosis classifications. Group 1 patients were diagnosed with rheumatoid arthritis (RA). Group 2 patients were a mixed collection of patients with diseases including lupus, Wegener granulomatosis and polyarteritis, and other vascular diseases (LUPUS). Group 3 patients had polymyalgia rheumatica or temporal arthritis (PMRTA). Group 4 patients had osteoarthritis (OA) and Group 5 patients had osteoporosis (O) with no other rheumatic diseases identified in the medical record. Completely analyze the above data at the 10% level of significance. The data can be found on the textbook website.
Two-Way ANOVA with Interaction
Suppose we are interested in studying the effects that two independent variables (or two factors) have on a single dependent variable. When we use a completely randomized experimental design to assign individuals to the different levels/treatments of the two factors, we can analyze the corresponding collected data using a Two-Way ANOVA F-Test. A Two-Way ANOVA allows a researcher to test whether each of the factors and their interaction have a statistically significant effect on the dependent variable.
All fixed effects factors
Suppose, when designing an experiment, the levels of each factor are identified and fixed and the conclusions of any analysis are in relation to these levels. Then the factors are fixed effects factors and we need to perform a “Fixed-effects Factors Two-Way ANOVA”. The model for two fixed-effects factors is
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where, for $i = 1, \dots, a$ and $j = 1, \dots, n$, $\mu$, $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ are fixed unknown constants and $\varepsilon_{ijk}$ is a random, normally distributed variable with mean 0 and variance $\sigma^2$. Note
$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{n} \beta_j = \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{n} (\alpha\beta)_{ij} = 0.$$
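If we were to implement the test by hand, we would need to calculate the following, where $x_{i,j,k}$ is the observation for subject $i$ at level $j$ of Factor A and level $k$ of Factor B:
1) The means
$$\bar{x}_{\cdot,\cdot,k} = \frac{1}{na}\sum_{i=1}^{n}\sum_{j=1}^{a} x_{i,j,k}, \qquad \bar{x}_{\cdot,j,k} = \frac{1}{n}\sum_{i=1}^{n} x_{i,j,k}, \qquad \bar{x}_{\cdot,\cdot,\cdot} = \frac{1}{nab}\sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b} x_{i,j,k}$$
2) The sum of the squares for Factor A:
$$SS_A = nb\sum_{j=1}^{a}\left(\bar{x}_{\cdot,j,\cdot} - \bar{x}_{\cdot,\cdot,\cdot}\right)^2, \qquad \bar{x}_{\cdot,j,\cdot} = \frac{1}{nb}\sum_{i=1}^{n}\sum_{k=1}^{b} x_{i,j,k}$$
3) The sum of the squares for Factor B:
$$SS_B = na\sum_{k=1}^{b}\left(\bar{x}_{\cdot,\cdot,k} - \bar{x}_{\cdot,\cdot,\cdot}\right)^2$$
4) The sum of the squares for the interaction:
$$SS_{A\times B} = n\sum_{j=1}^{a}\sum_{k=1}^{b}\left(\bar{x}_{\cdot,j,k} - \bar{x}_{\cdot,j,\cdot} - \bar{x}_{\cdot,\cdot,k} + \bar{x}_{\cdot,\cdot,\cdot}\right)^2$$
5) The sum of the squares for the within-group error term:
$$SS_W = \sum_{i=1}^{n}\sum_{j=1}^{a}\sum_{k=1}^{b}\left(x_{i,j,k} - \bar{x}_{\cdot,j,k}\right)^2$$
6) $a$ is the number of levels of Factor A
7) $b$ is the number of levels of Factor B
8) $n$ is the number of subjects in each group
9) $MS_A = \dfrac{SS_A}{a-1}$
10) $MS_B = \dfrac{SS_B}{b-1}$
11) $MS_{A\times B} = \dfrac{SS_{A\times B}}{(a-1)(b-1)}$
12) $MS_W = \dfrac{SS_W}{ab(n-1)}$
13) $F_A = \dfrac{MS_A}{MS_W}$ with $a-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator
14) $F_B = \dfrac{MS_B}{MS_W}$ with $b-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator
15) $F_{A\times B} = \dfrac{MS_{A\times B}}{MS_W}$ with $(a-1)(b-1)$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator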
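The by-hand calculations above can be sketched in Python for a balanced design. The function name, the nested-list layout (`cells[j][k]` holds the $n$ observations for level $j$ of Factor A and level $k$ of Factor B), and the 2 x 2 data set with three observations per cell are all hypothetical and serve only to make the arithmetic easy to follow.

```python
from statistics import mean


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-way layout with interaction."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in cells[j] for x in cell]) for j in range(a)]
    b_means = [mean([x for j in range(a) for x in cells[j][k]]) for k in range(b)]
    cell_means = [[mean(cells[j][k]) for k in range(b)] for j in range(a)]

    ss_a = n * b * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * a * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum((cell_means[j][k] - a_means[j] - b_means[k] + grand) ** 2
                    for j in range(a) for k in range(b))
    ss_w = sum((x - cell_means[j][k]) ** 2
               for j in range(a) for k in range(b) for x in cells[j][k])

    ms_w = ss_w / (a * b * (n - 1))
    return {"SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab, "SSW": ss_w,
            "FA": ss_a / (a - 1) / ms_w,
            "FB": ss_b / (b - 1) / ms_w,
            "FAB": ss_ab / ((a - 1) * (b - 1)) / ms_w}


# Hypothetical balanced data: 2 levels of A, 2 levels of B, 3 subjects per cell.
cells = [[[10, 12, 14], [20, 22, 24]],
         [[30, 32, 34], [60, 62, 64]]]
anova = two_way_anova(cells)
print(anova)
```

As a consistency check, the four sums of squares add up to the total sum of squared deviations of the observations from the grand mean.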
In order to use ANOVA, there are three sets of hypotheses to be tested. We will present our Two-Way ANOVA F-Tests using the same format that we learned in the previous section, using the following template.
Research Question:
Population Declarations:
Hypotheses to be tested:
$H_{0,A}$: there is no difference between the true means of the dependent variable based on the different levels of Factor A.
$H_{a,A}$: not $H_{0,A}$, that is, there is a difference between at least two of the true means of the dependent variable based on the different levels of Factor A.
$H_{0,B}$: there is no difference between the true means of the dependent variable based on the different levels of Factor B.
$H_{a,B}$: there is a difference between at least two of the true means of the dependent variable based on the different levels of Factor B.
$H_{0,A\times B}$: there is no interaction effect between Factor A and Factor B on the true means of the dependent variable.
$H_{a,A\times B}$: there is an interaction effect between Factor A and Factor B on the true means of the dependent variable.
The assumptions required to implement the test:
1) The populations from which each of the random samples was taken must be normal.
2) The populations must have the same variances.
3) The samples must be independent of one another.
4) The groups must be equal in sample size.
The Significance level:
The test statistics:
1) $F_A = \dfrac{MSA}{MSW}$ with $a-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator
2) $F_B = \dfrac{MSB}{MSW}$ with $b-1$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator
3) $F_{A \times B} = \dfrac{MS_{A \times B}}{MSW}$ with $(a-1)(b-1)$ degrees of freedom in the numerator and $ab(n-1)$ degrees of freedom in the denominator, where
$n$ is the number of subjects in each group,
$a$ is the number of levels of Factor A, and
$b$ is the number of levels of Factor B.
Calculate the p-value using technology.
There will be a critical value $F_{\alpha}(v, d)$ for each of the above three test statistics, which can be found in an appropriate F-table.
The Decision Rule:
With respect to Factor A: If $F_A > F_{\alpha}(a-1,\ ab(n-1))$, reject $H_{0,A}$; otherwise do not reject $H_{0,A}$.
With respect to Factor B: If $F_B > F_{\alpha}(b-1,\ ab(n-1))$, reject $H_{0,B}$; otherwise do not reject $H_{0,B}$.
With respect to the interaction of Factors A and B: If $F_{A \times B} > F_{\alpha}((a-1)(b-1),\ ab(n-1))$, reject $H_{0,A \times B}$; otherwise do not reject $H_{0,A \times B}$.
Equivalently:
With respect to Factor A: If p-value $= \Pr[F > F_A] < \alpha$, reject $H_{0,A}$; otherwise do not reject $H_{0,A}$.
With respect to Factor B: If p-value $= \Pr[F > F_B] < \alpha$, reject $H_{0,B}$; otherwise do not reject $H_{0,B}$.
With respect to the interaction of Factors A and B: If p-value $= \Pr[F > F_{A \times B}] < \alpha$, reject $H_{0,A \times B}$; otherwise do not reject $H_{0,A \times B}$.
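The two equivalent forms of the decision rule can be written as two small functions. This is only a sketch: the function names are hypothetical, the p-value and test statistic below are made-up inputs, and the critical value 5.32 is the tabulated value of $F_{0.05}(1, 8)$ that one would read from an F-table.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is smaller than the significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"


def decide_by_critical_value(f_statistic, f_critical):
    """Reject H0 when the test statistic exceeds the critical value from the F-table."""
    return "reject H0" if f_statistic > f_critical else "do not reject H0"


# Hypothetical inputs, for illustration only.
print(decide_by_p_value(0.03))               # p-value below alpha = 0.05
print(decide_by_p_value(0.20))               # p-value above alpha = 0.05
print(decide_by_critical_value(7.0, 5.32))   # statistic above F_0.05(1, 8)
```

The same rule is applied three times in a two-way ANOVA: once for Factor A, once for Factor B, and once for the interaction.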
Conclusion: We write our conclusion here.
For Practice: A medical researcher wishes to test the effects of two different diets and two different exercise programs on the glucose level in a person's blood. The glucose level is measured in milligrams per decilitre (mg/dl). Three subjects are randomly assigned to each group and the glucose levels are summarized in the table below. Analyze the researcher's data at the α = 0.05 level of significance.
Solution:
Research Question: Do one's diet and exercise program affect the glucose level in a person's blood?
Population Declarations:
Let Factor A be the Diet of the individuals. Let Level 1 of Factor A be Diet A and Level 2 of Factor A be Diet B.
Let Factor B be the Exercise Program followed by the individuals. Let Level 1 of Factor B be Exercise Program 1 and Level 2 of Factor B be Exercise Program 2.
Let Population 1 be the individuals who have Diet A and are on Exercise Program 1.
Let Population 2 be the individuals who have Diet A and are on Exercise Program 2.
Let Population 3 be the individuals who have Diet B and are on Exercise Program 1.
Let Population 4 be the individuals who have Diet B and are on Exercise Program 2.
Hypotheses to be tested:
$H_{0,\text{Diet}}$: there is no difference between the true mean glucose blood levels associated with the two different diets.
$H_{a,\text{Diet}}$: there is a difference between the true mean glucose blood levels associated with the two different diets.
$H_{0,\text{Exercise}}$: there is no difference between the true mean glucose blood levels associated with the two different exercise programs.
$H_{a,\text{Exercise}}$: there is a difference between the true mean glucose blood levels associated with the two different exercise programs.
$H_{0,\text{Diet} \times \text{Exercise}}$: there is no interaction effect between one's diet and exercise program on the true mean glucose levels in the blood.
$H_{a,\text{Diet} \times \text{Exercise}}$: there is an interaction effect between one's diet and exercise program on the true mean glucose levels in the blood.
Glucose levels (mg/dl) by diet and exercise program:
              Diet A   Diet B
Exercise 1      62       58
                64       62
                66       53
Exercise 2      65       83
                68       85
                72       91
Hypothesis Test to be used: Two-Way ANOVA
Assumptions required to implement the hypothesis test:
1) Based on the Tests of Normality table below,
Tests of Normality
                    Kolmogorov-Smirnov(a)         Shapiro-Wilk
                    Statistic   df   Sig.         Statistic   df   Sig.
Exercise1_dietA       .175       3    .             1.000      3   1.000
Exercise1_dietB       .196       3    .              .996      3    .878
Exercise2_dietA       .204       3    .              .993      3    .843
Exercise2_dietB       .292       3    .              .923      3    .463
a. Lilliefors Significance Correction
regarding the Exercise1/diet A group (Population 1), since the Shapiro-Wilk p-value > 0.999 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 1 are normally distributed; regarding the Exercise1/diet B group (Population 3), since the p-value = 0.878 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 3 are normally distributed; regarding the Exercise2/diet A group (Population 2), since the p-value = 0.843 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 2 are normally distributed; and regarding the Exercise2/diet B group (Population 4), since the p-value = 0.463 > 0.05 = α, we do not have evidence to reject that the glucose levels of individuals in Population 4 are normally distributed. Therefore, at the α = 0.05 level of significance, there is no evidence to reject the assumption that the glucose levels in each of Populations 1 through 4 are normally distributed (with p-values of approximately 1.0, 0.843, 0.878, and 0.463 respectively). Hence we can continue with our analysis.
2) The populations must have the same variances.
Based on the following Test of Homogeneity of Variances table, the test statistic is L(3, 8) = 0.633 with an associated p-value = 0.614.
Levene's Test of Homogeneity of Variances(a)
Dependent Variable: glucose
  F       df1   df2   Sig.
 .633      3     8    .614
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + exercise + diet + exercise * diet
Because the p-value = 0.614 > 0.05 = α, there is no evidence to reject the assumption that the four populations all have the same variance. Therefore, we can still carry out the two-way ANOVA F-test.
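The Levene statistic reported above can be reproduced by hand from the twelve glucose readings. The sketch below assumes the mean-centred form of Levene's test (a one-way ANOVA F statistic computed on the absolute deviations of each observation from its own group mean); the function name is hypothetical, and the p-value is not computed because that would require the F distribution.

```python
def levene_statistic(groups):
    """Mean-centred Levene statistic: one-way ANOVA F on |x - group mean|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(x - group_mean) for x in g])
    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, dev_means))
    within = sum((v - m) ** 2 for d, m in zip(deviations, dev_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Glucose readings for the four diet/exercise groups in the example.
glucose_groups = [
    [62, 64, 66],   # Diet A, Exercise 1
    [58, 62, 53],   # Diet B, Exercise 1
    [65, 68, 72],   # Diet A, Exercise 2
    [83, 85, 91],   # Diet B, Exercise 2
]
levene_w = levene_statistic(glucose_groups)
print(round(levene_w, 3))
```

With k = 4 groups and N = 12 observations, the statistic has 3 and 8 degrees of freedom, matching the df1 and df2 columns of the table.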
3) The samples must be independent of one another.
4) The groups must be equal in sample size.
The Significance Level: α = 0.05
The Test Statistic and corresponding p-value:
From the Tests of Between-Subjects Effects table below, the value of the test statistic $F_{\text{Diet}}(1, 8) = 7.562$ and its associated p-value = 0.025, the value of the test statistic $F_{\text{Exercise}}(1, 8) = 60.500$ and its associated p-value < 0.001, and the value of the test statistic $F_{\text{Diet} \times \text{Exercise}}(1, 8) = 32.895$ and its associated p-value < 0.001.
Tests of Between-Subjects Effects
Dependent Variable: Glucose
Source            Type III Sum of Squares   df   Mean Square      F         Sig.
Corrected Model        1362.917(a)           3     454.306       33.652     .000
Intercept             57270.083              1   57270.083     4242.228     .000
Diet                    102.083              1     102.083        7.562     .025
Exercise                816.750              1     816.750       60.500     .000
Diet * Exercise         444.083              1     444.083       32.895     .000
Error                   108.000              8      13.500
Total                 58741.000             12
Corrected Total        1470.917             11
a. R Squared = .927 (Adjusted R Squared = .899)
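The Diet, Exercise, Diet * Exercise, and Error rows of this table can be recomputed directly from the raw glucose data. The following is a minimal sketch of the balanced two-way ANOVA arithmetic in plain Python; the function and variable names are hypothetical, and the p-values are omitted because they require the F distribution (the module obtains them from SPSS).

```python
def two_way_anova(data):
    """Balanced two-way ANOVA. `data` maps (level of A, level of B) -> list of observations."""
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    levels_a = sorted({key[0] for key in data})
    levels_b = sorted({key[1] for key in data})
    a, b = len(levels_a), len(levels_b)
    n = len(next(iter(data.values())))          # subjects per group (equal sizes assumed)

    grand = mean(x for cell in data.values() for x in cell)
    cell_mean = {key: mean(cell) for key, cell in data.items()}
    mean_a = {la: mean(x for key, cell in data.items() if key[0] == la for x in cell)
              for la in levels_a}
    mean_b = {lb: mean(x for key, cell in data.items() if key[1] == lb for x in cell)
              for lb in levels_b}

    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b              # interaction sum of squares
    ss_w = sum((x - cell_mean[key]) ** 2 for key, cell in data.items() for x in cell)

    ms_w = ss_w / (a * b * (n - 1))
    return {
        "SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab, "SSW": ss_w, "MSW": ms_w,
        "FA": ss_a / (a - 1) / ms_w,
        "FB": ss_b / (b - 1) / ms_w,
        "FAB": ss_ab / ((a - 1) * (b - 1)) / ms_w,
    }


# Factor A = diet, Factor B = exercise program (data from the example).
glucose = {
    ("Diet A", "Exercise 1"): [62, 64, 66],
    ("Diet B", "Exercise 1"): [58, 62, 53],
    ("Diet A", "Exercise 2"): [65, 68, 72],
    ("Diet B", "Exercise 2"): [83, 85, 91],
}
anova = two_way_anova(glucose)
for name in ("SSA", "SSB", "SSAB", "SSW", "FA", "FB", "FAB"):
    print(name, round(anova[name], 3))
```

Here the interaction sum of squares is obtained by subtracting the two main-effect sums of squares from the between-cells sum of squares, which is valid because the design is balanced.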
The Decision Rule:
With respect to the interaction between the Diet and Exercise factors, since the p-value < 0.001 < 0.05 = α, we reject $H_{0,\text{Diet} \times \text{Exercise}}$.
With respect to the Diet factor, since the p-value = 0.025 < 0.05 = α, we reject $H_{0,\text{Diet}}$.
With respect to the Exercise factor, since the p-value < 0.001 < 0.05 = α, we reject $H_{0,\text{Exercise}}$.
The Conclusion:
At the 5% level of significance, we have evidence to conclude that there is a difference between the true mean glucose blood levels associated with the two different diets (with a p-value=0.025), that there is a difference between the true mean glucose blood levels associated with the two different exercise programs (p-value<0.001), and that there is an interaction effect between one's diet and exercise program on the true mean glucose levels in the blood (p-value<0.001).
Normally we would now have to do a post-hoc analysis to determine how the means differ and what exactly is the interaction effect. We can compare the sample means and the corresponding confidence intervals to determine how the means differ but how do we determine the interaction effect? To help determine the interaction effect, we can carry out a profile analysis.
■
The Profile of a Factor
To see graphically how two factors interact (if at all), one can plot the means and corresponding confidence intervals for each level of one factor, with the means connected by a line, against the levels of a second factor. Based on the relationship between the resulting lines, one can determine how one factor affects the other factor. The interpretation of this plot is referred to as a profile analysis.
A factor does not affect the response variable if the profile of the factor is horizontal for all combinations of levels of the other factors; that is, there is no change in the response variable when you change the levels of the factor (true for all combinations of levels of the other factors). Otherwise the factor is said to affect the response variable. If the graph looks as follows, then Factor A has no effect on the response variable:
[Figure: profile plot titled "Factor A has no effect", with profiles labelled A and B]
Two factors are additive if the change in the response variable (for the different levels of one factor) is statistically the same for each of the levels of the other factor. If the graph looks as follows, then Factor A and Factor B are additive:
[Figure: profile plot titled "Additive Factors", with profiles labelled A and B]
Two factors interact if the change in the response variable (for different levels of one factor) is not statistically the same for some of the levels of the other factor. In order to conclude that two factors interact, the profiles of the first factor for different levels of the second factor cannot be statistically parallel. For example, if the profile looks as follows, we would conclude that Factor A and Factor B interact:
[Figure: profile plot titled "Interacting Factors", with profiles labelled A and B]
NOTE: If an interaction is present, there is no need to test lower-order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact. The testing continues for the lower-order interactions and main effects of the factors that have not yet been determined to affect the response.
For Practice: For the Diet and Exercise effect on glucose levels example, plot the profiles with the two diet levels on the horizontal axis and the individual exercise levels as individual lines.
Solution: The profile plot with the two diet levels on the horizontal axis and the individual exercise levels represented by individual lines is below.
To interpret the above plot, we need to know whether the marginal means for each exercise program with respect to a particular diet differ, but we cannot determine this from the plot alone. A more detailed profile plot combines the information in the above plot with the associated confidence intervals for each true marginal mean. The plot below includes the profiles as depicted in the above plot with the estimated 95% confidence intervals for the true mean associated with each diet/exercise program combination.
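The points and error bars of such a plot can be approximated from the raw data. The sketch below is an assumption about how the intervals might be built, not a description of how the original plot was produced: it uses a separate t-based 95% confidence interval for each group (mean ± t × s/√n) with the tabulated critical value $t_{0.975, 2} \approx 4.303$ for n = 3 observations per group. The function name is hypothetical.

```python
import math


def cell_confidence_interval(values, t_critical):
    """Return (mean, lower, upper) of a t-based confidence interval for one group."""
    n = len(values)
    mean = sum(values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    half_width = t_critical * std_dev / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


T_CRITICAL = 4.303  # tabulated t value for 95% confidence with n - 1 = 2 degrees of freedom

cells = {
    ("Diet A", "Exercise 1"): [62, 64, 66],
    ("Diet A", "Exercise 2"): [65, 68, 72],
    ("Diet B", "Exercise 1"): [58, 62, 53],
    ("Diet B", "Exercise 2"): [83, 85, 91],
}

intervals = {}
for label, values in cells.items():
    intervals[label] = cell_confidence_interval(values, T_CRITICAL)
    mean, lower, upper = intervals[label]
    print(label, round(mean, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the two Diet A intervals overlap while the two Diet B intervals do not, and the intervals have visibly different widths because each uses its own group's standard deviation.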
In the above plot, we can see that the 95% confidence interval for the true mean glucose level of individuals on Diet 1 who participated in Exercise Program 1 significantly overlaps the 95% confidence interval for the true mean glucose level of individuals on Diet 1 who participated in Exercise Program 2. Hence statistically we cannot distinguish between these two means. BUT, the 95% confidence interval for the true mean glucose level of individuals on Diet 2 who participated in Exercise Program 1 does not overlap the 95% confidence interval for the true mean glucose level of individuals on Diet 2 who participated in Exercise Program 2. Hence statistically we would conclude that these two means differ. We therefore have evidence to believe that the slope of the line representing the impact of Exercise Program 1 as a function of Diet on the mean glucose level differs from the slope of the line representing the impact of Exercise Program 2 as a function of Diet on the mean glucose level. The upshot is we have evidence supporting that the diet and exercise program have an interaction effect on the true mean glucose level.
■ For Discussion: With reference to the above Profile Plot for the Diet-Exercise-Glucose example, describe the interaction effect between the diets and exercise programs on the true mean glucose level. Based on the statistical evidence displayed in the above plot, what diet/exercise combination would you recommend if the goal was to minimize blood-glucose
levels?
Answer: Because, with respect to Diet 1, the 95% confidence intervals for the true mean glucose levels of individuals on Exercise Programs 1 and 2 overlap, statistically we cannot distinguish between these two means, and hence we have no evidence that the choice between Exercise Programs 1 and 2 affects the true mean glucose level of individuals on Diet 1. With respect to Diet 2, the 95% confidence intervals for the true mean glucose levels of individuals on Exercise Programs 1 and 2 do not overlap; statistically these two means differ. Hence Exercise Programs 1 and 2 do impact the true mean glucose levels of individuals on Diet 2. In fact, it appears that the true mean glucose level of individuals on Diet 2 and Exercise Program 1 is lower than the true mean glucose level of individuals on Diet 2 and Exercise Program 2.
The above discussion eliminates the Diet 2/Exercise Program 2 combination as a candidate for minimizing the mean glucose level. Statistically (because the confidence intervals in the above plot overlap), we cannot distinguish the true mean glucose levels of individuals on Diet 1/Exercise Program 1, Diet 1/Exercise Program 2, and Diet 2/Exercise Program 1 from one another. As a result, we might be tempted to recommend any of the three combinations. BUT, note the widths of the confidence intervals. There was the least variation in the Diet 1/Exercise Program 1 blood glucose levels. The question becomes: is the variation in the Diet 1/Exercise Program 1 data statistically smaller than the variation in the data associated with Diet 1/Exercise Program 2 and Diet 2/Exercise Program 1? Since it appears that the confidence interval associated with Diet 2/Exercise Program 1 is more than twice as wide as the confidence interval associated with Diet 1/Exercise Program 1, the blood glucose levels of individuals in the Diet 2/Exercise Program 1 group statistically vary more than the blood glucose levels of individuals in the Diet 1/Exercise Program 1 group.
Hence we would recommend Diet 1/Exercise Program 1 over Diet 2/Exercise Program 1 if we want to minimize blood-glucose levels. Similarly one can argue that Diet 1/Exercise Program 1 should be chosen over Diet 1/Exercise Program 2 if we want to minimize blood-glucose level.
■
Understanding and Interpreting Two-Way ANOVA
If one or more of the overall effects is significant, several “post hoc” procedures can be conducted. Which procedure to conduct depends on which effects were significant. A researcher may want to look at the interaction effects in place of, or possibly in addition to, the simple main effects. The simplest interaction-effects analyses involve four means and are referred to as tetrad contrasts. Tetrad contrasts examine whether the differences in population means between two levels of one factor are the same across two levels of a second factor. If the interaction effect is not significant, the focus of the analysis turns to the main effects. Depending on which effects were significant, a researcher may want to compare the differences in the population means among the levels of the first factor for each level of the second factor, the differences in the population means among the levels of the second factor for each level of the first factor, or both.
To illustrate the techniques described above, we will use the contrived data sets taken from Using SPSS for Windows and Macintosh.
Now Your Turn: Suppose a researcher is interested in two note-taking methods and the effect of these methods on the overall GPAs of first-year college students. After randomly selecting 30 men and 30 women to participate, 10 women and 10 men are randomly assigned to Method 1; 10 men and 10 women are randomly assigned to Method 2; and the remaining 10 men and 10 women are assigned to Method 3 (the control method). During the first term, individuals in the Method 1 and Method 2 groups were given daily instruction on the corresponding note-taking method while the Method 3 group received no note-taking instruction. The GPAs for all the participants were recorded at the ends of the second and third terms. Analyze the data collected at the 5% level of significance. Use Lesson 25 Data File 1. At this point, do not test the underlying assumptions for the Two-Way ANOVA.
For Discussion: We were told not to test the assumptions required for the above conclusion to be valid, but:
(1) How would we test the normality assumption?
(2) How would we test the equality of variances assumption?
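To make the idea of a tetrad contrast from this section concrete, the sketch below evaluates one on the four cell means of the earlier Diet/Exercise glucose example (the GPA data file is not reproduced here). It assumes the standard contrast sum of squares for a balanced design, n·L²/Σc², with coefficients (1, -1, -1, 1); the function name is hypothetical.

```python
def tetrad_contrast(mean_11, mean_12, mean_21, mean_22):
    """Difference between the two level-of-B differences: (m11 - m12) - (m21 - m22)."""
    return (mean_11 - mean_12) - (mean_21 - mean_22)


def mean(values):
    return sum(values) / len(values)


n = 3  # subjects per group in the glucose example
m_a1 = mean([62, 64, 66])   # Diet A, Exercise 1
m_a2 = mean([65, 68, 72])   # Diet A, Exercise 2
m_b1 = mean([58, 62, 53])   # Diet B, Exercise 1
m_b2 = mean([83, 85, 91])   # Diet B, Exercise 2

contrast = tetrad_contrast(m_a1, m_a2, m_b1, m_b2)
# Contrast sum of squares for coefficients (1, -1, -1, 1): n * L^2 / 4.
contrast_ss = n * contrast ** 2 / 4
print(round(contrast, 3), round(contrast_ss, 3))
```

In a 2 x 2 design there is only one tetrad contrast, so its sum of squares coincides with the interaction sum of squares reported in the ANOVA table for that example.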
Understanding how the Main Effect Influences the Mean
Suppose we are interested in exploring further the average effect of one's gender and note-taking method. We could use Syntax Programming to assist us. We illustrate Syntax Programming in the answer to the following discussion question.
For Discussion: Based on the analysis of the Gender/Note-taking Method/GPA example, one's gender and the note-taking method individually impact the mean GPA, but the interaction between gender and the method does not impact the mean GPA. Which simple main effects (the effects of the levels within a factor) should be analyzed?
Answer:
The simple main effects that need to be analyzed can be determined from the following SPSS syntax. To get SPSS to carry out an analysis of the simple main effects, follow these instructions:
1) Analyze->General Linear Model->Univariate
2) Paste (you are now in the SPSS syntax editor)
3) Delete everything you see EXCEPT THE FIRST THREE LINES
4) On the fourth line begin typing:
/lmatrix 'men vs women within Method 1' gender*method 1 0 0 -1 0 0 gender 1 -1
/lmatrix 'men vs women within Method 2' gender*method 0 1 0 0 -1 0 gender 1 -1
/lmatrix 'men vs women within Control' gender*method 0 0 1 0 0 -1 gender 1 -1
/lmatrix 'method within men' gender*method 1 -1 0 0 0 0 method 1 -1 0;
    gender*method 0 1 -1 0 0 0 method 0 1 -1;
    gender*method 1 0 -1 0 0 0 method 1 0 -1
/lmatrix 'method within women' gender*method 0 0 0 1 -1 0 method 1 -1 0;
    gender*method 0 0 0 0 1 -1 method 0 1 -1;
    gender*method 0 0 0 1 0 -1 method 1 0 -1.
5) Highlight all the syntax, click RUN and then click SELECTION
Note: The syntax in the blue ellipse (the three rows of the 'method within men' subcommand) also generates the p-values to individually test the following sets of hypotheses:
(1) The first line of syntax generates the p-value to test $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}}$ against $H_A: \mu_{\text{Male,Method1}} \neq \mu_{\text{Male,Method2}}$;
(2) The second line of syntax generates the p-value to test $H_0: \mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$ against $H_A: \mu_{\text{Male,Method2}} \neq \mu_{\text{Male,Control}}$; and
(3) The third line of syntax generates the p-value to test $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Control}}$ against $H_A: \mu_{\text{Male,Method1}} \neq \mu_{\text{Male,Control}}$.
Further note: The syntax in the red ellipse (the three rows of the 'method within women' subcommand) also generates the p-values to individually test the following sets of hypotheses:
(1) The first line of syntax generates the p-value to test $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}}$ against $H_A: \mu_{\text{Female,Method1}} \neq \mu_{\text{Female,Method2}}$;
(2) The second line of syntax generates the p-value to test $H_0: \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$ against $H_A: \mu_{\text{Female,Method2}} \neq \mu_{\text{Female,Control}}$; and
(3) The third line of syntax generates the p-value to test $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Control}}$ against $H_A: \mu_{\text{Female,Method1}} \neq \mu_{\text{Female,Control}}$.
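The five /lmatrix subcommands generate the p-values for the following tests, in order:
(1) 'men vs women within Method 1': $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Female,Method1}}$ against $H_A: \mu_{\text{Male,Method1}} \neq \mu_{\text{Female,Method1}}$
(2) 'men vs women within Method 2': $H_0: \mu_{\text{Male,Method2}} = \mu_{\text{Female,Method2}}$ against $H_A: \mu_{\text{Male,Method2}} \neq \mu_{\text{Female,Method2}}$
(3) 'men vs women within Control': $H_0: \mu_{\text{Male,Control}} = \mu_{\text{Female,Control}}$ against $H_A: \mu_{\text{Male,Control}} \neq \mu_{\text{Female,Control}}$
(4) 'method within men': $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$ against $H_A$: not $H_0$
(5) 'method within women': $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$ against $H_A$: not $H_0$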
The results of the above syntax are:
Custom Hypothesis Tests #1
Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA
Contrast                                      L1
Contrast Estimate                            .165
Hypothesized Value                           0
Difference (Estimate - Hypothesized)         .165
Std. Error                                   .081
Sig.                                         .047
95% Confidence Interval for Difference
  Lower Bound                                .002
  Upper Bound                                .328
a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Method 1
Test Results
Dependent Variable: Change in GPA
Source      Sum of Squares   df   Mean Square     F       Sig.
Contrast        .136          1      .136        4.130    .047
Error          1.780         54      .033
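The entries of these two tables are linked by simple arithmetic, which the following sketch checks. It assumes the usual balanced-design formulas for a contrast of cell means: standard error = sqrt(MSE · Σc² / n), F = (estimate / standard error)², and a confidence interval of estimate ± t × standard error. The function name is hypothetical, n = 10 is the number of students per gender/method group, and 2.005 is the tabulated value of $t_{0.975, 54}$.

```python
import math


def contrast_summary(estimate, coefficients, n_per_cell, ss_error, df_error, t_critical):
    """Standard error, F statistic and confidence limits for a contrast of cell means."""
    ms_error = ss_error / df_error
    std_error = math.sqrt(ms_error * sum(c * c for c in coefficients) / n_per_cell)
    f_statistic = (estimate / std_error) ** 2
    lower = estimate - t_critical * std_error
    upper = estimate + t_critical * std_error
    return std_error, f_statistic, lower, upper


# 'men vs women within Method 1': +1 on the Male/Method 1 cell, -1 on the Female/Method 1 cell.
std_error, f_statistic, lower, upper = contrast_summary(
    estimate=0.165,
    coefficients=[1, 0, 0, -1, 0, 0],
    n_per_cell=10,
    ss_error=1.780,
    df_error=54,
    t_critical=2.005,   # tabulated t value for 95% confidence with 54 degrees of freedom
)
print(round(std_error, 3), round(f_statistic, 2), round(lower, 3), round(upper, 3))
```

Because the contrast has a single degree of freedom, its F statistic is the square of the corresponding t statistic, which is why both tables report the same p-value. Small differences from the SPSS output come from using the rounded estimate.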
Custom Hypothesis Tests #2
Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA
Contrast                                      L1
Contrast Estimate                            .335
Hypothesized Value                           0
Difference (Estimate - Hypothesized)         .335
Std. Error                                   .081
Sig.                                         .000
95% Confidence Interval for Difference
  Lower Bound                                .172
  Upper Bound                                .498
a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Method 2
Test Results
Dependent Variable: Change in GPA
Source      Sum of Squares   df   Mean Square     F       Sig.
Contrast        .561          1      .561       17.023    .000
Error          1.780         54      .033
Note: Both tables report the same p-value, which is used to test $H_0: \mu_{\text{Male,Method2}} = \mu_{\text{Female,Method2}}$ against $H_A: \mu_{\text{Male,Method2}} \neq \mu_{\text{Female,Method2}}$.
Custom Hypothesis Tests #3
Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA
Contrast                                      L1
Contrast Estimate                            .060
Hypothesized Value                           0
Difference (Estimate - Hypothesized)         .060
Std. Error                                   .081
Sig.                                         .463
95% Confidence Interval for Difference
  Lower Bound                               -.103
  Upper Bound                                .223
a. Based on the user-specified contrast coefficients (L') matrix: men vs women within Control
Test Results
Dependent Variable: Change in GPA
Source      Sum of Squares   df   Mean Square     F       Sig.
Contrast        .018          1      .018         .546    .463
Error          1.780         54      .033
Note: Both tables report the same p-value, which is used to test $H_0: \mu_{\text{Male,Control}} = \mu_{\text{Female,Control}}$ against $H_A: \mu_{\text{Male,Control}} \neq \mu_{\text{Female,Control}}$.
Custom Hypothesis Tests #4
Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA
Contrast                                      L1       L2       L3
Contrast Estimate                           -.305     .475     .170
Hypothesized Value                           0        0        0
Difference (Estimate - Hypothesized)        -.305     .475     .170
Std. Error                                   .081     .081     .081
Sig.                                         .000     .000     .041
95% Confidence Interval for Difference
  Lower Bound                               -.468     .312     .007
  Upper Bound                               -.142     .638     .333
a. Based on the user-specified contrast coefficients (L') matrix: method within men
Test Results
Dependent Variable: Change in GPA
Source      Sum of Squares   df   Mean Square     F       Sig.
Contrast       1.158          2      .579       17.573    .000
Error          1.780         54      .033
Note: Row L1 gives the p-value to test $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}}$, row L2 to test $H_0: \mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$, and row L3 to test $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Control}}$, each against the two-sided alternative. The Test Results table gives the p-value to test $H_0: \mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$ against $H_A$: not $H_0$.
Custom Hypothesis Tests #5
Contrast Results (K Matrix)(a)
Dependent Variable: Change in GPA
Contrast                                      L1       L2       L3
Contrast Estimate                           -.135     .200     .065
Hypothesized Value                           0        0        0
Difference (Estimate - Hypothesized)        -.135     .200     .065
Std. Error                                   .081     .081     .081
Sig.                                         .102     .017     .427
95% Confidence Interval for Difference
  Lower Bound                               -.298     .037    -.098
  Upper Bound                                .028     .363     .228
a. Based on the user-specified contrast coefficients (L') matrix: method within women
Test Results
Dependent Variable: Change in GPA
Source      Sum of Squares   df   Mean Square     F       Sig.
Contrast        .208          2      .104        3.158    .050
Error          1.780         54      .033
Note: Row L1 gives the p-value to test $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}}$, row L2 to test $H_0: \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$, and row L3 to test $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Control}}$, each against the two-sided alternative. The Test Results table gives the p-value to test $H_0: \mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$ against $H_A$: not $H_0$.
■
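The two-degree-of-freedom F statistics in Custom Hypothesis Tests #4 and #5 can be recovered from the pairwise contrast estimates. The sketch below assumes a balanced design with n = 10 students per gender/method group, so that the contrast sum of squares equals n times the sum of squared deviations of the three method means from their average; only differences between means matter, so the Method 1 mean is set to zero. The function name is hypothetical.

```python
def joint_f_from_pairwise(diff_m1_m2, diff_m2_control, n_per_cell, ss_error, df_error):
    """Joint F test of three equal means, rebuilt from two pairwise differences."""
    # Means relative to Method 1 (taken as 0): m2 = -d12, control = m2 - d2c.
    means = [0.0, -diff_m1_m2, -diff_m1_m2 - diff_m2_control]
    average = sum(means) / len(means)
    contrast_ss = n_per_cell * sum((m - average) ** 2 for m in means)
    ms_contrast = contrast_ss / (len(means) - 1)
    ms_error = ss_error / df_error
    return contrast_ss, ms_contrast / ms_error


# Rows L1 and L2 of Custom Hypothesis Tests #4 (men) and #5 (women).
ss_men, f_men = joint_f_from_pairwise(-0.305, 0.475, 10, 1.780, 54)
ss_women, f_women = joint_f_from_pairwise(-0.135, 0.200, 10, 1.780, 54)
print(round(ss_men, 3), round(f_men, 2))
print(round(ss_women, 3), round(f_women, 2))
```

The third pairwise row (L3) is not needed as an input because it is the sum of the first two: -.305 + .475 = .170 for men and -.135 + .200 = .065 for women.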
For Discussion: What conclusions would we make using the test statistic values and p-values presented in the tables associated with the above Custom Hypothesis Tests 1 through 5?
Answer: From the Test Results table in Custom Hypothesis Test #1, since the p-value = 0.047 < 0.05 = α, we reject the hypothesis that $\mu_{\text{Male,Method1}} = \mu_{\text{Female,Method1}}$. Therefore at the 5% level of significance, we have evidence to conclude that the true mean GPA of male students who used note-taking Method 1 differs from the true mean GPA of female students who used note-taking Method 1 (p-value = 0.047). In fact, from the positive difference (estimate minus hypothesized value) reported in Custom Hypothesis Test #1, we would conclude that the true mean GPA of male students who used note-taking Method 1 is greater than the true mean GPA of female students who used note-taking Method 1.
From the Test Results table in Custom Hypothesis Test #2, since the p-value < 0.001 < 0.05 = α, we reject the hypothesis that $\mu_{\text{Male,Method2}} = \mu_{\text{Female,Method2}}$. Therefore at the 5% level of significance, we have evidence to conclude that the true mean GPA of male students who used note-taking Method 2 differs from the true mean GPA of female students who used note-taking Method 2 (p-value < 0.001). In fact, from the positive difference reported in Custom Hypothesis Test #2, we would conclude that the true mean GPA of male students who used note-taking Method 2 is greater than the true mean GPA of female students who used note-taking Method 2.
From the Test Results table in Custom Hypothesis Test #3, since the p-value = 0.463 > 0.05 = α, we do not reject the hypothesis that $\mu_{\text{Male,Control}} = \mu_{\text{Female,Control}}$. Therefore at the 5% level of significance, we do not have evidence to reject the hypothesis that the true mean GPA of male students who used the control note-taking method equals the true mean GPA of female students who used the control note-taking method (p-value = 0.463).
From the Test Results table in Custom Hypothesis Test #4, since the p-value < 0.001 < 0.05 = α, we reject the hypothesis that $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$. Therefore at the 5% level of significance, we have evidence to conclude that at least two of the true mean GPAs of male students associated with the three note-taking methods differ (p-value < 0.001). To determine how the true mean male GPA differs with respect to the note-taking methods, we refer to the rows in the Custom Hypothesis Test #4 table labelled L1, L2, and L3.
- The row labelled L1 displays the information used to test the null hypothesis $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Method2}}$. Because the p-value < 0.001 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 1 differs from the true mean GPA of male students who use note-taking Method 2 (p-value < 0.001). In fact, from the negative difference reported in row L1, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 1 is less than the true mean GPA of male students who use note-taking Method 2.
- The row labelled L2 displays the information used to test the null hypothesis $\mu_{\text{Male,Method2}} = \mu_{\text{Male,Control}}$. Because the p-value < 0.001 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 2 differs from the true mean GPA of male students who use the control note-taking method (p-value < 0.001). In fact, from the positive difference reported in row L2, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 2 is greater than the true mean GPA of male students who use the control note-taking method.
- The row labelled L3 displays the information used to test the null hypothesis $\mu_{\text{Male,Method1}} = \mu_{\text{Male,Control}}$. Because the p-value = 0.041 < 0.05 = α, we have evidence to conclude that the true mean GPA of male students who use note-taking Method 1 differs from the true mean GPA of male students who use the control note-taking method (p-value = 0.041). In fact, from the positive difference reported in row L3, there is evidence to conclude that the true mean GPA of male students who use note-taking Method 1 is greater than the true mean GPA of male students who use the control note-taking method.
From the Test Results table in Custom Hypothesis Test #5, since the p-value = 0.050 is not less than α = 0.05, we do not reject the hypothesis that $\mu_{\text{Female,Method1}} = \mu_{\text{Female,Method2}} = \mu_{\text{Female,Control}}$. Therefore at the 5% level of significance, we do not have evidence to refute that the true mean GPAs of female students associated with the three note-taking methods are all equal (p-value = 0.050). Consequently there is no need to consider the information presented in rows L1, L2, and L3 of this table.
■ For Discussion: Based on the above analysis, which note-taking method (if any) would you recommend to male students and which note-taking method (if any) would you recommend to female students? Be sure to justify your response. Answer: Because the true mean GPA of male students who used note-taking Method 2 is statistically greater than the true mean GPAs of male students who used either the control note-taking method or note-taking Method 1, on average the GPAs of male students are greater when note-taking Method 2 is used. Consequently I would recommend note-taking Method 2 to the male students. From the discussion relevant to the Custom Hypothesis Test #5 table, it might appear that it does not matter statistically which note-taking method we recommend for female students. But, because the least (statistically significant) difference between the true mean GPAs of male and female students occurs for note-taking Method 2, I also would recommend note-taking Method 2 to the female students.
■
For Discussion: Suppose we wanted to, pairwise, investigate how the means associated with two levels of one factor differ with respect to the levels of another factor. How would we compare how the different note-taking methods compare for a specific gender? Answer:
To get SPSS to generate the output required for this analysis, use the following steps:
1) Analyze->General Linear Model->Univariate
2) Paste (you are now in the SPSS syntax editor)
3) Delete everything you see EXCEPT THE FIRST THREE LINES
4) On the fourth line begin typing:
/lmatrix 'Method 1 vs. Method 2 within men' gender*method 1 -1 0 0 0 0 method 1 -1 0
/lmatrix 'Method 1 vs. Control within men' gender*method 1 0 -1 0 0 0 method 1 0 -1
/lmatrix 'Method 2 vs. Control within men' gender*method 0 1 -1 0 0 0 method 0 1 -1
/lmatrix 'Method 1 vs. Method 2 within women' gender*method 0 0 0 1 -1 0 method 1 -1 0
/lmatrix 'Method 1 vs. Control within women' gender*method 0 0 0 1 0 -1 method 1 0 -1
/lmatrix 'Method 2 vs. Control within women' gender*method 0 0 0 0 1 -1 method 0 1 -1.
5) Highlight all the syntax, click RUN and then click SELECTION
Each /lmatrix line generates the p-value for one test:
'Method 1 vs. Method 2 within men' tests H0: μ_{Male,Method1} = μ_{Male,Method2} against HA: μ_{Male,Method1} ≠ μ_{Male,Method2}.
'Method 1 vs. Control within men' tests H0: μ_{Male,Method1} = μ_{Male,Control} against HA: μ_{Male,Method1} ≠ μ_{Male,Control}.
'Method 2 vs. Control within men' tests H0: μ_{Male,Method2} = μ_{Male,Control} against HA: μ_{Male,Method2} ≠ μ_{Male,Control}.
'Method 1 vs. Method 2 within women' tests H0: μ_{Female,Method1} = μ_{Female,Method2} against HA: μ_{Female,Method1} ≠ μ_{Female,Method2}.
'Method 1 vs. Control within women' tests H0: μ_{Female,Method1} = μ_{Female,Control} against HA: μ_{Female,Method1} ≠ μ_{Female,Control}.
'Method 2 vs. Control within women' tests H0: μ_{Female,Method2} = μ_{Female,Control} against HA: μ_{Female,Method2} ≠ μ_{Female,Control}.
The results of the above syntax are:
Contrast Results (K Matrix)a
Contrast
Dependent
Variable Change in GPA
L1 Contrast Estimate -.305
Hypothesized Value 0
Difference (Estimate - Hypothesized) -.305
Std. Error .081
Sig. .000
95% Confidence Interval for
Difference
Lower Bound -.468
Upper Bound -.142
a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Method 2 within men
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .465 1 .465 14.111 .000
Error 1.780 54 .033
Contrast Results (K Matrix)a
Contrast
Dependent
Variable Change in GPA
L1 Contrast Estimate .170
Hypothesized Value 0
Difference (Estimate - Hypothesized) .170
Std. Error .081
Sig. .041
95% Confidence Interval for
Difference
Lower Bound .007
Upper Bound .333
a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Control within men
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .144 1 .144 4.384 .041
Error 1.780 54 .033
Contrast Results (K Matrix)a
Contrast
Dependent
Variable Change in GPA
L1 Contrast Estimate .475
Hypothesized Value 0
Difference (Estimate - Hypothesized) .475
Std. Error .081
Sig. .000
95% Confidence Interval for
Difference
Lower Bound .312
Upper Bound .638
a. Based on the user-specified contrast coefficients (L') matrix: Method 2 vs. Control
within men
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast 1.128 1 1.128 34.224 .000
Error 1.780 54 .033
Contrast Results (K Matrix)a
Contrast
Dependent
Variable Change in GPA
L1 Contrast Estimate -.135
Hypothesized Value 0
Difference (Estimate - Hypothesized) -.135
Std. Error .081
Sig. .102
95% Confidence Interval for
Difference
Lower Bound -.298
Upper Bound .028
a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Method 2
within women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .091 1 .091 2.764 .102
Error 1.780 54 .033
Contrast Results (K Matrix)a
Contrast
Dependent
Variable Change in GPA
L1 Contrast Estimate .065
Hypothesized Value 0
Difference (Estimate - Hypothesized) .065
Std. Error .081
Sig. .427
95% Confidence Interval for
Difference
Lower Bound -.098
Upper Bound .228
a. Based on the user-specified contrast coefficients (L') matrix: Method 1 vs. Control
within women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .021 1 .021 .641 .427
Error 1.780 54 .033
Contrast Results (K Matrix)a
Contrast Dependent
Variable Change in GPA
L1 Contrast Estimate .200
Hypothesized Value 0
Difference (Estimate - Hypothesized) .200
Std. Error .081
Sig. .017
95% Confidence Interval for
Difference
Lower Bound .037
Upper Bound .363
a. Based on the user-specified contrast coefficients (L') matrix: Method 2 vs. Control
within women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .200 1 .200 6.067 .017
Error 1.780 54 .033
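The standard errors and F statistics in the six tables above can be checked by hand. The following is a minimal Python sketch, not part of the SPSS workflow, that recomputes them from the Error line of the output. It assumes equal cell sizes of 10 (an Error df of 54 with six cells implies 60 observations); the function names are hypothetical.

```python
import math

# Values read from the SPSS output above.
SS_ERROR = 1.780   # Error sum of squares
DF_ERROR = 54      # Error degrees of freedom
N_PER_CELL = 10    # ASSUMPTION: 60 observations split equally over 6 cells

MSE = SS_ERROR / DF_ERROR  # mean square error, about .033


def contrast_se(coefficients, n=N_PER_CELL, mse=MSE):
    """Standard error of a contrast of cell means when every cell has n units."""
    return math.sqrt(mse * sum(c * c for c in coefficients) / n)


def contrast_f(estimate, coefficients):
    """F statistic (1 numerator df) for H0: the contrast equals 0."""
    return (estimate / contrast_se(coefficients)) ** 2


PAIRWISE = (1, -1)  # a difference between two cell means

# (contrast estimate, F reported by SPSS) for each table above
reported = {
    "Method 1 vs. Method 2 within men": (-0.305, 14.111),
    "Method 1 vs. Control within men": (0.170, 4.384),
    "Method 2 vs. Control within men": (0.475, 34.224),
    "Method 1 vs. Method 2 within women": (-0.135, 2.764),
    "Method 1 vs. Control within women": (0.065, 0.641),
    "Method 2 vs. Control within women": (0.200, 6.067),
}

print(f"Std. Error = {contrast_se(PAIRWISE):.3f}")
for label, (estimate, spss_f) in reported.items():
    print(f"{label}: F = {contrast_f(estimate, PAIRWISE):.3f} (SPSS: {spss_f})")
```

Because every comparison here is a difference between two cell means of the same size, all six share the same standard error, which is why Std. Error is .081 in every table.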
For Discussion: Compare the output generated by this syntax to the corresponding output generated by the previous syntax. Do you notice any similarities amongst the output?
Answer:
The information presented in the output generated by the syntax in this section is also included as some of the output generated by the syntax in the previous section.
■
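It may help to see where the coefficients on an /lmatrix line come from. The sketch below is illustrative Python, not SPSS. It writes each cell mean as intercept + gender effect + method effect + gender*method effect, and then collects the weight each parameter receives in a comparison. The level ordering (male before female; Method 1, Method 2, Control) is an assumption read off the syntax labels, and the function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: SPSS orders the factor levels the way the syntax labels suggest.
GENDERS = ["male", "female"]
METHODS = ["Method1", "Method2", "Control"]


def cell_mean(gender, method):
    """Model parameters that add up to one cell mean (each with weight 1)."""
    return Counter({
        ("intercept",): 1,
        ("gender", gender): 1,
        ("method", method): 1,
        ("gender*method", gender, method): 1,
    })


def combine(terms):
    """Weighted sum of cell means; terms holds (weight, gender, method) triples."""
    total = Counter()
    for weight, gender, method in terms:
        for parameter, coefficient in cell_mean(gender, method).items():
            total[parameter] += weight * coefficient
    return total


def lmatrix_row(total):
    """Lay the parameter weights out in the order used on an /lmatrix line."""
    return {
        "intercept": total[("intercept",)],
        "gender": [total[("gender", g)] for g in GENDERS],
        "method": [total[("method", m)] for m in METHODS],
        "gender*method": [total[("gender*method", g, m)]
                          for g in GENDERS for m in METHODS],
    }


# Method 1 vs. Method 2 within men: one cell mean minus another.
simple = lmatrix_row(combine([(1, "male", "Method1"), (-1, "male", "Method2")]))

# (Method 1 vs. Method 2) for men minus (Method 1 vs. Method 2) for women.
tetrad = lmatrix_row(combine([
    (1, "male", "Method1"), (-1, "male", "Method2"),
    (-1, "female", "Method1"), (1, "female", "Method2"),
]))

print(simple)
print(tetrad)
```

In the within-gender comparison the intercept and gender weights cancel, leaving the `method 1 -1 0` and `gender*method 1 -1 0 0 0 0` terms typed above. In a difference of differences the method weights cancel as well, so only a `gender*method` term remains.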
Understanding how an interaction affects the mean
We will use the following example to demonstrate how the interaction between two factors affects the mean. For Practice: Now, let us redo the previous example but with Lesson 25 Data File 2.
Solution:
Research Question: Do one's gender and note-taking method affect the GPA of a student?
Population Declarations:
Let Factor A be the gender of the individuals. Let Level 1 of Factor A be male. Let Level 2 of Factor A be female.
Let Factor B be the note-taking method used by the individuals. Let Level 1 of Factor B be Method 1. Let Level 2 of Factor B be Method 2. Let Level 3 of Factor B be the control group.
Let Population 1 be the male students who use note-taking Method 1. Let μ_{Male,Method1} be the true mean GPA of Population 1.
Let Population 2 be the male students who use note-taking Method 2. Let μ_{Male,Method2} be the true mean GPA of Population 2.
Let Population 3 be the male students who use the control note-taking method. Let μ_{Male,Control} be the true mean GPA of Population 3.
Let Population 4 be the female students who use note-taking Method 1. Let μ_{Female,Method1} be the true mean GPA of Population 4.
Let Population 5 be the female students who use note-taking Method 2. Let μ_{Female,Method2} be the true mean GPA of Population 5.
Let Population 6 be the female students who use the control note-taking method. Let μ_{Female,Control} be the true mean GPA of Population 6.
Hypotheses to be tested:
H0,Gender: there is no difference between the true mean GPAs based on gender.
Ha,Gender: there is a difference between the true mean GPAs based on gender.
H0,Method: there is no difference between the true mean GPAs based on the three note-taking methods.
Ha,Method: there is a difference between at least two of the true mean GPAs based on the three note-taking methods.
H0,GenderxMethod: there is no interaction effect between one's gender and note-taking method on the true mean GPAs.
Ha,GenderxMethod: there is an interaction effect between one's gender and note-taking method on the true mean GPAs.
Hypothesis Test to be used: Two-Way ANOVA for Two Fixed Effects Factors
Assumptions required to implement the hypothesis test: We are told to not test the assumptions. (Recall the assumptions are:
1) The populations from which each of the random samples was taken must be normal.
2) The populations must have the same variances.
3) The samples must be independent of one another.
4) The groups must be equal in sample size.)
The Significance Level: α = 0.05
The Test Statistic and corresponding p-value:
From the Tests of Between-Subjects Effects Table below, the value of the test statistic FGender(1,54)=0.612 and its associated p-value=0.438, the value of the test statistic FMethod(2,54)=17.809 and its associated p-value<0.001, and the value of the test statistic FGenderxMethod(2,54)=10.543 and its associated p-value<0.001.
Tests of Between-Subjects Effects
Dependent Variable:Change in GPA
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 1.889a 5 .378 11.463 .000
Intercept 4.931 1 4.931 149.582 .000
gender .020 1 .020 .612 .438
method 1.174 2 .587 17.809 .000
gender * method .695 2 .348 10.543 .000
Error 1.780 54 .033
Total 8.600 60
Corrected Total 3.669 59
a. R Squared = .515 (Adjusted R Squared = .470)
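As a quick arithmetic check on the table above, each Mean Square is the Sum of Squares divided by its df, and each F is that Mean Square divided by the Error Mean Square. The Python sketch below, with hypothetical variable names, redoes this arithmetic from the rounded values printed by SPSS, so small discrepancies in the third decimal place are expected.

```python
# (Type III Sum of Squares, df) as printed in the table above
effects = {
    "gender": (0.020, 1),
    "method": (1.174, 2),
    "gender * method": (0.695, 2),
}
ss_error, df_error = 1.780, 54
ss_corrected_total, df_corrected_total = 3.669, 59

ms_error = ss_error / df_error

f_ratios = {}
for name, (ss, df) in effects.items():
    mean_square = ss / df
    f_ratios[name] = mean_square / ms_error
    print(f"{name}: MS = {mean_square:.3f}, F({df},{df_error}) = {f_ratios[name]:.3f}")

# R Squared is the share of the corrected total variation not left in Error.
r_squared = 1 - ss_error / ss_corrected_total
adjusted_r_squared = 1 - (1 - r_squared) * df_corrected_total / df_error
print(f"R Squared = {r_squared:.3f}, Adjusted R Squared = {adjusted_r_squared:.3f}")
```

The recomputed F for gender comes out slightly below the printed .612 because the printed Sum of Squares (.020) is itself rounded.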
The Decision Rule:
Regarding Gender, since the p-value = 0.438 > 0.05 = α, we do not reject H0,Gender.
Regarding the Note-taking Method, since the p-value < 0.001 < 0.05 = α, we reject H0,Method.
Regarding the interaction between Gender and Note-taking Method, since the p-value < 0.001 < 0.05 = α, we reject H0,GenderxMethod.
The Conclusion:
At the 5% level of significance, we have evidence to conclude that there is a difference between at least two of the true mean GPAs based on the three note-taking methods (p-value<0.001) and that there is an interaction effect between one’s gender and his/her note-taking method on the true mean GPAs (p-value<0.001). At the same level of significance, there is no evidence to conclude that there is a difference between the true mean GPAs based on one’s gender (p-value=0.438).
■
For Discussion: We were told to not test the assumptions required for the above conclusion to be valid, but:
(1) How would we test the normality assumption?
Solution: We have to test that each of the six populations is normally distributed; that is, we have to implement six normality tests. The required Tests of Normality Table generated by SPSS is included below.
Tests of Normality
      Kolmogorov-Smirnov(a)        Shapiro-Wilk
      Statistic  df  Sig.          Statistic  df  Sig.
M1    .161       10  .200*         .964       10  .830
M2    .145       10  .200*         .967       10  .857
MC    .132       10  .200*         .979       10  .958
F1    .224       10  .170          .869       10  .096
F2    .289       10  .018          .906       10  .254
FC    .214       10  .200*         .936       10  .512
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
Referring to Population 1 (Line M1 in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.830 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use note-taking Method 1 are normally distributed.
Referring to Population 2 (Line M2 in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.857 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use note-taking Method 2 are normally distributed.
Referring to Population 3 (Line MC in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.958 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of men who use the control note-taking method are normally distributed.
Referring to Population 4 (Line F1 in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.096 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use note-taking Method 1 are normally distributed.
Referring to Population 5 (Line F2 in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.254 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use note-taking Method 2 are normally distributed.
Referring to Population 6 (Line FC in the Tests of Normality Table above): since the Shapiro-Wilk p-value = 0.512 > 0.05 = α, we do not have evidence to reject the assumption that the GPAs of women who use the control note-taking method are normally distributed.
Since we do not have evidence to reject the assumption that any one of the six populations is normally distributed, we cannot conclude that the normality assumption has been violated.
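The six decisions above all apply the same rule: reject normality only when the p-value is less than α. A small Python sketch of that rule follows, using the Sig. values printed in the Tests of Normality Table; the dictionary and function names are hypothetical. The conclusion above rests on the Shapiro-Wilk column, and the Kolmogorov-Smirnov column is included only for comparison.

```python
ALPHA = 0.05

# Sig. values copied from the Tests of Normality Table above.
shapiro_wilk_p = {"M1": 0.830, "M2": 0.857, "MC": 0.958,
                  "F1": 0.096, "F2": 0.254, "FC": 0.512}
# The .200* entries are lower bounds of the true significance.
kolmogorov_smirnov_p = {"M1": 0.200, "M2": 0.200, "MC": 0.200,
                        "F1": 0.170, "F2": 0.018, "FC": 0.200}


def groups_rejecting_normality(p_values, alpha=ALPHA):
    """Return the groups whose p-value falls below the significance level."""
    return [group for group, p in p_values.items() if p < alpha]


sw_rejected = groups_rejecting_normality(shapiro_wilk_p)
ks_rejected = groups_rejecting_normality(kolmogorov_smirnov_p)

print("Shapiro-Wilk rejects normality for:", sw_rejected or "no group")
print("Kolmogorov-Smirnov rejects normality for:", ks_rejected or "no group")
```

The two columns do not agree for group F2, so it is worth stating which test a conclusion is based on.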
■
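(2) How would we test the equality of variances assumption?
Solution:
We have to test the hypotheses:
H0: all six populations have the same variance.
HA: at least two of the six populations have different variances.
The required Test of Homogeneity of Variances Table is included below.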
Test of Homogeneity of Variances
Change in GPA
Levene Statistic df1 df2 Sig.
.575 5 54 .719
Referring to the Test of Homogeneity of Variances Table above, the value of the test
statistic computed using Levene’s test for the equality of variances is L (5; 54) =0.575
and its associated p-value= 0.719. Since p-value= 0.719 > 0.05 = α, there is no
evidence to reject the assumption that all six populations have the same variance.
■
For Discussion: In this past example, we saw that the interaction between gender and the note-taking method significantly impacts the mean GPA. Just how do they impact the mean GPA? How would one conduct an interaction comparison after finding a significant interaction?
Answer: We use syntax programming. The relevant instructions are given below.
1) Analyze->General Linear Model->Univariate
2) Paste (you are now in the SPSS syntax editor)
3) Delete everything you see EXCEPT THE FIRST THREE LINES
4) On the fourth line begin typing:
/lmatrix '(Method 1 vs. Method 2) for men vs (Method 1 vs. Method 2) for women' gender*method 1 -1 0 -1 1 0
/lmatrix '(Method 1 vs. Control) for men vs (Method 1 vs. Control) for women' gender*method 1 0 -1 -1 0 1
/lmatrix '(Method 2 vs. Control) for men vs (Method 2 vs. Control) for women' gender*method 0 1 -1 0 -1 1.
5) Highlight all the syntax, click RUN and then click SELECTION
The results of the above code are:
Custom Hypothesis Tests #1
Contrast Results (K Matrix)a
Contrast
Dependent Variable
Change in GPA
L1 Contrast Estimate -.170
Hypothesized Value 0
Difference (Estimate - Hypothesized) -.170
Std. Error .115
Sig. .145
95% Confidence Interval for
Difference
Lower Bound -.400
Upper Bound .060
a. Based on the user-specified contrast coefficients (L') matrix: (Method 1 vs. Method 2) for men vs
(Method 1 vs. Method 2) for women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .072 1 .072 2.192 .145
Error 1.780 54 .033
Note: In order, the three /lmatrix lines in the syntax above generate the p-values to test:
1) H0: μ_{Male,Method1} - μ_{Male,Method2} = μ_{Female,Method1} - μ_{Female,Method2} against HA: μ_{Male,Method1} - μ_{Male,Method2} ≠ μ_{Female,Method1} - μ_{Female,Method2}
2) H0: μ_{Male,Method1} - μ_{Male,Control} = μ_{Female,Method1} - μ_{Female,Control} against HA: μ_{Male,Method1} - μ_{Male,Control} ≠ μ_{Female,Method1} - μ_{Female,Control}
3) H0: μ_{Male,Method2} - μ_{Male,Control} = μ_{Female,Method2} - μ_{Female,Control} against HA: μ_{Male,Method2} - μ_{Male,Control} ≠ μ_{Female,Method2} - μ_{Female,Control}
The Sig. values in Custom Hypothesis Tests #1, #2, and #3 are the p-values for tests 1), 2), and 3) respectively.
Custom Hypothesis Tests #2
Contrast Results (K Matrix)a
Contrast
Dependent Variable
Change in GPA
L1 Contrast Estimate .105
Hypothesized Value 0
Difference (Estimate - Hypothesized) .105
Std. Error .115
Sig. .365
95% Confidence Interval for
Difference
Lower Bound -.125
Upper Bound .335
a. Based on the user-specified contrast coefficients (L') matrix: (Method 1 vs. Control) for men vs
(Method 1 vs. Control) for women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .028 1 .028 .836 .365
Error 1.780 54 .033
Custom Hypothesis Tests #3
Contrast Results (K Matrix)a
Contrast
Dependent Variable
Change in GPA
L1 Contrast Estimate .275
Hypothesized Value 0
Difference (Estimate - Hypothesized) .275
Std. Error .115
Sig. .020
95% Confidence Interval for
Difference
Lower Bound .045
Upper Bound .505
a. Based on the user-specified contrast coefficients (L') matrix: (Method 2 vs. Control) for men vs
(Method 2 vs. Control) for women
Test Results
Dependent Variable:Change in GPA
Source Sum of Squares df Mean Square F Sig.
Contrast .189 1 .189 5.736 .020
Error 1.780 54 .033
■ NOTE: The F-tests implemented in these tetrad comparisons do not control Type I Error.
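Two pieces of arithmetic sit behind the three tables above and the NOTE. First, each interaction (tetrad) estimate equals the within-men estimate minus the within-women estimate reported in the earlier within-gender tables, and its standard error uses four cell means instead of two. Second, one common way to control the Type I error over several comparisons is a Bonferroni adjustment. The Python sketch below illustrates both. It assumes 10 observations per cell, and the Bonferroni step is an illustration only; the source does not say which adjustment, if any, should be applied here.

```python
import math

MSE = 1.780 / 54   # Error mean square from the output above
N_PER_CELL = 10    # ASSUMPTION: equal cell sizes (60 observations, 6 cells)
ALPHA = 0.05

# (within-men estimate, within-women estimate) from the earlier tables
simple_effects = {
    "Method 1 vs. Method 2": (-0.305, -0.135),
    "Method 1 vs. Control": (0.170, 0.065),
    "Method 2 vs. Control": (0.475, 0.200),
}

# A tetrad contrast is the men's difference minus the women's difference.
tetrads = {label: round(men - women, 3)
           for label, (men, women) in simple_effects.items()}

# Four cell means enter each tetrad, each with coefficient +1 or -1.
tetrad_se = math.sqrt(MSE * 4 / N_PER_CELL)

# p-values printed in Custom Hypothesis Tests #1 to #3
p_values = {"Method 1 vs. Method 2": 0.145,
            "Method 1 vs. Control": 0.365,
            "Method 2 vs. Control": 0.020}

# Bonferroni: compare each p-value with alpha divided by the number of tests.
bonferroni_alpha = ALPHA / len(p_values)
significant = [label for label, p in p_values.items() if p < bonferroni_alpha]

print(tetrads)
print(f"Std. Error = {tetrad_se:.3f}")
print(f"Per-test level = {bonferroni_alpha:.4f}; significant: {significant}")
```

Under this illustrative adjustment, the comparison with p-value .020 no longer falls below the per-test level, which shows how much the choice of error control can matter.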
For Discussion: What conclusions would you make based on the p-values presented in the above three custom hypothesis test tables? NOTE: Although learning syntax programming is part of the course, it is not the focus of the course. If a student chooses to “skip” the syntax programming, a student can still attain an excellent (high 80’s) to exceptional (low 90’s) grade in the course. If a student chooses to “skip” the syntax programming portion of an analysis when it is warranted, s/he must state in his/her solution that s/he recognizes that syntax programming is required to complete the analysis but s/he has chosen not to do the analysis. It is expected that a student will know when syntax programming should be used.
The One-Way and Two-Way ANOVA discussions were based on an experimental design referred to as the Completely Randomized Design. We formed a set of treatment combinations
(based on the k factors) and randomly assigned a fixed number n of experimental units to each treatment combination. There are other experimental designs. We are now going to study two other experimental designs: first the Randomized Block Design and then the Repeated Measures Design.
The Randomized Block Design
Suppose a researcher is interested in how several treatments affect a continuous response variable. The treatments may be levels of a single factor or they may be combinations of levels of several factors. Suppose we have a fixed number of experimental units available to which we need to apply the different treatments (say t treatments). A Randomized Block Design divides the group of experimental units into a fixed number of homogeneous groups (say b groups), each of the same size t. These groups are called the blocks. The treatments are then randomly assigned to the experimental units in each block so that each experimental unit in a block receives a different treatment.
The Model for a Randomized Block Experiment is
y_{ij} = μ + τ_i + β_j + ε_{ij},
where i = 1, . . . , t; j = 1, . . . , b; y_{ij} represents the observation receiving the i'th treatment in the j'th block; μ is the grand mean; τ_i is the effect of the i'th treatment; β_j is the effect of the j'th block; and ε_{ij} is the random error.
A randomized block experiment is assumed to be a two-factor experiment. The factors are blocks and treatments. There is one observation per cell. It is assumed that there is no interaction between blocks and treatments. The degrees of freedom for the interaction are used to estimate error. If the treatments are defined in terms of two or more factors, the treatment Sum of Squares can be partitioned into a component due to the Main Effects and a component due to the Interaction of the Treatment Factors.
In SPSS, we must implement a custom model in which we do not include the interaction of the treatment and blocking factors. In the resulting ANOVA table we are only concerned with the information presented regarding the treatment factor.
The assumptions underlying this test are:
1. each observation is an independent random sample of size one from one of the tb populations;
2. each of these tb populations is normally distributed;
3. all tb populations have the same variance (but possibly different means); and
4. the block and treatment effects are additive (i.e. there is no interaction effect between the blocking and treatment factors). Violation of this assumption can provide misleading results if the largest mean is more than 50% greater than the smallest mean. As long as the largest mean is less than 50% greater than the smallest mean, the results of this test are still valid.
In essence, we assign one factor as a blocking factor when we wish to eliminate the effect of this factor in our analysis.
For Practice: BFAHS, P.345, q 3.2.4. The nursing supervisor in a local health department wished to study the influence of the time of day on the length of home visits by the nursing staff. It was thought that individual differences among nurses might be large so the supervisor wished to eliminate the effect of the nurse in her analysis. The nursing supervisor collected the following data.
Length of Home Visit by Time of Day
Nurse  Early Morning  Late Morning  Early Afternoon  Late Afternoon
A      27             28            30               23
B      31             30            27               20
C      35             38            34               30
D      20             18            20               14
Analyze the supervisor’s data at the α = 0.05 level of significance. For each nurse, assume that the four different lengths of home visits are independent. Assume all required assumptions are valid. The data can be found in SPSS format at http://bcs.wiley.com/he-bcs/Books?action=index&itemId=0470105828&bcsId=5023.
Solution:
Research Question: Does the length of a home visit depend on the time of day after eliminating the effect of the specific nurse?
Population Declarations:
Let Factor B, the blocking factor, be the specific nurse studied, which we will denote “nurse”. Let Block 1 be Nurse A. Let Block 2 be Nurse B. Let Block 3 be Nurse C. Let Block 4 be Nurse D.
Let Factor A be the time of day a home was visited, which we will denote “t.o.d.” Let Level 1 of Factor A be the Early Morning visit. Let Level 2 of Factor A be the Late Morning visit. Let Level 3 of Factor A be the Early Afternoon visit. Let Level 4 of Factor A be the Late Afternoon visit.
Let Populations 1, 2, 3, and 4 be the lengths of Nurse A’s early morning, late morning, early afternoon, and late afternoon visits respectively.
Let Populations 5, 6, 7, and 8 be the lengths of Nurse B’s early morning, late morning, early afternoon, and late afternoon visits respectively.
Let Populations 9, 10, 11, and 12 be the lengths of Nurse C’s early morning, late morning, early afternoon, and late afternoon visits respectively.
Let Populations 13, 14, 15, and 16 be the lengths of Nurse D’s early morning, late morning, early afternoon, and late afternoon visits respectively.
Hypothesis to be tested:
H0,t.o.d.: there are no differences among the true mean lengths of home visits based on the time of day.
HA,t.o.d.: there is a difference between at least two of the true mean lengths of home visits based on the time of day.
Hypothesis Test to be used: Two-way ANOVA for a randomized block design
Assumptions required to implement the hypothesis test:
1) Each observation is an independent random sample of size one from one of the 16 populations.
2) Each of these 16 populations is normally distributed.
3) All 16 populations have the same variance.
4) The blocking factor (i.e. the nurse) and the treatment factor (i.e. the time of day) are additive.
We are told that we can assume all the above hold.
The Significance Level: α = 0.05
The value of the test statistic and the p-value: From the Tests of Between-Subjects Effects Table below, the required value of the test statistic Ft.o.d.(3,9)=11.667 and its associated p-value=0.002.
Tests of Between-Subjects Effects
Dependent Variable:length
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 655.875a 6 109.313 30.684 .000
Intercept 11289.063 1 11289.063 3168.860 .000
t.o.d. 124.688 3 41.563 11.667 .002
nurse 531.188 3 177.063 49.702 .000
Error 32.063 9 3.563
Total 11977.000 16
Corrected Total 687.938 15
a. R Squared = .953 (Adjusted R Squared = .922)
The Statistical Decision: Since p-value = 0.002 < 0.05 = α, we reject H0,t.o.d.
Conclusion: At the 0.05 level of significance, we have evidence to conclude that, based on the time of day of a visit, there is a difference between at least two of the true mean lengths of time of a home-visit.
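The sums of squares in the Tests of Between-Subjects Effects Table can be reproduced directly from the sixteen observations. The following Python sketch is a hand calculation for the randomized block layout, not a replacement for the SPSS procedure, and its variable names are hypothetical. It computes the treatment (time of day), block (nurse), and error sums of squares and the F statistic for the time of day; it does not compute the p-value.

```python
# Length of home visit; rows are nurses (blocks), columns are times of day
# in the order Early Morning, Late Morning, Early Afternoon, Late Afternoon.
visits = {
    "A": [27, 28, 30, 23],
    "B": [31, 30, 27, 20],
    "C": [35, 38, 34, 30],
    "D": [20, 18, 20, 14],
}

rows = list(visits.values())
b = len(rows)        # number of blocks (nurses)
t = len(rows[0])     # number of treatments (times of day)

grand_mean = sum(sum(row) for row in rows) / (b * t)
block_means = [sum(row) / t for row in rows]
treatment_means = [sum(column) / b for column in zip(*rows)]

ss_total = sum((y - grand_mean) ** 2 for row in rows for y in row)
ss_blocks = t * sum((m - grand_mean) ** 2 for m in block_means)
ss_treatments = b * sum((m - grand_mean) ** 2 for m in treatment_means)
# With one observation per cell, what is left over estimates the error.
ss_error = ss_total - ss_blocks - ss_treatments

df_treatments = t - 1
df_blocks = b - 1
df_error = (t - 1) * (b - 1)

ms_error = ss_error / df_error
f_treatments = (ss_treatments / df_treatments) / ms_error
f_blocks = (ss_blocks / df_blocks) / ms_error

print(f"t.o.d.: SS = {ss_treatments:.3f}, F({df_treatments},{df_error}) = {f_treatments:.3f}")
print(f"nurse:  SS = {ss_blocks:.3f}, F({df_blocks},{df_error}) = {f_blocks:.3f}")
print(f"Error:  SS = {ss_error:.3f}, df = {df_error}")
```

The error degrees of freedom, (t - 1)(b - 1) = 9, are the degrees of freedom that would have belonged to the block-by-treatment interaction had it been included in the model.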
■ For Discussion: Based on our conclusion in the above example, we would need to implement
a post-hoc analysis to provide a meaningful answer for the example’s research question. (1) What post-hoc analysis should we perform?
■
(2) Based on this post-hoc analysis, what would be an appropriate answer for the research question?
■
For Discussion: In the above example, we were told to assume that the relevant populations were normally distributed. In practice, how would we test the requisite normality assumptions?
Solution: For Populations 1 through 16, we would need to individually test the hypothesis
H0,i: Population i is normally distributed.
against
HA,i: Population i is not normally distributed.
where i iterates through the set of integers {1, 2, …, 16}. Because we have only sampled one data point from each of the sixteen populations (i.e. one data point per nurse/time of day combination), we are unable to test whether each of the populations from which we have sampled is normally distributed. The best we can test is whether the associated marginal distributions are normally distributed.
Note that all the marginal distributions being normally distributed does not imply that the underlying joint distributions are also normally distributed. However we can say that, if we have evidence that a marginal distribution is not normally distributed, then we have evidence that the underlying joint distributions are also not normally distributed.
The resulting eight sets of hypotheses that we would need to individually test are:
1) H0,NurseA: The distribution of the possible lengths of Nurse A’s home visits (independent of the time of day) is normal.
HA,NurseA: The distribution of the possible lengths of Nurse A’s home visits (independent of the time of day) is not normal.
2) H0,NurseB: The distribution of the possible lengths of Nurse B’s home visits (independent of the time of day) is normal.
HA,NurseB: The distribution of the possible lengths of Nurse B’s home visits (independent of the time of day) is not normal.
3) H0,NurseC: The distribution of the possible lengths of Nurse C’s home visits (independent of the time of day) is normal.
HA,NurseC: The distribution of the possible lengths of Nurse C’s home visits (independent of the time of day) is not normal.
4) H0,NurseD: The distribution of the possible lengths of Nurse D’s home visits (independent of the time of day) is normal.
HA,NurseD: The distribution of the possible lengths of Nurse D’s home visits (independent of the time of day) is not normal.
5) H0,EarlyMorning: The distribution of the possible lengths of early morning home visits (independent of the nurse) is normal.
HA,EarlyMorning: The distribution of the possible lengths of early morning home visits (independent of the nurse) is not normal.
6) H0,LateMorning: The distribution of the possible lengths of late morning home visits (independent of the nurse) is normal.
HA,LateMorning: The distribution of the possible lengths of late morning home visits (independent of the nurse) is not normal.
7) H0,EarlyAfternoon: The distribution of the possible lengths of early afternoon home visits (independent of the nurse) is normal.
HA,EarlyAfternoon: The distribution of the possible lengths of early afternoon home visits (independent of the nurse) is not normal.
8) H0,LateAfternoon: The distribution of the possible lengths of late afternoon home visits (independent of the nurse) is normal.
HA,LateAfternoon: The distribution of the possible lengths of late afternoon home visits (independent of the nurse) is not normal.
The Tests of Normality table below contains the values of the Shapiro-Wilk Test Statistic and its associated p-values required to test the hypotheses sets labelled 1 through 4.
Tests of Normality
                 Kolmogorov-Smirnov(a)        Shapiro-Wilk
        nurse    Statistic  df  Sig.          Statistic  df  Sig.
length  A        .250       4   .             .953       4   .734
        B        .250       4   .             .878       4   .329
        C        .220       4   .             .980       4   .900
        D        .260       4   .             .827       4   .161
a. Lilliefors Significance Correction
At the α = 0.05 level of significance, there is no evidence to reject the assumption that the distributions of the possible lengths of home visits for Nurse A (p-value =0.734), Nurse B (p-value =0.329), Nurse C (p-value=0.900), and Nurse D (p-value=0.161) are respectively normal.
The Tests of Normality table below contains the values of the Shapiro-Wilk Test Statistic and its
associated p-values required to test the hypotheses sets labelled 5 through 8.
Tests of Normality
Kolmogorov-Smirnova Shapiro-Wilk
Statistic df Sig. Statistic df Sig.
early_morning .173 4 . .981 4 .909
late_morning .226 4 . .976 4 .880
early_afternoon .200 4 . .978 4 .889
late_afternoon .175 4 . .995 4 .983
a. Lilliefors Significance Correction
At the α = 0.05 level of significance, there is no evidence to reject the assumption that the distributions of the possible lengths of home visits for the early morning shift (p-value =0.909), the late morning shift (p-value =0.880), the early afternoon shift (p-value=0.889), and the late afternoon shift (p-value=0.983) are respectively normal.
Because, at the α = 0.05 level of significance, we do not have evidence to conclude that any of the marginal distributions are not normally distributed, we do not have evidence to reject the assumptions that each of the original sixteen populations is normally distributed.
■
For Discussion: In the above example, we were told to assume that the data was sampled from populations with the same variance. In practice, how would we test this “equality of variances” assumption?
Solution: Due to the insufficient sample size for each nurse/time of day combination, we cannot directly test whether the data was sampled from populations with the same variance. The best we can do is test whether the marginal variances for the nurses are equal and, separately, whether the marginal variances for the times of day are equal.
Suppose we wish to determine whether the variances of the lengths of visits associated with
each of the Nurses (independent of the time of day) are equal. The table for the corresponding
test is given below.
Test of Homogeneity of Variances
Length
Levene Statistic df1 df2 Sig.
.446 3 12 .725
Because p-value = 0.725 > 0.05 = α, we do not have evidence to reject the assumption that the variances of the lengths of visits associated with each Nurse (independent of the time of day) are equal.
Suppose we wish to determine whether the variances of the lengths of visits associated with
each of the times of day (independent of the nurse) are equal. The table for the corresponding
test is given below.
Test of Homogeneity of Variances
Length
Levene Statistic df1 df2 Sig.
.067 3 12 .976
Because p-value = 0.976 > 0.05 = α, we do not have evidence to reject the assumption that the variances of the lengths of visits associated with each of the times of day (independent of the nurse) are equal.
Because we do not have evidence to reject the assumption that the marginal variances of the lengths of visits associated with each of the times of day (independent of the nurse) are equal, nor the assumption that the marginal variances of the lengths of visits associated with each Nurse (independent of the time of day) are equal, we do not have evidence to reject the assumption that the data were drawn from populations with the same variance.
■ For Discussion: In the above example, we were told to assume that the blocking factor (i.e. the nurse) and the treatment factor (i.e. the time of day) are additive. In practice, how would we test this “additivity” assumption? Solution:
We can use Tukey’s Test for Non-Additivity. Note that this test can only be used if there is a single observation from each population. If there is more than one observation from each population, you can instead implement the test for an interaction effect that we talked about in the previous section.
For Tukey’s Test for Non-Additivity, the null hypothesis is H0: the two factors are additive. The alternative hypothesis is HA: the two factors are non-additive.
To test at the 5% level of significance whether the Nurse and Time of Day factors in the previous example are additive, we need to generate the ANOVA with Friedman’s Test and Tukey’s Test for Nonadditivity table using SPSS. The hypotheses we are testing are:
H0: The Nurse and Time of Day factors are additive.
HA: The Nurse and Time of Day factors are non-additive.
From the Nonadditivity row of the ANOVA table below, since p-value = 0.777 > 0.05 = α, we do not have evidence to reject H0. Hence, at the 5% level of significance, we do not have evidence to reject the assumption that the Nurse and Time of Day factors are additive (p-value = 0.777).
ANOVA with Friedman's Test and Tukey's Test for Nonadditivity
Source                                   Sum of Squares   df   Mean Square   Friedman's Chi-Square   Sig
Between People                                  531.188    3       177.063
Within People: Between Items                    124.688    3        41.563                   9.545   .023
Within People: Residual: Nonadditivity            .340a    1          .340                    .086   .777
Within People: Residual: Balance                 31.722    8         3.965
Within People: Residual: Total                   32.063    9         3.563
Within People: Total                            156.750   12        13.063
Total                                           687.938   15        45.863
Grand Mean = 26.5625
a. Tukey's estimate of power to which observations must be raised to achieve additivity = .759.
■ Suppose after completing the above analysis, we found that the additivity assumption was violated. This does not necessarily imply that the results of our Two-Way ANOVA for a randomized block design are invalid. To determine whether a violation of the additivity assumption will lead to misleading results, we compute the means for each block (nurse) and for each treatment (time of day). These means are presented in the following two tables: the first presents the mean length of visit for each nurse and the second presents the mean length of visit for each time of day.
Descriptive Statistics
Dependent Variable:length
nurse Mean Std. Deviation N
A 27.00000000 2.943920289 4
B 27.00000000 4.966554809 4
C 34.25000000 3.304037934 4
D 18.00000000 2.828427125 4
Total 26.56250000 6.772185762 16
Descriptive Statistics
Dependent Variable:length
tod Mean Std. Deviation N
EA 27.75000000 5.909032634 4
EM 28.25000000 6.396613687 4
LA 21.75000000 6.652067348 4
LM 28.50000000 8.225975120 4
Total 26.56250000 6.772185762 16
Note, in the above two tables, that the smallest mean is 18.00 and the largest mean is 34.25. Since the largest mean exceeds one and a half times the smallest mean (34.25 > 18 + 18/2 = 27), we would conclude that a violation of the additivity assumption would lead to misleading results. Consequently, we would have to analyze this data using some other technique.
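The comparison above can be sketched in a few lines of Python. This is only an illustration: the rule "largest mean greater than 1.5 times the smallest mean" is read off the arithmetic in the text, and the variable names are hypothetical.

```python
# Marginal means copied from the two Descriptive Statistics tables above.
nurse_means = {"A": 27.00, "B": 27.00, "C": 34.25, "D": 18.00}
tod_means = {"EA": 27.75, "EM": 28.25, "LA": 21.75, "LM": 28.50}

all_means = list(nurse_means.values()) + list(tod_means.values())
smallest, largest = min(all_means), max(all_means)

# Assumed rule of thumb: non-additivity matters when the largest mean
# exceeds the smallest mean plus half of the smallest mean.
threshold = smallest + smallest / 2
misleading = largest > threshold

print(f"smallest={smallest}, largest={largest}, threshold={threshold}")
print("violation would be misleading:", misleading)
```

With the means from this example the threshold is 27, so the largest mean (34.25) exceeds it.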
■ Now Your Turn: A study is made to determine the impact of the humidity level on the growth of different molds. Three species of mold commonly found in homes were grown under four assigned humidity levels. The percentages of the surface area covered by mold one week after inoculation have been recorded in the table below.
Mold       Humidity                          Average
           30%     50%     70%     90%
A          39.0    33.1    33.8    33.0      34.7
B          36.9    27.2    29.7    28.5      30.6
C          27.4    29.2    26.7    30.9      28.6
Average    34.4    29.8    30.1    30.8      31.3
(1) At the 5% level of significance, determine the average effect the humidity level has on the percentage of a container’s surface area covered by mold, controlling for the type of mold. You may assume all required assumptions hold.
(2) We were told to assume that the relevant populations were normally distributed. In practice, how would we test the requisite normality assumptions?
(3) We were told to assume that the data was sampled from populations with the same variance. In practice, how would we test this “equality of variances” assumption?
(4) We were told to assume that the blocking factor (i.e. the type of mold) and the treatment factor (i.e. the humidity level) are additive. In practice, how would we test this “additivity” assumption?
(5) Suppose after completing the above analysis in (4), we found that the additivity assumption was violated. This does not necessarily imply that the results of our Two-Way ANOVA for a randomized block design are invalid. How would we determine whether the violation is likely to lead to misleading results?
What happens if the observations are somehow correlated? All is not lost. If, on independent objects, we take different measurements on each object as the objects are exposed to different conditions, then we can analyze the data using the technique in the next section.
One-Way ANOVA F-Test for Dependent Samples (Repeated Measures)
In a Repeated Measures Design, we have experimental units that may be grouped according to one or several factors (i.e. the grouping factors). Then, on each experimental unit, we have several measurements (the repeated measures), not just a single measurement. The repeated measures may be taken at combinations of levels for one or several factors (the repeated measures factors).
The assumptions for the one-way repeated measures design we will use are:
1. the subjects are a simple random sample;
2. each observation is an independent simple random sample of size one from tn (t is the number of treatments and n is the number of subjects) normal populations;
3. the tn populations have the same variance;
4. the t treatments are fixed;
5. there is no interaction between the treatments and the subjects; and
6. there is a correlation among the repeated measures and these correlations are all equal.
Note: (3) and (6) combined are referred to as sphericity. A set of populations satisfying (3) and (6) is said to be spherical.
To test the sphericity assumption, we will use Mauchly’s Test for Sphericity. For this test, the null hypothesis is H0: the tn populations are spherical, and the alternative hypothesis is HA: the tn populations are not spherical. For the results of Mauchly’s Test for Sphericity to be valid, each of the tn populations must be normally distributed. Consequently, one should check that each of the populations is normally distributed prior to implementing Mauchly’s Test for Sphericity.
For Practice: An experimenter was interested in how the level of a certain enzyme changed in 15 randomly selected cardiac patients after open heart surgery. For each patient, the enzyme was measured immediately after surgery (Day 0); one day after surgery (Day 1); two days after surgery (Day 2); and one week after surgery (Day 7). The data are summarized in the table below.
Subject Day 0 Day 1 Day 2 Day 7 Subject Day 0 Day 1 Day 2 Day 7
1 108 63 45 42 9 106 65 49 49
2 112 75 56 52 10 110 70 46 47
3 114 75 51 46 11 120 85 60 62
4 129 87 69 69 12 118 78 51 56
5 115 71 52 54 13 110 65 46 47
6 122 80 68 68 14 132 92 73 63
7 105 71 52 54 15 127 90 73 68
8 117 77 54 61
At the 5% level of significance, determine if the underlying populations are spherical. Assume the necessary populations are normally distributed. Solution:
The population of interest is the set of cardiac patients who have open-heart surgery. On this
population, we take four measurements. Let population 1 be the possible enzyme levels of the
cardiac patients immediately after their open-heart surgery, population 2 be the possible
enzyme levels of the cardiac patients 24 hours after their Day 0 enzyme level was measured,
population 3 be the possible enzyme levels of the cardiac patients 48 hours after their Day 0
enzyme level was measured, and population 4 be the possible enzyme levels of the cardiac
patients 7 days after their Day 0 enzyme level was measured. Then the hypotheses to be tested are
H0: The set of the four populations is spherical.
HA: The set of the four populations is not spherical.
From the Mauchly’s Test of Sphericity table below, since the p-value=0.687>0.05=α, we do not
reject H0. Consequently, at the 5% level of significance, we do not have enough evidence to
reject that the four populations are spherical (p-value=0.687). Hence we do not have evidence
to conclude that the sphericity assumption does not hold.
Mauchly's Test of Sphericity
Measure: MEASURE_1
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilon: Greenhouse-Geisser   Huynh-Feldt   Lower-bound
days                            .784                3.089    5   .687                          .863         1.000          .333
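For readers working outside SPSS, the statistics in the table above can be approximated with NumPy and SciPy. The sketch below is an assumption-laden illustration, not the SPSS implementation: it uses the textbook form of Mauchly's W on orthonormal contrasts of the repeated measures with the usual chi-square approximation, and the function name `mauchly` is hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.linalg import null_space

# Enzyme levels from the table above: rows = subjects 1-15,
# columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)

def mauchly(x):
    """Mauchly's W, chi-square approximation, df, p-value, and GG epsilon."""
    n, k = x.shape
    p = k - 1
    # Orthonormal contrasts: rows orthogonal to the vector of ones.
    contrasts = null_space(np.ones((1, k))).T
    cov = np.cov(x, rowvar=False)
    t_cov = contrasts @ cov @ contrasts.T
    w = np.linalg.det(t_cov) / (np.trace(t_cov) / p) ** p
    df = p * (p + 1) // 2 - 1
    scale = (n - 1) - (2 * p**2 + p + 2) / (6 * p)
    chi2 = -scale * np.log(w)
    p_value = stats.chi2.sf(chi2, df)
    gg_epsilon = np.trace(t_cov) ** 2 / (p * np.trace(t_cov @ t_cov))
    return w, chi2, df, p_value, gg_epsilon

w, chi2, df, p_value, gg_epsilon = mauchly(enzyme)
print(f"W={w:.3f}, chi2={chi2:.3f}, df={df}, p={p_value:.3f}, GG eps={gg_epsilon:.3f}")
```

Because the p-value exceeds α = 0.05, the sketch leads to the same decision as the SPSS table: do not reject the sphericity assumption.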
■
Now that we know how to test the sphericity assumption associated with a One-Way ANOVA for Repeated Measures test, we will demonstrate an analysis of Repeated Measures data.
For Practice: An experimenter was interested in how the level of a certain enzyme changed in 15 randomly selected cardiac patients after open heart surgery. For each patient, the enzyme was measured immediately after surgery (Day 0); one day after surgery (Day 1); two days after surgery (Day 2); and one week after surgery (Day 7). The data are summarized in the table below.
Subject Day 0 Day 1 Day 2 Day 7 Subject Day 0 Day 1 Day 2 Day 7
1 108 63 45 42 9 106 65 49 49
2 112 75 56 52 10 110 70 46 47
3 114 75 51 46 11 120 85 60 62
4 129 87 69 69 12 118 78 51 56
5 115 71 52 54 13 110 65 46 47
6 122 80 68 68 14 132 92 73 63
7 105 71 52 54 15 127 90 73 68
8 117 77 54 61
At the 5% level of significance, analyze the above data. Assume all the assumptions required to implement the hypothesis test are true.
Solution:
Research Question:
Is there a difference in the true mean enzyme levels of subjects based on the amount of time that has passed after open-heart surgery?
Population Declarations:
The population of interest is the set of cardiac patients who have open-heart surgery. Let µday 0 be the true mean enzyme level of the cardiac patients immediately after their open-heart surgery, µday 1 be the true mean enzyme level of the cardiac patients 24 hours after their Day 0 enzyme level was measured, µday 2 be the true mean enzyme level of the cardiac patients 48 hours after their Day 0 enzyme level was measured, and µday 7 be the true mean enzyme level of the cardiac patients 7 days after their Day 0 enzyme level was measured.
Hypothesis to be tested:
H0: The true mean enzyme levels are equal based on the amount of time passed after open-
heart surgery. (i.e. µday 0 = µday 1= µday 2= µday 7)
HA: At least two of the true mean enzyme levels based on the amount of time passed after
open-heart surgery differ.
Hypothesis Test to be used: One-Way ANOVA for Repeated Measures.
Assumptions required to implement the hypothesis test:
1. the subjects are a simple random sample;
2. each observation is an independent simple random sample of size one from tn (t is the
number of treatments and n is the number of subjects) normal populations;
3. the tn populations have the same variance;
4. the t treatments are fixed;
5. there is no interaction between the treatments and the subjects;
6. there is a correlation among the repeated measures and these correlations are all equal.
We are told to assume that all the above assumptions are true.
The Significance Level: α=0.05
The Test Statistic and corresponding p-value:
Tests of Within-Subjects Effects
Measure:MEASURE_1
Source
Type III Sum of
Squares df Mean Square F Sig.
days Sphericity Assumed 36282.267 3 12094.089 1301.662 .000
Greenhouse-Geisser 36282.267 2.588 14021.994 1301.662 .000
Huynh-Feldt 36282.267 3.000 12094.089 1301.662 .000
Lower-bound 36282.267 1.000 36282.267 1301.662 .000
Error(days) Sphericity Assumed 390.233 42 9.291
Greenhouse-Geisser 390.233 36.225 10.772
Huynh-Feldt 390.233 42.000 9.291
Lower-bound 390.233 14.000 27.874
From the Sphericity Assumed row in the above Tests of Within-Subjects Effects table, the test statistic is F(3, 42) = 1301.662 with an associated p-value < 0.001.
The Decision Rule: Since the p-value < 0.001 < 0.05 = α, we reject H0, i.e. we reject µday 0 = µday 1 = µday 2 = µday 7.
Conclusion: At the 5% level of significance, assuming that all assumptions required to implement the One-Way ANOVA for Repeated Measures hold, there is evidence to conclude that at least two of the true mean enzyme levels based on the amount of time that has passed after open-heart surgery are different (p-value < 0.001).
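The Sphericity Assumed row of the Tests of Within-Subjects Effects table can be reproduced by partitioning the total sum of squares into a days component, a subjects component, and an error component. The following is a minimal sketch under the sphericity assumption; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)

n, t = enzyme.shape            # n subjects, t repeated measures
grand_mean = enzyme.mean()

# Partition the total sum of squares.
ss_total = ((enzyme - grand_mean) ** 2).sum()
ss_days = n * ((enzyme.mean(axis=0) - grand_mean) ** 2).sum()
ss_subjects = t * ((enzyme.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_days - ss_subjects

df_days = t - 1
df_error = (t - 1) * (n - 1)
f_stat = (ss_days / df_days) / (ss_error / df_error)
p_value = stats.f.sf(f_stat, df_days, df_error)

print(f"SS(days)={ss_days:.3f}, SS(error)={ss_error:.3f}")
print(f"F({df_days}, {df_error})={f_stat:.3f}, p={p_value:.3g}")
```

The subject-to-subject variation (`ss_subjects`) is removed from the error term, which is why the F statistic is so much larger than it would be in an independent-samples analysis of the same numbers.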
■
For Discussion: In the above example, because we rejected the null hypothesis, we need to
conduct a post-hoc analysis to determine which means actually differed. What post-hoc
analysis would we complete?
Answer:
To determine which true mean enzyme levels pairwise differ, we need to implement several
Paired–Sample t-tests. The following Pairwise Comparisons table contains the test statistics
and corresponding p-values for the six paired-sample t-tests.
Pairwise Comparisons
Measure:MEASURE_1
(I) days   (J) days   Mean Difference (I-J)   Std. Error   Sig.a   95% Confidence Interval for Differencea
                                                                   Lower Bound   Upper Bound
0          1            40.067*                .859         .000     38.224        41.909
0          2            60.000*               1.121         .000     57.595        62.405
0          7            60.467*               1.287         .000     57.707        63.227
1          0           -40.067*                .859         .000    -41.909       -38.224
1          2            19.933*               1.016         .000     17.753        22.113
1          7            20.400*               1.230         .000     17.762        23.038
2          0           -60.000*               1.121         .000    -62.405       -57.595
2          1           -19.933*               1.016         .000    -22.113       -17.753
2          7              .467                1.112         .681     -1.919         2.852
7          0           -60.467*               1.287         .000    -63.227       -57.707
7          1           -20.400*               1.230         .000    -23.038       -17.762
7          2             -.467                1.112         .681     -2.852         1.919
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
Conclusion: At the 5% level of significance there is evidence to conclude that the true mean
enzyme levels measured on day 0 and day 1 (p-value < 0.001), day 0 and day 2 (p-value <
0.001), day 0 and day 7(p-value < 0.001), day 1 and day 2 (p-value < 0.001), and day 1 and
day 7 (p-value < 0.001) differ. At the 5% level of significance, there is no evidence to conclude
that the true mean enzyme levels measured on day 2 and day 7 (p-value =0.681) differ.
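The six unadjusted paired-sample t-tests can be sketched with `scipy.stats.ttest_rel`. As in the table's footnote, no adjustment for multiple comparisons is applied; the dictionary `results` and the labels in `days` are illustrative names.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)
days = ["Day 0", "Day 1", "Day 2", "Day 7"]

results = {}
for i, j in combinations(range(len(days)), 2):
    # Paired test: each subject contributes one difference.
    t_stat, p_value = stats.ttest_rel(enzyme[:, i], enzyme[:, j])
    mean_diff = (enzyme[:, i] - enzyme[:, j]).mean()
    results[(days[i], days[j])] = (mean_diff, p_value)
    print(f"{days[i]} vs {days[j]}: diff={mean_diff:.3f}, p={p_value:.3f}")
```

Only the Day 2 versus Day 7 comparison fails to reach significance at the 5% level, in agreement with the Pairwise Comparisons table.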
NOTE: When the populations are spherical, we can use a confidence interval plot to visualize
how the true means might differ. If the populations are not spherical, then a confidence interval
plot does not necessarily represent how the true means may differ.
Because we cannot reject that the four populations from which our enzyme levels were sampled are spherical, we will include below a 95% confidence interval plot illustrating our estimated 95% confidence intervals for µday 0, µday 1, µday 2, and µday 7.
[Figure: 95% confidence interval plot for the mean enzyme levels on Day 0, Day 1, Day 2, and Day 7]
Based on the 95% confidence interval plot above for the mean enzyme levels, we can see that the 95% confidence intervals for the mean enzyme levels for day 2 and day 7 overlap. Therefore, it’s reasonable to conclude that there is no difference between the mean enzyme levels for day 2 and day 7. Because the 95% confidence intervals for the true mean enzyme levels for day 0 and day 1 do not overlap, we conclude that these two means differ. Further, because neither of the 95% confidence intervals for the true mean enzyme levels for day 0 and day 1 overlaps either of the 95% confidence intervals for the true mean enzyme levels for day 2 and day 7, we conclude that both the true mean enzyme levels for day 0 and day 1 differ from the true mean enzyme levels for both day 2 and day 7.
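The intervals behind such a plot can be computed directly. The sketch below assumes each interval is the usual one-sample t interval (mean ± t × standard error with n − 1 = 14 degrees of freedom); the plotting software used for the module may construct its error bars differently.

```python
import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)
days = ["Day 0", "Day 1", "Day 2", "Day 7"]

n = enzyme.shape[0]
means = enzyme.mean(axis=0)
std_errors = stats.sem(enzyme, axis=0)      # sample SD / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value
lower = means - t_crit * std_errors
upper = means + t_crit * std_errors

for day, lo, hi in zip(days, lower, upper):
    print(f"{day}: ({lo:.2f}, {hi:.2f})")

def overlap(a, b):
    """True when the intervals for columns a and b share at least one point."""
    return lower[a] <= upper[b] and lower[b] <= upper[a]
```

Calling `overlap(2, 3)` checks Day 2 against Day 7, and `overlap(0, 1)` checks Day 0 against Day 1.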
The upshot of the above analysis is there is a statistically significant reduction in the true mean
enzyme levels from day 0 to day 1 and day 1 to day 2 and there is no statistically significant
change in the true mean enzyme levels from day 2 to day 7.
■ For Discussion: In the above post open-heart surgery enzyme level example, one of the
assumptions we need to verify was that each observation was drawn from a normally distributed
population. Is this assumption reasonable?
Answer:
Because there is only one observation from each patient/time combination, we cannot directly
test whether the observations from the population associated with each patient/time
combination are normally distributed. The best we can test is whether the associated marginal
distributions are normally distributed.
To test whether the enzyme-level measurements taken on each of Day 0, Day 1, Day 2, and
Day 7 are normally distributed, we refer to the p-values (based on the Shapiro-Wilk test for
normality) in the following Tests of Normality table. At the 5% level of significance, there is no
evidence to reject the assumptions that the enzyme-level measurements taken respectively on
Day 0 (p-value=0.537), Day 1 (p-value=0.566), and Day 7 (p-value=0.339) are normally
distributed. At the same level of significance, there is evidence to conclude that the enzyme-
level measurements taken on Day 2 are not normally distributed (p-value=0.033).
Tests of Normality
Kolmogorov-Smirnova Shapiro-Wilk
Statistic df Sig. Statistic df Sig.
Day0 .109 15 .200* .951 15 .537
Day1 .117 15 .200* .953 15 .566
Day2 .203 15 .096 .869 15 .033
Day7 .119 15 .200* .936 15 .339
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
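The Shapiro-Wilk columns of this table can be approximated with `scipy.stats.shapiro`, applied to each day's fifteen measurements. This is a sketch; SciPy's p-values may differ slightly from SPSS in the last decimal place.

```python
import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)
days = ["Day0", "Day1", "Day2", "Day7"]

alpha = 0.05
normality = {}
for name, column in zip(days, enzyme.T):
    w_stat, p_value = stats.shapiro(column)
    normality[name] = (w_stat, p_value)
    decision = "reject normality" if p_value < alpha else "do not reject normality"
    print(f"{name}: W={w_stat:.3f}, p={p_value:.3f} -> {decision}")
```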
To test whether the enzyme-level measurements taken for a specific patient are normally distributed, we refer to the p-values (based on the Shapiro-Wilk test for normality) in the following Tests of Normality table. At the 5% level of significance, there is no evidence to reject
the assumptions that the enzyme-level measurements for each of the fifteen patients respectively are normally distributed (p-value=0.240 for each patient).
Tests of Normality
Patient Kolmogorov-Smirnova Shapiro-Wilk
Statistic df Sig. Statistic df Sig.
Day
1.00 .314 4 . .854 4 .240
2.00 .314 4 . .854 4 .240
3.00 .314 4 . .854 4 .240
4.00 .314 4 . .854 4 .240
5.00 .314 4 . .854 4 .240
6.00 .314 4 . .854 4 .240
7.00 .314 4 . .854 4 .240
8.00 .314 4 . .854 4 .240
9.00 .314 4 . .854 4 .240
10.00 .314 4 . .854 4 .240
11.00 .314 4 . .854 4 .240
12.00 .314 4 . .854 4 .240
13.00 .314 4 . .854 4 .240
14.00 .314 4 . .854 4 .240
15.00 .314 4 . .854 4 .240
a. Lilliefors Significance Correction
Because there was evidence that one of the marginal distributions was not normally distributed,
technically the “normality assumption” underlying our initial repeated-measures ANOVA test has
been violated.
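One caution about the per-patient table above: every patient shows the same statistic, and W ≈ 0.854 is what the Shapiro-Wilk test returns for the four day labels 0, 1, 2, 7 themselves. This suggests, as an inference from the numbers rather than anything stated in the module, that the table may have been computed on the Day variable instead of on the enzyme levels. The sketch below shows both calculations so the per-patient test can be rerun on each patient's four enzyme measurements.

```python
import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)

# Shapiro-Wilk applied to the day labels, not to any measurement.
label_w, label_p = stats.shapiro([0, 1, 2, 7])
print(f"day labels: W={label_w:.3f}")

# Shapiro-Wilk applied to each patient's four enzyme levels.
per_patient = [stats.shapiro(row) for row in enzyme]
for patient, result in enumerate(per_patient, start=1):
    print(f"patient {patient}: W={result.statistic:.3f}, p={result.pvalue:.3f}")
```

With only four observations per patient, any such test has very little power, so these results should be read alongside the per-day tests rather than on their own.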
■
For Discussion: In the post open-heart surgery enzyme level example above, one of the assumptions we needed to verify was that the patients and the times of the enzyme-level measurements are additive. Are the patient and time of enzyme-level measurement factors really additive?
Answer:
Referring to the ANOVA with Tukey’s Test for Nonadditivity table below, since the p-value = 0.345 > 0.05 = α, we do not have evidence to reject the assumption that the patient and time of enzyme-level measurement factors are additive.
ANOVA with Tukey's Test for Nonadditivity
Source                                   Sum of Squares   df   Mean Square          F    Sig
Between People                                 4221.100   14       301.507
Within People: Between Items                  36282.267    3     12094.089   1301.662   .000
Within People: Residual: Nonadditivity           8.482a    1         8.482       .911   .345
Within People: Residual: Balance                381.751   41         9.311
Within People: Residual: Total                  390.233   42         9.291
Within People: Total                          36672.500   45       814.944
Total                                         40893.600   59       693.112
Grand Mean = 76.2000
a. Tukey's estimate of power to which observations must be raised to achieve additivity = 1.139.
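The Nonadditivity row can be reproduced by hand. The sketch below uses the standard one-degree-of-freedom form of Tukey's test: the nonadditivity sum of squares is split off from the residual, and the remainder (the Balance row) serves as the error term. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Enzyme levels: rows = subjects 1-15, columns = Day 0, Day 1, Day 2, Day 7.
enzyme = np.array([
    [108, 63, 45, 42], [112, 75, 56, 52], [114, 75, 51, 46],
    [129, 87, 69, 69], [115, 71, 52, 54], [122, 80, 68, 68],
    [105, 71, 52, 54], [117, 77, 54, 61], [106, 65, 49, 49],
    [110, 70, 46, 47], [120, 85, 60, 62], [118, 78, 51, 56],
    [110, 65, 46, 47], [132, 92, 73, 63], [127, 90, 73, 68],
], dtype=float)

n, t = enzyme.shape
grand_mean = enzyme.mean()
subject_dev = enzyme.mean(axis=1) - grand_mean   # row (patient) effects
day_dev = enzyme.mean(axis=0) - grand_mean       # column (day) effects

# Residual sum of squares from the additive model.
ss_total = ((enzyme - grand_mean) ** 2).sum()
ss_resid = ss_total - t * (subject_dev**2).sum() - n * (day_dev**2).sum()

# Tukey's single-degree-of-freedom sum of squares for nonadditivity.
cross = subject_dev @ enzyme @ day_dev
ss_nonadd = cross**2 / ((subject_dev**2).sum() * (day_dev**2).sum())

ss_balance = ss_resid - ss_nonadd
df_balance = (n - 1) * (t - 1) - 1
f_stat = ss_nonadd / (ss_balance / df_balance)
p_value = stats.f.sf(f_stat, 1, df_balance)

print(f"SS(nonadditivity)={ss_nonadd:.3f}, F(1, {df_balance})={f_stat:.3f}, p={p_value:.3f}")
```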
■
Note that in the above example, the subjects are not grouped, i.e. there is only one group. There is one repeated measures factor, i.e. Time, with four levels (Day 0, 1, 2, and 7).
Now Your Turn: Starch et al. (A-17) wanted to show the effectiveness of a central four-quadrant sleeve and screw in anterior cruciate ligament reconstruction. The researchers performed a series of reconstructions on eight randomly selected cadaveric knees. The loads (in newtons) required to achieve different graft laxities (mm) for seven specimens (data not available for one specimen) using five different load weights were collected. The graft laxities of Loads A through E were consecutively measured. Graft laxity is the separation (in mm) of the femur and the tibia at the points of graft fixation.
1. Is there sufficient evidence to conclude that different loads are required to produce different levels of graft laxity? Refer to Exercise 8.4.2 (pg 352) in the textbook for the data for this question. Work at the α=0.05 level of significance. You may assume that all the assumptions required to implement the analysis hold.
2. One of the assumptions we need to verify was that each observation was drawn from a
normally distributed population. Is this assumption reasonable?
3. One of the assumptions we needed to verify was that the five graft laxity populations are
spherical. Are the five graft laxity populations spherical?
4. One of the assumptions we needed to verify was that the knee and load factors are
additive. Are the knee and load factors really additive factors?
Random Effects Factor
Suppose the levels of a factor have been selected at random from a population of levels. Then the factor is referred to as a random effects factor. The conclusions of the analysis will be directed at the population of levels, not just the levels selected for the experiment.
The model for one fixed-effects factor and one random-effects factor is
$$y_{ijk} = \mu + \alpha_i + \beta_j + b_{ij} + \varepsilon_{ijk}$$
where $\mu$ and, for $i = 1, \ldots, a$, $\alpha_i$ are fixed unknown constants and $\varepsilon_{ijk}$ is a random, normally distributed variable with mean 0 and variance $\sigma^2$; for $j = 1, \ldots, n$, $\beta_j$ is normally distributed with mean 0 and variance $\sigma_B^2$; and for $i = 1, \ldots, a$ and for $j = 1, \ldots, n$, $b_{ij}$ is normally distributed with mean 0 and variance $\sigma_{AB}^2$. Note
$$\sum_{i=1}^{a} \alpha_i = 0.$$
The assumptions to implement a random-effects Two-Way ANOVA are the same as those for the fixed-effects Two-Way ANOVA with the additional assumption that the levels for each random-effects factor were randomly selected. You also need at least two observations for every block-treatment combination.
For Practice: In a study of the length of time spent on individual home visits by public health nurses, data were reported on length of home visit, in minutes, by a sample of 80 nurses (five nurses were randomly selected from each age/type of patient combination). The ages of the nurses were subdivided into four categories: 20-29, 30-39, 40-49, and 50+. Of all the different types of patients, the four types that were randomly selected were cardiac, cancer, c.v.a., and tuberculosis. The supervisor wants to know if one’s age causes a different length of time to be spent on individual home visits for an arbitrary patient type. Assuming the assumptions required to analyze the data found in Table 8.5.5 in BFAHS, p. 360 hold, with α = 0.05, analyze the above scenario.
Solution:
Note: By now, you should be able to verify the assumptions for this hypothesis test. Consequently their verification is not shown.
Research Question: Does one’s age cause a different length of time to be spent on individual home visits for an arbitrary patient type?
Let Factor A be the Type of Patient.
Let Level 1 of Factor A be cardiac.
Let Level 2 of Factor A be cancer.
Let Level 3 of Factor A be c.v.a.
Let Level 4 of Factor A be tuberculosis.
Let Factor B be the age group of a nurse.
Let Level 1 of Factor B be 20-29 years of age.
Let Level 2 of Factor B be 30-39 years of age.
Let Level 3 of Factor B be 40-49 years of age.
Let Level 4 of Factor B be 50 years of age or older.
Let Population 1 be the set of nurses who are between 20 and 29 years of age and attend to cardiac patients.
Let Population 2 be the set of nurses who are between 30 and 39 years of age and attend to cardiac patients.
Let Population 3 be the set of nurses who are between 40 and 49 years of age and attend to cardiac patients.
Let Population 4 be the set of nurses who are 50 years of age or older and attend to cardiac patients.
Let Population 5 be the set of nurses who are between 20 and 29 years of age and attend to cancer patients.
Let Population 6 be the set of nurses who are between 30 and 39 years of age and attend to cancer patients.
Let Population 7 be the set of nurses who are between 40 and 49 years of age and attend to cancer patients.
Let Population 8 be the set of nurses who are 50 years of age or older and attend to cancer patients.
Let Population 9 be the set of nurses who are between 20 and 29 years of age and attend to c.v.a. patients.
Let Population 10 be the set of nurses who are between 30 and 39 years of age and attend to c.v.a. patients.
Let Population 11 be the set of nurses who are between 40 and 49 years of age and attend to c.v.a. patients.
Let Population 12 be the set of nurses who are 50 years of age or older and attend to c.v.a. patients.
Let Population 13 be the set of nurses who are between 20 and 29 years of age and attend to tuberculosis patients.
Let Population 14 be the set of nurses who are between 30 and 39 years of age and attend to tuberculosis patients.
Let Population 15 be the set of nurses who are between 40 and 49 years of age and attend to tuberculosis patients.
Let Population 16 be the set of nurses who are 50 years of age or older and attend to tuberculosis patients.
Hypothesis Test to be used: Because each of the patient types was randomly selected from all available patient types, we are going to use a Two-Way ANOVA with one fixed-effects factor and one random-effects factor.
Hypotheses to be tested:
H0,Age: the true mean lengths of home visit times for each of the age categories are all equal.
HA,Age: at least two of the true mean lengths of home visit times for each of the age categories differ.
H0,Patient Type: the true mean lengths of home visit times for each patient type are all equal.
HA,Patient Type: at least two of the true mean lengths of home visit times for each patient type differ.
H0,Age x Patient Type: there is no interaction effect between one’s age category and the patient type attended on the true mean lengths of home visit times.
HA,Age x Patient Type: there is an interaction effect between one’s age category and the patient type attended on the true mean lengths of home visit times.
Hypothesis Test to be used: Two-Way ANOVA with one fixed and one random effects factor
Assumptions required to implement the hypothesis test:
1) The populations from which each of the random samples was taken must be normal.
2) The populations must have the same variances.
3) The samples must be independent of one another.
4) The groups must be equal in sample size.
We are told to assume all the assumptions hold.
The Significance Level: α = 0.05
The Test Statistic and corresponding p-value: The following table contains the values of the test statistics and the corresponding p-values that are required to test each of our three sets of hypotheses.
Tests of Between-Subjects Effects
Dependent Variable:Time
Source Type III Sum of
Squares df Mean Square F Sig.
Intercept Hypothesis 82818.450 1 82818.450 206.865 .001
Error 1201.050 3 400.350
Age Hypothesis 1201.050 3 400.350 5.922 .016
Error 608.450 9 67.606b
PatientType Hypothesis 2992.450 3 997.483 14.754 .001
Error 608.450 9 67.606b
Age * PatientType Hypothesis 608.450 9 67.606 4.605 .000
Error 939.600 64 14.681c
a. MS(PatientType)
b. MS(Age * PatientType)
c. MS(Error)
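The footnotes explain which mean square serves as the denominator of each F ratio: with a random-effects factor in the model, both main effects are tested against MS(Age * PatientType), while the interaction is tested against MS(Error). The sketch below recomputes the F statistics and p-values from the mean squares and degrees of freedom printed in the table; the dictionary names are illustrative.

```python
from scipy import stats

# Mean squares and degrees of freedom copied from the table above.
mean_square = {"Age": 400.350, "PatientType": 997.483,
               "Age*PatientType": 67.606, "Error": 14.681}
deg_free = {"Age": 3, "PatientType": 3, "Age*PatientType": 9, "Error": 64}

# Denominator (error term) used for each effect, per footnotes b and c.
error_term = {"Age": "Age*PatientType",
              "PatientType": "Age*PatientType",
              "Age*PatientType": "Error"}

f_tests = {}
for effect, denom in error_term.items():
    f_stat = mean_square[effect] / mean_square[denom]
    p_value = stats.f.sf(f_stat, deg_free[effect], deg_free[denom])
    f_tests[effect] = (f_stat, p_value)
    print(f"{effect}: F({deg_free[effect]}, {deg_free[denom]})={f_stat:.3f}, p={p_value:.3f}")
```

Testing the main effects against the interaction mean square rather than MS(Error) leaves only 9 denominator degrees of freedom, which is why the main-effect p-values are larger than a fixed-effects analysis of the same sums of squares would give.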
The Decision Rule:
Since, for the interaction term, p-value < 0.001 < 0.05 = α, we reject H0,Age x Patient Type.
Since, for the main effect associated with the age category, p-value = 0.016 < 0.05 = α, we reject H0,Age.
Since, for the main effect associated with the type of patient, p-value = 0.001 < 0.05 = α, we reject H0,Patient Type.
Conclusion: If all the assumptions required to implement the analysis are valid, at the α = 0.05 level of significance, we have evidence to conclude that at least two of the true mean lengths of home visit times based on the four different age categories differ (p-value = 0.016) and we have evidence to conclude that at least two of the true mean lengths of home visit times based on the type of patient selected differ (p-value = 0.001). At the same level of significance, we also have evidence to conclude that there is an interaction effect between a nurse’s age category and the type of patient selected on the true mean lengths of home visit times (p-value < 0.001).
NOTE: You would now complete a post-hoc analysis to explore how the mean lengths of home visit times differ based on the age category and the type of patient and how these factors interact to influence the mean lengths of home visit times. For the sake of brevity, we do not present this post-hoc analysis.
■
Now Your Turn: A health district owns 36 identical make/model ambulances. The district CEO
is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg).
The district installs each brand on 12 of its ambulances, i.e. twelve of which have tire brand A
installed, twelve of which have tire brand B installed, and the remaining twelve have tire brand C
installed. The CEO realizes that, in addition to the tire brand, the driver will also affect the
mileage. Consequently the CEO randomly selects 4 drivers from its collection of drivers and
randomly assigns the drivers to the ambulances in such a manner that each driver drives three
ambulances with each tire brand. The resulting mileages are summarized below:
Driver Tire Brand Mileage Driver Tire Brand Mileage
1 A 39.6 3 A 33.9
1 A 38.6 3 A 43.2
1 A 41.9 3 A 41.3
1 B 18.1 3 B 17.8
1 B 20.4 3 B 21.3
1 B 19.0 3 B 22.3
1 C 31.1 3 C 31.3
1 C 29.8 3 C 28.7
1 C 26.6 3 C 29.7
2 A 38.1 4 A 36.9
2 A 35.4 4 A 30.3
2 A 38.8 4 A 35.0
2 B 18.2 4 B 17.8
2 B 14.0 4 B 21.2
2 B 15.6 4 B 24.3
2 C 30.2 4 C 27.4
2 C 27.9 4 C 26.6
2 C 27.2 4 C 21.0
The CEO wishes to generalize its findings regarding the impact of the tire brand on mileage to
all ambulance drivers within the district. At the 5% level of significance, analyze the CEO’s
data. You may assume that all the assumptions for your analysis hold.
Learning Activities
Discussion Questions
Critical Thinking Questions
Assignments/Activities
References Cite any references used in the learning material.