Analysis and Interpretation Inferential Statistics ANOVA
Madgerie Jameson, UWI SOE
Slide 2
OUTLINE
Definition of Analysis of Variance
Logic of ANOVA (the theory behind ANOVA)
The F test
One-way ANOVA
Post hoc tests
Interpreting the results
Slide 3
Analysis of Variance
Suppose the Ministry of Education decides to test three different methods of teaching Mathematics. After teachers have implemented the different methods for a term, the testing and measurement unit wants to know whether the mean scores of students taught with the three methods are the same. Questions: What data would they require? How would they test for this equality of means?
Slide 4
Analysis of Variance: Definition
The analysis of variance (ANOVA) is a statistical model used to analyse situations in which we want to compare more than two conditions. It is used to test the null hypothesis that the means of three or more populations are equal.
Slide 5
Recall the opening example
The ministry developed three different methods to teach Mathematics. It wants to determine whether the three methods produce different mean scores. So we test the null hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$ (all three population means are equal)
$H_1$: Not all three population means are equal.
Slide 6
You ask: Is there an overall difference between the averages? Is this difference statistically significant? If so, is the size of the difference managerially (practically) significant?
[Figure: the three methods M1, M2, M3]
Slide 7
You could test the three hypotheses $H_0: \mu_1 = \mu_2$, $H_0: \mu_1 = \mu_3$ and $H_0: \mu_2 = \mu_3$ using t tests. If you reject even one of these three hypotheses, then you must reject the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$. However, combining the Type I error probabilities of the three tests gives a large Type I error probability for the test of $H_0: \mu_1 = \mu_2 = \mu_3$. The procedure that tests the equality of three means in a single test is ANOVA.
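The growth of the Type I error probability can be made concrete with a short calculation. This is a minimal sketch, assuming each pairwise t test is run at $\alpha = .05$ and, as a simplification, that the tests are independent (they are not strictly independent, because each group appears in two comparisons), so the chance of at least one false rejection across $m$ tests is $1 - (1 - \alpha)^m$. The function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m tests.

    Simplifying assumption: the m tests are independent, each run at level alpha.
    """
    return 1 - (1 - alpha) ** m


# Three pairwise t tests (mu1 vs mu2, mu1 vs mu3, mu2 vs mu3), each at alpha = .05
print(f"{familywise_error(0.05, 3):.4f}")  # 0.1426
```

So three tests at the 5% level carry roughly a 14% chance of at least one false rejection, nearly three times the intended level; a single ANOVA F test keeps the overall level at $\alpha$.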
Slide 8
One-Way ANOVA
A procedure for testing hypotheses by comparing the means of several populations. In one-way ANOVA, we analyse one factor or variable. Example: testing the equality of the mean Mathematics scores of students who are taught using the three different methods. Only one factor is considered: the effect of the different teaching methods.
Slide 9
Assumptions of One-Way ANOVA
The following assumptions must hold true to use one-way ANOVA.
1. The populations from which the samples are drawn are (approximately) normally distributed.
2. The populations from which the samples are drawn have the same variance (or standard deviation).
3. The samples drawn from different populations are random and independent.
Prem Mann, Introductory Statistics, 7/E Copyright 2010 John Wiley & Sons. All right reserved
Slide 10
Using the example of the three teaching methods, we must assume that:
The scores of all the students taught by each method are (approximately) normally distributed.
The means of the three distributions of scores for the three teaching methods may or may not be the same, but all three distributions have the same variance.
When we take samples for an ANOVA test, these samples are drawn independently and randomly from the three different populations.
Slide 11
ANOVA is applied by calculating two estimates of the variance of the population distribution.
The variance between samples (mean square between samples, MSB) gives an estimate of the variance based on the samples taken from the different populations, e.g. the three teaching methods. MSB is based on the mean scores of the three samples of students taught by the three different methods.
Slide 12
The variance within samples (mean square within samples, MSW) gives an estimate of the variance within the data of the different samples. MSW is based on the scores of the individual students included in the three samples taken from the three populations. The concept of MSW is similar to the concept of the pooled standard deviation, $s_p$.
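The link between MSW and the pooled variance can be written out explicitly. Here $s_i^2$ denotes the sample variance of sample $i$ (a symbol not defined on these slides), $n$ is the total number of values and $k$ the number of samples; since $SSW = \sum_i (n_i - 1)s_i^2$, MSW is a weighted average of the sample variances, and with two samples it reduces to the pooled variance $s_p^2$:

```latex
MSW = \frac{SSW}{n - k}
    = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n - k},
\qquad
k = 2:\quad MSW = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = s_p^2
```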
Slide 13
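Note: The one-way ANOVA test is always right-tailed, with the rejection region in the right tail of the F distribution curve. Only large values of F, which arise when the sample means differ by more than the variability within the samples would lead us to expect, provide evidence against $H_0$.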
Slide 14
THE F DISTRIBUTION
Definition
1. The F distribution is a continuous curve skewed to the right.
2. The F distribution has two numbers of degrees of freedom: df for the numerator and df for the denominator.
3. The units of an F distribution, denoted F, are nonnegative.
Slide 15
For an F distribution, the degrees of freedom for the numerator and the degrees of freedom for the denominator are usually written as follows:
$df = (\text{df for the numerator}, \text{df for the denominator})$
Slide 16
[Figure: Three F distribution curves.]
Slide 17
Exercise: Find the F value for 8 degrees of freedom for the numerator, 14 degrees of freedom for the denominator, and .05 area in the right tail of the F distribution curve.
Slide 18
Obtaining the F value using the statistical table
Slide 19
The critical value of F for 8 df for the numerator, 14 df for the denominator, and .05 area in the right tail.
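As an alternative to the printed table, the critical value can be computed from the inverse CDF of the F distribution. This is a minimal sketch, assuming SciPy is installed; `scipy.stats.f.ppf` returns the point below which a given area lies, so the point with right-tail area $\alpha$ is `ppf(1 - alpha, ...)`. The helper name `f_critical` is hypothetical.

```python
from scipy.stats import f


def f_critical(alpha: float, df_num: int, df_den: int) -> float:
    """Critical F value with area alpha in the right tail (hypothetical helper)."""
    # ppf is the inverse CDF: the value with area 1 - alpha to its left
    return float(f.ppf(1 - alpha, df_num, df_den))


# Exercise: 8 df for the numerator, 14 df for the denominator, .05 in the right tail
print(f"{f_critical(0.05, 8, 14):.2f}")  # 2.70
```

The same call with `f_critical(0.01, 2, 12)` or `f_critical(0.05, 3, 18)` gives, after rounding to two decimals, the table values 6.93 and 3.16 used in the two worked examples.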
Slide 20
Calculating the Value of the Test Statistic
Test Statistic F for a One-Way ANOVA Test
The value of the test statistic F for an ANOVA test is calculated as
$F = \frac{MSB}{MSW}$
Slide 21
Example: Fifteen form one students were randomly assigned to three groups to experiment with three different methods of teaching Mathematics. At the end of the term, the same test was given to all 15 students. The table gives the scores of the students in the three groups.
Slide 22
Calculate the value of the test statistic F. Assume that all the assumptions required for ANOVA hold true.
Slide 23
Solution
Let
$x$ = the score of a student
$k$ = the number of different samples (or treatments)
$n_i$ = the size of sample $i$
$T_i$ = the sum of the values in sample $i$
$n$ = the number of values in all samples $= n_1 + n_2 + n_3 + \cdots$
$\sum x$ = the sum of the values in all samples $= T_1 + T_2 + T_3 + \cdots$
$\sum x^2$ = the sum of the squares of the values in all samples
Slide 24
Calculate MSB and MSW
To calculate MSB and MSW, we first compute the between-samples sum of squares, denoted by SSB, and the within-samples sum of squares, denoted by SSW. The sum of SSB and SSW is called the total sum of squares and is denoted by SST; that is,
$SST = SSB + SSW$
The values of SSB and SSW are calculated using the following formulas.
Slide 25
Between- and Within-Samples Sums of Squares
The between-samples sum of squares, denoted by SSB, is calculated as
$SSB = \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots\right) - \frac{(\sum x)^2}{n}$
Slide 26
Between- and Within-Samples Sums of Squares
The within-samples sum of squares, denoted by SSW, is calculated as
$SSW = \sum x^2 - \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots\right)$
Slide 27
Let us return to the example
Slide 28
$\sum x = T_1 + T_2 + T_3 = 324 + 369 + 388 = 1081$
$n = n_1 + n_2 + n_3 = 5 + 5 + 5 = 15$
$\sum x^2 = (48)^2 + (73)^2 + (51)^2 + (65)^2 + (87)^2 + (55)^2 + (85)^2 + (70)^2 + (69)^2 + (90)^2 + (84)^2 + (68)^2 + (95)^2 + (74)^2 + (67)^2 = 80{,}709$
Slide 29
Substitute all the values into the formulas for SSB, SSW and SST.
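A worked substitution, using $T_1 = 324$, $T_2 = 369$, $T_3 = 388$, $n_1 = n_2 = n_3 = 5$, $n = 15$, $\sum x = 1081$ and $\sum x^2 = 80{,}709$ from the previous slide. The arithmetic below is recomputed from those totals (rounded to four decimals), not copied from the original slide image; SST uses the standard shortcut $\sum x^2 - (\sum x)^2/n$, which agrees with $SSB + SSW$.

```latex
\begin{aligned}
SSB &= \left(\frac{324^2}{5} + \frac{369^2}{5} + \frac{388^2}{5}\right) - \frac{1081^2}{15}
     = 78{,}336.2 - 77{,}904.0667 = 432.1333 \\
SSW &= 80{,}709 - \left(\frac{324^2}{5} + \frac{369^2}{5} + \frac{388^2}{5}\right)
     = 80{,}709 - 78{,}336.2 = 2372.8 \\
SST &= 80{,}709 - \frac{1081^2}{15} = 2804.9333 = SSB + SSW
\end{aligned}
```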
Slide 30
Calculating the Values of MSB and MSW
MSB and MSW are calculated as
$MSB = \frac{SSB}{k - 1}$ and $MSW = \frac{SSW}{n - k}$
where $k - 1$ and $n - k$ are, respectively, the df for the numerator and the df for the denominator of the F distribution. Remember, k is the number of different samples.
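For the teaching-methods example ($k = 3$, $n = 15$), substituting $SSB = 432.1333$ and $SSW = 2372.8$ (recomputed from the totals on slide 28) gives the mean squares and the test statistic:

```latex
\begin{aligned}
MSB &= \frac{SSB}{k - 1} = \frac{432.1333}{3 - 1} = 216.0667 \\
MSW &= \frac{SSW}{n - k} = \frac{2372.8}{15 - 3} = 197.7333 \\
F   &= \frac{MSB}{MSW} = \frac{216.0667}{197.7333} \approx 1.09
\end{aligned}
```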
Slide 31
Slide 32
Draw the ANOVA Table
Slide 33
ANOVA Table for the Example
Slide 34
Back to the question
The scores of 15 form one students who were randomly assigned to three groups in order to experiment with three different methods of teaching Mathematics were recorded. At the 1% significance level, can we reject the null hypothesis that the mean Mathematics score of all form one students taught by each of these three methods is the same? Assume that all the assumptions required to apply the one-way ANOVA procedure hold true.
Slide 35
Solution
Step 1: $H_0: \mu_1 = \mu_2 = \mu_3$ (the population mean scores for the three methods are all equal)
$H_1$: Not all three means are equal
Step 2: Because we are comparing the means of three normally distributed populations, we use the F distribution to make this test.
Slide 36
Step 3: $\alpha = .01$. A one-way ANOVA test is always right-tailed, so the area in the right tail is .01.
df for the numerator $= k - 1 = 3 - 1 = 2$
df for the denominator $= n - k = 15 - 3 = 12$
The required value of F is 6.93.
Slide 37
Critical value of F for df = (2, 12) and $\alpha = .01$.
Slide 38
Steps 4 & 5: The value of the test statistic is F = 1.09. It is less than the critical value of F = 6.93, so it falls in the nonrejection region. Hence, we fail to reject the null hypothesis: at the 1% significance level, the data do not provide sufficient evidence that the mean scores of the three populations differ.
Slide 39
Example 2 From time to time, unknown to its employees, the
research department at Post Bank observes various employees for
their work productivity. Recently this department wanted to check
whether the four tellers at a branch of this bank serve, on
average, the same number of customers per hour. The research
manager observed each of the four tellers for a certain number of
hours. The following table gives the number of customers served by
the four tellers during each of the observed hours. Prem Mann,
Introductory Statistics, 7/E Copyright 2010 John Wiley & Sons.
All right reserved
Slide 40
Result
Slide 41
Question: At the 5% significance level, test the null hypothesis that the mean number of customers served per hour by each of these four tellers is the same. Assume that all the assumptions required to apply the one-way ANOVA procedure hold true.
Slide 42
Solution
Step 1: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (the mean number of customers served per hour by each of the four tellers is the same)
$H_1$: Not all four population means are equal
Slide 43
Step 2: Because we are testing for the equality of four means for four normally distributed populations, we use the F distribution to make the test.
Slide 44
Step 3: $\alpha = .05$. A one-way ANOVA test is always right-tailed, so the area in the right tail is .05.
df for the numerator $= k - 1 = 4 - 1 = 3$
df for the denominator $= n - k = 22 - 4 = 18$
Slide 45
Critical value of F for df = (3, 18) and $\alpha = .05$.
Slide 46
Slide 47
Step 4:
$\sum x = T_1 + T_2 + T_3 + T_4 = 108 + 87 + 93 + 110 = 398$
$n = n_1 + n_2 + n_3 + n_4 = 5 + 6 + 6 + 5 = 22$
$\sum x^2 = (19)^2 + (21)^2 + (26)^2 + (24)^2 + (18)^2 + (14)^2 + (16)^2 + (14)^2 + (13)^2 + (17)^2 + (13)^2 + (11)^2 + (14)^2 + (21)^2 + (13)^2 + (16)^2 + (18)^2 + (24)^2 + (19)^2 + (21)^2 + (26)^2 + (20)^2 = 7614$
Slide 48
Substitute all the values into the formulas for SSB and SSW.
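The whole calculation, from the computing formulas for SSB and SSW through MSB, MSW and F, can be scripted. This is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the list names are hypothetical, and the teller data are the 22 hourly counts listed in Step 4, grouped so that their sums match $T_1 = 108$, $T_2 = 87$, $T_3 = 93$ and $T_4 = 110$.

```python
def one_way_anova(groups):
    """One-way ANOVA using the computing formulas for SSB and SSW.

    groups: a list of samples, each a list of observations.
    Returns a dict with SSB, SSW, SST, MSB, MSW and F.
    """
    k = len(groups)                                    # number of samples
    n = sum(len(g) for g in groups)                    # n = n1 + n2 + ...
    sum_x = sum(sum(g) for g in groups)                # sum of x = T1 + T2 + ...
    sum_x2 = sum(x * x for g in groups for x in g)     # sum of x squared
    t_term = sum(sum(g) ** 2 / len(g) for g in groups) # T1^2/n1 + T2^2/n2 + ...
    correction = sum_x ** 2 / n                        # (sum of x)^2 / n

    ssb = t_term - correction
    ssw = sum_x2 - t_term
    msb = ssb / (k - 1)                                # df for the numerator: k - 1
    msw = ssw / (n - k)                                # df for the denominator: n - k
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Customers served per observed hour, one list per teller (data from Step 4)
teller_a = [19, 21, 26, 24, 18]
teller_b = [14, 16, 14, 13, 17, 13]
teller_c = [11, 14, 21, 13, 16, 18]
teller_d = [24, 19, 21, 26, 20]

result = one_way_anova([teller_a, teller_b, teller_c, teller_d])
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

Running it prints SSB = 255.62, SSW = 158.20, SST = 413.82, MSB = 85.21, MSW = 8.79 and F = 9.69, the F value used in Step 5. Passing the three lists of teaching-method scores from the first example instead gives F = 1.09.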
Slide 49
Slide 50
ANOVA Table
Slide 51
Step 5: The value of the test statistic is F = 9.69. It is greater than the critical value of F = 3.16, so it falls in the rejection region. Consequently, we reject the null hypothesis and conclude that the mean number of customers served per hour is not the same for all four tellers.
Slide 52
Significance of mean effect
When there is a significant difference, a post hoc test is performed. Post hoc is Latin for "after this" (it is the first part of the phrase post hoc, ergo propter hoc, "after this, therefore because of this"); here it refers to tests carried out after a significant F test. A post hoc test consists of pairwise comparisons designed to compare all the different combinations of the treatment groups: it takes every pair of groups and performs a t-type test on each pair.
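For readers without SPSS, a comparable Tukey HSD analysis (the Tukey-Kramer form, which handles the unequal group sizes here) can be sketched with `scipy.stats.tukey_hsd`, assuming SciPy 1.8 or later is installed. The teller data come from Example 2 (Step 4); the dictionary layout and the A, B, C, D ordering are illustrative choices. The differences, p-values and intervals should agree closely with the SPSS output that follows, although small rounding differences are possible.

```python
from scipy.stats import tukey_hsd

# Customers served per observed hour for each teller (Example 2, Step 4)
tellers = {
    "A": [19, 21, 26, 24, 18],
    "B": [14, 16, 14, 13, 17, 13],
    "C": [11, 14, 21, 13, 16, 18],
    "D": [24, 19, 21, 26, 20],
}
names = list(tellers)

# Tukey HSD on all pairs; statistic[i, j] is mean(group i) - mean(group j)
res = tukey_hsd(*tellers.values())
ci = res.confidence_interval(confidence_level=0.95)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        flag = "*" if res.pvalue[i, j] < 0.05 else ""  # mark significant pairs
        print(f"Teller {names[i]} - Teller {names[j]}: "
              f"diff = {res.statistic[i, j]:.3f}{flag}, p = {res.pvalue[i, j]:.3f}, "
              f"95% CI = ({ci.low[i, j]:.2f}, {ci.high[i, j]:.2f})")
```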
Slide 53
Post hoc results in SPSS
SPSS was used to perform a post hoc test on the results of the previous example, in which the F test revealed a significant difference among the four groups. The results of the post hoc test are as follows.
Slide 54
(I) Teller | (J) Teller | Mean difference (I - J) | Std. error | Sig. | 95% CI lower bound | 95% CI upper bound
Teller A | Teller B | 7.100* | 1.795 | .005 | 2.03 | 12.17
Teller A | Teller C | 6.100* | 1.795 | .015 | 1.03 | 11.17
Teller A | Teller D | -.400 | 1.875 | .995 | -5.70 | 4.90
Teller B | Teller A | -7.100* | 1.795 | .005 | -12.17 | -2.03
Teller B | Teller C | -1.000 | 1.712 | .936 | -5.84 | 3.84
Teller B | Teller D | -7.500* | 1.795 | .003 | -12.57 | -2.43
Teller C | Teller A | -6.100* | 1.795 | .015 | -11.17 | -1.03
Teller C | Teller B | 1.000 | 1.712 | .936 | -3.84 | 5.84
Teller C | Teller D | -6.500* | 1.795 | .010 | -11.57 | -1.43
Teller D | Teller A | .400 | 1.875 | .996 | -4.90 | 5.70
Teller D | Teller B | 7.500* | 1.795 | .003 | 2.43 | 12.57
Teller D | Teller C | 6.500* | 1.795 | .010 | 1.43 | 11.57
* The mean difference is significant at the .05 level.
Slide 55
Tukey HSD
This test displays subsets of groups whose means do not differ significantly. Here the Tukey test creates two subsets of groups with statistically similar means.
Teller | N | Subset 1 | Subset 2
Teller B | 6 | 14.50 |
Teller C | 6 | 15.50 |
Teller A | 5 | | 21.60
Teller D | 5 | | 22.00
Sig. | | .943 | .996