mervin-knight
1
Chapter 13: Analysis of Variance
2
Chapter Outline
An introduction to experimental design and analysis of variance
Analysis of Variance and the completely randomized design
3
An Introduction to Experimental Design and Analysis of Variance
Statistical studies can be classified as being either experimental or observational.
In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.
In an observational study, no attempt is made to control the factors.
Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.
Analysis of variance (ANOVA) can be used to analyze the data obtained from experimental or observational studies.
4
An Introduction to Experimental Design and Analysis of Variance
Three types of experimental designs are introduced:
• A completely randomized design
• A randomized block design
• A factorial experiment
5
An Introduction to Experimental Design and Analysis of Variance
A factor is a variable that the experimenter has selected for investigation.
A treatment is a level of a factor. For example, if location is a factor, then a treatment of location can be New York, Chicago, or Seattle.
Experimental units are the objects of interest in the experiment.
A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
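The random assignment that defines a completely randomized design can be sketched in a few lines of Python. This is only an illustration: the function name `assign_treatments`, the 15 numbered units, and the seed are assumptions made for the example and do not come from the chapter.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to experimental units, as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order removes any systematic pattern
    # Deal the shuffled units round-robin into one group per treatment.
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

# Hypothetical example: 15 numbered units and three location treatments.
groups = assign_treatments(range(1, 16), ["New York", "Chicago", "Seattle"], seed=1)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Each unit ends up in exactly one group, and the shuffle (not the experimenter) decides which one.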
6
Analysis of Variance: A Conceptual Overview
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means.
Data obtained from observational or experimental studies can be used for the analysis.
We want to use the sample results to test the following hypothesis:
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
7
Analysis of Variance: A Conceptual Overview
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
If H0 is rejected, we cannot conclude that all population means are different.
Rejecting H0 means that at least two population means have different values.
8
Analysis of Variance: A Conceptual Overview
Assumptions for Analysis of Variance
For each population, the response (dependent) variable is normally distributed.
The variance of the response variable, denoted σ², is the same for all of the populations.
The observations must be independent.
9
Analysis of Variance: A Conceptual Overview
Sampling Distribution of x̄ Given H0 is True
[Figure: a single sampling distribution of x̄ centered at µ, with variance σ²/n; the sample means x̄1, x̄2, and x̄3 are marked on the axis.]
If H0 is true, all the populations have the same mean. It is also assumed that all the populations have the same variance. Therefore, all the sample means are drawn from the same sampling distribution. As a result, the sample means tend to be close to one another.
Sample means are likely to be close to the same population mean if H0 is true.
10
Analysis of Variance: A Conceptual Overview
Sampling Distribution of x̄ Given H0 is False
[Figure: three sampling distributions of x̄ centered at µ1, µ2, and µ3; the sample means x̄1, x̄2, and x̄3 are marked on the axis.]
When H0 is false, sample means are drawn from different populations. As a result, sample means tend NOT to be close together. Instead, they tend to be close to their own population means.
11
Analysis of Variance
Between-treatments estimate of population variance
Within-treatments estimate of population variance
Comparing the variance estimates: The F test
ANOVA table
12
Between-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample means is called the mean square due to treatments and is denoted by MSTR.
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
The numerator is called the sum of squares due to treatments (SSTR).
The denominator is the degrees of freedom associated with SSTR.
13
Between-Treatments Estimate of Population Variance σ²
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
• k is the number of treatments (# of samples)
• $n_j$ is the number of observations in treatment j
• $\bar{x}_j$ is the sample mean of treatment j
• $\bar{\bar{x}}$ is the overall mean, i.e. the average value of ALL the observations from all the treatments:
$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$$
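As a rough sketch of how the MSTR formula translates into a computation, the snippet below evaluates it for a small made-up data set with unequal sample sizes. The function name `mstr` and the data are assumptions for illustration only. Note that the overall mean is taken over all observations, which differs from the plain average of the sample means when the sample sizes differ.

```python
def mstr(samples):
    """Mean square due to treatments; `samples` holds one list per treatment."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Overall mean: average of ALL observations, not of the sample means.
    overall_mean = sum(sum(s) for s in samples) / n_total
    # SSTR: squared distance of each sample mean from the overall mean,
    # weighted by that sample's size n_j.
    sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in samples)
    return sstr / (k - 1)

# Hypothetical data (not from the chapter) with n_j = 2, 3, 1.
samples = [[2.0, 4.0], [6.0, 8.0, 10.0], [5.0]]
print(mstr(samples))
```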
14
Within-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample observations within each sample is called the mean square due to error and is denoted by MSE.
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
The numerator is called the sum of squares due to error (SSE).
The denominator is the degrees of freedom associated with SSE.
15
Within-Treatments Estimate of Population Variance σ²
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
• k is the number of treatments (# of samples)
• $n_j$ is the number of observations in treatment j
• $s_j^2$ is the sample variance of treatment j
• $n_T$ is the total number of ALL the observations from all the treatments:
$$n_T = \sum_{j=1}^{k} n_j$$
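The MSE formula can be read as a weighted average of the sample variances, with weights $n_j - 1$. The sketch below illustrates this with made-up data; the function name `mse` and the numbers are assumptions for the example, and `statistics.variance` from the Python standard library supplies the sample variance (n - 1 divisor).

```python
from statistics import variance

def mse(samples):
    """Mean square due to error; `samples` holds one list per treatment."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # SSE: each sample variance weighted by its degrees of freedom (n_j - 1).
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sse / (n_total - k)

# Hypothetical data (not from the chapter); every sample needs at least
# two observations so that its variance is defined.
samples = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 5.0, 8.0, 10.0]]
print(mse(samples))
```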
16
Comparing the Variance Estimates: The F Test
Because the within-treatments estimate (MSE) of σ² involves only sample variances, all of which are unbiased estimates of the population variance (under the assumptions, all the population variances are equal), MSE is a good estimate of the population variance regardless of whether H0 is true.
On the other hand, the between-treatments estimate (MSTR), which uses the sample means, is a good estimate of σ² only if H0 is true, since all the sample means are drawn from the same population when H0 is true.
When H0 is false, the sample means are drawn from different populations (with different µ). Therefore, MSTR will overestimate σ² since the sample means will not be close together.
17
Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with numerator degrees of freedom (d.f.) equal to k - 1 (from MSTR) and denominator d.f. equal to nT - k (from MSE).
If H0 is true, MSTR/MSE should be close to 1 since both are good estimates of σ².
If H0 is false, i.e. if the means of the k populations are not all equal, the ratio MSTR/MSE will tend to be larger than 1 since MSTR overestimates σ².
Hence, we will reject H0 if the value of MSTR/MSE is too large to have plausibly occurred by chance under the appropriate F distribution.
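A small simulation can make this behavior concrete. The sketch below draws repeated normal samples and averages the ratio MSTR/MSE, once with equal population means and once with unequal ones. Everything here is an assumption for illustration: the population means, σ = 5, the sample size of 5, the number of trials, and the function names are not taken from the chapter.

```python
import random
from statistics import mean, variance

def f_ratio(samples):
    """Compute MSTR/MSE for a list of samples (one list per treatment)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / n_total
    mstr = sum(len(s) * (mean(s) - overall_mean) ** 2 for s in samples) / (k - 1)
    mse = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)
    return mstr / mse

def average_f(population_means, n=5, sigma=5.0, trials=2000, seed=0):
    """Average MSTR/MSE over many simulated data sets (hypothetical settings)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)]
                   for mu in population_means]
        ratios.append(f_ratio(samples))
    return mean(ratios)

f_equal = average_f([60, 60, 60])    # H0 true: all population means equal
f_unequal = average_f([55, 68, 57])  # H0 false: population means differ
print(f_equal, f_unequal)
```

With equal means the average ratio stays on the order of 1, while with unequal means it is much larger, which is the pattern the F test exploits.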
18
Comparing the Variance Estimates: The F Test
Sampling Distribution of MSTR/MSE
[Figure: sampling distribution of MSTR/MSE (an F distribution). Values of MSTR/MSE below the critical value Fα fall in the "Do Not Reject H0" region; values above it fall in the "Reject H0" region in the upper tail.]
19
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square            F          p-Value
Treatments            SSTR             k - 1                MSTR = SSTR/(k - 1)    MSTR/MSE
Error                 SSE              nT - k               MSE = SSE/(nT - k)
Total                 SST              nT - 1
SST is partitioned into SSTR and SSE.
SST's degrees of freedom (d.f.) are partitioned into SSTR's d.f. and SSE's d.f.
20
ANOVA Table
SST divided by its degrees of freedom nT - 1 is the overall sample variance that would be obtained if we treated the entire set of observations as one data set.
With the entire data set as one sample, the formula for computing the total sum of squares, SST, is:
$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$$
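The slide states the partition SST = SSTR + SSE without justification. One standard way to see it, sketched here using the same symbols as the MSTR and MSE formulas, is to split each deviation from the overall mean into a within-treatment part and a between-treatment part:

```latex
\begin{align*}
\text{SST}
  &= \sum_{j=1}^{k} \sum_{i=1}^{n_j}
     \bigl[(x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{\bar{x}})\bigr]^2 \\
  &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
   + \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2
   + 2 \sum_{j=1}^{k} (\bar{x}_j - \bar{\bar{x}})
       \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) \\
  &= \sum_{j=1}^{k} (n_j - 1) s_j^2
   + \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2
   = \text{SSE} + \text{SSTR}
\end{align*}
```

The cross term vanishes because the deviations within each treatment sum to zero, $\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) = 0$, and the first term equals SSE because $(n_j - 1) s_j^2 = \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$.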
21
ANOVA Table
ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error.
Dividing the sum of squares by the appropriate degrees of freedom provides the variance estimates.
The F value (MSTR/MSE) is used to test the hypothesis of equal population means.
22
Test for the Equality of k Population Means
Hypotheses
H0: µ1 = µ2 = µ3 = . . . = µk
Ha: Not all population means are equal
Test Statistic
F = MSTR/MSE
23
Test for the Equality of k Population Means
Rejection Rule
p-value Approach: Reject H0 if p-value < α
Critical Value Approach: Reject H0 if F > Fα
where the value of Fα is based on an F distribution with k - 1 numerator d.f. and nT - k denominator d.f.
24
Test for the Equality of k Population Means: An Observational Study
Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). An F test will be conducted using α = .05.
25
Test for the Equality of k Population Means: An Observational Study
Example: Reed Manufacturing
A simple random sample of five managers from each of the three plants was taken and the number of hours worked by each manager in the previous week is shown on the next slide.
Factor . . . Manufacturing plant
Treatments . . . Buffalo, Pittsburgh, Detroit
Experimental units . . . Managers
Response variable . . . Number of hours worked
26
Test for the Equality of k Population Means: An Observational Study
Observation       Plant 1 Buffalo   Plant 2 Pittsburgh   Plant 3 Detroit
1                 48                73                   51
2                 54                63                   63
3                 57                66                   61
4                 54                64                   54
5                 62                74                   56
Sample Mean       55                68                   57
Sample Variance   26.0              26.5                 24.5
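The sample means and sample variances in the last two rows can be reproduced directly from the observations. The following is a minimal sketch using Python's standard `statistics` module; the variable names `hours` and `summary` are illustrative choices, not part of the example.

```python
from statistics import mean, variance

# Hours worked in the previous week, five managers per plant (from the table).
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

# variance() uses the n - 1 divisor, i.e. it is the sample variance.
summary = {plant: (mean(x), variance(x)) for plant, x in hours.items()}
for plant, (sample_mean, sample_var) in summary.items():
    print(f"{plant}: mean = {sample_mean:.1f}, variance = {sample_var:.1f}")
```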
27
Test for the Equality of k Population Means: An Observational Study
1. Develop the hypotheses.
H0: µ1 = µ2 = µ3
Ha: Not all the means are equal
where:
µ1 = mean number of hours worked per week by the managers at Plant 1
µ2 = mean number of hours worked per week by the managers at Plant 2
µ3 = mean number of hours worked per week by the managers at Plant 3
28
Test for the Equality of k Population Means: An Observational Study
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
Mean Square Due to Treatments
x̿ = (x̄1 + x̄2 + x̄3)/3 = (55 + 68 + 57)/3 = 60
(The overall mean equals the average of the sample means only when all the sample sizes are equal.)
SSTR = 5(55 - 60)² + 5(68 - 60)² + 5(57 - 60)² = 490
MSTR = 490/(3 - 1) = 245
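The arithmetic for the mean square due to treatments can be checked with a short script. This sketch works from the summary statistics only (sample size 5 and sample means 55, 68, 57); the variable names are illustrative.

```python
n = 5                        # observations per plant
sample_means = [55, 68, 57]  # Buffalo, Pittsburgh, Detroit
k = len(sample_means)

# Averaging the sample means gives the overall mean only because
# every plant has the same sample size.
overall_mean = sum(sample_means) / k
sstr = sum(n * (m - overall_mean) ** 2 for m in sample_means)
mstr = sstr / (k - 1)
print(overall_mean, sstr, mstr)
```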
29
Test for the Equality of k Population Means: An Observational Study
3. Compute the value of the test statistic. (con't.)
Mean Square Due to Error
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 - 3) = 25.667
F = MSTR/MSE = 245/25.667 = 9.55
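The same kind of check works for the mean square due to error and the F statistic. This sketch again uses only the summary statistics (sample variances 26.0, 26.5, 24.5) and takes MSTR = 245 as given from the previous step; the variable names are illustrative.

```python
n = 5                                # observations per plant
sample_variances = [26.0, 26.5, 24.5]
k = len(sample_variances)
n_total = n * k                      # 15 observations in all

# Each sample variance is weighted by its degrees of freedom, n - 1 = 4.
sse = sum((n - 1) * s2 for s2 in sample_variances)
mse = sse / (n_total - k)

mstr = 245.0                         # mean square due to treatments, computed earlier
f_stat = mstr / mse
print(sse, round(mse, 3), round(f_stat, 2))
```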
30
Test for the Equality of k Population Means: An Observational Study
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F      p-Value
Treatment             490              2                    245           9.55   .0033
Error                 308              12                   25.667
Total                 798              14
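The Total row can be verified independently by computing SST straight from the 15 observations and confirming that it splits into the treatment and error sums of squares. This is a sketch with illustrative variable names; the data are the hours from the example.

```python
hours = [
    [48, 54, 57, 54, 62],  # Buffalo
    [73, 63, 66, 64, 74],  # Pittsburgh
    [51, 63, 61, 54, 56],  # Detroit
]
all_obs = [x for sample in hours for x in sample]
k, n_total = len(hours), len(all_obs)
overall_mean = sum(all_obs) / n_total

# Total, treatment, and error sums of squares, each computed directly.
sst = sum((x - overall_mean) ** 2 for x in all_obs)
sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in hours)
sse = sum((x - sum(s) / len(s)) ** 2 for s in hours for x in s)

print("Treatment", sstr, k - 1)
print("Error    ", sse, n_total - k)
print("Total    ", sst, n_total - 1)
```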
31
Test for the Equality of k Population Means: An Observational Study
p-Value Approach
4. Compute the p-value.
With 2 numerator d.f. and 12 denominator d.f., the p-value is .0033 for F = 9.55.
5. Determine whether to reject H0.
The p-value < .05, so we reject H0.
We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
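In general the p-value comes from F tables or statistical software. For the special case of 2 numerator d.f., however, the upper-tail area of the F distribution has a simple closed form, P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator d.f. The sketch below uses that special case to check the reported p-value; the function name is hypothetical, and the formula does not apply to other numerator d.f.

```python
def f_upper_tail_2_numerator_df(f_stat, denom_df):
    """P(F > f_stat) for an F distribution with exactly 2 numerator d.f."""
    return (1 + 2 * f_stat / denom_df) ** (-denom_df / 2)

f_stat = 245 / (308 / 12)  # MSTR/MSE from the example
p_value = f_upper_tail_2_numerator_df(f_stat, 12)
alpha = 0.05
print(round(p_value, 4), "reject H0" if p_value < alpha else "do not reject H0")
```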
32
Test for the Equality of k Population Means: An Observational Study
Critical Value Approach
4. Determine the critical value and rejection rule.
Based on an F distribution with 2 numerator d.f. and 12 denominator d.f., F.05 = 3.89.
Reject H0 if F > 3.89 (critical value)
5. Determine whether to reject H0.
Because F = 9.55 > 3.89, we reject H0.
We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
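The critical value is normally read from an F table. As a cross-check for this example only, the closed-form tail area that holds for 2 numerator d.f., P(F > f) = (1 + 2f/d)^(-d/2), can be solved for f to give Fα = (d/2)(α^(-2/d) - 1). The function name below is hypothetical, and the shortcut is valid only when the numerator d.f. equals 2.

```python
def f_critical_2_numerator_df(alpha, denom_df):
    """Upper-tail critical value of an F distribution with exactly 2 numerator d.f."""
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)

f_crit = f_critical_2_numerator_df(0.05, 12)
f_stat = 245 / (308 / 12)  # MSTR/MSE from the example
print(round(f_crit, 2), "reject H0" if f_stat > f_crit else "do not reject H0")
```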