© Copyright 2004, Alan Marshall 1
Lectures 15/16
Analysis of Variance

ANOVA
>ANOVA stands for ANalysis Of VAriance
>ANOVA allows us to:
• Do multiple tests at one time
–more than two groups
• Test for multiple effects simultaneously
–more than one variable

ANOVA Tests
The types of ANOVA we will look at are:
>One-Way ANOVA
>Randomized block design ANOVA
>Two-Factor ANOVA
>We will also see ANOVA in regression analysis

One-Way ANOVA
>One-way ANOVA allows us to test simultaneously whether two or more population means are equal
HO: µ1 = µ2 = µ3
HA: At least two means differ

ANOVA assumptions
>All populations are normally distributed
>The population variances are equal
• ANOVA tests assume that variances can be pooled
>The observations are independent

Example
>We are interested in seeing if the advertising strategies employed in three cities made a difference
>We assume that the three cities have been shown to be similar in the past
>The sales results for 20 weeks in each of the three cities are displayed on the next slide

Example Data
City 1 (Convenience)   City 2 (Quality)   City 3 (Price)
529   498              804   492          672   691
658   663              630   719          531   733
793   604              774   787          443   698
514   495              717   699          596   776
663   485              679   572          602   561
719   557              604   523          502   572
711   353              620   584          659   469
606   557              697   634          689   581
461   542              706   580          675   679
529   614              615   624          512   532

Terminology
>We have a response variable, the level of weekly sales
>There is one factor, the advertising strategy, with three levels or treatments, one used in each of the three cities

Means and Grand Mean
City 1 (Convenience)   City 2 (Quality)   City 3 (Price)
529   498              804   492          672   691
658   663              630   719          531   733
793   604              774   787          443   698
514   495              717   699          596   776
663   485              679   572          602   561
719   557              604   523          502   572
711   353              620   584          659   469
606   557              697   634          689   581
461   542              706   580          675   679
529   614              615   624          512   532
Mean 577.55            Mean 653.00        Mean 608.65
Grand Mean 613.067
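The treatment means and the grand mean can be reproduced with a short Python sketch. The data are transcribed from the Example Data slide; the variable names (`sales`, `group_means`, `grand_mean`) are illustrative choices, not part of the lecture.

```python
from statistics import mean

# Weekly sales (20 weeks per city), transcribed from the Example Data slide.
sales = {
    "Convenience (City 1)": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                             498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality (City 2)": [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                         492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price (City 3)": [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                       691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

# One mean per treatment, plus the mean of all 60 observations.
group_means = {name: mean(values) for name, values in sales.items()}
all_values = [x for values in sales.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name}: {m:.2f}")
print(f"Grand mean: {grand_mean:.3f}")
```

Because every city has the same number of weeks, the grand mean here is also the simple average of the three treatment means.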

Discussion
>There are differences between the means, but we are not sure if they are significant.
>We could also observe that there is an amount of variation about the grand mean
• Some of this variation is explained by the treatments (advertising strategies)
• Some remains unexplained

Sum of Squares
>In all forms of ANOVA, we analyze the SUMS OF SQUARES
• essentially, the numerator in the variance calculation

Sum of Squares Between (SSB)
>The difference between each of the treatment (or factor) means and the grand mean is squared, multiplied by the number of responses for that treatment, and summed across treatments
>If the treatment means equaled the grand mean, the SSB would be 0
\[ SSB = \sum_{i=1}^{k} n_i \left( \bar{x}_i - \bar{\bar{x}} \right)^2, \qquad i = 1, 2, 3, \ldots, k \ \text{(group numbers)} \]
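As a minimal sketch of the SSB formula, the following Python snippet uses the treatment means and sample sizes from the advertising example. The variable names are illustrative assumptions.

```python
# Summary statistics taken from the example slides.
group_means = [577.55, 653.00, 608.65]   # Convenience, Quality, Price
group_sizes = [20, 20, 20]               # weeks observed per city

# Grand mean as the size-weighted average of the treatment means.
grand_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / sum(group_sizes)

# Each treatment contributes n_i * (mean_i - grand_mean)^2 to SSB.
contributions = [n * (m - grand_mean) ** 2 for n, m in zip(group_sizes, group_means)]
ssb = sum(contributions)

print([round(c, 1) for c in contributions])
print(round(ssb, 1))
```

The three contributions correspond to the "Between Samples" entries in the worked example, and their sum is the SSB reported in the ANOVA table.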

Sum of Squares Within (SSW)
>The unexplained variation, SSW, is the sum of the residual variation around the treatment means
>Since for each treatment, s² = SS/(n-1), we can also get the SSW by summing (n-1)s² for each treatment
\[ SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_i \right)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 \]
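The two forms of the SSW calculation can be checked against each other. The sketch below does this for City 1 (data transcribed from the Example Data slide), then assembles the overall SSW from the three sample variances reported in the example. Variable names are illustrative.

```python
from statistics import mean, variance

# City 1 (Convenience) weekly sales, transcribed from the Example Data slide.
city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]

# Form 1: squared deviations around the treatment mean.
city1_mean = mean(city1)
ss_direct = sum((x - city1_mean) ** 2 for x in city1)

# Form 2: (n - 1) times the sample variance.
ss_from_variance = (len(city1) - 1) * variance(city1)

# SSW for the whole example, from the three sample variances on the slides.
variances = [10775.0, 7238.105, 8670.239]
group_sizes = [20, 20, 20]
ssw = sum((n - 1) * s2 for n, s2 in zip(group_sizes, variances))

print(round(ss_direct, 2), round(ss_from_variance, 2))
print(round(ssw, 1))
```

The first line of output shows that both forms give the same sum of squares for City 1. The second is the SSW used in the ANOVA table, up to the rounding of the reported variances.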

Mean Squares
\[ MSB = \frac{SSB}{k - 1}, \qquad MSW = \frac{SSW}{N - k} \]
>The Mean Square for Treatments (i.e., between groups) is the SSB divided by the number of treatments minus 1
>The Mean Square Within is the SSW divided by the sample size minus the number of treatments

The Test Statistic
\[ F = \frac{MSB}{MSW}, \qquad MSB = \frac{SSB}{k - 1}, \qquad MSW = \frac{SSW}{N - k} \]
>When HO is true, the ratio of the MSB to the MSW is distributed according to an F distribution, with:
ν1 = df1 = (k - 1) and
ν2 = df2 = (N - k)
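The mean squares and the F statistic follow mechanically from the two sums of squares. Here is a minimal sketch using the values from the advertising example (k = 3 treatments, N = 60 observations). The helper name `one_way_f` is hypothetical, and the critical value is the one reported in the Excel output for this example rather than computed here.

```python
def one_way_f(ssb, ssw, k, n_total):
    """Return (MSB, MSW, F) for a one-way ANOVA with k groups and n_total observations."""
    msb = ssb / (k - 1)          # numerator df: k - 1
    msw = ssw / (n_total - k)    # denominator df: N - k
    return msb, msw, msb / msw

msb, msw, f_stat = one_way_f(ssb=57512.23, ssw=506983.5, k=3, n_total=60)

# Critical value for df = (2, 57) as reported by Excel for this example (alpha = 0.05).
f_crit = 3.158846

print(f"MSB = {msb:.2f}, MSW = {msw:.3f}, F = {f_stat:.4f}")
print("Reject HO" if f_stat > f_crit else "Do not reject HO")
```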

Example
                    City 1        City 2         City 3
                    Convenience   Quality        Price
Mean                577.55        653.00         608.65
Grand Mean          613.067
Between Samples     25228.7       31893.4        390.139
Grand Total (SSB)   57512.2
Within Samples      s1² = 10775   s2² = 7238.11  s3² = 8670.24
Grand Total (SSW)   506984
MSB = 28756.1       MSW = 8894.45
F = 3.23304         p-value 0.04677
The Between Samples entry for City 1 (Convenience) is computed as:
\[ 20\,(577.55 - 613.067)^2 = 25228.7 \]

Interpretation
>Since P(F > 3.23) = 0.0468 < α = 0.05, we reject HO: µ1 = µ2 = µ3
>There is enough evidence to infer that the mean weekly sales differ between the cities.

ANOVA Table
Standard ANOVA Table
Source of Variation   SS          df      Mean Square         F-Statistic
Between Samples       SSB         k - 1   MSB = SSB/(k - 1)   F = MSB/MSW
Within Samples        SSW         N - k   MSW = SSW/(N - k)
Total                 SST         N - 1

Example
Source of Variation   SS          df      Mean Square         F-Statistic
Between Samples       57,512.2    2       28756.11667         3.233041411
Within Samples        506,983.5   57      8894.447368
Total                 564,495.7   59

Excel Output
Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775
Quality       20      13060   653       7238.105
Price         20      12173   608.65    8670.239

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        57512.23   2    28756.12   3.233041   0.046773   3.158846
Within Groups         506983.5   57   8894.447
Total                 564495.7   59

Required Conditions
>Each treatment (sub-sample) must be normal and the variances equal
>Our tests are crude: eyeball tests
• Looking at the histograms
–if not clearly non-normal, assume normal
–text uses box and whisker plots
• Looking at the variances
–if not very different, assume the same

Formulae: Single Factor ANOVA
Source of Variation   SS    df      MS                  F
Between Groups        SSB   k – 1   MSB = SSB/(k – 1)   F = MSB/MSW
Within Groups         SSW   N – k   MSW = SSW/(N – k)
Total                 SST   N – 1

Example L13, Slides 15-17 Revisited
>If we are simply looking at two samples, and want to see if their means are equal, we can perform the t-test we did in Lecture 14, or an F-test
>This question was examining if there were differences between Prof. Goodstat’s morning and afternoon classes.

Example
>Prof. Goodstat has two classes, one at 8:30 and one at 1:00. On the midterm, the morning class of 45 students had a mean of 70 and a standard deviation of 12, while the afternoon class of 40 had a mean of 75 and a standard deviation of 13. Is there evidence at α = 0.05 that the two classes are different?

Example
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(70 - 75) - 0}{\sqrt{\dfrac{12^2}{45} + \dfrac{13^2}{40}}} = \frac{-5}{2.725} = -1.835 > -1.96 \]
Do Not Reject
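The unpooled calculation can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `unpooled_t` and its parameter names are hypothetical, and the 1.96 cutoff is the one used on the slide.

```python
from math import sqrt

def unpooled_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic with separate (unpooled) variance estimates."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Morning class: n = 45, mean 70, sd 12. Afternoon class: n = 40, mean 75, sd 13.
t_stat = unpooled_t(70, 12, 45, 75, 13, 40)
critical = 1.96  # two-tailed cutoff used on the slide for alpha = 0.05

print(f"t = {t_stat:.3f}")
print("Reject HO" if abs(t_stat) > critical else "Do not reject HO")
```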

Example - If Pooled
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{44(144) + 39(169)}{45 + 40 - 2} = 155.747 \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(70 - 75) - 0}{\sqrt{155.747 \left( \dfrac{1}{45} + \dfrac{1}{40} \right)}} = \frac{-5}{2.711} = -1.8437 > -1.96 \]
Do Not Reject
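The pooled version differs only in how the variance is estimated. A minimal sketch follows, with hypothetical helper names `pooled_variance` and `pooled_t`:

```python
from math import sqrt

def pooled_variance(sd1, n1, sd2, n2):
    """Weighted average of the two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic assuming equal population variances."""
    sp2 = pooled_variance(sd1, n1, sd2, n2)
    return ((mean1 - mean2) - hypothesized_diff) / sqrt(sp2 * (1 / n1 + 1 / n2))

sp2 = pooled_variance(12, 45, 13, 40)
t_stat = pooled_t(70, 12, 45, 75, 13, 40)

print(f"pooled variance = {sp2:.3f}")
print(f"t = {t_stat:.4f}")
```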

Example - Using ANOVA
            Morning     Afternoon   Overall
Means       70          75          72.35294
Variances   144         169

            Morning     Afternoon   SS         df   MS         F        p-value
Between     249.13495   280.2768    529.4118   1    529.4118   3.3992   0.068798
Within      144         169         12927      83   155.747

When we did the example using the pooled t-test, |t| = 1.8437 (p-value 0.068798)
(tα/2,df)² = Fα,1,df
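The equivalence between the pooled t-test and the two-group ANOVA can be checked numerically from the same summary statistics. This is a sketch with illustrative variable names. Note that the MSW of the ANOVA is exactly the pooled variance of the t-test.

```python
from math import sqrt

# Summary statistics for the two classes (variance = standard deviation squared).
n1, mean1, var1 = 45, 70.0, 144.0   # morning
n2, mean2, var2 = 40, 75.0, 169.0   # afternoon

# One-way ANOVA with k = 2 groups, built from the summary statistics.
overall_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
ssb = n1 * (mean1 - overall_mean) ** 2 + n2 * (mean2 - overall_mean) ** 2
ssw = (n1 - 1) * var1 + (n2 - 1) * var2
msb = ssb / (2 - 1)
msw = ssw / (n1 + n2 - 2)           # identical to the pooled variance
f_stat = msb / msw

# Pooled t statistic for the same data.
t_stat = (mean1 - mean2) / sqrt(msw * (1 / n1 + 1 / n2))

print(f"F = {f_stat:.4f}, t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

With only two groups the squared pooled t statistic equals the F statistic, which is why both tests give the same p-value.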

Block Design

Terminology
>Randomized Complete Block ANOVA (Text’s terminology)
>Two-way ANOVA without replication (Excel’s terminology)
>Other:
• Randomized Block Design
• Block design

Block Design
>This is similar to the matched pair experiment, but with more than two treatments
• We will have three or more treatments
• The matched pair can be viewed as a randomized block design with only two treatments

Example
>Three fertilizers have been tested in 20 plots
>The crop yields are shown in the table below
>We want to test for variation between fertilizers, but
>We could have variation between the plots
Plot   Fertilizer A   Fertilizer B   Fertilizer C
1      563            588            575
2      593            624            593
3      542            576            564
4      649            672            653
5      565            583            556
6      587            612            590
7      595            617            607
8      429            446            423
9      500            515            483
10     610            641            626
11     524            547            523
12     559            586            568
13     546            582            551
14     503            530            502
15     550            573            567
16     492            518            495
17     497            529            513
18     619            643            626
19     473            497            479
20     533            556            540

Example
>This is a two-way ANOVA without replication, or block design, since the researcher is controlling for differences that may exist between plots of land
>Thus the first row (block) represents the three different fertilizers in plot #1, the second, plot #2, etc.
>Notice the similarity to the Matched Pairs Design

Example
>Columns (Fertilizers):
• HO: µ1 = µ2 = µ3
• HA: At least one is not equal
• α = 0.01
• Rejection region: F > F.01,2,38 = 5.21 (Excel)
>Rows (Plots):
• HO: All are equal
• HA: At least one is not equal
• α = 0.01
• Rejection region: F > F.01,19,38 = 2.42 (Excel)

Example
>Note that if there are no significant differences between the blocks (rows), then the single factor test would be more appropriate.

Example - Excel Output
Anova: Two-Factor Without Replication

SUMMARY   Count   Sum     Average    Variance
1         3       1726    575.3333   156.3333
2         3       1810    603.3333   320.3333
.         .       .       .          .
.         .       .       .          .
.         .       .       .          .
20        3       1629    543        139

Fert-A    20      10929   546.45     2953.945
Fert-B    20      11435   571.75     3104.197
Fert-C    20      11034   551.7      3339.905

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  177464.6   19   9340.242   323.1623   3.67E-36   2.42147
Columns               7131.033   2    3565.517   123.363    2.41E-17   5.21123
Error                 1098.3     38   28.90263
Total                 185693.9   59
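The sums of squares in the Excel output can be reproduced directly from the fertilizer data. The sketch below implements the randomized block decomposition (rows, columns, error). The data are transcribed from the fertilizer example slide, and the variable names are illustrative.

```python
# Crop yields: one tuple per plot (block), ordered as Fertilizer A, B, C.
yields = [
    (563, 588, 575), (593, 624, 593), (542, 576, 564), (649, 672, 653),
    (565, 583, 556), (587, 612, 590), (595, 617, 607), (429, 446, 423),
    (500, 515, 483), (610, 641, 626), (524, 547, 523), (559, 586, 568),
    (546, 582, 551), (503, 530, 502), (550, 573, 567), (492, 518, 495),
    (497, 529, 513), (619, 643, 626), (473, 497, 479), (533, 556, 540),
]

b = len(yields)       # number of blocks (plots)
k = len(yields[0])    # number of treatments (fertilizers)

grand_mean = sum(sum(row) for row in yields) / (b * k)
row_means = [sum(row) / k for row in yields]
col_means = [sum(row[j] for row in yields) / b for j in range(k)]

# Partition the total variation into blocks, treatments, and error.
ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = b * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in yields for x in row)
ss_error = ss_total - ss_rows - ss_cols

ms_error = ss_error / ((b - 1) * (k - 1))
f_rows = (ss_rows / (b - 1)) / ms_error
f_cols = (ss_cols / (k - 1)) / ms_error

print(f"SS rows = {ss_rows:.1f}, SS columns = {ss_cols:.3f}, SS error = {ss_error:.1f}")
print(f"F rows = {f_rows:.4f}, F columns = {f_cols:.3f}")
```

Both F statistics use the error mean square in the denominator, so removing the plot-to-plot variation from the error term is what makes the fertilizer effect visible.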

Explanation
Source of Variation   SS
Rows                  177464.6   Variation attributable to the plots
Columns               7131.033   Variation attributable to the fertilizer
Error                 1098.3     Unexplained variation
Total                 185693.9   The total amount of variation to be explained

Example
ANOVA
Source of Variation   SS         df   MS         F
Rows                  177464.6   19   9340.242   323.1623
Columns               7131.033   2    3565.517   123.363
Error                 1098.3     38   28.90263
Total                 185693.9   59

Example
>Since the F-Value (123.4) is greater than our critical F-Value (5.21), we reject the null hypothesis that the fertilizers are the same
>Likewise, the F-Value for the plots of land (323.2) exceeds the critical value of 2.42, indicating it was appropriate to use this design
>The same results can be inferred from the low p-values, which are below our significance level, α = 0.01

Discussion
>Without blocking (that is, with a one-way ANOVA), we would not have been able to see the differences between the fertilizers, since the difference would have been “lost” in the variability between the plots.

Using One-way ANOVA
Anova: Single Factor

SUMMARY
Groups         Count   Sum     Average   Variance
Fertilizer A   20      10929   546.45    2953.945
Fertilizer B   20      11435   571.75    3104.197
Fertilizer C   20      11034   551.7     3339.905

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        7131.033   2    3565.517   1.138167   0.327578   3.158846
Within Groups         178562.9   57   3132.682
Total                 185693.9   59
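Comparing the two analyses of the fertilizer data shows where the plot-to-plot variation goes. Using only the values from the two Excel outputs, the one-way "Within Groups" sum of squares is the block design's "Rows" and "Error" sums of squares combined:

```latex
\[ SSW_{\text{one-way}} = SS_{\text{Rows}} + SS_{\text{Error}} = 177464.6 + 1098.3 = 178562.9 \]
\[ F_{\text{one-way}} = \frac{3565.517}{3132.682} = 1.138, \qquad F_{\text{block}} = \frac{3565.517}{28.90263} = 123.363 \]
```

The numerator (the mean square for fertilizers) is the same in both tests. Only the denominator changes, because the block design removes the variation attributable to the plots from the error term.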

The Formulae: Block Design
Source of Variation     SS     df               MS                           F
Between Groups          SSB    k – 1            MSB = SSB/(k – 1)            F = MSB/MSE
Between Blocks          SSBL   b – 1            MSBL = SSBL/(b – 1)          F = MSBL/MSE
Within Groups (Error)   SSW    (k – 1)(b – 1)   MSE = SSW/((k – 1)(b – 1))
Total                   SST    N – 1

Two-Factor ANOVA
AKA Two-way ANOVA with replication

Two Factor ANOVA
>Example extends Single Factor ANOVA
>Suppose in the test market, we decide to investigate the impact of the type of media used: television and newspapers
>Now we have two factors:
• The advertising message (before)
• The advertising medium (added here)

Hypotheses
>For Message:
HO: µA1 = µA2 = µA3
HA: At least two means differ
>For Media:
HO: µB1 = µB2
HA: The two means differ

Example Data
Convenience         Quality             Price
City-1   City-2     City-3   City-4     City-5   City-6
TV       NP         TV       NP         TV       NP
491      464        677      689        575      803
712      559        627      650        614      584
558      759        590      704        706      525
447      557        632      652        484      498
479      528        683      576        478      812
624      670        760      836        650      565
546      534        690      628        583      708
444      657        548      798        536      546
582      557        579      497        579      616
672      474        644      841        795      587

Single Factor Test
>We can perform the single factor test to see if there are differences between the cities.
>On the next slide, we see that there are differences between the cities

Single Factor Test
Anova: Single Factor

SUMMARY
Groups     Count   Sum    Average   Variance
Column 1   10      5555   555.5     8641.389
Column 2   10      5759   575.9     8545.878
Column 3   10      6430   643       3884.667
Column 4   10      6871   687.1     12558.54
Column 5   10      6000   600       9527.556
Column 6   10      6244   624.4     12523.82

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        113620.3   5    22724.06   2.448631   0.045165   2.386066
Within Groups         501136.7   54   9280.309
Total                 614757     59

Two Factor Test
>Knowing that there are differences between the cities, we want to see if both factors are responsible for the differences

Example - Data Rearranged
>We have to reorganize the data to reflect the two factors
>The responses are coloured
• Yellow: Convenience and Television
• Blue: Quality and Television
• etc.
             Convenience   Quality   Price
Television   491           677       575
             712           627       614
             558           590       706
             447           632       484
             479           683       478
             624           760       650
             546           690       583
             444           548       536
             582           579       579
             672           644       795
Newspaper    464           689       803
             559           650       584
             759           704       525
             557           652       498
             528           576       812
             670           836       565
             534           628       708
             657           798       546
             557           497       616
             474           841       587

Output - I
Anova: Two-Factor With Replication

SUMMARY      Convenience   Quality    Price      Total
Television
Count        10            10         10         30
Sum          5555          6430       6000       17985
Average      555.5         643        600        599.5
Variance     8641.388889   3884.667   9527.556   8164.397

Newspaper
Count        10            10         10         30
Sum          5759          6871       6244       18874
Average      575.9         687.1      624.4      629.1333
Variance     8545.877778   12558.54   12523.82   12579.91

Total
Count        20            20         20
Sum          11314         13301      12244
Average      565.7         665.05     612.2
Variance     8250.852632   8300.682   10602.06

Output - ANOVA Table
>There appears to be a difference between the messages
>There is not enough evidence to suggest that the media or the interaction is significant
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                13172.02   1    13172.02   1.419351   0.23872    4.01954
Columns               98838.63   2    49419.32   5.32518    0.007748   3.168246
Interaction           1609.633   2    804.8167   0.086723   0.917058   3.168246
Within                501136.7   54   9280.309
Total                 614757     59
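The two-factor table can be rebuilt from the cell summaries in the Excel output (count, sum, and variance for each media and message combination). The following sketch assumes a balanced design, as in this example. In Excel's labelling, "Sample" is the media factor and "Columns" is the message factor. The helper `level_ss` and the variable names are hypothetical.

```python
# Cell summaries (count, sum, variance) from the "Two-Factor With Replication" output.
cells = {
    ("Television", "Convenience"): (10, 5555, 8641.388889),
    ("Television", "Quality"): (10, 6430, 3884.667),
    ("Television", "Price"): (10, 6000, 9527.556),
    ("Newspaper", "Convenience"): (10, 5759, 8545.877778),
    ("Newspaper", "Quality"): (10, 6871, 12558.54),
    ("Newspaper", "Price"): (10, 6244, 12523.82),
}

media = sorted({m for m, _ in cells})
messages = sorted({s for _, s in cells})
n_total = sum(n for n, _, _ in cells.values())
grand_mean = sum(total for _, total, _ in cells.values()) / n_total

def level_ss(levels, position):
    """Sum over levels of n_level * (level mean - grand mean)^2 for one factor."""
    ss = 0.0
    for level in levels:
        picked = [v for key, v in cells.items() if key[position] == level]
        n = sum(count for count, _, _ in picked)
        level_mean = sum(total for _, total, _ in picked) / n
        ss += n * (level_mean - grand_mean) ** 2
    return ss

ss_media = level_ss(media, 0)        # Excel's "Sample"
ss_message = level_ss(messages, 1)   # Excel's "Columns"
ss_cells = sum(n * (total / n - grand_mean) ** 2 for n, total, _ in cells.values())
ss_interaction = ss_cells - ss_media - ss_message
ss_within = sum((n - 1) * var for n, _, var in cells.values())

a, b = len(media), len(messages)
ms_within = ss_within / (n_total - a * b)
f_media = (ss_media / (a - 1)) / ms_within
f_message = (ss_message / (b - 1)) / ms_within
f_interaction = (ss_interaction / ((a - 1) * (b - 1))) / ms_within

print(f"F media = {f_media:.4f}, F message = {f_message:.4f}, "
      f"F interaction = {f_interaction:.4f}")
```

Only the message F statistic exceeds its reported critical value (3.168246), which matches the conclusion stated above the table.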

The Formulae: Two Factor
Source of Variation   SS     df               MS                             F
Factor A              SSA    a – 1            MSA = SSA/(a – 1)              F = MSA/MSE
Factor B              SSB    b – 1            MSB = SSB/(b – 1)              F = MSB/MSE
Interaction           SSAB   (a – 1)(b – 1)   MSAB = SSAB/((a – 1)(b – 1))   F = MSAB/MSE
Error                 SSE    N – ab           MSE = SSE/(N – ab)
Total                 SST    N – 1

YOU LEARN STATISTICS BY DOING STATISTICS