Analysis of Variance Compares means to determine if the population distributions are not similar Uses means and confidence intervals much like a t-test Test statistic used is called an F statistic (F-test)


Page 1

Analysis of Variance

• Compares sample means to determine whether the populations they come from differ

• Uses means and confidence intervals much like a t-test

• Test statistic used is called an F statistic (F-test)

Page 2

Normal Distribution

• Many characteristics follow a normal distribution
– For example: height, length, speed, etc.

• One of the assumptions of the ANOVA test is that the sample data are ‘normally distributed.’

Page 3

Sample Distribution Approaches Normal Distribution With Sample Size

[Figure: frequency histogram (y-axis: Frequency, 0 to 10) comparing the Population and Sample distributions]

Page 4

Sample Distribution Approaches Normal Distribution With Sample Size

[Figure: frequency histogram (y-axis: Frequency, 0 to 10) comparing the Population and Sample distributions]

Page 5

Sample Distribution Approaches Normal Distribution With Sample Size

[Figure: frequency histogram (y-axis: Frequency, 0 to 10) comparing the Population and Sample distributions]

Page 6

Proc Univariate

• Tests for normality

• Gives you a ‘visual’ of your sample distribution

• SAS code:

proc sort; by location; run;

proc univariate plot normal; by location; var length; run;

Page 7

Tests for Normality

Test                  Statistic         p Value
Shapiro-Wilk          W     0.951978    Pr < W     0.6919
Kolmogorov-Smirnov    D     0.156907    Pr > D     >0.1500
Cramer-von Mises      W-Sq  0.03281     Pr > W-Sq  >0.2500
Anderson-Darling      A-Sq  0.223707    Pr > A-Sq  >0.2500

Important Univariate Output

Each of the four tests above tests for normality. The Shapiro-Wilk and Kolmogorov-Smirnov tests are the two most common. Because all p-values are > 0.05, none of the tests indicates that our sample differs significantly from a normal distribution.

Page 8

Individual   Weight   Mean     (Weight - Mean)²
1            26       28.17    4.7089
2            32       28.17    14.6689
3            25       28.17    10.0489
4            26       28.17    4.7089
5            30       28.17    3.3489
6            30       28.17    3.3489
N = 6        169               SOS = 40.8334
N - 1 = 5

$\text{Mean} = \bar{x} = \frac{\sum x_i}{N}$

$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{N - 1}$

$\text{Standard Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$

$\text{Standard Error} = \frac{SD}{\sqrt{N}}$

Mean = 169 / 6 = 28.17

Range = 25 – 32

SOS = 40.83

Variance = 40.83 / 5 = 8.17

Std. Dev. = √(40.83 / 5) = 2.86

Std. Err. = 2.86 / √6 = 1.17
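The hand calculations on this slide can be checked with a few lines of code. The sketch below is in Python rather than SAS, purely so each arithmetic step is visible; the variable names are illustrative and not part of the original deck.

```python
import math

# Weights of the six individuals from the table on this slide.
weights = [26, 32, 25, 26, 30, 30]

n = len(weights)
mean = sum(weights) / n                       # 169 / 6
sos = sum((x - mean) ** 2 for x in weights)   # sum of squared deviations
variance = sos / (n - 1)                      # SOS / (N - 1)
std_dev = math.sqrt(variance)                 # square root of the variance
std_err = std_dev / math.sqrt(n)              # SD / sqrt(N)

print(f"Mean      = {mean:.2f}")
print(f"SOS       = {sos:.2f}")
print(f"Variance  = {variance:.2f}")
print(f"Std. Dev. = {std_dev:.2f}")
print(f"Std. Err. = {std_err:.2f}")
```

Because the code carries full precision instead of the rounded mean of 28.17, the last decimal place of the SOS can differ slightly from the hand-computed table.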

Page 9

ANOVA – Analysis of Variance

[Figure: plot of the Pond and Lake data (y-axis: 0 to 120)]

Calculate an SOS based on the overall mean (Total SOS).

Page 10

Treatment  Replicate  Length  Overall Mean  Squared Deviation (Total SOS)

Pond 1 34 57.7 561.69

Pond 2 78 57.7 412.09

Pond 3 48 57.7 94.09

Pond 4 24 57.7 1135.69

Pond 5 64 57.7 39.69

Pond 6 58 57.7 0.09

Pond 7 34 57.7 561.69

Pond 8 66 57.7 68.89

Pond 9 22 57.7 1274.49

Pond 10 44 57.7 187.69

Lake 1 38 57.7 388.09

Lake 2 82 57.7 590.49

Lake 3 58 57.7 0.09

Lake 4 76 57.7 334.89

Lake 5 60 57.7 5.29

Lake 6 70 57.7 151.29

Lake 7 99 57.7 1705.69

Lake 8 40 57.7 313.29

Lake 9 68 57.7 106.09

Lake 10 91 57.7 1108.89

Total SOS = 9040.2

This provides a measure of the overall variance (Total SOS).
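As a check on the table above, the Total SOS can be recomputed from the 20 lengths. This is a small Python sketch (not SAS); the list names `pond` and `lake` are illustrative.

```python
# Lengths from the Pond and Lake treatments (10 replicates each).
pond = [34, 78, 48, 24, 64, 58, 34, 66, 22, 44]
lake = [38, 82, 58, 76, 60, 70, 99, 40, 68, 91]

lengths = pond + lake
overall_mean = sum(lengths) / len(lengths)

# Total SOS: squared deviation of every observation from the overall mean.
total_sos = sum((x - overall_mean) ** 2 for x in lengths)

print(f"Overall mean = {overall_mean:.1f}")
print(f"Total SOS    = {total_sos:.1f}")
```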

[Figure: plot of the Pond and Lake data (y-axis: 0 to 120)]

Page 11

[Figure: plot of the Pond and Lake data (y-axis: 0 to 120)]

Calculate an SOS for each treatment based on that treatment's own mean (within-treatment or Error SOS).

Page 12

Treatment  Replicate  Length  Treatment Mean  Squared Deviation (Error SOS)

Pond 1 34 47.2 174.24

Pond 2 78 47.2 948.64

Pond 3 48 47.2 0.64

Pond 4 24 47.2 538.24

Pond 5 64 47.2 282.24

Pond 6 58 47.2 116.64

Pond 7 34 47.2 174.24

Pond 8 66 47.2 353.44

Pond 9 22 47.2 635.04

Pond 10 44 47.2 10.24

Lake 1 38 68.2 912.04

Lake 2 82 68.2 190.44

Lake 3 58 68.2 104.04

Lake 4 76 68.2 60.84

Lake 5 60 68.2 67.24

Lake 6 70 68.2 3.24

Lake 7 99 68.2 948.64

Lake 8 40 68.2 795.24

Lake 9 68 68.2 0.04

Lake 10 91 68.2 519.84

Error SOS = 6835.2

This provides a measure of the variance that remains when each treatment is measured against its own mean (within-treatment or Error SOS).

What happens to Error SOS when the variability within each treatment decreases?
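The Error SOS, and the question of what happens when within-treatment variability shrinks, can be explored with a short Python sketch. The `tighten` helper and the halved data are hypothetical, made up only to illustrate the question; they are not part of the original data set.

```python
pond = [34, 78, 48, 24, 64, 58, 34, 66, 22, 44]
lake = [38, 82, 58, 76, 60, 70, 99, 40, 68, 91]

def within_sos(group):
    """Squared deviations of one treatment's values from its own mean."""
    group_mean = sum(group) / len(group)
    return sum((x - group_mean) ** 2 for x in group)

def tighten(group, factor):
    """Hypothetical data: pull every value toward the group mean by `factor`."""
    group_mean = sum(group) / len(group)
    return [group_mean + (x - group_mean) * factor for x in group]

# Error SOS is the sum of the within-treatment SOS values.
error_sos = within_sos(pond) + within_sos(lake)

# Halve the spread within each treatment; the treatment means do not move.
error_sos_tight = within_sos(tighten(pond, 0.5)) + within_sos(tighten(lake, 0.5))

print(f"Error SOS                 = {error_sos:.1f}")
print(f"Error SOS, spread halved  = {error_sos_tight:.1f}")
```

In this sketch, halving the spread within each treatment cuts the Error SOS to one quarter, because every deviation is squared.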

[Figure: plot of the Pond and Lake data (y-axis: 0 to 120)]

Page 13

[Figure: plot of Predicted_Pond, Predicted_Lake, and Overall_Avg (y-axis: 0 to 120)]

Calculate an SOS for each predicted value vs. the overall mean (Model SOS).

Page 14

Treatment  Replicate  Length  Treatment Mean  Overall Mean  Squared Deviation (Model SOS)

Pond 1 34 47.2 57.7 110.25

Pond 2 78 47.2 57.7 110.25

Pond 3 48 47.2 57.7 110.25

Pond 4 24 47.2 57.7 110.25

Pond 5 64 47.2 57.7 110.25

Pond 6 58 47.2 57.7 110.25

Pond 7 34 47.2 57.7 110.25

Pond 8 66 47.2 57.7 110.25

Pond 9 22 47.2 57.7 110.25

Pond 10 44 47.2 57.7 110.25

Lake 1 38 68.2 57.7 110.25

Lake 2 82 68.2 57.7 110.25

Lake 3 58 68.2 57.7 110.25

Lake 4 76 68.2 57.7 110.25

Lake 5 60 68.2 57.7 110.25

Lake 6 70 68.2 57.7 110.25

Lake 7 99 68.2 57.7 110.25

Lake 8 40 68.2 57.7 110.25

Lake 9 68 68.2 57.7 110.25

Lake 10 91 68.2 57.7 110.25

Model SOS = 2205

This provides a measure of the distance between the mean values (Model SOS).

What happens to Model SOS when the two means are close together?

What if the means are equal?
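A short Python sketch can recompute the Model SOS and try the "equal means" question. The shifted lake data below are hypothetical, constructed only so that both treatment means coincide.

```python
pond = [34, 78, 48, 24, 64, 58, 34, 66, 22, 44]
lake = [38, 82, 58, 76, 60, 70, 99, 40, 68, 91]

def model_sos(groups):
    """Squared distance of each treatment mean from the overall mean,
    counted once per observation in that treatment."""
    all_values = [x for g in groups for x in g]
    overall_mean = sum(all_values) / len(all_values)
    return sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)

sos_model = model_sos([pond, lake])

# Hypothetical case: shift every lake value down by 21 so the lake mean
# (68.2) matches the pond mean (47.2).
lake_shifted = [x - 21 for x in lake]
sos_model_equal = model_sos([pond, lake_shifted])

print(f"Model SOS                = {sos_model:.1f}")
print(f"Model SOS, equal means   = {sos_model_equal:.1f}")
```

With equal treatment means, every predicted value sits on the overall mean and the Model SOS drops to zero. For the real data, Model SOS (2205) plus Error SOS (6835.2) equals the Total SOS (9040.2).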

Page 15

Detecting a Difference Between Treatments

• Model SOS gives us an index of how far apart the two means are from each other.
– Bigger Model SOS = farther apart

• Error SOS gives us an index of how scattered the data are within each treatment.
– More variability = larger Error SOS = more possible overlap between treatments

Page 16

Magic of the F-test

• Dividing each SOS by its degrees of freedom gives a mean square. The ratio of the Model mean square to the Error mean square (Model divided by Error) gives us an overall index (the F statistic) used to indicate the relative ‘distance’ and ‘overlap’ between two means.
– A large Model SOS and a small Error SOS = a large F statistic. Why does this indicate a significant difference?
– A small Model SOS and a large Error SOS = a small F statistic. Why does this indicate no significant difference?

• Based on the degrees of freedom (set by the number of treatments and the sample size), each F statistic has an associated P-value, which is compared with the alpha level (0.05).
– P < 0.05 (large F statistic): there is a significant difference between the means
– P ≥ 0.05 (small F statistic): there is NO significant difference
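For the Pond and Lake example, the F statistic can be worked out from the two SOS values computed earlier. In the SAS output each SOS is first divided by its degrees of freedom to give a mean square, and F is the ratio of the two mean squares. The Python sketch below follows that route; the variable names are illustrative, and the P-value itself is left to SAS.

```python
# SOS values from the Pond/Lake example.
model_sos = 2205.0
error_sos = 6835.2

n_treatments = 2      # Pond and Lake
n_observations = 20   # 10 replicates per treatment

model_df = n_treatments - 1                # 1
error_df = n_observations - n_treatments   # 18

model_ms = model_sos / model_df   # Model mean square
error_ms = error_sos / error_df   # Error mean square

f_value = model_ms / error_ms

print(f"Model MS = {model_ms:.2f}")
print(f"Error MS = {error_ms:.2f}")
print(f"F value  = {f_value:.2f}")
```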

Page 17

SAS Program with ANOVA added

data chicken;
  input location $ replicate length;
  cards;
  [data set not shown]
;
proc print; run;
proc sort; by location;
/*
proc means mean n var stddev cv stderr clm;
  by location;
  var length;
run;
*/
proc anova;               /* tells SAS to do the analysis of variance procedure */
  class location;         /* tells SAS that location is a class variable */
  model length=location;  /* tells SAS to compare length between locations */
run;

SAS ignores everything between /* and */.

Page 18

The SAS System 10:10 Monday, June 19, 2006 1

Obs   location   replicate   length
  1   Pond        1          34
  2   Pond        2          78
  3   Pond        3          48
  4   Pond        4          24
  5   Pond        5          64
  6   Pond        6          58
  7   Pond        7          34
  8   Pond        8          66
  9   Pond        9          22
 10   Pond       10          44
 11   Lake        1          38
 12   Lake        2          82
 13   Lake        3          58
 14   Lake        4          76
 15   Lake        5          60
 16   Lake        6          70
 17   Lake        7          99
 18   Lake        8          40
 19   Lake        9          68
 20   Lake       10          91

Data Set:

• 20 total observations

• Two locations with 10 replicates each

• Individual lengths

Page 19

SAS ANOVA Output 1st Page

The ANOVA Procedure
Class Level Information

Class      Levels   Values
location   2        Lake Pond

Number of Observations Read   20
Number of Observations Used   20

This tells us that SAS understands that there are two classes: Lake and Pond. We are also told that SAS can use all 20 values in this ANOVA procedure.

Page 20

SAS ANOVA Output 2nd Page

The SAS System 10:10 Monday, June 19, 2006 4
The ANOVA Procedure

Dependent Variable: length

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1      2205.000000   2205.000000      5.81   0.0269
Error             18      6835.200000    379.733333
Corrected Total   19      9040.200000

R-Square   Coeff Var   Root MSE   length Mean
0.243911   33.77253    19.48675   57.70000

Source     DF   Anova SS      Mean Square   F Value   Pr > F
location    1   2205.000000   2205.000000      5.81   0.0269

Sum of Squares ÷ DF = Mean Square, and Model Mean Square ÷ Error Mean Square = F Value. Pr > F is the P-value.

What are some ways to make the F Value larger?

Page 21

What about analysis of variance with three treatments?

The SAS System 13:17 Monday, October 4, 2004 1

Obs   treat   size
  1   1       25
  2   1       22
  3   1       30
  4   1       26
  5   1       29
  6   2       15
  7   2       12
  8   2       22
  9   2       19
 10   2       18
 11   3       32
 12   3       31
 13   3       27
 14   3       26
 15   3       29

Data Set: 3 treatments with 5 replicates per treatment

Treatment   Mean
1           26.4
2           17.2
3           29.0

Page 22

The SAS System 13:17 Monday, October 4, 2004 2

The ANOVA Procedure
Class Level Information

Class   Levels   Values
treat   3        1 2 3

Number of observations   15

Callouts: variable name (treat); variable labels (1 2 3)

Page 23

The SAS System 13:17 Monday, October 4, 2004 3

The ANOVA Procedure
Dependent Variable: size

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2      384.4000000   192.2000000     18.30   0.0002
Error             12      126.0000000    10.5000000
Corrected Total   14      510.4000000

R-Square   Coeff Var   Root MSE   size Mean
0.753135   13.38996    3.240370   24.20000

Source   DF   Anova SS      Mean Square   F Value   Pr > F
treat     2   384.4000000   192.2000000     18.30   0.0002

Callouts: F Value (18.30) is the F-value; Pr > F (0.0002) is the P-value.

Treatment   Mean
1           26.4
2           17.2
3           29.0

Which means are different/similar?
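The ANOVA table for the three-treatment data can be reproduced by hand with the same Model and Error SOS steps used for the two-treatment case. The Python sketch below is one way to do that; `one_way_anova` is an illustrative helper, not a SAS or library function, and it stops at the F value rather than computing the P-value.

```python
def one_way_anova(groups):
    """Return (model SOS, error SOS, model DF, error DF, F) for a dict of
    treatment label -> list of observations."""
    all_values = [x for g in groups.values() for x in g]
    overall_mean = sum(all_values) / len(all_values)

    # Model SOS: each treatment mean vs. the overall mean, once per observation.
    model_sos = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    # Error SOS: each observation vs. its own treatment mean.
    error_sos = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
    )

    model_df = len(groups) - 1
    error_df = len(all_values) - len(groups)
    f_value = (model_sos / model_df) / (error_sos / error_df)
    return model_sos, error_sos, model_df, error_df, f_value


# The size data for treatments 1, 2, and 3 from the data set listing.
sizes = {
    1: [25, 22, 30, 26, 29],
    2: [15, 12, 22, 19, 18],
    3: [32, 31, 27, 26, 29],
}

model_sos, error_sos, model_df, error_df, f_value = one_way_anova(sizes)
print(f"Model: DF = {model_df}, SOS = {model_sos:.1f}")
print(f"Error: DF = {error_df}, SOS = {error_sos:.1f}")
print(f"F value = {f_value:.2f}")
```

A significant F only says that at least one mean differs; it does not say which, which is why a means-separation test is needed next.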

Page 24

Delineating the Means With SAS

proc anova;                    /* tells SAS to do the analysis of variance procedure */
  class treat;                 /* tells SAS that treat is a class variable */
  model size=treat;            /* tells SAS to compare size among treatments */
  means treat / tukey lines;   /* tells SAS to delineate the means with a Tukey test and use the lines method to show differences */
run;

Page 25

The SAS System 13:17 Monday, October 4, 2004 7

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for size

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                    0.05
Error Degrees of Freedom                 12
Error Mean Square                        10.5
Critical Value of Studentized Range      3.77278
Minimum Significant Difference           5.4673

Means with the same letter are not significantly different.

Tukey Grouping   Mean     N   treat
      A          29.000   5   3
      A
      A          26.400   5   1
      B          17.200   5   2

Treatments 1 and 3 are not significantly different from each other; both are different from treatment 2.
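The Minimum Significant Difference in the Tukey output can be reproduced from the other values SAS prints, and then used to compare each pair of means. The Python sketch below takes the critical value of the studentized range directly from the SAS output rather than computing it; the variable names are illustrative.

```python
import math
from itertools import combinations

# Values copied from the SAS Tukey output.
q_crit = 3.77278   # Critical Value of Studentized Range
mse = 10.5         # Error Mean Square
n_per_treat = 5    # replicates per treatment

means = {1: 26.4, 2: 17.2, 3: 29.0}

# Minimum Significant Difference for equal sample sizes.
msd = q_crit * math.sqrt(mse / n_per_treat)
print(f"Minimum Significant Difference = {msd:.4f}")

# Two means differ significantly when they are farther apart than the MSD.
different = {}
for a, b in combinations(sorted(means), 2):
    diff = abs(means[a] - means[b])
    different[(a, b)] = diff > msd
    verdict = "different" if different[(a, b)] else "not different"
    print(f"Treat {a} vs {b}: difference = {diff:.1f} -> {verdict}")
```

This matches the letter groupings: treatments 1 and 3 share the letter A, while treatment 2 stands alone with B.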

Page 26

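Showing Results

[Figure: chart of the three treatment means (x-axis: treatments 1 to 3; y-axis: 0 to 35), labeled A, B, A to show the Tukey groupings]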