Chapter 13 Analysis of Variance (ANOVA)

Page 1: Chapter 13

Chapter 13

Analysis of Variance (ANOVA)

Page 2: Chapter 13

Analysis of Variance (ANOVA)

ANOVA can be used to test for differences between three or more means. The hypotheses for an ANOVA are always:

H0: μ1 = μ2 = . . . = μk

(where k is the number of groups)

Ha: Not all of the population means are equal

Page 3: Chapter 13

Analysis of Variance: Assumptions

• For each population the response variable is normally distributed
• The variance of the response variable, σ², is the same for all of the populations
• The observations must be independent

Page 4: Chapter 13

Analysis of Variance (ANOVA)

The ANOVA hypothesis test is based on a comparison of the variation between groups (treatments) and within groups (treatments).

Page 5: Chapter 13

Between vs. Within Variation

Family  Cars  Attribute 1  Attribute 2
1       1     A            X
2       1     A            Y
3       1     A            Z
4       2     B            X
5       2     B            Y
6       2     B            Z
7       3     C            X
8       3     C            Y
9       3     C            Z

Page 6: Chapter 13

Between vs. Within Variation

Group A  Group B  Group C
1        2        3
1        2        3
1        2        3

Case 1: All variation is due to differences between groups

Page 7: Chapter 13

Between vs. Within Variation

Group X  Group Y  Group Z
1        1        1
2        2        2
3        3        3

Case 2: All variation is due to differences within groups

Page 8: Chapter 13

Analysis of Variance (ANOVA)

If the variation is primarily due to differences between groups then we would conclude the means are different and reject H0.

If the variation is primarily due to differences within the groups then we have no evidence that the means differ and we do not reject H0.
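The two cases on the preceding slides can be checked numerically. The sketch below is only an illustration: the function name `between_within` is made up for this example, and the data are the nine car counts from the family table, grouped first by Attribute 1 and then by Attribute 2.

```python
def between_within(groups):
    """Split total variation into between-group and within-group sums of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between: how far each group mean sits from the overall mean.
        between += len(g) * (group_mean - grand_mean) ** 2
        # Within: how far each observation sits from its own group mean.
        within += sum((x - group_mean) ** 2 for x in g)
    return between, within


case_1 = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]  # cars grouped by Attribute 1 (A, B, C)
case_2 = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]  # cars grouped by Attribute 2 (X, Y, Z)

print(between_within(case_1))  # (6.0, 0.0): all variation is between groups
print(between_within(case_2))  # (0.0, 6.0): all variation is within groups
```

The total variation is 6 in both cases; only the way it splits between the two sources changes.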

Page 9: Chapter 13

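Analysis of Variance (ANOVA)

The relative sizes of between- and within-group variation are measured by comparing two estimates of the population variance σ².

The first estimate is based on the sample means. With n observations in each group, the variance of the sample means, $s_{\bar{x}}^2$, is used to estimate $\sigma_{\bar{x}}^2$, so $n s_{\bar{x}}^2$ is used to estimate σ²:

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \quad\Longrightarrow\quad n\,\sigma_{\bar{x}}^2 = \sigma^2$$

If the means are equal, $n s_{\bar{x}}^2$ will be an unbiased estimator of σ². If the means are not equal, $n s_{\bar{x}}^2$ will overestimate σ².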

Page 10: Chapter 13

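Sampling Distribution of x̄ Given H0 is True

[Figure: one sampling distribution of x̄ with variance $\sigma_{\bar{x}}^2 = \sigma^2/n$, with the sample means x̄1, x̄2, x̄3 marked on it.]

Sample means are close together because they are drawn from the same sampling distribution when H0 is true.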

Page 11: Chapter 13

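Sampling Distribution of x̄ Given H0 is False

[Figure: three sampling distributions of x̄ centered at μ1, μ2, μ3, with one sample mean (x̄1, x̄2, x̄3) marked on each.]

Sample means come from different sampling distributions and are not as close together when H0 is false.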

Page 12: Chapter 13

Analysis of Variance (ANOVA)

The second way of estimating the population variance is to find the average of the variances of the different groups.

This approach provides an unbiased estimate regardless of whether the null hypothesis is true.

$$\frac{\sum_{j=1}^{k} s_j^2}{k}$$

Page 13: Chapter 13

Analysis of Variance (ANOVA)

$$\frac{n s_{\bar{x}}^2}{\sum_{j=1}^{k} s_j^2 / k}$$

If we take the ratio of the two approaches we have a measure that should be close to 1 if the null hypothesis is true, because both the numerator and the denominator then estimate σ². It will tend to be larger than 1 if the null hypothesis is false.
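A minimal sketch of this ratio for equal-sized groups, using only the Python standard library. The function name `variance_ratio` and both data sets are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean, variance


def variance_ratio(groups):
    """Ratio of the between-groups estimate to the within-groups estimate.

    Assumes every group has the same number of observations n.
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    group_means = [mean(g) for g in groups]
    between_estimate = n * variance(group_means)            # n times the variance of the sample means
    within_estimate = mean([variance(g) for g in groups])   # average of the group variances
    return between_estimate / within_estimate


# Hypothetical data: every group has variance 1, only the spacing of the means changes.
means_one_apart = [[4, 5, 6], [5, 6, 7], [6, 7, 8]]
means_three_apart = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

print(variance_ratio(means_one_apart))    # 3.0
print(variance_ratio(means_three_apart))  # 27.0
```

The within-groups estimate is the same for both data sets, so the ratio grows only because the sample means move further apart.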

Page 14: Chapter 13

Analysis of Variance (ANOVA)

If the null hypothesis is true and the conditions for conducting the ANOVA test are met then the sampling distribution of the ratio is an F distribution with k-1 degrees of freedom in the numerator and nT – k degrees of freedom in the denominator.

Page 15: Chapter 13

F Distribution

[Figure: F distribution with the upper-tail area α shaded.]

Page 16: Chapter 13

F Distribution

As before α is the probability of rejecting H0 when it is true (the probability of making a Type I error).

Fα is the critical value such that an area equal to α lies in the upper-most tail.

For example, with 5 degrees of freedom in the denominator and 10 degrees of freedom in the numerator, an F value of 4.74 would capture an area equal to 0.05 in the tail.
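The tail area in this example can be checked without a table. The sketch below is one possible approach, not the method used in the slides: it integrates the standard F density numerically with Simpson's rule. The function names `f_pdf` and `f_upper_tail` are made up for this example, and the integration assumes at least 2 numerator degrees of freedom so that the density is finite at 0.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 numerator and d2 denominator df."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    return (math.exp(-log_beta) * (d1 / d2) ** (d1 / 2)
            * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2))


def f_upper_tail(f_value, d1, d2, steps=2000):
    """Area to the right of f_value, via Simpson's rule on [0, f_value].

    Assumes d1 >= 2 and an even number of steps.
    """
    h = f_value / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * f_pdf(i * h, d1, d2)
    return 1 - total * h / 3


# 10 numerator df, 5 denominator df, F = 4.74
print(round(f_upper_tail(4.74, 10, 5), 3))  # 0.05
```

In practice a statistics package or an F table would be used instead; the point here is only that 4.74 leaves an area of about 0.05 in the upper tail.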

Page 17: Chapter 13

ANOVA Hypothesis Test

The steps for conducting an ANOVA hypothesis test are the same as for conducting a hypothesis test of the mean:

1. State the hypotheses
2. State the rejection rule
3. Calculate the test statistic
4. State the result of the test and its implications

Page 18: Chapter 13

ANOVA Table

Typically when we do the test we organize the calculations in a table with a specific format:

Source of variation  Sum of Squares  Degrees of freedom  Mean Square  F
Treatments           SSTR            k-1                 MSTR         F
Error                SSE             nT-k                MSE
Total                SST             nT-1

Page 19: Chapter 13

Between-Treatments Estimate of Population Variance

The estimate of σ² based on the variation of the sample means is called the mean square due to treatments and is denoted by MSTR.

$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$

Here $\bar{\bar{x}}$ is the overall sample mean. The numerator is called the sum of squares due to treatments (SSTR). The denominator is the degrees of freedom associated with SSTR.

Page 20: Chapter 13

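Between-Treatments Estimate of Population Variance

Assume there are the same number of observations in each group so that nj = n.

$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} = n\,\frac{\sum_{j=1}^{k} (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} = n s_{\bar{x}}^2$$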

Page 21: Chapter 13

Within-Treatments Estimate of Population Variance σ²

The estimate of σ² based on the variation of the sample observations within each sample is called the mean square error and is denoted by MSE.

$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$

The numerator is called the sum of squares due to error (SSE). The denominator is the degrees of freedom associated with SSE.

Page 22: Chapter 13

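Within-Treatments Estimate of Population Variance

Assume there are the same number of observations in each group so that nj = n and nT = nk.

$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k} = \frac{(n - 1) \sum_{j=1}^{k} s_j^2}{nk - k} = \frac{(n - 1) \sum_{j=1}^{k} s_j^2}{k(n - 1)} = \frac{\sum_{j=1}^{k} s_j^2}{k}$$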
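A short sketch of why the equal-group-size assumption matters here. The function names `mse` and `average_variance` and both data sets are hypothetical. With equal group sizes the pooled MSE and the simple average of the group variances agree; with unequal sizes they generally do not.

```python
from statistics import variance


def mse(groups):
    """Mean square error: SSE divided by nT - k, valid for any group sizes."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    return sse / (n_total - k)


def average_variance(groups):
    """Unweighted average of the group variances."""
    return sum(variance(g) for g in groups) / len(groups)


equal_sizes = [[1, 2, 3], [2, 4, 6]]       # hypothetical data, n = 3 in each group
unequal_sizes = [[1, 2, 3], [2, 4, 6, 8]]  # hypothetical data, n = 3 and n = 4

print(mse(equal_sizes), average_variance(equal_sizes))      # both are 2.5
print(mse(unequal_sizes), average_variance(unequal_sizes))  # about 4.4 and about 3.83
```

The general formula weights each group variance by its degrees of freedom, so larger groups count for more.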

Page 23: Chapter 13

ANOVA Table

With the entire data set as one sample, the formula for computing the total sum of squares, SST, is:

$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$$

SST divided by its degrees of freedom nT-1 is the overall sample variance that would be obtained if we treated the entire set of observations as one data set.
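Both statements about SST can be verified on a small data set. The sketch below uses hypothetical data with unequal group sizes, and the function name `sums_of_squares` is an assumption of this example.

```python
from statistics import variance


def sums_of_squares(groups):
    """Return (SST, SSTR, SSE) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, sstr, sse


groups = [[1, 3], [4, 6, 8], [5, 5, 6, 7]]  # hypothetical data, nT = 9
sst, sstr, sse = sums_of_squares(groups)
all_values = [x for g in groups for x in g]

print(sst, sstr, sse)                                 # 36.0 23.25 12.75
print(sst / (len(all_values) - 1), variance(all_values))  # both are 4.5
```

SSTR and SSE add up to SST, and SST divided by nT-1 matches the sample variance of the pooled observations.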

Page 24: Chapter 13

ANOVA Table

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square          F
Treatments           SSTR            k - 1               MSTR = SSTR/(k - 1)  MSTR/MSE
Error                SSE             nT - k              MSE = SSE/(nT - k)
Total                SST             nT - 1

SST is partitioned into SSTR and SSE.

SST's degrees of freedom (d.f.) are partitioned into SSTR's d.f. and SSE's d.f.

Page 25: Chapter 13

ANOVA Example

Assume we are interested in finding if the average number of cars owned is different for three different towns. Five people are interviewed in each town. Assume α = .05

A  B  C
0  2  12
1  3  0
6  4  2
2  2  6
1  0  4

Page 26: Chapter 13

Analysis of Variance (ANOVA)

H0: μ1 = μ2 = μ3

Ha: Not all of the population means are equal

Reject H0 if: F > Fα, that is, F > 3.89

Given k - 1 = 3 - 1 = 2 df in the numerator and nT - k = 15 - 3 = 12 df in the denominator

Page 27: Chapter 13

ANOVA Example

        A  B    C
        0  2    12
        1  3    0
        6  4    2
        2  2    6
        1  0    4
Mean =  2  2.2  4.8

$\bar{\bar{x}}$ = (2 + 2.2 + 4.8)/3 = 9/3 = 3

Page 28: Chapter 13

Analysis of Variance (ANOVA)

SSTR = 5(2-3)² + 5(2.2-3)² + 5(4.8-3)²

= 24.4

SSE = (0-2)² + (1-2)² + (6-2)² + (2-2)² + (1-2)²
    + (2-2.2)² + (3-2.2)² + (4-2.2)² + (2-2.2)² + (0-2.2)²
    + (12-4.8)² + (0-4.8)² + (2-4.8)² + (6-4.8)² + (4-4.8)²
    = 115.6

Page 29: Chapter 13

ANOVA Table

Source of variation  Sum of Squares  Degrees of freedom  Mean Square  F
Treatments           24.4            2                   12.2         1.27
Error                115.6           12                  9.6
Total                140             14
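The table entries can be reproduced from the raw data. The sketch below is an illustration only: the function name `anova_table` and the dictionary layout are choices made for this example, while the data are the car counts for towns A, B and C.

```python
def anova_table(groups):
    """One-way ANOVA quantities for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (k - 1)        # mean square due to treatments
    mse = sse / (n_total - k)    # mean square error
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "df_treatments": k - 1, "df_error": n_total - k,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


town_a = [0, 1, 6, 2, 1]
town_b = [2, 3, 4, 2, 0]
town_c = [12, 0, 2, 6, 4]

table = anova_table([town_a, town_b, town_c])
for name, value in table.items():
    print(f"{name}: {value:.3f}")

# F is about 1.266, which is below the critical value 3.89, so H0 is not rejected.
print("Reject H0" if table["F"] > 3.89 else "Do not reject H0")
```

Because the test statistic falls short of the critical value, the sample does not provide evidence that the average number of cars differs across the three towns.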

Page 30: Chapter 13

Anova: Single Factor

SUMMARY
Groups    Count  Sum  Average  Variance
Column 1  5      10   2        5.5
Column 2  5      11   2.2      2.2
Column 3  5      24   4.8      21.2

ANOVA
Source of Variation  SS     df  MS     F      P-value  F crit
Between Groups       24.4   2   12.2   1.266  0.3169   3.885
Within Groups        115.6  12  9.633
Total                140    14
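The P-value and F crit columns can be reproduced by hand for this example because the numerator has exactly 2 degrees of freedom. In that special case the upper-tail area of the F distribution has the closed form (1 + 2F/d2)^(-d2/2), where d2 is the denominator degrees of freedom. The sketch below relies on that special case only, and the function names are made up for this example; it does not apply to other numerator degrees of freedom.

```python
def f_tail_two_numerator_df(f_value, d2):
    """Upper-tail area of F with 2 numerator df and d2 denominator df."""
    return (1 + 2 * f_value / d2) ** (-d2 / 2)


def f_crit_two_numerator_df(alpha, d2):
    """Critical value leaving area alpha in the upper tail (2 numerator df only)."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


f_statistic = 12.2 / (115.6 / 12)  # MSTR / MSE from the ANOVA table
p_value = f_tail_two_numerator_df(f_statistic, 12)
f_crit = f_crit_two_numerator_df(0.05, 12)

print(round(p_value, 4), round(f_crit, 3))  # 0.3169 3.885
```

The p-value of about 0.32 is well above α = .05, which agrees with the critical-value comparison F = 1.266 < 3.885.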

Page 31: Chapter 13

Graded Homework

Assume we are interested in finding if the average number of bedrooms per home is different for three different towns. Data on four houses were collected in each town. Assume α = .05

A  B  C
1  4  5
3  1  3
2  3  2
2  4  6

P. 401, #7 (just do a hypothesis test)