Chapter 12 The Analysis of Categorical Data and Goodness of Fit Tests



Suppose we wanted to determine whether the proportions of the different colors in a large bag of M&M candies match the proportions that the company claims are in its candies.

We could record the color of each candy in the bag. This would be univariate categorical data.

How many categories for color would there be? There are six colors, so k = 6.

k is used to denote the number of categories for a categorical variable.


M&M Candies Continued . . .

We could count how many candies of each color are in the bag.

A one-way frequency table is used to display the observed counts for the k categories.

Red   Blue   Green   Yellow   Orange   Brown
23    28     21      19       22       25

A goodness-of-fit test will allow us to determine whether these observed counts are consistent with the counts we would expect if the company's claimed proportions were correct.


Goodness-of-Fit Test Procedure

Null Hypothesis: H0: p1 = hypothesized proportion for Category 1, . . . , pk = hypothesized proportion for Category k

Alternative Hypothesis: Ha: H0 is not true

Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The goodness-of-fit statistic, denoted by X², is a quantitative measure of the extent to which the observed counts differ from those expected when H0 is true. X² is read "chi-squared."

The X² value can never be negative, because each term in the sum is a squared difference divided by a positive expected count.

The goodness-of-fit test is used to analyze univariate categorical data from a single sample.
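As a sketch of the statistic above, the Python function below sums (observed − expected)²/expected over all cells. The M&M counts come from the one-way table earlier; the equal proportions (1/6 per color) are a hypothetical H0 chosen only for illustration, since the company's actual claimed proportions are not given here. The names `chi_square_statistic`, `observed`, and `expected` are our own.

```python
def chi_square_statistic(observed, expected):
    """Return X^2 = sum over all cells of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed M&M counts: Red, Blue, Green, Yellow, Orange, Brown.
observed = [23, 28, 21, 19, 22, 25]
n = sum(observed)  # 138 candies in the bag

# HYPOTHETICAL H0 for illustration only: all six colors equally likely.
hypothesized = [1 / 6] * 6
expected = [n * p for p in hypothesized]  # 23 candies per color

x2 = chi_square_statistic(observed, expected)
print(f"X^2 = {x2:.3f}")
```

Under this hypothetical H0 every expected count is 23, so X² = (0² + 5² + 2² + 4² + 1² + 2²)/23 = 50/23 ≈ 2.174.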


Goodness-of-Fit Test Procedure Continued . . .

P-value: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = k − 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of X² under the df = k − 1 chi-square curve.

Assumptions:
1) Observed cell counts are based on a random sample.
2) The sample size is large enough as long as every expected cell count is at least 5.
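The P-value is an upper-tail chi-square area. A statistics library or Appendix table would normally supply it; to keep this sketch dependency-free, the function below (our own `chi2_sf`, not a library call) computes it with only Python's `math` module, using the standard closed form for the upper tail of a chi-square distribution with a whole-number df.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(chi-square with df degrees of freedom >= x).

    For whole-number df this equals the regularized upper incomplete gamma
    function Q(df/2, x/2). It starts from Q(1/2, y) = erfc(sqrt(y)) (odd df)
    or Q(1, y) = exp(-y) (even df) and applies the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2:
        # Add y**s * exp(-y) / Gamma(s + 1), computed on the log scale.
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


# With k = 6 categories, df = k - 1 = 5. An X^2 value of 11.07 sits at
# (approximately) the .05 critical value for df = 5.
k = 6
print(f"P-value = {chi2_sf(11.07, k - 1):.4f}")
```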


Facts About χ² Distributions

• Different df have different χ² curves.
• χ² curves are skewed right.
• As df increases, the χ² curve shifts toward the right and becomes more like a normal curve.

[Figure: χ² curves for df = 3, df = 5, and df = 10]


A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon.

The table below shows, for 24 lunar cycles, the number of days in each of the eight lunar phases and the number of births during each phase.

Lunar Phase        Number of Days   Number of Births
New Moon                 24               7,680
Waxing Crescent         152              48,442
First Quarter            24               7,579
Waxing Gibbous          149              47,814
Full Moon                24               7,711
Waning Gibbous          150              47,595
Last Quarter             24               7,733
Waning Crescent         152              48,230

There are eight phases, so k = 8.


Lunar Phases Continued . . .

Let:

p1 = proportion of births that occur during the new moon

p2 = proportion of births that occur during the waxing crescent moon

p3 = proportion of births that occur during the first quarter moon

p4 = proportion of births that occur during the waxing gibbous moon

p5 = proportion of births that occur during the full moon

p6 = proportion of births that occur during the waning gibbous moon

p7 = proportion of births that occur during the last quarter moon

p8 = proportion of births that occur during the waning crescent moon

There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportion for each phase equals the number of days in that phase divided by the total number of days (for example, p1 = 24/699 ≈ .0343).

p1 = .0343   p2 = .2175   p3 = .0343   p4 = .2132
p5 = .0343   p6 = .2146   p7 = .0343   p8 = .2175

The hypothesis statements would be:

H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175

Ha: H0 is not true
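The hypothesized proportions can be checked directly: each is the phase's number of days divided by the 699 total days. A minimal Python sketch, with the day counts copied from the table above (the variable names are our own):

```python
# Number of days in each lunar phase over the 24 lunar cycles.
days = {
    "New Moon": 24,
    "Waxing Crescent": 152,
    "First Quarter": 24,
    "Waxing Gibbous": 149,
    "Full Moon": 24,
    "Waning Gibbous": 150,
    "Last Quarter": 24,
    "Waning Crescent": 152,
}

total_days = sum(days.values())  # 699

# Under H0, a phase's share of births equals its share of the days.
proportions = {phase: d / total_days for phase, d in days.items()}

for i, (phase, p) in enumerate(proportions.items(), start=1):
    print(f"p{i} ({phase}) = {p:.4f}")
```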


Lunar Phases Continued . . .

There is a total of 222,784 births in the sample. If there is no relationship between the number of births and lunar phase, then the expected count for each category equals n(hypothesized proportion). For example, the expected count for the new moon is 222,784(.0343) ≈ 7,641.49.

Lunar Phase        Observed Number of Births   Expected Number of Births
New Moon                  7,680                     7,641.49
Waxing Crescent          48,442                    48,455.52
First Quarter             7,579                     7,641.49
Waxing Gibbous           47,814                    47,497.55
Full Moon                 7,711                     7,641.49
Waning Gibbous           47,595                    47,809.45
Last Quarter              7,733                     7,641.49
Waning Crescent          48,230                    48,455.52
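The expected counts follow from n(hypothesized proportion). The sketch below reproduces them using the four-decimal proportions from H0, as the table does; using the unrounded proportions (for example, 24/699) would shift each expected count slightly, to about 7,649.24 for the new moon.

```python
# Observed births, in the phase order New Moon ... Waning Crescent.
births = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]

# Hypothesized proportions from H0, rounded to four decimals.
p0 = [0.0343, 0.2175, 0.0343, 0.2132, 0.0343, 0.2146, 0.0343, 0.2175]

n = sum(births)  # 222,784 births
expected = [n * p for p in p0]

print(f"n = {n:,}")
print([round(e, 2) for e in expected])
```

Every expected count is far above 5, so the large-sample assumption is satisfied.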


Lunar Phases Continued . . .


Test Statistic:

$$X^2 = \frac{(7{,}680 - 7{,}641.49)^2}{7{,}641.49} + \frac{(48{,}442 - 48{,}455.52)^2}{48{,}455.52} + \cdots + \frac{(48{,}230 - 48{,}455.52)^2}{48{,}455.52} = 6.557$$

df = 7, α = .05, P-value > .10 (the X² test statistic is smaller than the smallest entry in the df = 7 column of Appendix Table 8).

Since the P-value > α, we fail to reject H0. There is not sufficient evidence to conclude that lunar phase and number of births are related.

What type of error could we have potentially made with this decision? A Type II error: failing to reject H0 when H0 is actually false.
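Putting the pieces together, a self-contained Python sketch of the whole lunar-phase test. It repeats our own `chi2_sf` helper (an assumption of this sketch, built on the standard library only) so that it runs on its own, and it uses the rounded proportions from H0 for the expected counts.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square area for whole-number df (stdlib only)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


births = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]
p0 = [0.0343, 0.2175, 0.0343, 0.2132, 0.0343, 0.2146, 0.0343, 0.2175]
alpha = 0.05

n = sum(births)
expected = [n * p for p in p0]
x2 = sum((o - e) ** 2 / e for o, e in zip(births, expected))
df = len(births) - 1  # k - 1 = 7
p_value = chi2_sf(x2, df)

print(f"X^2 = {x2:.3f}, df = {df}, P-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The computed X² agrees with 6.557, and the P-value is well above .10, matching the table-based conclusion.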


A study was conducted to determine whether collegiate soccer players had an increased risk of concussions compared with other athletes or students. The two-way frequency table below displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes.

                          Number of Concussions
                       0     1     2     3 or more   Total
Soccer Players        45    25    11    10           91
Non-Soccer Athletes   68    15     8     5           96
Non-Athletes          45     5     3     0           53
Total                158    45    22    15          240

The counts in the body of the table are the observed counts. The row and column totals are the marginal totals, and 240 is the grand total. A two-way frequency table like this is also called a contingency table.

This is univariate categorical data (number of concussions) from 3 independent samples.

If there were no difference between these 3 populations with regard to the number of concussions, how many soccer players would you expect to have no concussions?

We would expect (158/240)(91) ≈ 59.9.


X² Test for Homogeneity

Null Hypothesis: H0: The true category proportions are the same for all the populations or treatments.

Alternative Hypothesis: Ha: The true category proportions are not all the same for all the populations or treatments.

Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The X² test for homogeneity is used to analyze univariate categorical data from 2 or more independent samples.


X² Test for Homogeneity Continued . . .

Expected Counts (assuming H0 is true):

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$

P-value: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.
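The expected-count formula translates directly into code. A minimal Python sketch (the helper name `expected_counts` is our own), applied to the concussion table; it reproduces (158/240)(91) ≈ 59.9 for soccer players with no concussions.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: soccer players, non-soccer athletes, non-athletes.
# Columns: 0, 1, 2, and 3 or more previous concussions.
observed = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Two of the non-athletes' expected counts come out below 5 (about 4.9 and 3.3), which violates the large-sample requirement that every expected count be at least 5.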


X² Test for Homogeneity Continued . . .

Assumptions:
1) Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups.
2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.


Soccer Players Continued . . .


State the hypotheses.

H0: The proportions in each response category (number of concussions) are the same for all three groups.

Ha: The category proportions are not all the same for all three groups.

df = (3 − 1)(4 − 1) = (2)(3) = 6

To find df, count the number of rows and columns, not including the totals: df = (number of rows − 1)(number of columns − 1).

Another way to find df: cover one row and one column, then count the number of cells left (not including totals).


Soccer Players Continued . . .

Expected counts are shown in parentheses next to the observed counts.

                                Number of Concussions
                       0            1            2           3 or more    Total
Soccer Players        45 (59.9)    25 (17.1)    11 (8.3)    10 (5.7)      91
Non-Soccer Athletes   68 (63.2)    15 (18.0)     8 (8.8)     5 (6.0)      96
Non-Athletes          45 (34.9)     5 (9.9)      3 (4.9)     0 (3.3)      53
Total                158           45           22          15           240

Notice that NOT all the expected counts are at least 5 (the non-athletes have expected counts of 4.9 and 3.3). So combine the column for 2 concussions and the column for 3 or more concussions.

                           Number of Concussions
                       0            1            2 or more    Total
Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)      91
Non-Soccer Athletes   68 (63.2)    15 (18.0)    13 (14.8)      96
Non-Athletes          45 (34.9)     5 (9.9)      3 (8.2)       53
Total                158           45           37            240

This combined table has df = (2)(2) = 4.

Test Statistic:

$$X^2 = \frac{(45 - 59.9)^2}{59.9} + \cdots + \frac{(3 - 8.2)^2}{8.2} = 20.6$$

α = .05, P-value < .001
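A self-contained Python sketch of the combined-table test. The helpers `expected_counts` and `chi2_sf` are our own (standard library only). Because the expected counts are not rounded to one decimal, the statistic can differ from a hand calculation in the last digit.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square area for whole-number df (stdlib only)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


def expected_counts(table):
    """(row total)(column total) / grand total for every cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [[r * c / grand for c in cols] for r in rows]


observed = [[45, 25, 11, 10], [68, 15, 8, 5], [45, 5, 3, 0]]

# Combine the "2" and "3 or more" columns so every expected count is at least 5.
combined = [[r[0], r[1], r[2] + r[3]] for r in observed]
expected = expected_counts(combined)

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(combined, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(combined) - 1) * (len(combined[0]) - 1)
p_value = chi2_sf(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.5f}")
```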


Soccer Players Continued . . .

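Since the P-value < α, we reject H0. There is strong evidence to suggest that the category proportions for the number of concussions are not the same for the 3 groups.

Is that all we can say, that there is a difference in proportions for the groups?

We can look at the chi-square contributions: which cells have the greatest contributions to the value of the X² statistic?

The soccer players' cells contribute the most to the X² test statistic (about 3.5 to 3.7 each): fewer soccer players than expected had no concussions, and more than expected had 1 or 2 or more. The non-athletes' cells for 0 and for 2 or more concussions also contribute substantially, in the opposite direction.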
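The per-cell contributions can be listed and ranked with a short Python sketch. The helper `expected_counts` and the row and column labels are our own; the counts are those of the combined table.

```python
def expected_counts(table):
    """(row total)(column total) / grand total for every cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [[r * c / grand for c in cols] for r in rows]


rows = ["Soccer Players", "Non-Soccer Athletes", "Non-Athletes"]
cols = ["0", "1", "2 or more"]
combined = [[45, 25, 21], [68, 15, 13], [45, 5, 3]]
expected = expected_counts(combined)

# Each cell's contribution is (observed - expected)^2 / expected.
contributions = {
    (rows[i], cols[j]): (combined[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(rows))
    for j in range(len(cols))
}

for cell, c in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cell[0]:20s} {cell[1]:10s} {c:.2f}")
```

The contributions add back up to the X² statistic, so ranking them shows which cells drive the rejection of H0.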


X² Test for Independence

Null Hypothesis: H0: The two variables are independent.

Alternative Hypothesis: Ha: The two variables are not independent.

Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The X² test for independence is used to analyze bivariate categorical data from a single sample.


X² Test for Independence Continued . . .

Expected Counts (assuming H0 is true):

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$

P-value: When H0 is true and the assumptions for the X² test are satisfied, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.


X² Test for Independence Continued . . .

Assumptions:
1) The observed counts are based on data from a random sample.
2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.


The paper “Contemporary College Students and Body Piercing” (Journal of Adolescent Health, 2004) described a survey of 450 undergraduate students at a state university in the southwestern region of the United States. Each student in the sample was classified according to class standing (freshman, sophomore, junior, senior) and body art category (body piercing only, tattoos only, both tattoos and body piercing, no body art). Is there evidence of an association between class standing and response to the body art question? Use α = .01.

            Body Piercing Only   Tattoos Only   Both Body Piercing and Tattoos   No Body Art
Freshman            61                 7                  14                        86
Sophomore           43                11                  10                        64
Junior              20                 9                   7                        43
Senior              21                17                  23                        54

State the hypotheses.


Body Art Continued . . .


H0: Class standing and body art category are independent.

Ha: Class standing and body art category are not independent.

Assuming H0 is true, what are the expected counts? (Expected counts are shown in parentheses.)

            Body Piercing Only   Tattoos Only   Both Body Piercing and Tattoos   No Body Art
Freshman        61 (49.7)           7 (15.1)           14 (18.5)                  86 (84.7)
Sophomore       43 (37.9)          11 (11.5)           10 (14.1)                  64 (64.5)
Junior          20 (23.4)           9 (7.1)             7 (8.7)                   43 (39.8)
Senior          21 (34.0)          17 (10.3)           23 (12.7)                  54 (58.0)

How many degrees of freedom does this two-way table have? df = (4 − 1)(4 − 1) = 9.
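The expected counts and df can be reproduced with a minimal Python sketch; `expected_counts` is our own helper, and the row, column, and grand totals are computed from the observed table itself.

```python
def expected_counts(table):
    """(row total)(column total) / grand total for every cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [[r * c / grand for c in cols] for r in rows]


class_standing = ["Freshman", "Sophomore", "Junior", "Senior"]
# Columns: body piercing only, tattoos only, both, no body art.
observed = [
    [61, 7, 14, 86],
    [43, 11, 10, 64],
    [20, 9, 7, 43],
    [21, 17, 23, 54],
]

expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for name, row in zip(class_standing, expected):
    print(f"{name:10s}", [round(e, 1) for e in row])
print("df =", df)
print("smallest expected count:", round(min(min(r) for r in expected), 1))
```

The smallest expected count is about 7.1, so every expected count is at least 5 and no categories need to be combined.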


Body Art Continued . . .

Test Statistic:

$$X^2 = \frac{(61 - 49.7)^2}{49.7} + \cdots + \frac{(54 - 58.0)^2}{58.0} = 29.48$$

df = 9, α = .01, P-value < .001
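A self-contained Python sketch of the body art test, again with our own `chi2_sf` and `expected_counts` helpers. The hand calculation, which uses expected counts rounded to one decimal, gives 29.48; with unrounded expected counts the statistic is about 29.51. Either way, the P-value is below .001.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square area for whole-number df (stdlib only)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


def expected_counts(table):
    """(row total)(column total) / grand total for every cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [[r * c / grand for c in cols] for r in rows]


observed = [[61, 7, 14, 86], [43, 11, 10, 64], [20, 9, 7, 43], [21, 17, 23, 54]]
alpha = 0.01

expected = expected_counts(observed)
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```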


Body Art Continued . . .

Since the P-value < α, we reject H0. There is sufficient evidence to suggest that class standing and body art category are associated.


Which cell contributes the most to the X² test statistic?

Seniors having both body piercing and tattoos contribute the most to the X² statistic: 23 observed versus 12.7 expected, a contribution of about 8.4.
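The largest contribution can be confirmed with a short Python sketch. The helper `expected_counts` and the column labels are our own; "Both" is shorthand for "Both Body Piercing and Tattoos."

```python
def expected_counts(table):
    """(row total)(column total) / grand total for every cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [[r * c / grand for c in cols] for r in rows]


class_standing = ["Freshman", "Sophomore", "Junior", "Senior"]
body_art = ["Body Piercing Only", "Tattoos Only", "Both", "No Body Art"]
observed = [[61, 7, 14, 86], [43, 11, 10, 64], [20, 9, 7, 43], [21, 17, 23, 54]]
expected = expected_counts(observed)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = {
    (class_standing[i], body_art[j]): (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(class_standing))
    for j in range(len(body_art))
}

largest = max(contributions, key=contributions.get)
print(f"Largest contribution: {largest[0]}, {largest[1]}: {contributions[largest]:.2f}")
```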