Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures

Page 1: Stats chapter 14

Chapter 14

Inference for Distributions of Categorical Variables: Chi-Square Procedures

Page 2: Stats chapter 14

14.1 TEST FOR GOODNESS OF FIT

Page 3: Stats chapter 14

The problem

• Suppose we open a bag of M&M’s and count the number of M&M’s of each color.

• How would we know if our color counts are at normal levels?

• How would we know if our color counts were abnormal?

Page 4: Stats chapter 14

Chi-Square Distribution

• When we want to test the proportions of many counts (e.g. a two-way table or an array), we need to use a new distribution:

• The Chi-Square Distribution (Chi = χ, pronounced “KAI”)

• As you might suspect, this is another (the last of the year) PHANTOMS procedure.

• The χ² distribution is found in Table D and in the [2nd] -> [VARS] (DIST) menu on your calculator

Page 6: Stats chapter 14

The χ² distribution

• Like the t-distribution, the χ² distribution is a family of distributions: its shape depends on its degrees of freedom.

• It is single peaked and right skewed.

• As the df increases, the peak decreases in height, moves to the right and becomes more symmetric/Normal.

• As df increases, the χ² statistic needed for statistically significant results also increases

Page 8: Stats chapter 14

Chi-Square Goodness of Fit

• When we want to check whether a distribution fits a hypothesized distribution, we use the “χ² goodness of fit test”

• This procedure is frequently used to see whether a distribution is in equal proportions

• No, this will not be much different than what we have already been doing for the last 3 chapters.

Page 9: Stats chapter 14

χ² GOF Test

Parameter
Unlike previous tests, you will not need to state a μ or a p.
You need to state where the distribution comes from.
EX: We are investigating the color proportions of M&M’s in all 15 oz. bags of chocolate M&M’s

Page 10: Stats chapter 14

χ² GOF Test

Hypotheses
There are two styles for stating hypotheses

Style 1
In this style, you will refer to a written table, or state that all proportions are “equal”
H0: the proportions of M&M’s colors are the same as in the table provided
Ha: at least one color proportion is different from the table

H0: the proportions of accidents for each day are equal
Ha: at least one day has a proportion that is not equal

Page 11: Stats chapter 14

χ² GOF Test

Hypotheses (cont.)
Style 2
In this style, you will write out the expected proportions
H0: p_red = p_blue = p_yel = p_brn = p_grn = p_org = 1/6
Ha: at least one probability is different than stated above.

Page 12: Stats chapter 14

χ² GOF Test

Hypotheses (cont.)
Notice that the alternative hypothesis in each case is that at least one proportion is different than hypothesized

Page 13: Stats chapter 14

χ² GOF Test

Assumptions
1. All expected cell counts are greater than 1
2. No more than 20% of the expected cell counts are less than 5
(that’s a whole lot easier, yeah?)

Name of the Test
“χ² Goodness Of Fit Test”

Page 14: Stats chapter 14

χ² GOF Test

Test Statistic
Observed Count (O) is the count for each cell that we observed. The sum of the observed counts is ‘n’

Expected Count (E) is the hypothesized proportion for each cell times the sample size ‘n’

Page 15: Stats chapter 14

χ² GOF Test

Test Statistic (cont.)
If we opened up a bag of M&M’s and found the following counts:

      Red    Blue   Brwn   Yel    Grn    Orng
O:    5      3      10     6      4      3       n = 31
E:    5.17   5.17   5.17   5.17   5.17   5.17

Note: expected counts are all equal to 31/6
We are testing to see if M&M’s come in equal proportions

Page 16: Stats chapter 14

χ² GOF Test

Test Statistic (cont.)
The test statistic is χ² (“kai squared”):

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \dots + \frac{(O_n - E_n)^2}{E_n} = \sum \frac{(O - E)^2}{E}$$

Degrees of freedom (df) = # of classes – 1

Page 17: Stats chapter 14

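χ² GOF Test

Test Statistic (cont.)

$$\chi^2 = \frac{(5 - 5.17)^2}{5.17} + \frac{(3 - 5.17)^2}{5.17} + \frac{(10 - 5.17)^2}{5.17} + \frac{(6 - 5.17)^2}{5.17} + \frac{(4 - 5.17)^2}{5.17} + \frac{(3 - 5.17)^2}{5.17}$$

$$\chi^2 = 6.739$$

df = 6 – 1 = 5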
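The hand calculation above can be checked with a short Python sketch. The variable names are illustrative only, and the expected count is kept as the unrounded 31/6 rather than 5.17, so the result may differ from the slide value in the third decimal place.

```python
# Observed M&M color counts from the example bag.
observed = {"Red": 5, "Blue": 3, "Brown": 10, "Yellow": 6, "Green": 4, "Orange": 3}

n = sum(observed.values())            # sample size
expected = n / len(observed)          # equal proportions: each color expects n/6

# Chi-square statistic: sum of (O - E)^2 / E over all classes.
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())

# Degrees of freedom: number of classes minus 1.
df = len(observed) - 1

print(f"n = {n}, E = {expected:.2f}, chi-square = {chi_sq:.3f}, df = {df}")
```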

Page 21: Stats chapter 14

χ² GOF Test

P Value
p-value = P(χ²(df) > test statistic)
On the calculator: [2nd] -> [VARS] (DIST) -> χ²cdf
Usage: “χ²cdf(lower, upper, df)”

p-value = P(χ²(5) > 6.739)
p-value = 0.2409
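For readers without the calculator at hand, the upper-tail probability can be approximated with a standard-library Python sketch. The function name `chi2_upper_tail` is hypothetical, and the series expansion of the regularized incomplete gamma function used here is one common way to evaluate the tail, not a description of the calculator's own algorithm.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P(X > x) for X following a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0      # shape parameter of the underlying gamma distribution
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# P-value for the M&M example: chi-square = 6.739 with 5 degrees of freedom.
p_value = chi2_upper_tail(6.739, 5)
print(f"p-value = {p_value:.4f}")
```

On the calculator the same tail is obtained by giving the test statistic as the lower bound and a very large number as the upper bound.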

Page 22: Stats chapter 14

χ² GOF Test

Decision
Similarly to the other tests, reject the null hypothesis when the p-value is below the chosen alpha level

Summary
Use the same 3 part summary:
1) Interpret the p value w.r.t. the sampling distribution
2) Make a decision with reference to an alpha level
3) Summarize the results in context of the problem

Page 23: Stats chapter 14

χ² GOF Test

Summary (cont.)
“If the colors really came in equal proportions, color counts at least as far from the expected counts as ours would appear in approximately 24% of all random samples of 31.”
“Because this p value is greater than any acceptable alpha level, we fail to reject the null hypothesis.”
“We do not have sufficient evidence to conclude that the colors of M&M’s are not equally distributed”

Page 32: Stats chapter 14

Calculator methods

TI83/84
Begin by storing the observed counts in “L1”
Store the expected counts in “L2”
From the Home Screen evaluate: “sum((L1 – L2)²/L2)”
This is the value of χ².
Use χ²cdf from the “Dist Menu” to find the p-value: “χ²cdf(lower, upper, df)”

Page 33: Stats chapter 14

14.2 INFERENCE FOR TWO-WAY TABLES

Page 34: Stats chapter 14

Comparing two-groups

Wine      No Music   French Music   Italian Music   Total
French    30         39             30              99
Italian   11         1              19              31
Other     43         35             35              113
Total     84         75             84              243

• The table above compares the background music with the # of bottles of wine purchased.
• Note that the information is presented in a two-way table with marginal distributions
• Is there a relationship between these two categorical variables?

Page 35: Stats chapter 14

Comparing two-groups

• The test for a relationship presented on the preceding page is a χ² test.

• In particular, this is a χ² test for homogeneity. It measures whether any observed cell count is drastically different from its expected cell count.

Page 36: Stats chapter 14

Expected cell count for 2-way tables

$$\text{Expected cell count} = \text{row total} \times \frac{\text{column total}}{\text{total}}$$

• (column total / total) is the % of the population that is in the column
• The expected count is the count of the cell if the rows “obeyed” the column percentages
• Even for a small table, these calculations get cumbersome
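Because the calculations get cumbersome by hand, a short Python sketch can apply the formula to every cell of the wine and music table at once. The variable names are illustrative, not part of the original slides.

```python
# Observed counts (rows: wine type, columns: No Music, French Music, Italian Music).
observed = [
    [30, 39, 30],   # French wine
    [11, 1, 19],    # Italian wine
    [43, 35, 35],   # Other wine
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected cell count = row total x column total / total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print("  ".join(f"{value:6.2f}" for value in row))
```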

Page 40: Stats chapter 14

Expected Counts

Wine      No Music   French Music   Italian Music   Total
French    30         39             30              99
Italian   11         1              19              31
Other     43         35             35              113
Total     84         75             84              243

$$\text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Total}}$$

For the French wine / No Music cell:

$$\text{Expected} = \frac{99 \times 84}{243} = 34.22$$

Let’s start with the PHANTOMS procedure

Page 46: Stats chapter 14

χ² Test for Homogeneity

Parameter
State where each proportion comes from and what each count represents

“We are investigating the proportions of customers in the store who purchase French, Italian or other wine while listening to no music, French music or Italian music.”

Page 47: Stats chapter 14

χ² Test for Homogeneity

Hypotheses
The null hypothesis is always “the distributions of (group A) are the same in all populations of (group B)”
The alternative hypothesis is always “the distributions of (group A) are not all the same”

H0: the distributions of wine types are the same in all populations of music types
Ha: the distributions of wine types are not all the same

Page 48: Stats chapter 14

χ² Test for Homogeneity

Assumptions
(1) No more than 20% of the expected cell counts are less than 5
(2) All expected cell counts are > 1
(3) In a 2 x 2 table, all expected counts are greater than 5

Page 49: Stats chapter 14

χ² Test for Homogeneity

• “All expected cell counts are greater than 5”

Wine      No Music   French Music   Italian Music   Total
French    34.22      30.56          34.22           99
Italian   10.72      9.57           10.72           31
Other     39.06      34.88          39.06           113
Total     84         75             84              243
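The assumption check on the expected counts can also be written as a small Python sketch. The thresholds follow the assumptions stated for this test; the variable names are illustrative.

```python
# Expected counts from the table above (rows: French, Italian, Other wine).
expected = [
    [34.22, 30.56, 34.22],
    [10.72, 9.57, 10.72],
    [39.06, 34.88, 39.06],
]

cells = [value for row in expected for value in row]

# Condition: all expected cell counts are greater than 1.
all_above_one = all(value > 1 for value in cells)

# Condition: no more than 20% of the expected cell counts are less than 5.
share_below_five = sum(value < 5 for value in cells) / len(cells)

conditions_met = all_above_one and share_below_five <= 0.20
print(f"all > 1: {all_above_one}, share < 5: {share_below_five:.0%}, ok: {conditions_met}")
```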

Page 50: Stats chapter 14

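χ² Test for Homogeneity

Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$$\chi^2 = \frac{(30 - 34.22)^2}{34.22} + \frac{(39 - 30.56)^2}{30.56} + \dots + \frac{(35 - 39.06)^2}{39.06}$$

$$\chi^2 = 18.279$$

df = (# rows – 1)(# columns – 1)

df = (3 – 1)(3 – 1) = 4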
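As a check on the arithmetic, the following Python sketch sums (O – E)²/E over all nine cells and computes the degrees of freedom. It recomputes the expected counts from the observed table without rounding; names such as `chi_sq` are illustrative.

```python
# Observed counts (rows: French, Italian, Other wine; columns: No, French, Italian music).
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total   # expected count for this cell
        chi_sq += (o - e) ** 2 / e

# df = (# rows - 1)(# columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi_sq:.3f}, df = {df}")
```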

Page 53: Stats chapter 14

χ² Test for Homogeneity

P Value
P-value = P(χ²(df) > test statistic)
P-value = P(χ²(4) > 18.279)
P-value = 0.00109

Decision
Reject the null hypothesis
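When the degrees of freedom are even, as they are here, the upper-tail probability has a closed form: P(χ²(df) > x) = e^(–x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 – 1. The sketch below uses that identity; the function name is hypothetical and it deliberately rejects odd df, where this shortcut does not apply.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square distribution with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0      # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# P-value for the wine and music example: chi-square = 18.279 with 4 degrees of freedom.
p_value = chi2_upper_tail_even_df(18.279, 4)
print(f"p-value = {p_value:.5f}")
```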

Page 54: Stats chapter 14

χ² Test for Homogeneity

Summary
If the distributions of wine types were the same for all music types, a random sample of 243 would produce a table at least this far from the expected counts approximately 0.1% of the time.
Because the p value is less than an α of 0.05, we will reject the null hypothesis.
We have sufficient evidence at the 5% significance level to conclude that the distribution of wine types purchased is not the same in all music types.

Page 59: Stats chapter 14

Calculator Methods

Methods on the TI84
Before you begin the test, you must enter the “observed counts” into MATRIX [A]
[2ND] -> [x⁻¹] (MATRIX) -> “EDIT” -> [1]
Input the correct matrix size and cell counts
(Use [ENTER] or the Cursor Keys to switch between fields.)

Page 65: Stats chapter 14

Calculator Methods

Methods on the TI84 (cont.)
IMPORTANT: after inputting the observed matrix, quit and go to the home screen
[STAT] -> “TESTS” -> “χ²-Test”
Ensure that “Observed” is set to [A] and “Expected” is set to [B]
“Calculate”
The expected cell counts will be calculated and stored in Matrix [B] (go back to the Matrix menu to see the expected counts)

Page 66: Stats chapter 14

χ² Tests

• Occasionally, you will be asked to find the cell that “contributed the most to the χ² statistic.”

• When this is asked, you must calculate the χ² statistic by hand and find the largest value of (O – E)² / E.

• This is usually the cell that differs the most from the expected count

• Since each contribution is measured relative to the expected count, it is not always predictable.
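Applied to the wine and music table from earlier in this section, a Python sketch of that search might look like the following. The labels and variable names are illustrative, and the expected counts are recomputed without rounding.

```python
wines = ["French", "Italian", "Other"]
music = ["No Music", "French Music", "Italian Music"]
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Contribution of each cell to the chi-square statistic: (O - E)^2 / E.
contributions = {}
for i, wine in enumerate(wines):
    for j, tune in enumerate(music):
        e = row_totals[i] * col_totals[j] / total
        contributions[(wine, tune)] = (observed[i][j] - e) ** 2 / e

largest = max(contributions, key=contributions.get)
print(largest, round(contributions[largest], 2))
```

In this table the largest contribution comes from Italian wine bought while French music played, where only 1 bottle was observed against an expected count of about 9.57.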

Page 67: Stats chapter 14

χ² Test for Independence

• A similar test for two-way tables is the “χ² Test for Independence”, sometimes called the “χ² Test for Association”

• This test asks the question, “do the two variables influence each other?”

• When there is no association, the observed two-way table is close to the expected table

Page 68: Stats chapter 14

χ² Test for Independence

This test really only differs from the test for homogeneity in the hypotheses and the conclusion.

Hypotheses
The null hypothesis is “there is no association between (group 1) and (group 2)”
The alternative hypothesis is “there is an association between (group 1) and (group 2)”

Page 69: Stats chapter 14

χ² Test for Independence

Conclusion
Phrase your conclusions similarly to the ones we have been constructing.
When failing to reject H0:
After interpreting the p value and comparing the p value to alpha, state that there is “not sufficient evidence to conclude that an association exists between (group 1) and (group 2)”

Likewise, when rejecting H0, state that “there is sufficient evidence to conclude that an association exists between (group 1) and (group 2)”

Page 70: Stats chapter 14

Assignment 14.2

• Page 877 #29, 31, 32, 33
