Lecture Slides: Elementary Statistics, Twelfth Edition, and the Triola Statistics Series, by Mario F. Triola. Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Chapter 11: Goodness-of-Fit and Contingency Tables

11-1 Review and Preview

11-2 Goodness-of-Fit

11-3 Contingency Tables

Key Concept

In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for categorical data arranged in a table with at least two rows and at least two columns.

In Part 1, we present a method for testing the claim that the row and column variables are independent of each other.

In Part 2, we will consider three variations of the basic method presented in Part 1: (1) test of homogeneity, (2) Fisher exact test, and (3) McNemar’s test for matched pairs.

Part 1: Basic Concepts of Testing for Independence

Definition

A contingency table (or two-way frequency table) is a table in which frequencies correspond to two variables.

(One variable is used to categorize rows, and a second variable is used to categorize columns.)

Contingency tables have at least two rows and at least two columns.

Example

The contingency table for this example summarizes the results of foot procedures as a success or failure based on different treatments.

[Table: treatment group (rows) by outcome, success or failure (columns)]

Definition

Test of Independence

A test of independence tests the null hypothesis that in a contingency table, the row and column variables are independent.

Notation

O represents the observed frequency in a cell of a contingency table.

E represents the expected frequency in a cell, found by assuming that the row and column variables are independent.

r represents the number of rows in a contingency table (not including labels).

c represents the number of columns in a contingency table (not including labels).

Requirements

1. The sample data are randomly selected.

2. The sample data are represented as frequency counts in a two-way table.

3. For every cell in the contingency table, the expected frequency E is at least 5. (There is no requirement that every observed frequency must be at least 5. Also, there is no requirement that the population must have a normal distribution or any other specific distribution.)

Hypotheses and Test Statistic

$H_0$: The row and column variables are independent.

$H_1$: The row and column variables are dependent.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$$E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})}$$

O is the observed frequency in a cell and E is the expected frequency in a cell.
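
The test statistic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `chi_square_statistic` and the 2 × 2 counts are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def chi_square_statistic(observed):
    """Return the sum of (O - E)^2 / E over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence:
            # (row total)(column total) / (grand total)
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical counts (not from the slides); every expected frequency is 15.
example_table = [[10, 20],
                 [20, 10]]
example_statistic = chi_square_statistic(example_table)
print(round(example_statistic, 3))
```

Each of the four cells differs from its expected frequency of 15 by 5, so each contributes 25/15 to the sum.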

P-Values and Critical Values

P-Values

P-values are typically provided by technology, or a range of P-values can be found from Table A-4.

Critical Values

1. Found in Table A-4 using

degrees of freedom = (r – 1)(c – 1)

r is the number of rows and c is the number of columns

2. Tests of Independence are always right-tailed.
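
When Table A-4 is not at hand, the degrees of freedom and a right-tailed P-value can be computed directly. The sketch below is an illustration under stated assumptions, not the method used by any particular statistics package: `degrees_of_freedom` and `chi_square_right_tail` are hypothetical names, and the right-tail area uses the standard closed-form expressions that hold when the degrees of freedom are a whole number.

```python
import math


def degrees_of_freedom(r, c):
    """Degrees of freedom for a table with r rows and c columns."""
    return (r - 1) * (c - 1)


def chi_square_right_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum over k = 1 .. (df-1)/2 of (x/2)^(k - 1/2) / Gamma(k + 1/2)
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


# Hypothetical 3 x 2 table: 2 degrees of freedom.
df = degrees_of_freedom(3, 2)
# 5.991 is the usual tabled critical value for 2 degrees of freedom at 0.05.
area = chi_square_right_tail(5.991, df)
print(df, round(area, 3))
```

Because the test is right-tailed, the P-value is simply the area to the right of the test statistic.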

Expected Frequencies

Referring back to slide 6, the observed frequency is 54 successful surgeries.

The expected frequency is calculated using the first row total of 66, the first column total of 182, and the grand total of 253.

$$E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})} = \frac{(66)(182)}{253} = 47.478$$
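
The arithmetic for this single cell can be checked with a short Python sketch. The helper name `expected_frequency` is hypothetical; the three totals are the ones quoted on this slide.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count for one cell when rows and columns are independent."""
    return row_total * column_total / grand_total


# Totals quoted on the slide: first row 66, first column 182, grand total 253.
e_first_cell = expected_frequency(66, 182, 253)
print(round(e_first_cell, 3))  # 47.478
```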

Example

Does it appear that the choice of treatment affects the success of the treatment for the foot procedures?

Use a 0.05 level of significance to test the claim that success is independent of treatment group.

Example - Continued

Requirement Check:

1. Based on the study, we will treat the subjects as being randomly selected and randomly assigned to the different treatment groups.

2. The results are expressed in frequency counts.

3. The expected frequencies are all over 5.

The requirements are all satisfied.
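
The third requirement can be checked with a sketch like the one below. One loud assumption: the table of counts is not reproduced in this transcript, so the counts used here are reconstructed from figures quoted elsewhere in the slides (54 successes in a first row of 66, a first column total of 182, a grand total of 253, a last cell with O = 5 and E = 6.174, and the success rates 81.8%, 44.6%, 95.9%, and 77.3%). Treat them as an assumption, not as the published table.

```python
# Assumed counts, reconstructed from totals and rates quoted in the slides.
# Rows: the four treatment groups. Columns: success, failure.
observed = [[54, 12],
            [41, 51],
            [70, 3],
            [17, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: (row total)(column total) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

smallest_expected = min(min(row) for row in expected)
requirement_met = smallest_expected >= 5
print(round(smallest_expected, 3), requirement_met)
```

Under these assumed counts one observed frequency is 3, which is acceptable because the requirement applies to expected frequencies only.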

Example - Continued

The hypotheses are:

$H_0$: Success is independent of the treatment.

$H_1$: Success and the treatment are dependent.

The significance level is α = 0.05.

Example - Continued

We use a χ² distribution with this test statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(54 - 47.478)^2}{47.478} + \cdots + \frac{(5 - 6.174)^2}{6.174} = 58.393$$
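
The full sum behind this test statistic can be sketched as follows. The eight counts are an assumption: they are reconstructed from the totals, the first and last terms of the sum, and the success rates quoted in these slides, because the table itself is not reproduced in this transcript.

```python
# Assumed counts (rows: four treatment groups; columns: success, failure),
# reconstructed from figures quoted in the slides.
observed = [[54, 12],
            [41, 51],
            [70, 3],
            [17, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# One (O - E)^2 / E term per cell, in row-by-row order.
contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions.append((o - e) ** 2 / e)

chi_square = sum(contributions)
print(round(chi_square, 3))
```

With these assumed counts the sum agrees with the 58.393 reported on the slide, which is some reassurance that the reconstruction is consistent with the source.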

Example - Continued

P-Value: If using technology, the P-value is less than 0.0001. Since this value is less than the significance level of 0.05, reject the null hypothesis.

Critical Value: The critical value of χ² = 7.815 is found in Table A-4 with α = 0.05 and degrees of freedom of

$$(r - 1)(c - 1) = (4 - 1)(2 - 1) = 3$$

Because the test statistic of 58.393 falls in the critical region, we reject the null hypothesis.

A graphic of the chi-square distribution is on the next slide.
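
Both decision rules can be checked numerically. The sketch below is an illustration only: it takes the test statistic and critical value from the slides and uses a closed-form right-tail area that is valid only for 3 degrees of freedom, standing in for whatever technology the slides have in mind.

```python
import math

chi_square = 58.393     # test statistic reported in the slides
critical_value = 7.815  # Table A-4, alpha = 0.05, 3 degrees of freedom
alpha = 0.05

# Right-tail area for exactly 3 degrees of freedom:
# P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_square > critical_value
print(reject_by_p_value, reject_by_critical_value)
```

The two rules always agree, since the P-value falls below α exactly when the test statistic exceeds the critical value.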

Example - Continued

[Figure: graph of the chi-square distribution for this test]

Example - Continued

Interpretation:

It appears that success is dependent on the treatment.

Although this test does not tell us which treatment is best, we can see that the success rates of 81.8%, 44.6%, 95.9%, and 77.3% suggest that the best treatment is to use a non-weight-bearing cast for 6 weeks.
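
The success rates quoted above can be reproduced with a short sketch. The counts are an assumption, reconstructed from totals and rates quoted in these slides. Only two group labels can be inferred from the transcript (surgery, and the non-weight-bearing cast for 6 weeks); the other two labels are placeholders.

```python
# (successes, group size) per treatment; counts are assumed, not quoted directly.
groups = {
    "surgery": (54, 66),
    "treatment 2 (label not in transcript)": (41, 92),
    "non-weight-bearing cast for 6 weeks": (70, 73),
    "treatment 4 (label not in transcript)": (17, 22),
}

# Success rate in percent for each treatment group.
rates = {name: 100 * successes / size for name, (successes, size) in groups.items()}
best_treatment = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print("highest success rate:", best_treatment)
```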

Relationships Among Key Components in Test of Independence

Part 2: Test of Homogeneity and the Fisher Exact Test

Definition

Test of Homogeneity

In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristics.

How to Distinguish Between a Test of Homogeneity and a Test for Independence

In a typical test of independence, sample subjects are randomly selected from one population and values of two different variables are observed.

In a test of homogeneity, subjects are randomly selected from the different populations separately.

Example

We previously tested for independence between foot treatment and success.

If we want to use the same data in a test of the null hypothesis that the four populations corresponding to the four different treatment groups have the same proportion of success, we could use the chi-square test of homogeneity.

The test statistic, critical value, and P-value are the same as those found before, and we should reject the null hypothesis that the four treatments have the same success rate.

Fisher Exact Test

The procedures for testing hypotheses with contingency tables with two rows and two columns (2 × 2) have the requirement that every cell must have an expected frequency of at least 5.

This requirement is necessary for the χ² distribution to be a suitable approximation to the exact distribution of the test statistic.

Fisher Exact Test

The Fisher exact test is often used for a 2 × 2 contingency table with one or more expected frequencies that are below 5.

The Fisher exact test provides an exact P-value and does not require an approximation technique.

Because the calculations are quite complex, it’s a good idea to use computer software when using the Fisher exact test.

STATDISK, Minitab, XLSTAT, and StatCrunch all have the ability to perform the Fisher exact test.
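
To give a sense of what such software computes, here is a minimal sketch of a two-sided Fisher exact test for a 2 × 2 table. It is not taken from the slides: the function name `fisher_exact_two_sided` and the counts are hypothetical, and the two-sided rule used here (add up every table that is no more likely than the observed one, with the row and column totals held fixed) is one common convention that a given package may or may not follow.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_probability(a)

    # Add up the tables that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (table_probability(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical counts (not from the slides); every expected frequency is 2,
# so the chi-square approximation would not be appropriate here.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```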

McNemar’s Test for Matched Pairs

The methods in Part 1 of this chapter are based on independent data.

For 2 × 2 tables consisting of frequency counts that result from matched pairs, we do not have independence, and for such cases, we can use McNemar’s test for matched pairs.

In this section we present the method of using McNemar’s test for testing the null hypothesis that the frequencies from the discordant (different) categories occur in the same proportion.

McNemar’s Test for Matched Pairs

McNemar’s test requires two frequency counts from discordant (different) pairs.

P-values are typically provided by software, and critical values can be found in Table A-4 with 1 degree of freedom.

McNemar’s test is a test of the null hypothesis that the frequencies from the discordant categories occur in the same proportion.

Example

A randomized controlled trial was designed to test the effectiveness of hip protectors in preventing hip fractures in the elderly.

Nursing home residents each wore protection on one hip, but not the other.

McNemar’s test can be used to test the null hypothesis that the following two proportions are the same:

• The proportion of subjects with no hip fracture on the protected hip and a hip fracture on the unprotected hip.

• The proportion of subjects with a hip fracture on the protected hip and no hip fracture on the unprotected hip.

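Example

The test statistic can be calculated from the two discordant frequencies in the data table (b = 10 and c = 15):

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|10 - 15| - 1)^2}{10 + 15} = 0.640$$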

Example - Continued

With a 0.05 level of significance and one degree of freedom, the critical value for the right-tailed test is χ² = 3.841.

The test statistic does not exceed the critical value, so we fail to reject the null hypothesis.

The proportion of hip fractures with the protectors worn is not significantly different from the proportion of hip fractures without the protectors worn.

The hip protectors do not appear to be effective in preventing hip fractures.
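
The McNemar calculation for this example can be sketched as follows. The discordant frequencies 10 and 15 and the critical value 3.841 come from the slides. The P-value line is an addition that the slides do not report; it uses the closed-form right-tail area for a chi-square distribution with 1 degree of freedom.

```python
import math

b, c = 10, 15           # discordant frequencies used in the slides
critical_value = 3.841  # Table A-4, alpha = 0.05, 1 degree of freedom

# McNemar statistic with the continuity correction (the "- 1" term).
chi_square = (abs(b - c) - 1) ** 2 / (b + c)

# Right-tail area for 1 degree of freedom: P(X > x) = erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

reject_null = chi_square > critical_value
print(chi_square, reject_null)
```

Only the two discordant counts enter the calculation; the concordant pairs do not affect the statistic.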