Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Page 1: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Page 2: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Two-way frequency table – bivariate categorical data can most easily be summarized in a two-way frequency table.

• Marginal Totals – obtained by adding the observed cell counts in each row and also in each column of the table.

• Grand Total – the number of observations in the bivariate data set (in most cases, the sample size).

Page 3: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• An example of a two-way frequency table with the marginal and grand totals.

                        Contacts   Glasses   None   Row Marginal Total
Female                         5         9     11                   25
Male                           5        22     27                   54
Column Marginal Total         10        31     38                   79

Page 4: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• To compare two or more groups on the basis of a categorical variable, calculate an expected cell count for each cell by selecting the corresponding row and column marginal totals and then computing

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
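As an illustration of the expected cell count formula, here is a minimal Python sketch applied to the vision-correction table shown earlier. The function name `expected_counts` and the variable `vision` are assumptions made for this example, not part of the original slides.

```python
def expected_counts(observed):
    """Return expected cell counts: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts from the vision-correction table
# (rows: Female, Male; columns: Contacts, Glasses, None).
vision = [[5, 9, 11],
          [5, 22, 27]]

expected = expected_counts(vision)
for label, row in zip(["Female", "Male"], expected):
    print(label, [round(x, 2) for x in row])
```

The expected counts always add up to the grand total, which is a quick way to check a hand calculation.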

Page 5: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Null Hypothesis: H0: The true category proportions are the same for all the populations (homogeneity of populations)

• Alternative Hypothesis: Ha: The true category proportions are not all the same for all the populations.

• Test Statistic: X²

• P-Values: When H0 is true, X² has approximately a chi-square distribution with df = (number of rows – 1)(number of columns – 1).

• Assumptions:
1. The data consist of independently chosen random samples.
2. The sample size is large: all expected counts are at least 5.
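The degrees-of-freedom rule and the large-sample condition can be sketched in a few lines of Python. This is only an illustration; the helper names `degrees_of_freedom` and `large_sample_ok` are hypothetical, and the data are the counts from the vision-correction table shown earlier.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (number of rows - 1)(number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def large_sample_ok(observed, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return all(r * c / grand_total >= minimum
               for r in row_totals for c in col_totals)

# Vision-correction table (rows: Female, Male; columns: Contacts, Glasses, None).
vision = [[5, 9, 11],
          [5, 22, 27]]

df = degrees_of_freedom(len(vision), len(vision[0]))
ok = large_sample_ok(vision)
print(df, ok)
```

For that small illustrative table the smallest expected count is (25)(10)/79 ≈ 3.2, so the large-sample check is not satisfied there. The same helper can be applied to any table of observed counts.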

Page 6: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Example

• The data on drinking behavior for independently chosen random samples of male and female students are similar to data that appeared in an article. Does there appear to be a gender difference with respect to drinking behavior? (Note: Low = 1-7 drinks/week, moderate = 8-24 drinks/week, high = 25 or more drinks/week.)

Page 7: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Observed counts, with expected counts in parentheses:

Drinking Level          Male            Female          Row Marginal Total
None                    140 (158.6)     186 (167.4)                    326
Low                     478 (554.0)     661 (585.0)                   1139
Moderate                300 (230.1)     173 (242.9)                    473
High                     63 (38.4)       16 (40.6)                      79
Column Marginal Total   981            1036                           2017

Page 8: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• The relevant hypotheses are
– H0: True proportions for the four drinking levels are the same for males and females
– Ha: H0 is not true

• Significance Level: α = .01

• Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

Page 9: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Assumptions: Table 12.5 contains the computed expected counts, all of which are greater than 5. The data consist of independently chosen random samples.

• Calculation:

$$X^2 = \frac{(140 - 158.6)^2}{158.6} + \cdots + \frac{(16 - 40.6)^2}{40.6} = 96.6$$
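The calculation can be reproduced with a short Python sketch using the observed counts from the drinking-level table. The function names are assumptions made for this illustration. Because the sketch uses unrounded expected counts, its result may differ in the last digit from the hand calculation, which uses expected counts rounded to one decimal place.

```python
def expected_counts(observed):
    """Expected cell counts: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: None, Low, Moderate, High; columns: Male, Female.
drinking = [[140, 186],
            [478, 661],
            [300, 173],
            [63, 16]]

x2 = chi_square_statistic(drinking)
print(round(x2, 2))
```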

Page 10: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• P-value: The table has 4 rows and 2 columns, so df = (4 – 1)(2 – 1) = 3. The computed value of X² is greater than the largest entry in the 3-df column of Appendix Table 9, so P-value < .001.

• Conclusion: Because P-value ≤ α, H0 is rejected. The data indicate that males and females differ with respect to drinking level.
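As a cross-check on the table lookup, the upper-tail area can be computed directly. The sketch below is an illustration only: it relies on the closed-form upper-tail area of the chi-square distribution with exactly 3 degrees of freedom, so the helper `chi_square_sf_df3` (a hypothetical name) is not valid for other df values.

```python
import math

def chi_square_sf_df3(x):
    """Upper-tail area of the chi-square distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

x2 = 96.6      # test statistic from the drinking-level example
alpha = 0.01   # significance level used in the example

p_value = chi_square_sf_df3(x2)
print(p_value < 0.001)
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

In practice a statistics library or the appendix table would normally be used; the closed form is shown here only to keep the example self-contained.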

Page 11: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Independence

• P(A and B) = P(A)P(B)

(proportion of individuals in a particular category combination) = (proportion in specified category of first variable) (proportion in specified category of second variable)
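The multiplication rule leads to the same expected cell count formula used for the homogeneity test. A short worked step, estimating each proportion by the corresponding marginal total divided by the grand total n and using the (Female, Glasses) cell of the earlier vision-correction table as an illustration:

```latex
\[
\text{expected cell count}
  = n \cdot \frac{\text{row marginal total}}{n} \cdot \frac{\text{column marginal total}}{n}
  = \frac{(\text{row marginal total})(\text{column marginal total})}{n},
\qquad
\text{e.g. } \frac{(25)(31)}{79} \approx 9.81 .
\]
```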

Page 12: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

X² Test for Independence

• Null Hypothesis: H0: The two variables are independent

• Alternative Hypothesis: Ha: The two variables are not independent.

Page 13: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Example

• An article examined the relationship between gender and contraceptive use by sexually active teens. Each person in a random sample of sexually active teens was classified according to gender and contraceptive use (with three categories: rarely or never use, use sometimes or most of the time, and always use).

Page 14: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

Observed Counts

Contraceptive Use       Female   Male   Row Marginal Total
Rarely/Never               210    350                  560
Sometimes                  190    320                  510
Always                     400    530                  930
Column Marginal Total      800   1200                 2000

Page 15: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• The authors were interested in determining whether there is an association between gender and contraceptive use. Using a .05 significance level, we will test
– H0: Gender and contraceptive use are independent
– Ha: Gender and contraceptive use are not independent

Page 16: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Significance Level: α = .05

• Test Statistic:

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

Page 17: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Assumptions: Before we can check the assumptions, we must first compute the expected cell counts:

Cell Row   Cell Column   Expected Cell Count
1          1             (560)(800)/2000 = 224.0
1          2             (560)(1200)/2000 = 336.0
2          1             (510)(800)/2000 = 204.0
2          2             (510)(1200)/2000 = 306.0
3          1             (930)(800)/2000 = 372.0
3          2             (930)(1200)/2000 = 558.0

All expected cell counts are greater than 5, so we can continue the test.

Page 18: Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table

• Calculation:

$$X^2 = \frac{(210 - 224)^2}{224} + \cdots + \frac{(530 - 558)^2}{558} = 6.572$$

• P-value: The table has 3 rows and 2 columns, so df = (3-1)(2-1) = 2. The entry closest to 6.572 is 6.70, so the approximate P-value for this test is P-value ≈ .035

• Conclusion: Because P-value ≤ α, we reject H0 and conclude that there is an association between gender and contraceptive use.
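The whole independence test for this example can be sketched in Python. This is a minimal illustration under stated assumptions: the variable names are hypothetical, and the P-value uses the fact that for exactly 2 degrees of freedom the chi-square upper-tail area is exp(-x/2), so that line would not carry over to tables of other sizes.

```python
import math

# Rows: Rarely/Never, Sometimes, Always; columns: Female, Male.
observed = [[210, 350],
            [190, 320],
            [400, 530]]
alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total)(column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-x2 / 2)  # valid only because df == 2 here

print(round(x2, 3), df, round(p_value, 4))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The directly computed P-value is slightly larger than the table-based approximation because the table lookup uses the nearest tabulated entry rather than the exact statistic; the conclusion at α = .05 is the same.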