Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table
• Two-way frequency table – bivariate categorical data are most easily summarized in a two-way frequency table.
• Marginal Totals – obtained by adding the observed cell counts in each row and also in each column of the table.
• Grand Total – the number of observations in the bivariate data set (in most cases, the sample size).
• An example of a two-way frequency table with the marginal and grand totals.
                        Contacts   Glasses   None   Row Marginal Total
Female                      5          9       11           25
Male                        5         22       27           54
Column Marginal Total      10         31       38           79
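The marginal and grand totals defined above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name marginal_totals and the list-of-rows layout are assumptions made for illustration, not part of the original notes.

```python
# Eyewear example: rows are Female, Male; columns are Contacts, Glasses, None.
observed = [
    [5, 9, 11],   # Female
    [5, 22, 27],  # Male
]


def marginal_totals(table):
    """Hypothetical helper: return (row totals, column totals, grand total)."""
    row_totals = [sum(row) for row in table]         # add across each row
    col_totals = [sum(col) for col in zip(*table)]   # add down each column
    grand_total = sum(row_totals)                    # total number of observations
    return row_totals, col_totals, grand_total


row_totals, col_totals, grand_total = marginal_totals(observed)
print(row_totals)   # [25, 54]
print(col_totals)   # [10, 31, 38]
print(grand_total)  # 79
```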
• To compare two or more groups on the basis of a categorical variable, calculate an expected cell count for each cell by selecting the corresponding row and column marginal totals and then computing
$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
• Null Hypothesis: H0: The true category proportions are the same for all the populations (homogeneity of populations)
• Alternative Hypothesis: Ha: The true category proportions are not all the same for all the populations.
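• Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
• P-Values: When H0 is true, X² has approximately a chi-square distribution with df = (number of rows – 1)(number of columns – 1).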
• Assumptions:
  1. The data consist of independently chosen random samples.
  2. The sample size is large: all expected counts are at least 5.
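The procedure above (expected counts from the marginal totals, the X² statistic, its degrees of freedom, and the "all expected counts at least 5" check) can be sketched in Python as follows. The function name chi_square_test is hypothetical. Applied to the eyewear table from earlier, the sketch suggests that table is too small for the large-sample approximation, because one expected count falls below 5.

```python
def chi_square_test(observed):
    """Hypothetical helper: return (x2, df, expected) for a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # expected cell count = (row marginal total)(column marginal total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # X2 = sum over all cells of (observed - expected)^2 / expected
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # df = (number of rows - 1)(number of columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# Eyewear table: rows = Female, Male; columns = Contacts, Glasses, None.
x2, df, expected = chi_square_test([[5, 9, 11], [5, 22, 27]])
min_expected = min(min(row) for row in expected)

print(df)                      # 2
print(round(min_expected, 2))  # 3.16
if min_expected < 5:
    print("Assumption not met: at least one expected count is below 5")
```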
Example
• The data on drinking behavior for independently chosen random samples of male and female students are similar to data that appeared in an article. Does there appear to be a gender difference with respect to drinking behavior? (Note: Low = 1-7 drinks/week, moderate = 8-24 drinks/week, high = 25 or more drinks/week.)
Observed counts, with expected counts in parentheses
Drinking Level          Male           Female         Row Marginal Total
None                    140 (158.6)    186 (167.4)       326
Low                     478 (554.0)    661 (585.0)      1139
Moderate                300 (230.1)    173 (242.9)       473
High                     63 (38.4)      16 (40.6)         79
Column Marginal Total   981           1036              2017
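As a sanity check on the parenthesized values, the sketch below recomputes each expected count as (row marginal total)(column marginal total)/(grand total) and rounds to one decimal place. It assumes the parentheses hold expected counts, as the assumptions step for this example indicates.

```python
# Drinking-behavior data: rows are None, Low, Moderate, High; columns are Male, Female.
observed = [
    [140, 186],  # None
    [478, 661],  # Low
    [300, 173],  # Moderate
    [63, 16],    # High
]

row_totals = [sum(row) for row in observed]        # [326, 1139, 473, 79]
col_totals = [sum(col) for col in zip(*observed)]  # [981, 1036]
grand_total = sum(row_totals)                      # 2017

# Expected count for each cell, rounded to one decimal place as in the table.
expected = [[round(r * c / grand_total, 1) for c in col_totals] for r in row_totals]
print(expected)
# [[158.6, 167.4], [554.0, 585.0], [230.1, 242.9], [38.4, 40.6]]
```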
• The relevant hypotheses are
  – H0: The true proportions for the four drinking levels are the same for males and females.
  – Ha: H0 is not true.
• Significance Level: α = .01
• Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
• Assumptions: Table 12.5 contains the computed expected counts, all of which are greater than 5. The data consist of independently chosen random samples.
• Calculation:
$$X^2 = \frac{(140 - 158.6)^2}{158.6} + \cdots + \frac{(16 - 40.6)^2}{40.6} = 96.6$$
• P-value: The table has 4 rows and 2 columns, so df = (4 – 1)(2 – 1) = 3. The computed value of X² is greater than the largest entry in the 3-df column of Appendix Table 9, so P-value < .001.
• Conclusion: Because P-value ≤ α, H0 is rejected. The data indicate that males and females differ with respect to drinking level.
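The calculation and P-value steps of this example can be sketched in Python. Instead of reading Appendix Table 9, the sketch computes the chi-square upper-tail area directly using the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1), which starts from Q(x; 2) = e^(-x/2) for even df and Q(x; 1) = erfc(√(x/2)) for odd df. The helper chi2_sf is hypothetical and uses only the standard library; if SciPy is available, scipy.stats.chi2.sf serves the same purpose. The X² value uses the rounded expected counts from the table, matching the 96.6 above.

```python
import math


def chi2_sf(x, df):
    """Hypothetical helper: upper-tail area P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)              # Q(x; 2) = e^(-x/2)
    else:
        k, q = 1, math.erfc(math.sqrt(half))   # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1), computed in log space
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


observed = [[140, 186], [478, 661], [300, 173], [63, 16]]
expected = [[158.6, 167.4], [554.0, 585.0], [230.1, 242.9], [38.4, 40.6]]  # from the table

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(x2, df)

print(round(x2, 1))     # 96.6
print(df)               # 3
print(p_value < 0.001)  # True
```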
Independence
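• Two categorical variables are independent if, for every category combination, P(A and B) = P(A)P(B), where A is a category of the first variable and B is a category of the second. In words:
(proportion of individuals in a particular category combination) = (proportion in specified category of first variable) × (proportion in specified category of second variable)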
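Independence is also where the expected-count formula comes from. Writing R_i for the row i marginal total, C_j for the column j marginal total, and n for the grand total (notation introduced here for brevity, not used elsewhere in these notes), estimate each category proportion by its sample proportion; under independence, the expected count for a cell is n times the product of those two proportions. The last line applies this to the Female/Glasses cell of the eyewear table.

```latex
\begin{aligned}
\hat{P}(\text{row } i) &= \frac{R_i}{n}, \qquad \hat{P}(\text{column } j) = \frac{C_j}{n} \\
\text{expected count}_{ij} &= n \, \hat{P}(\text{row } i) \, \hat{P}(\text{column } j)
  = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n} \\
\text{expected count}_{\text{Female, Glasses}} &= \frac{(25)(31)}{79} \approx 9.81
\end{aligned}
```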
X² Test for Independence
• Null Hypothesis: H0: The two variables are independent
• Alternative Hypothesis: Ha: The two variables are not independent.
Example
• An article examined the relationship between gender and contraceptive use by sexually active teens. Each person in a random sample of sexually active teens was classified according to gender and contraceptive use (with three categories: rarely or never use, use sometimes or most of the time, and always use).
Observed Counts
Contraceptive Use       Female   Male   Row Marginal Total
Rarely/Never              210     350          560
Sometimes                 190     320          510
Always                    400     530          930
Column Marginal Total     800    1200         2000
• The authors were interested in determining whether there is an association between gender and contraceptive use. Using a .05 significance level, we will test
  – H0: Gender and contraceptive use are independent.
  – Ha: Gender and contraceptive use are not independent.
• Significance Level: α = .05
• Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
• Assumptions: Before we can check the assumptions, we must first compute the expected cell counts:
Cell Row   Cell Column   Expected Cell Count
   1            1        (560)(800)/2000 = 224.0
   1            2        (560)(1200)/2000 = 336.0
   2            1        (510)(800)/2000 = 204.0
   2            2        (510)(1200)/2000 = 306.0
   3            1        (930)(800)/2000 = 372.0
   3            2        (930)(1200)/2000 = 558.0
All expected cell counts are greater than 5, so we can continue the test.
• Calculation:
$$X^2 = \frac{(210 - 224)^2}{224} + \cdots + \frac{(530 - 558)^2}{558} = 6.572$$
• P-value: The table has 3 rows and 2 columns, so df = (3-1)(2-1) = 2. The entry closest to 6.572 is 6.70, so the approximate P-value for this test is P-value ≈ .035
• Conclusion: Because P-value ≤ α, we reject H0 and conclude that there is an association between gender and contraceptive use.
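The calculation and P-value for this example can be reproduced in a short Python sketch. For df = 2 the chi-square upper-tail area has the closed form P(X² > x) = e^(-x/2), so no table lookup is needed. The value it gives (about .037) is slightly larger than the table-based approximation of .035, and the conclusion at α = .05 is unchanged.

```python
import math

# Contraceptive-use data: rows are Rarely/Never, Sometimes, Always; columns are Female, Male.
observed = [
    [210, 350],  # Rarely/Never
    [190, 320],  # Sometimes
    [400, 530],  # Always
]

row_totals = [sum(row) for row in observed]        # [560, 510, 930]
col_totals = [sum(col) for col in zip(*observed)]  # [800, 1200]
n = sum(row_totals)                                # 2000

# expected cell count = (row marginal total)(column marginal total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-x2 / 2)

print(expected)           # [[224.0, 336.0], [204.0, 306.0], [372.0, 558.0]]
print(round(x2, 3))       # 6.572
print(df)                 # 2
print(round(p_value, 4))  # 0.0374
```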