Upload
aprikumt
View
57
Download
0
Embed Size (px)
Citation preview
IBM- 09: Six Sigma – Tools and Techniques
Dr. A. Ramesh
Department of Management Studies
Indian Institute of Technology Roorkee
χ2 Test of Independence
χχχχ2 Test of Independence
Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
Qualitative Variables
Nominal Data
χχχχ2 Test of Independence: Investment Example
• In which region of the country do you reside?A. Northeast B. Midwest C. South D. West
• Which type of financial investment are you most likely to make today?
E. Stocks F. Bonds G. Treasury bills
Type of financialInvestment
E F G
A O13 nA
Geographic B nB
Region C nC
D nD
nE nF nG N
Contingency Table
χχχχ2 Test of Independence: Investment Example
Type of Financial Investment
E F G
A e12 nA
Geographic B nB
Region C nC
D nD
nE nF nG N
Contingency Table
( ) ( ) ( )
If A and F are independent,
P A F∩ = ⋅P A P F
( ) ( )
( )
P AN
P FN
P A FN N
A F
A F
n n
n n
= =
∩ = ⋅
( )AF
A F
A F
e
n n
n n
N P A F
NN N
N
= ⋅ ∩
= ⋅
=⋅
χχχχ2 Test of Independence: Formulas
(((( ))))(((( ))))ij
i j
en n
N
where
====
====
====
: i = the row
j = the columnn
the total of row i
the total of column j
N = the total of all frequencies
i
j
nn
(((( ))))2
2
χχχχ ====−−−−
∑∑∑∑∑∑∑∑o e
where
f f
fe
: df = (r - 1)(c - 1) r = the numberr of rowsc = the numberr of columns
ExpectedFrequencies
Calculated χχχχ2
(Observed χχχχ2)
Type of Gasoline
Income Regular PremiumExtra
PremiumLess than $30,000
$30,000 to $49,999$50,000 to $99,000At least $100,000
r = 4 c = 3
χχχχ2 Test of Independence: Gasoline
Preference Versus Income Category
income oft independen
not is gasoline of Type :H
income oft independen
is gasoline of Type :H
a
o
( )( )
( )( )
α
χ
=
= − −
= − −
=
=
.
.. ,
01
1 1
4 1 3 1
6
16 81201 6
2
df r c
If reject H .
If do not reject H .
Cal
2
o
Cal
2
o
χ
χ
>
≤
16 812
16 812
. ,
. ,
Gasoline Preference Versus Income
Category: Observed Frequencies
Type of
Gasoline
Income Regular Premium
Extra
Premium
Less than $30,000 85 16 6 107
$30,000 to $49,999 102 27 13 142
$50,000 to $99,000 36 22 15 73
At least $100,000 15 23 25 63
238 88 59 385
Gasoline Preference Versus Income
Category: Expected Frequencies
Type of Gasoline
Income Regular Premium
Extra
Premium
Less than $30,000 (66.15) (24.46) (16.40)
85 16 6 107
$30,000 to $49,999 (87.78) (32.46) (21.76)
102 27 13 142
$50,000 to $99,000 (45.13) (16.69) (11.19)
36 22 15 73
At least $100,000 (38.95) (14.40) (9.65)
15 23 25 63
238 88 59 385
(((( ))))(((( ))))
(((( ))))(((( ))))
(((( ))))(((( ))))
(((( ))))(((( ))))
ij
i j
en n
e
e
e
N====
====
====
====
====
====
====
11
12
13
107 238
385
66 15
107 88
385
24 46
107 59
385
16 40
.
.
.
(((( ))))
(((( )))) (((( )))) (((( ))))
(((( )))) (((( )))) (((( ))))
(((( )))) (((( )))) (((( ))))
(((( )))) (((( )))) (((( ))))
2
2
88 6615 16 24 46 6 16 40
102 87 78 27 32 46 13 2176
36 4513 22 16 69 15 1119
15 38 95 23 14 40 25 9 65
66 15 24 46 16 40
87 78 32 46 21 76
4513 16 69 11 19
38 95 14 40 9 6570 78
χχχχ ====
==== ++++ ++++ ++++
++++ ++++ ++++
++++ ++++ ++++
++++ ++++
====
−−−−∑∑∑∑∑∑∑∑
−−−− −−−− −−−−
−−−− −−−− −−−−
−−−− −−−− −−−−
−−−− −−−− −−−−
o ef f
fe
2 2 2
2 2 2
2 2 2
2 2 2
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . ..
Gasoline Preference Versus Income
Category: χχχχ2 Calculation
Gasoline Preference Versus Income
Category: Conclusion
0.01
df = 6
16.812
Non rejection
region
Cal
2
70 78 16812χ = >. . , reject H .o
Contingency Tables
Contingency Tables
� Useful in situations involving multiple
population proportions
� Used to classify sample observations
according to two or more characteristics
� Also called a cross-classification table.
Contingency Table Example
Hand Preference vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female
� 2 categories for each variable, so the
table is called a 2 x 2 table
� Suppose we examine a sample of
300 college students
Contingency Table Example
Sample results organized in a contingency table:
Hand
Preference
Gender
Female Male
Left 12 24 36
Right 108 156 264
120 180 300
120 Females, 12 were
left handed
180 Males, 24 were
left handed
sample size = n = 300:
Contingency Table Example
� If H0 is true, then the proportion of left-handed females
should be the same as the proportion of left-handed males.
� The two proportions above should be the same as the
proportion of left-handed people overall.
H0: π1 = π2 (Proportion of females who are left
handed is equal to the proportion of
males who are left handed)
H1: π1 ≠ π2 (The two proportions are not the same –
Hand preference is not independent
of gender)
The Chi-Square Test Statistic
where:
fo = observed frequency in a particular cell
fe = expected frequency in a particular cell if H0 is true
χ2 for the 2 x 2 case has 1 degree of freedom
∑−
=cells all e
2
eo2
f
)f(fχ
The Chi-square test statistic is:
Assumed: each cell in the contingency table has expected frequency of at least 5
The Chi-Square Test Statistic
Decision Rule:
If χ2 > χ2U, reject H0,
otherwise, do not reject
H0
The χ2 test statistic approximately follows a chi-square
distribution with one degree of freedom
χχχχ2222
χ2U
0
α
Reject H0Do not reject H0
Observed vs. Expected Frequencies
Hand
Preference
Gender
Female Male
LeftObserved = 12
Expected = 14.4
Observed = 24
Expected = 21.636
RightObserved = 108
Expected = 105.6
Observed = 156
Expected = 158.4264
120 180 300
The Chi-Square Test Statistic
Hand
Preference
Gender
Female Male
LeftObserved = 12
Expected = 14.4
Observed = 24
Expected = 21.636
RightObserved = 108
Expected = 105.6
Observed = 156
Expected = 158.4264
120 180 300
7576.04.158
)4.158156(
6.21
)6.2124(
6.105
)6.105108(
4.14
)4.1412(
)(
2222
cells all
22
=−
+−
+−
+−
=
−= ∑
e
eo
f
ffχ
The test statistic is:
The Chi-Square Test Statistic
Decision Rule:
If χ2 > 3.841, reject H0, otherwise, do
not reject H0
3.841 d.f. 1 with , 7576.0 is statistic test The 22== Uχχ
Here,
χ2 = 0..7576 < χ2U = 3.841,
so you do not reject H0 and
conclude that there is
insufficient evidence that the
two proportions are different.
χχχχ2222
χ2U=3.841
0
α=.05
Reject H0Do not reject H0
χ2 Test for The Differences Among More Than Two Proportions
� Extend the χ2 test to the case with more than two
independent populations:
H0: π1 = π2 = … = πc
H1: Not all of the πj are equal (j = 1, 2, …, c)
The Chi-Square Test Statistic
where:
� fo = observed frequency in a particular cell of the 2 x c table
� fe = expected frequency in a particular cell if H0 is true
� χ2 for the 2 x c case has (2-1)(c-1) = c - 1 degrees of freedom
Assumed: each cell in the contingency table has expected frequency of
at least 5
∑−
=cells all
22 )(
e
eo
f
ffχ
The Chi-square test statistic is:
χ2 Test with More Than Two Proportions: Example
The sharing of patient records is a controversial issue in
health care. A survey of 500 respondents asked
whether they objected to their records being shared by
insurance companies, by pharmacies, and by medical
researchers. The results are summarized on the
following table:
χ2 Test with More Than Two Proportions: Example
Object to
Record
Sharing
Organization
Insurance
Companies
Pharmacies Medical
Researchers
Yes 410 295 335
No 90 205 165
χ2 Test with More Than Two Proportions: Example
6933.0500500500
335295410
...
...
21
21 =++
++=
+++
+++=
c
c
nnn
XXXp
The overall
proportion is:
Object to
Record
Sharing
Organization
Insurance
Companies
Pharmacies Medical
Researchers
Yes fo = 410
fe = 346.667
fo = 295
fe = 346.667
fo = 335
fe = 346.667
No fo = 90
fe = 153.333
fo = 205
fe = 153.333
fo = 165
fe = 153.333
χ2 Test with More Than Two Proportions: Example
Object
to
Record
Sharing
Organization
Insurance
Companies
Pharmacies Medical
Researchers
Yes
No
( )700.7
2
=−
e
eo
f
ff
( )159.26
2
=−
e
eo
f
ff
( )571.11
2
=−
e
eo
f
ff ( )3926.0
2
=−
e
eo
f
ff
( )409.17
2
=−
e
eo
f
ff ( )888.0
2
=−
e
eo
f
ff
1196.64)(
cells all
22
=−
= ∑e
eo
f
ffχThe Chi-square test statistic is:
χ2 Test with More Than Two Proportions: Example
Decision Rule:
If χ2 > χ2U, reject H0,
otherwise, do not reject H0
χ2U = 5.991 is from the chi-
square distribution with 2
degrees of freedom.
H0: π1 = π2 = π3
H1: Not all of the πj are equal (j = 1, 2, 3)
Conclusion: Since 64.1196 > 5.991, you reject H0 and you
conclude that at least one proportion of respondents who object to
their records being shared is different across the three
organizations