IBM- 09: Six Sigma – Tools and Techniques

Dr. A. Ramesh

Department of Management Studies

Indian Institute of Technology Roorkee

χ² Test of Independence

Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.

Qualitative Variables

Nominal Data

χ² Test of Independence: Investment Example

• In which region of the country do you reside?
  A. Northeast  B. Midwest  C. South  D. West

• Which type of financial investment are you most likely to make today?
  E. Stocks  F. Bonds  G. Treasury bills

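Contingency Table (rows: Geographic Region; columns: Type of Financial Investment)

| Geographic Region | E | F | G | Total |
|---|---|---|---|---|
| A | | | O13 | nA |
| B | | | | nB |
| C | | | | nC |
| D | | | | nD |
| Total | nE | nF | nG | N |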

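χ² Test of Independence: Investment Example

Contingency Table (rows: Geographic Region; columns: Type of Financial Investment)

| Geographic Region | E | F | G | Total |
|---|---|---|---|---|
| A | | e12 | | nA |
| B | | | | nB |
| C | | | | nC |
| D | | | | nD |
| Total | nE | nF | nG | N |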

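If A and F are independent,

$$P(A \cap F) = P(A) \cdot P(F)$$

$$P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N}$$

$$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$$

$$e_{AF} = N \cdot P(A \cap F) = N \left( \frac{n_A}{N} \cdot \frac{n_F}{N} \right) = \frac{n_A \cdot n_F}{N}$$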

χ² Test of Independence: Formulas

Expected Frequencies

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

where:
i = the row
j = the column
n_i = the total of row i
n_j = the total of column j
N = the total of all frequencies

Calculated χ² (Observed χ²)

$$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$$

where:
df = (r - 1)(c - 1)
r = the number of rows
c = the number of columns
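The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `expected_frequencies` and `chi_square_statistic` and the small `toy_table` are hypothetical, chosen only to show how row totals, column totals, and N combine.

```python
def expected_frequencies(observed):
    """Return expected counts e_ij = (row total i) * (column total j) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 3 table of counts, used only to exercise the functions.
toy_table = [[10, 20, 30], [20, 20, 20]]
toy_chi2, toy_df = chi_square_statistic(toy_table)
print(expected_frequencies(toy_table), toy_chi2, toy_df)
```

The statistic is then compared with the critical value of the chi-square distribution for the returned degrees of freedom.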


χ² Test of Independence: Gasoline Preference Versus Income Category

Type of Gasoline (columns): Regular, Premium, Extra Premium

Income (rows): Less than $30,000; $30,000 to $49,999; $50,000 to $99,000; At least $100,000

r = 4, c = 3

Ho: Type of gasoline is independent of income

Ha: Type of gasoline is not independent of income

$$\alpha = .01$$

$$df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$$

$$\chi^2_{.01,\,6} = 16.812$$

If χ²Cal > 16.812, reject Ho.

If χ²Cal ≤ 16.812, do not reject Ho.


Gasoline Preference Versus Income Category: Observed Frequencies

| Income \ Type of Gasoline | Regular | Premium | Extra Premium | Total |
|---|---|---|---|---|
| Less than $30,000 | 85 | 16 | 6 | 107 |
| $30,000 to $49,999 | 102 | 27 | 13 | 142 |
| $50,000 to $99,000 | 36 | 22 | 15 | 73 |
| At least $100,000 | 15 | 23 | 25 | 63 |
| Total | 238 | 88 | 59 | 385 |

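Gasoline Preference Versus Income Category: Expected Frequencies

Each cell shows the observed count, with the expected frequency in parentheses.

| Income \ Type of Gasoline | Regular | Premium | Extra Premium | Total |
|---|---|---|---|---|
| Less than $30,000 | 85 (66.15) | 16 (24.46) | 6 (16.40) | 107 |
| $30,000 to $49,999 | 102 (87.78) | 27 (32.46) | 13 (21.76) | 142 |
| $50,000 to $99,000 | 36 (45.13) | 22 (16.69) | 15 (11.19) | 73 |
| At least $100,000 | 15 (38.95) | 23 (14.40) | 25 (9.65) | 63 |
| Total | 238 | 88 | 59 | 385 |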

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

$$e_{11} = \frac{(107)(238)}{385} = 66.15$$

$$e_{12} = \frac{(107)(88)}{385} = 24.46$$

$$e_{13} = \frac{(107)(59)}{385} = 16.40$$
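The same calculation extends to every cell of the gasoline table. The sketch below is an illustration only, assuming the observed counts from the slides; the variable names are hypothetical. It rebuilds the full table of expected frequencies, rounded to two decimals as on the slide.

```python
# Observed counts: rows are income categories, columns are
# Regular, Premium, Extra Premium (values taken from the slides).
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]        # 107, 142, 73, 63
col_totals = [sum(col) for col in zip(*observed)]  # 238, 88, 59
n = sum(row_totals)                                # 385

# e_ij = (row total i)(column total j) / N, rounded for display.
expected = [[round(r * c / n, 2) for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```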

Gasoline Preference Versus Income Category: χ² Calculation

$$
\begin{aligned}
\chi^2 &= \sum\sum \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} \\
&\quad + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76} \\
&\quad + \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} \\
&\quad + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65} \\
&= 70.78
\end{aligned}
$$

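Gasoline Preference Versus Income Category: Conclusion

[Figure: χ² distribution with df = 6. The non-rejection region lies to the left of the critical value 16.812; the upper-tail area to its right is 0.01.]

$$\chi^2_{Cal} = 70.78 > 16.812, \text{ reject } H_o.$$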
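The decision above can be checked numerically. The following is a rough sketch under stated assumptions: the critical value 16.812 is copied from the slide rather than computed, and the helper `chi_square_sf_even_df` is a hypothetical name for the closed-form upper-tail probability that holds only when the degrees of freedom are even. Because it uses unrounded expected frequencies, the statistic comes out slightly below the slide's 70.78.

```python
import math

# Observed counts from the slides (rows: income; columns: gasoline type).
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (fo - fe)^2 / fe over all cells, with fe = row total * column total / N.
chi2 = sum(
    (fo - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for fo, c in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (4 - 1)(3 - 1) = 6

critical_value = 16.812  # chi-square critical value for alpha = .01, df = 6 (from the slide)
reject_h0 = chi2 > critical_value


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable; valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi_square_sf_even_df(chi2, df)
print(round(chi2, 2), df, reject_h0, p_value)
```

A p-value far below 0.01 agrees with the slide's conclusion to reject Ho.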

Contingency Tables

• Useful in situations involving multiple population proportions

• Used to classify sample observations according to two or more characteristics

• Also called a cross-classification table

Contingency Table Example

Hand Preference vs. Gender

Dominant Hand: Left vs. Right

Gender: Male vs. Female

• 2 categories for each variable, so the table is called a 2 x 2 table

• Suppose we examine a sample of 300 college students

Contingency Table Example

Sample results organized in a contingency table (sample size = n = 300):

| Hand Preference | Female | Male | Total |
|---|---|---|---|
| Left | 12 | 24 | 36 |
| Right | 108 | 156 | 264 |
| Total | 120 | 180 | 300 |

Of the 120 females, 12 were left handed.

Of the 180 males, 24 were left handed.

Contingency Table Example

• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males.

• The two proportions above should be the same as the proportion of left-handed people overall.

H0: π1 = π2 (The proportion of females who are left handed is equal to the proportion of males who are left handed)

H1: π1 ≠ π2 (The two proportions are not the same; hand preference is not independent of gender)

The Chi-Square Test Statistic

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:

fo = observed frequency in a particular cell

fe = expected frequency in a particular cell if H0 is true

χ² for the 2 x 2 case has 1 degree of freedom

Assumed: each cell in the contingency table has expected frequency of at least 5


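The Chi-Square Test Statistic

The χ² test statistic approximately follows a chi-square distribution with one degree of freedom

Decision Rule: If χ² > χ²U, reject H0; otherwise, do not reject H0

[Figure: chi-square distribution on a χ² axis starting at 0. The "Do not reject H0" region lies to the left of the critical value χ²U; the "Reject H0" region, with area α, lies to the right.]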

Observed vs. Expected Frequencies

| Hand Preference | Female | Male | Total |
|---|---|---|---|
| Left | Observed = 12, Expected = 14.4 | Observed = 24, Expected = 21.6 | 36 |
| Right | Observed = 108, Expected = 105.6 | Observed = 156, Expected = 158.4 | 264 |
| Total | 120 | 180 | 300 |

The Chi-Square Test Statistic

The test statistic is:

$$
\begin{aligned}
\chi^2 &= \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} \\
&= 0.7576
\end{aligned}
$$

The Chi-Square Test Statistic

The test statistic is χ² = 0.7576; the critical value with 1 d.f. is χ²U = 3.841

Decision Rule: If χ² > 3.841, reject H0; otherwise, do not reject H0

Here, χ² = 0.7576 < χ²U = 3.841, so you do not reject H0 and conclude that there is insufficient evidence that the two proportions are different.

[Figure: chi-square distribution with 1 d.f. on a χ² axis starting at 0. The "Do not reject H0" region lies to the left of χ²U = 3.841; the "Reject H0" region, with area α = .05, lies to the right.]
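The hand-preference example is small enough to verify directly. The sketch below is illustrative only: the critical value 3.841 is copied from the slide, and the p-value uses the identity that, for 1 degree of freedom, the chi-square upper-tail probability equals `erfc(sqrt(x / 2))`; the variable names are hypothetical.

```python
import math

# Observed counts from the slides: rows are Left, Right; columns are Female, Male.
observed = [
    [12, 24],
    [108, 156],
]

row_totals = [sum(row) for row in observed]        # 36, 264
col_totals = [sum(col) for col in zip(*observed)]  # 120, 180
n = sum(row_totals)                                # 300

# Expected frequency for each cell if hand preference is independent of gender.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

critical_value = 3.841  # alpha = .05, 1 d.f. (from the slide)
reject_h0 = chi2 > critical_value

# Upper-tail probability for a chi-square variable with exactly 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 4), reject_h0, p_value)
```

The slide's assumption that every expected frequency is at least 5 can be checked on `expected` before trusting the result.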

χ² Test for the Differences Among More Than Two Proportions

• Extend the χ² test to the case with more than two independent populations:

H0: π1 = π2 = … = πc

H1: Not all of the πj are equal (j = 1, 2, …, c)

The Chi-Square Test Statistic

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:

• fo = observed frequency in a particular cell of the 2 x c table

• fe = expected frequency in a particular cell if H0 is true

• χ² for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom

Assumed: each cell in the contingency table has expected frequency of at least 5


χ² Test with More Than Two Proportions: Example

The sharing of patient records is a controversial issue in health care. A survey of 500 respondents asked whether they objected to their records being shared by insurance companies, by pharmacies, and by medical researchers. The results are summarized in the following table:

χ² Test with More Than Two Proportions: Example

| Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers |
|---|---|---|---|
| Yes | 410 | 295 | 335 |
| No | 90 | 205 | 165 |

χ² Test with More Than Two Proportions: Example

The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \dots + X_c}{n_1 + n_2 + \dots + n_c} = \frac{410 + 295 + 335}{500 + 500 + 500} = 0.6933$$

| Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers |
|---|---|---|---|
| Yes | fo = 410, fe = 346.667 | fo = 295, fe = 346.667 | fo = 335, fe = 346.667 |
| No | fo = 90, fe = 153.333 | fo = 205, fe = 153.333 | fo = 165, fe = 153.333 |
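The expected frequencies in this table follow from the overall proportion. As a minimal sketch (variable names are hypothetical; the counts are those on the slides), each "Yes" cell is the pooled proportion times that group's sample size, and each "No" cell is the complement:

```python
# "Yes" counts and sample sizes for Insurance Companies, Pharmacies,
# Medical Researchers (values taken from the slides).
yes_counts = [410, 295, 335]
sample_sizes = [500, 500, 500]

# Overall (pooled) proportion of respondents who object.
p_bar = sum(yes_counts) / sum(sample_sizes)

# Expected frequencies under H0: every group shares the proportion p_bar.
expected_yes = [p_bar * n for n in sample_sizes]
expected_no = [(1 - p_bar) * n for n in sample_sizes]

print(round(p_bar, 4))
print([round(e, 3) for e in expected_yes])
print([round(e, 3) for e in expected_no])
```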

χ² Test with More Than Two Proportions: Example

Each cell shows its contribution (fo - fe)²/fe:

| Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers |
|---|---|---|---|
| Yes | 11.571 | 7.700 | 0.3926 |
| No | 26.159 | 17.409 | 0.888 |

The Chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$

χ² Test with More Than Two Proportions: Example

H0: π1 = π2 = π3

H1: Not all of the πj are equal (j = 1, 2, 3)

Decision Rule: If χ² > χ²U, reject H0; otherwise, do not reject H0

χ²U = 5.991 is from the chi-square distribution with 2 degrees of freedom (the upper-tail critical value at α = 0.05).

Conclusion: Since 64.1196 > 5.991, you reject H0 and conclude that the proportion of respondents who object to their records being shared is not the same for all three organizations; at least one proportion differs.
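The whole record-sharing example can be reproduced in a short script. This is an illustrative sketch, not the lecture's own code: the critical value 5.991 is copied from the slide, and the p-value uses the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is `exp(-x / 2)`. Variable names are hypothetical.

```python
import math

# Observed counts from the slides: rows are Yes, No; columns are
# Insurance Companies, Pharmacies, Medical Researchers.
observed = [
    [410, 295, 335],
    [90, 205, 165],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contributions (fo - fe)^2 / fe, with fe = row total * column total / N.
contributions = [
    [(fo - r * c / n) ** 2 / (r * c / n) for fo, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi2 = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2

critical_value = 5.991  # chi-square critical value with 2 d.f. (from the slide)
reject_h0 = chi2 > critical_value

# Upper-tail probability for a chi-square variable with exactly 2 degrees of freedom.
p_value = math.exp(-chi2 / 2)

for row in contributions:
    print([round(v, 3) for v in row])
print(round(chi2, 2), df, reject_h0)
```

The unrounded total is about 64.12; the slide's 64.1196 is the sum of the six rounded contributions.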