Chapter 17

Statistical Inference For Frequency Data

I Three Applications of Pearson’s $\chi^2$

Testing goodness of fit

Testing independence

Testing equality of proportions


A. Testing Goodness of Fit

1. Statistical hypotheses

H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }k} = E_{\text{Pop }k}$

H1: $O_{\text{Pop }j} \ne E_{\text{Pop }j}$ for some j and j′

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of membership in one of k mutually exclusive categories


B. Testing Independence

1. Statistical hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of two variables, denoted by A and B, where each variable has two or more categories.


C. Testing Equality of Proportions

1. Statistical hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \ne p_{j'}$ for some j and j′

2. Randomization Plan

c random samples, where c ≥ 2

For each sample, elements are classified in terms of membership in one of r = 2 mutually exclusive categories


II Testing Goodness of Fit

A. Chi-Square Distribution

[Figure: chi-square density curves, $f(\chi^2)$ plotted against $\chi^2$, for df = 1, df = 2, df = 6, and df = 10]


B. Pearson’s chi-square statistic

1. $O_j$ and $E_j$ denote, respectively, observed and expected frequencies. k denotes the number of categories.

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

2. The critical value of chi square is $\chi^2_{\alpha,\,\nu}$ with $\nu = k - 1$ degrees of freedom.


C. Grade-Distribution Example

1. Is the distribution of grades for summer-school students in a statistics class different from that for the fall and spring semesters?

           Fall and Spring    Summer
Grade      Proportion         Obs. frequency
A              .12                15
B              .23                21
C              .47                30
D              .13                 6
F              .05                 0
Total         1.00                72


2. The statistical hypotheses are

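H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }5} = E_{\text{Pop }5}$

H1: $O_{\text{Pop }j} \ne E_{\text{Pop }j}$ for some j and j′

3. Pearson’s chi-square statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

4. The critical value of chi square for $\alpha = .05$, k = 5 categories, and $\nu = 5 - 1 = 4$ degrees of freedom is $\chi^2_{.05,\,4} = 9.488$.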
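The tabled critical value can be checked numerically. The sketch below is not part of the slides: it assumes integer degrees of freedom and uses the standard recurrence Q(x; df + 2) = Q(x; df) + (x/2)^(df/2) e^(-x/2) / Γ(df/2 + 1), started from the closed forms for df = 1 and df = 2. The function name `chi2_upper_tail` is an illustrative choice.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Return P(chi-square variate with `df` degrees of freedom >= x), df a positive integer."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
        k = 1
    else:
        q = math.exp(-half)             # closed form for df = 2
        k = 2
    # Step up two degrees of freedom at a time until df is reached.
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Upper-tail area beyond the tabled critical value for alpha = .05, df = 4.
p_at_critical = chi2_upper_tail(9.488, 4)
print(round(p_at_critical, 3))
```

The printed area should be very close to .05, which is what a critical value at α = .05 means.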


Table 1. Computation of Pearson’s Chi-Square for n = 72 Summer-School Students

(1)      (2)     (3)     (4)                  (5)          (6)
Grade    O_j     p_j     n p_j = E_j          O_j – E_j    (O_j – E_j)² / E_j
A        15      .12     72(.12) =  8.6         6.4         4.763
B        21      .23     72(.23) = 16.6         4.4         1.166
C        30      .47     72(.47) = 33.8        –3.8         0.427
D         6      .13     72(.13) =  9.4        –3.4         1.230
F         0      .05     72(.05) =  3.6        –3.6         3.600
Total    72     1.00               72.0                    χ² = 11.186*

*p < .025
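The arithmetic in Table 1 can be reproduced in a few lines. This is a minimal sketch rather than the author's procedure; the helper name `pearson_chi_square` is hypothetical. It evaluates the statistic twice: once with the one-decimal expected frequencies printed in the table, and once with unrounded values of n·p_j, to show how much the rounding matters.

```python
def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [15, 21, 30, 6, 0]                   # summer grades A, B, C, D, F
proportions = [0.12, 0.23, 0.47, 0.13, 0.05]    # fall and spring proportions
n = sum(observed)                               # 72 students

expected_table = [8.6, 16.6, 33.8, 9.4, 3.6]    # values as printed in Table 1
expected_exact = [n * p for p in proportions]   # unrounded n * p_j

chi2_table = pearson_chi_square(observed, expected_table)
chi2_exact = pearson_chi_square(observed, expected_exact)

print(n, round(chi2_table, 3), round(chi2_exact, 3))
```

With the table's rounded expected frequencies the result matches the 11.186 shown; with unrounded expected frequencies it is about 11.114. Both exceed the critical value of 9.488.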


5. When e parameters of a theoretical distribution must be estimated, the degrees of freedom are k – 1 – e.

D. Practical Significance

1. Cohen’s $\hat{w}$

$$\hat{w} = \sqrt{\sum_{j=1}^{k} \frac{(\hat{p}_j - p_j)^2}{p_j}}$$

where $\hat{p}_j$ and $p_j$ denote, respectively, the observed and expected proportions in the jth category.


2. Simpler equivalent formula for Cohen’s $\hat{w}$

$$\hat{w} = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{11.186}{72}} = 0.394$$

3. Cohen’s guidelines for interpreting $\hat{w}$

0.1 is a small effect

0.3 is a medium effect

0.5 is a large effect
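The equivalence of the two formulas for Cohen's w can be checked on the grade-distribution data. The sketch below is illustrative only; the variable names are assumptions. It uses unrounded expected proportions for both routes, then evaluates the shortcut with the chi-square value of 11.186 reported in Table 1.

```python
import math

observed = [15, 21, 30, 6, 0]                      # summer grades A, B, C, D, F
expected_props = [0.12, 0.23, 0.47, 0.13, 0.05]    # fall and spring proportions
n = sum(observed)

# Route 1: definition based on observed and expected proportions.
observed_props = [o / n for o in observed]
w_definition = math.sqrt(
    sum((po - pe) ** 2 / pe for po, pe in zip(observed_props, expected_props))
)

# Route 2: shortcut sqrt(chi-square / n), with chi-square from the same proportions.
chi2 = sum((o - n * pe) ** 2 / (n * pe) for o, pe in zip(observed, expected_props))
w_shortcut = math.sqrt(chi2 / n)

# Shortcut evaluated with the chi-square value reported in Table 1.
w_from_table = math.sqrt(11.186 / 72)

print(round(w_definition, 3), round(w_shortcut, 3), round(w_from_table, 3))
```

The first two values agree exactly apart from floating-point error, and all three are about 0.39, which lies between Cohen's medium (0.3) and large (0.5) guidelines.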


E. Yates’ Correction

1. When $\nu = 1$, Yates’ correction can be applied to make the sampling distribution of the test statistic for $O_j - E_j$, which is discrete, better approximate the chi-square distribution.

$$\chi^2 = \sum_{j=1}^{k} \frac{(|O_j - E_j| - 0.5)^2}{E_j}$$
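The effect of the correction is easiest to see on a small case. The counts below are hypothetical (they do not come from the slides): two categories, so ν = 1, with 60 and 40 observations against an expected 50 and 50. The function name and `yates` flag are illustrative.

```python
def chi_square(observed, expected, yates=False):
    """Pearson's chi-square; with yates=True, 0.5 is subtracted from each |O - E|."""
    correction = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected)
    )


observed = [60, 40]        # hypothetical counts, k = 2 categories
expected = [50.0, 50.0]    # hypothetical expected frequencies

uncorrected = chi_square(observed, expected)
corrected = chi_square(observed, expected, yates=True)

print(round(uncorrected, 2), round(corrected, 2))
```

For these made-up counts the uncorrected statistic is 4.00 and the corrected one is 3.61, so the correction moves the result from above to below the ν = 1 critical value of 3.841.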


F. Assumptions of the Goodness-of-Fit Test

1. Every observation is assigned to one and only one category.

2. The observations are independent.

3. If $\nu = 1$, every expected frequency should be at least 10. If $\nu > 1$, every expected frequency should be at least 5.
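The third assumption is simple to check mechanically. The following sketch (function names are hypothetical) applies the stated minimums to the expected frequencies of the grade-distribution example.

```python
def minimum_expected(df: int) -> float:
    """Guideline from the slide: at least 10 when df = 1, at least 5 when df > 1."""
    return 10.0 if df == 1 else 5.0


def low_expected_cells(labels, expected, df):
    """Return (label, expected frequency) pairs that fall below the guideline."""
    floor = minimum_expected(df)
    return [(lab, e) for lab, e in zip(labels, expected) if e < floor]


grades = ["A", "B", "C", "D", "F"]
expected = [72 * p for p in (0.12, 0.23, 0.47, 0.13, 0.05)]
flagged = low_expected_cells(grades, expected, df=len(grades) - 1)

print([(lab, round(e, 1)) for lab, e in flagged])
```

For these data only the F category (expected frequency 3.6) falls below the guideline of 5.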


III Testing Independence

A. Statistical Hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r Rows and j = 1, . . . , c Columns

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$


C. Computational Example: Is Success on an Employment-Test Item Independent of Gender?

              Observed                   Expected
              b1      b2                 b1      b2
              Fail    Pass    Total      Fail    Pass
a1 Men         84      18      102       88.9    13.1
a2 Women       93       8      101       88.1    12.9
Total         177      26      203

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.299^{*} \qquad \chi^2_{.05,\,1} = 3.841$$


D. Computation of expected frequencies

1. A and B are statistically independent if

$p(a_i \text{ and } b_j) = p(a_i)\,p(b_j)$

2. Expected frequency, $E_{a_i \text{ and } b_j}$, for the cell in row i and column j

$$E_{a_i \text{ and } b_j} = n\,p(a_i)\,p(b_j) = n\,(n_{a_i}/n)(n_{b_j}/n) = (n_{a_i} n_{b_j})/n$$


$$E_{a_1 \text{ and } b_1} = (n_{a_1} n_{b_1})/n = (102)(177)/203 = 88.9$$

$$E_{a_1 \text{ and } b_2} = (n_{a_1} n_{b_2})/n = (102)(26)/203 = 13.1$$

$$E_{a_2 \text{ and } b_1} = (n_{a_2} n_{b_1})/n = (101)(177)/203 = 88.1$$

$$E_{a_2 \text{ and } b_2} = (n_{a_2} n_{b_2})/n = (101)(26)/203 = 12.9$$

          Observed                 Expected
          b1      b2     Total     b1      b2
a1         84      18     102      88.9    13.1
a2         93       8     101      88.1    12.9
Total     177      26     203
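The expected-frequency rule (row total times column total, divided by n) and the resulting statistic can be sketched as follows. The function names are illustrative, not taken from the slides; the counts are those of the employment-test example.

```python
def expected_frequencies(table):
    """E_ij = (row total i)(column total j) / n for every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson's chi-square summed over all cells of a contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [
    [84, 18],   # a1: fail, pass
    [93, 8],    # a2: fail, pass
]

expected = expected_frequencies(observed)
print([[round(e, 1) for e in row] for row in expected])  # [[88.9, 13.1], [88.1, 12.9]]
print(round(chi_square(observed), 3))                    # 4.299
```

Note that the statistic of 4.299 is obtained with unrounded expected frequencies; using the one-decimal values from the table gives a slightly smaller number.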


E. Degrees of Freedom for an r × c Contingency Table

df = k – 1 – e

= rc – 1 – [(r – 1) + (c – 1)]

= rc – 1 – r + 1 – c + 1

= rc – r – c + 1

= (r – 1)(c – 1)

= (2 – 1)(2 – 1) = 1


F. Strength of Association and Practical Significance

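1. Cramér’s V

$$V = \sqrt{\frac{\text{observed}}{\text{maximum}}} = \sqrt{\frac{\chi^2 / n}{s - 1}} = \sqrt{\frac{\chi^2}{n(s - 1)}}$$

where s is the smaller of the number of rows and columns.

$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{4.299}{203(2 - 1)}} = 0.146$$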


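2. Practical significance, Cohen’s ŵ

$$\hat{w} = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}} = \sqrt{\frac{\chi^2}{n}} = 0.146$$

3. For a contingency table, an alternative formula for $\hat{w}$ is

$$\hat{w} = V\sqrt{s - 1} = 0.146\sqrt{2 - 1} = 0.146$$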
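The alternative formula follows directly from the definitions of V and ŵ in terms of chi square. A short derivation, using only quantities already defined on these slides:

```latex
V\sqrt{s - 1}
  = \sqrt{\frac{\chi^2}{n(s - 1)}}\,\sqrt{s - 1}
  = \sqrt{\frac{\chi^2}{n}}
  = \hat{w}
```

For a table with two rows or two columns, s – 1 = 1, so V and ŵ coincide, as they do in the employment-test example.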


G. Three-By-Three Contingency Table

1. Motivation and education of conscientious objectors during WWII

                   College   High School   Grade School   Total
Coward                12          25            35           72
Partly Coward         19          23            30           72
Not Coward            71          56            24          151
Total                102         104            89          295


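$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 36.681^{*} \qquad \chi^2_{.05,\,4} = 9.488$$

$$\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$$

2. Strength of Association, Cramér’s V

$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{36.681}{295(3 - 1)}} = 0.249$$

3. Practical significance

$$\hat{w} = V\sqrt{s - 1} = 0.249\sqrt{3 - 1} = 0.352$$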
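The whole three-by-three analysis can be retraced from the raw counts. This is a sketch under the formulas given on these slides, with illustrative variable names; it computes chi square, the degrees of freedom, Cramér's V, and Cohen's w in one pass.

```python
import math

observed = [
    [12, 25, 35],   # Coward:        college, high school, grade school
    [19, 23, 30],   # Partly coward
    [71, 56, 24],   # Not coward
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson's chi-square with E_ij = (row total)(column total) / n.
chi2 = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
s = min(len(row_totals), len(col_totals))   # smaller of rows and columns
v = math.sqrt(chi2 / (n * (s - 1)))         # Cramer's V
w = v * math.sqrt(s - 1)                    # Cohen's w

print(n, df, round(chi2, 2), round(v, 3), round(w, 2))
```

Carrying full precision, w comes out near 0.353 rather than the 0.352 obtained from the rounded V of 0.249; the difference is rounding only.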


H. Assumptions of the Independence Test

1. Every observation is assigned to one and only one cell of the contingency table.

2. The observations are independent.

3. If $\nu = 1$, every expected frequency should be at least 10. If $\nu > 1$, every expected frequency should be at least 5.


IV Testing Equality of c ≥ 2 Proportions

A. Statistical Hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \ne p_{j'}$ for some j and j′

1. Computational example: three samples of n = 100 residents of nursing homes were surveyed. Variable A was resident satisfaction; variable B was age heterogeneity in the home.


Table 2. Nursing Home Data

                        Age Heterogeneity
                     Low b1       Medium b2    High b3
Satisfied a1         O = 56       O = 58       O = 38
                     E = 50.67    E = 50.67    E = 50.67
Not Satisfied a2     O = 44       O = 42       O = 62
                     E = 49.33    E = 49.33    E = 49.33


$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 9.708^{*} \qquad \chi^2_{.05,\,2} = 5.991$$

$$\nu = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$
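The nursing-home result can be retraced as follows. This is a sketch with illustrative variable names. The High / Not Satisfied count is taken as 62, the value consistent with three samples of n = 100 and with the expected frequencies of 50.67 and 49.33 in Table 2. For ν = 2 the upper-tail probability of chi square has the closed form exp(–χ²/2), which is used for the p value.

```python
import math

observed = [
    [56, 58, 38],   # satisfied:     low, medium, high age heterogeneity
    [44, 42, 62],   # not satisfied
]

col_totals = [sum(col) for col in zip(*observed)]   # size of each sample
row_totals = [sum(row) for row in observed]
n = sum(col_totals)

# Under H0 every sample shares the pooled proportions of satisfied / not satisfied.
pooled = [r / n for r in row_totals]
expected = [[p * c for c in col_totals] for p in pooled]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi2 / 2)   # exact upper-tail area, valid only for df = 2

print(col_totals, [round(e, 2) for e in (expected[0][0], expected[1][0])])
print(round(chi2, 3), df, round(p_value, 4))
```

The statistic agrees with the 9.708 shown above and exceeds 5.991, so the hypothesis of equal proportions is rejected at the .05 level.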

B. Assumptions of the Equality of Proportions Test

1. Every observation is assigned to one and only one cell of the contingency table.


2. The observations are independent.

3. If $\nu = 1$, every expected frequency should be at least 10. If $\nu > 1$, every expected frequency should be at least 5.

C. Test of Homogeneity of Proportions

1. Extension of the test of equality of proportions when variable A has r > 2 rows


2. Statistical hypotheses

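$$H_0:\quad \begin{aligned}
p_{a_1|b_1} &= p_{a_1|b_2} = \cdots = p_{a_1|b_c} \\
p_{a_2|b_1} &= p_{a_2|b_2} = \cdots = p_{a_2|b_c} \\
&\;\;\vdots \\
p_{a_r|b_1} &= p_{a_r|b_2} = \cdots = p_{a_r|b_c}
\end{aligned}$$

$H_1: p_{a_i|b_j} \ne p_{a_i|b_{j'}}$ for columns j and j′ in at least one row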
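The quantities compared in these hypotheses are the proportions in each row conditional on the column (sample). The sketch below estimates them from counts; it reuses the nursing-home counts from Table 2 as an illustration, which is the r = 2 special case, and takes the High / Not Satisfied count as 62 so that each sample has n = 100. Variable names are assumptions.

```python
observed = [
    [56, 58, 38],   # a1 (satisfied)     in samples b1, b2, b3
    [44, 42, 62],   # a2 (not satisfied) in samples b1, b2, b3
]

col_totals = [sum(col) for col in zip(*observed)]

# Estimated p(a_i | b_j): each count divided by its column (sample) total.
conditional = [[o / c for o, c in zip(row, col_totals)] for row in observed]

for label, row in zip(("a1", "a2"), conditional):
    print(label, [round(p, 2) for p in row])
```

Under H0 the entries within each printed row would differ only by sampling error; here the third sample's proportions (0.38 and 0.62) stand apart from the first two.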