Chapter 17

Statistical Inference For Frequency Data

I Three Applications of Pearson’s χ²

Testing goodness of fit

Testing independence

Testing equality of proportions

A. Testing Goodness of Fit

1. Statistical hypotheses

H0: OPop 1 = EPop 1, . . . , OPop k = EPop k

H1: OPop j ≠ EPop j for some j and j′

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of

membership in one of k mutually exclusive

categories

B. Testing Independence

1. Statistical hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of

two variables, denoted by A and B, where

each variable has two or more categories.

C. Testing Equality of Proportions

1. Statistical hypotheses

H0: p1 = p2 = . . . = pc

H1: pj ≠ pj′ for some j and j′

2. Randomization Plan

c random samples, where c ≥ 2

For each sample, elements are classified in

terms of membership in one of r = 2 mutually

exclusive categories

II Testing Goodness of Fit

A. Chi-Square Distribution

[Figure: chi-square density curves, f(χ²) plotted against χ², for df = 1, df = 2, df = 6, and df = 10]

B. Pearson’s chi-square statistic

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

1. Oj and Ej denote, respectively, observed and expected frequencies. k denotes the number of categories.

2. Critical value of chi square is $\chi^2_{\alpha,\,\nu}$ with ν = k – 1 degrees of freedom.

C. Grade-Distribution Example

1. Is the distribution of grades for summer-school

students in a statistics class different from that for

the fall and spring semesters?

Grade    Fall and Spring proportion    Summer obs. frequency

A        .12                           15
B        .23                           21
C        .47                           30
D        .13                            6
F        .05                            0

Total    1.00                          72

2. The statistical hypotheses are

H0: OPop 1 = EPop 1, . . . , OPop 5 = EPop 5

H1: OPop j ≠ EPop j for some j and j′

3. Pearson’s chi-square statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

4. Critical value of chi square for α = .05, k = 5 categories, and ν = 5 – 1 = 4 degrees of freedom is

$$\chi^2_{.05,\,4} = 9.488.$$

Table 1. Computation of Pearson’s Chi-Square for n = 72 Summer-School Students

(1)      (2)    (3)     (4)                (5)        (6)
Grade    Oj     pj      npj = Ej           Oj – Ej    (Oj – Ej)²/Ej

A        15     .12     72(.12) =  8.6      6.4       4.763
B        21     .23     72(.23) = 16.6      4.4       1.166
C        30     .47     72(.47) = 33.8     –3.8       0.427
D         6     .13     72(.13) =  9.4     –3.4       1.230
F         0     .05     72(.05) =  3.6     –3.6       3.600

Total    72     1.00              72.0                χ² = 11.186*

*p < .025
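
The computation in Table 1 can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `pearson_chi_square` is an assumption of this sketch, not something defined in the slides. Table 1 rounds each expected frequency to one decimal place before squaring, so the unrounded computation below gives a slightly smaller statistic (about 11.114) than the tabled 11.186. Both values exceed the critical value 9.488.

```python
def pearson_chi_square(observed, proportions):
    """Return (chi_square, expected) for a goodness-of-fit test.

    observed    -- observed frequencies O_j
    proportions -- hypothesized proportions p_j (should sum to 1)
    """
    n = sum(observed)
    expected = [n * p for p in proportions]  # E_j = n * p_j
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, expected


# Summer-school grade counts (A, B, C, D, F) and fall/spring proportions
observed = [15, 21, 30, 6, 0]
proportions = [0.12, 0.23, 0.47, 0.13, 0.05]

chi_square, expected = pearson_chi_square(observed, proportions)

critical_value = 9.488  # chi-square critical value for alpha = .05, df = 4 (from the slides)
reject_h0 = chi_square > critical_value

print(round(chi_square, 3), reject_h0)  # prints 11.114 True
```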

5. Degrees of freedom when e parameters of a

theoretical distribution must be estimated is

k – 1 – e.

D. Practical Significance

1. Cohen’s ŵ

$$\hat{w} = \sqrt{\sum_{j=1}^{k} \frac{(\hat{p}_j - p_j)^2}{p_j}}$$

where $\hat{p}_j$ and $p_j$ denote, respectively, the observed and expected proportions in the jth category.

2. Simpler equivalent formula for Cohen’s ŵ

$$\hat{w} = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{11.186}{72}} = 0.394$$

3. Cohen’s guidelines for interpreting ŵ

0.1 is a small effect

0.3 is a medium effect

0.5 is a large effect
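
The equivalence of the two formulas for Cohen’s ŵ can be checked numerically. The sketch below uses the grade-distribution data (n = 72) and assumes the notation in which the observed proportion is Oj / n and the expected proportion is the fall/spring proportion pj; the variable names are illustrative only. Because it works with unrounded expected frequencies, it evaluates to about 0.393, marginally below the value obtained from the tabled chi square of 11.186.

```python
import math

observed = [15, 21, 30, 6, 0]                    # summer-school grade counts
expected_props = [0.12, 0.23, 0.47, 0.13, 0.05]  # fall/spring proportions
n = sum(observed)
observed_props = [o / n for o in observed]

# Definition: square root of the sum of (observed prop - expected prop)^2 / expected prop
w_from_props = math.sqrt(
    sum((ph - p) ** 2 / p for ph, p in zip(observed_props, expected_props))
)

# Equivalent shortcut: square root of chi-square / n
chi_square = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))
w_from_chi_square = math.sqrt(chi_square / n)

print(round(w_from_props, 3), round(w_from_chi_square, 3))
```

Under Cohen’s guidelines quoted above, a value near 0.39 lies between the medium (0.3) and large (0.5) benchmarks.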

E. Yates’ Correction

1. When ν = 1, Yates’ correction can be applied to make the sampling distribution of the test statistic for Oj – Ej, which is discrete, better approximate the chi-square distribution.

$$\chi^2 = \sum_{j=1}^{k} \frac{(|O_j - E_j| - 0.5)^2}{E_j}$$
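
A small sketch of the correction, assuming a hypothetical two-category example (60 and 40 observed against 50 and 50 expected) that does not come from the slides. The function names are illustrative.

```python
def chi_square_uncorrected(observed, expected):
    """Pearson's chi-square without any continuity correction."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_yates(observed, expected):
    """Pearson's chi-square with Yates' correction (intended for df = 1)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: k = 2 categories, so df = k - 1 = 1
observed = [60, 40]
expected = [50.0, 50.0]

uncorrected = chi_square_uncorrected(observed, expected)  # 2 * 10^2 / 50
corrected = chi_square_yates(observed, expected)          # 2 * 9.5^2 / 50

print(round(uncorrected, 2), round(corrected, 2))  # prints 4.0 3.61
```

In this made-up case the correction matters for the decision: 4.00 exceeds the α = .05 critical value for one degree of freedom (3.841), whereas the corrected value 3.61 does not.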

F. Assumptions of the Goodness-of-Fit Test

1. Every observation is assigned to one and only

one category.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.
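
The expected-frequency rule in assumption 3 is easy to check mechanically. The helper below is a hypothetical convenience function, not part of the slides. Applied to the grade-distribution example (n = 72, df = 4), it flags the F category, whose expected frequency is 72(.05) = 3.6.

```python
def expected_frequencies_adequate(expected, df):
    """Rule of thumb: every E_j >= 10 when df == 1, otherwise every E_j >= 5."""
    minimum = 10 if df == 1 else 5
    return all(e >= minimum for e in expected)


# Grade-distribution example: expected frequencies n * p_j
n = 72
proportions = [0.12, 0.23, 0.47, 0.13, 0.05]
grade_expected = [n * p for p in proportions]
grades_ok = expected_frequencies_adequate(grade_expected, df=4)

# Hypothetical two-category case with df = 1
two_category_ok = expected_frequencies_adequate([50.0, 50.0], df=1)

print(grades_ok, two_category_ok)  # prints False True
```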

III Testing Independence

A. Statistical Hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r Rows and j = 1, . . . , c Columns

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

C. Computational Example: Is Success on an Employment-Test Item Independent of Gender?

              Observed                     Expected
              b1       b2                  b1       b2
              Fail     Pass     Total      Fail     Pass

a1 Men         84       18       102       88.9     13.1
a2 Women       93        8       101       88.1     12.9

Total         177       26       203

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.299^{*} \qquad \chi^2_{.05,\,1} = 3.841$$

D. Computation of expected frequencies

1. A and B are statistically independent if

p(ai and bj) = p(ai)p(bj)

2. Expected frequency, $E_{a_i \text{ and } b_j}$, for the cell in row i and column j

$$E_{a_i \text{ and } b_j} = n\,p(a_i)\,p(b_j) = n\left(\frac{n_{a_i}}{n}\right)\left(\frac{n_{b_j}}{n}\right) = \frac{n_{a_i}\, n_{b_j}}{n}$$

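$$E_{a_1 \text{ and } b_1} = \frac{n_{a_1} n_{b_1}}{n} = \frac{(102)(177)}{203} = 88.9$$

$$E_{a_1 \text{ and } b_2} = \frac{n_{a_1} n_{b_2}}{n} = \frac{(102)(26)}{203} = 13.1$$

$$E_{a_2 \text{ and } b_1} = \frac{n_{a_2} n_{b_1}}{n} = \frac{(101)(177)}{203} = 88.1$$

$$E_{a_2 \text{ and } b_2} = \frac{n_{a_2} n_{b_2}}{n} = \frac{(101)(26)}{203} = 12.9$$

         Observed                     Expected
         b1       b2       Total      b1       b2

a1        84       18       102       88.9     13.1
a2        93        8       101       88.1     12.9

Total    177       26       203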
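
The expected frequencies and the chi-square statistic for the employment-test example can be reproduced with a short sketch. The function names are illustrative assumptions; the data are the observed counts from the table (rows: men, women; columns: fail, pass).

```python
def expected_frequencies(table):
    """E_ij = (row total * column total) / n for every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Pearson's chi-square summed over all cells of an r x c table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[84, 18],   # a1: men   (fail, pass)
            [93, 8]]    # a2: women (fail, pass)

expected = expected_frequencies(observed)
chi_square = chi_square_independence(observed)

print([[round(e, 1) for e in row] for row in expected])  # prints [[88.9, 13.1], [88.1, 12.9]]
print(round(chi_square, 3))                              # prints 4.299
```

The statistic is computed from unrounded expected frequencies. Squaring deviations from the one-decimal values shown in the table would give a slightly smaller result.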

E. Degrees of Freedom for an r × c Contingency Table

df = k – 1 – e

= rc – 1 – [(r – 1) + (c – 1)]

= rc – 1 – r + 1 – c + 1

= rc – r – c + 1

= (r – 1)(c – 1)

= (2 – 1)(2 – 1) = 1

F. Strength of Association and Practical Significance

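1. Cramér’s V

$$V = \sqrt{\frac{\text{observed}}{\text{maximum}}} = \sqrt{\frac{\chi^2 / n}{s - 1}} = \sqrt{\frac{\chi^2}{n(s - 1)}}$$

where s is the smaller of the number of rows and columns.

$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{4.299}{203(2 - 1)}} = 0.146$$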

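2. Practical significance, Cohen’s ŵ

$$\hat{w} = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}} = \sqrt{\frac{\chi^2}{n}} = 0.146$$

3. For a contingency table, an alternative formula for ŵ is

$$\hat{w} = V\sqrt{s - 1} = 0.146\sqrt{2 - 1} = 0.146$$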
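
As a quick numerical check of these two measures for the 2 × 2 employment-test table, the sketch below plugs in the chi-square value (4.299) and sample size (203) reported in the slides. The function names `cramers_v` and `cohens_w` are illustrative.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi_square / (n * (s - 1))), s = min(rows, columns)."""
    s = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (s - 1)))


def cohens_w(chi_square, n):
    """Cohen's w from the shortcut formula sqrt(chi_square / n)."""
    return math.sqrt(chi_square / n)


chi_square, n = 4.299, 203  # values reported for the employment-test example
n_rows, n_cols = 2, 2

v = cramers_v(chi_square, n, n_rows, n_cols)
w = cohens_w(chi_square, n)
w_alt = v * math.sqrt(min(n_rows, n_cols) - 1)  # alternative formula: V * sqrt(s - 1)

print(round(v, 3), round(w, 3), round(w_alt, 3))
```

For a 2 × 2 table s – 1 = 1, so V and ŵ coincide.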

G. Three-By-Three Contingency Table

1. Motivation and education of conscientious

objectors during WWII

                  College    High School    Grade School    Total

Coward               12          25              35            72
Partly Coward        19          23              30            72
Not Coward           71          56              24           151

Total               102         104              89           295

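$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 36.681^{*} \qquad \chi^2_{.05,\,4} = 9.488$$

$$\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$$

2. Strength of Association, Cramér’s V

$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{36.681}{295(3 - 1)}} = 0.249$$

3. Practical significance

$$\hat{w} = V\sqrt{s - 1} = 0.249\sqrt{3 - 1} = 0.352$$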
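
The whole three-by-three analysis can be reproduced from the observed counts. The sketch below is one possible arrangement using only the standard library; the variable names are illustrative. The rows are Coward, Partly Coward, Not Coward and the columns are College, High School, Grade School.

```python
import math

observed = [
    [12, 25, 35],   # Coward
    [19, 23, 30],   # Partly Coward
    [71, 56, 24],   # Not Coward
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson's chi-square with E_ij = row total * column total / n
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)
s = min(r, c)

v = math.sqrt(chi_square / (n * (s - 1)))  # Cramer's V
w = v * math.sqrt(s - 1)                   # Cohen's w via V * sqrt(s - 1)

print(n, df, round(chi_square, 3), round(v, 3))  # prints 295 4 36.681 0.249
```

Carrying full precision gives ŵ of about 0.353. The slide’s 0.352 results from multiplying the rounded V = 0.249 by the square root of 2.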

H. Assumptions of the Independence Test

1. Every observation is assigned to one and only

one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.

IV Testing Equality of c ≥ 2 Proportions

A. Statistical Hypotheses

H0: p1 = p2 = . . . = pc

H1: pj ≠ pj′ for some j and j′

1. Computational example: three samples of n = 100

residents of nursing homes were surveyed.

Variable A was age heterogeneity in the home;

variable B was resident satisfaction.

Table 2. Nursing Home Data

                        Age Heterogeneity
                        Low b1       Medium b2    High b3

Satisfied a1            O = 56       O = 58       O = 38
                        E = 50.67    E = 50.67    E = 50.67

Not Satisfied a2        O = 44       O = 42       O = 62
                        E = 49.33    E = 49.33    E = 49.33

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 9.708^{*} \qquad \chi^2_{.05,\,2} = 5.991$$

$$\nu = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$
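
The nursing-home statistic can be reproduced as sketched below. The sketch assumes that each of the three samples contains exactly n = 100 residents, so the not-satisfied count in each home is 100 minus the satisfied count. For the high-heterogeneity home that gives 100 – 38 = 62, which is the count consistent with the expected frequency 49.33 and with the reported chi square of 9.708. Variable names are illustrative.

```python
sample_size = 100                 # residents surveyed per home (from the slides)
satisfied = [56, 58, 38]          # low, medium, high age heterogeneity
not_satisfied = [sample_size - s for s in satisfied]  # assumes every resident is classified

observed = [satisfied, not_satisfied]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies under H0 (equal proportions across the three samples)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(satisfied) - 1)

print(round(expected[0][0], 2), round(expected[1][0], 2))  # prints 50.67 49.33
print(df, round(chi_square, 3))                            # prints 2 9.708
```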

B. Assumptions of the Equality of Proportions Test

1. Every observation is assigned to one and only

one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.

C. Test of Homogeneity of Proportions

1. Extension of the test of equality of

proportions when variable A has r > 2 rows

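2. Statistical hypotheses

$$H_0:\quad \begin{aligned} p_{a_1|b_1} &= p_{a_1|b_2} = \cdots = p_{a_1|b_c} \\ p_{a_2|b_1} &= p_{a_2|b_2} = \cdots = p_{a_2|b_c} \\ &\;\;\vdots \\ p_{a_r|b_1} &= p_{a_r|b_2} = \cdots = p_{a_r|b_c} \end{aligned}$$

H1: $p_{a_i|b_j} \neq p_{a_i|b_{j'}}$ for columns j and j′ in at least one row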
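
The conditional proportions that appear in these hypotheses are simply cell counts divided by their column totals. As a rough illustration, the sketch below computes them for the nursing-home data, which is the r = 2 special case. It assumes n = 100 residents per home, so the not-satisfied counts are 100 minus the satisfied counts. The function name is hypothetical.

```python
def conditional_proportions(table):
    """Return p(a_i | b_j) = O_ij / (column j total) for each cell."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[o / total for o, total in zip(row, col_totals)] for row in table]


observed = [
    [56, 58, 38],   # a1: satisfied      (b1 low, b2 medium, b3 high)
    [44, 42, 62],   # a2: not satisfied  (assumes 100 residents per home)
]

p = conditional_proportions(observed)

print(p[0])  # prints [0.56, 0.58, 0.38]
```

Under H0 the entries within each row would be equal in the population. With r > 2 rows, the same comparison is made row by row.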
