Comparing Two Proportions - University of Washington data handout.pdf · SPSS output for Chi-square test GENDER * periodontal status Crosstabulation 1143 929 937 3009 1405.7 906.8

Comparing Two Proportions

Example: caries incidence

Clinical trial with caries intervention on infants

                N     developed caries by age two
controls        36    27.8%
intervention    68    8.8%

Is this strong evidence of effectiveness of

experimental intervention?

Comparison of two proportions

- two independent samples

These are called “two-sample” tests.

Our goal is usually to estimate p1 – p2, the

corresponding confidence intervals, and to perform

hypothesis tests on:

H0: p1 – p2 = 0.

The obvious statistic to compare the two population proportions is $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_i$ = number of successes in group i divided by the sample size in group i.

Probability theory tells us that:

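1. $\hat{p}_1 - \hat{p}_2$ is the best estimate of p1 – p2

2. the standard error is $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$

3. If n1p1(1-p1) > 5 and n2p2(1-p2) > 5, then (mean, standard error)

$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$$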
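The rule of thumb in item 3 can be checked numerically. The sketch below is only illustrative: the function name `normal_approx_ok` is made up for this example, and because the true p1 and p2 are unknown it plugs in the sample proportions from the caries example (10 of 36 controls, 6 of 68 in the intervention group), which is an assumption rather than part of the stated rule.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Rule of thumb from the text: n * p * (1 - p) > 5."""
    return n * p * (1 - p) > threshold


# Caries example. The population proportions are unknown, so the
# sample proportions are substituted for p (an assumption).
controls_ok = normal_approx_ok(36, 10 / 36)
intervention_ok = normal_approx_ok(68, 6 / 68)

print("controls:", controls_ok)
print("intervention:", intervention_ok)
```

Both groups pass the check here, though the intervention group does so only narrowly (about 5.5 against a threshold of 5).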

Large-sample confidence interval for p1 – p2

$$\hat{p}_1 - \hat{p}_2 \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Large-sample Z-test of

H0: p1 – p2 = 0 vs. H1: p1 – p2 ≠ 0

Test statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{H_0}(\hat{p}_1 - \hat{p}_2)}$$

where $SE_{H_0}(\hat{p}_1 - \hat{p}_2)$ denotes the standard error estimate using H0: p1 - p2 = 0 (p1 = p2)

Estimate the common p using

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},$$

where x1 and x2 are the number of successes in groups 1 and 2, respectively.

Then

$$SE_{H_0}(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Compare Z to a standard Normal distribution.

Example: Caries incidence

                N     caries by age two
                      number   percent
controls        36    10       27.8%
intervention    68    6        8.8%

95% confidence interval:

$\hat{p}_1 - \hat{p}_2$ = 0.278 - 0.088 = 0.19

$$SE = \sqrt{\frac{0.278(1-0.278)}{36} + \frac{0.088(1-0.088)}{68}} = 0.082$$

95% confidence interval for p1 - p2:

$$0.19 \pm 1.96 \times 0.082 = (0.029,\ 0.351)$$
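The confidence interval arithmetic for the caries example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `two_proportion_ci` is hypothetical, and the multiplier 1.96 is hard-coded for a 95% interval. Because the code carries unrounded proportions, the last digit of an endpoint can differ slightly from the hand calculation, which works with values rounded to three decimals.

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - z * se, diff + z * se)


# Caries example: 10 of 36 controls, 6 of 68 in the intervention group.
diff, se, (lo, hi) = two_proportion_ci(10, 36, 6, 68)
print(f"difference = {diff:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```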

Test: H0: p1 – p2 = 0 vs. H1: p1 – p2 ≠ 0

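$$\hat{p} = \frac{10 + 6}{36 + 68} = 0.154$$

$$SE_{H_0}(\hat{p}_1 - \hat{p}_2) = \sqrt{0.154(1-0.154)\left(\frac{1}{36} + \frac{1}{68}\right)} = 0.074$$

$$Z = \frac{0.19}{0.074} = 2.57$$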

P-value = 2×P(Z > 2.57) = 0.010. Reject at α=.05 level.
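The same test can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is hypothetical; the two-sided p-value uses the identity 2 × P(Z > |z|) = erfc(|z|/√2). Since the code does not round intermediate values, it gives Z of about 2.55 rather than 2.57 (which comes from dividing the rounded 0.19 by the rounded 0.074); the p-value still rounds to roughly 0.01 and the conclusion is unchanged.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Large-sample Z-test of H0: p1 - p2 = 0 using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # estimate of the common p under H0
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, se0, z, p_value


pooled, se0, z, p_value = two_proportion_z_test(10, 36, 6, 68)
print(f"pooled p = {pooled:.3f}, SE0 = {se0:.3f}, Z = {z:.2f}, p = {p_value:.3f}")
```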

Chi-squared Test ( χ2 test)

The Chi-square test generalizes the two-sample Z-test to situations with more than two proportions.

Example: perio by gender (NHANES I data):

Evaluate whether periodontitis is independent of

gender by seeing if the proportion of males in each

group defined by periodontal status is the same.

χ2 test utilizes “contingency” tables

The null hypothesis is that all proportions are equal

H0: p1 = p2 = p3.

Observed Data

Count
                      periodontal status
GENDER      healthy   gingivitis   perio    Total
male           1143          929     937     3009
female         2607         1490     921     5018
Total          3750         2419    1858     8027

Expected frequencies (under assumption of equal proportions)

                             periodontal status
          healthy               gingivitis            perio                 Total
male      3750 × (3009/8027)    2419 × (3009/8027)    1858 × (3009/8027)    3009
          = 1406                = 907                 = 697
female    3750 × (5018/8027)    2419 × (5018/8027)    1858 × (5018/8027)    5018
          = 2344                = 1512                = 1161
Total     3750                  2419                  1858                  8027

Chi-squared statistic:

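$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$= \frac{(1143-1406)^2}{1406} + \frac{(929-907)^2}{907} + \frac{(937-697)^2}{697} + \frac{(2607-2344)^2}{2344} + \frac{(1490-1512)^2}{1512} + \frac{(921-1161)^2}{1161} = 212$$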
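The expected frequencies and the statistic can be computed directly from the observed counts. The sketch below uses plain Python lists; the function name `pearson_chi_square` is made up for this example. It keeps the expected counts unrounded, so the result (about 212.27) matches the value reported by statistical software rather than the hand calculation with rounded expected counts.

```python
observed = [
    [1143, 929, 937],    # male:   healthy, gingivitis, perio
    [2607, 1490, 921],   # female: healthy, gingivitis, perio
]


def pearson_chi_square(table):
    """Return (expected counts, X^2 statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


expected, x2, df = pearson_chi_square(observed)
print([[round(e, 1) for e in row] for row in expected])
print(f"X^2 = {x2:.3f} on {df} degrees of freedom")
```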

Large (positive) values of X2 indicate evidence

against the null hypothesis.

If H0 is true, then a χ2 statistic from a contingency

table with R rows and C columns should have a

Chi-square distribution with (R-1) × (C-1)

degrees of freedom.

The P-value is the probability that a χ2 distribution with (R-1) × (C-1) degrees of freedom is greater than the observed statistic.

Note that all the probability in the p-value (and

rejection region) is on one side, since only large

values of X2 would contradict H0.

Our statistic, 212, was larger than 15.20, the 99.95th percentile of a χ2 distribution with 2 degrees of freedom, so p < 0.0005.

Table 6 in the coursepack has χ2 percentiles.
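For the special case of 2 degrees of freedom, the upper-tail probability of the Chi-square distribution has the closed form P(X > x) = exp(-x/2), so the percentile quoted above can be checked without a table. The sketch below relies on that identity and is valid only for 2 degrees of freedom; the function names are hypothetical. For other degrees of freedom a statistics library (for example `scipy.stats.chi2.sf`) would be needed instead.

```python
import math


def chi2_upper_tail_df2(x: float) -> float:
    """P(X > x) for a Chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


def chi2_percentile_df2(q: float) -> float:
    """Value x such that P(X <= x) = q, again for 2 degrees of freedom only."""
    return -2 * math.log(1 - q)


critical = chi2_percentile_df2(0.9995)     # 99.95th percentile
p_value = chi2_upper_tail_df2(212.271)     # Pearson statistic from the software output

print(f"99.95th percentile = {critical:.2f}")
print(f"p-value = {p_value:.3g}")
```

The computed p-value is vastly smaller than 0.0005, which agrees with the table-based conclusion.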

SPSS output for Chi-square test

GENDER * periodontal status Crosstabulation

                                     periodontal status
GENDER                      healthy   gingivitis    perio     Total
male      Count                1143          929      937      3009
          Expected Count     1405.7        906.8    696.5    3009.0
female    Count                2607         1490      921      5018
          Expected Count     2344.3       1512.2   1161.5    5018.0
Total     Count                3750         2419     1858      8027
          Expected Count     3750.0       2419.0   1858.0    8027.0

Chi-Square Tests

                                 Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square               212.271 a    2   .000
Likelihood Ratio                 210.264      2   .000
Linear-by-Linear Association     209.324      1   .000
N of Valid Cases                 8027

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 696.49.

Notes on Chi-squared test:

1. Chi-square test p-values rely on Normal

approximations, so they are not valid for small

samples (any expected frequencies < 5).

2. Reject H0 at significance level α if the Chi-

square statistic is greater than the 100(1- α)th

percentile of the Chi-square distribution (i.e. not

α/2).

3. The null hypothesis for the Chi-square test can

be equivalently formulated as “X1 is

independent of X2”, where X1 and X2 are the

two categorical variables being compared

(gender and perio status in our example).

4. When comparing two proportions the Chi-

square test is equivalent to Z-test for two

proportions.

5. The Z-test for two proportions can be

formulated as a one-sided test, but the Chi-

square test cannot.
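Note 4 can be illustrated with the caries data: squaring the pooled Z statistic gives the Pearson Chi-square statistic for the corresponding 2×2 table. The sketch below is only a numerical check, not a proof. The function names are hypothetical, and `pearson_2x2` uses the standard shortcut formula n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)] for a 2×2 table without continuity correction.

```python
import math


def z_statistic(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se0


def pearson_2x2(a, b, c, d):
    """Pearson Chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Caries data: controls 10 yes / 26 no, intervention 6 yes / 62 no.
z = z_statistic(10, 36, 6, 68)
x2 = pearson_2x2(10, 26, 6, 62)
print(f"Z^2 = {z ** 2:.3f}, X^2 = {x2:.3f}")
```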

Does Normality assumption hold?


Fisher’s Exact Test

Does not rely on Normality assumption.

Uses “exact” distribution instead of a Normal

approximation.

Use in place of χ2 test when any expected cell

frequency is less than 5.

Example: caries incidence

                  Caries
                yes    no    total
control          10    26       36
intervention      6    62       68
total            16    88      104

The null hypothesis of Fisher’s Exact test is that there is no relationship between the two characteristics. Thus, every possible arrangement of observations in the respective cells is equally likely (assuming the row and column totals are held fixed).

The p-value is computed by calculating the number of possible arrangements of observations that produce tables that are as extreme as or more extreme than the observed table, and then dividing this by the total number of possible arrangements of the observations.

Example: Caries incidence

Observed table ($\hat{p}_c - \hat{p}_i$ = 0.19; probability of table under H0 = 0.01048):

                  Caries
                yes    no    total
control          10    26       36
intervention      6    62       68
total            16    88      104

Tables more extreme (result in greater difference in proportions). Every table has the same totals (rows 36 and 68, columns 16 and 88, overall 104):

control        intervention    difference in    probability of
(yes, no)      (yes, no)       proportions      table under H0
11, 25          5, 63            0.23            0.00236
12, 24          4, 64            0.27            0.00038
13, 23          3, 65            0.32            0.00004
14, 22          2, 66            0.36            0.00000
15, 21          1, 67            0.40            0.00000
16, 20          0, 68            0.44            0.00000
 0, 36         16, 52           -0.24            0.00055
 1, 35         15, 53           -0.19            0.00602

Total probability of all as or more extreme tables = 0.01985

Formula for Probability of Table in Fisher’s Exact Test

Table

    a   b
    c   d

Probability

$$\frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$

where n = a + b + c + d is the total number of observations.
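The table probability formula and the two-sided p-value for the caries example can be sketched as follows. The function names are hypothetical. Following the worked example, "as or more extreme" is taken to mean a difference in proportions at least as large in absolute value as the observed one; statistical packages may define extremeness differently (for instance by table probability), although for these data both definitions select the same tables.

```python
from math import factorial


def table_probability(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables at least as extreme as the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    observed_diff = abs(a / row1 - c / row2)
    total = 0.0
    # k = count in the top-left cell; margins restrict its possible values.
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        diff = abs(k / row1 - (col1 - k) / row2)
        if diff >= observed_diff - 1e-12:   # small tolerance for float comparison
            total += table_probability(k, row1 - k, col1 - k, row2 - (col1 - k))
    return total


p_observed = table_probability(10, 26, 6, 62)
p_two_sided = fisher_two_sided(10, 26, 6, 62)
print(f"P(observed table) = {p_observed:.5f}")
print(f"two-sided p-value = {p_two_sided:.5f}")
```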

SPSS output

treatment group * caries at age two Crosstabulation

Count
                      caries at age two
treatment group       no     yes    Total
controls              26      10       36
intervention          62       6       68
Total                 88      16      104

Chi-Square Tests

                                 Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square               6.496 b    1   .011
Continuity Correction a          5.122      1   .024
Likelihood Ratio                 6.171      1   .013
Fisher's Exact Test                                           .020         .013
Linear-by-Linear Association     6.434      1   .011
McNemar Test                                                  .000 c
N of Valid Cases                 104

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.54.
c. Binomial distribution used.

McNemar’s Test for Proportions (Paired Data)

Use for comparing proportions from paired data

Example: Change in plaque index

Fifty-three study participants assessed twice for plaque

index (PI), at baseline and 4 weeks later. We wish to assess

whether the proportion of patients with high PI changes.

                    PI at 4 weeks
PI at baseline      low    high
low                  29       1
high                 13      10

Incorrect methods:

1. Comparing $\hat{p}_1 = 23/53$ with $\hat{p}_2 = 11/53$ using the Z-test:

This will not give a valid p-value because it does not

compare independent samples. The same 53 people

are used in each proportion.

2. Performing a Chi-square or Fisher’s Exact test on the

above 2×2 table: These would test whether the

proportion of high’s at baseline is related to the

proportion of high’s at 4 weeks. They would not test

whether or not the proportions are different.

                    PI at 4 weeks
PI at baseline      low    high
low                  29       1
high                 13      10

McNemar’s Test assesses the null hypothesis

H0: P(PI high at baseline) = P(PI high at 4 weeks),

by noting that it is equivalent to:

H0: P(PI changes high to low) = P(PI changes low to high),

for all discordant pairs.

The discordant pairs are those that have different

values for the two observations. Note that each entry

in the table is the number of pairs.

The latter H0 can be evaluated using a one-sample

test for proportions with,

H0: p = 0.50, vs. H1: p ≠ 0.50,

where p = proportion of discordant pairs that increase.

                    PI at 4 weeks
PI at baseline      low    high
low                  29       1
high                 13      10

If n > 20 (where n is the number of discordant pairs), we can use the Z-test for proportions (chapter 9.3).

If n < 20 (as in the current example, n = 14), use the binomial distribution to compute the exact p-value.

Let X = number of discordant pairs that increase,

which, under H0, is binomial(n = 14, p = 0.5).

The two-sided p-value is the probability that we would see a sample of the discordant pairs at least as unbalanced as 13 vs 1, which is

P(X ≤ 1) + P(X ≥ 13)

= P(X=0) + P(X=1) + P(X=13) + P(X=14)

= 0.0001 + 0.0009 + 0.0009 + 0.0001

= 0.0020
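The exact binomial calculation can be sketched in a few lines. The function name `mcnemar_exact` is hypothetical, and the p-value is obtained by doubling the smaller tail, which is valid because the binomial distribution with p = 0.5 is symmetric. Without rounding the individual terms, the p-value is about 0.0018, which still rounds to the .002 reported in software output.

```python
from math import comb


def mcnemar_exact(n_increase: int, n_decrease: int) -> float:
    """Exact two-sided McNemar p-value from the two discordant-pair counts."""
    n = n_increase + n_decrease              # number of discordant pairs
    k = min(n_increase, n_decrease)
    # Under H0, X ~ Binomial(n, 0.5). The distribution is symmetric,
    # so the two-sided p-value is twice the smaller tail (capped at 1).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Plaque index example: 1 pair changed low to high, 13 changed high to low.
p_value = mcnemar_exact(1, 13)
print(f"exact two-sided p-value = {p_value:.4f}")
```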

SPSS output

baseline PI * four week PI Crosstabulation

Count
                 four week PI
baseline PI      low    high    Total
low               29       1       30
high              13      10       23
Total             42      11       53

Chi-Square Tests

                                 Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square               12.757 b    1   .000
Continuity Correction a          10.433      1   .001
Likelihood Ratio                 13.872      1   .000
Fisher's Exact Test                                            .000         .000
Linear-by-Linear Association     12.516      1   .000
McNemar Test                                                   .002 c
N of Valid Cases                 53

a. Computed only for a 2x2 table
b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.77.
c. Binomial distribution used.

Analysis of Categorical Data Summary

Proportions from two independent samples

    Large samples – Z-test for proportions

    Small samples – Fisher’s Exact Test

Proportions from > 2 independent samples

    Chi-square test

Proportions from paired data

    McNemar’s Test
