Math 4030 – 10b Inferences Concerning Proportions


Population proportion p:

• 100p% of the subjects in the population have the property of interest;

• if we randomly select one subject from the population, the probability that the subject has the property of interest is p;

• if we take a sample of size n, of which X subjects have the property of interest, then the sample proportion is

Sample Proportion: $\dfrac{X}{n}$

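Distribution of sample proportion X/n:

$$X \sim \text{Binomial}(n, p) \approx N\big(np,\; np(1-p)\big), \qquad \frac{X}{n} \approx N\!\left(p,\; \frac{p(1-p)}{n}\right)$$

For n ≥ 30. Here the second argument of $N(\cdot,\cdot)$ is the variance.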
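As a rough numerical check of the normal approximation, the sketch below compares an exact binomial probability with its normal counterpart. The values n = 50, p = 0.6, x = 33 are hypothetical, chosen only for illustration, and the +0.5 continuity correction is an addition that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Approximate P(X <= x) using N(np, np(1-p)).

    The +0.5 continuity correction is an assumption added here;
    it is not part of the slides.
    """
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return dist.cdf(x + 0.5)


# Hypothetical values, for illustration only.
n, p, x = 50, 0.6, 33
exact = binomial_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With n this large the two probabilities should agree to about two decimal places; for small n or p near 0 or 1 the agreement can be much worse.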

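Confidence Interval for p (Sec. 10.1):

$$\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} \;<\; p \;<\; \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}}$$

Maximum error of estimate for p: $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}}$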
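The large-sample interval above can be sketched in a few lines of Python. The function name `proportion_ci` and the counts 42 out of 50 are assumptions used only for illustration; the critical value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion.

    Returns (lower, upper, E), where E is the maximum error of estimate.
    """
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e, e


# Hypothetical counts, for illustration only: 42 successes in 50 trials.
lower, upper, e = proportion_ci(42, 50)
print(f"{lower:.3f} < p < {upper:.3f}  (E = {e:.3f})")
```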

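Sample size calculation:

$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \quad\Longrightarrow\quad n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^{2}$$

Which value of p should be used?

• Use p from a similar population;

• Use ¼ as the maximum of p(1 − p);

• If α = 0.05, we may use n = 1/E², since $z_{0.025} = 1.96 \approx 2$ gives ¼(2/E)² = 1/E².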
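A minimal sketch of the sample size formula, assuming the result is rounded up to the next whole subject. The function name `sample_size` and the prior estimate p = 0.2 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(max_error, p=None, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= max_error.

    If no prior value of p is supplied, the worst case p(1-p) = 1/4 is used.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    pq = 0.25 if p is None else p * (1 - p)
    return ceil(pq * (z / max_error) ** 2)


print(sample_size(0.05))         # worst case, p(1-p) = 1/4
print(sample_size(0.05, p=0.2))  # hypothetical estimate from a similar population
print(ceil(1 / 0.05**2))         # rule of thumb n = 1/E^2 for alpha = 0.05
```

The rule of thumb gives a slightly larger n than the worst-case formula because it rounds 1.96 up to 2.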

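For Hypothesis Testing (Sec. 10.2)

$$\frac{X}{n} \approx N\!\left(p,\; \frac{p(1-p)}{n}\right)$$

Under the null hypothesis $H_0: p = p_0$,

$$Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} = \frac{\frac{X}{n} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0, 1)$$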
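The one-sample test statistic can be computed as in the sketch below. The function names, the counts (42 of 50), and the null value p0 = 0.7 are hypothetical; the right-tailed p-value is shown as one possible alternative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Z = (X - n*p0) / sqrt(n*p0*(1-p0)), approximately N(0, 1) under H0: p = p0."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


def right_tail_p_value(z):
    """P(Z > z) for a standard normal Z (alternative H1: p > p0)."""
    return 1 - NormalDist().cdf(z)


# Hypothetical data, for illustration only.
z = one_proportion_z(42, 50, 0.7)
print(f"Z = {z:.3f}, right-tailed p-value = {right_tail_p_value(z):.4f}")
```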

Compare 2 proportions:

A new method is under development for making disks of a superconducting material. 50 disks are made by each method (new and old), and they are checked for superconductivity when cooled with liquid nitrogen.

                   Old Method (1)   New Method (2)   Total
Superconductors          31               42           73
Failures                 19                8           27
Total                    50               50          100

We want to establish the claim that the new method is an improvement.

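Distribution of Sample Proportion Difference:

Sample proportions: $\hat{p}_1 = \dfrac{X_1}{n_1}, \quad \hat{p}_2 = \dfrac{X_2}{n_2}.$

Distribution under the assumption $H_0: p_1 = p_2 = p$ (variance as the second argument):

$$\hat{p}_2 - \hat{p}_1 \sim N\!\left(0,\; p(1-p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right)$$

or

$$\frac{\hat{p}_2 - \hat{p}_1}{\sqrt{p(1-p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim N(0, 1)$$

If p is unknown, use $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}.$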

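Hypothesis Testing:

Null hypothesis: $H_0: p_{NEW} - p_{OLD} = 0$

Alternative hypothesis: $H_1: p_{NEW} - p_{OLD} > 0$

Level of significance: $\alpha = 0.05$ (right-tailed test)

Critical value and critical region: for a large sample, we use the z-test with $z_{0.05} = 1.645$; the critical region is $Z > 1.645$.

Sample statistic calculation: with the pooled proportion $\hat{p} = \dfrac{73}{100} = 0.73$,

$$Z = \frac{\hat{p}_{NEW} - \hat{p}_{OLD}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{NEW}}+\frac{1}{n_{OLD}}\right)}} = \frac{0.84 - 0.62}{\sqrt{0.73(1-0.73)\left(\frac{1}{50}+\frac{1}{50}\right)}} = 2.478$$

Conclusion: Reject the null hypothesis, …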
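The pooled two-proportion z statistic for the superconductor data (31 of 50 for the old method, 42 of 50 for the new) can be reproduced with the sketch below. The function name `two_proportion_z` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_old, n_old, x_new, n_new):
    """Pooled z statistic for H0: p_new = p_old against H1: p_new > p_old."""
    p_old = x_old / n_old
    p_new = x_new / n_new
    p_pooled = (x_old + x_new) / (n_old + n_new)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
    return (p_new - p_old) / se


z = two_proportion_z(31, 50, 42, 50)
critical = NormalDist().inv_cdf(0.95)  # z_{0.05}, right-tailed test
p_value = 1 - NormalDist().cdf(z)
print(f"Z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > critical else "do not reject H0")
```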

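Confidence interval for the difference:

$$\hat{p}_{NEW} - \hat{p}_{OLD} = \frac{42}{50} - \frac{31}{50} = 0.84 - 0.62 = 0.22$$

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_{NEW}(1-\hat{p}_{NEW})}{n_{NEW}} + \frac{\hat{p}_{OLD}(1-\hat{p}_{OLD})}{n_{OLD}}} = 1.96\sqrt{\frac{0.84(1-0.84)}{50} + \frac{0.62(1-0.62)}{50}} = 0.17$$

$$0.05 < p_{NEW} - p_{OLD} < 0.39$$

With 95% confidence, the success proportion of the new method exceeds that of the old method by more than 0.05 and by up to 0.39.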
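The interval for the difference uses the unpooled standard error, unlike the test statistic. A minimal sketch with the same counts (31 of 50 old, 42 of 50 new) follows; the function name `difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_ci(x_old, n_old, x_new, n_new, confidence=0.95):
    """Large-sample CI for p_new - p_old using the unpooled standard error.

    Returns (difference, E, lower, upper).
    """
    p_old = x_old / n_old
    p_new = x_new / n_new
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    diff = p_new - p_old
    return diff, e, diff - e, diff + e


diff, e, lower, upper = difference_ci(31, 50, 42, 50)
print(f"difference = {diff:.2f}, E = {e:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```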

Compare Several Proportions (Sec. 10.3):

             Sample 1    Sample 2    …    Sample k    Total
Successes    x1          x2          …    xk          x
Failures     n1 − x1     n2 − x2     …    nk − xk     n − x
Total        n1          n2          …    nk          n

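From k independent samples from k populations, we have

$$Z_j = \frac{X_j - n_j p_j}{\sqrt{n_j p_j (1-p_j)}}$$

for each j, which is approximately standard normal for large samples (normal approximation to the binomial).

Sampling distribution if $p_1, p_2, \ldots, p_k$ are the k population proportions:

Combined,

$$\chi^2 = \sum_{j=1}^{k} \frac{(X_j - n_j p_j)^2}{n_j p_j (1-p_j)}$$

has approximately a chi-square distribution with df = k when $p_1, \ldots, p_k$ are known. When a common p is estimated by the pooled proportion, as in the test of equal proportions, df = k – 1.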

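For each j,

$$\frac{(X_j - n_j p_j)^2}{n_j p_j(1-p_j)} = \frac{(1-p_j)(X_j - n_j p_j)^2}{n_j p_j(1-p_j)} + \frac{p_j(X_j - n_j p_j)^2}{n_j p_j(1-p_j)}$$

$$= \frac{(X_j - n_j p_j)^2}{n_j p_j} + \frac{(X_j - n_j p_j)^2}{n_j(1-p_j)}$$

$$= \frac{(X_j - n_j p_j)^2}{n_j p_j} + \frac{\big[(n_j - X_j) - n_j(1-p_j)\big]^2}{n_j(1-p_j)}$$

Therefore

$$\chi^2 = \sum_{j=1}^{k} \frac{(X_j - n_j p_j)^2}{n_j p_j(1-p_j)} = \sum_{i=1}^{2}\sum_{j=1}^{k} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where $o_{ij}$ is the observed frequency and $e_{ij}$ is the expected frequency in cell (i, j).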
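The algebraic identity between the binomial form and the observed/expected cell form can be checked numerically. The sketch below uses hypothetical values (x = 21, n = 30, p = 0.65) and hypothetical function names; it illustrates the identity rather than proving it.

```python
from math import isclose


def binomial_form(x, n, p):
    """(X - n*p)^2 / (n*p*(1-p)) for a single sample."""
    return (x - n * p) ** 2 / (n * p * (1 - p))


def cell_form(x, n, p):
    """Sum of (o - e)^2 / e over the success cell and the failure cell."""
    observed = [x, n - x]
    expected = [n * p, n * (1 - p)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical values, for illustration only.
x, n, p = 21, 30, 0.65
print(binomial_form(x, n, p), cell_form(x, n, p))
print(isclose(binomial_form(x, n, p), cell_form(x, n, p)))
```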

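Hypothesis Testing:

Null hypothesis: $H_0: p_1 = p_2 = \cdots = p_k$

Alternative hypothesis: $H_1: p_1, p_2, \ldots, p_k$ are not all the same.

Sample statistic:

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{k} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \quad \text{with df} = k - 1,$$

where

$$\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k} = \frac{x}{n} \quad \text{(Pooled proportion)}$$

$$e_{1j} = n_j \hat{p}, \qquad e_{2j} = n_j (1 - \hat{p}) \quad \text{(Expected Cell Frequency)}$$

$$o_{1j} = x_j, \qquad o_{2j} = n_j - x_j \quad \text{(Observed Cell Frequency)}$$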

Example. Four methods are under development for making disks of a superconducting material. 30, 40, 60, and 70 disks are made by the 4 methods, respectively, and they are checked for superconductivity when cooled with liquid nitrogen.

                   Method 1   Method 2   Method 3   Method 4   Total
Superconductors       21         32         32         45       130
Failures               9          8         28         25        70
Total                 30         40         60         70       200

First we need to know whether the 4 methods differ at all.

Null hypothesis: $p_1 = p_2 = p_3 = p_4$

Alternative hypothesis: $p_1, p_2, p_3, p_4$ are not all equal.

Level of significance: $\alpha = 0.05$

Critical region: With df = 4 – 1 = 3, we have $\chi^2_{0.05}(3) = 7.815$. The critical region is $(7.815, \infty)$.

Statistic from sample: We need to calculate the expected frequencies.

Expected frequencies (in parentheses):

                   Method 1    Method 2    Method 3    Method 4    Total
Superconductors    21 (19.5)   32 (26)     32 (39)     45 (45.5)    130
Failures            9 (10.5)    8 (14)     28 (21)     25 (24.5)     70
Total              30          40          60          70           200

$\chi^2 = 7.891$

Conclusion: Since the sample statistic falls in the critical region, we reject the null hypothesis. The four methods are not all the same.
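The expected frequencies and the chi-square statistic for the four methods can be recomputed with the sketch below. The function name `chi_square_equal_proportions` is hypothetical, and the critical value 7.815 is taken from the slides rather than computed.

```python
def chi_square_equal_proportions(successes, sizes):
    """Chi-square statistic for H0: all k population proportions are equal.

    Returns (statistic, pooled proportion, expected successes, expected failures).
    """
    p_hat = sum(successes) / sum(sizes)  # pooled proportion
    expected_success = [n * p_hat for n in sizes]
    expected_failure = [n * (1 - p_hat) for n in sizes]
    stat = 0.0
    for x, n, e1, e2 in zip(successes, sizes, expected_success, expected_failure):
        stat += (x - e1) ** 2 / e1        # success cell
        stat += ((n - x) - e2) ** 2 / e2  # failure cell
    return stat, p_hat, expected_success, expected_failure


superconductors = [21, 32, 32, 45]
disks = [30, 40, 60, 70]
stat, p_hat, e_succ, e_fail = chi_square_equal_proportions(superconductors, disks)
critical = 7.815  # chi-square critical value for alpha = 0.05, df = 3 (from the slides)
print(f"pooled proportion = {p_hat:.2f}")
print("expected successes:", [round(e, 1) for e in e_succ])
print("expected failures: ", [round(e, 1) for e in e_fail])
print(f"chi-square = {stat:.3f}, reject H0: {stat > critical}")
```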

How do these methods differ?

$$\frac{x_j}{n_j} - z_{\alpha/2}\sqrt{\frac{\frac{x_j}{n_j}\left(1-\frac{x_j}{n_j}\right)}{n_j}} \;<\; p_j \;<\; \frac{x_j}{n_j} + z_{\alpha/2}\sqrt{\frac{\frac{x_j}{n_j}\left(1-\frac{x_j}{n_j}\right)}{n_j}}$$

gives a confidence interval for each of the 4 population (method) proportions. Using Excel, we find

                     M1     M2     M3     M4
Sample Size          30     40     60     70
Sample Proportion    0.70   0.80   0.53   0.64
E                    0.16   0.12   0.13   0.11
95% CI - L           0.54   0.68   0.41   0.53
95% CI - R           0.86   0.92   0.66   0.76
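The slides compute these intervals in Excel; as an alternative sketch, the same numbers can be obtained in Python. The function name `method_intervals` and the dictionary layout are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def method_intervals(successes, sizes, confidence=0.95):
    """Per-method large-sample confidence intervals, rounded to 2 decimals."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    rows = []
    for x, n in zip(successes, sizes):
        p_hat = x / n
        e = z * sqrt(p_hat * (1 - p_hat) / n)
        rows.append({
            "n": n,
            "p_hat": round(p_hat, 2),
            "E": round(e, 2),
            "lower": round(p_hat - e, 2),
            "upper": round(p_hat + e, 2),
        })
    return rows


for label, row in zip(["M1", "M2", "M3", "M4"],
                      method_intervals([21, 32, 32, 45], [30, 40, 60, 70])):
    print(label, row)
```

Intervals that do not overlap (for example Method 2 and Method 3) suggest which methods differ, although overlapping intervals alone do not establish equality.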

[Figure: intervals for Method 1, Method 2, Method 3, and Method 4 plotted on a common p axis running from 0.4 to 0.9.]
