13
1 Confidence Interval for one population mean or one population proportion, continued 1. Sample size estimation based on the large sample C.I. for p From the interval 2 2 ˆ ˆ ˆ ˆ (1 ) (1 ) ˆ ˆ , p p p p p Z p Z n n 2 ˆ ˆ (1 ) lengh of your 100(1 )% CI 2 p p L Z n ˆ , , L p are given and we are interested in sample size n . Therefore, 2 2 2 2 2 2 2 2 2 1 1 4( ) 1 ˆ ˆ 4( ) (1 ) ( ) 2 2 Z Z p p Z n L L L (When 1 ˆ 2 p , it has the maximum value.) Example 1. ˆ 0.02, 0.05, 0.54 ? L p n Example 2. ˆ 0.02, 0.05, 0.5 ? L p n 2. Sample size calculation for p based on the maximum error E. Definition. ˆ ( ) 1 P p p E We want to estimate p within E with a probability of (1 ) . Derive the formula for n

Confidence Interval for one population mean or one population proportion, continued 1 ...zhu/ams570/Lecture16_570.pdf · 2016-03-31 · 3 When p is unknown, we can plug in the estimate

Embed Size (px)

Citation preview

1

Confidence Interval for one population mean or one

population proportion, continued

1. Sample size estimation based on the large sample

C.I. for p

From the interval 2 2

ˆ ˆ ˆ ˆ(1 ) (1 )ˆ ˆ,

p p p pp Z p Z

n n

2

ˆ ˆ(1 )lengh of your 100(1 )% CI 2

p pL Z

n

ˆ, ,L p are given and we are interested in sample size n . Therefore,

22 22

2 2

2 2 2

1 14( ) 1

ˆ ˆ4( ) (1 ) ( )2 2Z

Z p p Zn

L L L

(When 1

ˆ2

p , it has the maximum value.)

Example 1. ˆ0.02, 0.05, 0.54 ?L p n

Example 2. ˆ0.02, 0.05, 0.5 ?L p n

2. Sample size calculation for p based on the

maximum error E.

Definition. ˆ( ) 1P p p E

We want to estimate p within E with a probability of (1 ) .

Derive the formula for n

2

(| | )

( )

(

√ ( )

√ ( )

√ ( ) )

Since when the sample size is large, by the CLT we have:

√ ( )

( )

Thus

(

√ ( )

√ ( ) )

√ ( )

( )

( )

3

When p is unknown, we can plug in the estimate of p, , and obtain the

following formula:

( )

( )

Recall we also derived n based on L - the length of the 100(1 ) %

large sample confidence interval for p . Their relationship (using the

second formula for sample size calculation because our CI formula was

basd on the second PQ where we estimated the p in the denominator) is

2L E

Recall we also derived n based on L - the length of the 100(1 ) %

large sample confidence interval for p . Their relationship is

2L E

ˆ( ) 1

ˆ( ) 1

ˆ ˆ( ) 1

ˆ ˆ( ) 1

P p p E

P E p p E

P E p p E p

P p E p p E

The 100(1 ) % confidence interval for p is ˆ ˆ,p E p E .

The length of the confidence interval is ˆ ˆ 2L p E p E E

Example 3. In order to estimate the percent of children with inadequate

immunization to be within 0.05 of the true proportion with a probability

of 98%

(a) How many children should be sampled?

Solution. 0.05, 1 0.98 0.02E

2

2

2

0.01

2

4 (0.05)

2.33 543

4 (0.05)

Zn

4

(b) If the percentage of children with inadequate immunization is

estimated to be 20% , then ?n

Solution. ˆ 20%p

2

2

0.01 0.2 1 0.2348

(0.05)

Zn

Sample size calculates for 1 population proportions based on the

maximum error E.

3. Confidence Interval for 1 population mean

There are 4 scenarios – we cover only the first 3 scenarios in our class.

1. Normal population, 2 is known.

a. Point Estimator : X

b. Pivotal Quantity : ~ (0,1)X

Z Nn

c. 100(1 )% CI for :

2 2 2( ) 1P Z Z Z X Zn

d. Length of CI : 22L Zn

e. Sample size based on L :

22

2

2

4 Zn

L

f. Sample size based on E

( ) 1P X E

( ) 1P E X E

( ) 1P X E X E

2L X E X E E

2

2

2

2

Zn

E

2. Normal population, 2 is unknown.

a. Point Estimator : X

5

b. Pivotal Quantity : 1~ n

XT t

S n

c. 100(1 )% CI for :

1, 2 1, 2 1, 2( ) 1n n n

SP t T t X t

n

d. Length of CI : 1, 22 n

SL t

n

e. Sample size based on L :

22

1, 2

2

4 nt Sn

L

f. Sample size based on E

( ) 1P X E

( ) 1P E X E

( ) 1P X E X E

2L X E X E E

2

2

1, 2

2

nt Sn

E

3. Any populations, large sample

a. Point Estimator : X

b. Pivotal Quantity :

~ (0,1) or ~ (0,1)X X

Z N Z Nn S n

c. 100(1 )% CI for : 2 2( ) 1P Z Z Z

2

2

X Zn

SX Z

n

d. Length of CI : 2 22 or 2S

L Z L Zn n

e. Sample size based on L :

2 2

2 2

2 2

2 2

4 4 or

Z Z Sn n

L L

f. Sample size based on E

6

( ) 1P X E

( ) 1P E X E

( ) 1P X E X E

2L X E X E E

2 2

2 2

2 2

2 2 or

Z Z Sn n

E E

4. There also exist other cases, but we don’t cover those in

our class.

Now I will present more details for Scenario 2

mentioned above.

Scenarios 1 & 3 (easy)

Scenario 2 : normal population, 2 unknown

1. Point estimation :

2

~ ( , )X Nn

2. ~ (0,1)X

Z Nn

3. Theorem. Sampling from normal population

a. ~ (0,1)Z N

b. 2

2

12

1~ n

n SW

c. Z and W are independent.

Definition. 1~( 1)

n

Z XT t

W n S n

------ Derivation of CI, normal population, 2 is unknown ------

2

~ ( , )X Nn

is not a pivotal quantity.

2

~ (0, )X Nn

is not a pivotal quantity.

7

~ (0,1)/

XZ N

n

is not a pivotal quantity.

Remove !!!

Therefore 1~/

n

XT t

S n

is a pivotal quantity.

Now we will use this pivotal quantity to derive the 100(1-α)%

confidence interval for μ.

We start by plotting the pdf of the t-distribution with n-1 degrees

of freedom as follows:

The above pdf plot corresponds to the following probability

statement:

1, /2 1, /2( ) 1n nP t T t

=> 1, /2 1, /2( ) 1

/n n

XP t t

S n

8

=> 1, /2 1, /2( ) 1n n

S SP t X t

n n

=> 1, /2 1, /2( ) 1n n

S SP X t X t

n n

=> 1, /2 1, /2( ) 1n n

S SP X t X t

n n

=> 1, /2 1, /2( ) 1n n

S SP X t X t

n n

=> Thus the 100(1 )% C.I. for when 2 is unknown is

1, /2 1, /2[ , ]n n

S SX t X t

n n .

(*Please note that 1, /2 /2nt Z )

Example 4. In a random sample of 36n parochial schools throughout

the south, the average number of pupils per school is 379.2 with a

standard deviation of 124. Use the sample to construct a 95% CI for ,

the mean number of pupils per school for all parochial schools in the

south.

Solution. CI for , large sample

36, 379.2, 124, =0.05n X S

95% CI for is 0.05

124379.2 1.96

36

SX Z

n

338.7,419.7

Example 5. In a psychological depth-perception test, a random sample of

14n airline pilots were asked to judge the distance between 2 markers

at the other end of a laboratory.

The data (in test) are

9

2.7, 2.4, 1.9, 2.4, 1.9, 2.3, 2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2, 2.6

Please construct a 95% CI for , the average distance.

Solution.

(Note: we can perform the Shapiro-Wilk test to examine whether the

sample comes from a normal population or not. This test is not required

in our class. Here we simply assume the population is normal. I will

always give you such information in the exams.)

CI for , small sample, normal population, population variance

unknown.

14, 2.26, 0.28, =0.05n X S

95% CI for is 1, 2

0.282.26 2.16

14n

SX t

n

2.10,2.42

Example 6. A federal agency has decided to investigate the advertised

weight we printed on cartons of a certain brand of cereal. Historical data

show that 0.75 ounce. If we wish to estimate the weight within 0.25

ounce with 99% confidence, how many cartons should be sampled?

Solution. 0.25, 0.75, 0.01E

2 2

0.005

2

0.7560

0.25

Zn

Example 7. (review of exact CI for mean when the

population is normal and the population variance is

known.) (PSU). A random sample of 126 police officers subjected to

constant inhalation of automobile exhaust fumes in downtown Cairo

had an average blood lead level concentration of 29.2 μg/dl. Assume X,

the blood lead level of a randomly selected policeman, is normally

distributed with a standard deviation of σ = 7.5 μg/dl. Historically, it is

known that the average blood lead level concentration of humans with

no exposure to automobile exhaust is 18.2 μg/dl. Is there convincing

evidence that policemen exposed to constant auto exhaust have

10

elevated blood lead level concentrations? (Data source: Kamal,

Eldamaty, and Faris, "Blood lead level of Cairo traffic policemen,"

Science of the Total Environment, 105(1991): 165-170.)

Solution. Let's try to answer the question by calculating a 95%

confidence interval for the population mean. For a 95% confidence

interval, 1−α = 0.95, so that α = 0.05 and α/2 = 0.025. Therefore, as the

following diagram illustrates the situation, z0.025 = 1.96:

Now, substituting in what we know ( = 29.2, n = 126, σ = 7.5, and z0.025 =

1.96) into the the formula for a Z-interval for a mean, we get:

[

√ ] [

√ ]

Simplifying, we get a 95% confidence interval for the mean blood lead

level concentration of all policemen exposed to constant auto exhaust:

[27.89, 30.51]

That is, we can be 95% confident that the mean blood lead level

concentration of all policemen exposed to constant auto exhaust is

11

between 27.9 μg/dl and 30.5 μg/dl. Note that the interval does not

contain the value 18.2, the average blood lead level concentration of

humans with no exposure to automobile exhaust. In fact, all of the

values in the confidence interval are much greater than 18.2. Therefore,

there is convincing evidence that policemen exposed to constant auto

exhaust have elevated blood lead level concentrations.

Example 8. (Large sample CI for population mean,

variance unknown) (BU)

Descriptive statistics on variables measured in a sample of a

n=3,539 participants attending the 7th examination of the offspring

in the Framingham Heart Study are shown below.

Characteristic n Sample

Mean

Standard

Deviation (s)

Systolic Blood

Pressure

3,534 127.3 19.0

Diastolic Blood

Pressure

3,532 74.0 9.9

Total Serum

Cholesterol

3,310 200.3 36.8

Weight 3,506 174.4 38.7

Height 3,326 65.957 3.749

Body Mass Index 3,326 28.15 5.32

Because the sample is large, we can generate a 95% confidence

interval for systolic blood pressure using the following formula:

[

√ ]

12

Substituting the sample statistics and the Z value for 95%

confidence, , we have

[

√ ] [ ]

Therefore, the point estimate for the true mean systolic blood

pressure in the population is 127.3, and we are 95% confident that

the true mean is between 126.7 and 127.9. The margin of error is

very small (the confidence interval is narrow), because the sample

size is large.

Example 9. (Small sample CI for population mean,

variance unknown, NORMAL POPULATION) (BU)

The table below shows data on a subsample of n=10 participants

in the 7th examination of the Framingham offspring Study.

Characteristic n Sample

Mean

Standard

Deviation (s)

Systolic Blood

Pressure

10 121.2 11.1

Diastolic Blood

Pressure

10 71.3 7.2

Total Serum

Cholesterol

10 202.3 37.7

Weight 10 176.0 33.0

Height 10 67.175 4.205

Body Mass Index 10 27.26 3.10

Suppose we compute a 95% confidence interval for the true

systolic blood pressure using data in the subsample. Because the

sample size is small, we must now use the confidence interval

formula that involves t rather than Z.

[

√ ]

13

The sample size is n=10, the degrees of freedom (df) = n-1 = 9.

The t value for 95% confidence with df = 9 is = 2.262.

Substituting the sample statistics and the t value for 95%

confidence, we have

.

Interpretation: Based on this sample of size n=10, our

best estimate of the true mean systolic blood pressure in the

population is 121.2. Based on this sample, we are 95% confident

that the true systolic blood pressure in the population is between

113.3 and 129.1. Note that the margin of error is larger here

primarily due to the small sample size.