
          Population    Sample
Size      $N$           $n$
Mean      $\mu$         $\bar{X}$
S.D.      $\sigma$      $s$

$\mu$ and $\sigma$ are parameters of the population. In general, $\mu$ and $\sigma$ are not known. Suppose we want to know $\mu$ (say). We take a sample, from which we know $\bar{X}$ and $s$. So, what can we say about $\mu$? Can we say $\bar{X}$ is $\mu$?

Can we say $\bar{X}$ is close to $\mu$?

But how close is it?

To estimate $\mu$ by a single number $\bar{X}$ is too “dangerous”! It is much “safer” to estimate $\mu$ by an interval. Based on the data from a random sample, we have the sample mean and variance; suppose that by some further calculation we can find an interval $(L,U)$

such that $P(L < \mu < U) = 95\%$ (say),

which means there is a 95% chance that $(L,U)$ traps $\mu$.

We say $(L,U)$ is a 95% confidence interval for $\mu$.

In general, to estimate a parameter $\theta$, if we can find a random interval $(L,U)$ such that $P(L < \theta < U) = k\%$, then $(L,U)$ is called a $k\%$ confidence interval for $\theta$.

But how do we find $(L,U)$?

In AL, you are required to construct confidence intervals (C.I.s) for (1) the population mean and (2) the population proportion.

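Let’s talk about the C.I. for $\mu$. By CLT, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Task: Find a 95% C.I. for $\mu$.

Suppose $(L,U)$ is a 95% C.I. for $\mu$: $P(L < \mu < U) = 95\%$ --- (1)
By table, $P(-1.96 < z < 1.96) = 95\%$, so

$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 95\%$

Rearranging, $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$

Comparing with (1), $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is a 95% C.I. for $\mu$.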

How about a 99% C.I. for $\mu$? Ans:

$\left(\bar{X} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}\right)$

since $P(-2.58 < z < 2.58) = 99\%$.

In general, a $k\%$ C.I. for $\mu$ is

$\left(\bar{X} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_c\frac{\sigma}{\sqrt{n}}\right)$, where $P(-z_c < z < z_c) = k\%$.

$k\%$ is called the confidence level.
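The critical value $z_c$ is normally read from a table, but it can also be computed. The following is a minimal sketch using Python's standard library; the function name `critical_z` is only an illustrative choice, not something from the notes.

```python
from statistics import NormalDist

def critical_z(confidence_percent):
    """Return z_c such that P(-z_c < z < z_c) = confidence_percent% for a standard normal z."""
    tail = (1 - confidence_percent / 100) / 2   # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)       # inverse of the standard normal c.d.f.

for level in (90, 95, 99):
    print(level, round(critical_z(level), 3))
```

The values agree with the table values 1.645, 1.96 and 2.58 (the last one being 2.576 before rounding to two decimal places).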

Note 1: $k\% \uparrow \Rightarrow z_c \uparrow$, hence the width of the C.I. $\uparrow$. Reasonable! To have more chance to “trap” the true $\mu$, we need a wider C.I. But it is close to meaningless to quote a C.I. with a very large range, e.g. to claim that we are 100% confident that the true $\mu$ lies in $(-\infty, \infty)$.

Note 2: In practice, we don’t even know $\sigma$; then we should use the sample s.d. $s$ to replace $\sigma$. More precisely, use $s\sqrt{n/(n-1)}$ instead of $s$.

E.g. 26
Masses of a random sample (in g) are 182, 184, 176, 178, 181, 180, 183, 178, 179, 177, 180, 183, 179, 178, 181, 181. If this sample came from a normal population with $\sigma = 10$ g, obtain a 95% C.I. for the mean mass of the population.

For the sample, $\bar{X} = 180$. Hence the 95% C.I. for $\mu$ is

$\left(180 - 1.96\frac{10}{\sqrt{16}},\ 180 + 1.96\frac{10}{\sqrt{16}}\right) = (175.1, 184.9)$
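The arithmetic of E.g. 26 can be checked with a short script. This is only a sketch: the helper name `z_interval` is made up for illustration, and it assumes $\sigma = 10$ g is known, as in the example.

```python
from math import sqrt
from statistics import mean

# Masses (in g) of the 16 items in the sample of E.g. 26
masses = [182, 184, 176, 178, 181, 180, 183, 178,
          179, 177, 180, 183, 179, 178, 181, 181]

def z_interval(xbar, sigma, n, z_c=1.96):
    """C.I. for the mean when the population s.d. sigma is known."""
    half_width = z_c * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(mean(masses), sigma=10, n=len(masses))
print(round(lo, 1), round(hi, 1))   # 175.1 184.9
```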

In the previous question, (175.1, 184.9) is a 95% C.I. for the true mean $\mu$. Am I right in saying that there is a 95% chance that $\mu$ lies in (175.1, 184.9)?

Note 1: $\mu$ is NOT a random variable! The interval $(L,U)$, on the other hand, is a random interval.
Note 2: We can only say that we are 95% confident that $\mu$ lies in $(L,U)$.

How to comprehend this ?

Population
  Sample 1: $\bar{X}_1$, $(L_1,U_1)$
  Sample 2: $\bar{X}_2$, $(L_2,U_2)$
  ...
  Sample n: $\bar{X}_n$, $(L_n,U_n)$
  ...

If $(L_1,U_1)$, $(L_2,U_2)$, …, $(L_n,U_n)$ are 95% C.I.s, then about 95% of these intervals should include the true mean $\mu$.

For 20 95% C.I.s, we expect about 19 of them to trap the true mean.

So (175.1, 184.9) is just one of the C.I.s, and it may or may not trap $\mu$.
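The “about 95% of the intervals trap the mean” idea can be illustrated by simulation. The sketch below is not from the notes: it pretends that the true mean is 180 (in practice it is unknown), uses $\sigma = 10$ and $n = 16$ as in E.g. 26, and the function name, number of trials and seed are arbitrary choices.

```python
import random
from math import sqrt

def coverage(mu=180.0, sigma=10.0, n=16, trials=4000, z_c=1.96, seed=1):
    """Fraction of simulated 95% C.I.s that contain the (assumed) true mean mu."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    half_width = z_c * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # draw one random sample of size n and compute its sample mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())   # a value close to 0.95
```

Each simulated interval either traps $\mu$ or not; it is the long-run proportion of intervals that is about 95%.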

An example.

Suppose $\{X_1, X_2, \ldots, X_7\}$ is the population set.

We take 2-element samples ($n = 2$). Total number of possible samples $= {}_7C_2 = 21$.

Hence we can construct 21 different C.I.s.

We consider the 90% C.I. $\left(\bar{X} - 1.645\,s_{\bar{X}},\ \bar{X} + 1.645\,s_{\bar{X}}\right)$.

See the WORDS document now.

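Of the 21 C.I.s, 19 do trap $\mu$.

Please notice that $21 \times 90\% \approx 19$. Also, the sample size is 2, too small! Instead of using

$s_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

we use the adjusted sample s.d.

$s_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$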

Refer to P.81 note (ii) in text book.

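E.g. 27
A certain population has $\sigma = 6$. How large a sample size is needed for the width of the 95% C.I. for $\mu$ to be 0.5?

95% C.I. $= \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Half width $= 0.25$, so $1.96\frac{\sigma}{\sqrt{n}} = 0.25$

$1.96 \times \frac{6}{\sqrt{n}} = 0.25 \Rightarrow \sqrt{n} = 47.04 \Rightarrow n = 2213$ (rounded up)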

Do you agree?
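One way to check the answer is to solve $1.96\,\sigma/\sqrt{n} = 0.25$ for $n$ numerically. A minimal sketch (the function name is hypothetical); it rounds up, because $n$ must be a whole number and a smaller $n$ would make the interval wider than required.

```python
from math import ceil

def sample_size_for_width(sigma, width, z_c=1.96):
    """Smallest n for which the C.I. for the mean has total width <= width."""
    half_width = width / 2
    return ceil((z_c * sigma / half_width) ** 2)

n = sample_size_for_width(sigma=6, width=0.5)
print(n)   # 2213
```

Note that truncating $\sqrt{n} = 47.04$ to 47 before squaring would give 2209, which leaves the half width just above 0.25.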

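If $\sigma$ is known, the C.I. is

$\left(\bar{X} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_c\frac{\sigma}{\sqrt{n}}\right)$

If $\sigma$ is unknown, the C.I. is

$\left(\bar{X} - z_c\frac{s}{\sqrt{n}},\ \bar{X} + z_c\frac{s}{\sqrt{n}}\right)$

Precisely,

$\left(\bar{X} - z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}},\ \bar{X} + z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}}\right)$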

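E.g. 28
A sample of 100 plugs has mean diameter 25.10 cm. If the s.d. of these plugs is 0.12, estimate the population mean diameter at the 95% confidence level.

Now, we don’t know $\sigma$, so use the sample s.d. $s$:

$\left(\bar{X} - z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}},\ \bar{X} + z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}}\right)$

$= \left(25.10 - 1.96\sqrt{\frac{100}{99}}\,\frac{0.12}{\sqrt{100}},\ 25.10 + 1.96\sqrt{\frac{100}{99}}\,\frac{0.12}{\sqrt{100}}\right) = (25.076, 25.124)$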
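The same calculation as a short script, as a sketch only. The helper name is made up, and it assumes, as in the notes, that $s$ is the sample s.d. computed with divisor $n$, so that $s\sqrt{n/(n-1)}$ is used in place of $\sigma$.

```python
from math import sqrt

def z_interval_unknown_sigma(xbar, s, n, z_c=1.96):
    """C.I. for the mean using s*sqrt(n/(n-1)) as the estimate of sigma."""
    sigma_hat = s * sqrt(n / (n - 1))        # adjusted sample s.d.
    half_width = z_c * sigma_hat / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval_unknown_sigma(xbar=25.10, s=0.12, n=100)
print(round(lo, 3), round(hi, 3))   # 25.076 25.124
```

With $n = 100$ the factor $\sqrt{100/99} \approx 1.005$ changes very little; it matters far more for small samples.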

E.g. 31
(a) A two-stage rocket is to be fired to put a satellite into orbit. Due to variation of the specific impulse in the second stage, the velocity imparted in this stage will be normally distributed about 4095 m s$^{-1}$ with s.d. 21 m s$^{-1}$. Find 95% confidence limits for the velocity imparted in this stage.

$\bar{v} = 4095$, $s_1 = 21$, $n = 1$

95% C.I. $= \left(\bar{v} - 1.96\frac{s_1}{\sqrt{n}},\ \bar{v} + 1.96\frac{s_1}{\sqrt{n}}\right)$

$= \left(4095 - 1.96\frac{21}{\sqrt{1}},\ 4095 + 1.96\frac{21}{\sqrt{1}}\right)$

$= (4054, 4136)$

(b) In the first stage, the velocity imparted will be normally distributed about 3990 m s$^{-1}$ with s.d. 20 m s$^{-1}$ due to variation of the specific impulse and (independently) with s.d. 8 m s$^{-1}$ due to variation in the time of burning of the charge. Find 90% confidence limits for the velocity imparted in this stage.

$s_2 = 20$, $s_3 = 8$

Combined s.d. $s = \sqrt{s_2^2 + s_3^2} = \sqrt{20^2 + 8^2} = 21.54$

90% C.I. $= \left(\bar{v} - 1.645\frac{s}{\sqrt{n}},\ \bar{v} + 1.645\frac{s}{\sqrt{n}}\right)$ with $n = 1$

$= (3990 - 1.645 \times 21.54,\ 3990 + 1.645 \times 21.54)$

$= (3955, 4025)$

(c) Given that a final velocity of 8000 m s$^{-1}$ is required to go into orbit and that the second stage fires immediately after the first, find the probability of achieving orbit.

Second stage: mean velocity 4095, variance $21^2$.
First stage: mean velocity 3990, variance $20^2 + 8^2$.

Let $V$ = final velocity.
$E(V) = 3990 + 4095 = 8085$
$\mathrm{Var}(V) = 20^2 + 8^2 + 21^2 = 905$

$V \sim N(8085, 905)$

$P(V > 8000) = P\left(z > \frac{8000 - 8085}{\sqrt{905}}\right) = P(z > -2.83) = 0.9977$
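Part (c) can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Independent stages: means add, and variances add
mean_v = 3990 + 4095               # E(V)
var_v = 20**2 + 8**2 + 21**2       # Var(V)

final_velocity = NormalDist(mu=mean_v, sigma=sqrt(var_v))
p_orbit = 1 - final_velocity.cdf(8000)   # P(V > 8000)

print(mean_v, var_v, p_orbit)
```

The computed probability is about 0.998, in line with the table-based 0.9977 (the small difference comes from rounding $z$ to two decimal places).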

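Prerequisite on E.g. 32

Uniform distribution on $(a, b)$:

$f(x) = \frac{1}{b-a}$ for $a < x < b$ (and 0 otherwise)

$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}$

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \int_a^b \frac{x^2}{b-a}\,dx - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}$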
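The two formulas for the uniform distribution can be sanity-checked by numerical integration. This is just a sketch: the end points $a = 2$, $b = 5$ are arbitrary illustrative values, and a simple midpoint rule is used rather than any particular library routine.

```python
def uniform_moments(a, b, steps=100_000):
    """Mean and variance of the uniform distribution on (a, b) by the midpoint rule."""
    width = (b - a) / steps
    density = 1 / (b - a)                      # f(x) on (a, b)
    mean = 0.0
    second_moment = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width              # midpoint of the i-th strip
        mean += x * density * width            # contributes to E(X)
        second_moment += x * x * density * width   # contributes to E(X^2)
    return mean, second_moment - mean ** 2

m, v = uniform_moments(2, 5)
print(m, v)                      # numerical values
print((2 + 5) / 2, (5 - 2) ** 2 / 12)   # formula values: 3.5 0.75
```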

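E.g. 32
We add $10^4$ numbers, each of which was rounded off to an accuracy of $10^{-m}$. Assuming that the errors arising are mutually independent and uniformly distributed on $(-0.5 \times 10^{-m},\ 0.5 \times 10^{-m})$, find the limits in which the total error will lie with probability 0.99.

Let $X$ = total error. $X = X_1 + X_2 + \cdots + X_{10000}$

Since $X_i$ is uniformly distributed,

$E(X_i) = \frac{-0.5 \times 10^{-m} + 0.5 \times 10^{-m}}{2} = 0$

$\mathrm{Var}(X_i) = \frac{\left[0.5 \times 10^{-m} - (-0.5 \times 10^{-m})\right]^2}{12} = \frac{10^{-2m}}{12}$

$E(X) = 10000\,E(X_i) = 0$

$\mathrm{Var}(X) = 10000\,\mathrm{Var}(X_i) = \frac{10^{-2(m-2)}}{12}$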

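By CLT, $X \sim N\left(0, \frac{10^{-2(m-2)}}{12}\right)$ approximately.

By table, $P(-2.58 < z < 2.58) = 0.99$

$P\left(-2.58 < \frac{X - 0}{\sqrt{10^{-2(m-2)}/12}} < 2.58\right) = 0.99$

Hence the limits are $\pm 2.58 \times \frac{10^{-(m-2)}}{\sqrt{12}}$.

Hence we can construct the 99% C.I. for the total error $X$, and this estimate is far better than the worst-case bound! Let’s use $m = 3$ as an example: $|X| \le 0.0005 \times 10^4 = 5$, too large to be a useful estimate! But the C.I. is only $(-0.0745, 0.0745)$, which is much more “precise”.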
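The comparison between the worst-case bound and the 99% limits can be reproduced for any $m$. A minimal sketch with hypothetical function names; the critical value is a parameter, with the default 2.58 taken from the 99% table value quoted earlier in these notes.

```python
from math import sqrt

def total_error_limit(m, count=10_000, z_c=2.58):
    """Limit L such that the total rounding error lies in (-L, L) with probability ~0.99."""
    half_range = 0.5 * 10 ** (-m)              # each error is uniform on (-half_range, half_range)
    var_single = (2 * half_range) ** 2 / 12    # (b - a)^2 / 12
    return z_c * sqrt(count * var_single)      # z_c times the s.d. of the sum

def worst_case_bound(m, count=10_000):
    """Largest possible |total error|: every error at its extreme value."""
    return count * 0.5 * 10 ** (-m)

print(worst_case_bound(3), total_error_limit(3))
```

For $m = 3$ the worst-case bound is 5, while the probabilistic limit is roughly 0.07, about 70 times smaller.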

Now, let’s talk about the C.I. for a proportion.

Suppose you want to look into the proportion of smokers in H.K.

You have interviewed 100 H.K. people and discovered 60 smokers.

Can we say the proportion of smokers among H.K. people is 60%?

Not with certainty. However, we can construct a C.I. to estimate the true proportion!

Let $n$ be the sample size.
Let $m$ be the number of “successes” (i.e. “smokers” in the example).
Let $p$ be the true proportion (of “successes”), and let $q = 1 - p$.

Suppose the population is very large; then $m$ has a binomial distribution:

$m \sim B(n, p)$

Suppose further that $n$ is reasonably large. We can use the normal distribution to approximate the binomial:

$m \sim N(np, npq)$ approximately

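Let $P_s$ be the proportion of “successes” in the sample.

$P_s = \frac{m}{n}$

$E(P_s) = E\left(\frac{m}{n}\right) = \frac{np}{n} = p$

$\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{m}{n}\right) = \frac{\mathrm{Var}(m)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}$

Hence $P_s \sim N\left(p, \frac{pq}{n}\right)$

In practice, $p$ is unknown. We use $P_sQ_s/n$ to estimate $pq/n$, where $Q_s = 1 - P_s$.

Thus $P_s \sim N\left(p, \frac{P_sQ_s}{n}\right)$ approximately.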

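Hence

$P\left(-1.96 < \frac{P_s - p}{\sqrt{P_sQ_s/n}} < 1.96\right) = 0.95$

Rearranging,

$P\left(P_s - 1.96\sqrt{\frac{P_sQ_s}{n}} < p < P_s + 1.96\sqrt{\frac{P_sQ_s}{n}}\right) = 0.95$

Hence the 95% C.I. for the population proportion $p$ is

$\left(P_s - 1.96\sqrt{\frac{P_sQ_s}{n}},\ P_s + 1.96\sqrt{\frac{P_sQ_s}{n}}\right)$

In general, a $k\%$ C.I. for the population proportion $p$ is

$\left(P_s - z_c\sqrt{\frac{P_sQ_s}{n}},\ P_s + z_c\sqrt{\frac{P_sQ_s}{n}}\right)$

where $P(-z_c < z < z_c) = k\%$.

$n > 30$ is required.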

E.g. 34
Of 4000 items, 240 are defective. Find a 95% C.I. for the probability $p$ that an item is defective.

$P_s = \frac{240}{4000} = 0.06$, $Q_s = 1 - 0.06 = 0.94$

$\sqrt{\frac{P_sQ_s}{n}} = \sqrt{\frac{0.06 \times 0.94}{4000}} = 0.00375$

Required 95% C.I. is

$(0.06 - 1.96 \times 0.00375,\ 0.06 + 1.96 \times 0.00375) = (0.0526, 0.0674)$
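The C.I. for a proportion is easy to compute directly. The following sketch (with a made-up function name) applies the formula $P_s \pm z_c\sqrt{P_sQ_s/n}$ to the data of E.g. 34.

```python
from math import sqrt

def proportion_interval(successes, n, z_c=1.96):
    """Normal-approximation C.I. for a population proportion."""
    p_s = successes / n                    # sample proportion P_s
    se = sqrt(p_s * (1 - p_s) / n)         # estimated standard error sqrt(Ps*Qs/n)
    return p_s - z_c * se, p_s + z_c * se

lo, hi = proportion_interval(240, 4000)
print(round(lo, 4), round(hi, 4))   # 0.0526 0.0674
```

The same function gives roughly (0.504, 0.696) for the smoker survey above (60 smokers out of 100), which shows how wide the interval is for a sample of only 100.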

E.g. 35
Suppose that we know $p = 0.6$ for a Bernoulli population. How large a sample is necessary to be 95% confident that the obtained sample proportion $P_s$ lies in (0.5, 0.7)?

$(0.5, 0.7) = (0.6 - 0.1,\ 0.6 + 0.1)$

Let $n$ = sample size.

Hence, for 95% confidence,

$1.96\sqrt{\frac{0.6 \times 0.4}{n}} = 0.1$

On solving, $n \approx 92$.

E.g. 37
(a) Of 50 houseflies, independently subjected to the same insecticide, 38 were killed. Obtain an estimate of $p$, the probability that a housefly is killed by the insecticide. Find also the standard error of this estimate.

$P_s = \frac{38}{50} = \frac{19}{25}$

Standard error $= \sqrt{\frac{P_sQ_s}{n}} = \sqrt{\frac{(19/25)(1 - 19/25)}{50}} = 0.0604$

(b) Now conduct a larger experiment with the same insecticide so that an estimate with a standard error of about 0.03 can be quoted. On the basis of the information in the experiment already conducted, how many houseflies are needed?

Standard error $= \sqrt{\frac{P_sQ_s}{n}} = 0.03$, i.e. $\sqrt{\frac{(19/25)(6/25)}{n}} = 0.03$

So $n = 203$.

(c) To be absolutely sure of obtaining the desired accuracy, how many houseflies should be taken?

The standard error depends on $P_s$. $n = 203$ makes the standard error $= 0.03$ only when $P_s = 19/25$.

So what $n$ ensures s.e. $\le 0.03$ irrespective of $P_s$?

For fixed $n$, the s.e. is a function of $P_s$.

s.e. $\le 0.03$ for every $P_s$ means that the maximum of the s.e. is at most 0.03.

s.e. $= \sqrt{\frac{P_s(1-P_s)}{n}}$

It is easy to show that $P_s(1-P_s)$ attains its maximum when $P_s = 0.5$.

Hence s.e. $\le \sqrt{\frac{0.5(1-0.5)}{n}} = \sqrt{\frac{1}{4n}}$

Then set $\sqrt{\frac{1}{4n}} \le 0.03$

$n \ge 277.8$, i.e. $n \ge 278$

i.e. though different samples yield different $P_s$, it is sure that the s.e. is not greater than 0.03 if we take $n = 278$ (or more).
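Parts (b) and (c) both solve $\sqrt{P_s(1-P_s)/n} \le 0.03$ for $n$; they differ only in the value of $P_s$ used. A small sketch, with an illustrative function name, rounding up to the next whole housefly:

```python
from math import ceil, sqrt

def n_for_standard_error(target_se, p_s):
    """Smallest n with sqrt(p_s * (1 - p_s) / n) <= target_se."""
    return ceil(p_s * (1 - p_s) / target_se ** 2)

n_pilot = n_for_standard_error(0.03, 19 / 25)   # (b): uses the pilot estimate Ps = 19/25
n_worst = n_for_standard_error(0.03, 0.5)       # (c): worst case, Ps = 0.5

print(n_pilot, n_worst)                         # 203 278
print(sqrt(0.25 / n_worst))                     # worst-case s.e. at that n
```

Solving $1/(4n) \le 0.03^2$ gives $n \ge 277.8$, so 278 already guarantees a standard error of at most 0.03 whatever $P_s$ turns out to be.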
