$\mu$, $\sigma$ are some parameters of the population. In general, $\mu$, $\sigma$ are not known. Suppose we want to know $\mu$ (say), we take samples and we will know $\bar{X}$ and $s$.

Page 1

            Population   Sample
Size        $N$          $n$
Mean        $\mu$        $\bar{X}$
S.D.        $\sigma$     $s$

$\mu$, $\sigma$ are some parameters of the population. In general, $\mu$, $\sigma$ are not known. Suppose we want to know $\mu$ (say): we take samples, and we will know $\bar{X}$ and $s$. So, what can we say about $\mu$?

Can we say $\bar{X}$ is $\mu$?

Can we say $\bar{X}$ is close to $\mu$?

But how close is it?

Page 2

To estimate $\mu$ by a single number $\bar{X}$ is too “dangerous”! It is much “safer” to estimate $\mu$ by an interval. Based on the data from random samples, we can have the sample mean and variance; suppose that, by some further calculation, we can find an interval $(L,U)$

such that $P(L < \mu < U) = 95\%$ (say),

that means there’s a 95% chance that $(L,U)$ traps $\mu$.

We say $(L,U)$ is a 95% confidence interval for $\mu$.

Page 3

In general, to estimate a parameter $\theta$, if we can find a random interval $(L,U)$ such that $P(L < \theta < U) = k\%$, then $(L,U)$ is called a $k\%$ confidence interval for $\theta$.

But how do we find $(L,U)$?

In AL, you are required to construct a confidence interval (C.I.) for (1) the population mean and (2) the population proportion.

Page 4

Let’s talk about the C.I. for $\mu$. By the CLT,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Task: Find a 95% C.I. for $\mu$.

Suppose $(L,U)$ is a 95% C.I. for $\mu$:

$$P(L < \mu < U) = 95\% \qquad (1)$$

By table, $P(-1.96 < z < 1.96) = 95\%$, so

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 95\%$$

Rearranging,

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$$

Comparing with (1), the 95% C.I. for $\mu$ is

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

Page 5

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is a 95% C.I. for $\mu$.

How about a 99% C.I. for $\mu$? Ans:

$$\left(\bar{X} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}\right)$$

since $P(-2.58 < z < 2.58) = 99\%$
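The table values 1.96 and 2.58 can be reproduced numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the helper names `z_critical` and `mean_ci_known_sigma` are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(level: float) -> float:
    """Return z_c such that P(-z_c < Z < z_c) = level for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def mean_ci_known_sigma(xbar: float, sigma: float, n: int, level: float = 0.95):
    """Interval xbar -/+ z_c * sigma / sqrt(n); the population s.d. is assumed known."""
    half_width = z_critical(level) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Critical values for the two confidence levels used on this page.
print(round(z_critical(0.95), 2), round(z_critical(0.99), 2))
```

The exact 99% quantile is about 2.576, which the table rounds to 2.58.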

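In general, a $k\%$ C.I. for $\mu$ is

$$\left(\bar{X} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_c\frac{\sigma}{\sqrt{n}}\right)$$

where $P(-z_c < z < z_c) = k\%$.

$k\%$ is called the confidence level.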

Page 6

$$\left(\bar{X} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_c\frac{\sigma}{\sqrt{n}}\right)$$

is a $k\%$ C.I. for $\mu$.

Note 1: As the confidence level increases, $z_c$ increases, hence the width of the C.I. increases. Reasonable! To have more chance to “trap” the true $\mu$, we need a wider C.I. But it is close to meaningless to quote a C.I. of very large range, e.g. to claim that we are 100% confident that the true $\mu$ lies in $(-\infty, \infty)$.

Note 2: In practice, we don’t even know $\sigma$, so we should use the sample s.d. $s$ to replace $\sigma$. More precisely, use $s\sqrt{n/(n-1)}$ instead of $s$.

Page 7

E.g. 26
Masses of a random sample (in g) are 182, 184, 176, 178, 181, 180, 183, 178, 179, 177, 180, 183, 179, 178, 181, 181. If this sample came from a normal population with $\sigma = 10$ g, obtain a 95% C.I. for the mean mass of the population.

For the sample, $\bar{X} = 180$ and $n = 16$. Hence the 95% C.I. for $\mu$ is

$$\left(180 - 1.96\frac{10}{\sqrt{16}},\ 180 + 1.96\frac{10}{\sqrt{16}}\right)$$

= (175.1, 184.9)
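The arithmetic of E.g. 26 can be checked directly from the raw data. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Masses (in g) from E.g. 26.
masses = [182, 184, 176, 178, 181, 180, 183, 178,
          179, 177, 180, 183, 179, 178, 181, 181]
sigma = 10.0   # population s.d. given in the question
z_c = 1.96     # table value for a 95% C.I.

xbar = mean(masses)
half_width = z_c * sigma / sqrt(len(masses))
lower, upper = xbar - half_width, xbar + half_width
print(xbar, round(lower, 1), round(upper, 1))
```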

Page 8

In the previous question, (175.1, 184.9) is a 95% C.I. for the true mean $\mu$. Am I right in saying that there is a 95% chance that $\mu$ lies in (175.1, 184.9)?

Note 1: $\mu$ is NOT a random variable, while the interval $(L,U)$ is a random interval.
Note 2: We can only say that we are 95% confident that $\mu$ lies in $(L,U)$.

How to comprehend this?

Page 9

Population
Sample 1: $\bar{X}_1$, $(L_1, U_1)$
Sample 2: $\bar{X}_2$, $(L_2, U_2)$
...
Sample n: $\bar{X}_n$, $(L_n, U_n)$
...

Page 10

If $(L_1,U_1), (L_2,U_2), \dots, (L_n,U_n)$ are 95% C.I.s, then about 95% of these intervals should include the true mean $\mu$.

For twenty 95% C.I.s, we expect about 19 of them to trap the true mean.

So (175.1, 184.9) is just one of the C.I.s, and it may or may not trap $\mu$.
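The "about 95% of the intervals trap the true mean" reading can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the true mean of 180 is a hypothetical value chosen for the experiment (in practice it is unknown), $\sigma = 10$ and $n = 16$ are borrowed from E.g. 26, and the function name `coverage` is made up here.

```python
import random
from math import sqrt


def coverage(mu, sigma, n, trials, z_c=1.96, seed=0):
    """Fraction of simulated 95% C.I.s (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z_c * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build its interval around the sample mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical true mean 180; each trial plays the role of one "Sample i".
print(coverage(mu=180.0, sigma=10.0, n=16, trials=10_000))
```

The printed fraction should be close to 0.95, while any single interval either contains the mean or does not.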

Page 11

An example.

Suppose $\{X_1, X_2, \dots, X_7\}$ is the population set.

We take 2-element samples ($n = 2$). Total number of possible samples = $^7C_2 = 21$.

Hence we can construct 21 different C.I.s.

We consider the 90% C.I. $(\bar{X} - 1.645\, s_{\bar{X}},\ \bar{X} + 1.645\, s_{\bar{X}})$.

See the Word document now.

Page 12

Of the 21 C.I.s, 19 do trap $\mu$.

Please notice that $21 \times 90\% \approx 19$. Also, the sample size is 2, which is too small! Instead of using

$$s_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

we use the adjusted sample s.d.

$$s_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

Refer to P.81 note (ii) in the textbook.

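Page 13

E.g. 27
For a certain population, $\sigma = 6$. How large a sample is needed so that the width of the 95% C.I. for $\mu$ is 0.5?

95% C.I. = $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

Half width = 0.25, so

$$1.96\frac{\sigma}{\sqrt{n}} = 0.25$$

$$1.96 \times \frac{6}{\sqrt{n}} = 0.25 \;\Rightarrow\; \sqrt{n} = 47.04$$

Taking $\sqrt{n} \approx 47$ gives $n = 2209$; without rounding, $n = 47.04^2 \approx 2212.8$, so $n = 2213$ is needed to keep the half width within 0.25.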
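The sample-size step in E.g. 27 can be sketched as follows. The function name is illustrative; the sketch solves $z_c\sigma/\sqrt{n} = h$ for $n$ without rounding $\sqrt{n}$ first, and then rounds up to the next whole number, which gives 2213.

```python
from math import ceil, sqrt


def sample_size_for_half_width(sigma, half_width, z_c=1.96):
    """Smallest n with z_c * sigma / sqrt(n) <= half_width."""
    return ceil((z_c * sigma / half_width) ** 2)


n = sample_size_for_half_width(sigma=6.0, half_width=0.25)
print(n, 1.96 * 6.0 / sqrt(n))  # sample size and the half width it achieves
```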

Page 14

Do you agree?

Page 15

If $\sigma$ is known, the C.I. is

$$\left(\bar{X} - z_c\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_c\frac{\sigma}{\sqrt{n}}\right)$$

If $\sigma$ is unknown, the C.I. is

$$\left(\bar{X} - z_c\frac{s}{\sqrt{n}},\ \bar{X} + z_c\frac{s}{\sqrt{n}}\right)$$

Precisely,

$$\left(\bar{X} - z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}},\ \bar{X} + z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}}\right)$$

Page 16

E.g. 28
A sample of 100 plugs has mean diameter 25.10 cm. If the s.d. of these plugs is 0.12, estimate the population mean diameter at the 95% confidence level.

Now, we don’t know $\sigma$, so we use the sample s.d. $s$:

$$\left(\bar{X} - z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}},\ \bar{X} + z_c\sqrt{\frac{n}{n-1}}\,\frac{s}{\sqrt{n}}\right)$$

$$= \left(25.10 - 1.96\sqrt{\frac{100}{99}}\,\frac{0.12}{\sqrt{100}},\ 25.10 + 1.96\sqrt{\frac{100}{99}}\,\frac{0.12}{\sqrt{100}}\right) = (25.076, 25.124)$$
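The interval in E.g. 28 can be recomputed with a few lines of Python. This is a sketch of the "precise" formula with the $\sqrt{n/(n-1)}$ adjustment; the function name is illustrative.

```python
from math import sqrt


def mean_ci_unknown_sigma(xbar, s, n, z_c=1.96):
    """C.I. for the mean using the sample s.d. s adjusted by sqrt(n / (n - 1))."""
    s_adjusted = s * sqrt(n / (n - 1))
    half_width = z_c * s_adjusted / sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = mean_ci_unknown_sigma(xbar=25.10, s=0.12, n=100)
print(round(lower, 3), round(upper, 3))
```

With $n = 100$ the adjustment factor is about 1.005, so it changes the limits only in the fourth decimal place.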

Page 17

E.g. 31
(a) A two-stage rocket is to be fired to put a satellite into orbit. Due to variation of the specific impulse in the second stage, the velocity imparted in this stage will be normally distributed about 4095 m s$^{-1}$ with s.d. 21 m s$^{-1}$. Find 95% confidence limits for the velocity imparted in this stage.

$\bar{v} = 4095$, $s_1 = 21$

95% C.I. = $\left(\bar{v} - 1.96\frac{s_1}{\sqrt{n}},\ \bar{v} + 1.96\frac{s_1}{\sqrt{n}}\right)$

= $\left(4095 - 1.96 \times \frac{21}{\sqrt{1}},\ 4095 + 1.96 \times \frac{21}{\sqrt{1}}\right)$

= (4054, 4136)

Page 18

(b) In the first stage, the velocity imparted will be normally distributed about 3990 m s$^{-1}$ with s.d. 20 m s$^{-1}$ due to variation of the specific impulse and (independently) with s.d. 8 m s$^{-1}$ due to variation in the time of burning of the charge. Find 90% confidence limits for the velocity imparted in this stage.

$s_2 = 20$, $s_3 = 8$

Combined s.d. $s = \sqrt{s_2^2 + s_3^2} = \sqrt{20^2 + 8^2} = 21.54$

90% C.I. = $\left(\bar{v}_1 - 1.645\frac{s}{\sqrt{n}},\ \bar{v}_1 + 1.645\frac{s}{\sqrt{n}}\right)$ with $\bar{v}_1 = 3990$ and $n = 1$

= $(3990 - 1.645 \times 21.54,\ 3990 + 1.645 \times 21.54)$

= (3955, 4025)

Page 19

(c) Given that a final velocity of 8000 m s$^{-1}$ is required to go into orbit and that the second stage fires immediately after the first, find the probability of achieving orbit.

Second stage: $\bar{v} = 4095$, $s^2 = 21^2$
First stage: $\bar{v}_1 = 3990$, $s_1^2 = 20^2 + 8^2$

Let $V$ = final velocity.
$E(V) = 3990 + 4095 = 8085$
$\mathrm{Var}(V) = 20^2 + 8^2 + 21^2 = 905$

$V \sim N(8085, 905)$

$$P(V > 8000) = P\left(z > \frac{8000 - 8085}{\sqrt{905}}\right) = P(z > -2.83)$$

= 0.9977
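The probability in part (c) can be evaluated without a table. The sketch below assumes Python's standard `statistics.NormalDist`; it evaluates the normal c.d.f. at the unrounded $z \approx -2.826$, which gives about 0.9976, whereas a table lookup with $z$ rounded to $-2.83$ gives 0.9977.

```python
from math import sqrt
from statistics import NormalDist

# Independent stage velocities add: means add and variances add.
mean_v = 3990 + 4095
var_v = 20**2 + 8**2 + 21**2

final_velocity = NormalDist(mu=mean_v, sigma=sqrt(var_v))
p_orbit = 1.0 - final_velocity.cdf(8000)
print(mean_v, var_v, round(p_orbit, 4))
```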

Page 20

Prerequisite for E.g. 32

Uniform distribution on $(a, b)$:

$$f(x) = \frac{1}{b-a}, \quad a < x < b$$

$$E(X) = \int_a^b x \cdot \frac{1}{b-a}\,dx = \frac{a+b}{2}$$

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \int_a^b \frac{x^2}{b-a}\,dx - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}$$

Page 21

E.g. 32
We add $10^4$ numbers, each of which was rounded off to an accuracy of $10^{-m}$. Assuming that the errors are mutually independent and uniformly distributed on $(-0.5 \times 10^{-m},\ 0.5 \times 10^{-m})$, find the limits within which the total error will lie with probability 0.99.

Let $X$ = total error, $X = X_1 + X_2 + \dots + X_{10000}$.

Since $X_i$ is uniformly distributed,

$$E(X_i) = \frac{-0.5 \times 10^{-m} + 0.5 \times 10^{-m}}{2} = 0$$

$$\mathrm{Var}(X_i) = \frac{\left(0.5 \times 10^{-m} - (-0.5 \times 10^{-m})\right)^2}{12} = \frac{10^{-2m}}{12}$$

Page 22

$E(X) = 10000\,E(X_i) = 0$

$$\mathrm{Var}(X) = 10000\,\mathrm{Var}(X_i) = \frac{10^{-2(m-2)}}{12}$$

By the CLT,

$$X \sim N\left(0, \frac{10^{-2(m-2)}}{12}\right)$$

By table, $P(-2.58 < z < 2.58) = 0.99$, so

$$P\left(-2.58 < \frac{X - 0}{10^{-(m-2)}/\sqrt{12}} < 2.58\right) = 0.99$$

Hence the limits are

$$\pm\, 2.58 \times \frac{10^{-(m-2)}}{\sqrt{12}}$$

Hence we can construct the 99% C.I. for the total error $X$, and this estimate is far better than the worst-case bound! Let’s use $m = 3$ as an example. The worst case is $|X| \le 0.0005 \times 10^4 = 5$, too large for estimation! But the C.I. is only $(-0.0745, 0.0745)$, which is much more “precise”.
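The limits for the total rounding error can be sketched numerically. The function name and parameters below are illustrative; the sketch uses the exact normal quantile (about 2.576) rather than a rounded table value, so the result can differ from a hand calculation in the third significant figure.

```python
from math import sqrt
from statistics import NormalDist


def total_error_limit(m, count=10_000, prob=0.99):
    """Half-width L such that the summed rounding error lies in (-L, L) with probability prob."""
    var_single = (10.0 ** (-m)) ** 2 / 12.0   # variance of one uniform rounding error
    sd_total = sqrt(count * var_single)       # independent errors: variances add
    z_c = NormalDist().inv_cdf(0.5 + prob / 2.0)
    return z_c * sd_total


limit = total_error_limit(3)
worst_case = 0.5 * 10.0 ** (-3) * 10_000      # every error at its maximum size
print(round(limit, 4), worst_case)
```

For $m = 3$ this gives a limit of roughly 0.074, against a worst-case bound of 5.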

Page 23

Now, let’s talk about the C.I. for a proportion.

Suppose you want to look into the proportion of smokers in H.K.

You have interviewed 100 H.K. people and discovered 60 smokers.

Can we say the proportion of smokers among H.K. people is 60%?

However, we can construct a C.I. to estimate the true proportion!

Page 24

Let $n$ be the sample size.
Let $m$ be the number of “successes” (i.e. “smokers” in the e.g.).
Let $p$ be the true proportion (of “success”), and $q = 1 - p$.

Suppose the population is very large,

then $m$ has a binomial distribution such that

$m \sim B(n, p)$

Suppose further that $n$ is reasonably large.

We can use “normal” to approximate “binomial”:

$m \sim N(np, npq)$ approximately

Page 25

Let $P_s$ be the proportion of “success” in the sample.

$$P_s = \frac{m}{n}$$

$$E(P_s) = E\left(\frac{m}{n}\right) = \frac{np}{n} = p$$

$$\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{m}{n}\right) = \frac{\mathrm{Var}(m)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}$$

Hence $P_s \sim N\left(p, \frac{pq}{n}\right)$

In practice, $p$ is unknown. We use $P_s Q_s / n$ to estimate $pq/n$, where $Q_s = 1 - P_s$.

Thus $P_s \sim N\left(p, \frac{P_s Q_s}{n}\right)$ approximately.

Page 26

Hence

$$P\left(-1.96 < \frac{P_s - p}{\sqrt{P_s Q_s / n}} < 1.96\right) = 0.95$$

Rearranging,

$$P\left(P_s - 1.96\sqrt{\frac{P_s Q_s}{n}} < p < P_s + 1.96\sqrt{\frac{P_s Q_s}{n}}\right) = 0.95$$

Hence the 95% C.I. for the population proportion $p$ is

$$\left(P_s - 1.96\sqrt{\frac{P_s Q_s}{n}},\ P_s + 1.96\sqrt{\frac{P_s Q_s}{n}}\right)$$

Page 27

In general, a $k\%$ C.I. for the population proportion $p$ is

$$\left(P_s - z_c\sqrt{\frac{P_s Q_s}{n}},\ P_s + z_c\sqrt{\frac{P_s Q_s}{n}}\right)$$

where $P(-z_c < z < z_c) = k\%$.

$n > 30$ is required.

Page 28

E.g. 34
Of 4000 items, 240 are defective. Find a 95% C.I. for the probability $p$ that an item is defective.

$P_s = \frac{240}{4000} = 0.06$, $Q_s = 1 - 0.06 = 0.94$

$$\sqrt{\frac{P_s Q_s}{n}} = \sqrt{\frac{0.06 \times 0.94}{4000}} = 0.00375$$

Required 95% C.I. is

$(0.06 - 1.96 \times 0.00375,\ 0.06 + 1.96 \times 0.00375)$

= (0.0526, 0.0674)
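The proportion interval of E.g. 34 can be recomputed as a check. This is a minimal sketch of the normal-approximation interval $P_s \pm z_c\sqrt{P_s Q_s/n}$; the function name is illustrative.

```python
from math import sqrt


def proportion_ci(successes, n, z_c=1.96):
    """Normal-approximation C.I. for a population proportion."""
    p_s = successes / n
    q_s = 1.0 - p_s
    standard_error = sqrt(p_s * q_s / n)
    return p_s - z_c * standard_error, p_s + z_c * standard_error


lower, upper = proportion_ci(successes=240, n=4000)
print(round(lower, 4), round(upper, 4))
```

The upper limit comes out near 0.067, which confirms that the interval is centred on the sample proportion 0.06.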

Page 29

E.g. 35
Suppose that we know $p = 0.6$ for a Bernoulli population. How large a sample is necessary to be 95% confident that the observed sample proportion lies in (0.5, 0.7)?

$(0.5, 0.7) = (0.6 - 0.1,\ 0.6 + 0.1)$

Let $n$ = sample size.

Hence, for 95% confidence,

$$1.96\sqrt{\frac{0.6 \times 0.4}{n}} = 0.1$$

On solving,

$n \approx 92$ (more precisely $n \approx 92.2$, so rounding up to 93 keeps the half width within 0.1)
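Solving $z_c\sqrt{pq/n} = d$ for $n$ can be sketched as below. The function name is illustrative; the sketch rounds up to the next whole number, so it reports 93 where the hand calculation quotes $n \approx 92$.

```python
from math import ceil


def sample_size_for_proportion(p, margin, z_c=1.96):
    """Smallest n with z_c * sqrt(p * (1 - p) / n) <= margin."""
    return ceil(z_c ** 2 * p * (1.0 - p) / margin ** 2)


n_needed = sample_size_for_proportion(p=0.6, margin=0.1)
print(n_needed)
```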

Page 30

E.g. 37
(a) Of 50 houseflies, independently subjected to the same insecticide, 38 were killed. Obtain an estimate of $p$, the probability that a housefly is killed by the insecticide. Find also the standard error of the estimate.

$P_s = \frac{38}{50} = \frac{19}{25}$

Standard error $= \sqrt{\frac{P_s Q_s}{n}} = \sqrt{\frac{(19/25)(1 - 19/25)}{50}} = 0.0604$

Page 31

(b) Now conduct a larger experiment with the same insecticide so that an estimate with a standard error of about 0.03 can be quoted. On the basis of the information in the experiment already conducted, how many houseflies are needed?

Standard error $= \sqrt{\frac{P_s Q_s}{n}} = \sqrt{\frac{(19/25)(6/25)}{n}} = 0.03$

So $n = 203$.

(c) To be absolutely sure of obtaining the desired accuracy, how many houseflies should be taken?

The standard error depends on $P_s$. $n = 203$ makes the standard error equal to 0.03 only when $P_s = 19/25$.

So what $n$ ensures s.e. $\le 0.03$ irrespective of $P_s$?

Page 32

For fixed $n$, the s.e. is a function of $P_s$:

$$\text{s.e.} = \sqrt{\frac{P_s(1 - P_s)}{n}}$$

Requiring s.e. $\le 0.03$ for every $P_s$ means that the maximum of the s.e. must be at most 0.03.

It is easy to show that $P_s(1 - P_s)$ attains its maximum when $P_s = 0.5$.

Hence s.e. $\le \sqrt{\frac{0.5(1 - 0.5)}{n}} = \sqrt{\frac{1}{4n}}$

Then set $\sqrt{\frac{1}{4n}} \le 0.03$

$n \ge 277.8$, i.e. $n \ge 278$

i.e. Though different samples yield different $P_s$, it is certain that the s.e. is not greater than 0.03 if we take $n = 278$ (or more).
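Parts (b) and (c) of E.g. 37 use the same rearrangement, $n = P_s(1 - P_s)/\text{s.e.}^2$, with different values of $P_s$. The sketch below is illustrative (the function name is made up); it rounds up to the smallest whole $n$ that meets the target. For the worst case $P_s = 0.5$ that smallest value is 278, and any larger $n$ also satisfies the bound.

```python
from math import ceil, sqrt


def n_for_standard_error(target_se, p_s=0.5):
    """Smallest n with sqrt(p_s * (1 - p_s) / n) <= target_se (p_s = 0.5 is the worst case)."""
    return ceil(p_s * (1.0 - p_s) / target_se ** 2)


n_from_pilot = n_for_standard_error(0.03, p_s=19 / 25)  # part (b): uses the pilot estimate
n_worst_case = n_for_standard_error(0.03)               # part (c): holds for every P_s
print(n_from_pilot, n_worst_case)
```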