Random Samples
$X_1, \dots, X_n$ — i.i.d. variables (independent, identically distributed): a random sample of observations independently selected from the same population, or resulting from analogous statistical experiments.

Definition. A statistic is any function of observations from a random sample that does not involve population parameters. Distributions of statistics are called sampling distributions.

Examples: $\bar{X} = \frac{1}{n}(X_1 + \dots + X_n)$, $\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, $\quad \min(X_1, \dots, X_n)$.

Statistics are random variables.

Theorem. Let $X_1, \dots, X_n$ be a random sample from a distribution with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. Then
$E(\bar{X}) = \mu$, $\quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$, $\quad$ and $\quad E(X_1 + \dots + X_n) = n\mu$, $\quad \mathrm{Var}(X_1 + \dots + X_n) = n\sigma^2$.
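The theorem can be checked numerically. The following is a small Monte Carlo sketch, not from the slides; the choices $\mu = 2$, $\sigma = 3$, $n = 10$ are arbitrary.

```python
import random

# Monte Carlo check (not from the slides) of E(Xbar) = mu and
# Var(Xbar) = sigma^2 / n for samples from N(mu, sigma^2).
random.seed(1)
mu, sigma, n, reps = 2.0, 3.0, 10, 20000

xbars = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

mean_xbar = sum(xbars) / reps
var_xbar = sum((x - mean_xbar) ** 2 for x in xbars) / (reps - 1)
print(round(mean_xbar, 2), round(var_xbar, 2))  # near mu = 2 and sigma^2/n = 0.9
```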
Distributions of selected statistics in random samples from normal populations
Theorem. Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution. The statistics $\bar{X}$ and $U = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_i (X_i - \bar{X})^2$ have distributions $N\!\left(\mu, \frac{\sigma^2}{n}\right)$ and $\chi^2_{n-1}$, respectively.

Theorem. If $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution, then the variables $\bar{X}$ and $U$ are independent. Conversely, if the variables $\bar{X}$ and $U$ are independent, then the sample $X_1, \dots, X_n$ is selected from a normally distributed population.
Definition. If independent variables $Z$ and $U$ have distributions $N(0,1)$ and $\chi^2_\nu$, respectively, then the variable $X = \frac{Z}{\sqrt{U/\nu}}$ has Student's $t$ distribution with $\nu$ degrees of freedom. The density of $X$ is
$f(x; \nu) = \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)\sqrt{\nu\pi}}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}.$

Theorem. Student's $t_n$ distribution approaches $N(0,1)$ as the number $n$ of degrees of freedom increases.

Definition. If independent variables $U$ and $V$ have distributions $\chi^2_{\nu_1}$ and $\chi^2_{\nu_2}$, respectively, then $X = \frac{U/\nu_1}{V/\nu_2}$ has the $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom, $F(\nu_1, \nu_2)$.
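The definition of the $t$ distribution can be illustrated directly by simulation; this sketch (not from the slides) builds $t_\nu$ from $Z/\sqrt{U/\nu}$, realizing the $\chi^2_\nu$ variable as a sum of $\nu$ squared standard normals.

```python
import math
import random

# Illustration (not from the slides): Student's t_nu built from its
# definition X = Z / sqrt(U/nu), Z ~ N(0,1), U ~ chi^2_nu (a sum of nu
# squared independent N(0,1) variables).
random.seed(2)
nu, reps = 5, 50000

ts = []
for _ in range(reps):
    z = random.gauss(0.0, 1.0)
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    ts.append(z / math.sqrt(u / nu))

m = sum(ts) / reps
v = sum((t - m) ** 2 for t in ts) / reps
print(round(m, 2), round(v, 2))  # mean near 0, variance near nu/(nu-2) = 5/3
```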
Problem 10.2.2, p. 318. $X \sim F(\nu_1, \nu_2)$. Since $P(X < x) = \alpha = P\!\left(\frac{1}{X} > \frac{1}{x}\right)$ and $\frac{1}{X} \sim F(\nu_2, \nu_1)$, we get $P\!\left(\frac{1}{X} < \frac{1}{x}\right) = 1 - \alpha$.

Problem 10.2.3, p. 318. $X \sim N(0,1)$, $Y \sim N(1,1)$, $W \sim N(2,4)$.
$P\!\left(\frac{X^2 + (Y-1)^2}{X^2 + (Y-1)^2 + (W-2)^2/4} > k\right) = P\!\left(\frac{1}{1 + a} > k\right), \quad \text{where } a = \frac{(W-2)^2/4}{X^2 + (Y-1)^2}.$
Since $\frac{1}{1+a} > k$ is equivalent to $1 > k + ak$, i.e., to $a < \frac{1-k}{k}$, and $2a \sim F(1,2)$ (a $\chi^2_1$ variable over an independent $\chi^2_2$ variable, each divided by its degrees of freedom), the probability can be computed from the $F(1,2)$ distribution.

Order statistics. For the minimum $X_{1:n} = \min(X_1, \dots, X_n)$, the cdf is
$G_1(t) = P(X_{1:n} \le t) = 1 - P(X_1 > t, \dots, X_n > t) = 1 - [P(X_i > t)]^n = 1 - [1 - F(t)]^n.$
$g_1(t) = \frac{d}{dt}G_1(t) = -n[1 - F(t)]^{n-1}\cdot(-1)\frac{d}{dt}F(t) = n f(t)[1 - F(t)]^{n-1}.$

More generally, $G_k(t) = \sum_{r=k}^{n}\binom{n}{r}[F(t)]^r[1 - F(t)]^{n-r}$; for $k = n$ this reduces to
$G_n(t) = \binom{n}{n}[F(t)]^n[1 - F(t)]^0 = [F(t)]^n.$

Also directly, $G_n(t) = P(X_{n:n} \le t) = P(X_1 \le t, \dots, X_n \le t) = [P(X_i \le t)]^n = [F(t)]^n$. Finally, $g_n(t) = n f(t)[F(t)]^{n-1}$, and in general
$g_k(t) = k\binom{n}{k} f(t)[F(t)]^{k-1}[1 - F(t)]^{n-k}.$
Example. $X_i \sim \mathrm{EXP}(1)$: $f(x) = e^{-x}$ for $x > 0$ and $0$ otherwise, $F(t) = 1 - e^{-t}$ for $t > 0$ and $0$ otherwise. Then
$G_1(t) = 1 - [1 - (1 - e^{-t})]^n = 1 - e^{-nt}, \quad g_1(t) = n e^{-nt}, \quad G_n(t) = (1 - e^{-t})^n, \quad \text{and} \quad g_n(t) = n e^{-t}(1 - e^{-t})^{n-1}.$
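The cdf $G_1(t) = 1 - e^{-nt}$ says the minimum of $n$ EXP(1) variables is EXP($n$), so $E(X_{1:n}) = 1/n$; this simulation sketch (not from the slides) checks that for $n = 5$.

```python
import random

# Simulation check (not from the slides): for X_i ~ EXP(1), the result
# G_1(t) = 1 - e^{-nt} means X_{1:n} ~ EXP(n), so E(X_{1:n}) = 1/n.
random.seed(3)
n, reps = 5, 50000
mins = [min(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
mean_min = sum(mins) / reps
print(round(mean_min, 3))  # near 1/n = 0.2
```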
Example. Let $X_1, \dots, X_n$ be a random sample selected from a $U[0, \theta]$ distribution. Find:
(i) the densities $g_1(t)$ and $g_n(t)$ of $X_{1:n}$ and $X_{n:n}$, respectively;
(ii) the cdfs $G_1(t)$ and $G_n(t)$;
(iii) the means and variances;
(iv) the covariance of $X_{1:n}$ and $X_{n:n}$.
(i) $X_i \sim U[0, \theta]$: $f(x) = \frac{1}{\theta}$ for $0 < x < \theta$ and $0$ otherwise, and
$F(x) = 0$ for $x < 0$, $\quad F(x) = \frac{x}{\theta}$ for $0 \le x < \theta$, $\quad F(x) = 1$ for $x \ge \theta.$
$g_1(t) = \frac{n}{\theta}\left(1 - \frac{t}{\theta}\right)^{n-1}, \qquad g_n(t) = \frac{n}{\theta}\left(\frac{t}{\theta}\right)^{n-1}.$
(ii) $G_1(t) = 1 - \left[1 - \frac{t}{\theta}\right]^n, \qquad G_n(t) = \left(\frac{t}{\theta}\right)^n.$
(iii) $E(X_{1:n}) = \int_0^\theta t\,\frac{n}{\theta}\left(1 - \frac{t}{\theta}\right)^{n-1} dt$. Substituting $\frac{t}{\theta} = w$, so that $dt = \theta\, dw$,
$= \int_0^1 \theta w\, n(1 - w)^{n-1}\, dw = n\theta \int_0^1 w^{2-1}(1 - w)^{n-1}\, dw \quad [\text{kernel of } \mathrm{BETA}(2, n)]$
$= n\theta\,\frac{\Gamma(2)\Gamma(n)}{\Gamma(n+2)}\underbrace{\int_0^1 \frac{\Gamma(n+2)}{\Gamma(2)\Gamma(n)}\, w^{2-1}(1 - w)^{n-1}\, dw}_{=1} = n\theta\,\frac{(n-1)!}{(n+1)!} = \frac{\theta}{n+1}.$

$E(X_{n:n}) = \int_0^\theta t\,\frac{n}{\theta}\left(\frac{t}{\theta}\right)^{n-1} dt = \frac{n}{\theta^n}\int_0^\theta t^n\, dt = \frac{n}{\theta^n}\cdot\frac{t^{n+1}}{n+1}\Big|_0^\theta = \frac{n\theta^{n+1}}{(n+1)\theta^n} = \frac{n\theta}{n+1}.$
$E(X_{1:n}^2) = \int_0^\theta t^2 g_1(t)\, dt = \int_0^\theta t^2\,\frac{n}{\theta}\left(1 - \frac{t}{\theta}\right)^{n-1} dt.$ Substituting $1 - \frac{t}{\theta} = u$, $t = \theta(1-u)$, $dt = -\theta\, du$,
$= \int_0^1 \theta^2(1-u)^2\, n u^{n-1}\, du = n\theta^2 \int_0^1 u^{n-1}(1-u)^2\, du \quad [\text{kernel of } \mathrm{BETA}(n, 3)]$
$= n\theta^2\,\frac{\Gamma(n)\Gamma(3)}{\Gamma(n+3)}\underbrace{\int_0^1 \frac{\Gamma(n+3)}{\Gamma(n)\Gamma(3)}\, u^{n-1}(1-u)^2\, du}_{=1} = n\theta^2\,\frac{(n-1)!\cdot 2}{(n+2)!} = \frac{2\theta^2\, n!}{(n+2)!} = \frac{2\theta^2}{(n+1)(n+2)}.$
Finally, $\mathrm{Var}(X_{1:n}) = E(X_{1:n}^2) - (E(X_{1:n}))^2 = \frac{2\theta^2}{(n+1)(n+2)} - \left(\frac{\theta}{n+1}\right)^2$
$= \theta^2\left(\frac{2}{(n+1)(n+2)} - \frac{1}{(n+1)^2}\right) = \frac{\theta^2}{(n+1)^2(n+2)}\,[2(n+1) - (n+2)] = \frac{n\theta^2}{(n+1)^2(n+2)}.$

$\mathrm{Var}(X_{n:n})$:
$E(X_{n:n}^2) = \int_0^\theta t^2 g_n(t)\, dt = \int_0^\theta t^2\,\frac{n}{\theta}\left(\frac{t}{\theta}\right)^{n-1} dt = \frac{n}{\theta^n}\int_0^\theta t^{n+1}\, dt = \frac{n}{\theta^n}\cdot\frac{t^{n+2}}{n+2}\Big|_0^\theta = \frac{n\theta^2}{n+2}.$

Now, $\mathrm{Var}(X_{n:n}) = E(X_{n:n}^2) - (E(X_{n:n}))^2 = \frac{n\theta^2}{n+2} - \left(\frac{n\theta}{n+1}\right)^2 = n\theta^2\left(\frac{1}{n+2} - \frac{n}{(n+1)^2}\right)$
$= \frac{n\theta^2}{(n+2)(n+1)^2}\underbrace{[(n+1)^2 - n(n+2)]}_{=1} = \frac{n\theta^2}{(n+2)(n+1)^2}.$ Notice that $\mathrm{Var}(X_{1:n}) = \mathrm{Var}(X_{n:n})$.
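The $U[0, \theta]$ formulas above are easy to check by simulation; this sketch (not from the slides) uses the arbitrary choices $\theta = 2$, $n = 4$.

```python
import random

# Monte Carlo check of E(X_{1:n}) = theta/(n+1), E(X_{n:n}) = n*theta/(n+1),
# and the common variance n*theta^2/((n+1)^2 (n+2)) for U[0, theta] samples.
random.seed(4)
theta, n, reps = 2.0, 4, 100000

lo, hi = [], []
for _ in range(reps):
    s = [random.uniform(0.0, theta) for _ in range(n)]
    lo.append(min(s))
    hi.append(max(s))

m_lo = sum(lo) / reps
m_hi = sum(hi) / reps
v_lo = sum((x - m_lo) ** 2 for x in lo) / reps
print(round(m_lo, 2), round(m_hi, 2), round(v_lo, 3))
# theory: theta/5 = 0.4, 4*theta/5 = 1.6, and 4*theta^2/(25*6) ~ 0.107
```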
(iv) The joint density of the $k$th and $l$th order statistics ($l > k$) is given by the formula
$g_{k,l}(s,t) = \frac{n!}{(k-1)!(l-k-1)!(n-l)!}\,[F(s)]^{k-1}[F(t) - F(s)]^{l-k-1}[1 - F(t)]^{n-l}\, f(s) f(t)$
for $s \le t$, and $0$ otherwise. For $k = 1$ and $l = n$:
$g_{1,n}(s,t) = \frac{n!}{0!\,(n-2)!\,0!}\,[F(s)]^0[F(t) - F(s)]^{n-2}[1 - F(t)]^0\, f(s) f(t) = n(n-1)[F(t) - F(s)]^{n-2} f(s) f(t).$

For $X_1, \dots, X_n$ selected from the $U[0, \theta]$ distribution,
$g_{1,n}(s,t) = n(n-1)\left[\frac{t}{\theta} - \frac{s}{\theta}\right]^{n-2}\frac{1}{\theta}\cdot\frac{1}{\theta} = \frac{n(n-1)}{\theta^n}(t - s)^{n-2}, \quad 0 < s < t < \theta.$
$E(X_{1:n}X_{n:n}) = \int_0^\theta\!\int_0^t st\, g_{1,n}(s,t)\, ds\, dt = \int_0^\theta\!\int_0^t st\,\frac{n(n-1)}{\theta^n}(t-s)^{n-2}\, ds\, dt$
$= \int_0^\theta t\,\frac{n(n-1)}{\theta^n}\left(\int_0^t [t - (t-s)](t-s)^{n-2}\, ds\right) dt.$
The inner integral is
$\int_0^t \left[t(t-s)^{n-2} - (t-s)^{n-1}\right] ds = \left(-\frac{t(t-s)^{n-1}}{n-1} + \frac{(t-s)^n}{n}\right)\Big|_0^t = \frac{t^n}{n-1} - \frac{t^n}{n} = t^n\left(\frac{1}{n-1} - \frac{1}{n}\right) = \frac{t^n}{n(n-1)}.$
Now
$E(X_{1:n}X_{n:n}) = \int_0^\theta t\,\frac{n(n-1)}{\theta^n}\cdot\frac{t^n}{n(n-1)}\, dt = \frac{1}{\theta^n}\int_0^\theta t^{n+1}\, dt = \frac{1}{\theta^n}\cdot\frac{t^{n+2}}{n+2}\Big|_0^\theta = \frac{\theta^2}{n+2},$
and finally
$\mathrm{Cov}(X_{1:n}, X_{n:n}) = \frac{\theta^2}{n+2} - \frac{\theta}{n+1}\cdot\frac{n\theta}{n+1} = \frac{\theta^2}{(n+1)^2(n+2)}\underbrace{[(n+1)^2 - n(n+2)]}_{=1} = \frac{\theta^2}{(n+1)^2(n+2)}.$
Distribution of the sample range $R = X_{n:n} - X_{1:n}$. Write $R = X_{n:n} - X_{1:n} = T - S$ and let the companion variable be $W = T$. The solutions are $t = w$ and $s = t - r = w - r$. Since
$J = \begin{vmatrix} \frac{\partial s}{\partial r} & \frac{\partial s}{\partial w} \\[2pt] \frac{\partial t}{\partial r} & \frac{\partial t}{\partial w} \end{vmatrix} = \begin{vmatrix} -1 & 1 \\ 0 & 1 \end{vmatrix} = -1, \quad |J| = 1.$
Now $g(w, r) = g_{1,n}(s(w,r), t(w,r)) = n(n-1)[F(w) - F(w-r)]^{n-2} f(w-r) f(w)$, and the density of $R$ can be obtained as $h_R(r) = \int g(w, r)\, dw.$
Example. $X_1, \dots, X_n \sim \mathrm{EXP}(1)$: $f(x) = e^{-x}$ for $x > 0$ and $0$ otherwise, $F(x) = 1 - e^{-x}$ for $x > 0$ and $0$ otherwise. Then, for $r > 0$,
$h_R(r) = \int_r^\infty n(n-1)\left[e^{-w+r} - e^{-w}\right]^{n-2} e^{-(w-r)} e^{-w}\, dw = e^r(e^r - 1)^{n-2}\, n(n-1)\int_r^\infty e^{-w(n-2) - 2w}\, dw$
$= e^r(e^r - 1)^{n-2}\, n(n-1)\int_r^\infty e^{-nw}\, dw = e^r(e^r - 1)^{n-2}\, n(n-1)\cdot\frac{1}{n}\, e^{-nr},$
and finally
$h_R(r) = (n-1)\, e^r(e^r - 1)^{n-2} e^{-nr} = (n-1)\, e^{-r}\left[e^{-r}(e^r - 1)\right]^{n-2} = (n-1)\, e^{-r}(1 - e^{-r})^{n-2}, \quad r > 0.$
Example. Let $X_1, \dots, X_n$ be a random sample selected from a $U[0, \theta]$ distribution. Determine the sample size $n$ needed for the expected sample range $E(X_{n:n} - X_{1:n})$ to be at least $0.75\,\theta$.
$E(R) = E(X_{n:n} - X_{1:n}) = \frac{n\theta}{n+1} - \frac{\theta}{n+1} = \frac{(n-1)\theta}{n+1}.$ If now $\frac{n-1}{n+1} \ge 0.75$, then $4(n-1) \ge 3(n+1)$ and $n \ge 7$.

$\mathrm{Var}(R) = \mathrm{Var}(X_{n:n} - X_{1:n}) = \mathrm{Var}(X_{n:n}) + \mathrm{Var}(X_{1:n}) - 2\,\mathrm{Cov}(X_{1:n}, X_{n:n}) = \frac{2n\theta^2}{(n+1)^2(n+2)} - \frac{2\theta^2}{(n+1)^2(n+2)} = \frac{2\theta^2(n-1)}{(n+1)^2(n+2)}.$
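Both range formulas can be checked by simulation; this sketch (not from the slides) uses $n = 7$, the sample size found above, with $\theta = 1$.

```python
import random

# Simulation check of E(R) = (n-1)theta/(n+1) and
# Var(R) = 2(n-1)theta^2/((n+1)^2 (n+2)) for the U[0, theta] sample range.
random.seed(5)
theta, n, reps = 1.0, 7, 100000

ranges = []
for _ in range(reps):
    s = [random.uniform(0.0, theta) for _ in range(n)]
    ranges.append(max(s) - min(s))

m = sum(ranges) / reps
v = sum((r - m) ** 2 for r in ranges) / reps
print(round(m, 3), round(v, 4))  # theory: 6/8 = 0.75 and 12/576 ~ 0.0208
```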
Determine $\mathrm{Var}(X_{l:n} - X_{k:n})$, $l > k$, in a sample from $U[0, \theta]$.
$\mathrm{Var}(X_{l:n} - X_{k:n}) = \mathrm{Var}(X_{k:n}) + \mathrm{Var}(X_{l:n}) - 2\,\mathrm{Cov}(X_{k:n}, X_{l:n}).$
Needed: (i) $E(X_{k:n})$, (ii) $E(X_{k:n}^2)$, (iii) $\mathrm{Var}(X_{k:n})$, and (iv) $\mathrm{Cov}(X_{k:n}, X_{l:n})$.

$g_k(t) = \frac{n!}{(k-1)!(n-k)!}\,[F(t)]^{k-1}[1 - F(t)]^{n-k} f(t) = \frac{n!}{(k-1)!(n-k)!}\left(\frac{t}{\theta}\right)^{k-1}\left[1 - \frac{t}{\theta}\right]^{n-k}\frac{1}{\theta}.$
(i) $E(X_{k:n}) = \int_0^\theta t\, g_k(t)\, dt = \int_0^\theta t\,\frac{n!}{(k-1)!(n-k)!}\left(\frac{t}{\theta}\right)^{k-1}\left[1 - \frac{t}{\theta}\right]^{n-k}\frac{1}{\theta}\, dt.$ With $w = t/\theta$,
$= \frac{n!}{(k-1)!(n-k)!}\int_0^1 \theta w\cdot w^{k-1}(1-w)^{n-k}\, dw = \frac{n!\,\theta}{(k-1)!(n-k)!}\underbrace{\int_0^1 w^k(1-w)^{n-k}\, dw}_{\text{kernel of } \mathrm{BETA}(k+1,\, n-k+1)}$
$= \frac{\Gamma(k+1)\Gamma(n-k+1)}{\Gamma(n+2)}\cdot\frac{n!\,\theta}{(k-1)!(n-k)!} = \frac{k!\,(n-k)!}{(n+1)!}\cdot\frac{n!}{(k-1)!(n-k)!}\,\theta = \frac{k\theta}{n+1}, \quad \text{and} \quad E(X_{l:n}) = \frac{l\theta}{n+1}.$

(ii) Similarly,
$E(X_{k:n}^2) = \int_0^\theta t^2 g_k(t)\, dt = \int_0^\theta t^2\,\frac{n!}{(k-1)!(n-k)!}\left(\frac{t}{\theta}\right)^{k-1}\left[1 - \frac{t}{\theta}\right]^{n-k}\frac{1}{\theta}\, dt$
$= \frac{\theta^2\, n!}{(k-1)!(n-k)!}\underbrace{\int_0^1 w^{k+1}(1-w)^{n-k}\, dw}_{\text{kernel of } \mathrm{BETA}(k+2,\, n-k+1)} = \frac{\Gamma(k+2)\Gamma(n-k+1)}{\Gamma(n+3)}\cdot\frac{\theta^2\, n!}{(k-1)!(n-k)!} = \frac{\theta^2\, k(k+1)}{(n+1)(n+2)}.$
(iii) $\mathrm{Var}(X_{k:n}) = E(X_{k:n}^2) - [E(X_{k:n})]^2 = \frac{\theta^2\, k(k+1)}{(n+1)(n+2)} - \left(\frac{k\theta}{n+1}\right)^2 = \theta^2\,\frac{k(k+1)(n+1) - k^2(n+2)}{(n+1)^2(n+2)} = \theta^2\,\frac{k(n+1-k)}{(n+1)^2(n+2)}.$
(iv) $g_{k,l}(s,t) = \frac{n!}{(k-1)!(l-k-1)!(n-l)!}\,[F(s)]^{k-1}[F(t) - F(s)]^{l-k-1}[1 - F(t)]^{n-l} f(s) f(t).$ Now
$E(X_{k:n}X_{l:n}) = \int_0^\theta\!\int_0^t st\, g_{k,l}(s,t)\, ds\, dt = \frac{n!}{(k-1)!(l-k-1)!(n-l)!}\int_0^\theta\!\int_0^t st\left(\frac{s}{\theta}\right)^{k-1}\left[\frac{t-s}{\theta}\right]^{l-k-1}\left[1 - \frac{t}{\theta}\right]^{n-l}\frac{1}{\theta^2}\, ds\, dt = \;???$

Easier way: let $Y_1, \dots, Y_n$ be a random sample from $U[0,1]$. Then $X_i = \theta Y_i$, $X_{i:n} = \theta Y_{i:n}$, $E(X_{i:n}) = \theta\, E(Y_{i:n})$, and so on. Also $E(X_{k:n}X_{l:n}) = \theta^2 E(Y_{k:n}Y_{l:n}).$
For $\gamma = \frac{n!}{(k-1)!(l-k-1)!(n-l)!}$,
$E(Y_{k:n}Y_{l:n}) = \int_0^1\!\int_0^t st\, g_{k,l}(s,t)\, ds\, dt = \gamma\int_0^1\!\int_0^t st\cdot s^{k-1}[t-s]^{l-k-1}[1-t]^{n-l}\, ds\, dt$
$= \gamma\int_0^1\!\int_0^t [1 - (1-t)]\, s^k[t-s]^{l-k-1}[1-t]^{n-l}\, ds\, dt$
$= \gamma\int_0^1\!\int_0^t s^k[t-s]^{l-k-1}[1-t]^{n-l}\, ds\, dt \;-\; \gamma\int_0^1\!\int_0^t s^k[t-s]^{l-k-1}[1-t]^{n-l+1}\, ds\, dt = A - B,$
and since $\gamma_1 = \frac{(n+1)!}{k!(l-k-1)!(n-l)!}$,
$A = \frac{\gamma}{\gamma_1}\underbrace{\int_0^1\!\int_0^t \gamma_1\, s^k[t-s]^{l-k-1}[1-t]^{n-l}\, ds\, dt}_{\text{integral of } g_{k+1,l+1}(s,t) \text{ for sample size } n+1, \text{ equal to } 1} = \gamma\,\frac{k!(l-k-1)!(n-l)!}{(n+1)!}.$
Similarly, since $\gamma_2 = \frac{(n+2)!}{k!(l-k-1)!(n-l+1)!}$,
$B = \frac{\gamma}{\gamma_2}\underbrace{\int_0^1\!\int_0^t \gamma_2\, s^k[t-s]^{l-k-1}[1-t]^{n-l+1}\, ds\, dt}_{\text{integral of } g_{k+1,l+1}(s,t) \text{ for sample size } n+2, \text{ equal to } 1} = \gamma\,\frac{k!(l-k-1)!(n-l+1)!}{(n+2)!}.$

Next,
$E(Y_{k:n}Y_{l:n}) = A - B = \gamma\left(\frac{k!(l-k-1)!(n-l)!}{(n+1)!} - \frac{k!(l-k-1)!(n-l+1)!}{(n+2)!}\right)$
$= \frac{n!}{(k-1)!(l-k-1)!(n-l)!}\cdot\frac{k!(l-k-1)!(n-l)!}{(n+2)!}\,[(n+2) - (n-l+1)] = \frac{k(l+1)}{(n+1)(n+2)}.$

Consequently, $E(X_{k:n}X_{l:n}) = \frac{\theta^2\, k(l+1)}{(n+1)(n+2)}$, and
$\mathrm{Cov}(X_{k:n}, X_{l:n}) = E(X_{k:n}X_{l:n}) - E(X_{k:n})E(X_{l:n}) = \frac{\theta^2\, k(l+1)}{(n+1)(n+2)} - \frac{k\theta}{n+1}\cdot\frac{l\theta}{n+1} = \theta^2\,\frac{k(l+1)(n+1) - kl(n+2)}{(n+1)^2(n+2)} = \theta^2\,\frac{k(n+1-l)}{(n+1)^2(n+2)}.$
Finally,
$\mathrm{Var}(X_{l:n} - X_{k:n}) = \mathrm{Var}(X_{k:n}) + \mathrm{Var}(X_{l:n}) - 2\,\mathrm{Cov}(X_{k:n}, X_{l:n})$
$= \theta^2\,\frac{k(n+1-k)}{(n+1)^2(n+2)} + \theta^2\,\frac{l(n+1-l)}{(n+1)^2(n+2)} - 2\theta^2\,\frac{k(n+1-l)}{(n+1)^2(n+2)} = \theta^2\,\frac{(l-k)(n+1-l+k)}{(n+1)^2(n+2)}.$
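The closed form for $\mathrm{Var}(X_{l:n} - X_{k:n})$ can be verified by simulation; this sketch (not from the slides) uses the arbitrary choices $\theta = 3$, $n = 6$, $k = 2$, $l = 5$.

```python
import random

# Monte Carlo check of Var(X_{l:n} - X_{k:n}) =
# theta^2 (l-k)(n+1-l+k) / ((n+1)^2 (n+2)) for a U[0, theta] sample.
random.seed(6)
theta, n, k, l, reps = 3.0, 6, 2, 5, 100000

diffs = []
for _ in range(reps):
    s = sorted(random.uniform(0.0, theta) for _ in range(n))
    diffs.append(s[l - 1] - s[k - 1])

m = sum(diffs) / reps
v = sum((d - m) ** 2 for d in diffs) / reps
theory = theta ** 2 * (l - k) * (n + 1 - l + k) / ((n + 1) ** 2 * (n + 2))
print(round(v, 3), round(theory, 3))  # the two values should agree
```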
Joint distribution of $X_{1:n}, \dots, X_{n:n}$: the density of the joint distribution is $g(y_1, \dots, y_n) = n!\, f(y_1)\cdots f(y_n)$ for $y_1 < \dots < y_n$ and $0$ otherwise.
HW 10.3.2 p. 325
Problem 10.3.6, p. 325. The density of $X_i \sim \mathrm{BETA}(2,1)$ is $f(x) = \frac{\Gamma(3)}{\Gamma(2)\Gamma(1)}\, x^{2-1}(1-x)^{1-1} = 2x$ for $0 < x < 1$ and $0$ otherwise. The joint density (for $0 < y_1 < y_2 < y_3 < y_4 < y_5 < 1$) is
$g(y_1, y_2, y_3, y_4, y_5) = 5!\cdot 2y_1\cdot 2y_2\cdot 2y_3\cdot 2y_4\cdot 2y_5 = 5!\, 2^5\, y_1 y_2 y_3 y_4 y_5.$

(i) $g_{1,2,4}(y_1, y_2, y_4) = 5!\, 2^5 \int_{y_4}^1\!\int_{y_2}^{y_4} y_1 y_2 y_3 y_4 y_5\, dy_3\, dy_5.$ The inner integral is
$\int_{y_2}^{y_4} y_1 y_2 y_3 y_4 y_5\, dy_3 = y_1 y_2 y_4 y_5 \underbrace{\int_{y_2}^{y_4} y_3\, dy_3}_{0.5(y_4^2 - y_2^2)} = 0.5\, y_1 y_2 y_4 y_5 (y_4^2 - y_2^2).$
Now
$g_{1,2,4}(y_1, y_2, y_4) = 5!\, 2^4 \int_{y_4}^1 y_1 y_2 y_4 y_5 (y_4^2 - y_2^2)\, dy_5 = 5!\, 2^4\, y_1 y_2 y_4 (y_4^2 - y_2^2)\underbrace{\int_{y_4}^1 y_5\, dy_5}_{0.5(1 - y_4^2)} = 5!\, 2^3\, y_1 y_2 y_4 (y_4^2 - y_2^2)(1 - y_4^2).$
(ii) $E(X_{2:5} \mid X_{4:5} = t) = E(S \mid T = t) = \int_0^t s\, f(s \mid t)\, ds = \int_0^t s\,\frac{f(s,t)}{f_T(t)}\, ds.$ Since $f(x) = 2x$ and $F(x) = x^2$, the joint density is ($n = 5$, $k = 2$, $l = 4$)
$f_{2,4}(s,t) = \frac{5!}{(2-1)!(4-2-1)!(5-4)!}\,(s^2)^1(t^2 - s^2)^{4-2-1}(1 - t^2)^1\cdot 2s\cdot 2t = 4\cdot 5!\, s^3 t (t^2 - s^2)(1 - t^2) \quad \text{for } 0 < s < t < 1,$
and $f_4(t) = \frac{5!}{(4-1)!(5-4)!}\,(t^2)^{4-1}(1 - t^2)^{5-4}\cdot 2t = 40\, t^7(1 - t^2).$

Now, $f(s \mid t) = \frac{4\cdot 5!\, s^3 t (t^2 - s^2)(1 - t^2)}{40\, t^7(1 - t^2)} = \frac{12\, s^3(t^2 - s^2)}{t^6}$, and
$E(X_{2:5} \mid X_{4:5} = t) = \int_0^t s\,\frac{12\, s^3(t^2 - s^2)}{t^6}\, ds = \int_0^t \frac{12\, s^4}{t^4}\, ds - \int_0^t \frac{12\, s^6}{t^6}\, ds = \frac{12}{5}\frac{s^5}{t^4}\Big|_0^t - \frac{12}{7}\frac{s^7}{t^6}\Big|_0^t = 12\left(\frac{1}{5} - \frac{1}{7}\right) t = \frac{24}{35}\, t.$

HW 10.3.6, p. 325: find $E(X_{3:5} \mid X_{4:5})$.
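The conditional mean $\frac{24}{35}t$ can be checked with a crude conditional Monte Carlo (not from the slides): since $F(x) = x^2$, one BETA(2,1) observation is $\sqrt{U}$ with $U \sim U[0,1]$, and the conditioning is approximated by keeping samples whose fourth order statistic lands in a narrow bin around $t_0 = 0.8$.

```python
import math
import random

# Crude conditional Monte Carlo check of E(X_{2:5} | X_{4:5} = t) = 24t/35
# for X_i ~ BETA(2,1); conditioning is approximated by a narrow bin.
random.seed(7)
t0, half = 0.8, 0.01

kept = []
for _ in range(200000):
    s = sorted(math.sqrt(random.random()) for _ in range(5))
    if abs(s[3] - t0) < half:
        kept.append(s[1])

approx = sum(kept) / len(kept)
print(round(approx, 3))  # theory: 24 * 0.8 / 35 ~ 0.549
```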
(iii) $Y = \frac{X_{2:5}}{X_{1:5}} = \frac{T}{S}$. Let $W = S$ be a companion variable, so that $t = yw$ and $s = w$. Since $0 < s < t < 1$, we have $0 < w < yw < 1$, which means $w > 0$, $y > 1$, and $w < \frac{1}{y}$.
$J = \begin{vmatrix} \frac{\partial s}{\partial w} & \frac{\partial s}{\partial y} \\[2pt] \frac{\partial t}{\partial w} & \frac{\partial t}{\partial y} \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ y & w \end{vmatrix} = w, \quad |J| = w.$
$f_{1,2}(s,t) = \frac{5!}{(1-1)!(2-1-1)!(5-2)!}\,[F(s)]^{1-1}[F(t) - F(s)]^{2-1-1}[1 - F(t)]^{5-2} f(s) f(t) = 20(1 - t^2)^3\cdot 2s\cdot 2t = 80\, st(1 - t^2)^3.$
Now $g(w, y) = 80\, w(wy)(1 - y^2 w^2)^3\, w = 80\, w^3 y (1 - y^2 w^2)^3$, and
$g_Y(y) = \int_0^{1/y} g(w, y)\, dw = 80\, y \int_0^{1/y} w^3(1 - y^2 w^2)^3\, dw.$
Let $1 - y^2 w^2 = z$. Then $w^2 = \frac{1-z}{y^2}$, $-2wy^2\, dw = dz$, and
$\int_0^{1/y} w^3(1 - y^2 w^2)^3\, dw = \int_1^0 w^3 z^3\left(-\frac{1}{2wy^2}\right) dz = \frac{1}{2y^4}\int_0^1 (1 - z)z^3\, dz = \frac{1}{2y^4}\int_0^1 (z^3 - z^4)\, dz = \frac{1}{2y^4}\left(\frac{z^4}{4} - \frac{z^5}{5}\right)\Big|_0^1 = \frac{1}{40\, y^4},$
and $g_Y(y) = 80\, y\cdot\frac{1}{40\, y^4} = \frac{2}{y^3}$ for $y > 1$.
Generating Random Samples

When quantitative problems are too complex to be studied theoretically, one can try to use simulations to obtain approximate solutions.
Generating the $U[0,1]$ distribution to obtain other discrete distributions such as, e.g., the Bernoulli, binomial, geometric, negative binomial, and Poisson.

Example 1. $P(X = 1) = p = 1 - P(X = 0)$. Let $p = 0.3$. Select any subset of $[0,1]$ of length $0.3$ $(= p)$, for example $[0.2, 0.5]$, $[0.7, 1]$, or $[0, 0.1] \cup [0.8, 1]$.
Let $[0, 0.3]$ and $(0.3, 1]$ represent a success (S) and a failure (F), respectively. Five values are generated from a $U[0,1]$ distribution:
0.2117, 0.1385, 0.7009, 0.6990, 0.6903
S, S, F, F, F — a random sample from the BIN(1, 0.3) distribution.
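The scheme above is one line of code; this sketch replays the slide's five uniform values.

```python
# The slide's Bernoulli scheme: U in [0, 0.3] counts as a success (1).
p = 0.3
us = [0.2117, 0.1385, 0.7009, 0.6990, 0.6903]  # the five U[0,1] values above
sample = [1 if u <= p else 0 for u in us]
print(sample)  # [1, 1, 0, 0, 0], i.e. S S F F F
```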
Example 2. $\mathrm{BIN}(n, p) = \mathrm{BIN}(6, 0.4)$. Let $[0, 0.6] \to$ F and $(0.6, 1] \to$ S (one of the possible choices). One observation requires $n$ generations from $U[0,1]$; $k$ observations require $n \cdot k$ generations. For two observations from the BIN(6, 0.4) distribution one needs 12 generations from $U[0,1]$:

0.4972 F    0.5957 F
0.8125 S    0.4801 F
0.3133 F    0.2223 F
0.2025 F    0.1718 F
0.9335 S    0.2292 F
0.0114 F    0.9815 S

x1 = 2      x2 = 1

Random sample of size 2 generated from the BIN(6, 0.4) distribution: $x_1 = 2$, $x_2 = 1$.
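In code, each binomial observation counts successes among $n$ uniforms; this sketch first reproduces the slide's two observations and then checks the long-run mean $np$.

```python
import random

# The slide's BIN(6, 0.4) scheme: (0.6, 1] is mapped to success.
def binomial_obs(us):
    return sum(1 for u in us if u > 0.6)

col1 = [0.4972, 0.8125, 0.3133, 0.2025, 0.9335, 0.0114]
col2 = [0.5957, 0.4801, 0.2223, 0.1718, 0.2292, 0.9815]
x1, x2 = binomial_obs(col1), binomial_obs(col2)
print(x1, x2)  # 2 and 1, as on the slide

rng = random.Random(8)
sample = [binomial_obs([rng.random() for _ in range(6)]) for _ in range(10000)]
mean_val = sum(sample) / len(sample)
print(round(mean_val, 2))  # near n*p = 2.4
```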
Example 3. Generate a random sample of size 6 from the POI(2) distribution.

x    P(X = x)   F_X(x) = P(X <= x)
0    0.1353     0.1353
1    0.2707     0.4060
2    0.2707     0.6767
3    0.1804     0.8571
4    0.0902     0.9473
5    0.0361     0.9834

$X_k = i$ if, for the $k$th observation, $F_X(i-1) \le U_k < F_X(i)$.

U_i   0.0909  0.1850  0.1243  0.2991  0.4290  0.9272
X_i   0       1       0       1       2       4

The random sample of size 6 selected from the POI(2) distribution is: 0, 1, 0, 1, 2, 4.
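The discrete inverse-cdf rule above can be implemented by accumulating the Poisson cdf term by term; this sketch replays the slide's six uniforms.

```python
import math

# Discrete inverse-cdf rule from the slide: X = i when F(i-1) <= u < F(i),
# with the POI(lam) cdf accumulated via p_{i+1} = p_i * lam/(i+1).
def poisson_from_u(lam, u):
    i, p = 0, math.exp(-lam)
    cdf = p
    while u >= cdf:
        i += 1
        p *= lam / i
        cdf += p
    return i

us = [0.0909, 0.1850, 0.1243, 0.2991, 0.4290, 0.9272]
sample = [poisson_from_u(2.0, u) for u in us]
print(sample)  # [0, 1, 0, 1, 2, 4], as on the slide
```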
Theorem. If a random variable $X$ has a continuous and strictly increasing cdf $F_X$, then $F_X(X)$ has the $U[0,1]$ distribution.

Therefore, if $Y \sim U[0,1]$, then $F_X^{-1}(Y)$ has the same distribution as the random variable $X$. So to generate observations from the distribution of $X$, one first generates observations from $U[0,1]$ and then transforms them by $F_X^{-1}$.
Problem 10.4.3, p. 329. The density and the cdf are
$f(x) = \begin{cases} e^{2x} & \text{for } x < 0 \\ e^{-2x} & \text{for } x > 0 \end{cases} \qquad \text{and} \qquad F(x) = \begin{cases} \frac{1}{2}e^{2x} & \text{for } x \le 0 \\ 1 - \frac{1}{2}e^{-2x} & \text{for } x > 0. \end{cases}$
Since $F(0) = \frac{1}{2}$,
$F^{-1}(y) = \begin{cases} \frac{1}{2}\log 2y & \text{for } y \le \frac{1}{2} \\ -\frac{1}{2}\log 2(1-y) & \text{for } y > \frac{1}{2}. \end{cases}$
$y_1 = 0.744921 \;\Rightarrow\; x_1 = -\tfrac{1}{2}\log 2(1 - 0.744921) = 0.336517,$
$y_2 = 0.464001 \;\Rightarrow\; x_2 = \tfrac{1}{2}\log(2\cdot 0.464001) = -0.03736.$

HW 10.4.2, p. 329.
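The inverse cdf of Problem 10.4.3 is easy to code; this sketch reproduces the two transformed values.

```python
import math

# Inverse cdf from Problem 10.4.3: F^{-1}(y) = (1/2) log(2y) for y <= 1/2,
# and -(1/2) log(2(1-y)) for y > 1/2.
def finv(y):
    if y <= 0.5:
        return 0.5 * math.log(2.0 * y)
    return -0.5 * math.log(2.0 * (1.0 - y))

x1 = finv(0.744921)
x2 = finv(0.464001)
print(round(x1, 4), round(x2, 4))  # approximately 0.3365 and -0.0374
```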
Accept/Reject Algorithm

When the distribution of a variable $X$ is such that the cdf $F$ and/or $F^{-1}$ does not have a closed form, one possible method of generating a random sample from the distribution of $X$ is the so-called accept/reject algorithm:

Let $U \sim U[0,1]$, and let the variable $Y$, with density $g$, have some distribution that is easy to generate. Variables $U$ and $Y$ are independent.

Additionally, let $c$ be a constant such that $f(y) \le c\, g(y)$ for any value $y$ of $Y$; in other words, $c = \sup_y \frac{f(y)}{g(y)}$.

Finally, set $X = Y$ if $U < \frac{f(Y)}{c\, g(Y)}$.
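The algorithm above can be sketched generically; the BETA(2,1) usage check (density $f(x) = 2x$ on $(0,1)$, uniform proposal, $c = 2$) is an illustration not taken from the slides.

```python
import random

# Generic accept/reject sketch: f is the target density, g the proposal
# density with sampler draw_g, and c >= sup_y f(y)/g(y).
def accept_reject(f, g, draw_g, c, rng):
    while True:
        y = draw_g(rng)
        if rng.random() < f(y) / (c * g(y)):
            return y

# Usage check: BETA(2,1) (f(x) = 2x on (0,1)) from a U[0,1] proposal, c = 2;
# the sample mean should approach E(X) = 2/3.
rng = random.Random(9)
sample = [
    accept_reject(lambda x: 2.0 * x, lambda x: 1.0, lambda r: r.random(), 2.0, rng)
    for _ in range(20000)
]
mean_val = sum(sample) / len(sample)
print(round(mean_val, 2))  # near 2/3
```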
Justification: it will be shown that $F_X(y) = F_Y\!\left(y \,\Big|\, U \le \frac{f(Y)}{c\,g(Y)}\right)$.

$F_Y\!\left(y \,\Big|\, U \le \frac{f(Y)}{c\,g(Y)}\right) = \frac{P\!\left(Y \le y,\; U \le f(Y)/[c\,g(Y)]\right)}{P\!\left(U \le f(Y)/[c\,g(Y)]\right)}.$
The denominator is
$P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right) = \int \frac{f(y)}{c\,g(y)}\, g(y)\, dy = \frac{1}{c}\int f(y)\, dy = \frac{1}{c},$
and the numerator is
$\int_{-\infty}^{y}\!\int_0^{f(t)/[c\,g(t)]} g(t)\, du\, dt = \int_{-\infty}^{y} g(t)\left(\int_0^{f(t)/[c\,g(t)]} du\right) dt = \int_{-\infty}^{y} g(t)\,\frac{f(t)}{c\,g(t)}\, dt = \frac{1}{c}\int_{-\infty}^{y} f(t)\, dt.$
Therefore the ratio equals $c\cdot\frac{1}{c}\int_{-\infty}^{y} f(t)\, dt = F_X(y).$
Problem 10.4.6, p. 329. Use the accept/reject algorithm to generate a sample from the $N(0,1)$ distribution. Here $f_X$ is the density of $N(0,1)$, and $F_X$ does not have a closed form. $Y$ has the double exponential (Laplace) distribution with density $g(y) = 1.5\, e^{-3|y|}$.
$c = \sup_y \frac{f(y)}{g(y)}, \qquad \frac{f(y)}{g(y)} = \frac{(2\pi)^{-1/2} e^{-y^2/2}}{1.5\, e^{-3|y|}} = \frac{1}{1.5\sqrt{2\pi}}\, e^{-y^2/2 + 3|y|}.$
The function is even, so it is enough to consider $y > 0$.

For $y > 0$: to maximize $e^{-y^2/2 + 3y}$, note that $\frac{d}{dy}\, e^{-y^2/2 + 3y} = e^{-y^2/2 + 3y}(-y + 3)$, which is equal to $0$ if $y = 3$. Hence
$\sup_y \frac{f(y)}{g(y)} = \frac{f(3)}{g(3)} = \frac{1}{1.5\sqrt{2\pi}}\, e^{-4.5 + 9} = \frac{e^{4.5}}{1.5\sqrt{2\pi}} = 23.941 = c.$

$X = Y$ if $U < \frac{f(Y)}{c\, g(Y)}$:

U1        U2        Y         f(Y)/[c g(Y)]   X
0.222950  0.516174   0.01096  0.0115          none
0.847152  0.466449  -0.02315  0.0119          none
0.614370  0.001058  -2.05270  0.6385          -2.0527

$x_1 = -2.0527.$
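Problem 10.4.6 can be coded directly: the proposal $Y$ is drawn from $g$ by inverting its cdf (as the slide's table does with $U_2$), and a candidate is accepted when $U_1 < f(Y)/[c\,g(Y)]$.

```python
import math
import random

# Accept/reject for N(0,1) with Laplace proposal g(y) = 1.5 e^{-3|y|},
# c = e^{4.5} / (1.5 sqrt(2 pi)).
c = math.exp(4.5) / (1.5 * math.sqrt(2.0 * math.pi))

def laplace_obs(rng):
    # Inverse cdf of g: (1/3) log(2u) for u <= 1/2, else -(1/3) log(2(1-u)).
    u = rng.random()
    if u <= 0.5:
        return (1.0 / 3.0) * math.log(2.0 * u)
    return -(1.0 / 3.0) * math.log(2.0 * (1.0 - u))

def std_normal_obs(rng):
    while True:
        y = laplace_obs(rng)
        f = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
        g = 1.5 * math.exp(-3.0 * abs(y))
        if rng.random() < f / (c * g):
            return y

rng = random.Random(10)
sample = [std_normal_obs(rng) for _ in range(5000)]
m = sum(sample) / len(sample)
print(round(c, 3), round(m, 2))  # c ~ 23.941; sample mean near 0
```

Note that with this rate-3 Laplace the acceptance probability is only $1/c \approx 4\%$, which matches the two rejections in the slide's table; the algorithm is still correct, just inefficient.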
Example. Another accept/reject-type algorithm will be used to generate the $\mathrm{BETA}(\alpha, \beta+1)$ distribution.

Let $U_1, U_2 \sim U[0,1]$ be independent, and let $\alpha > 0$, $\beta > 0$. Set $V_1 = U_1^{1/\alpha}$, $V_2 = U_2^{1/\beta}$, and $X = V_1$ if $V_1 + V_2 \le 1$.

Determine the distribution of $X$:
$F_X(a) = P(V_1 \le a \mid V_1 + V_2 \le 1) = \frac{P(V_1 \le a,\; V_1 + V_2 \le 1)}{P(V_1 + V_2 \le 1)} = \frac{N}{D}.$
$F_{V_1}(v) = P(V_1 \le v) = P(U_1^{1/\alpha} \le v) = P(U_1 \le v^\alpha) = v^\alpha,$ so $f_{V_1}(v) = \alpha v^{\alpha-1}$, and $f(v_1, v_2) = \alpha v_1^{\alpha-1}\,\beta v_2^{\beta-1}$ (the variables $V_1$ and $V_2$ are independent since $U_1$ and $U_2$ are assumed independent).
$D = P(V_1 + V_2 \le 1) = \int_0^1\!\int_0^{1-v_1} \alpha v_1^{\alpha-1}\,\beta v_2^{\beta-1}\, dv_2\, dv_1 = \int_0^1 \alpha v_1^{\alpha-1}\underbrace{\int_0^{1-v_1} \beta v_2^{\beta-1}\, dv_2}_{(1-v_1)^\beta}\, dv_1 = \int_0^1 \alpha v_1^{\alpha-1}(1 - v_1)^\beta\, dv_1$
$= \alpha\,\frac{\Gamma(\alpha)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+1)}\underbrace{\int_0^1 \frac{\Gamma(\alpha+\beta+1)}{\Gamma(\alpha)\Gamma(\beta+1)}\, v_1^{\alpha-1}(1 - v_1)^\beta\, dv_1}_{=1} = \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+1)},$
and
$N = \int_0^a\!\int_0^{1-v_1} \alpha v_1^{\alpha-1}\,\beta v_2^{\beta-1}\, dv_2\, dv_1 = \int_0^a \alpha (1 - v_1)^\beta v_1^{\alpha-1}\, dv_1$
$= \alpha\,\frac{\Gamma(\alpha)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+1)}\int_0^a \frac{\Gamma(\alpha+\beta+1)}{\Gamma(\alpha)\Gamma(\beta+1)}\,(1 - v_1)^\beta v_1^{\alpha-1}\, dv_1 = \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+1)}\, F_{\mathrm{BETA}(\alpha,\,\beta+1)}(a).$
Now $F_X(a) = \frac{N}{D} = F_{\mathrm{BETA}(\alpha,\,\beta+1)}(a)$, so $X \sim \mathrm{BETA}(\alpha, \beta+1)$.

Generate one observation from the BETA(0.738, 1.449) distribution: $X \sim \mathrm{BETA}(0.738, 1.449)$, so $\alpha = 0.738$ and $\beta = 0.449$.

Generate $u_1, u_2$: $0.996484$, $0.066042$.
$v_1 = 0.996484^{1/0.738} = 0.99523, \qquad v_2 = 0.066042^{1/0.449} = 0.002352.$
$v_1 + v_2 = 0.99758 \le 1$, and therefore $x = v_1 = 0.99523.$
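The generator above in code; the sketch first replays the slide's single observation (up to rounding) and then, as an extra check not on the slides, verifies the long-run mean $\alpha/(\alpha + \beta + 1)$.

```python
import random

# Generator from the slides: V1 = U1^(1/alpha), V2 = U2^(1/beta),
# accept X = V1 when V1 + V2 <= 1; then X ~ BETA(alpha, beta + 1).
def beta_obs(alpha, beta, rng):
    while True:
        v1 = rng.random() ** (1.0 / alpha)
        v2 = rng.random() ** (1.0 / beta)
        if v1 + v2 <= 1.0:
            return v1

# Replay the slide's single observation from BETA(0.738, 1.449):
alpha, beta = 0.738, 0.449
v1 = 0.996484 ** (1.0 / alpha)
v2 = 0.066042 ** (1.0 / beta)
accepted = v1 + v2 <= 1.0
print(round(v1, 5), round(v2, 6), accepted)  # ~0.99524, ~0.002352, True

# Long-run mean should approach alpha / (alpha + beta + 1) ~ 0.337:
rng = random.Random(11)
sample = [beta_obs(alpha, beta, rng) for _ in range(20000)]
mean_val = sum(sample) / len(sample)
print(round(mean_val, 2))
```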
Recommended