
Lecture 1. Transformation of Random Variables

Suppose we are given a random variable X with density $f_X(x)$. We apply a function g to produce a random variable $Y = g(X)$. We can think of X as the input to a black box, and Y as the output. We wish to find the density or distribution function of Y. We illustrate the technique for the example in Figure 1.1.

[Figure 1.1: the density $f_X$, equal to $1/2$ on $[-1, 0]$ and to $\tfrac{1}{2}e^{-x}$ for $x > 0$, fed into the black box $Y = X^2$.]

The distribution function method finds $F_Y$ directly, and then $f_Y$ by differentiation. We have $F_Y(y) = 0$ for $y < 0$. If $y \ge 0$, then $P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\}$.

Case 1. $0 \le y \le 1$ (Figure 1.2). Then
\[
F_Y(y) = \tfrac{1}{2}\sqrt{y} + \int_0^{\sqrt{y}} \tfrac{1}{2}e^{-x}\,dx = \tfrac{1}{2}\sqrt{y} + \tfrac{1}{2}\bigl(1 - e^{-\sqrt{y}}\bigr).
\]

[Figure 1.2: the density $f_X$ with the interval $[-\sqrt{y}, \sqrt{y}]$ marked, for $0 \le y \le 1$.]

Case 2. $y > 1$ (Figure 1.3). Then
\[
F_Y(y) = \tfrac{1}{2} + \int_0^{\sqrt{y}} \tfrac{1}{2}e^{-x}\,dx = \tfrac{1}{2} + \tfrac{1}{2}\bigl(1 - e^{-\sqrt{y}}\bigr).
\]
The density of Y is 0 for $y < 0$, and


[Figure 1.3: the density $f_X$ with the interval $[-\sqrt{y}, \sqrt{y}]$ marked, for $y > 1$.]

\[
f_Y(y) = \frac{1}{4\sqrt{y}}\bigl(1 + e^{-\sqrt{y}}\bigr), \quad 0 < y < 1; \qquad
f_Y(y) = \frac{1}{4\sqrt{y}}\,e^{-\sqrt{y}}, \quad y > 1.
\]

See Figure 1.4 for a sketch of $f_Y$ and $F_Y$. (You can take $f_Y(y)$ to be anything you like at $y = 1$ because $\{Y = 1\}$ has probability zero.)

[Figure 1.4: sketches of $f_Y(y)$ and $F_Y(y)$; on $0 \le y \le 1$, $F_Y(y) = \tfrac{1}{2}\sqrt{y} + \tfrac{1}{2}(1 - e^{-\sqrt{y}})$, and for $y > 1$, $F_Y(y) = \tfrac{1}{2} + \tfrac{1}{2}(1 - e^{-\sqrt{y}})$.]

The density function method finds $f_Y$ directly, and then $F_Y$ by integration; see Figure 1.5. We have $f_Y(y)\,|dy| = f_X(\sqrt{y})\,dx + f_X(-\sqrt{y})\,dx$; we write $|dy|$ because probabilities are never negative. Thus
\[
f_Y(y) = \frac{f_X(\sqrt{y})}{|dy/dx|_{x=\sqrt{y}}} + \frac{f_X(-\sqrt{y})}{|dy/dx|_{x=-\sqrt{y}}}
\]
with $y = x^2$, $dy/dx = 2x$, so
\[
f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}}.
\]
(Note that $|-2\sqrt{y}| = 2\sqrt{y}$.) We have $f_Y(y) = 0$ for $y < 0$, and:

Case 1. $0 < y < 1$ (see Figure 1.2).
\[
f_Y(y) = \frac{(1/2)e^{-\sqrt{y}}}{2\sqrt{y}} + \frac{1/2}{2\sqrt{y}} = \frac{1}{4\sqrt{y}}\bigl(1 + e^{-\sqrt{y}}\bigr).
\]


Case 2. $y > 1$ (see Figure 1.3).
\[
f_Y(y) = \frac{(1/2)e^{-\sqrt{y}}}{2\sqrt{y}} + 0 = \frac{1}{4\sqrt{y}}\,e^{-\sqrt{y}}
\]
as before.

[Figure 1.5: the parabola $y = x^2$, showing the two preimages $\pm\sqrt{y}$ of a point y.]
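As a numerical sanity check (a sketch that is not part of the original notes; it assumes Python with NumPy), we can sample X from the density of Figure 1.1, square it, and compare the empirical distribution function of $Y = X^2$ with the $F_Y$ derived above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Sample X from f_X: density 1/2 on [-1, 0] plus (1/2)e^{-x} for x > 0,
# i.e. a 50/50 mixture of Uniform(-1, 0) and Exponential(1).
coin = rng.random(n) < 0.5
x = np.where(coin, rng.uniform(-1.0, 0.0, n), rng.exponential(1.0, n))
y = x ** 2

def F_Y(t):
    """Distribution function of Y = X^2 derived above."""
    t = np.asarray(t, dtype=float)
    root = np.sqrt(np.maximum(t, 0.0))
    uniform_part = np.where(t <= 1.0, 0.5 * root, 0.5)   # mass picked up on [-sqrt(t), 0]
    exp_part = 0.5 * (1.0 - np.exp(-root))               # mass picked up on [0, sqrt(t)]
    return np.where(t < 0.0, 0.0, uniform_part + exp_part)

for t in [0.25, 0.5, 1.0, 2.0, 4.0]:
    print(t, np.mean(y <= t), F_Y(t))   # empirical vs. exact; they agree up to Monte Carlo error
```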

The distribution function method generalizes to situations where we have a single output but more than one input. For example, let X and Y be independent, each uniformly distributed on [0, 1]. The distribution function of $Z = X + Y$ is
\[
F_Z(z) = P\{X + Y \le z\} = \iint_{x+y\le z} f_{XY}(x, y)\,dx\,dy
\]
with $f_{XY}(x, y) = f_X(x)f_Y(y)$ by independence. Now $F_Z(z) = 0$ for $z < 0$ and $F_Z(z) = 1$ for $z > 2$ (because $0 \le Z \le 2$).

Case 1. If $0 \le z \le 1$, then $F_Z(z)$ is the shaded area in Figure 1.6, which is $z^2/2$.

Case 2. If $1 \le z \le 2$, then $F_Z(z)$ is the shaded area in Figure 1.7, which is $1 - [(2 - z)^2/2]$. Thus (see Figure 1.8)
\[
f_Z(z) =
\begin{cases}
z, & 0 \le z \le 1\\
2 - z, & 1 \le z \le 2\\
0 & \text{elsewhere.}
\end{cases}
\]
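The triangular result is easy to confirm by simulation (again a sketch assuming NumPy, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(1_000_000)          # Uniform(0, 1)
y = rng.random(1_000_000)
z = x + y

# F_Z(z) = z^2/2 on [0, 1] and 1 - (2 - z)^2/2 on [1, 2].
for t in [0.3, 0.8, 1.0, 1.4, 1.9]:
    exact = t**2 / 2 if t <= 1 else 1 - (2 - t)**2 / 2
    print(t, np.mean(z <= t), round(exact, 4))
```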

Problems

1. Let X, Y, Z be independent, identically distributed (from now on, abbreviated iid) random variables, each with density $f(x) = 6x^5$ for $0 \le x \le 1$, and 0 elsewhere. Find the distribution and density functions of the maximum of X, Y and Z.

2. Let X and Y be independent, each with density $e^{-x}$, $x \ge 0$. Find the distribution (from now on, an abbreviation for “find the distribution or density function”) of $Z = Y/X$.

3. A discrete random variable X takes values $x_1, \ldots, x_n$, each with probability 1/n. Let $Y = g(X)$ where g is an arbitrary real-valued function. Express the probability function of Y ($p_Y(y) = P\{Y = y\}$) in terms of g and the $x_i$.


[Figures 1.6 and 1.7: the unit square with the region $x + y \le z$ shaded, for $0 \le z \le 1$ and for $1 \le z \le 2$ (in the second case the unshaded corner is a triangle with legs $2 - z$).]

[Figure 1.8: the triangular density $f_Z(z)$ on [0, 2].]

4. A random variable X has density $f(x) = ax^2$ on the interval [0, b]. Find the density of $Y = X^3$.

5. The Cauchy density is given by $f(y) = 1/[\pi(1 + y^2)]$ for all real y. Show that one way to produce this density is to take the tangent of a random variable X that is uniformly distributed between $-\pi/2$ and $\pi/2$.


Lecture 2. Jacobians

We need this idea to generalize the density function method to problems where there are k inputs and k outputs, with $k \ge 2$. However, if there are k inputs and $j < k$ outputs, often extra outputs can be introduced, as we will see later in the lecture.

2.1 The Setup

Let $X = X(U, V)$, $Y = Y(U, V)$. Assume a one-to-one transformation, so that we can solve for U and V. Thus $U = U(X, Y)$, $V = V(X, Y)$. Look at Figure 2.1. If u changes by du then x changes by $(\partial x/\partial u)\,du$ and y changes by $(\partial y/\partial u)\,du$. Similarly, if v changes by dv then x changes by $(\partial x/\partial v)\,dv$ and y changes by $(\partial y/\partial v)\,dv$. The small rectangle in the u-v plane corresponds to a small parallelogram in the x-y plane (Figure 2.2), with $A = (\partial x/\partial u, \partial y/\partial u, 0)\,du$ and $B = (\partial x/\partial v, \partial y/\partial v, 0)\,dv$. The area of the parallelogram is $|A \times B|$ and
\[
A \times B =
\begin{vmatrix}
I & J & K\\
\partial x/\partial u & \partial y/\partial u & 0\\
\partial x/\partial v & \partial y/\partial v & 0
\end{vmatrix} du\,dv
=
\begin{vmatrix}
\partial x/\partial u & \partial x/\partial v\\
\partial y/\partial u & \partial y/\partial v
\end{vmatrix} du\,dv\;K.
\]

(A determinant is unchanged if we transpose the matrix, i.e., interchange rows and columns.)

[Figure 2.1: a small rectangle R with sides du and dv in the u-v plane.]

[Figure 2.2: the corresponding parallelogram S in the x-y plane, spanned by the vectors A and B.]

2.2 Definition and Discussion

The Jacobian of the transformation is
\[
J =
\begin{vmatrix}
\partial x/\partial u & \partial x/\partial v\\
\partial y/\partial u & \partial y/\partial v
\end{vmatrix}, \quad \text{written as } \frac{\partial(x, y)}{\partial(u, v)}.
\]


Thus $|A \times B| = |J|\,du\,dv$. Now $P\{(X, Y) \in S\} = P\{(U, V) \in R\}$; in other words, $f_{XY}(x, y)$ times the area of S is $f_{UV}(u, v)$ times the area of R. Thus
\[
f_{XY}(x, y)\,|J|\,du\,dv = f_{UV}(u, v)\,du\,dv
\]
and
\[
f_{UV}(u, v) = f_{XY}(x, y)\left|\frac{\partial(x, y)}{\partial(u, v)}\right|.
\]
The absolute value of the Jacobian $\partial(x, y)/\partial(u, v)$ gives a magnification factor for area in going from u-v coordinates to x-y coordinates. The magnification factor going the other way is $|\partial(u, v)/\partial(x, y)|$. But the magnification factor from u-v to u-v is 1, so
\[
f_{UV}(u, v) = \frac{f_{XY}(x, y)}{|\partial(u, v)/\partial(x, y)|}.
\]
In this formula, we must substitute $x = x(u, v)$, $y = y(u, v)$ to express the final result in terms of u and v.

In three dimensions, a small rectangular box with volume $du\,dv\,dw$ corresponds to a parallelepiped in xyz space, determined by the vectors
\[
A = \left(\frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u}\right) du, \quad
B = \left(\frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v}\right) dv, \quad
C = \left(\frac{\partial x}{\partial w}, \frac{\partial y}{\partial w}, \frac{\partial z}{\partial w}\right) dw.
\]

The volume of the parallelepiped is the absolute value of the dot product of A with $B \times C$, and the dot product can be written as a determinant with rows (or columns) A, B, C. This determinant is the Jacobian of x, y, z with respect to u, v, w [written $\partial(x, y, z)/\partial(u, v, w)$], times $du\,dv\,dw$. The volume magnification from uvw to xyz space is $|\partial(x, y, z)/\partial(u, v, w)|$ and we have
\[
f_{UVW}(u, v, w) = \frac{f_{XYZ}(x, y, z)}{|\partial(u, v, w)/\partial(x, y, z)|}
\]

with $x = x(u, v, w)$, $y = y(u, v, w)$, $z = z(u, v, w)$. The Jacobian technique extends to higher dimensions. The transformation formula is a natural generalization of the two- and three-dimensional cases:
\[
f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n) = \frac{f_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{|\partial(y_1, \ldots, y_n)/\partial(x_1, \ldots, x_n)|}
\]
where
\[
\frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)} =
\begin{vmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n}\\
\vdots & & \vdots\\
\dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_n}
\end{vmatrix}.
\]

To help you remember the formula, think f(y) dy = f(x)dx.


2.3 A Typical Application

Let X and Y be independent, positive random variables with densities $f_X$ and $f_Y$, and let $Z = XY$. We find the density of Z by introducing a new random variable W, as follows:
\[
Z = XY, \qquad W = Y
\]
($W = X$ would be equally good). The transformation is one-to-one because we can solve for X, Y in terms of Z, W by $X = Z/W$, $Y = W$. In a problem of this type, we must always pay attention to the range of the variables: $x > 0$, $y > 0$ is equivalent to $z > 0$, $w > 0$. Now
\[
f_{ZW}(z, w) = \left.\frac{f_{XY}(x, y)}{|\partial(z, w)/\partial(x, y)|}\right|_{x=z/w,\; y=w}
\]
with
\[
\frac{\partial(z, w)}{\partial(x, y)} =
\begin{vmatrix}
\partial z/\partial x & \partial z/\partial y\\
\partial w/\partial x & \partial w/\partial y
\end{vmatrix}
=
\begin{vmatrix}
y & x\\
0 & 1
\end{vmatrix}
= y.
\]
Thus
\[
f_{ZW}(z, w) = \frac{f_X(x)f_Y(y)}{w} = \frac{f_X(z/w)f_Y(w)}{w}
\]
and we are left with the problem of finding the marginal density from a joint density:
\[
f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z, w)\,dw = \int_0^{\infty} \frac{1}{w}\,f_X(z/w)\,f_Y(w)\,dw.
\]
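A numerical illustration of this marginal-integral formula (a sketch that is not part of the original notes; it assumes SciPy and NumPy, and the choice of X and Y as independent Exponential(1) variables is mine):

```python
import numpy as np
from scipy.integrate import quad

f_X = lambda x: np.exp(-x)          # Exponential(1) density, x > 0
f_Y = lambda y: np.exp(-y)

def f_Z(z):
    """Density of Z = XY via the marginal integral derived above."""
    integrand = lambda w: f_X(z / w) * f_Y(w) / w
    val, _ = quad(integrand, 0, np.inf)
    return val

# Compare with a straight simulation of Z = XY.
rng = np.random.default_rng(2)
z_samples = rng.exponential(1.0, 1_000_000) * rng.exponential(1.0, 1_000_000)
for z in [0.1, 0.5, 1.0, 2.0]:
    width = 0.05
    empirical = np.mean(np.abs(z_samples - z) < width / 2) / width   # crude density estimate
    print(z, round(f_Z(z), 4), round(empirical, 4))                  # close, up to Monte Carlo and binning error
```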

Problems

1. The joint density of two random variables $X_1$ and $X_2$ is $f(x_1, x_2) = 2e^{-x_1}e^{-x_2}$, where $0 < x_1 < x_2 < \infty$; $f(x_1, x_2) = 0$ elsewhere. Consider the transformation $Y_1 = 2X_1$, $Y_2 = X_2 - X_1$. Find the joint density of $Y_1$ and $Y_2$, and conclude that $Y_1$ and $Y_2$ are independent.

2. Repeat Problem 1 with the following new data. The joint density is given by $f(x_1, x_2) = 8x_1x_2$, $0 < x_1 < x_2 < 1$; $f(x_1, x_2) = 0$ elsewhere; $Y_1 = X_1/X_2$, $Y_2 = X_2$.

3. Repeat Problem 1 with the following new data. We now have three iid random variables $X_i$, $i = 1, 2, 3$, each with density $e^{-x}$, $x > 0$. The transformation equations are given by $Y_1 = X_1/(X_1 + X_2)$, $Y_2 = (X_1 + X_2)/(X_1 + X_2 + X_3)$, $Y_3 = X_1 + X_2 + X_3$. As before, find the joint density of the $Y_i$ and show that $Y_1$, $Y_2$ and $Y_3$ are independent.

Comments on the Problem Set

In Problem 3, notice that $Y_1Y_2Y_3 = X_1$ and $Y_2Y_3 = X_1 + X_2$, so $X_2 = Y_2Y_3 - Y_1Y_2Y_3$ and $X_3 = (X_1 + X_2 + X_3) - (X_1 + X_2) = Y_3 - Y_2Y_3$.

If $f_{XY}(x, y) = g(x)h(y)$ for all x, y, then X and Y are independent, because
\[
f(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{g(x)h(y)}{g(x)\int_{-\infty}^{\infty} h(y)\,dy}
\]


which does not depend on x. The set of points where $g(x) = 0$ (equivalently $f_X(x) = 0$) can be ignored because it has probability zero. It is important to realize that in this argument, “for all x, y” means that x and y must be allowed to vary independently of each other, so the set of possible x and y must be of the rectangular form $a < x < b$, $c < y < d$. (The constants a, b, c, d can be infinite.) For example, if $f_{XY}(x, y) = 2e^{-x}e^{-y}$, $0 < y < x$, and 0 elsewhere, then X and Y are not independent. Knowing x forces $0 < y < x$, so the conditional distribution of Y given $X = x$ certainly depends on x. Note that $f_{XY}(x, y)$ is not a function of x alone times a function of y alone. We have
\[
f_{XY}(x, y) = 2e^{-x}e^{-y}\,I[0 < y < x]
\]
where the indicator I is 1 for $0 < y < x$ and 0 elsewhere.

In Jacobian problems, pay close attention to the range of the variables. For example, in Problem 1 we have $y_1 = 2x_1$, $y_2 = x_2 - x_1$, so $x_1 = y_1/2$, $x_2 = (y_1/2) + y_2$. From these equations it follows that $0 < x_1 < x_2 < \infty$ is equivalent to $y_1 > 0$, $y_2 > 0$.


Lecture 3. Moment-Generating Functions

3.1 Definition

The moment-generating function of a random variable X is defined by
\[
M(t) = M_X(t) = E[e^{tX}]
\]

where t is a real number. To see the reason for the terminology, note that M(t) is the expectation of $1 + tX + t^2X^2/2! + t^3X^3/3! + \cdots$. If $\mu_n = E(X^n)$, the n-th moment of X, and we can take the expectation term by term, then
\[
M(t) = 1 + \mu_1 t + \frac{\mu_2 t^2}{2!} + \cdots + \frac{\mu_n t^n}{n!} + \cdots.
\]
Since the coefficient of $t^n$ in the Taylor expansion is $M^{(n)}(0)/n!$, where $M^{(n)}$ is the n-th derivative of M, we have $\mu_n = M^{(n)}(0)$.

3.2 The Key Theorem

If $Y = \sum_{i=1}^n X_i$ where $X_1, \ldots, X_n$ are independent, then $M_Y(t) = \prod_{i=1}^n M_{X_i}(t)$.

Proof. First note that if X and Y are independent, then
\[
E[g(X)h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_{XY}(x, y)\,dx\,dy.
\]
Since $f_{XY}(x, y) = f_X(x)f_Y(y)$, the double integral becomes
\[
\int_{-\infty}^{\infty} g(x)f_X(x)\,dx \int_{-\infty}^{\infty} h(y)f_Y(y)\,dy = E[g(X)]\,E[h(Y)]
\]
and similarly for more than two random variables. Now if $Y = X_1 + \cdots + X_n$ with the $X_i$'s independent, we have
\[
M_Y(t) = E[e^{tY}] = E[e^{tX_1} \cdots e^{tX_n}] = E[e^{tX_1}] \cdots E[e^{tX_n}] = M_{X_1}(t) \cdots M_{X_n}(t). \;\clubsuit
\]

3.3 The Main Application

Given independent random variables $X_1, \ldots, X_n$ with densities $f_1, \ldots, f_n$ respectively, find the density of $Y = \sum_{i=1}^n X_i$.

Step 1. Compute $M_i(t)$, the moment-generating function of $X_i$, for each i.

Step 2. Compute $M_Y(t) = \prod_{i=1}^n M_i(t)$.

Step 3. From $M_Y(t)$ find $f_Y(y)$.

This technique is known as a transform method. Notice that the moment-generating function and the density of a random variable are related by $M(t) = \int_{-\infty}^{\infty} e^{tx}f(x)\,dx$. With t replaced by $-s$ we have a Laplace transform, and with t replaced by $it$ we have a Fourier transform. The strategy works because at step 3, the moment-generating function determines the density uniquely. (This is a theorem from Laplace or Fourier transform theory.)


3.4 Examples

1. Bernoulli Trials. Let X be the number of successes in n trials with probability of success p on a given trial. Then $X = X_1 + \cdots + X_n$, where $X_i = 1$ if there is a success on trial i and $X_i = 0$ if there is a failure on trial i. Thus
\[
M_i(t) = E[e^{tX_i}] = P\{X_i = 1\}e^{t \cdot 1} + P\{X_i = 0\}e^{t \cdot 0} = pe^t + q
\]
with $p + q = 1$. The moment-generating function of X is
\[
M_X(t) = (pe^t + q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} e^{tk}.
\]
This could have been derived directly:
\[
M_X(t) = E[e^{tX}] = \sum_{k=0}^{n} P\{X = k\}e^{tk} = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} e^{tk} = (pe^t + q)^n
\]
by the binomial theorem.

2. Poisson. We have $P\{X = k\} = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$. Thus
\[
M(t) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!}\,e^{tk} = e^{-\lambda}\sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = \exp(-\lambda)\exp(\lambda e^t) = \exp[\lambda(e^t - 1)].
\]
We can compute the mean and variance from the moment-generating function:
\[
E(X) = M'(0) = \bigl[\exp(\lambda(e^t - 1))\,\lambda e^t\bigr]_{t=0} = \lambda.
\]
Let $h(\lambda, t) = \exp[\lambda(e^t - 1)]$. Then
\[
E(X^2) = M''(0) = \bigl[h(\lambda, t)\lambda e^t + \lambda e^t\, h(\lambda, t)\lambda e^t\bigr]_{t=0} = \lambda + \lambda^2,
\]
hence
\[
\operatorname{Var} X = E(X^2) - [E(X)]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda.
\]
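These derivative calculations are easy to reproduce symbolically (a sketch that is not part of the original notes; it assumes SymPy is installed):

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))          # Poisson(lambda) moment-generating function

m1 = sp.diff(M, t).subs(t, 0)              # E[X]
m2 = sp.diff(M, t, 2).subs(t, 0)           # E[X^2]
var = sp.simplify(m2 - m1**2)

print(m1, m2, var)                          # lambda, lambda**2 + lambda, lambda
```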

3. Normal(0, 1). The moment-generating function is
\[
M(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx.
\]
Now $-(x^2/2) + tx = -(1/2)(x^2 - 2tx + t^2 - t^2) = -(1/2)(x - t)^2 + (1/2)t^2$, so
\[
M(t) = e^{t^2/2}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,\exp[-(x - t)^2/2]\,dx.
\]
The integral is the area under a normal density (mean t, variance 1), which is 1. Consequently,
\[
M(t) = e^{t^2/2}.
\]


4. Normal(µ, σ²). If X is normal(µ, σ²), then $Y = (X - \mu)/\sigma$ is normal(0, 1). This is a good application of the density function method from Lecture 1:
\[
f_Y(y) = \frac{f_X(x)}{|dy/dx|}\bigg|_{x=\mu+\sigma y} = \sigma\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-y^2/2}.
\]
We have $X = \mu + \sigma Y$, so
\[
M_X(t) = E[e^{tX}] = e^{t\mu}E[e^{t\sigma Y}] = e^{t\mu}M_Y(t\sigma).
\]
Thus
\[
M_X(t) = e^{t\mu}e^{t^2\sigma^2/2}.
\]

Remember this technique, which is especially useful when $Y = aX + b$ and the moment-generating function of X is known.

3.5 Theorem

If X is normal(µ, σ²) and $Y = aX + b$, then Y is normal($a\mu + b$, $a^2\sigma^2$).

Proof. We compute
\[
M_Y(t) = E[e^{tY}] = E[e^{t(aX+b)}] = e^{bt}M_X(at) = e^{bt}e^{at\mu}e^{a^2t^2\sigma^2/2}.
\]
Thus
\[
M_Y(t) = \exp[t(a\mu + b)]\exp(t^2a^2\sigma^2/2). \;\clubsuit
\]

Here is another basic result.

3.6 Theorem

Let $X_1, \ldots, X_n$ be independent, with $X_i$ normal($\mu_i$, $\sigma_i^2$). Then $Y = \sum_{i=1}^n X_i$ is normal with mean $\mu = \sum_{i=1}^n \mu_i$ and variance $\sigma^2 = \sum_{i=1}^n \sigma_i^2$.

Proof. The moment-generating function of Y is
\[
M_Y(t) = \prod_{i=1}^n \exp(t\mu_i + t^2\sigma_i^2/2) = \exp(t\mu + t^2\sigma^2/2). \;\clubsuit
\]

A similar argument works for the Poisson distribution; see Problem 4.

3.7 The Gamma Distribution

First, we define the gamma function $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy$, $\alpha > 0$. We need three properties:

(a) $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$, the recursion formula;

(b) $\Gamma(n + 1) = n!$, $n = 0, 1, 2, \ldots$;

(c) $\Gamma(1/2) = \sqrt{\pi}$.

To prove (a), integrate by parts: $\Gamma(\alpha) = \int_0^{\infty} e^{-y}\,d(y^{\alpha}/\alpha)$. Part (b) is a special case of (a). For (c) we make the change of variable $y = z^2/2$ and compute
\[
\Gamma(1/2) = \int_0^{\infty} y^{-1/2}e^{-y}\,dy = \int_0^{\infty} \sqrt{2}\,z^{-1}e^{-z^2/2}\,z\,dz.
\]
The second integral is $2\sqrt{\pi}$ times half the area under the normal(0, 1) density, that is, $2\sqrt{\pi}\,(1/2) = \sqrt{\pi}$.

The gamma density is
\[
f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,x^{\alpha-1}e^{-x/\beta}
\]
where α and β are positive constants. The moment-generating function is
\[
M(t) = \int_0^{\infty} [\Gamma(\alpha)\beta^{\alpha}]^{-1}x^{\alpha-1}e^{tx}e^{-x/\beta}\,dx.
\]
Change variables via $y = (-t + (1/\beta))x$ to get
\[
\int_0^{\infty} [\Gamma(\alpha)\beta^{\alpha}]^{-1}\left(\frac{y}{-t + (1/\beta)}\right)^{\alpha-1}e^{-y}\,\frac{dy}{-t + (1/\beta)}
\]
which reduces to
\[
\frac{1}{\beta^{\alpha}\bigl((1/\beta) - t\bigr)^{\alpha}} = (1 - \beta t)^{-\alpha}.
\]
In this argument, t must be less than 1/β so that the integrals will be finite.

Since $M(0) = \int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} f(x)\,dx$ in this case, with $f \ge 0$, $M(0) = 1$ implies that we have a legal probability density. As before, moments can be calculated efficiently from the moment-generating function:
\[
E(X) = M'(0) = -\alpha(1 - \beta t)^{-\alpha-1}(-\beta)\big|_{t=0} = \alpha\beta;
\]
\[
E(X^2) = M''(0) = -\alpha(-\alpha - 1)(1 - \beta t)^{-\alpha-2}(-\beta)^2\big|_{t=0} = \alpha(\alpha + 1)\beta^2.
\]
Thus
\[
\operatorname{Var} X = E(X^2) - [E(X)]^2 = \alpha\beta^2.
\]

3.8 Special Cases

The exponential density is a gamma density with α = 1: $f(x) = (1/\beta)e^{-x/\beta}$, $x \ge 0$, with $E(X) = \beta$, $E(X^2) = 2\beta^2$, $\operatorname{Var} X = \beta^2$.


A random variable X has the chi-square density with r degrees of freedom ($X = \chi^2(r)$ for short, where r is a positive integer) if its density is gamma with $\alpha = r/2$ and $\beta = 2$. Thus
\[
f(x) = \frac{1}{\Gamma(r/2)2^{r/2}}\,x^{(r/2)-1}e^{-x/2}, \quad x \ge 0
\]
and
\[
M(t) = \frac{1}{(1 - 2t)^{r/2}}, \quad t < 1/2.
\]
Therefore $E[\chi^2(r)] = \alpha\beta = r$ and $\operatorname{Var}[\chi^2(r)] = \alpha\beta^2 = 2r$.

3.9 Lemma

If X is normal(0, 1), then $X^2$ is $\chi^2(1)$.

Proof. We compute the moment-generating function of $X^2$ directly:
\[
M_{X^2}(t) = E[e^{tX^2}] = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx.
\]
Let $y = \sqrt{1 - 2t}\,x$; the integral becomes
\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,\frac{dy}{\sqrt{1 - 2t}} = (1 - 2t)^{-1/2},
\]
which is the moment-generating function of $\chi^2(1)$. $\clubsuit$

3.10 Theorem

If $X_1, \ldots, X_n$ are independent, each normal(0, 1), then $Y = \sum_{i=1}^n X_i^2$ is $\chi^2(n)$.

Proof. By (3.9), each $X_i^2$ is $\chi^2(1)$ with moment-generating function $(1 - 2t)^{-1/2}$. Thus $M_Y(t) = (1 - 2t)^{-n/2}$ for $t < 1/2$, which is the moment-generating function of $\chi^2(n)$. $\clubsuit$

3.11 Another Method

Another way to find the density of $Z = X + Y$, where X and Y are independent random variables, is by the convolution formula
\[
f_Z(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z - x)\,dx = \int_{-\infty}^{\infty} f_Y(y)f_X(z - y)\,dy.
\]
To see this intuitively, reason as follows. The probability that Z lies near z (between z and $z + dz$) is $f_Z(z)\,dz$. Let us compute this in terms of X and Y. The probability that X lies near x is $f_X(x)\,dx$. Given that X lies near x, Z will lie near z if and only if Y lies near $z - x$, in other words, $z - x \le Y \le z - x + dz$. By independence of X and Y, this probability is $f_Y(z - x)\,dz$. Thus $f_Z(z)\,dz$ is a sum of terms of the form $f_X(x)\,dx\,f_Y(z - x)\,dz$. Cancel the dz's and replace the sum by an integral to get the result. A formal proof can be given using Jacobians.
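As a concrete check of the convolution formula (a sketch that is not part of the original notes; it assumes SciPy, and the choice of two Exponential(1) summands is mine), the convolution of two exponential densities should reproduce the gamma density with α = 2, β = 1 from Section 3.7:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

f = lambda x: np.exp(-x) * (x >= 0)        # Exponential(1) density

def f_Z(z):
    """Convolution integral: density of Z = X + Y for independent X, Y with density f."""
    val, _ = quad(lambda x: f(x) * f(z - x), 0, z)
    return val

for z in [0.5, 1.0, 2.0, 5.0]:
    print(z, round(f_Z(z), 6), round(gamma(a=2, scale=1).pdf(z), 6))   # both equal z*e^{-z}
```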


3.12 The Poisson Process

This process occurs in many physical situations, and provides an application of the gamma distribution. For example, particles can arrive at a counting device, customers at a serving counter, airplanes at an airport, or phone calls at a telephone exchange. Divide the time interval [0, t] into a large number n of small subintervals of length dt, so that $n\,dt = t$. If $I_i$, $i = 1, \ldots, n$, is one of the small subintervals, we make the following assumptions:

(1) The probability of exactly one arrival in $I_i$ is $\lambda\,dt$, where λ is a constant.

(2) The probability of no arrivals in $I_i$ is $1 - \lambda\,dt$.

(3) The probability of more than one arrival in $I_i$ is zero.

(4) If $A_i$ is the event of an arrival in $I_i$, then the $A_i$, $i = 1, \ldots, n$, are independent.

As a consequence of these assumptions, we have $n = t/dt$ Bernoulli trials with probability of success $p = \lambda\,dt$ on a given trial. As $dt \to 0$ we have $n \to \infty$ and $p \to 0$, with $np = \lambda t$. We conclude that the number $N[0, t]$ of arrivals in [0, t] is Poisson(λt):
\[
P\{N[0, t] = k\} = \frac{e^{-\lambda t}(\lambda t)^k}{k!}, \quad k = 0, 1, 2, \ldots.
\]
Since $E(N[0, t]) = \lambda t$, we may interpret λ as the average number of arrivals per unit time.

Now let $W_1$ be the waiting time for the first arrival. Then
\[
P\{W_1 > t\} = P\{\text{no arrival in } [0, t]\} = P\{N[0, t] = 0\} = e^{-\lambda t}, \quad t \ge 0.
\]
Thus $F_{W_1}(t) = 1 - e^{-\lambda t}$ and $f_{W_1}(t) = \lambda e^{-\lambda t}$, $t \ge 0$. From the formulas for the mean and variance of an exponential random variable we have $E(W_1) = 1/\lambda$ and $\operatorname{Var} W_1 = 1/\lambda^2$.

Let $W_k$ be the (total) waiting time for the k-th arrival. Then $W_k$ is the waiting time for the first arrival, plus the time after the first up to the second arrival, plus $\cdots$ plus the time after arrival $k - 1$ up to the k-th arrival. Thus $W_k$ is the sum of k independent exponential random variables, and
\[
M_{W_k}(t) = \left(\frac{1}{1 - (t/\lambda)}\right)^k,
\]
so $W_k$ is gamma with $\alpha = k$, $\beta = 1/\lambda$. Therefore
\[
f_{W_k}(t) = \frac{1}{(k - 1)!}\,\lambda^k t^{k-1}e^{-\lambda t}, \quad t \ge 0.
\]
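A quick simulation supports the gamma form of $W_k$ (a sketch that is not part of the original notes; it assumes NumPy, and the values of λ and k are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, k, trials = 2.0, 5, 200_000

# W_k = sum of k independent Exponential(1/lam) interarrival times.
w_k = rng.exponential(1.0 / lam, size=(trials, k)).sum(axis=1)

# Gamma(alpha = k, beta = 1/lam): mean alpha*beta = k/lam, variance alpha*beta^2 = k/lam^2.
print(w_k.mean(), k / lam)            # ~2.5
print(w_k.var(), k / lam**2)          # ~1.25
```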

Problems

1. Let $X_1$ and $X_2$ be independent, and assume that $X_1$ is $\chi^2(r_1)$ and $Y = X_1 + X_2$ is $\chi^2(r)$, where $r > r_1$. Show that $X_2$ is $\chi^2(r_2)$, where $r_2 = r - r_1$.

2. Let $X_1$ and $X_2$ be independent, with $X_i$ gamma with parameters $\alpha_i$ and $\beta_i$, $i = 1, 2$. If $c_1$ and $c_2$ are positive constants, find convenient sufficient conditions under which $c_1X_1 + c_2X_2$ will also have the gamma distribution.

3. If $X_1, \ldots, X_n$ are independent random variables with moment-generating functions $M_1, \ldots, M_n$, and $c_1, \ldots, c_n$ are constants, express the moment-generating function M of $c_1X_1 + \cdots + c_nX_n$ in terms of the $M_i$.


4. If $X_1, \ldots, X_n$ are independent, with $X_i$ Poisson($\lambda_i$), $i = 1, \ldots, n$, show that the sum $Y = \sum_{i=1}^n X_i$ has the Poisson distribution with parameter $\lambda = \sum_{i=1}^n \lambda_i$.

5. An unbiased coin is tossed independently $n_1$ times and then again tossed independently $n_2$ times. Let $X_1$ be the number of heads in the first experiment, and $X_2$ the number of tails in the second experiment. Without using moment-generating functions, in fact without any calculation at all, find the distribution of $X_1 + X_2$.


Lecture 4. Sampling From a Normal Population

4.1 Definitions and Comments

Let $X_1, \ldots, X_n$ be iid. The sample mean of the $X_i$ is
\[
\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i
\]
and the sample variance is
\[
S^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \overline{X})^2.
\]
If the $X_i$ have mean µ and variance σ², then
\[
E(\overline{X}) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}\,n\mu = \mu
\]
and
\[
\operatorname{Var}\overline{X} = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var} X_i = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty.
\]

Thus $\overline{X}$ is a good estimate of µ. (For large n, the variance of $\overline{X}$ is small, so $\overline{X}$ is concentrated near its mean.) The sample variance is an average squared deviation from the sample mean, but it is a biased estimate of the true variance σ²:
\[
E[(X_i - \overline{X})^2] = E[(X_i - \mu) - (\overline{X} - \mu)]^2 = \operatorname{Var} X_i + \operatorname{Var}\overline{X} - 2E[(X_i - \mu)(\overline{X} - \mu)].
\]
Notice the centralizing technique. We subtract and add back the mean of $X_i$, which will make the cross terms easier to handle when squaring. The above expression simplifies to
\[
\sigma^2 + \frac{\sigma^2}{n} - 2E\Bigl[(X_i - \mu)\frac{1}{n}\sum_{j=1}^n (X_j - \mu)\Bigr] = \sigma^2 + \frac{\sigma^2}{n} - \frac{2}{n}E[(X_i - \mu)^2].
\]
Thus
\[
E[(X_i - \overline{X})^2] = \sigma^2\Bigl(1 + \frac{1}{n} - \frac{2}{n}\Bigr) = \frac{n - 1}{n}\,\sigma^2.
\]

Consequently, $E(S^2) = (n - 1)\sigma^2/n$, not σ². Some books define the sample variance as
\[
\frac{1}{n - 1}\sum_{i=1}^n (X_i - \overline{X})^2 = \frac{n}{n - 1}\,S^2
\]
where $S^2$ is our sample variance. This adjusted estimate of the true variance is unbiased (its expectation is σ²), but biased does not mean bad. If we measure performance by asking for a small mean square error, the biased estimate is better in the normal case, as we will see at the end of the lecture.
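The bias $E(S^2) = (n - 1)\sigma^2/n$ is easy to see numerically (a sketch that is not part of the original notes; it assumes NumPy, and the sample size and σ² are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2, trials = 5, 4.0, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
s2 = x.var(axis=1, ddof=0)            # the sample variance S^2 as defined above (divide by n)

print(s2.mean())                      # close to (n - 1) * sigma2 / n = 3.2, not sigma2 = 4
print((n - 1) * sigma2 / n)
```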


4.2 The Normal Case

We now assume that the $X_i$ are normally distributed, and find the distribution of $S^2$. Let $y_1 = \overline{x} = (x_1 + \cdots + x_n)/n$, $y_2 = x_2 - \overline{x}, \ldots, y_n = x_n - \overline{x}$. Then $y_1 + y_2 = x_2$, $y_1 + y_3 = x_3, \ldots, y_1 + y_n = x_n$. Add these equations to get $(n - 1)y_1 + y_2 + \cdots + y_n = x_2 + \cdots + x_n$, or
\[
ny_1 + (y_2 + \cdots + y_n) = (x_2 + \cdots + x_n) + y_1. \qquad (1)
\]
But $ny_1 = n\overline{x} = x_1 + \cdots + x_n$, so by cancelling $x_2, \ldots, x_n$ in (1), $x_1 + (y_2 + \cdots + y_n) = y_1$. Thus we can solve for the x's in terms of the y's:
\[
\begin{aligned}
x_1 &= y_1 - y_2 - \cdots - y_n\\
x_2 &= y_1 + y_2\\
x_3 &= y_1 + y_3 \qquad\qquad (2)\\
&\;\;\vdots\\
x_n &= y_1 + y_n
\end{aligned}
\]

The Jacobian of the transformation is
\[
d_n = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} =
\begin{vmatrix}
1 & -1 & -1 & \cdots & -1\\
1 & 1 & 0 & \cdots & 0\\
1 & 0 & 1 & \cdots & 0\\
\vdots & & & & \\
1 & 0 & 0 & \cdots & 1
\end{vmatrix}.
\]
To see the pattern, look at the 4 by 4 case and expand via the last row:
\[
\begin{vmatrix}
1 & -1 & -1 & -1\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 1
\end{vmatrix}
= (-1)
\begin{vmatrix}
-1 & -1 & -1\\
1 & 0 & 0\\
0 & 1 & 0
\end{vmatrix}
+
\begin{vmatrix}
1 & -1 & -1\\
1 & 1 & 0\\
1 & 0 & 1
\end{vmatrix}
\]
so $d_4 = 1 + d_3$. In general, $d_n = 1 + d_{n-1}$, and since $d_2 = 2$ by inspection, we have $d_n = n$ for all $n \ge 2$. Now

\[
\sum_{i=1}^n (x_i - \mu)^2 = \sum (x_i - \overline{x} + \overline{x} - \mu)^2 = \sum (x_i - \overline{x})^2 + n(\overline{x} - \mu)^2 \qquad (3)
\]
because $\sum (x_i - \overline{x}) = 0$. By (2), $x_1 - \overline{x} = x_1 - y_1 = -y_2 - \cdots - y_n$ and $x_i - \overline{x} = x_i - y_1 = y_i$ for $i = 2, \ldots, n$. (Remember that $y_1 = \overline{x}$.) Thus
\[
\sum_{i=1}^n (x_i - \overline{x})^2 = (-y_2 - \cdots - y_n)^2 + \sum_{i=2}^n y_i^2. \qquad (4)
\]
Now
\[
f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n) = n f_{X_1 \cdots X_n}(x_1, \ldots, x_n).
\]


By (3) and (4), the right side becomes, in terms of the $y_i$'s,
\[
n\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{1}{2\sigma^2}\left(\Bigl(-\sum_{i=2}^n y_i\Bigr)^2 + \sum_{i=2}^n y_i^2 + n(y_1 - \mu)^2\right)\right].
\]
The joint density of $Y_1, \ldots, Y_n$ is a function of $y_1$ times a function of $(y_2, \ldots, y_n)$, so $Y_1$ and $(Y_2, \ldots, Y_n)$ are independent. Since $\overline{X} = Y_1$ and [by (4)] $S^2$ is a function of $(Y_2, \ldots, Y_n)$,
\[
\overline{X} \text{ and } S^2 \text{ are independent.}
\]

Dividing Equation (3) by σ² we have
\[
\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{nS^2}{\sigma^2} + \left(\frac{\overline{X} - \mu}{\sigma/\sqrt{n}}\right)^2.
\]
But $(X_i - \mu)/\sigma$ is normal(0, 1) and
\[
\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \text{ is normal(0, 1)},
\]
so $\chi^2(n) = (nS^2/\sigma^2) + \chi^2(1)$, with the two random variables on the right independent. If M(t) is the moment-generating function of $nS^2/\sigma^2$, then $(1 - 2t)^{-n/2} = M(t)(1 - 2t)^{-1/2}$. Therefore $M(t) = (1 - 2t)^{-(n-1)/2}$, i.e.,
\[
\frac{nS^2}{\sigma^2} \text{ is } \chi^2(n - 1).
\]
The random variable
\[
T = \frac{\overline{X} - \mu}{S/\sqrt{n - 1}}
\]
is useful in situations where µ is to be estimated but the true variance σ² is unknown. It turns out that T has a “T distribution,” which we study in the next lecture.
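Both conclusions, that $nS^2/\sigma^2$ is $\chi^2(n - 1)$ and that the sample mean and $S^2$ are independent, can be illustrated by simulation (a sketch that is not part of the original notes; it assumes NumPy and SciPy, and a near-zero correlation is of course only consistent with independence, not a proof of it):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, mu, sigma = 8, 1.0, 2.0
x = rng.normal(mu, sigma, size=(500_000, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=0)                             # S^2 as defined in 4.1 (divide by n)
q = n * s2 / sigma**2                                  # should be chi^2(n - 1)

print(q.mean(), q.var())                               # ~ n - 1 = 7 and 2(n - 1) = 14
print(np.mean(q <= 10.0), chi2(df=n - 1).cdf(10.0))    # empirical vs. exact CDF
print(np.corrcoef(xbar, s2)[0, 1])                     # ~ 0: consistent with independence
```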

4.3 Performance of Various Estimates

Let $S^2$ be the sample variance of iid normal(µ, σ²) random variables $X_1, \ldots, X_n$. We will look at estimates of σ² of the form $cS^2$, where c is a constant. Once again employing the centralizing technique, we write
\[
E[(cS^2 - \sigma^2)^2] = E[(cS^2 - cE(S^2) + cE(S^2) - \sigma^2)^2]
\]
which simplifies to
\[
c^2\operatorname{Var} S^2 + (cE(S^2) - \sigma^2)^2.
\]
Since $nS^2/\sigma^2$ is $\chi^2(n - 1)$, which has variance $2(n - 1)$, we have $n^2(\operatorname{Var} S^2)/\sigma^4 = 2(n - 1)$. Also $nE(S^2)/\sigma^2$ is the mean of $\chi^2(n - 1)$, which is $n - 1$. (Or we can recall from (4.1) that $E(S^2) = (n - 1)\sigma^2/n$.) Thus the mean square error is
\[
c^2\,\frac{2\sigma^4(n - 1)}{n^2} + \left(\frac{c(n - 1)}{n}\,\sigma^2 - \sigma^2\right)^2.
\]

We can drop the $\sigma^4$ and use $n^2$ as a common denominator, which can also be dropped. We are then trying to minimize
\[
c^2\,2(n - 1) + c^2(n - 1)^2 - 2c(n - 1)n + n^2.
\]
Differentiate with respect to c and set the result equal to zero:
\[
4c(n - 1) + 2c(n - 1)^2 - 2(n - 1)n = 0.
\]
Dividing by $2(n - 1)$, we have $2c + c(n - 1) - n = 0$, so $c = n/(n + 1)$. Thus the best estimate of the form $cS^2$ is
\[
\frac{1}{n + 1}\sum_{i=1}^n (X_i - \overline{X})^2.
\]
If we use $S^2$ then $c = 1$. If we use the unbiased version then $c = n/(n - 1)$. Since $n/(n + 1) < 1 < n/(n - 1)$ and a quadratic function decreases as we move toward its minimum, we see that the biased estimate $S^2$ is better than the unbiased estimate $nS^2/(n - 1)$, but neither is optimal under the minimum mean square error criterion. Explicitly, when $c = n/(n - 1)$ we get a mean square error of $2\sigma^4/(n - 1)$, and when $c = 1$ we get
\[
\frac{\sigma^4}{n^2}\bigl[2(n - 1) + (n - 1 - n)^2\bigr] = \frac{(2n - 1)\sigma^4}{n^2},
\]
which is always smaller, because $(2n - 1)/n^2 < 2/(n - 1)$ iff $2n^2 > 2n^2 - 3n + 1$ iff $3n > 1$, which is true for every positive integer n.

For large n all these estimates are good and the difference between their performance is small.
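The comparison can be made explicit by evaluating the mean square error formula above for the three choices of c (a sketch that is not part of the original notes; n = 10 and σ² = 1 are arbitrary choices):

```python
def mse(c, n, sigma2=1.0):
    """Mean square error of c*S^2 as an estimate of sigma^2 (normal sample of size n)."""
    var_term = c**2 * 2.0 * (n - 1) * sigma2**2 / n**2
    bias_term = (c * (n - 1) * sigma2 / n - sigma2) ** 2
    return var_term + bias_term

n = 10
for label, c in [("c = n/(n+1)", n / (n + 1)),
                 ("c = 1 (S^2)", 1.0),
                 ("c = n/(n-1) (unbiased)", n / (n - 1))]:
    print(label, mse(c, n))
# Prints roughly 0.1818 < 0.19 < 0.2222: the optimal c, then S^2, then the unbiased estimate.
```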

Problems

1. Let $X_1, \ldots, X_n$ be iid, each normal(µ, σ²), and let $\overline{X}$ be the sample mean. If c is a constant, we wish to make n large enough so that $P\{\mu - c < \overline{X} < \mu + c\} \ge .954$. Find the minimum value of n in terms of σ² and c. (It is independent of µ.)

2. Let $X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2}$ be independent random variables, with the $X_i$ normal($\mu_1$, $\sigma_1^2$) and the $Y_i$ normal($\mu_2$, $\sigma_2^2$). If $\overline{X}$ is the sample mean of the $X_i$ and $\overline{Y}$ is the sample mean of the $Y_i$, explain how to compute the probability that $\overline{X} > \overline{Y}$.

3. Let $X_1, \ldots, X_n$ be iid, each normal(µ, σ²), and let $S^2$ be the sample variance. Explain how to compute $P\{a < S^2 < b\}$.

4. Let $S^2$ be the sample variance of iid normal(µ, σ²) random variables $X_i$, $i = 1, \ldots, n$. Calculate the moment-generating function of $S^2$ and from this, deduce that $S^2$ has a gamma distribution.


Lecture 5. The T and F Distributions

5.1 Definition and Discussion

The T distribution is defined as follows. Let $X_1$ and $X_2$ be independent, with $X_1$ normal(0, 1) and $X_2$ chi-square with r degrees of freedom. The random variable $Y_1 = \sqrt{r}\,X_1/\sqrt{X_2}$ has the T distribution with r degrees of freedom.

To find the density of $Y_1$, let $Y_2 = X_2$. Then $X_1 = Y_1\sqrt{Y_2}/\sqrt{r}$ and $X_2 = Y_2$. The transformation is one-to-one with $-\infty < X_1 < \infty$, $X_2 > 0 \iff -\infty < Y_1 < \infty$, $Y_2 > 0$. The Jacobian is given by
\[
\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} =
\begin{vmatrix}
\sqrt{y_2/r} & y_1/(2\sqrt{ry_2})\\
0 & 1
\end{vmatrix}
= \sqrt{y_2/r}.
\]
Thus $f_{Y_1Y_2}(y_1, y_2) = f_{X_1X_2}(x_1, x_2)\sqrt{y_2/r}$, which upon substitution for $x_1$ and $x_2$ becomes
\[
\frac{1}{\sqrt{2\pi}}\exp[-y_1^2y_2/2r]\;\frac{1}{\Gamma(r/2)2^{r/2}}\,y_2^{(r/2)-1}e^{-y_2/2}\,\sqrt{y_2/r}.
\]

The density of $Y_1$ is
\[
\frac{1}{\sqrt{2\pi}\,\Gamma(r/2)2^{r/2}}\int_0^{\infty} y_2^{[(r+1)/2]-1}\exp\bigl[-(1 + (y_1^2/r))y_2/2\bigr]\,dy_2\big/\sqrt{r}.
\]
With $z = (1 + (y_1^2/r))y_2/2$ and the observation that all factors of 2 cancel, this becomes (with $y_1$ replaced by t)
\[
\frac{\Gamma((r + 1)/2)}{\sqrt{r\pi}\,\Gamma(r/2)}\,\frac{1}{(1 + (t^2/r))^{(r+1)/2}}, \quad -\infty < t < \infty,
\]

the T density with r degrees of freedom.

In sampling from a normal population, $(\overline{X} - \mu)/(\sigma/\sqrt{n})$ is normal(0, 1), and $nS^2/\sigma^2$ is $\chi^2(n - 1)$. Thus
\[
\sqrt{n - 1}\,\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \text{ divided by } \sqrt{n}\,S/\sigma \text{ is } T(n - 1).
\]
Since σ and $\sqrt{n}$ disappear after cancellation, we have
\[
\frac{\overline{X} - \mu}{S/\sqrt{n - 1}} \text{ is } T(n - 1).
\]

Advocates of defining the sample variance with $n - 1$ in the denominator point out that one can simply replace σ by S in $(\overline{X} - \mu)/(\sigma/\sqrt{n})$ to get the T statistic.

Intuitively, we expect that for large n, $(\overline{X} - \mu)/(S/\sqrt{n - 1})$ has approximately the same distribution as $(\overline{X} - \mu)/(\sigma/\sqrt{n})$, i.e., normal(0, 1). This is in fact true, as suggested by the following computation:
\[
\left(1 + \frac{t^2}{r}\right)^{(r+1)/2} = \sqrt{\left(1 + \frac{t^2}{r}\right)^r}\left(1 + \frac{t^2}{r}\right)^{1/2} \to \sqrt{e^{t^2}} \times 1 = e^{t^2/2}
\]
as $r \to \infty$.
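The claim that $(\overline{X} - \mu)/(S/\sqrt{n - 1})$ is $T(n - 1)$ can be checked against SciPy's t distribution (a sketch that is not part of the original notes; the parameter values are arbitrary choices):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(6)
n, mu, sigma = 6, 3.0, 2.0
x = rng.normal(mu, sigma, size=(500_000, n))

xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=0)                      # S as defined in Lecture 4 (divide by n)
t_stat = (xbar - mu) / (s / np.sqrt(n - 1))

for c in [-2.0, 0.0, 1.0, 2.5]:
    print(c, np.mean(t_stat <= c), t_dist(df=n - 1).cdf(c))   # empirical vs. exact; close agreement
```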


5.2 A Preliminary Calculation

Before turning to the F distribution, we calculate the density of $U = X_1/X_2$ where $X_1$ and $X_2$ are independent, positive random variables. Let $Y = X_2$, so that $X_1 = UY$, $X_2 = Y$ ($X_1$, $X_2$, U, Y are all greater than zero). The Jacobian is
\[
\frac{\partial(x_1, x_2)}{\partial(u, y)} =
\begin{vmatrix}
y & u\\
0 & 1
\end{vmatrix}
= y.
\]
Thus $f_{UY}(u, y) = f_{X_1X_2}(x_1, x_2)\,y = y\,f_{X_1}(uy)f_{X_2}(y)$, and the density of U is
\[
h(u) = \int_0^{\infty} y\,f_{X_1}(uy)f_{X_2}(y)\,dy.
\]

Now we take $X_1$ to be $\chi^2(m)$ and $X_2$ to be $\chi^2(n)$. The density of $X_1/X_2$ is
\[
h(u) = \frac{1}{2^{(m+n)/2}\Gamma(m/2)\Gamma(n/2)}\,u^{(m/2)-1}\int_0^{\infty} y^{[(m+n)/2]-1}e^{-y(1+u)/2}\,dy.
\]
The substitution $z = y(1 + u)/2$ gives
\[
h(u) = \frac{1}{2^{(m+n)/2}\Gamma(m/2)\Gamma(n/2)}\,u^{(m/2)-1}\int_0^{\infty} \frac{z^{[(m+n)/2]-1}}{[(1 + u)/2]^{[(m+n)/2]-1}}\,e^{-z}\,\frac{2}{1 + u}\,dz.
\]
We abbreviate $\Gamma(a)\Gamma(b)/\Gamma(a + b)$ by $\beta(a, b)$. (We will have much more to say about this when we discuss the beta distribution later in the lecture.) The above formula simplifies to
\[
h(u) = \frac{1}{\beta(m/2, n/2)}\,\frac{u^{(m/2)-1}}{(1 + u)^{(m+n)/2}}, \quad u \ge 0.
\]

5.3 Definition and Discussion

The F density is defined as follows. Let $X_1$ and $X_2$ be independent, with $X_1 = \chi^2(m)$ and $X_2 = \chi^2(n)$. With U as in (5.2), let
\[
W = \frac{X_1/m}{X_2/n} = \frac{n}{m}\,U
\]
so that
\[
f_W(w) = f_U(u)\left|\frac{du}{dw}\right| = \frac{m}{n}\,f_U\!\left(\frac{m}{n}\,w\right).
\]
Thus W has density
\[
\frac{(m/n)^{m/2}}{\beta(m/2, n/2)}\,\frac{w^{(m/2)-1}}{[1 + (m/n)w]^{(m+n)/2}}, \quad w \ge 0,
\]
the F density with m and n degrees of freedom.
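The definition can be checked by simulation against SciPy's F distribution (a sketch that is not part of the original notes; the degrees of freedom are arbitrary choices):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
m, n = 4, 9
# W = (chi^2(m)/m) / (chi^2(n)/n), built from independent chi-square samples.
w = (rng.chisquare(m, 500_000) / m) / (rng.chisquare(n, 500_000) / n)

for c in [0.5, 1.0, 2.0, 4.0]:
    print(c, np.mean(w <= c), f_dist(m, n).cdf(c))    # empirical vs. exact F(m, n) CDF
```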


5.4 Definitions and Calculations

The beta function is given by
\[
\beta(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx, \quad a, b > 0.
\]
We will show that
\[
\beta(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)},
\]
which is consistent with our use of $\beta(a, b)$ as an abbreviation in (5.2). We make the change of variable $t = x^2$ to get
\[
\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt = 2\int_0^{\infty} x^{2a-1}e^{-x^2}\,dx.
\]
We now use the familiar trick of writing $\Gamma(a)\Gamma(b)$ as a double integral and switching to polar coordinates. Thus
\[
\Gamma(a)\Gamma(b) = 4\int_0^{\infty}\int_0^{\infty} x^{2a-1}y^{2b-1}e^{-(x^2+y^2)}\,dx\,dy
= 4\int_0^{\pi/2}\int_0^{\infty} (\cos\theta)^{2a-1}(\sin\theta)^{2b-1}e^{-r^2}r^{2a+2b-1}\,dr\,d\theta.
\]
The change of variable $u = r^2$ yields
\[
\int_0^{\infty} r^{2a+2b-1}e^{-r^2}\,dr = (1/2)\int_0^{\infty} u^{a+b-1}e^{-u}\,du = \Gamma(a + b)/2.
\]
Thus
\[
\frac{\Gamma(a)\Gamma(b)}{2\Gamma(a + b)} = \int_0^{\pi/2} (\cos\theta)^{2a-1}(\sin\theta)^{2b-1}\,d\theta.
\]
Let $z = \cos^2\theta$, $1 - z = \sin^2\theta$, $dz = -2\cos\theta\sin\theta\,d\theta = -2z^{1/2}(1 - z)^{1/2}\,d\theta$. The above integral becomes
\[
-\frac{1}{2}\int_1^0 z^{a-1}(1 - z)^{b-1}\,dz = \frac{1}{2}\int_0^1 z^{a-1}(1 - z)^{b-1}\,dz = \frac{1}{2}\,\beta(a, b)
\]
as claimed. The beta density is
\[
f(x) = \frac{1}{\beta(a, b)}\,x^{a-1}(1 - x)^{b-1}, \quad 0 \le x \le 1 \quad (a, b > 0).
\]
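The identity $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ is easy to verify numerically for particular a and b (a sketch that is not part of the original notes; it assumes SciPy, and a = 2.5, b = 4 are arbitrary choices):

```python
from scipy.integrate import quad
from scipy.special import gamma as Gamma

a, b = 2.5, 4.0
beta_integral, _ = quad(lambda x: x**(a - 1) * (1 - x)**(b - 1), 0, 1)
print(beta_integral, Gamma(a) * Gamma(b) / Gamma(a + b))   # the two values agree
```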


Problems

1. Let X have the beta distribution with parameters a and b. Find the mean and variance of X.

2. Let T have the T distribution with 15 degrees of freedom. Find the value of c which makes $P\{-c \le T \le c\} = .95$.

3. Let W have the F distribution with m and n degrees of freedom (abbreviated $W = F(m, n)$). Find the distribution of 1/W.

4. A typical table of the F distribution gives values of $P\{W \le c\}$ for c = .9, .95, .975 and .99. Explain how to find $P\{W \le c\}$ for c = .1, .05, .025 and .01. (Use the result of Problem 3.)

5. Let X have the T distribution with n degrees of freedom (abbreviated $X = T(n)$). Show that $T^2(n) = F(1, n)$; in other words, $T^2$ has an F distribution with 1 and n degrees of freedom.

6. If X has the exponential density $e^{-x}$, $x \ge 0$, show that 2X is $\chi^2(2)$. Deduce that the quotient of two exponential random variables is $F(2, 2)$.