
Chapter 1

Random Variables and their Distributions

1.1 Random Variables

Definition 1.1. A random variable X is a function that assigns one and only one numerical value to each outcome of an experiment, that is

X : Ω → X ⊆ R,

where Ω is a sample space and X is the set of all possible values of X.

We will denote rvs by capital letters: X, Y or Z, and their values by small letters: x, y or z, respectively.

There are two types of rvs: discrete and continuous. Random variables that take a countable number of values are called discrete. Random variables that take values from an interval of real numbers are called continuous.


Example 1.1. In an efficacy preclinical trial a drug candidate is tested on mice. If the observed response can be either “efficacious” or “non-efficacious”, then we have

Ω = {ω1, ω2},

where ω1 is an outcome of the experiment meaning an efficacious response to the drug and ω2 is an outcome meaning a non-efficacious response. Assume that P(ω1) = p and P(ω2) = q = 1 − p.

It is natural to define a random variable X as follows:

X(ω1) = 1, X(ω2) = 0.

Then

PX(X = 1) = P({ωj ∈ Ω : X(ωj) = 1}) = P(ω1) = p

and

PX(X = 0) = P({ωj ∈ Ω : X(ωj) = 0}) = P(ω2) = q.

Here X = {x1, x2} = {0, 1}.

Here, PX is an induced probability function on X, defined in terms of the probability function on Ω.
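
As a quick illustration (not part of the original notes), the induced distribution can be approximated by simulation; the response probability p = 0.3 below is an arbitrary choice:

    import random

    p = 0.3      # assumed probability of an efficacious response (illustrative only)
    random.seed(1)

    # simulate many independent outcomes of the experiment
    outcomes = ["omega1" if random.random() < p else "omega2" for _ in range(100_000)]
    X = [1 if w == "omega1" else 0 for w in outcomes]   # X(omega1) = 1, X(omega2) = 0

    # relative frequencies approximate the induced probabilities P_X(X = 1) = p, P_X(X = 0) = q
    print(sum(x == 1 for x in X) / len(X))   # close to 0.3
    print(sum(x == 0 for x in X) / len(X))   # close to 0.7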

Example 1.2. In the same experiment we might observe a continuous outcome, for example, time to a specific response. Then

Ω = {ω : ω ∈ (0, ∞)}.

Many random variables could be of interest in this case. For example, it could be a continuous rv, such as

X : Ω → X,

where X(ω) = ln ω. Here X = (−∞, ∞). Then, for example,

P(X ∈ [−1, 1]) = P({ω ∈ Ω : X(ω) ∈ [−1, 1]}) = P(ω ∈ [e^{-1}, e]).

On the other hand, if we are interested in just two events, such as, for example, that the time is less than a pre-specified value, say t⋆, or that it is more than this value, then we categorize the outcome and the sample space is

Ω = {ω1, ω2},

where ω1 means that the observed time to a reaction was shorter than or equal to t⋆ and ω2 means that the time was longer than t⋆. Then we can define a discrete random variable as in Example 1.1.

Definition 1.2. If g : R → R is a monotone, continuous function, X : Ω → X ⊆ R is a random variable and Y = g(X), then Y is a random variable such that

Y : Ω → Y ⊆ R and Y(ω) = g(X(ω))

for all ω ∈ Ω.

Example 1.3. In an experiment we toss two fair coins. Then the sample space is the set of pairs, i.e., Ω = {tt, th, ht, hh}, where t denotes tail and h denotes head. Define a rv X as the number of heads, that is X : Ω → X, where X = {0, 1, 2}, i.e.:

X(tt) = 0,

X(th) = X(ht) = 1,

X(hh) = 2.

Then Y = 2 − X denotes the number of tails obtained in the experiment and Y : Ω → Y, where Y = {0, 1, 2}. That is, Y(ω) = 2 − X(ω); for example, Y(tt) = 2 − X(tt) = 2.
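
A minimal sketch of this transformation, enumerating the sample space directly (the code and names are illustrative, not part of the notes):

    # sample space for tossing two fair coins
    Omega = ["tt", "th", "ht", "hh"]

    def X(w):
        # number of heads in the outcome w
        return w.count("h")

    def Y(w):
        # number of tails, defined through the transformation Y(w) = 2 - X(w)
        return 2 - X(w)

    for w in Omega:
        print(w, X(w), Y(w))    # e.g. Y("tt") = 2 - X("tt") = 2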


1.2 Distribution Functions

Definition 1.3. The probability of the event (X ≤ x), expressed as a function of x ∈ R and denoted by

FX(x) = PX(X ≤ x),

is called the cumulative distribution function (cdf) of the rv X.

Example 1.4. The cdf of the rv defined in Example 1.1 can be written as

FX(x) =
  0,          for x ∈ (−∞, 0);
  q,          for x ∈ [0, 1);
  q + p = 1,  for x ∈ [1, ∞).

Properties of cumulative distribution functions are given in the following theorem.

Theorem 1.1. The function FX(x) is a cdf iff the following conditions hold:

(i) The function is nondecreasing, that is, if x1 < x2 then FX(x1) ≤ FX(x2);

(ii) lim_{x→−∞} FX(x) = 0;

(iii) lim_{x→∞} FX(x) = 1;

(iv) FX(x) is right-continuous.

Example 1.5. We will show that

FX(x) = 1 / (1 + e^{-x})

is the cdf of a rv X. It is enough to show that the function meets the requirements of Theorem 1.1. To check that the function is nondecreasing we may check the sign of the derivative of FX(x). It is

F′X(x) = e^{-x} / (1 + e^{-x})^2 > 0,

showing that FX(x) is increasing. Moreover,

lim_{x→−∞} FX(x) = 0 since lim_{x→−∞} e^{-x} = ∞;

lim_{x→∞} FX(x) = 1 since lim_{x→∞} e^{-x} = 0.

Furthermore, it is a continuous function, not only right-continuous.
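
A short symbolic check of these properties (a sympy sketch, assuming that tool is acceptable here; not part of the original notes):

    import sympy as sp

    x = sp.symbols("x", real=True)
    F = 1 / (1 + sp.exp(-x))             # the candidate cdf from Example 1.5

    print(sp.simplify(sp.diff(F, x)))    # exp(-x)/(1 + exp(-x))**2, positive for every x
    print(sp.limit(F, x, -sp.oo))        # 0
    print(sp.limit(F, x, sp.oo))         # 1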

Now we can define discrete and continuous rvs more formally.

Definition 1.4. A random variable X is continuous if FX(x) is a continuous function of x. A random variable X is discrete if FX(x) is a step function of x.

1.3 Density and Mass Functions

1.3.1 Discrete Random Variables

Values of a discrete rv are elements of a countable set {x1, x2, . . .}. We associate a number pX(xi) = PX(X = xi) with each value xi, i = 1, 2, . . ., such that:

1. pX(xi) ≥ 0 for all i;

2. ∑_{i=1}^{∞} pX(xi) = 1.


Note that

FX(xi) = PX(X ≤ xi) = ∑_{x≤xi} pX(x),          (1.1)

pX(xi) = FX(xi) − FX(xi−1).          (1.2)

The function pX is called the probability mass function (pmf) of the random variable X, and the collection of pairs

(xi, pX(xi)), i = 1, 2, . . .          (1.3)

is called the probability distribution of X. The distribution is usually presented in either tabular, graphical or mathematical form.

Example 1.6. Consider Example 1.1, but now we have n mice and we observe efficacy or no efficacy for each mouse independently. We are interested in the number of mice which respond positively to the applied drug candidate. If Xi is a random variable as defined in Example 1.1 for each mouse, and we may assume that the probability of a positive response is the same for all mice, then we may create a new random variable X as the sum of all Xi, that is,

X = ∑_{i=1}^{n} Xi.

X counts the number of successes in n independent trials and has a binomial distribution, which we denote by

X ∼ Bin(n, p),

where p is the probability of success. The pmf of a binomially distributed rv X with parameters n and p is

P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},    x = 0, 1, 2, . . . , n,

where n is a positive integer and 0 ≤ p ≤ 1.


For example, for n = 8 and the probability of success p = 0.4 we obtain the following values of the pmf and the cdf:

k           0        1        2        3        4        5        6        7        8
P(X = k)    0.0168   0.0896   0.2090   0.2787   0.2322   0.1239   0.0413   0.0079   0.0007
P(X ≤ k)    0.0168   0.1064   0.3154   0.5941   0.8263   0.9502   0.9915   0.9993   1.0000

Here, for example, we could say that the probability of exactly 5 mice responding positively to the drug candidate is P(X = 5) = 0.1239, while the probability that at most 5 mice will respond positively is P(X ≤ 5) = 0.8263.
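
The table can be reproduced numerically; a short Python sketch using scipy (an assumed tool, not part of the notes):

    from scipy.stats import binom

    n, p = 8, 0.4
    for k in range(n + 1):
        # pmf and cdf of Bin(8, 0.4), rounded to four decimals as in the table above
        print(k, round(binom.pmf(k, n, p), 4), round(binom.cdf(k, n, p), 4))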

1.3.2 Continuous Random Variables

Values of a continuous rv are elements of an uncountable set, for example a real interval. A cdf of a continuous rv is a continuous, nondecreasing, differentiable function. We define the probability density function (pdf) of a continuous rv as:

fX(x) = d/dx FX(x).          (1.4)

Hence,

FX(x) = ∫_{−∞}^{x} fX(t) dt.          (1.5)

Similarly to the properties of the probability mass function of a discrete rv, we have the following properties of the probability density function:

1. fX(x) ≥ 0 for all x ∈ X;

2. ∫_X fX(x) dx = 1.

The probability of the event that X ∈ (−∞, a) is expressed as an integral

PX(−∞ < X < a) = ∫_{−∞}^{a} fX(x) dx = FX(a)          (1.6)


or, for a bounded interval (b, c), as

PX(b < X < c) = ∫_{b}^{c} fX(x) dx = FX(c) − FX(b).          (1.7)

An interesting difference from a discrete rv is that, for δ > 0,

PX(X = x) = lim_{δ→0+} (FX(x + δ) − FX(x − δ)) = 0.

Example 1.7. Let X be a random variable with the pdf given by

f(x) = c sin x · I_{[0,π]}(x).

Find the value of c.

Here we use the notation of the indicator function I_X(x), whose meaning is as follows:

I_X(x) =
  1, if x ∈ X;
  0, otherwise.

For a function to be a pdf of a random variable, it must meet two conditions: it must be non-negative and integrate to 1 (over the whole region). Hence c must be a positive number and

∫_{−∞}^{∞} f(x) dx = ∫_{0}^{π} c sin x dx = 1.

We have

∫_{0}^{π} c sin x dx = −c cos x |_{0}^{π} = −c(−1 − 1) = 2c.

Hence c = 1/2.
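
A quick symbolic confirmation of this value (a sympy sketch, not part of the original notes):

    import sympy as sp

    x = sp.symbols("x")
    c = sp.symbols("c", positive=True)

    # solve  integral of c*sin(x) over [0, pi]  =  1  for c
    print(sp.solve(sp.Eq(sp.integrate(c * sp.sin(x), (x, 0, sp.pi)), 1), c))   # [1/2]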


1.4 Expected Values

Definition 1.5. The expected value of a function g(X) is defined by

E(g(X)) =
  ∫_{−∞}^{∞} g(x) f(x) dx,        for a continuous rv,
  ∑_{j=0}^{∞} g(xj) p(xj),        for a discrete rv,

and g is any function such that E|g(X)| < ∞.

Two important special cases of g are:

• g(X) = X, which gives the mean E(X), also denoted by EX;

• g(X) = (X − EX)^2, which gives the variance var(X) = E(X − EX)^2.

The following relation is very useful while calculating the variance:

E(X − EX)^2 = EX^2 − (EX)^2.          (1.8)

Example 1.8. Let X be a random variable such that

f(x) =
  (1/2) sin x,  for x ∈ [0, π],
  0,            otherwise.

Then the expectation and variance are as follows:

• Expectation

EX = (1/2) ∫_{0}^{π} x sin x dx = π/2.


• Variance

var(X) = EX^2 − (EX)^2
       = (1/2) ∫_{0}^{π} x^2 sin x dx − (π/2)^2
       = (π^2 − 4)/2 − π^2/4
       = π^2/4 − 2.
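
Both integrals are easy to verify symbolically; a sympy sketch (assumed tooling, not part of the notes) confirming the values of EX and var(X) above:

    import sympy as sp

    x = sp.symbols("x")
    f = sp.sin(x) / 2                              # pdf on [0, pi], zero elsewhere

    EX  = sp.integrate(x * f, (x, 0, sp.pi))       # pi/2
    EX2 = sp.integrate(x**2 * f, (x, 0, sp.pi))    # pi**2/2 - 2
    print(EX, sp.simplify(EX2 - EX**2))            # pi/2  and  pi**2/4 - 2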

The following useful properties of the expectation follow from properties of integration (summation).

Theorem 1.2. Let X be a random variable and let a, b and c be constants. Then for any functions g(x) and h(x) whose expectations exist we have:

(a) E[ag(X) + bh(X) + c] = aE[g(X)] + bE[h(X)] + c;

(b) If g(x) ≥ h(x) for all x, then E(g(X)) ≥ E(h(X));

(c) If g(x) ≥ 0 for all x, then E(g(X)) ≥ 0;

(d) If a ≥ g(x) ≥ b for all x, then a ≥ E(g(X)) ≥ b.

The mean and the variance of a random variable are the most important parameters used in the theory of statistics. The following theorem is a result of the properties of the expectation function.

Theorem 1.3. If X is a random variable with a finite variance, then for any constants a and b,

var(aX + b) = a^2 var X.


Proof

var(aX + b) = E(aX + b)^2 − [E(aX + b)]^2
            = E(a^2 X^2 + 2abX + b^2) − (a^2 (EX)^2 + 2ab EX + b^2)
            = a^2 EX^2 − a^2 (EX)^2 = a^2 var(X).

1.5 Families of Distributions

Distributions of random variables often depend on some parameters, like p, λ, µ, σ or others. These values are usually unknown and their estimation is one of the most important problems in statistical analyses. The parameters determine some characteristics of the shape of the pdf/pmf of the random variable, such as location, spread, skewness, etc.

1.5.1 Some families of discrete distributions

1. Uniform U(n) (equal mass at each outcome): The support set and the pmf are, respectively, X = {x1, x2, . . . , xn} and

P(X = xi) = 1/n,    xi ∈ X,

where n is a positive integer. In the special case of X = {1, 2, . . . , n} we have

E(X) = (n + 1)/2,    var(X) = (n + 1)(n − 1)/12.

Examples:


(a) X ≡ first digit in a randomly selected sequence of length n of 5 digits;

(b) X ≡ randomly selected student in a class of 15 students.

2. Bernoulli(p) (only two possible outcomes, usually called “success” and “failure”): The support set and the pmf are, respectively, X = {0, 1} and

P(X = x) = p^x (1 − p)^{1−x},    x ∈ X,

where p ∈ [0, 1] is the probability of success.

E(X) = p, var(X) = p(1− p).

Examples:

(a) X ≡ an outcome of tossing a coin;

(b) X ≡ detection of a fault in a tested semiconductor chip;

(c) X ≡ guessed answer in a multiple choice question.

3. Bin(n, p) (number of successes in n independent trials): The support set and the pmf are, respectively, X = {0, 1, 2, . . . , n} and

P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},    x ∈ X,

where p ∈ [0, 1] is the probability of success.

E(X) = np, var(X) = np(1− p).

Examples:

(a) X ≡ number of heads in several tosses of a coin;

(b) X ≡ number of semiconductor chips, among several tested chips, in which a test finds a defect;

(c) X ≡ number of correctly guessed answers in a multiple choice test of n questions.


4. Geom(p) (the number of independent Bernoulli trials until the first “success”): The support set and the pmf are, respectively, X = {1, 2, . . .} and

P(X = x) = p(1 − p)^{x−1} = pq^{x−1},    x ∈ X,

where p ∈ [0, 1] is the probability of success and q = 1 − p.

E(X) = 1/p,    var(X) = (1 − p)/p^2.

Here we will prove the formula for the expectation. By the definition of the expectation we have

E(X) = ∑_{x=1}^{∞} x p q^{x−1} = p ∑_{x=1}^{∞} x q^{x−1}.

Note that for x ≥ 1 the following equality holds:

d/dq (q^x) = x q^{x−1}.

Hence,

E(X) = p ∑_{x=1}^{∞} x q^{x−1} = p ∑_{x=1}^{∞} d/dq (q^x) = p d/dq ( ∑_{x=1}^{∞} q^x ).

The latter sum is a geometric series and is equal to q/(1 − q). Therefore,

E(X) = p d/dq ( q/(1 − q) ) = p · 1/(1 − q)^2 = p/p^2 = 1/p,

since 1 − q = p.

A similar method can be used to show that var(X) = q/p^2 (the second derivative of q^x with respect to q can be applied for this); a numerical spot-check of both formulas is sketched at the end of this list of distributions.

Examples include:


(a) X ≡ number of bits transmitted until the first error;

(b) X ≡ number of analyzed samples of air before a rare molecule is detected.

5. Poisson(λ) (the number of outcomes in a period of time or in a part of space): The support set and the pmf are, respectively, X = {0, 1, . . .} and

P(X = x) = (λ^x / x!) e^{−λ},

where λ > 0.

E(X) = λ,    var(X) = λ.

We will show that E(X) = λ. The result for the variance can be shown in a similar way. By definition of the expectation we have

E(X) = ∑_{x=0}^{∞} x λ^x e^{−λ} / x! = ∑_{x=1}^{∞} x λ^x e^{−λ} / x! = ∑_{x=1}^{∞} λ^x e^{−λ} / (x − 1)!.

Let z = x− 1. Then we obtain

E(X) = ∑_{z=0}^{∞} λ^{z+1} e^{−λ} / z! = λ ∑_{z=0}^{∞} λ^z e^{−λ} / z! = λ,

since λ^z e^{−λ} / z! is the pmf of a Poisson(λ) rv Z and therefore sums to 1 (see also the numerical check at the end of this list).

Examples:

(a) count of blood cells within a square of a haemocytometer slide;

(b) number of caterpillars on a leaf;

(c) number of plants of a rare variety in a square meter of a meadow;


[Figure 1.1: Density function and cumulative distribution function F(x) for N(4.5, 2).]

(d) number of tree seedlings in a square meter around a big tree;

(e) number of phone calls to a computer service within a minute.
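
Both the geometric and the Poisson moment formulas above can be spot-checked numerically; a small Python sketch (the values p = 0.3 and λ = 3 are arbitrary choices, not from the notes):

    from math import exp, factorial

    # geometric: E(X) = 1/p and var(X) = (1 - p)/p^2, checked for p = 0.3
    p, q = 0.3, 0.7
    geom = [(x, p * q**(x - 1)) for x in range(1, 200)]          # truncated support
    EX  = sum(x * px for x, px in geom)
    EX2 = sum(x**2 * px for x, px in geom)
    print(EX, 1 / p)                                             # both approx 3.3333
    print(EX2 - EX**2, q / p**2)                                 # both approx 7.7778

    # Poisson: E(X) = var(X) = lambda, checked for lambda = 3
    lam = 3.0
    pois = [(x, lam**x * exp(-lam) / factorial(x)) for x in range(60)]
    EX  = sum(x * px for x, px in pois)
    EX2 = sum(x**2 * px for x, px in pois)
    print(EX, EX2 - EX**2)                                       # both approx 3.0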

1.5.2 Some families of continuous distributions

The most common continuous distribution is called Normal and denoted by N(µ, σ^2). The density function is given by

fX(x) = (1 / (σ√(2π))) e^{−(x−µ)^2 / (2σ^2)},    x ∈ R.

There are two parameters, which tell us about the location E(X) = µ and the spread var(X) = σ^2 of the density curve.

There is an extensive theory of statistical analysis for data which are realizations of normally distributed random variables. This distribution is the most common in applications, but sometimes it is not feasible to assume that what we observe can indeed be a sample from such a population. In Example 1.2 we observe time to a specific response to a drug candidate. Such a variable can only take nonnegative values, while the normal rv’s domain is R. A lognormal distribution is often used in such cases. X has a lognormal distribution if log X is normally distributed.
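
A small simulation sketch of this relationship (numpy assumed; the parameters µ = 1.0 and σ = 0.5 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)

    # if X is lognormal with underlying parameters mu and sigma, then log(X) ~ N(mu, sigma^2)
    mu, sigma = 1.0, 0.5
    X = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

    logX = np.log(X)
    print(logX.mean(), logX.std())    # close to mu = 1.0 and sigma = 0.5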

Other popular continuous distributions include:

1. Uniform, U(a, b): The support set and the pdf are, respectively, X = [a, b] and

fX(x) = (1/(b − a)) I_{[a,b]}(x).

E(X) = (a + b)/2,    var(X) = (b − a)^2 / 12.

2. Exp(λ): The support set and the pdf are, respectively, X = [0, ∞) and

fX(x) = λ e^{−λx} I_{[0,∞)}(x),

where λ > 0.

E(X) = 1/λ,    var(X) = 1/λ^2.

Examples:

• X ≡ the time between arrivals of e-mails on your computer;

• X ≡ the distance between major cracks in a motorway;

• X ≡ the life length of car voltage regulators.

3. Gamma(α, λ): An exponential rv describes the length of time or space between counts. The length until r counts occur is a generalization of such a process and the respective rv is called the Erlang random variable. Its pdf is given by

fX(x) = (x^{r−1} λ^r e^{−λx} / (r − 1)!) I_{[0,∞)}(x),    r = 1, 2, . . .

Page 17: Chapter 1 Random Variables and their Distributionsbb/SM_Notes_1.1-1.6.pdf · 2010-10-09 · Chapter 1 Random Variables and their Distributions 1.1 Random Variables Definition 1.1

1.5. FAMILIES OF DISTRIBUTIONS 19

The Erlang distribution is a special case of the Gamma distribution, in which the parameter r is generalized to any positive number, usually denoted by α. In the gamma distribution we use the generalization of the factorial represented by the gamma function, as follows:

Γ(α) = ∫_{0}^{∞} x^{α−1} e^{−x} dx,    for α > 0.

A recursive relationship, which may be easily shown by integrating the above equation by parts, is

Γ(α) = (α− 1)Γ(α− 1).

If α is a positive integer, then

Γ(α) = (α− 1)!

since Γ(1) = 1.

The pdf is

fX(x) = (x^{α−1} λ^α e^{−λx} / Γ(α)) I_{[0,∞)}(x),

where α > 0, λ > 0 and Γ(·) is the gamma function. The mean and the variance of the gamma rv are

E(X) = α/λ,    var(X) = α/λ^2.

4. χ^2(ν): A rv X has a Chi-square distribution with ν degrees of freedom iff X is gamma distributed with parameters α = ν/2 and λ = 1/2. This distribution is used extensively in interval estimation and hypothesis testing. Its values are tabulated.

Example 1.9. Denote by X the time until α responses are received at an on-line computer terminal. Assume that the rv X has approximately a gamma distribution with mean four seconds and variance eight seconds^2.

Page 18: Chapter 1 Random Variables and their Distributionsbb/SM_Notes_1.1-1.6.pdf · 2010-10-09 · Chapter 1 Random Variables and their Distributions 1.1 Random Variables Definition 1.1

20 CHAPTER 1. RANDOM VARIABLES AND THEIR DISTRIBUTIONS

(a) Write the probability density function for X.

(b) What is the probability that α responses are received on the computer terminal in no more than five seconds?

Solution:

(a) We have

E(X) = α/λ = 4,    var(X) = α/λ^2 = 8.

This gives λ = 1/2 and α = 2, and so the pdf is

fX(x) = (x^{α−1} λ^α e^{−λx} / Γ(α)) I_{[0,∞)}(x) = (1/4) x e^{−x/2} I_{[0,∞)}(x),

as Γ(2) = 1.

(b) This is a chi-square distribution with ν = 4 degrees of freedom, and the value of F(5) = PX(X ≤ 5) = 0.7127 can be found in statistical tables.
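
The tabulated value can also be verified numerically; a sketch using scipy (an assumed tool, not part of the notes):

    from scipy.stats import chi2, gamma

    # Gamma with alpha = 2 and lambda = 1/2; scipy parameterizes by shape a and scale = 1/lambda
    print(gamma.cdf(5, a=2, scale=2))    # approx 0.7127
    # equivalently, the chi-square distribution with nu = 4 degrees of freedom
    print(chi2.cdf(5, df=4))             # approx 0.7127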

1.6 Moments and Moment Generating Functions

Definition 1.6. The nth moment (n ∈ N) of a random variable X is defined as

µ′_n = EX^n.

The nth central moment of X is defined as

µ_n = E(X − µ)^n,

where µ = µ′_1 = EX.

Page 19: Chapter 1 Random Variables and their Distributionsbb/SM_Notes_1.1-1.6.pdf · 2010-10-09 · Chapter 1 Random Variables and their Distributions 1.1 Random Variables Definition 1.1

1.6. MOMENTS AND MOMENT GENERATING FUNCTIONS 21

Note that the second central moment is the variance of a random variable X, usually denoted by σ^2.

Moments give an indication of the shape of the distribution of a random variable. Skewness and kurtosis are measured by the following functions of the third and fourth central moments, respectively:

the coefficient of skewness is given by

γ1 = E(X − µ)^3 / σ^3 = µ_3 / √(µ_2^3);

the coefficient of kurtosis is given by

γ2 = E(X − µ)^4 / σ^4 − 3 = µ_4 / µ_2^2 − 3.

Moments can be calculated from the definition or by using the so-called moment generating function.

Definition 1.7. The moment generating function (mgf) of a random variable X is a function MX : R → [0, ∞) given by

MX(t) = E e^{tX},

provided that the expectation exists for t in some neighborhood of zero.

More explicitly, the mgf of X can be written as

MX(t) = ∫_{−∞}^{∞} e^{tx} fX(x) dx,    if X is continuous,

MX(t) = ∑_{x∈X} e^{tx} P(X = x),    if X is discrete.

Page 20: Chapter 1 Random Variables and their Distributionsbb/SM_Notes_1.1-1.6.pdf · 2010-10-09 · Chapter 1 Random Variables and their Distributions 1.1 Random Variables Definition 1.1

22 CHAPTER 1. RANDOM VARIABLES AND THEIR DISTRIBUTIONS

The method to generate moments is given in the following theorem.

Theorem 1.4. If X has mgf MX(t), then

E(X^n) = MX^{(n)}(0),

where

MX^{(n)}(0) = d^n/dt^n MX(t) |_{t=0}.

That is, the n-th moment is equal to the n-th derivative of the mgf evaluated at t = 0.

Proof. We will prove the theorem for the discrete rv case. It is similar for a continuous rv. For n = 1 we have

d/dt MX(t) = d/dt ∑_{x∈X} e^{tx} PX(X = x)
           = ∑_{x∈X} ( d/dt e^{tx} ) PX(X = x)
           = ∑_{x∈X} ( x e^{tx} ) PX(X = x)
           = E(X e^{tX}).

Hence, evaluating the last expression at zero we obtain

d/dt MX(t) |_{t=0} = E(X e^{tX}) |_{t=0} = E(X).

For n = 2 we get

d^2/dt^2 MX(t) |_{t=0} = E(X^2 e^{tX}) |_{t=0} = E(X^2).

Analogously, it can be shown that for any n ∈ N we can write

d^n/dt^n MX(t) |_{t=0} = E(X^n e^{tX}) |_{t=0} = E(X^n).

Page 21: Chapter 1 Random Variables and their Distributionsbb/SM_Notes_1.1-1.6.pdf · 2010-10-09 · Chapter 1 Random Variables and their Distributions 1.1 Random Variables Definition 1.1

1.6. MOMENTS AND MOMENT GENERATING FUNCTIONS 23

Example 1.10. Find the mgf of X ∼ Exp(λ) and use the results of Theorem 1.4 to obtain the mean and variance of X.

By definition the mgf can be written as

MX(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} fX(x) dx.

For the exponential distribution we have

fX(x) = λ e^{−λx} I_{(0,∞)}(x),

where λ ∈ R+. Hence,

MX(t) = ∫_{0}^{∞} e^{tx} λ e^{−λx} dx = λ ∫_{0}^{∞} e^{(t−λ)x} dx = λ/(λ − t),

provided that |t| < λ.

Now, using Theorem 1.4 we obtain the first and the second moments, respectively:

E(X) = M′X(0) = λ/(λ − t)^2 |_{t=0} = 1/λ,

E(X^2) = MX^{(2)}(0) = 2λ/(λ − t)^3 |_{t=0} = 2/λ^2.

Hence, the variance of X is

var(X) = E(X^2) − [E(X)]^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.
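
The same derivatives can be checked symbolically; a sympy sketch (assumed tooling, not part of the original notes):

    import sympy as sp

    t, lam = sp.symbols("t lambda", positive=True)
    M = lam / (lam - t)                       # mgf of Exp(lambda), valid for |t| < lambda

    EX  = sp.diff(M, t, 1).subs(t, 0)         # 1/lambda
    EX2 = sp.diff(M, t, 2).subs(t, 0)         # 2/lambda**2
    print(EX, sp.simplify(EX2 - EX**2))       # 1/lambda and 1/lambda**2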

Moment generating functions provide methods for comparing distributions or finding their limiting forms. The following two theorems give us the tools.

Theorem 1.5. Let FX(x) and FY(y) be two cdfs all of whose moments exist. Then, if the mgfs of X and Y exist and are equal, i.e., MX(t) = MY(t) for all t in some neighborhood of zero, then FX(u) = FY(u) for all u.


Theorem 1.6. Suppose that X1, X2, . . . is a sequence of random variables, each with mgf M_{Xi}(t). Furthermore, suppose that

lim_{i→∞} M_{Xi}(t) = MX(t),    for all t in a neighborhood of zero,

and MX(t) is an mgf. Then there is a unique cdf FX whose moments are determined by MX(t) and, for all x where FX(x) is continuous, we have

lim_{i→∞} F_{Xi}(x) = FX(x).

This theorem means that convergence of mgfs implies convergence of cdfs.

Example 1.11. We know that the Binomial distribution can be approximated by a Poisson distribution when p is small and n is large. Using the above theorem we can confirm this fact.

The mgfs of Xn ∼ Bin(n, p) and of Y ∼ Poisson(λ) are, respectively:

M_{Xn}(t) = [p e^t + (1 − p)]^n,    MY(t) = e^{λ(e^t − 1)}.

We will show that the mgfs of Xn tend to the mgf of Y when np → λ.

We will need the following useful result given in the lemma:

Lemma 1.1. Let a1, a2, . . . be a sequence of numbers converging to a, that is, lim_{n→∞} an = a. Then

lim_{n→∞} (1 + an/n)^n = e^a.


Now, we can write

M_{Xn}(t) = (p e^t + (1 − p))^n
          = (1 + (1/n) np(e^t − 1))^n
          = (1 + np(e^t − 1)/n)^n
          → e^{λ(e^t − 1)} = MY(t)    as n → ∞,

by Lemma 1.1 applied with an = np(e^t − 1) → λ(e^t − 1).

Hence, by Theorem 1.6, the Binomial distribution converges to a Poisson distribution.
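
A brief numerical illustration of this approximation (a Python sketch; the choices n = 100 and λ = 3 are arbitrary):

    from math import comb, exp, factorial

    n, lam = 100, 3.0
    p = lam / n                                  # so that np = lambda

    for k in range(6):
        binom_pmf   = comb(n, k) * p**k * (1 - p)**(n - k)
        poisson_pmf = lam**k * exp(-lam) / factorial(k)
        print(k, round(binom_pmf, 4), round(poisson_pmf, 4))   # the two columns approximately agree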