5.1 Continuous Random Variables - Stony Brooklinli/teaching/ams-310/lecture-notes... · 2013. 3. 26. · A normally distributed random variable has a 5% chance of exceeding a z-score

5.1 Continuous Random Variables

Recall discrete random variables:

e.g. Sample space S: 3-dice throws

discrete random variable: X = sum on the three dice

Values for X: 3, 4, 5, 6, …, 16, 17, 18

Probabilities for X: 1/216, 3/216, 6/216, 10/216, …, 6/216, 3/216, 1/216

$$\sum_{i=3}^{18} P(x_i = i) = 1$$

Now consider continuous random variables:

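e.g. accidents on the LIE

continuous random variable: X = distance of accident from beginning of LIE

Values for X: any point along the LIE, 0 ≤ x ≤ 71.02 miles

[Figure: accident locations (*) marked along the LIE, on an axis running from 0 to 70 miles]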

“Bin” the LIE into n adjacent (non-overlapping) intervals, each of length $\Delta L$. ($\Delta L = 71.02/n$, where 71.02 is the length of the LIE in miles.) Suppose the measured probability distribution for seeing accidents in each of the intervals is given by

interval i:            1,     2,     3,     4,     …, n
P(x in interval i):    p_1,   p_2,   p_3,   p_4,   …, p_n

$$\sum_{i=1}^{n} p_i = 1$$

Write $p_i = f_i \Delta L$, expressing the probability distribution as

interval i:            1,        2,        3,        4,        …, n
P(x in interval i):    f_1 ∆L,   f_2 ∆L,   f_3 ∆L,   f_4 ∆L,   …, f_n ∆L

$$\sum_{i=1}^{n} f_i \Delta L = 1$$

Now consider what happens when $n \to \infty$ such that $\Delta L = 71.02/n \to 0$. Then $f_i \to f(x)$, where x is a specific point on the LIE. We see that the probability $P(x) = f(x)\Delta L \to 0$ as $\Delta L \to 0$, i.e. the probability becomes vanishingly small for any specific value x of the continuous random variable X.

Note that $\Delta L$ corresponds to the bin width. Therefore $f_i \equiv p_i/\Delta L$ corresponds to the probability density (chapter 2). In the limit $\Delta L \to 0$, $f(x)$ is the probability density at the point x. While the probability $p(x) = f(x)\Delta L$ vanishes as $\Delta L \to 0$, the probability density $f(x)$ does not.

For a continuous random variable, since the probability at a point x is vanishingly small, the appropriate question is what is the probability that the random variable will take a value over an interval (a,b)? The probability is given by

$$P(a \le x \le b) = \int_a^b f(x)\,dx$$

(provided the probability density f(x) is an integrable function for all values of the continuous random variable).

Thus we see that, for a continuous random variable described by a probability density function f(x), the probability of getting a value for the random variable in the range (a,b) is equal to the area under the probability density curve between a and b.

Since the probability at a single point is vanishingly small, the following probabilities are all equal

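$$P(a \le x \le b) = P(a < x \le b) = P(a \le x < b) = P(a < x < b)$$

[Figure: density curve f(x) versus x, with the areas P(x ≤ a), P(a ≤ x ≤ b) and P(b ≤ x) marked to the left of a, between a and b, and to the right of b]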

Note that the probability density function must satisfy $f(x) \ge 0$ for all x, and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

We define the cumulative distribution function F(x) (aka the distribution function) as the probability of getting a value in the interval $(-\infty, x)$

$$P(X < x) \equiv F(x) = \int_{-\infty}^{x} f(t)\,dt$$

Then the probability of getting a value in the interval $(x, \infty)$ is

$$P(X > x) = \int_{x}^{\infty} f(t)\,dt = \int_{-\infty}^{\infty} f(t)\,dt - \int_{-\infty}^{x} f(t)\,dt = 1 - F(x)$$

and the probability of getting a value in (a,b) is $F(b) - F(a)$.

Finally we notice, from the fundamental theorem of calculus,

$$f(x) = \frac{dF(x)}{dx} \quad \text{(wherever the derivative exists)}$$

e.g.

$$f(x) = \begin{cases} 2e^{-2x} & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

a) find $P(1 \le x \le 3)$
b) find $P(x \ge 0.5)$
c) find F(x)
d) find F(1)
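The four parts of this example can be checked numerically. The following is a minimal sketch, assuming the closed form F(x) = 1 − e^{−2x} for x > 0 obtained by integrating the density; the function names `pdf` and `cdf` are illustrative and not part of the notes.

```python
import math


def pdf(x: float) -> float:
    """Density f(x) = 2 e^{-2x} for x > 0, and 0 otherwise."""
    return 2.0 * math.exp(-2.0 * x) if x > 0 else 0.0


def cdf(x: float) -> float:
    """c) F(x) = integral of f from -infinity to x = 1 - e^{-2x} for x > 0."""
    return 1.0 - math.exp(-2.0 * x) if x > 0 else 0.0


p_a = cdf(3.0) - cdf(1.0)  # a) P(1 <= x <= 3) = e^{-2} - e^{-6}
p_b = 1.0 - cdf(0.5)       # b) P(x >= 0.5)    = e^{-1}
f_1 = cdf(1.0)             # d) F(1)           = 1 - e^{-2}

print(round(p_a, 4), round(p_b, 4), round(f_1, 4))
```

Because the probability at a single point is zero, the same `cdf` differences answer the question whether the interval endpoints are included or not.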

e.g. Find the appropriate value of k so that

$$f(x) = \begin{cases} 0 & \text{for } x \le 0 \\ k x e^{-4x^2} & \text{for } x > 0 \end{cases}$$

is a probability density function.
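One way to work this example is to impose the normalization condition on the density. The sketch below assumes the substitution $u = 4x^2$ (so $du = 8x\,dx$), which is a choice made here for illustration and not stated in the notes.

```latex
\int_{-\infty}^{\infty} f(x)\,dx
  = \int_{0}^{\infty} k\,x\,e^{-4x^{2}}\,dx
  = \frac{k}{8}\int_{0}^{\infty} e^{-u}\,du
  = \frac{k}{8} = 1
  \quad\Longrightarrow\quad k = 8
```

With k = 8 the density is also non-negative everywhere, so both requirements on a probability density are met.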

Note, for continuous probability distributions, the cumulative distribution F(x) (equivalently the probability density f(x)) is the fundamental descriptor of the distribution.

In analogy to discrete probability distributions, for continuous probability distributions we define the k’th moment about the origin as

$$\mu'_k = \int_{-\infty}^{\infty} x^k f(x)\,dx$$

The mean value of the distribution is the 1st moment about the origin

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$$

The k’th moment about the mean is

$$\mu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx$$

and the variance is the 2nd moment about the mean

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$

e.g. for

$$f(x) = \begin{cases} 2e^{-2x} & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

find the mean, variance and standard deviation.

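5.2 The Normal Distribution

The normal probability density (aka the “bell curve”)

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty$$

having mean value μ and standard deviation σ, is the most important continuous probability density.

[Figure: bell-shaped plot of $f(x; \mu, \sigma^2)$]

The normal distribution is

$$F(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-(s-\mu)^2/2\sigma^2}\,ds \qquad \text{(N)}$$

Under the change of variables $z = (x - \mu)/\sigma$, (N) becomes

$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt \qquad \text{(SN)}$$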

Every normal distribution (N) can be mapped to the standard normal distribution (SN) under the change of variables 𝒛 = (𝒙 − 𝝁)/𝝈. z is referred to as the z-score for the random value x.

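The integration of $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$:

Let $I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$, then

$$I^2 = \left( \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right)^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$$

Since the integrand is even in both x and y, restricting to the first quadrant gives

$$\frac{1}{4} I^2 = \int_{0}^{\infty} \int_{0}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy$$

Change to polar coordinates:

$$\frac{1}{4} I^2 = \int_{0}^{\infty} \int_{0}^{\pi/2} e^{-r^2/2}\, r\,d\theta\,dr = \frac{\pi}{2} \int_{0}^{\infty} r e^{-r^2/2}\,dr = \frac{\pi}{2}$$

$$I = \sqrt{2\pi}$$

This is the well-known complete Gauss integral.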

Because of the mapping (N) to (SN), it is possible to compute probabilities for any normal distribution (N) by tabulating only values for the standard normal distribution (SN) and using

$$F(x; \mu, \sigma^2) = F\left(z = \frac{x - \mu}{\sigma}\right)$$

Tabulated values of F(z) are given in Table 3 in Appendix B.

Note that the standard normal distribution can be interpreted as the normal distribution with $\mu = 0$ and $\sigma = 1$, i.e. $F(z) = F(x; 0, 1)$. (The literature often refers to the normal distribution by the notation N(μ, σ). In this notation, the standard normal distribution is denoted N(0,1).)

e.g. Using Table 3, find the probability that the z-score will take on a value
a) 0.87 < Z < 1.28
b) −0.34 < Z < 0.62
c) 0.85 < Z
d) −0.65 < Z
In all cases indicate the integration region under the probability density curve that gives the required probability.
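Table lookups like these can be cross-checked in code. This is a minimal sketch, assuming the standard identity F(z) = ½[1 + erf(z/√2)] for the standard normal distribution function; the helper name `phi` is illustrative and not from the notes.

```python
import math


def phi(z: float) -> float:
    """Standard normal distribution function F(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = phi(1.28) - phi(0.87)   # a) P(0.87 < Z < 1.28)
p_b = phi(0.62) - phi(-0.34)  # b) P(-0.34 < Z < 0.62)
p_c = 1.0 - phi(0.85)         # c) P(Z > 0.85)
p_d = 1.0 - phi(-0.65)        # d) P(Z > -0.65)

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, round(p, 4))
```

Results may differ from table-based answers in the last digit, because the table rounds each F(z) to four decimal places before subtracting.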

$z_\alpha$ Notation

Define $z_\alpha$ as the z-score such that the probability of exceeding $z_\alpha$ is α. That is, $z_\alpha$ satisfies

$$\alpha = P(Z > z_\alpha) = \int_{z_\alpha}^{\infty} f(t)\,dt = 1 - F(z_\alpha)$$

or equivalently $F(z_\alpha) = 1 - \alpha$.

Two important values for $z_\alpha$ are $z_{0.05}$ and $z_{0.01}$:

A normally distributed random variable has a 5% chance of exceeding a z-score of $z_{0.05}$

A normally distributed random variable has a 1% chance of exceeding a z-score of $z_{0.01}$

e.g. Using Table 3, find $z_{0.05}$ and $z_{0.01}$
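As a cross-check on the table lookup, the defining relation F(z_α) = 1 − α can be inverted numerically. The sketch below assumes Python 3.8+ and uses the standard library's `statistics.NormalDist`; the helper name `z_alpha` is illustrative.

```python
from statistics import NormalDist


def z_alpha(alpha: float) -> float:
    """Return the z-score that a standard normal variable exceeds with probability alpha."""
    # F(z_alpha) = 1 - alpha, so z_alpha is the inverse of F evaluated at 1 - alpha.
    return NormalDist().inv_cdf(1.0 - alpha)


z_05 = z_alpha(0.05)  # close to 1.645
z_01 = z_alpha(0.01)  # close to 2.326
print(round(z_05, 3), round(z_01, 3))
```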

Normal probabilities

If X is a continuous random variable having a normal distribution with mean μ and standard deviation σ, then

$$P(a < X \le b) = F(z(b)) - F(z(a)) = F\left(\frac{b - \mu}{\sigma}\right) - F\left(\frac{a - \mu}{\sigma}\right)$$

e.g. The maximum attenuation (MA) of a bar code scanner varies as a normally distributed random variable with mean 10.1 dB and standard deviation 2.7 dB.
a) On the next scan, what is the probability that the MA lies between 8.5 dB and 13.0 dB?
b) What proportion of scans yield MA between 8.5 and 13.0 dB?
c) What proportion of scans yield MA greater than 15.1 dB?

e.g. A machine automatically filling 4-oz cans of coffee delivers an actual amount that is a normally distributed random variable with σ = 0.04 oz. If only 2% of cans are to contain less than 4 oz of coffee, what should be the mean fill that the machine must achieve?
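Both examples reduce to evaluating or inverting the normal distribution function. The following is a rough sketch using `statistics.NormalDist` (Python 3.8+) instead of Table 3; the variable names are illustrative. Parts a) and b) of the scanner example ask for the same number, interpreted once as a probability and once as a long-run proportion.

```python
from statistics import NormalDist

# Bar code scanner: MA ~ normal with mean 10.1 dB and standard deviation 2.7 dB.
scanner = NormalDist(mu=10.1, sigma=2.7)
p_between = scanner.cdf(13.0) - scanner.cdf(8.5)  # parts a) and b)
p_above = 1.0 - scanner.cdf(15.1)                 # part c)

# Coffee filling: require P(X < 4) = 0.02 with sigma = 0.04 oz.
# (4 - mean) / sigma must equal the z-score with F(z) = 0.02, which is negative.
sigma_fill = 0.04
z_low = NormalDist().inv_cdf(0.02)
mean_fill = 4.0 - z_low * sigma_fill

print(round(p_between, 4), round(p_above, 4), round(mean_fill, 3))
```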

Continuity correction when approximating discrete random variables by a continuous normal distribution.

e.g. power outages per month are integers, 0, 1, 2, 3, … Assume the mean number of power outages per month is 11.6 with standard deviation of 3.3. Using a normal distribution, approximate the probability that there will be at least 8 (8 or more) outages in a given month.

The first inclination is to compute

$$P(x \ge 8) \approx 1 - F\left(z = \frac{8 - 11.6}{3.3}\right) = 1 - 0.1379 = 0.8621$$

A more careful result, which recognizes the discrete nature of the variable, is to “assign” the discrete value $x_i$ to the continuous range $(x_i - 0.5,\; x_i + 0.5)$. Therefore

$$P(x \ge 8) \approx 1 - F\left(z = \frac{7.5 - 11.6}{3.3}\right) = 1 - 0.1075 = 0.8925$$

The continuity correction changes the result by about 0.03 (3 percentage points).

5.3 Normal distribution approximation to the Binomial distribution

When $n \to \infty$ and $p \to 0$, we know that the Poisson distribution can be used to approximate the binomial distribution

$$b(x; n, p) \approx f(x; \lambda = np)$$

When $n \to \infty$ and $p \not\to 0$ (the approximation works best for p close to 0.5), the normal distribution can be used to approximate the binomial distribution

$$b(x; n, p) \approx f(x; \mu = np, \sigma^2 = np(1-p)) = \frac{1}{\sigma} f(z) \quad \text{with } z = \frac{x - \mu}{\sigma}$$

where f(z) is the standard normal density. Equivalently,

$$B(x; n, p) \approx F(x; np, np(1-p)) = F(z)$$

As the x values are discrete, the continuity correction should be used to compute z.

e.g. 20% of the memory chips made in a plant are defective. What is the probability that, in a lot of 100 chips,
a) at most 15 are defective
b) exactly 15 are defective

a) We want

$$P(x \le 15) = B(15; n = 100, p = 0.2) \approx F(15.5; \mu = 20, \sigma^2 = 16) = F\left(\frac{15.5 - 20}{4}\right) = F(-1.13) = 0.1292$$

b) We want

$$P(x = 15) = b(15; 100, 0.2) \approx F(15.5; \mu = 20, \sigma^2 = 16) - F(14.5; \mu = 20, \sigma^2 = 16) = F(-1.13) - F(-1.38) = 0.0454$$
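To see how close the approximation is in this example, the exact binomial probabilities can be computed alongside the continuity-corrected normal values. This is a minimal sketch with illustrative names (`binom_pmf`, `approx`); it uses unrounded z-scores, so its normal-approximation values differ slightly from the table-based figures above, which round z to two decimals.

```python
import math
from statistics import NormalDist

n, p = 100, 0.2
mu = n * p                           # mean of the binomial, np
sigma = math.sqrt(n * p * (1 - p))   # standard deviation, sqrt(np(1-p))
approx = NormalDist(mu, sigma)


def binom_pmf(x: int) -> float:
    """Exact binomial probability b(x; n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# a) at most 15 defective: exact sum versus F(15.5)
exact_at_most_15 = sum(binom_pmf(x) for x in range(16))
approx_at_most_15 = approx.cdf(15.5)

# b) exactly 15 defective: exact pmf versus F(15.5) - F(14.5)
exact_15 = binom_pmf(15)
approx_15 = approx.cdf(15.5) - approx.cdf(14.5)

print(exact_at_most_15, approx_at_most_15)
print(exact_15, approx_15)
```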

Comparison between binomial and normal distributions

[Figure: normal approximation to the binomial distribution; source: http://commons.wikimedia.org/wiki/File:Normal_approximation_to_binomial.png]

In practice, approximate the binomial with the normal if np > 15 and n(1 − p) > 15.

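5.5 The Uniform Distribution

The uniform distribution on the interval (α, β) has the density function

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{for } \alpha < x < \beta \\ 0 & \text{elsewhere} \end{cases}$$

[Figure: f(x) versus x, a rectangle of height 1/(β − α) between α and β, with the area between c and d shaded]

For α ≤ c < d ≤ β,

$$P(c < x < d) = \frac{d - c}{\beta - \alpha}$$

The mean μ of the uniform distribution is

$$\mu = \int_{\alpha}^{\beta} x \cdot \frac{1}{\beta - \alpha}\,dx = \frac{\alpha + \beta}{2}$$

The variance is

$$\sigma^2 = \int_{\alpha}^{\beta} x^2 \cdot \frac{1}{\beta - \alpha}\,dx - \mu^2 = \frac{1}{12}(\beta - \alpha)^2$$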

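5.6 The Log-Normal distribution

For an RV X with values $0 < x < \infty$, let $y = \ln x$ ($-\infty < y < \infty$). Assume y is a normally distributed random variable with mean value α and standard deviation β, i.e.

$$f(y) = \frac{1}{\sqrt{2\pi}\,\beta}\, e^{-(y-\alpha)^2/2\beta^2}, \quad -\infty < y < \infty$$

$$F(y) = \frac{1}{\sqrt{2\pi}\,\beta} \int_{-\infty}^{y} e^{-(s-\alpha)^2/2\beta^2}\,ds \qquad (*)$$

Replacing $x = e^y$ in (*) (i.e. substituting $s = \ln t$ in the integral) we have

$$F(x) = \frac{1}{\sqrt{2\pi}\,\beta} \int_{0}^{x} e^{-(\ln t-\alpha)^2/2\beta^2}\, \frac{1}{t}\,dt$$

and we see that x has the probability density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\beta}\, \frac{e^{-(\ln x-\alpha)^2/2\beta^2}}{x}, \quad 0 < x < \infty$$

The random variable X is said to be log-normally distributed.

NOTE: α and β are the mean and standard deviation of y = ln(x).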

e.g. x is a log-normally distributed variable. Find the probability that x lies in the range (a,b). We have to evaluate

$$P(a < x < b) = \int_a^b f(x)\,dx = \frac{1}{\sqrt{2\pi}\,\beta} \int_a^b \frac{e^{-(\ln x-\alpha)^2/2\beta^2}}{x}\,dx$$

Since $y = \ln x$ is normally distributed, it is simplest to evaluate the probability by substituting $y = \ln x$ into the integral, giving

$$P(a < x < b) = \frac{1}{\sqrt{2\pi}\,\beta} \int_{\ln a}^{\ln b} e^{-(y-\alpha)^2/2\beta^2}\,dy = F\left(\frac{\ln b - \alpha}{\beta}\right) - F\left(\frac{\ln a - \alpha}{\beta}\right)$$

where F() is the standard normal distribution function.

e.g. The gold content (oz/st, st = short ton (2000 lbs)) of ore in a mine is log-normally distributed; the logarithm of the gold content has mean −4.6 and variance 1.21 (i.e. α = −4.6 and β = √1.21 = 1.1). What is the probability of getting 0.0015 oz/st or less from an assayed ore?

$$F\left(\frac{\ln 0.0015 - (-4.6)}{1.1}\right) = F(-1.729) \approx 0.0419$$
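The z-score in this example is computed with the standard deviation of ln(x), which is the square root of the stated variance: β = √1.21 = 1.1. A short sketch of the calculation follows, assuming (as the negative mean implies) that the stated mean and variance are those of the logarithm of the gold content; the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = -4.6            # mean of y = ln(x)
beta = math.sqrt(1.21)  # standard deviation of y = ln(x), i.e. 1.1

# P(x <= 0.0015) = F((ln 0.0015 - alpha) / beta), with F the standard normal CDF.
z = (math.log(0.0015) - alpha) / beta
p = NormalDist().cdf(z)

print(round(z, 3), round(p, 4))
```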

Log-normally distributed variables appear commonly in geosciences: e.g. the value for the flow-permeability of a rock; ore content in rock

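Let x be a random variable with a log-normal distribution. Then $y = \ln(x)$ is a normally distributed random variable with density

$$f(y) = \frac{1}{\sqrt{2\pi}\,\beta}\, e^{-(y-\alpha)^2/2\beta^2}, \quad -\infty < y < \infty$$

where α and β are the mean value and standard deviation for y.

As we have seen, x has the density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\beta}\, \frac{e^{-(\ln x-\alpha)^2/2\beta^2}}{x}$$

What is the mean value of the distribution for x?

$$\mu = \frac{1}{\sqrt{2\pi}\,\beta} \int_{0}^{\infty} x\, \frac{e^{-(\ln x-\alpha)^2/2\beta^2}}{x}\,dx$$

To evaluate μ, use the substitution $x = e^y$:

$$\mu = \frac{1}{\sqrt{2\pi}\,\beta} \int_{-\infty}^{\infty} e^{y} e^{-(y-\alpha)^2/2\beta^2}\,dy = e^{\alpha + \beta^2/2}$$

In particular note that $\alpha \ne \ln(\mu)$.

The variance for x is

$$\sigma^2 = e^{2\alpha + \beta^2}\left(e^{\beta^2} - 1\right)$$

Again note that $\beta \ne \ln(\sigma)$.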

Log-normal distributions are positively skewed.

[Figure: f(x) for a log-normal variable with α = 0, β = 1 (the mean and standard deviation of y = ln(x)); for this case μ ≈ 1.649 and σ ≈ 2.16]

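5.7 The Gamma distribution

The Gamma distribution has the density function

$$f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & \text{for } x > 0,\ \alpha > 0,\ \beta > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

where $\Gamma(\alpha)$ is the gamma function defined by

$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\,dx$$

Integration by parts of the definition of $\Gamma(\alpha)$ shows that the gamma function obeys the recursion relation

$$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1), \quad \alpha > 1$$

Note, $\Gamma(1) = 1$.

Thus if α is a positive integer, we see that $\Gamma(\alpha) = (\alpha - 1)!$. The gamma function is therefore the generalization of the factorial function to positive real numbers.

The gamma density function is positively skewed.

[Figure: gamma densities for five (α, β) pairs; the parameter values did not survive extraction]

The mean value for the gamma distribution is given by

$$\mu = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_{0}^{\infty} x \cdot x^{\alpha-1} e^{-x/\beta}\,dx$$

Let $y = x/\beta$:

$$\mu = \frac{\beta}{\Gamma(\alpha)} \int_{0}^{\infty} y^{\alpha} e^{-y}\,dy = \frac{\beta\,\Gamma(\alpha + 1)}{\Gamma(\alpha)} = \frac{\beta\,\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha\beta$$

Similarly the variance is $\sigma^2 = \alpha\beta^2$.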

The Exponential distribution

In the special case α = 1, the gamma distribution becomes the exponential distribution

$$f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta} & \text{for } x > 0,\ \beta > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

having mean value and variance $\mu = \beta$, $\sigma^2 = \beta^2$.

Recall that a Poisson process is a sequence of Bernoulli trials operating in continuous time.

If a is the average number of successes per unit interval, then the probability distribution for the random variable “x successes in time t” is given by the Poisson distribution with λ = at.

The probability of 0 successes in time t is

$$P(0; at) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-at}$$

In deriving $e^{-at}$ we assumed the waiting time t (time to first (or next) success) was arbitrary (but fixed). To allow for any waiting time $0 < t < \infty$, we need to ensure that we have a properly defined density, i.e. we require

$$c \int_{0}^{\infty} e^{-at}\,dt = 1$$

Integrating gives c = a. Thus the probability distribution for the random variable “waiting time between successes” is given by an exponential distribution with $\beta = 1/a$.

e.g. the number of trucks arriving at a warehouse can be described by a Poisson distribution. An average of 3 trucks arrive per hour. What is the probability that the time between truck arrivals will be
a) less than 5 minutes?
b) at least 45 minutes?

a) a = 3;

$$P(t < 5 \text{ min}) = P\left(t < \tfrac{1}{12} \text{ hr}\right) = 3 \int_{0}^{1/12} e^{-3t}\,dt = 1 - e^{-0.25} = 0.221$$

b)

$$P(t > 45 \text{ min}) = P\left(t > \tfrac{3}{4} \text{ hr}\right) = 3 \int_{3/4}^{\infty} e^{-3t}\,dt = e^{-9/4} = 0.105$$
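The two waiting-time probabilities follow directly from the exponential distribution function F(t) = 1 − e^{−at}. A small sketch, with times converted to hours so that they match the rate a = 3 per hour; the function name `waiting_cdf` is illustrative.

```python
import math

rate = 3.0  # a: average number of truck arrivals per hour


def waiting_cdf(t_hours: float) -> float:
    """P(waiting time < t) for an exponential waiting time with beta = 1/rate."""
    return 1.0 - math.exp(-rate * t_hours) if t_hours > 0 else 0.0


p_a = waiting_cdf(5 / 60)         # a) less than 5 minutes
p_b = 1.0 - waiting_cdf(45 / 60)  # b) at least 45 minutes

print(round(p_a, 3), round(p_b, 3))
```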

5.8 The Beta distribution

Note that the normal distribution is applicable to random variables whose values range over $-\infty < x < \infty$. The gamma, exponential and log-normal distributions are applicable to random variables whose values range over $0 < x < \infty$. So far we have only one distribution, the uniform distribution, applicable to random variables whose values range over a finite interval $a < x < b$ (and its probability density is somewhat ‘rigid’). The beta distribution is a more ‘flexible’ probability distribution applicable to random variables whose values range over a finite interval.

Let $a < y < b$. Define $x = \dfrac{y - a}{b - a}$. Then $0 < x < 1$.

The beta distribution has the probability density, mean, and variance

$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1 - x)^{\beta-1} & \text{for } 0 < x < 1,\ \alpha > 0,\ \beta > 0 \\ 0 & \text{elsewhere} \end{cases}$$

$$\mu = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

Note: for α = β = 1, the beta distribution becomes the uniform distribution on $0 < x < 1$.

e.g. Random variable X is the proportion of highway sections in the country needing repair in a 1 year period. X has values $0 < x < 1$. Assume values for X have a beta distribution with α = 3, β = 2.
a) What is the average percentage of highway sections needing repair each year?
b) What is the probability that at most half of the sections will require repair?

a) $$\mu = \frac{3}{5} = 0.6 = 60\%$$

b) $$P(x < 0.5) = \frac{\Gamma(5)}{\Gamma(3)\Gamma(2)} \int_{0}^{0.5} x^2 (1 - x)\,dx = 12 \left[ \frac{x^3}{3} - \frac{x^4}{4} \right]_{0}^{0.5} = \frac{5}{16}$$
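The arithmetic in this example can be reproduced exactly with rational numbers. The sketch below hard-codes the antiderivative for the specific case α = 3, β = 2 (it is not a general beta distribution function), and the names are illustrative.

```python
from fractions import Fraction
from math import gamma

a, b = 3, 2  # alpha and beta of the example

# Normalising constant Gamma(a + b) / (Gamma(a) Gamma(b)) = 4! / (2! 1!) = 12
norm = gamma(a + b) / (gamma(a) * gamma(b))

# a) mean alpha / (alpha + beta)
mean = Fraction(a, a + b)


def beta_cdf_3_2(x: Fraction) -> Fraction:
    """F(x) for alpha=3, beta=2: 12 * (x^3/3 - x^4/4) = 4x^3 - 3x^4 on 0 <= x <= 1."""
    return 4 * x**3 - 3 * x**4


# b) probability that at most half of the sections need repair
p_half = beta_cdf_3_2(Fraction(1, 2))

print(norm, mean, p_half)
```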

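The Weibull distribution

density

$$f(x) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}} & \text{for } x > 0,\ \alpha > 0,\ \beta > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

mean

$$\mu = \alpha^{-1/\beta}\, \Gamma\left(1 + \frac{1}{\beta}\right)$$

variance

$$\sigma^2 = \alpha^{-2/\beta} \left\{ \Gamma\left(1 + \frac{2}{\beta}\right) - \left[ \Gamma\left(1 + \frac{1}{\beta}\right) \right]^2 \right\}$$

Consider $F(a) = P(x < a)$, the probability that a random variable with a Weibull distribution will have the value $x < a$:

$$P(x < a) = \int_{0}^{a} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}}\,dx$$

Let $y = x^{\beta}$, $dy = \beta x^{\beta-1}\,dx$:

$$P(y < a^{\beta}) = \int_{0}^{a^{\beta}} \alpha e^{-\alpha y}\,dy = 1 - e^{-\alpha a^{\beta}}$$

Then y is a random variable having an exponential distribution.

[Figure: Weibull densities for four (α, β) pairs; the parameter values did not survive extraction]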

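e.g. random variable X is: lifetime (in hours) of a battery. Values of X are distributed according to a Weibull distribution with α = 0.1, β = 0.5. Find
a) the average lifetime of a battery
b) the probability that a battery lasts more than 300 hours

a) $$\mu = 0.1^{-2}\, \Gamma(1 + 2) = 100 \cdot 2! = 200 \text{ hours}$$

b) With the substitution $y = x^{\beta}$, the lower limit x = 300 becomes $y = 300^{0.5}$:

$$P(x > 300) = \int_{300}^{\infty} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}}\,dx = \int_{300^{0.5}}^{\infty} 0.1\, e^{-0.1 y}\,dy = e^{-0.1\sqrt{300}} = 0.177$$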
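A short numerical check of both parts, assuming the survival function P(x > a) = e^{−α a^β} that follows from the substitution y = x^β. Note that the exponent uses 300^0.5 ≈ 17.32, not 300. The names `mean_life` and `survival` are illustrative.

```python
import math

alpha, beta = 0.1, 0.5

# a) mean lifetime: alpha^(-1/beta) * Gamma(1 + 1/beta)
mean_life = alpha ** (-1.0 / beta) * math.gamma(1.0 + 1.0 / beta)


def survival(x: float) -> float:
    """P(X > x) = exp(-alpha * x**beta) for a Weibull variable, x > 0."""
    return math.exp(-alpha * x**beta)


# b) probability that a battery lasts more than 300 hours
p_300 = survival(300.0)

print(round(mean_life, 1), round(p_300, 3))
```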

5.10 Joint Probability Distributions

Consider: the discrete random variable $X_1$ having values $x_1 = 0, 1, 2$ with probability distribution

$$f_1(x_1) \equiv P(X_1 = x_1)$$

and the discrete random variable $X_2$ having values $x_2 = 0, 1$ with probability distribution

$$f_2(x_2) \equiv P(X_2 = x_2)$$

Let S be a sample space of outcomes that “generate” values for both $X_1$ and $X_2$. We can ask, what is the probability $P(X_1 = x_1, X_2 = x_2)$ of the event consisting of all the outcomes in the sample space S having $X_1$ take on the value $x_1$ and $X_2$ take on the value $x_2$? We refer to the function

$$f(x_1, x_2) \equiv P(X_1 = x_1, X_2 = x_2)$$

as the joint probability distribution of $X_1$ and $X_2$.

Note we must have

$$f(x_1, x_2) \ge 0 \quad \text{and} \quad \sum_{x_1} \sum_{x_2} f(x_1, x_2) = 1$$

For discrete random variables, the joint probability distribution can be listed in tabular form

f(x1, x2)     x1 = 0    x1 = 1    x1 = 2    f2(x2)
x2 = 0        0.1       0.4       0.1       0.6
x2 = 1        0.2       0.2       0.0       0.4
f1(x1)        0.3       0.6       0.1       sum = 1

e.g.

a) What is $P(X_1 = 0, X_2 = 1)$?
b) What is $P(X_1 + X_2 > 1)$?
c) What is $P(X_2 = 1)$?
d) What is $P(X_1 = 1)$?
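Each of these questions is a sum over the relevant cells of the joint table. A minimal sketch, storing the table as a dictionary keyed by (x1, x2); the variable name `joint` is illustrative.

```python
# Joint probability distribution f(x1, x2) from the table, keyed by (x1, x2).
joint = {
    (0, 0): 0.1, (1, 0): 0.4, (2, 0): 0.1,
    (0, 1): 0.2, (1, 1): 0.2, (2, 1): 0.0,
}

# a) a single cell of the table
p_a = joint[(0, 1)]

# b) sum over all cells with x1 + x2 > 1
p_b = sum(p for (x1, x2), p in joint.items() if x1 + x2 > 1)

# c) marginal f2(1): sum across the row x2 = 1
p_c = sum(p for (x1, x2), p in joint.items() if x2 == 1)

# d) marginal f1(1): sum down the column x1 = 1
p_d = sum(p for (x1, x2), p in joint.items() if x1 == 1)

print(p_a, round(p_b, 10), round(p_c, 10), round(p_d, 10))
```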

The sum down each column

$$\sum_{x_2 = 0}^{1} f(x_1, x_2) = P(X_1 = x_1) = f_1(x_1)$$

is just the probability distribution for the RV $X_1$. $f_1(x_1)$ is referred to as the marginal probability distribution of $f(x_1, x_2)$ for the RV $X_1$.

The sum across each row

$$\sum_{x_1 = 0}^{2} f(x_1, x_2) = P(X_2 = x_2) = f_2(x_2)$$

is just the probability distribution for the RV $X_2$. $f_2(x_2)$ is referred to as the marginal probability distribution of $f(x_1, x_2)$ for the RV $X_2$.

Recall the definition of conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{(CP)}$$

Consider the probabilities of getting $X_1 = x_1$ given that $X_2 = 0$. From the first row of the table

f(x1, x2)     x1 = 0    x1 = 1    x1 = 2    f2(x2)
x2 = 0        0.1       0.4       0.1       0.6

we see that the conditional probability $P(X_1 = x_1 \mid X_2 = 0)$ is given by

$$P(X_1 = x_1 \mid X_2 = 0) = \frac{f(x_1, 0)}{f_2(0)}$$

e.g. $P(X_1 = 2 \mid X_2 = 0) = 0.1/0.6 = 1/6$

In general the conditional probability distribution for $X_1$ given $X_2$ is

$$P(X_1 = x_1 \mid X_2 = x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} \equiv f_1(x_1 \mid x_2)$$

Similarly the conditional probability distribution for $X_2$ given $X_1$ is

$$P(X_2 = x_2 \mid X_1 = x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} \equiv f_2(x_2 \mid x_1)$$

Compare:

x1            0      1      2
f1(x1)        0.3    0.6    0.1
f1(x1|0)      1/6    4/6    1/6
f1(x1|1)      1/2    1/2    0

Recall: event A is independent of event B if $P(A \mid B) = P(A)$;

equivalently (Theorem 3.9), two events A and B are independent events iff $P(A \cap B) = P(A) \cdot P(B)$.

Therefore, the two random variables $X_1$ and $X_2$ are independent if

$$f_1(x_1 \mid x_2) = f_1(x_1) \quad \text{for all } x_1 \text{ and } x_2$$

or equivalently,

$$f(x_1, x_2) = f_1(x_1) \cdot f_2(x_2) \quad \text{for all } x_1 \text{ and } x_2$$

We can see from the table above that $X_1$ and $X_2$ are NOT independent.

If we have k discrete random variables $X_1, X_2, \dots, X_k$ having values denoted $x_1, x_2, \dots, x_k$ respectively, then the joint probability distribution of these random variables is

$$f(x_1, x_2, \dots, x_k) \equiv P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k)$$

This joint distribution has k marginal probability distributions of the form

$$f_i(x_i) = P(X_i = x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_k} f(x_1, x_2, \dots, x_k)$$

Joint probability distributions for continuous random variables

Let $X_1, X_2, \dots, X_k$ be k continuous random variables. Then $f(x_1, x_2, \dots, x_k)$ is said to be the joint probability density of these random variables provided

a) the probability for $X_1, X_2, \dots, X_k$ to have values, respectively, in the ranges $a_1 \le x_1 \le b_1$, $a_2 \le x_2 \le b_2$, …, $a_k \le x_k \le b_k$ is given by

$$P(a_1 \le x_1 \le b_1, \dots, a_k \le x_k \le b_k) = \int_{a_k}^{b_k} \cdots \int_{a_2}^{b_2} \int_{a_1}^{b_1} f(x_1, x_2, \dots, x_k)\,dx_1\,dx_2 \cdots dx_k$$

b) $f(x_1, x_2, \dots, x_k) \ge 0$

c) $$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2, \dots, x_k)\,dx_1\,dx_2 \cdots dx_k = 1$$

The joint cumulative probability distribution function $F(x_1, x_2, \dots, x_k)$ is defined as

$$F(x_1, x_2, \dots, x_k) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_k \le x_k) = \int_{-\infty}^{x_k} \cdots \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} f(t_1, t_2, \dots, t_k)\,dt_1\,dt_2 \cdots dt_k$$

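e.g. JPD given by

$$f(x_1, x_2) = \begin{cases} 6e^{-2x_1 - 3x_2} & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find a) $P(1 < x_1 < 2,\ 2 < x_2 < 3)$

a) $$\int_{1}^{2} \int_{2}^{3} 6e^{-2x_1 - 3x_2}\,dx_2\,dx_1 = \left( 2 \int_{1}^{2} e^{-2x_1}\,dx_1 \right) \left( 3 \int_{2}^{3} e^{-3x_2}\,dx_2 \right) = \left[ -e^{-2x_1} \right]_{1}^{2} \left[ -e^{-3x_2} \right]_{2}^{3} = (e^{-2} - e^{-4})(e^{-6} - e^{-9})$$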

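The marginal probability density of the i’th RV, $f_i(x_i)$, is given by integrating the joint density over all the other variables:

$$f_i(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_k)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_k$$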

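b) $P(x_1 < 2,\ x_2 > 2)$

b) $$\int_{0}^{2} \int_{2}^{\infty} 6e^{-2x_1 - 3x_2}\,dx_2\,dx_1 = \left( 2 \int_{0}^{2} e^{-2x_1}\,dx_1 \right) \left( 3 \int_{2}^{\infty} e^{-3x_2}\,dx_2 \right) = \left[ -e^{-2x_1} \right]_{0}^{2} \left[ -e^{-3x_2} \right]_{2}^{\infty} = (1 - e^{-4})(e^{-6} - 0)$$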

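c) $F(x_1, x_2)$ d) $F(1,1)$

c) $$F(x_1, x_2) = \int_{0}^{x_1} \int_{0}^{x_2} 6e^{-2t - 3s}\,ds\,dt = \left( 2 \int_{0}^{x_1} e^{-2t}\,dt \right) \left( 3 \int_{0}^{x_2} e^{-3s}\,ds \right) = \left[ -e^{-2t} \right]_{0}^{x_1} \left[ -e^{-3s} \right]_{0}^{x_2}$$

$$F(x_1, x_2) = \begin{cases} (1 - e^{-2x_1})(1 - e^{-3x_2}) & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

d) $F(1,1) = (1 - e^{-2})(1 - e^{-3})$

e) $f_2(x_2)$

$$f_2(x_2) = \int_{0}^{\infty} 6e^{-2x_1 - 3x_2}\,dx_1 = 3e^{-3x_2} \cdot 2 \int_{0}^{\infty} e^{-2x_1}\,dx_1 = \left[ -e^{-2x_1} \right]_{0}^{\infty}\, 3e^{-3x_2} = (1)\, 3e^{-3x_2}$$

$$f_2(x_2) = \begin{cases} 3e^{-3x_2} & \text{for } x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$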

The k continuous random variables $X_1, X_2, \dots, X_k$ are said to be independent iff

$$f(x_1, x_2, \dots, x_k) = f_1(x_1) \cdot f_2(x_2) \cdots f_k(x_k) \quad \text{for all } x_1, x_2, \dots, x_k$$

This is equivalent to the statement: the k continuous random variables $X_1, X_2, \dots, X_k$ are independent iff

$$F(x_1, x_2, \dots, x_k) = F_1(x_1) \cdot F_2(x_2) \cdots F_k(x_k) \quad \text{for all } x_1, x_2, \dots, x_k$$

where $F_i(x_i)$ is the cumulative probability distribution for the density $f_i(x_i)$.

e.g. In our example, part e), we showed

$$f_2(x_2) = \begin{cases} 3e^{-3x_2} & \text{for } x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

This has the cumulative distribution

$$F_2(x_2) = \begin{cases} 1 - e^{-3x_2} & \text{for } x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Similarly,

$$F_1(x_1) = \begin{cases} 1 - e^{-2x_1} & \text{for } x_1 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

In part c) we showed

$$F(x_1, x_2) = \begin{cases} (1 - e^{-2x_1})(1 - e^{-3x_2}) & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Thus we see, in this example,

$$F(x_1, x_2) = F_1(x_1) \cdot F_2(x_2) \quad \text{for all } x_1, x_2$$

and the random variables $X_1, X_2$ are independent.

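Given two continuous random variables $X_1, X_2$, we define the conditional probability density of $X_1$ given that $X_2 = x_2$ as

$$f_1(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} \quad \text{provided } f_2(x_2) \ne 0$$

Similarly

$$f_2(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} \quad \text{provided } f_1(x_1) \ne 0$$

Consequently we see

$$f(x_1, x_2) = f_1(x_1 \mid x_2)\, f_2(x_2) = f_2(x_2 \mid x_1)\, f_1(x_1)$$

which gives Bayes’ theorem for continuous random variables

$$f_1(x_1 \mid x_2) = \frac{f_2(x_2 \mid x_1)\, f_1(x_1)}{f_2(x_2)}$$

e.g. Given the JPD

$$f(x_1, x_2) = \begin{cases} \dfrac{2}{3}(x_1 + 2x_2) & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find $f_1(x_1 \mid x_2)$.

$$f_1(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}$$

$$f_2(x_2) = \begin{cases} \displaystyle\int_{0}^{1} \frac{2}{3}(x_1 + 2x_2)\,dx_1 = \frac{1}{3}(1 + 4x_2) & \text{for } 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$

Therefore

$$f_1(x_1 \mid x_2) = \begin{cases} \dfrac{\frac{2}{3}(x_1 + 2x_2)}{\frac{1}{3}(1 + 4x_2)} = \dfrac{2x_1 + 4x_2}{1 + 4x_2} & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$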

Expectation

Let X be a random variable (continuous or discrete). Let $g(X)$ be a function of the random variable X.

e.g. if X is the daily temperature in degrees Fahrenheit, then $g(X) = \frac{5}{9}(X - 32)$ is the daily temperature in degrees centigrade.

The expected value (aka expectation) $E[g(X)]$ is defined as follows.

If X is a discrete random variable with probability distribution $f(x_i)$,

$$E[g(X)] \equiv \sum_{x_i} g(x_i)\, f(x_i)$$

If X is a continuous random variable with probability density $f(x)$,

$$E[g(X)] \equiv \int_{-\infty}^{\infty} g(x)\, f(x)\,dx$$

e.g. We see that the mean value is the expectation of the function $g(X) = X$. For that reason, the mean value μ is often written as $E[X]$ or $E(X)$.

e.g. the variance $\sigma^2$ is $E[(X - \mu)^2]$ (or $E(X - \mu)^2$ or $E[(X - E[X])^2]$).

You may also encounter “< >” notation for the expectation,

e.g. $E[g(X)] \equiv \langle g(X) \rangle$, $E[X] \equiv \langle X \rangle$, $E[(X - \mu)^2] \equiv \langle (X - \mu)^2 \rangle$

As we are now encountering multiple random variables, we need to enhance our notation to make sure we know what random variable we are referencing. Let X and Y denote two random variables. We can refer to the mean and variance of X by:

$$E[X] = \mu_X, \quad E[(X - \mu_X)^2] = \sigma_X^2 = Var(X)$$

and the mean and variance of Y by:

$$E[Y] = \mu_Y, \quad E[(Y - \mu_Y)^2] = \sigma_Y^2 = Var(Y)$$

e.g. Let X be a log-normally distributed R.V. Then $Y = \ln(X)$ is a normally distributed R.V. X has the density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\beta}\, \frac{e^{-(\ln x-\alpha)^2/2\beta^2}}{x}, \quad 0 < x < \infty$$

and Y has the density

$$f(y) = \frac{1}{\sqrt{2\pi}\,\beta}\, e^{-(y-\alpha)^2/2\beta^2}, \quad -\infty < y < \infty$$

Means and variances are given by

$$E[X] = \mu_X = e^{\alpha + \beta^2/2}, \quad Var(X) = \sigma_X^2 = e^{2\alpha + \beta^2}\left(e^{\beta^2} - 1\right)$$

$$E[Y] = \mu_Y = \alpha, \quad Var(Y) = \sigma_Y^2 = \beta^2$$

Properties of the Expectation

Let X be a continuous RV with density $f(x)$, mean $E[X]$, and variance $Var(X)$. Then $Y = g(X)$ is a new RV, whose density, mean, variance etc. need to be determined. The expectation has certain properties that may make it easier to determine things for Y. In particular, let $Y = g(X) = aX + b$ for constants a and b. Then

$$E[Y] = E[aX + b] = \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = a \int_{-\infty}^{\infty} x f(x)\,dx + b \int_{-\infty}^{\infty} f(x)\,dx = aE[X] + b$$

and

$$Var(Y) = Var(aX + b) = \int_{-\infty}^{\infty} (ax + b - a\mu_X - b)^2 f(x)\,dx = a^2 \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx = a^2\, Var(X)$$

Thus we have, for constants a and b:

$$E[aX + b] = aE[X] + b, \quad Var(aX + b) = a^2\, Var(X)$$

Although we have developed these two properties for continuous RVs, they can also be shown to hold for discrete RVs.

e.g. Let X be a normally distributed RV with mean μ and variance $\sigma^2$.

Then $Z \equiv g(X) = \dfrac{X - \mu}{\sigma} = \dfrac{1}{\sigma} X - \dfrac{\mu}{\sigma}$ is also a RV. From the two properties of the expectation just developed, we know that

$$E[Z] = \frac{1}{\sigma} E[X] - \frac{\mu}{\sigma} = \frac{1}{\sigma}\mu - \frac{\mu}{\sigma} = 0$$

and

$$Var(Z) = \left(\frac{1}{\sigma}\right)^2 Var(X) = \frac{1}{\sigma^2}\,\sigma^2 = 1$$

which confirms what we already know: the z-score RV Z has the standard normal density.

e.g. the RV X represents the daily amount (kilowatt-hours) of electricity used in a plating process. Assume the average usage is 10 kW-hrs per day with a standard deviation of 3 kW-hrs. The cost of electricity is $20/kW-hr. Find the daily average and standard deviation of the cost of electricity.

Let the RV Y denote the cost of electricity. Then $Y = 20X$. Therefore $E[Y] = 20E[X] = 20 \cdot 10 = \$200$, and $Var(Y) = 400\,Var(X) = 400 \cdot 9 = 3{,}600$, so $\sigma_Y = \$60$.

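Let $X_1, X_2, \dots, X_k$ be RVs. Then $Y = g(X_1, X_2, \dots, X_k)$ will also be an RV. The expectation (or mean) of Y is:

if Y is a discrete RV

$$E[Y] = E[g(X_1, X_2, \dots, X_k)] = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_k} g(x_1, x_2, \dots, x_k)\, f(x_1, x_2, \dots, x_k)$$

if Y is a continuous RV

$$E[Y] = E[g(X_1, X_2, \dots, X_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_k)\, f(x_1, x_2, \dots, x_k)\,dx_1\,dx_2 \cdots dx_k$$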

The covariance

As seen, $Var(X) = E[(X - \mu_X)^2] = E[(X - \mu_X)(X - \mu_X)]$. Let $X_1$ and $X_2$ be two random variables. We define the covariance of $X_1$ and $X_2$ by

$$Cov(X_1, X_2) \equiv E[(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})]$$

Thus Var(X) = Cov(X,X).

Recall that $x - \mu$ is the deviation from the mean.

Thus the product $(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})$ is positive whenever $X_1$ and $X_2$ both experience positive, or both experience negative, deviations from the mean. If $X_1$ and $X_2$ experience opposite deviations from the mean, the product is negative. The covariance, being the expected value of this product, therefore measures whether $X_1$ and $X_2$ tend to deviate from their means in the same direction.

Recall that if $X_1$ and $X_2$ are independent, then $f(x_1, x_2) = f_1(x_1) \cdot f_2(x_2)$.

Then

$$Cov(X_1, X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 - \mu_1)(x_2 - \mu_2)\, f(x_1, x_2)\,dx_2\,dx_1 = \int_{-\infty}^{\infty} (x_1 - \mu_1) f_1(x_1)\,dx_1 \int_{-\infty}^{\infty} (x_2 - \mu_2) f_2(x_2)\,dx_2 = 0$$

Thus when two RVs are independent, their covariance is 0.

(However, if $Cov(X, Y) = 0$, we CANNOT necessarily conclude that X and Y are independent!)

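Let $Y = a_1 X_1 + a_2 X_2$ where $a_1$ and $a_2$ are constants. Then

$$\begin{aligned}
\mu_Y = E[Y] &= E[a_1 X_1 + a_2 X_2] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (a_1 x_1 + a_2 x_2)\, f(x_1, x_2)\,dx_2\,dx_1 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} a_1 x_1\, f(x_1, x_2)\,dx_2\,dx_1 + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} a_2 x_2\, f(x_1, x_2)\,dx_2\,dx_1 \\
&= \int_{-\infty}^{\infty} a_1 x_1 \left[ \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \right] dx_1 + \int_{-\infty}^{\infty} a_2 x_2 \left[ \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \right] dx_2 \\
&= \int_{-\infty}^{\infty} a_1 x_1 f_1(x_1)\,dx_1 + \int_{-\infty}^{\infty} a_2 x_2 f_2(x_2)\,dx_2 \\
&= a_1 E[X_1] + a_2 E[X_2]
\end{aligned}$$

Note that this result holds regardless of whether $X_1$ and $X_2$ are independent. Thus we have, in general:

Let $X_1, X_2, \dots, X_k$ be RVs. Let $X_i$ have mean value $\mu_i$ and variance $\sigma_i^2$. Then the RV

$$Y = a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$$

has mean value

$$E[Y] = \sum_{i=1}^{k} a_i E[X_i] \quad \text{or} \quad \mu_Y = \sum_{i=1}^{k} a_i \mu_i$$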

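Let $Y = a_1 X_1 + a_2 X_2$ where $a_1$ and $a_2$ are constants.

$$\begin{aligned}
Var(Y) = E[(Y - \mu_Y)^2] &= E[(a_1 X_1 + a_2 X_2 - a_1 \mu_1 - a_2 \mu_2)^2] \\
&= E[(a_1 (X_1 - \mu_1) + a_2 (X_2 - \mu_2))^2] \\
&= E[a_1^2 (X_1 - \mu_1)^2 + a_2^2 (X_2 - \mu_2)^2 + 2 a_1 a_2 (X_1 - \mu_1)(X_2 - \mu_2)] \\
&= a_1^2 E[(X_1 - \mu_1)^2] + a_2^2 E[(X_2 - \mu_2)^2] + 2 a_1 a_2 E[(X_1 - \mu_1)(X_2 - \mu_2)]
\end{aligned}$$

If $X_1$ and $X_2$ are independent RVs then the last term, $2 a_1 a_2\, Cov(X_1, X_2)$, is 0 and

$$Var(a_1 X_1 + a_2 X_2) = a_1^2\, Var(X_1) + a_2^2\, Var(X_2)$$

Let $X_1, X_2, \dots, X_k$ be independent RVs. Let $X_i$ have mean value $\mu_i$ and variance $\sigma_i^2$. Then the RV

$$Y = a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$$

has variance

$$Var(Y) = \sum_{i=1}^{k} a_i^2\, Var(X_i) \quad \text{or} \quad \sigma_Y^2 = \sum_{i=1}^{k} a_i^2 \sigma_i^2$$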

e.g. $X_1$ has mean $\mu_1$ and variance $\sigma_1^2$. Similarly for $X_2$. Find the mean of

a) $X_1 - X_2$ b) $X_1 + X_2$

a) $E[X_1 - X_2] = \mu_1 - \mu_2$ b) $E[X_1 + X_2] = \mu_1 + \mu_2$

Assuming $X_1$ and $X_2$ are independent, find the variance for a) and b)

a) $Var(X_1 - X_2) = \sigma_1^2 + \sigma_2^2$ b) $Var(X_1 + X_2) = \sigma_1^2 + \sigma_2^2$

e.g. $X_1$ has mean 4 and variance 9; $X_2$ has mean −2 and variance 5. c) Find $E(2X_1 + X_2 - 5)$

c) $E[2X_1 + X_2 - 5] = 2\mu_1 + \mu_2 - 5 = 8 - 2 - 5 = 1$

Assuming $X_1$ and $X_2$ are independent, find the variance for c)

c) $Var(2X_1 + X_2 - 5) = 2^2 \sigma_1^2 + \sigma_2^2 = 36 + 5 = 41$
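The two rules for linear combinations translate directly into code. The sketch below uses hypothetical helper names (`linear_mean`, `linear_var`) and assumes independence for the variance rule, as in the text. The additive constant shifts the mean but does not affect the variance.

```python
def linear_mean(coeffs, means, const=0):
    """E[a1 X1 + ... + ak Xk + const] = sum(a_i * mu_i) + const (no independence needed)."""
    return sum(a * m for a, m in zip(coeffs, means)) + const


def linear_var(coeffs, variances):
    """Var(a1 X1 + ... + ak Xk + const) = sum(a_i^2 * sigma_i^2), for independent X_i."""
    return sum(a * a * v for a, v in zip(coeffs, variances))


# Example c): X1 has mean 4, variance 9; X2 has mean -2, variance 5.
mean_c = linear_mean([2, 1], [4, -2], const=-5)
var_c = linear_var([2, 1], [9, 5])

# Examples a) and b): the variances add for both the difference and the sum.
var_diff = linear_var([1, -1], [9, 5])
var_sum = linear_var([1, 1], [9, 5])

print(mean_c, var_c, var_diff, var_sum)
```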

e.g. Let $X_1, X_2, \dots, X_n$ be RVs. We are going to draw n values (i.e. a sample of size n) $x_1, x_2, \dots, x_n$ from the RVs. For each sample, we compute the sample mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Thus we can define a sample mean RV

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{having mean value} \quad \mu_{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} \mu_i$$

If the $X_i$ are independent RVs, then

$$Var(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2$$

Therefore, in the case in which the $X_i$ are independent RVs having the same mean μ and variance $\sigma^2$,

$$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu, \quad \text{and} \quad Var(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}$$

Notes:

1) the variance of $\bar{X}$ is smaller than that of each of the $X_i$ (for n > 1)

2) the variance of the sample mean $\bar{X}$ is not the same as the sample variance, which was defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The sample variance $S^2$ can be defined as an RV by

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Let us assume that the $X_i$ are independent RVs having the same mean μ and variance $\sigma^2$. Then the expected value (i.e. mean value) $E(S^2)$ is

$$\mu_{S^2} = E[S^2] = \sigma^2$$

(derivation in text, pages 155-156)

5.11 Moment generating functions

We will skip this material.

5.12 Checking, from a sample of data, whether a RV has a normal distribution

Recall percentiles: e.g. $x_i = P_{0.45}$ is the data value in a sample such that 45% of the (ordered) data values are less than $x_i$, and 55% of the data values are greater than $x_i$,

i.e. $P(x_j < x_i) = 0.45$ for $i \ne j$

equivalently: $P(x_j > x_i) = 0.55$ for $i \ne j$

Recall the $z_\alpha$ notation for normally distributed variables: $z_\alpha$ is that z-score such that

$$P(z > z_\alpha) = \alpha$$

Thus we see that, for normal distributions, $z_\alpha = \dfrac{P_{1-\alpha} - \mu}{\sigma}$, e.g.:

$$\frac{P_{0.25} - \mu}{\sigma} = z_{0.75}, \quad \frac{P_{0.45} - \mu}{\sigma} = z_{0.55}, \quad \frac{P_{0.5} - \mu}{\sigma} = z_{0.5}, \quad \frac{P_{0.65} - \mu}{\sigma} = z_{0.35}, \quad \frac{P_{0.75} - \mu}{\sigma} = z_{0.25}, \quad \text{etc.}$$

i.e. $P_{1-\alpha} = \sigma z_\alpha + \mu$

For a normal variable, a plot of $P_{1-\alpha}$ vs $z_\alpha$ is a straight line with slope σ and intercept μ.

Let $x_1, x_2, \dots, x_n$ be the n data points of our sample (i.e. n values of the RV). Assume the data are ordered, $x_j \le x_i$ if $j < i$. Then, if n is sufficiently large,

$$x_k \approx P_{k/(n+1)}$$

If the data are normally distributed, it must be true that

$$P_{k/(n+1)} = \sigma\, z_{1 - k/(n+1)} + \mu$$

and a plot of $x_k$ versus $z_{1-k/(n+1)}$ will give an approximate straight line of slope σ and intercept μ. Such a plot is referred to as a normal scores plot (or normal percentile plot, or normal quantile plot).

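e.g. Consider the 4 ordered data values 48, 67, 76, 81. If the data are normal,

$$48 \approx P_{1/5} = \sigma z_{0.8} + \mu = \sigma(-0.84) + \mu$$

$$67 \approx P_{2/5} = \sigma z_{0.6} + \mu = \sigma(-0.25) + \mu$$

$$76 \approx P_{3/5} = \sigma z_{0.4} + \mu = \sigma(0.25) + \mu$$

$$81 \approx P_{4/5} = \sigma z_{0.2} + \mu = \sigma(0.84) + \mu$$

The resulting plot is

[Figure: the 4 data values plotted against their normal scores, with the fitted line y = 19.5x + 68]

With only 4 points it is difficult to tell graphically whether the data are normally distributed. If they are, these data would have σ = 19.5 and μ = 68.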
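The normal scores and the fitted line for this example can be reproduced in a few lines. This is a minimal sketch assuming an ordinary least-squares fit of the data against the scores (the notes do not say how the line was fitted). In the z_α notation, the score for the k'th ordered value is z_{1−k/(n+1)}, which equals the inverse of F evaluated at k/(n+1).

```python
from statistics import NormalDist, mean

data = sorted([48, 67, 76, 81])
n = len(data)

# Normal score for the k'th ordered value: F^{-1}(k / (n + 1)), k = 1..n
scores = [NormalDist().inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

# Least-squares line x = slope * z + intercept; slope estimates sigma, intercept mu.
z_bar, x_bar = mean(scores), mean(data)
slope = sum((z - z_bar) * (x - x_bar) for z, x in zip(scores, data)) / sum(
    (z - z_bar) ** 2 for z in scores
)
intercept = x_bar - slope * z_bar

print([round(z, 2) for z in scores], round(slope, 1), round(intercept, 1))
```

Using unrounded scores gives a slope slightly different from the one obtained with the two-decimal table values, but both are close to 19.5.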

Normal score plot for a data set of 50 values drawn from an exponential distribution

Normal score plot for a data set of 50 values drawn from a normal distribution

5.14 Generating random values for a continuous RV having a given probability density

Let X be a random variable with probability density $f(x)$.

Compute the cumulative probability distribution $F(x)$. Recall $0 \le F(x) \le 1$.

Generate a sequence of values, $F_i$, using a uniform random value generator on (0,1).

For each $F_i$ generate an $x_i$ as shown graphically below.

[Figure: F(x) versus x, rising from 0.0 to 1.0; a value $F_i$ on the vertical axis is mapped through the curve to $x_i$ on the horizontal axis]

Analytically, $x_i = F^{-1}(F_i)$

E.g. Consider the random variable X having the exponential density

$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \quad 0 < x$$

and cumulative distribution

$$F(x) = 1 - e^{-x/\beta}$$

i.e. $F_i = 1 - e^{-x_i/\beta}$; solving for $x_i$ gives

$$x_i = -\beta \ln(1 - F_i)$$
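The recipe above can be sketched for the exponential example as follows. The function name `sample_exponential`, the seed, and the value β = 2 are illustrative assumptions; Python's `random.Random.random()` is used as the uniform generator and returns values in [0, 1), so 1 − F_i is never zero and the logarithm is always defined.

```python
import math
import random


def sample_exponential(beta: float, n: int, seed: int = 0) -> list:
    """Draw n values with density (1/beta) e^{-x/beta} by inverting F(x) = 1 - e^{-x/beta}."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    # For each uniform F_i, apply x_i = F^{-1}(F_i) = -beta * ln(1 - F_i).
    return [-beta * math.log(1.0 - rng.random()) for _ in range(n)]


samples = sample_exponential(beta=2.0, n=100_000)
sample_mean = sum(samples) / len(samples)

# The sample mean should be close to the distribution mean, beta.
print(round(sample_mean, 2))
```

The same pattern works for any density whose distribution function can be inverted in closed form; otherwise the inverse has to be found numerically.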
