
© Stanley Chan 2019. All Rights Reserved.

ECE 302: Chapter 04: Continuous Random Variables

Fall 2019

Prof Stanley Chan

School of Electrical and Computer Engineering, Purdue University


1. Continuous Random Variable



Continuous Random Variable

Sample space becomes continuous

E.g., time, area

Characterized by histogram too!

Not PMF, but Probability Density Function (PDF)



Continuous Random Variable

Definition

The probability density function (PDF) of a random variable X is a function which, when integrated over an interval [a, b], yields the probability of obtaining a ≤ X(ξ) ≤ b. We denote the PDF of X by $f_X(x)$, and

$$P[a \le X \le b] = \int_a^b f_X(x)\,dx. \tag{1}$$


Continuous and discrete unified!

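If X is continuous,

$$P[a \le X \le b] = \int_a^b f_X(x)\,dx.$$

If X is discrete and $x_0$ is the only point in [a, b] with nonzero probability,

$$P[a \le X \le b] = P[X = x_0] = p_X(x_0) = \int_a^b \underbrace{p_X(x_0)\,\delta(x - x_0)}_{f_X(x)}\,dx.$$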


Property

A PDF $f_X(x)$ should satisfy

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1. \tag{2}$$

Example. Let $f_X(x) = c(1 - x^2)$ for $-1 \le x \le 1$, and 0 otherwise. Find c.
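A worked sketch of the example: the constant c is fixed by the normalization condition (2), since the PDF is zero outside [−1, 1].

```latex
\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-1}^{1} c\,(1 - x^2)\,dx
  = c\left[x - \frac{x^3}{3}\right]_{-1}^{1}
  = c\left(\frac{2}{3} + \frac{2}{3}\right) = \frac{4c}{3} = 1
  \quad\Longrightarrow\quad c = \frac{3}{4}.
```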


Expectation

Definition (Expectation)

The expectation of a continuous random variable X is

$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx. \tag{3}$$


Expectation

Definition (Expectation of Function)

The expectation of a function g of a continuous random variable X is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx. \tag{4}$$

Definition (Moment)

The kth moment of a continuous random variable X is

$$E[X^k] = \int_{-\infty}^{\infty} x^k f_X(x)\,dx. \tag{5}$$


Variance

Definition (Variance)

The variance of a continuous random variable X is

$$\text{Var}[X] = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx,$$

where $\mu_X \stackrel{\text{def}}{=} E[X]$.

Remark: It also holds that

$$\text{Var}[X] = E[X^2] - E[X]^2.$$
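Definitions (3) to (5) and both variance formulas can be checked numerically. The sketch below assumes the PDF $f_X(x) = \frac{3}{4}(1 - x^2)$ on [−1, 1] (c = 3/4 is the constant that makes this PDF integrate to 1) and uses a simple midpoint rule; the helper names `integrate` and `pdf` are illustrative, not from the course.

```python
def integrate(f, a, b, n=100_000):
    """Composite midpoint rule approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def pdf(x):
    """f_X(x) = (3/4)(1 - x^2) on [-1, 1], and 0 otherwise."""
    return 0.75 * (1.0 - x * x) if -1.0 <= x <= 1.0 else 0.0


# The PDF vanishes outside [-1, 1], so integrating over [-1, 1] covers the real line.
total = integrate(pdf, -1.0, 1.0)                                   # Eq. (2)
mean = integrate(lambda x: x * pdf(x), -1.0, 1.0)                   # Eq. (3)
second_moment = integrate(lambda x: x**2 * pdf(x), -1.0, 1.0)       # Eq. (5), k = 2
var_def = integrate(lambda x: (x - mean) ** 2 * pdf(x), -1.0, 1.0)  # definition of Var
var_shortcut = second_moment - mean**2                              # E[X^2] - E[X]^2

print(total, mean, var_def, var_shortcut)
```

Both variance estimates come out close to 1/5, the exact value here: E[X] = 0 by symmetry and $E[X^2] = \frac{3}{4}\left(\frac{2}{3} - \frac{2}{5}\right) = \frac{1}{5}$.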


2. Common Continuous Random Variables



Uniform Distribution

Definition (Uniform Distribution)

Let X be a continuous uniform random variable. The PDF of X is

$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where [a, b] is the interval on which X is defined. We write

X ∼ Uniform(a, b)

to say that X is drawn from a uniform distribution on an interval [a, b].


Mean and Variance

Proposition (Mean/Variance of Uniform Distribution)

If X ∼ Uniform(a, b), then

$$E[X] = \frac{a+b}{2}, \quad \text{and} \quad \text{Var}[X] = \frac{(b-a)^2}{12}.$$
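A worked sketch of the proposition, integrating the PDF (6) directly and then using $\text{Var}[X] = E[X^2] - E[X]^2$:

```latex
E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2},
\qquad
E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3},

\text{Var}[X] = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4}
  = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}.
```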


Application of Uniform Distribution

Analysis of a uniform quantizer.

Assumption: X[n] is a random signal.

Quantization: partition the amplitude of X[n] into a discrete set of levels.



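We can model the quantization error $E_q[n]$ as a uniform random variable. If we let Δ be the height of the quantization interval, then

$$E_q[n] \sim \text{Uniform}\left[-\frac{\Delta}{2}, \frac{\Delta}{2}\right].$$

Applying the uniform mean and variance formulas with $a = -\Delta/2$ and $b = \Delta/2$, the mean and variance of $E_q[n]$ are

$$E[E_q[n]] = 0, \qquad \text{Var}[E_q[n]] = \frac{\Delta^2}{12}.$$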
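A quick simulation of the quantization-error model. Assumptions that are not from the slides: a mid-tread quantizer that rounds to the nearest multiple of Δ, the error defined as $X[n] - Q(X[n])$, a hypothetical test signal $X[n] \sim$ Uniform(0, 10) that spans many levels, and Δ = 0.1.

```python
import math
import random


def quantize(x, delta):
    """Uniform mid-tread quantizer: round x to the nearest multiple of delta."""
    return delta * math.floor(x / delta + 0.5)


rng = random.Random(0)   # fixed seed for reproducibility
delta = 0.1              # quantization step (hypothetical choice)
signal = [rng.uniform(0.0, 10.0) for _ in range(200_000)]   # hypothetical X[n]
errors = [x - quantize(x, delta) for x in signal]           # E_q[n] = X[n] - Q(X[n])

n = len(errors)
err_mean = sum(errors) / n
err_var = sum((e - err_mean) ** 2 for e in errors) / n
print(f"mean = {err_mean:.2e}, var = {err_var:.3e}, delta^2/12 = {delta**2 / 12:.3e}")
```

Every error lies in [−Δ/2, Δ/2], and the empirical variance lands close to Δ²/12 ≈ 8.33 × 10⁻⁴. The uniform model relies on the signal covering many quantization levels; a signal confined to one or two levels would not behave this way.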



Knowing the distribution of $E_q[n]$ is important:

It helps us design error compensation algorithms.

It helps us understand the limit of data compression.

It helps us generalize the concept to more advanced coding schemes.

R. Gray, Source Coding Theory, Kluwer Academic Publishers, 1990.


Exponential distribution

Definition (Exponential Distribution)

Let X be an exponential random variable. The PDF of X is

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$

where λ > 0 is a parameter. We write

X ∼ Exponential(λ)

to say that X is drawn from an exponential distribution with parameter λ.

Example. The inter-arrival time of a Poisson process.


Effect of λ

Proposition (Mean/Variance of Exponential Distribution)

If X ∼ Exponential(λ), then

$$E[X] = \frac{1}{\lambda}, \quad \text{and} \quad \text{Var}[X] = \frac{1}{\lambda^2}.$$
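One way to generate exponential samples, and so check the mean and variance numerically, is inverse-transform sampling (a standard technique, not one introduced on these slides). The CDF is $1 - e^{-\lambda x}$ for $x \ge 0$, so setting it equal to $U \sim$ Uniform(0, 1) and solving gives $X = -\ln(1 - U)/\lambda$. The choice λ = 2 is arbitrary.

```python
import math
import random


def sample_exponential(lam, rng):
    """Inverse-transform sampling: X = -ln(1 - U)/lam with U ~ Uniform(0, 1)."""
    u = rng.random()                   # u in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / lam


rng = random.Random(1)
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(200_000)]
m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / len(samples)
print(f"mean {m:.4f} vs 1/lam = {1 / lam}, var {v:.4f} vs 1/lam^2 = {1 / lam**2}")
```

Larger λ shrinks both the mean and the variance, which is the effect of λ the proposition describes.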


Neighbor of Exponential Distribution

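A distribution closely related to the exponential distribution is the Laplace distribution:

$$f_X(x) = \frac{\lambda}{2} e^{-\lambda |x|}.$$

The factor 1/2 is needed for the PDF to integrate to 1, because each half ($x \ge 0$ and $x < 0$) of $\lambda e^{-\lambda |x|}$ already integrates to 1.

Example: Image statistics.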


Neighbor of Exponential Distribution

• Instead of looking at the image intensity I directly, we can look at the gradient of the image: $\begin{bmatrix} \nabla_x I \\ \nabla_y I \end{bmatrix}$.

• Image gradients are sparse: most gradient values are close to zero.


3. Cumulative Distribution Function



Cumulative Distribution Function

Definition

The cumulative distribution function (CDF) of a continuous random variable X is

$$F_X(x) \stackrel{\text{def}}{=} P[X \le x] = \int_{-\infty}^{x} f_X(x')\,dx'. \tag{8}$$

Example. Let $f_X(x) = c(1 - x^2)$ for $-1 \le x \le 1$, and 0 otherwise. Find $F_X(x)$.
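A worked sketch of the example, using c = 3/4 (the value that makes this PDF integrate to 1):

```latex
F_X(x) = \begin{cases}
0, & x < -1, \\
\displaystyle \frac{3}{4}\int_{-1}^{x} (1 - t^2)\,dt
  = \frac{3}{4}\left(x - \frac{x^3}{3} + \frac{2}{3}\right)
  = \frac{2 + 3x - x^3}{4}, & -1 \le x \le 1, \\
1, & x > 1.
\end{cases}
```

As a check, the middle expression gives $F_X(-1) = 0$ and $F_X(1) = 1$, so the three pieces join continuously.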


Properties of CDF

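1 $F_X(-\infty) = 0$

2 $F_X(+\infty) = 1$

3 $F_X(x)$ is a non-decreasing function of x.

4 $0 \le F_X(x) \le 1$

5 $P[a \le X \le b] = F_X(b) - F_X(a)$ for continuous X (in general, $F_X(b) - F_X(a) = P[a < X \le b]$)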


Properties of CDF

Before we discuss Properties 6 and 7, we need the following terms.

(i) $F_X(b)$: the value of $F_X(x)$ at $x = b$.

(ii) $\lim_{h \to 0^+} F_X(b - h)$: the limit of $F_X(x)$ from the left-hand side of $x = b$.

(iii) $\lim_{h \to 0^+} F_X(b + h)$: the limit of $F_X(x)$ from the right-hand side of $x = b$.


Properties of CDF

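We say that $F_X(x)$ is

Left-continuous at $x = b$ if $\lim_{h \to 0^+} F_X(b - h) = F_X(b)$.

Right-continuous at $x = b$ if $\lim_{h \to 0^+} F_X(b + h) = F_X(b)$.

Continuous at $x = b$ if it is both left-continuous and right-continuous at $x = b$.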


Properties of CDF

6 $F_X(x)$ is right-continuous. That is,

$$\lim_{h \to 0^+} F_X(b + h) = F_X(b).$$

7 $P[X = b]$ is determined by

$$P[X = b] = F_X(b) - \lim_{h \to 0^+} F_X(b - h).$$


Theorem (Fundamental theorem of calculus)

If a function f is continuous, then

$$f(x) = \frac{d}{dx} \int_a^x f(t)\,dt$$

for any constant a.

Theorem

The probability density function (PDF) is the derivative of the cumulative distribution function (CDF):

$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{d}{dx} \int_{-\infty}^{x} f_X(x')\,dx', \tag{9}$$

provided $F_X$ is differentiable at x.


Example. Consider a CDF

$$F_X(x) = \begin{cases} 1 - \frac{1}{4} e^{-2x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Find $f_X(x)$.
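Note that this CDF jumps from 0 to 3/4 at x = 0, so it is not differentiable there. Using Property 7 for the jump and (9) elsewhere, a worked sketch (u(x) denotes the unit step function, notation introduced here):

```latex
P[X = 0] = F_X(0) - \lim_{h \to 0^+} F_X(-h) = \left(1 - \tfrac{1}{4}\right) - 0 = \tfrac{3}{4},
\qquad
\frac{dF_X(x)}{dx} = \frac{1}{4}\cdot 2e^{-2x} = \frac{1}{2} e^{-2x} \quad (x > 0),

f_X(x) = \frac{3}{4}\,\delta(x) + \frac{1}{2} e^{-2x}\,u(x).
```

As a check, the total mass is $\frac{3}{4} + \int_0^\infty \frac{1}{2} e^{-2x}\,dx = \frac{3}{4} + \frac{1}{4} = 1$.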


Example. Consider a CDF

$$F_X(x) = \begin{cases} 0, & x < 0, \\ 0.2, & 0 \le x < 1, \\ 0.7, & 1 \le x < 2, \\ 0.9, & 2 \le x < 4, \\ 1, & x \ge 4. \end{cases}$$

Find $f_X(x)$.
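Property 7 says each jump of a CDF is a point mass. The sketch below reads off the jump sizes of the step CDF in the example, taking $F_X(x) = 0$ for $x < 0$, and approximates the left limit $F_X(b^-)$ by evaluating at b − h for a tiny h (an illustrative shortcut, not an exact limit).

```python
def cdf(x):
    """Step CDF from the example, with F_X(x) = 0 for x < 0."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.2
    if x < 2:
        return 0.7
    if x < 4:
        return 0.9
    return 1.0


h = 1e-9  # small offset approximating the left limit F_X(b^-)
# Property 7: P[X = b] = F_X(b) - F_X(b^-); rounding hides float noise like 0.49999999999999994
jumps = {b: round(cdf(b) - cdf(b - h), 10) for b in (0, 1, 2, 4)}
print(jumps)
```

The jumps are 0.2, 0.5, 0.2 and 0.1 at x = 0, 1, 2, 4, so $f_X(x) = 0.2\,\delta(x) + 0.5\,\delta(x-1) + 0.2\,\delta(x-2) + 0.1\,\delta(x-4)$: a discrete random variable written as a PDF, in line with the unified view of continuous and discrete variables.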


Mean / Mode / Median

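Given a random variable X, can we define its mean/mode/median?

From PDF:

Mean: $E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$.

Mode: the value of x at which $f_X(x)$ is largest, $\arg\max_x f_X(x)$.

Median: the point c such that $\int_{-\infty}^{c} f_X(x)\,dx = \frac{1}{2}$.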


Mean / Mode / Median

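From CDF:

Mean:

$$E[X] = \int_0^{\infty} \left(1 - F_X(x')\right) dx' - \int_{-\infty}^{0} F_X(x')\,dx'. \tag{10}$$

Mode: the point where $F_X$ increases fastest, $\arg\max_x \frac{dF_X(x)}{dx}$.

Median: the point c such that $F_X(c) = \frac{1}{2}$.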
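Equation (10) and the CDF-based median can be checked on a distribution with known answers. The sketch below assumes X ∼ Exponential(λ) with λ = 2, whose CDF is $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$; the exact mean is 1/λ and the exact median is $\ln 2/\lambda$. The truncation point 40 and the bisection tolerance are arbitrary choices.

```python
import math

lam = 2.0


def cdf(x):
    """CDF of Exponential(lam): 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def integrate(f, a, b, n=200_000):
    """Composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Eq. (10): the second integral vanishes because F_X(x) = 0 for x < 0.
# The upper limit is truncated at 40, where 1 - F_X(40) = exp(-80) is negligible.
mean = integrate(lambda x: 1.0 - cdf(x), 0.0, 40.0)


def median_by_bisection(F, lo, hi, tol=1e-12):
    """Find c with F(c) = 1/2, assuming F is continuous and non-decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


median = median_by_bisection(cdf, 0.0, 40.0)
print(mean, 1 / lam)              # mean from the CDF vs. the exact 1/lam
print(median, math.log(2) / lam)  # median vs. the exact ln(2)/lam
```

For this distribution the PDF $\lambda e^{-\lambda x}$ is largest at x = 0, so the mode is 0, while the median (about 0.347) and the mean (0.5) are larger: skewness pulls the three apart.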


Application of CDF

Q-Q Plot: a tool to check how good your model is.

Example. Consider a dataset containing N data points. The histogram (empirical PDF) and empirical CDF are as follows:

(Figure: histogram and empirical CDF of the dataset.)

Is it a Gaussian distribution?


QQ-Plot



QQ-Plot

Why does it work?

Assume $x_1, \ldots, x_N$ are samples of a random variable X, sorted in ascending order.

Hypothesis: these data points are generated from a certain random variable $\hat{X}$. Let $F_{\hat{X}}$ be its CDF.

Let $y_1, \ldots, y_N$ be equally spaced points in (0, 1), the range of $F_{\hat{X}}$. Then the $z_i$'s are

$$z_i = F_{\hat{X}}^{-1}(y_i).$$

Testing: if $X = \hat{X}$, then for large N we should have

$$z_i = F_{\hat{X}}^{-1}(y_i) \approx x_i.$$

Therefore, if we plot $x_i$ against $z_i$, the points should lie approximately on a straight line.


QQ-Plot

Figure: Left: poor fit. In fact, the empirical data is generated from a t-distribution. Right: good fit.


4. Gaussian Distribution



Gaussian Distribution

Definition (Gaussian Distribution)

Let X be a Gaussian random variable. The PDF of X is

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{11}$$

where $(\mu, \sigma^2)$ are parameters of the distribution. We write

$$X \sim \mathcal{N}(\mu, \sigma^2)$$

to say that X is drawn from a Gaussian distribution with parameters $(\mu, \sigma^2)$.



Figure: Gaussian distribution

Proposition (Mean/Variance of Gaussian Distribution)

If $X \sim \mathcal{N}(\mu, \sigma^2)$, then

$E[X] = \mu$, and $\text{Var}[X] = \sigma^2$.



Proof.
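One standard way to fill in the proof (a sketch): substitute $y = x - \mu$ and use the fact that $y\,e^{-y^2/(2\sigma^2)}$ is odd, so its integral vanishes; for the variance, substitute $t = y/\sigma$ and integrate by parts.

```latex
E[X] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx
     = \int_{-\infty}^{\infty} (y + \mu)\,\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}\,dy
     = 0 + \mu \cdot 1 = \mu,

\text{Var}[X] = \int_{-\infty}^{\infty} y^2\,\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}\,dy
     = \sigma^2 \int_{-\infty}^{\infty} t^2\,\frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\,dt

     = \sigma^2 \left( \left[-t\,\frac{e^{-t^2/2}}{\sqrt{2\pi}}\right]_{-\infty}^{\infty}
       + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\,dt \right)
     = \sigma^2 (0 + 1) = \sigma^2.
```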



Percentile of Gaussian Distribution



Standard Gaussian

Definition (Standard Gaussian)

A standard Gaussian (or standard Normal) random variable X has a PDF

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}. \tag{12}$$

That is, $X \sim \mathcal{N}(0, 1)$ is a Gaussian with $\mu = 0$ and $\sigma^2 = 1$.

Definition (CDF of Standard Gaussian)

The CDF of the standard Gaussian is denoted $\Phi(\cdot)$:

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{x^2}{2}}\,dx. \tag{13}$$
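Φ has no elementary closed form, but it can be written in terms of the error function, $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which Python's standard library provides as `math.erf`. A minimal sketch:

```python
import math


def Phi(z):
    """CDF of the standard Gaussian, Eq. (13), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"Phi({z:+.2f}) = {Phi(z):.4f}")
```

The values agree with a standard normal table (for example, Φ(1) ≈ 0.8413 and Φ(1.96) ≈ 0.975), and the symmetry $\Phi(-z) = 1 - \Phi(z)$ holds up to rounding.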


Standardize Random Variable

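If $X \sim \mathcal{N}(\mu, \sigma^2)$, then

$$Z = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1).$$

Proof. Key: change of variable, $t = (x' - \mu)/\sigma$, so $dx' = \sigma\,dt$.

$$\begin{aligned} F_X(x) &= \int_{-\infty}^{x} f_X(x')\,dx' \\ &= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x'-\mu)^2}{2\sigma^2}}\,dx' \\ &= \int_{-\infty}^{\frac{x-\mu}{\sigma}} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\,dt \\ &= \Phi\left(\frac{x-\mu}{\sigma}\right). \end{aligned}$$

Hence $P[Z \le z] = P[X \le \mu + \sigma z] = F_X(\mu + \sigma z) = \Phi(z)$, so Z is a standard Gaussian.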


Standard Gaussian

Figure: Definition of Φ(y).

Example. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Find $P[X \le b]$ and $P[a \le X \le b]$.
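A worked sketch, using $F_X(x) = \Phi((x - \mu)/\sigma)$ from the standardization result and the fact that $P[X = a] = 0$ for a continuous X:

```latex
P[X \le b] = F_X(b) = \Phi\left(\frac{b - \mu}{\sigma}\right),
\qquad
P[a \le X \le b] = F_X(b) - F_X(a)
  = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right).
```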


Standard Gaussian

Example. X ∼ N (5, 16), find

(a) P[X > 3]

(b) If P[X < a] = 0.7910, find a.

(c) If P[X > b] = 0.1635, find b.
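The three parts can be checked with Python's `statistics.NormalDist`, which provides `cdf` and `inv_cdf` (the inverse CDF). Note that $\mathcal{N}(5, 16)$ has variance 16, so σ = 4.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=4)   # N(5, 16): variance 16, so sigma = 4

p_a = 1 - X.cdf(3)              # (a) P[X > 3] = 1 - Phi((3 - 5)/4) = Phi(0.5)
a = X.inv_cdf(0.7910)           # (b) solve P[X < a] = 0.7910
b = X.inv_cdf(1 - 0.1635)       # (c) P[X > b] = 0.1635  <=>  P[X <= b] = 0.8365
print(round(p_a, 4), round(a, 2), round(b, 2))  # 0.6915 8.24 8.92
```

By hand with a Φ table: (a) $\Phi(0.5) \approx 0.6915$; (b) $\Phi((a-5)/4) = 0.7910$ gives $(a-5)/4 \approx 0.81$, so $a \approx 8.24$; (c) $\Phi((b-5)/4) = 0.8365$ gives $(b-5)/4 \approx 0.98$, so $b \approx 8.92$.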



Example: Find the Outlier!

Find the outlier of this set of data: [0.25, 0.31, 0.33, 0.32, 0.36, 0.28, 0.29, 0.26, 0.7, 0.34].

Compute the statistics:

µ = 0.344, σ ≈ 0.130.

Standardize: Z = (X − µ)/σ.

The z-values are: −0.72, −0.26, −0.10, −0.18, 0.12, −0.49, −0.41, −0.64, 2.74, −0.03.

The probabilities P[Z < z] are: 0.23, 0.39, 0.45, 0.42, 0.54, 0.31, 0.33, 0.25, 0.9969, 0.48.

The value 0.7 has z = 2.74: only about 0.3% of a Gaussian population lies that far above the mean, so 0.7 is the outlier.
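The computation above can be reproduced with Python's `statistics` module. The slide does not say which standard deviation it uses; the listed z-values match the sample standard deviation (dividing by N − 1), which is what `statistics.stdev` computes. The |z| > 2 flagging rule is an illustrative assumption, not part of the slide.

```python
from statistics import NormalDist, mean, stdev

data = [0.25, 0.31, 0.33, 0.32, 0.36, 0.28, 0.29, 0.26, 0.7, 0.34]

mu = mean(data)
sigma = stdev(data)                     # sample standard deviation (divides by N - 1)
z = [(x - mu) / sigma for x in data]    # standardize: Z = (X - mu) / sigma
p = [NormalDist().cdf(zi) for zi in z]  # P[Z < z_i] under the standard Gaussian

# Hypothetical rule: flag points more than 2 standard deviations from the mean
outliers = [x for x, zi in zip(data, z) if abs(zi) > 2]
print(round(mu, 3), round(sigma, 4), outliers)
```

Note that the outlier itself inflates µ and σ; with a more extreme outlier or a smaller dataset, this masking effect can hide it, which is why robust statistics such as the median are sometimes preferred for this task.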


Linear Transform of Gaussian

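If X is Gaussian, and if we let

$$Y = aX + b,$$

then Y is also Gaussian.

Why? Assume $X \sim \mathcal{N}(0, 1)$ (otherwise, standardize $Z = (X - \mu)/\sigma$) and $a > 0$. Then

$$\begin{aligned} F_Y(y) &= P[Y \le y] \\ &= P[aX + b \le y] \\ &= P[X \le (y - b)/a] \\ &= \int_{-\infty}^{(y-b)/a} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx. \end{aligned}$$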


Linear Transform of Gaussian

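Therefore, by the Fundamental Theorem of Calculus,

$$\begin{aligned} f_Y(y) &= \frac{d}{dy} F_Y(y) = \frac{d}{dy} \int_{-\infty}^{(y-b)/a} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx \\ &= \frac{d\left(\frac{y-b}{a}\right)}{dy} \cdot \frac{d}{d\left(\frac{y-b}{a}\right)} \int_{-\infty}^{(y-b)/a} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx \quad \text{(chain rule)} \\ &= \frac{1}{a} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{((y-b)/a)^2}{2}} = \frac{1}{\sqrt{2\pi a^2}} e^{-\frac{(y-b)^2}{2a^2}}. \end{aligned}$$

So Y is also Gaussian, with mean $E[Y] = b$ and $\text{Var}[Y] = a^2$.

In general: if X is Gaussian but not $\mathcal{N}(0, 1)$, then

$$E[Y] = aE[X] + b, \qquad \text{Var}[Y] = a^2\,\text{Var}[X].$$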


Detection

Problem: Consider two clusters of data points. You want to build a simple classifier to determine whether a point belongs to $\mathcal{N}(\mu_1, \sigma_1^2)$ or $\mathcal{N}(\mu_2, \sigma_2^2)$.

Solution: Given the data point x, evaluate both PDFs at x and pick the class whose PDF is larger.


Detection

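Write down the two PDFs:

$$\frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \gtrless \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$$

Simplified case: $\sigma_1 = \sigma_2 = \sigma$ (and, without loss of generality, $\mu_1 < \mu_2$). Then,

$$\begin{aligned} e^{-\frac{(x-\mu_1)^2}{2\sigma^2}} &\gtrless e^{-\frac{(x-\mu_2)^2}{2\sigma^2}} \\ -\frac{(x-\mu_1)^2}{2\sigma^2} &\gtrless -\frac{(x-\mu_2)^2}{2\sigma^2} \\ (x-\mu_1)^2 &\lessgtr (x-\mu_2)^2 \\ x^2 - 2\mu_1 x + \mu_1^2 &\lessgtr x^2 - 2\mu_2 x + \mu_2^2 \\ x &\lessgtr \frac{\mu_1 + \mu_2}{2}. \end{aligned}$$

Therefore, if $x < \frac{\mu_1 + \mu_2}{2}$, then it is more likely that x belongs to class 1. Otherwise, it is more likely that it belongs to class 2.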
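A minimal sketch of this classifier in Python, comparing the two Gaussian PDFs directly via `statistics.NormalDist.pdf`. For equal variances it should agree with the threshold rule $x < (\mu_1 + \mu_2)/2$ (with $\mu_1 < \mu_2$). The parameter values are hypothetical.

```python
from statistics import NormalDist


def classify(x, mu1, sigma1, mu2, sigma2):
    """Return 1 or 2: the class whose Gaussian PDF is larger at x (ties go to class 1)."""
    f1 = NormalDist(mu1, sigma1).pdf(x)
    f2 = NormalDist(mu2, sigma2).pdf(x)
    return 1 if f1 >= f2 else 2


mu1, mu2, sigma = 0.0, 4.0, 1.5   # hypothetical clusters with equal variance
threshold = (mu1 + mu2) / 2       # decision boundary from the derivation
for x in (-1.0, 1.9, 2.1, 5.0):
    by_pdf = classify(x, mu1, sigma, mu2, sigma)
    by_threshold = 1 if x < threshold else 2
    print(x, by_pdf, by_threshold)
```

When $\sigma_1 \ne \sigma_2$ the same `classify` function still applies, but the boundary is no longer the midpoint, because the normalizing factors and quadratic terms no longer cancel.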


5. Function of Random Variable



Function of Random Variable

Problem:

Given X.

Let Y = g(X).

Want to find $f_Y(y)$ and $F_Y(y)$.

Example 1. Let X ∼ Uniform(0, 1). Let Y = 2X + 3. Find $f_Y(y)$.

Example 2. Let $X \sim \mathcal{N}(0, 1)$. Let $Y = X^2$. Find $f_Y(y)$.

Why should we care about this?

Needed by problem. E.g., power and voltage: $P = V^2/R$.

Needed by analysis. E.g., random phase $\cos(\omega t + \Theta)$.

Needed by design. E.g., variance stabilizing transform.


Examples

Example 1. Let $X \sim \mathcal{N}(0, 1)$. Let Y = 2X + 3. Find $f_Y(y)$ and $F_Y(y)$.
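A worked sketch: since $g(x) = 2x + 3$ is increasing, the event $\{Y \le y\}$ is $\{X \le (y-3)/2\}$.

```latex
F_Y(y) = P[2X + 3 \le y] = P\left[X \le \frac{y - 3}{2}\right] = \Phi\left(\frac{y - 3}{2}\right),

f_Y(y) = \frac{d}{dy}\,\Phi\left(\frac{y-3}{2}\right)
       = \frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}} e^{-\frac{(y-3)^2}{8}}
       = \frac{1}{\sqrt{8\pi}} e^{-\frac{(y-3)^2}{8}}.
```

So $Y \sim \mathcal{N}(3, 4)$, consistent with $E[Y] = aE[X] + b = 3$ and $\text{Var}[Y] = a^2 = 4$ from the linear-transform result.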


Examples

Example 2. Let X ∼ Uniform(−1, 1). Suppose $Y = X^2$. Find $f_Y(y)$ and $F_Y(y)$.
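A worked sketch: $g(x) = x^2$ is not increasing on [−1, 1], so the event $\{Y \le y\}$ corresponds to an interval of x symmetric about 0, and $f_X(x) = 1/2$ on that interval.

```latex
F_Y(y) = P[X^2 \le y] = P[-\sqrt{y} \le X \le \sqrt{y}]
       = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{2}\,dx = \sqrt{y}, \qquad 0 \le y \le 1,

F_Y(y) = 0 \ (y < 0), \qquad F_Y(y) = 1 \ (y > 1),

f_Y(y) = \frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}} \quad (0 < y \le 1), \qquad f_Y(y) = 0 \ \text{otherwise}.
```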


Examples

Example 3. Let X ∼ Uniform(0, 2π). Suppose Y = cos X. Find $f_Y(y)$ and $F_Y(y)$. Hint: $\frac{d}{dy} \cos^{-1} y = \frac{-1}{\sqrt{1 - y^2}}$.
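Following the hint, one can show $F_Y(y) = 1 - \frac{\cos^{-1} y}{\pi}$ and $f_Y(y) = \frac{1}{\pi\sqrt{1 - y^2}}$ for $-1 < y < 1$, because $\cos X \le y$ exactly when $\cos^{-1} y \le X \le 2\pi - \cos^{-1} y$, an interval of length $2\pi - 2\cos^{-1} y$. The Monte Carlo sketch below checks that CDF against simulation; the sample size and seed are arbitrary.

```python
import math
import random


def cdf_cos(y):
    """Candidate CDF of Y = cos(X), X ~ Uniform(0, 2*pi), valid for -1 <= y <= 1."""
    return 1.0 - math.acos(y) / math.pi


rng = random.Random(3)
ys = [math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(200_000)]

for y in (-0.9, -0.5, 0.0, 0.5, 0.9):
    empirical = sum(v <= y for v in ys) / len(ys)   # fraction of samples <= y
    print(f"y={y:+.1f}  empirical={empirical:.4f}  formula={cdf_cos(y):.4f}")
```

The density $\frac{1}{\pi\sqrt{1-y^2}}$ blows up near y = ±1: a random-phase cosine spends most of its time near its extremes.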


General Procedure

As shown in the previous examples, the basic steps are:

Start from $F_Y(y) = P[Y \le y]$.

$P[Y \le y] = P[g(X) \le y] = P[X \le g^{-1}(y)]$ if g is increasing. Otherwise, pay attention to the inequality sign.

$P[X \le g^{-1}(y)] = F_X(g^{-1}(y))$.

$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y))$.

The fundamental theorem of calculus is useful here:

$$\frac{d}{dy} F_X(g^{-1}(y)) = \frac{d}{dy} \int_{-\infty}^{g^{-1}(y)} f_X(x')\,dx'.$$

Chain rule:

$$\frac{d}{dy} \int_{-\infty}^{g^{-1}(y)} f_X(x')\,dx' = \frac{dg^{-1}(y)}{dy} \cdot \frac{d}{dg^{-1}(y)} \int_{-\infty}^{g^{-1}(y)} f_X(x')\,dx' = \frac{dg^{-1}(y)}{dy}\, f_X(g^{-1}(y)).$$
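The steps above can be packaged into a small helper for an increasing g. As a hypothetical example not taken from the slides, the sketch assumes X ∼ Exponential(1) and $Y = \sqrt{X}$, so $g^{-1}(y) = y^2$ and the procedure gives $f_Y(y) = 2y\,e^{-y^2}$ for $y \ge 0$; it cross-checks that result against a numerical derivative of $F_Y(y) = F_X(g^{-1}(y))$. The helper name `density_of_increasing_transform` is illustrative.

```python
import math


def density_of_increasing_transform(f_X, g_inv, g_inv_prime):
    """Return f_Y(y) = f_X(g^{-1}(y)) * d g^{-1}(y)/dy.

    Valid when g is strictly increasing and differentiable; for a decreasing g
    the inequality flips and the derivative enters with an absolute value.
    """
    return lambda y: f_X(g_inv(y)) * g_inv_prime(y)


# Hypothetical example: X ~ Exponential(1), Y = g(X) = sqrt(X), increasing on x >= 0.
def f_X(x):
    return math.exp(-x) if x >= 0 else 0.0


def F_X(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def g_inv(y):
    return y * y        # g^{-1}(y) = y^2 for y >= 0


def g_inv_prime(y):
    return 2.0 * y      # d g^{-1}(y) / dy


f_Y = density_of_increasing_transform(f_X, g_inv, g_inv_prime)


def F_Y(y):
    return F_X(g_inv(y))  # F_Y(y) = F_X(g^{-1}(y))


# Cross-check f_Y against a central-difference derivative of F_Y.
h = 1e-6
for y in (0.3, 1.0, 1.7):
    numeric = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
    print(f"y={y}: formula {f_Y(y):.6f}, numeric {numeric:.6f}")
```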


Why Study Function of Random Variable?

Variance stabilizing transform

Most denoising algorithms are

Designed for Gaussian noise

Assume the variance is constant throughout the image

Easy to analyze, easy to implement

But photon shot noise is

Poisson

If X ∼ Poisson(λ), then E[X] = λ and Var[X] = λ

The variance changes as the pixel intensity changes.

Variance stabilizing transform:

Let $Y = \sqrt{X + 3/8}$

$\text{Var}[Y] \approx 1/4$ (for λ that is not too small), constant throughout the image

Anscombe, F. J. (1948), “The transformation of Poisson, binomial and negative-binomial data”, Biometrika, 35 (3-4), pp. 246-254.
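A Monte Carlo check of the variance stabilizing transform in plain Python. The standard library has no Poisson sampler, so the sketch uses Knuth's multiplication method (adequate for moderate λ; for large λ, `exp(-lam)` would underflow). The λ values, sample size and seed are arbitrary choices.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method: return the number of uniforms whose running product stays
    above exp(-lam). Only suitable for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(4)
results = {}
for lam in (10, 20, 40):
    x = [sample_poisson(lam, rng) for _ in range(20_000)]
    y = [math.sqrt(v + 3 / 8) for v in x]   # Y = sqrt(X + 3/8)
    results[lam] = (variance(x), variance(y))
    print(f"lambda={lam:>2}: Var[X] = {results[lam][0]:6.2f}, Var[Y] = {results[lam][1]:.4f}")
```

Var[X] grows with λ, as expected for Poisson data, while Var[Y] stays close to 1/4 at all three intensities.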


Variance Stabilizing Transform

Figure: noisy input X with Var[X] (before) and Var[Y] (after) the transform; noisy input, direct denoising, and transform-then-denoise results.
