
Lecture Notes 1

Probability and Random Variables

• Probability Spaces

• Conditional Probability and Independence

• Random Variables

• Functions of a Random Variable

• Generation of a Random Variable

• Jointly Distributed Random Variables

• Scalar Detection


Probability Theory

• Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage

• Basic elements of probability theory:

Sample space Ω: set of all possible “elementary” or “finest grain” outcomes of the random experiment

Set of events F : set of (all?) subsets of Ω—an event A ⊂ Ω occurs if the outcome ω ∈ A

Probability measure P: function over F that assigns probabilities to events according to the axioms of probability (see below)

• Formally, a probability space is the triple (Ω, F , P)


Axioms of Probability

• A probability measure P satisfies the following axioms:

1. P(A) ≥ 0 for every event A in F

2. P(Ω) = 1

3. If A1, A2, . . . are disjoint events, i.e., Ai ∩ Aj = ∅ for all i ≠ j, then

P(⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai)

• Notes:

P is a measure in the same sense as mass, length, area, and volume—all satisfy axioms 1 and 3

Unlike these other measures, P is bounded by 1 (axiom 2)

This analogy provides some intuition but is not sufficient to fully understand probability theory—other aspects such as conditioning and independence are unique to probability


Discrete Probability Spaces

• A sample space Ω is said to be discrete if it is countable

• Examples:

Rolling a die: Ω = {1, 2, 3, 4, 5, 6}

Flipping a coin n times: Ω = {H, T}^n, sequences of heads/tails of length n

Flipping a coin until the first heads occurs: Ω = {H, TH, TTH, TTTH, . . .}

• For discrete sample spaces, the set of events F can be taken to be the set of all subsets of Ω, sometimes called the power set of Ω

• Example: For the coin flipping experiment,

F = {∅, {H}, {T}, Ω}

• F does not have to be the entire power set (more on this later)


• The probability measure P can be defined by assigning probabilities to individual outcomes—single outcome events {ω}—so that:

P(ω) ≥ 0 for every ω ∈ Ω

∑_{ω∈Ω} P(ω) = 1

• The probability of any other event A is simply

P(A) = ∑_{ω∈A} P(ω)

• Example: For the die rolling experiment, assign

P(i) = 1/6 for i = 1, 2, . . . , 6

The probability of the event “the outcome is even,” A = {2, 4, 6}, is

P(A) = P(2) + P(4) + P(6) = 3/6 = 1/2
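As a quick numerical aside (not part of the original notes), the same computation can be scripted. A minimal Python sketch of the fair-die space, with P(A) computed by summing outcome probabilities:

```python
from fractions import Fraction

# Fair-die probability space: P(omega) = 1/6 for each outcome omega
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

def prob(event):
    # P(A) = sum of P(omega) over omega in A (additivity over disjoint outcomes)
    return sum(P[omega] for omega in event)

A = {2, 4, 6}      # the event "the outcome is even"
print(prob(A))     # 1/2
```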


Continuous Probability Spaces

• A continuous sample space Ω has an uncountable number of elements

• Examples:

Random number between 0 and 1: Ω = (0, 1 ]

Point in the unit disk: Ω = {(x, y) : x² + y² ≤ 1}

Arrival times of n packets: Ω = (0,∞)^n

• For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes

• To see why, consider assigning a uniform probability measure over (0, 1 ]

In this case the probability of each single outcome event is zero

How do we find the probability of an event such as A = [0.25, 0.75]?


• Another difference for continuous Ω: we cannot take the set of events F to be the power set of Ω. (To learn why, you need to study measure theory, which is beyond the scope of this course)

• The set of events F cannot be an arbitrary collection of subsets of Ω. It must make sense, e.g., if A is an event, then its complement A^c must also be an event, the union of two events must be an event, and so on

• Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the following axioms:

1. ∅ ∈ F

2. If A ∈ F then A^c ∈ F

3. If A1, A2, . . . ∈ F then ⋃_{i=1}^{∞} Ai ∈ F

• Of course, the power set is a sigma algebra. But we can define smaller σ-algebras. For example, for rolling a die, we could define the set of events as

F = {∅, odd, even, Ω}, where odd = {1, 3, 5} and even = {2, 4, 6}


• For Ω = R = (−∞,∞) (or (0,∞), (0, 1), etc.), F is typically defined as the family of sets obtained by starting from the intervals and taking countable unions, intersections, and complements

• The resulting F is called the Borel field

• Note: Amazingly, there are subsets of R that cannot be generated in this way! (Not ones that you are likely to encounter in your life as an engineer or even as a mathematician)

• To define a probability measure over a Borel field, we first assign probabilities to the intervals in a consistent way, i.e., in a way that satisfies the axioms of probability

For example, to define the uniform probability measure over (0, 1), we first assign P((a, b)) = b − a to all intervals

• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of) knowing what they are


Useful Probability Laws

• Union of Events Bound:

P(⋃_{i=1}^{n} Ai) ≤ ∑_{i=1}^{n} P(Ai)

• Law of Total Probability: Let A1, A2, A3, . . . be events that partition Ω, i.e., disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω. Then for any event B

P(B) = ∑_i P(Ai ∩ B)

The Law of Total Probability is very useful for finding probabilities of sets
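Both laws are easy to check numerically. A short Python sketch (the events A1, A2, B below are my own illustrative choices, not from the notes):

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}          # fair die again
prob = lambda A: sum(P[w] for w in A)

A1, A2 = {1, 2, 3}, {3, 4}
# Union of events bound: P(A1 ∪ A2) <= P(A1) + P(A2); strict here since A1 ∩ A2 ≠ ∅
print(prob(A1 | A2), prob(A1) + prob(A2))             # 2/3, 5/6

# Law of total probability with the partition {odd, even}
B, odd, even = {2, 3, 5}, {1, 3, 5}, {2, 4, 6}
print(prob(B) == prob(B & odd) + prob(B & even))      # True
```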


Conditional Probability

• Let B be an event such that P(B) ≠ 0. The conditional probability of event A given B is defined to be

P(A |B) = P(A ∩ B) / P(B) = P(A,B) / P(B)

• The function P(· |B) is a probability measure over F , i.e., it satisfies the axioms of probability

• Chain rule: P(A,B) = P(A)P(B |A) = P(B)P(A |B) (this can be generalized to n events)

• The probability of event A given B, a nonzero probability event—the a posteriori probability of A—is related to the unconditional probability of A—the a priori probability—by

P(A |B) = [P(B |A) / P(B)] P(A)

This follows directly from the definition of conditional probability


Bayes Rule

• Let A1, A2, . . . , An be nonzero probability events that partition Ω, and let B be a nonzero probability event

• We know P(Ai) and P(B |Ai), i = 1, 2, . . . , n, and want to find the a posteriori probabilities P(Aj |B), j = 1, 2, . . . , n

• We know that

P(Aj |B) = [P(B |Aj) / P(B)] P(Aj)

• By the law of total probability

P(B) = ∑_{i=1}^{n} P(Ai, B) = ∑_{i=1}^{n} P(Ai)P(B |Ai)

• Substituting, we obtain Bayes rule

P(Aj |B) = [P(B |Aj) / ∑_{i=1}^{n} P(Ai)P(B |Ai)] P(Aj) , j = 1, 2, . . . , n

• Bayes rule also applies to a (countably) infinite number of events
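To make the mechanics concrete, here is a small numerical sketch of Bayes rule for n = 3 events; the priors and likelihoods are made-up values for illustration only:

```python
# Hypothetical numbers, for illustration only
priors = [0.5, 0.3, 0.2]            # P(A1), P(A2), P(A3)
likelihoods = [0.1, 0.4, 0.8]       # P(B | Ai)

# Law of total probability gives the denominator
p_B = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes rule: P(Aj | B) = P(B | Aj) P(Aj) / P(B)
posteriors = [p * l / p_B for p, l in zip(priors, likelihoods)]
print(p_B, posteriors, sum(posteriors))   # 0.33, [0.1515..., 0.3636..., 0.4848...], 1.0
```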


Independence

• Two events are said to be statistically independent if

P(A,B) = P(A)P(B)

• When P(B) ≠ 0, this is equivalent to

P(A |B) = P(A)

In other words, knowing whether B occurs does not change the probability of A

• The events A1, A2, . . . , An are said to be independent if for every subset {Ai1, Ai2, . . . , Aik} of the events,

P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^{k} P(Aij)

• Note: P(A1, A2, . . . , An) = ∏_{j=1}^{n} P(Aj) alone is not sufficient for independence


Random Variables

• A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω, i.e., X : Ω → R

(Figure: a sample point ω ∈ Ω is mapped to the real number X(ω))

• Notations:

We use upper case letters for random variables: X, Y, Z, Φ, Θ, . . .

We use lower case letters for values of random variables: X = x means that random variable X takes on the value x, i.e., X(ω) = x where ω is the outcome


Specifying a Random Variable

• Specifying a random variable means being able to determine the probability that X ∈ A for any Borel set A ⊂ R, in particular, for any interval (a, b ]

• To do so, consider the inverse image of A under X , i.e., {ω : X(ω) ∈ A}

(Figure: a set A ⊂ R and its inverse image {ω : X(ω) ∈ A} ⊂ Ω under X)

• Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},

P(X ∈ A) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}

Shorthand: P(set description) = P{set description}


Cumulative Distribution Function (CDF)

• We need to be able to determine P{X ∈ A} for any Borel set A ⊂ R, i.e., any set generated by starting from intervals and taking countable unions, intersections, and complements

• Hence, it suffices to specify P{X ∈ (a, b ]} for all intervals. The probability of any other Borel set can be determined by the axioms of probability

• Equivalently, it suffices to specify its cumulative distribution function (cdf):

FX(x) = P{X ≤ x} = P{X ∈ (−∞, x ]} , x ∈ R

• Properties of cdf:

FX(x) ≥ 0

FX(x) is monotonically nondecreasing, i.e., if a > b then FX(a) ≥ FX(b)

(Figure: a generic cdf FX(x), nondecreasing from 0 to 1)


Limits: lim_{x→+∞} FX(x) = 1 and lim_{x→−∞} FX(x) = 0

FX(x) is right continuous, i.e., FX(a+) = lim_{x→a+} FX(x) = FX(a)

P{X = a} = FX(a) − FX(a−), where FX(a−) = lim_{x→a−} FX(x)

For any Borel set A, P{X ∈ A} can be determined from FX(x)

• Notation: X ∼ FX(x) means that X has cdf FX(x)


Probability Mass Function (PMF)

• A random variable is said to be discrete if FX(x) consists only of steps over a countable set X

(Figure: staircase cdf of a discrete random variable)

• Hence, a discrete random variable can be completely specified by the probability mass function (pmf)

pX(x) = P{X = x} for every x ∈ X

Clearly pX(x) ≥ 0 and ∑_{x∈X} pX(x) = 1

• Notation: We use X ∼ pX(x) or simply X ∼ p(x) to mean that the discrete random variable X has pmf pX(x) or p(x)


• Famous discrete random variables:

Bernoulli : X ∼ Bern(p) for 0 ≤ p ≤ 1 has the pmf

pX(1) = p and pX(0) = 1 − p

Geometric : X ∼ Geom(p) for 0 ≤ p ≤ 1 has the pmf

pX(k) = p(1 − p)^{k−1} , k = 1, 2, 3, . . .

Binomial : X ∼ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf

pX(k) = (n choose k) p^k (1 − p)^{n−k} , k = 0, 1, . . . , n

Poisson: X ∼ Poisson(λ) for λ > 0 has the pmf

pX(k) = (λ^k / k!) e^{−λ} , k = 0, 1, 2, . . .

Remark: Poisson is the limit of Binomial for np = λ as n → ∞, i.e., for every k = 0, 1, 2, . . ., the Binom(n, λ/n) pmf

pX(k) → (λ^k / k!) e^{−λ} as n → ∞
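This limit is easy to see numerically; a small sketch (the choices λ = 3 and n = 1000 are arbitrary):

```python
from math import comb, exp, factorial

lam, n = 3.0, 1000
p = lam / n                     # Binom(n, lam/n), so np = lam

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k / factorial(k) * exp(-lam)
    print(k, round(binom, 5), round(poisson, 5))   # the two columns nearly agree
```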


Probability Density Function (PDF)

• A random variable is said to be continuous if its cdf is a continuous function

(Figure: a continuous cdf rising from 0 to 1)

• If FX(x) is continuous and differentiable (except possibly over a countable set), then X can be completely specified by a probability density function (pdf) fX(x) such that

FX(x) = ∫_{−∞}^{x} fX(u) du

• If FX(x) is differentiable everywhere, then (by definition of derivative)

fX(x) = dFX(x)/dx = lim_{∆x→0} [FX(x + ∆x) − FX(x)] / ∆x = lim_{∆x→0} P{x < X ≤ x + ∆x} / ∆x


• Properties of pdf:

fX(x) ≥ 0

∫_{−∞}^{∞} fX(x) dx = 1

For any event (Borel set) A ⊂ R,

P{X ∈ A} = ∫_{x∈A} fX(x) dx

In particular,

P{x1 < X ≤ x2} = ∫_{x1}^{x2} fX(x) dx

• Important note: fX(x) should not be interpreted as the probability that X = x. In fact, fX(x) is not a probability measure since it can be > 1

• Notation: X ∼ fX(x) means that X has pdf fX(x)


• Famous continuous random variables:

Uniform: X ∼ U[a, b ] where a < b has pdf

fX(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise

Exponential : X ∼ Exp(λ) where λ > 0 has pdf

fX(x) = λe^{−λx} if x ≥ 0, and 0 otherwise

Laplace: X ∼ Laplace(λ) where λ > 0 has pdf

fX(x) = (λ/2) e^{−λ|x|}

Gaussian: X ∼ N (µ, σ²) with parameters µ (the mean) and σ² (the variance; σ is the standard deviation) has pdf

fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/2σ²}


The cdf of the standard normal random variable N (0, 1) is

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du

Define the function Q(x) = 1 − Φ(x) = P{X > x} for X ∼ N (0, 1)

(Figure: the N (0, 1) pdf with the tail area Q(x) shaded to the right of x)

The Q(·) function is used to compute P{X > a} for any Gaussian r.v. X :

Given Y ∼ N (µ, σ²), we represent it using the standard X ∼ N (0, 1) as

Y = σX + µ

Then

P{Y > y} = P{X > (y − µ)/σ} = Q((y − µ)/σ)

The complementary error function is erfc(x) = 2Q(√2 x)
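Since erfc(x) = 2Q(√2 x), Q can be evaluated with the Python standard library; a minimal sketch with illustrative parameter values:

```python
from math import erfc, sqrt

def Q(x):
    # From erfc(x) = 2 Q(sqrt(2) x): Q(x) = erfc(x / sqrt(2)) / 2
    return 0.5 * erfc(x / sqrt(2))

# P{Y > y} for Y ~ N(mu, sigma^2) via the standardization Y = sigma X + mu
mu, sigma, y = 1.0, 2.0, 3.0
print(Q((y - mu) / sigma))   # Q(1) ≈ 0.1587
```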


Functions of a Random Variable

• Suppose we are given a r.v. X with known cdf FX(x) and a function y = g(x). What is the cdf of the random variable Y = g(X)?

• We use

FY (y) = P{Y ≤ y} = P{x : g(x) ≤ y}

(Figure: a function y = g(x) with the set {x : g(x) ≤ y} marked on the x-axis)


• Example: Quadratic function. Let X ∼ FX(x) and Y = X². We wish to find FY (y)

(Figure: the parabola y = x²; the event {Y ≤ y} corresponds to −√y < X ≤ √y)

If y < 0, then clearly FY (y) = 0. Consider y ≥ 0,

FY (y) = P{−√y < X ≤ √y} = FX(√y) − FX(−√y)

If X is continuous with density fX(x), then

fY (y) = [fX(+√y) + fX(−√y)] / (2√y)
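A Monte Carlo sanity check of these two formulas for the case X ∼ N (0, 1) (a sketch of my own; the test point y and bin width dy are arbitrary choices):

```python
import random
from math import erf, exp, pi, sqrt

random.seed(0)
N = 200_000
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(N)]   # Y = X^2, X ~ N(0, 1)

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))           # standard normal cdf
f = lambda x: exp(-x * x / 2) / sqrt(2 * pi)           # standard normal pdf

y, dy = 1.5, 0.01
# cdf: F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y))
print(sum(v <= y for v in ys) / N, Phi(sqrt(y)) - Phi(-sqrt(y)))
# pdf: f_Y(y) = (f_X(sqrt(y)) + f_X(-sqrt(y))) / (2 sqrt(y)), checked on a small bin
print(sum(y < v <= y + dy for v in ys) / (N * dy),
      (f(sqrt(y)) + f(-sqrt(y))) / (2 * sqrt(y)))
```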


• Remark: In general, let X ∼ fX(x) and Y = g(X), where g is differentiable. Then

fY (y) = ∑_{i=1}^{k} fX(xi) / |g′(xi)| ,

where x1, x2, . . . , xk are the solutions of the equation y = g(x) and g′(xi) is the derivative of g evaluated at xi


• Example: Limiter. Let X ∼ Laplace(1), i.e., fX(x) = (1/2)e^{−|x|}, and let Y be defined by the function of X shown in the figure. Find the cdf of Y

(Figure: limiter characteristic y = g(x): y = ax for −1 < x < 1, saturating at −a for x ≤ −1 and at +a for x ≥ 1)

To find the cdf of Y , we consider the following cases

y < −a: Here clearly FY (y) = 0

y = −a: Here

FY (−a) = FX(−1) = ∫_{−∞}^{−1} (1/2)e^{x} dx = (1/2)e^{−1}


−a < y < a: Here

FY (y) = P{Y ≤ y} = P{aX ≤ y} = P{X ≤ y/a} = FX(y/a) = (1/2)e^{−1} + ∫_{−1}^{y/a} (1/2)e^{−|x|} dx

y ≥ a: Here FY (y) = 1

Combining the results, the following is a sketch of the cdf of Y

(Figure: FY (y) is 0 for y < −a, jumps at y = −a, increases over −a < y < a, and jumps to 1 at y = a)


Generation of Random Variables

• Generating a r.v. with a prescribed distribution is often needed for performing simulations involving random phenomena, e.g., noise or random arrivals

• First let X ∼ F (x) where the cdf F (x) is continuous and strictly increasing. Define Y = F (X), a real-valued random variable that is a function of X

What is the cdf of Y ?

Clearly, FY (y) = 0 for y < 0, and FY (y) = 1 for y > 1

For 0 ≤ y ≤ 1, note that by assumption F has an inverse F⁻¹, so

FY (y) = P{Y ≤ y} = P{F (X) ≤ y} = P{X ≤ F⁻¹(y)} = F (F⁻¹(y)) = y

Thus Y ∼ U[ 0, 1 ], i.e., Y is a uniformly distributed random variable

• Note: F (x) does not need to be invertible. If F (x) = a is constant over some interval, then the probability that X lies in this interval is zero. Without loss of generality, we can take F⁻¹(a) to be the leftmost point of the interval

• Conclusion: We can generate a U[ 0, 1 ] r.v. from any continuous r.v.


• Now, let’s consider the more useful scenario where we are given X ∼ U[ 0, 1 ] (a random number generator) and wish to generate a random variable Y with prescribed cdf F (y), e.g., Gaussian or exponential

(Figure: a cdf x = F (y) and its inverse y = F⁻¹(x), with x ranging over [0, 1])

• If F is continuous and strictly increasing, set Y = F⁻¹(X). To show Y ∼ F (y),

FY (y) = P{Y ≤ y} = P{F⁻¹(X) ≤ y} = P{X ≤ F (y)} = F (y) ,

since X ∼ U[ 0, 1 ] and 0 ≤ F (y) ≤ 1


• Example: To generate Y ∼ Exp(λ), set

Y = −(1/λ) ln(1 − X)

• Note: F does not need to be continuous for the above to work. For example, to generate Y ∼ Bern(p), we set

Y = 0 if X ≤ 1 − p, and Y = 1 otherwise

(Figure: the Bern(p) cdf x = F (y): F (y) = 1 − p for 0 ≤ y < 1 and F (y) = 1 for y ≥ 1)

• Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
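Both constructions are straightforward to test; a minimal sketch (the parameter values λ = 2 and p = 0.3 are arbitrary):

```python
import random
from math import log

random.seed(1)

def exp_rv(lam, u):
    # Inverse cdf of Exp(lam): Y = -(1/lam) ln(1 - u)
    return -log(1.0 - u) / lam

def bern_rv(p, u):
    # Jump cdf: Y = 0 if u <= 1 - p, else 1, so P{Y = 1} = p
    return 0 if u <= 1.0 - p else 1

lam, p, N = 2.0, 0.3, 100_000
print(sum(exp_rv(lam, random.random()) for _ in range(N)) / N)   # ≈ 1/lam = 0.5
print(sum(bern_rv(p, random.random()) for _ in range(N)) / N)    # ≈ p = 0.3
```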


Jointly Distributed Random Variables

• A pair of random variables defined over the same probability space are specified by their joint cdf

FX,Y (x, y) = P{X ≤ x, Y ≤ y} , x, y ∈ R

FX,Y (x, y) is the probability of the shaded region of R²

(Figure: the region {(u, v) : u ≤ x, v ≤ y}, i.e., everything below and to the left of the point (x, y))


• Properties of the cdf:

FX,Y (x, y) ≥ 0

If x1 ≤ x2 and y1 ≤ y2 then FX,Y (x1, y1) ≤ FX,Y (x2, y2)

lim_{y→−∞} FX,Y (x, y) = 0 and lim_{x→−∞} FX,Y (x, y) = 0

lim_{y→∞} FX,Y (x, y) = FX(x) and lim_{x→∞} FX,Y (x, y) = FY (y)

FX(x) and FY (y) are the marginal cdfs of X and Y

lim_{x,y→∞} FX,Y (x, y) = 1

• X and Y are independent if for every x and y

FX,Y (x, y) = FX(x)FY (y)


Joint, Marginal, and Conditional PMFs

• Let X and Y be discrete random variables on the same probability space

• They are completely specified by their joint pmf :

pX,Y (x, y) = P{X = x, Y = y} , x ∈ X , y ∈ Y

By the axioms of probability, ∑_{x∈X} ∑_{y∈Y} pX,Y (x, y) = 1

• To find pX(x), the marginal pmf of X , we use the law of total probability

pX(x) = ∑_{y∈Y} p(x, y) , x ∈ X

• The conditional pmf of X given Y = y is defined as

pX|Y (x|y) = pX,Y (x, y) / pY (y) , pY (y) ≠ 0 , x ∈ X

• Chain rule: pX,Y (x, y) = pX(x)pY |X(y|x) = pY (y)pX|Y (x|y)


• Independence: X and Y are said to be independent if for every (x, y) ∈ X × Y ,

pX,Y (x, y) = pX(x)pY (y) ,

which is equivalent to pX|Y (x|y) = pX(x) for every x ∈ X and y ∈ Y such that pY (y) ≠ 0
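These definitions amount to simple table manipulations; a sketch using a made-up joint pmf on {0, 1} × {0, 1}:

```python
from fractions import Fraction as F

# Hypothetical joint pmf (entries sum to 1)
p = {(0, 0): F(1, 8), (0, 1): F(3, 8),
     (1, 0): F(2, 8), (1, 1): F(2, 8)}

# Marginals by the law of total probability
pX = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}
pY = {y: p[(0, y)] + p[(1, y)] for y in (0, 1)}

# Conditional pmf of X given Y = y
pXgY = {(x, y): p[(x, y)] / pY[y] for x in (0, 1) for y in (0, 1)}

print(pX, pY, pXgY)
# Independence check: does p(x, y) = pX(x) pY(y) hold for all (x, y)?
print(all(p[(x, y)] == pX[x] * pY[y] for x in (0, 1) for y in (0, 1)))  # False
```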


Joint, Marginal, and Conditional PDF

• X and Y are jointly continuous random variables if their joint cdf is continuous in both x and y

In this case, we can define their joint pdf, provided that it exists, as the function fX,Y (x, y) such that

FX,Y (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y (u, v) dv du , x, y ∈ R

• If FX,Y (x, y) is differentiable in x and y, then

fX,Y (x, y) = ∂²F (x, y)/∂x∂y = lim_{∆x,∆y→0} P{x < X ≤ x + ∆x, y < Y ≤ y + ∆y} / (∆x ∆y)

• Properties of fX,Y (x, y):

fX,Y (x, y) ≥ 0

∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1


• The marginal pdf of X can be obtained from the joint pdf via the law of total probability:

fX(x) = ∫_{−∞}^{∞} fX,Y (x, y) dy

• X and Y are independent iff fX,Y (x, y) = fX(x)fY (y) for every x, y

• Conditional cdf and pdf: Let X and Y be continuous random variables with joint pdf fX,Y (x, y). We wish to define FY |X(y |X = x) = P{Y ≤ y |X = x}

We cannot define the above conditional probability as

P{Y ≤ y, X = x} / P{X = x}

because both numerator and denominator are equal to zero. Instead, we define conditional probability for continuous random variables as a limit:

FY |X(y|x) = lim_{∆x→0} P{Y ≤ y | x < X ≤ x + ∆x}

= lim_{∆x→0} P{Y ≤ y, x < X ≤ x + ∆x} / P{x < X ≤ x + ∆x}

= lim_{∆x→0} [∫_{−∞}^{y} fX,Y (x, u) du ∆x] / [fX(x) ∆x] = ∫_{−∞}^{y} [fX,Y (x, u) / fX(x)] du


• We then define the conditional pdf in the usual way as

fY |X(y|x) = fX,Y (x, y) / fX(x) if fX(x) ≠ 0

• Thus

FY |X(y|x) = ∫_{−∞}^{y} fY |X(u|x) du ,

which shows that fY |X(y|x) is a pdf for Y given X = x, i.e.,

Y | X = x ∼ fY |X(y|x)

• Independence: X and Y are independent if fX,Y (x, y) = fX(x)fY (y) for every (x, y)


Mixed Random Variables

• Let Θ be a discrete random variable with pmf pΘ(θ)

• For each Θ = θ with pΘ(θ) ≠ 0, let Y be a continuous random variable, i.e., FY |Θ(y|θ) is continuous for all θ. We define fY |Θ(y|θ) in the usual way

• The conditional pmf of Θ given y can be defined as a limit

pΘ|Y (θ | y) = lim_{∆y→0} P{Θ = θ, y < Y ≤ y + ∆y} / P{y < Y ≤ y + ∆y}

= lim_{∆y→0} [pΘ(θ) fY |Θ(y|θ) ∆y] / [fY (y) ∆y] = fY |Θ(y|θ) pΘ(θ) / fY (y)


Bayes Rule for Random Variables

• Bayes Rule for pmfs: Given pX(x) and pY |X(y|x), then

pX|Y (x|y) = [pY |X(y|x) / ∑_{x′∈X} pY |X(y|x′) pX(x′)] pX(x)

• Bayes rule for densities: Given fX(x) and fY |X(y|x), then

fX|Y (x|y) = [fY |X(y|x) / ∫_{−∞}^{∞} fY |X(y|u) fX(u) du] fX(x)

• Bayes rule for mixed r.v.s: Given pΘ(θ) and fY |Θ(y|θ), then

pΘ|Y (θ | y) = [fY |Θ(y|θ) / ∑_{θ′} pΘ(θ′) fY |Θ(y|θ′)] pΘ(θ)

Conversely, given fY (y) and pΘ|Y (θ|y), then

fY |Θ(y|θ) = [pΘ|Y (θ|y) / ∫ pΘ|Y (θ|y′) fY (y′) dy′] fY (y)


• Example: Additive Gaussian Noise Channel

Consider the following communication channel:

(Figure: channel diagram with input Θ, additive noise Z ∼ N (0, N), and output Y)

The signal transmitted is a binary random variable Θ:

Θ = +1 with probability p, and Θ = −1 with probability 1 − p

The received signal, also called the observation, is Y = Θ + Z , where Θ and Z are independent

Given Y = y is received (observed), find pΘ|Y (θ|y), the a posteriori pmf of Θ


Solution: We use Bayes rule

pΘ|Y (θ|y) = [fY |Θ(y|θ) / ∑_{θ′} pΘ(θ′) fY |Θ(y|θ′)] pΘ(θ)

We are given pΘ(θ):

pΘ(+1) = p and pΘ(−1) = 1 − p

and fY |Θ(y|θ) = fZ(y − θ):

Y | Θ = +1 ∼ N (+1, N) and Y | Θ = −1 ∼ N (−1, N)

Therefore

pΘ|Y (1|y) = [p/√(2πN)] e^{−(y−1)²/2N} / ( [p/√(2πN)] e^{−(y−1)²/2N} + [(1 − p)/√(2πN)] e^{−(y+1)²/2N} )

= p e^{y/N} / (p e^{y/N} + (1 − p) e^{−y/N}) , −∞ < y < ∞
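The closed form is easy to evaluate; a minimal sketch (the values p = 1/2 and N = 1 are arbitrary choices):

```python
from math import exp

def posterior_plus_one(y, p, N):
    # p_{Theta|Y}(1 | y) = p e^{y/N} / (p e^{y/N} + (1 - p) e^{-y/N})
    a = p * exp(y / N)
    return a / (a + (1.0 - p) * exp(-y / N))

p, N = 0.5, 1.0
for y in (-2.0, 0.0, 2.0):
    print(y, posterior_plus_one(y, p, N))
# y = 0 returns the prior p; large |y| pushes the posterior toward 0 or 1
```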


Scalar Detection

• Consider the following general digital communication system

(Figure: Θ ∈ {θ0, θ1} → noisy channel fY |Θ(y|θ) → Y → decoder → Θ̂(Y ) ∈ {θ0, θ1})

where the signal sent is

Θ = θ0 with probability p, and Θ = θ1 with probability 1 − p

and the observation (received signal) is

Y | Θ = θ ∼ fY |Θ(y | θ) , θ ∈ {θ0, θ1}

• We wish to find the estimate Θ̂(Y ) (i.e., design the decoder) that minimizes the probability of error:

Pe = P{Θ̂ ≠ Θ}

= P{Θ = θ0, Θ̂ = θ1} + P{Θ = θ1, Θ̂ = θ0}

= P{Θ = θ0}P{Θ̂ = θ1 |Θ = θ0} + P{Θ = θ1}P{Θ̂ = θ0 |Θ = θ1}


• We define the maximum a posteriori probability (MAP) decoder as

Θ̂(y) = θ0 if pΘ|Y (θ0|y) > pΘ|Y (θ1|y), and θ1 otherwise

• The MAP decoding rule minimizes Pe, since

Pe = 1 − P{Θ̂(Y ) = Θ} = 1 − ∫_{−∞}^{∞} fY (y) P{Θ̂(y) = Θ |Y = y} dy

and the integral is maximized when we pick the largest P{Θ̂(y) = Θ |Y = y} for each y, which is precisely the MAP decoder

• If p = 1/2, i.e., equally likely signals, then using Bayes rule the MAP decoder reduces to the maximum likelihood (ML) decoder

Θ̂(y) = θ0 if fY |Θ(y|θ0) > fY |Θ(y|θ1), and θ1 otherwise


Additive Gaussian Noise Channel

• Consider the additive Gaussian noise channel with signal

Θ = +√P with probability 1/2, and −√P with probability 1/2

noise Z ∼ N (0, N) (Θ and Z are independent), and output Y = Θ + Z

• The MAP decoder is

Θ̂(y) = +√P if P{Θ = +√P |Y = y} / P{Θ = −√P |Y = y} > 1, and −√P otherwise

Since the two signals are equally likely, the MAP decoding rule reduces to the ML decoding rule

Θ̂(y) = +√P if fY |Θ(y |+√P ) / fY |Θ(y |−√P ) > 1, and −√P otherwise


• Using the Gaussian pdf, the ML decoder reduces to the minimum distance decoder

Θ̂(y) = +√P if (y − √P )² < (y + √P )², and −√P otherwise

From the figure, this simplifies to

Θ̂(y) = +√P if y > 0, and −√P if y < 0

Note: The decision when y = 0 is arbitrary

(Figure: the conditional pdfs f(y |−√P ) and f(y |+√P ), centered at −√P and +√P and crossing at y = 0)


• Now to find the minimum probability of error, consider

Pe = P{Θ̂(Y ) ≠ Θ}

= P{Θ = √P }P{Θ̂(Y ) = −√P |Θ = √P } + P{Θ = −√P }P{Θ̂(Y ) = √P |Θ = −√P }

= (1/2)P{Y ≤ 0 |Θ = √P } + (1/2)P{Y > 0 |Θ = −√P }

= (1/2)P{Z ≤ −√P } + (1/2)P{Z > √P }

= Q(√(P/N)) = Q(√SNR)

The probability of error is a decreasing function of P/N , the signal-to-noise ratio (SNR)
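A simulation sketch of this detector (the signal power P and noise variance N below are arbitrary choices), comparing the empirical error rate against Q(√SNR):

```python
import random
from math import erfc, sqrt

random.seed(2)
Q = lambda x: 0.5 * erfc(x / sqrt(2))

P, N, trials = 1.0, 0.5, 200_000
errors = 0
for _ in range(trials):
    theta = sqrt(P) if random.random() < 0.5 else -sqrt(P)   # equally likely signals
    y = theta + random.gauss(0.0, sqrt(N))                   # Y = Theta + Z, Z ~ N(0, N)
    theta_hat = sqrt(P) if y > 0 else -sqrt(P)               # minimum distance decoder
    errors += (theta_hat != theta)

print(errors / trials, Q(sqrt(P / N)))   # both ≈ Q(√2) ≈ 0.0786
```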
