
EEL 5544 - Noise in Linear Systems
(Should be called "Probability and Random Processes for Electrical Engineers")

Dr. John M. Shea

Fall 2008

Pre-requisite: Very strong mathematical skills. Solid understanding of systems theory, including convolution, Fourier transforms, and impulse functions. Knowledge of basic linear algebra, including matrix properties and eigen-decomposition.

Computer requirement: Some problems will require MATLAB. Students may want to purchase the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not a valid excuse for late work. Web access with the ability to run Java programs is also required.

Meeting Time: 9:35–10:25 Monday/Wednesday/Friday
Meeting Room: NEB 201
Final Exam: The third exam will be given during the final exam time slot determined by the University, which is December 18th at 10:00 AM – 12:00 noon.
E-mail: jshea@ece.ufl.edu
Instant messaging: On AIM: eel5544
Class Web page: http://wireless.ece.ufl.edu/eel5544
Personal Web page: http://wireless.ece.ufl.edu/jshea
Office: 439 New Engineering Building
Phone: (352) 846-3042
Office hours: 10:40–11:30 Mondays and 1:55–3:50 Wednesdays
Textbook: Henry Stark and John W. Woods, Probability and Random Processes with Applications to Signal Processing, Prentice Hall, 3rd ed., 2002 (ISBN 0-13-020071-9).

Suggested References:

• If you feel like you are having a hard time with basic probability, I suggest:

D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, 2nd ed., 2008 (ISBN 978-1-886529-23-6).

Sheldon Ross, A First Course in Probability, Prentice Hall, 6th ed., 2002 (ISBN 0-13-033851-6).

• For more depth on filtering of random processes:

Michael B. Pursley, Random Processes in Linear Systems, Prentice Hall, 2002 (ISBN 0-13-067391-9).


Other References

• Alberto Leon-Garcia, Probability and Random Processes for Electrical Engineering, Prentice-Hall (was Addison-Wesley), 2nd ed., 1994 (ISBN 0-201-50037-X).

• Athanasios Papoulis and S. Unnikrishna Pillai, Probability, Random Variables and Stochastic Processes, McGraw Hill, 4th ed., 2002 (ISBN 0-07-112256-7).

Grader: Byonghyok Choi (aru0080@ufl.edu)

Course Topics (as time allows)

• Probability spaces and axioms of probability

• Combinatorial (counting) analysis

• Conditional probability, total probability, and Bayes' rule

• Statistical independence

• Sequential experiments

• Single random variables and types of random variables

• Important random variables

• Distribution and density functions

• Functions of one random variable

• Expected value of a random variable

• Mean, variance, standard deviation, Nth moment, Nth central moment

• Markov and Chebyshev inequalities

• Transform methods: characteristic and generating functions, Laplace transform

• Generating random variables

• Multiple random variables

• Joint and marginal distribution and density functions

• Functions of several random variables

• Joint moments and joint characteristic functions

• Conditional expected value

• Laws of large numbers and the central limit theorem

• Random processes

• Mean, autocorrelation, and autocovariance functions

• Stationarity

• Time-invariant filtering of random processes

• Power spectral density

• Matched filters and Wiener optimum systems


• Markov chains

• Examples of practical applications of the theory

Goals and Objectives: Upon completion of this course, the student should be able to:

• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers

• Determine probabilities based on counting (lottery tickets, etc.)

• Calculate probabilities of events from the density or distribution functions for random variables

• Classify random variables based on their density or distribution functions

• Know the density and distribution functions for common random variables

• Determine random variables from definitions based on the underlying probability space

• Determine the density and distribution functions for functions of random variables using several different techniques presented in class

• Calculate expected values for random variables

• Determine whether events, random variables, or random processes are statistically independent

• Use inequalities to find bounds for probabilities that might otherwise be difficult to evaluate

• Use transform methods to simplify solving some problems that would otherwise be difficult

• Evaluate probabilities involving multiple random variables or functions of multiple random variables

• Classify random processes based on their time support and value support

• Simulate random variables and random processes

• Classify random processes based on stationarity

• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter

• Evaluate the power spectral density for wide-sense stationary random processes

• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise

• Determine the steady-state probabilities for a Markov chain

Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant messaging exchanges, discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90 is guaranteed an A, > 80 is guaranteed a B, etc.

Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.

No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.

When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.

Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes, with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.

Academic Honesty

All students admitted to the University of Florida have signed a statement of academic honesty, committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.

This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.

Honor statements on tests must be signed in order to receive any credit for that test.

I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.

Collaboration on homework is permitted (unless explicitly prohibited), provided that:

1. Collaboration is restricted to students currently in this course.

2. Collaboration must be a shared effort.

3. Each student must write up his/her homework independently.


4. On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.

I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.


EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that, under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems:

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later, we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events; try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
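As a numerical cross-check on the counting approach, here is a short MATLAB sketch (my illustration, not part of the original notes) that enumerates all 36 equally likely outcomes of the second experiment:

[r1, r2] = meshgrid(1:6, 1:6);         % r1 = first roll, r2 = second roll
favorable = (r1 <= 2) | (r2 <= 2);     % event: a 1 or a 2 appears on either roll
p = nnz(favorable) / numel(favorable)  % 20/36 = 5/9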


What is a random experiment?

Q: What do we mean by random?

Output is unpredictable in some sense.

DEFN: A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by:

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN: An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN: The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX: Tossing a coin
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω =

EX: Rolling a 6-sided die
A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω =

EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω =


If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.
– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.
– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1 (the probability that something occurs is 1).

Since p_H = p_T, p_H = p_T = 1/2.

EX: Rolling a fair 6-sided die

Let p_i = Prob{top face is i}. Then

    ∑_{i=1}^{6} p_i = 1
    ⇒ 6 p_1 = 1
    ⇒ p_1 = 1/6

So p_i = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob. that we observe a 1 or 2 on either roll of the die?


What are some problems with defining probability in this way?

1. It requires equally likely outcomes; thus it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes; this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur."

DEFN: The relative frequency of outcome k of a random experiment is

    f_k(n) = N_k(n) / n

• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN: An experiment possesses statistical regularity if

    lim_{n→∞} f_k(n) = p_k (a constant) ∀k

DEFN: For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

    0 ≤ N_k(n) ≤ n  ∀k,

because N_k(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

    0 ≤ N_k(n)/n = f_k(n) ≤ 1,  ∀k = 1, ..., K   (1)


• If 1, 2, ..., K are all of the possible outcomes, then

    ∑_{k=1}^{K} N_k(n) = n

Again, dividing by n yields

    ∑_{k=1}^{K} f_k(n) = 1   (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

    N_E(n) = N_2(n) + N_4(n) + N_6(n)

What have we assumed in developing this equation?

Then, dividing by n,

    f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

    f_C(n) = f_A(n) + f_B(n)   (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.
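To see statistical regularity numerically, here is a small MATLAB sketch (my illustration, not from the notes) that estimates the relative frequency of the event E = {even number} as n grows:

n = 10.^(1:6);                        % increasing numbers of trials
for j = 1:length(n)
    rolls = randi(6, n(j), 1);        % n(j) rolls of a fair die
    fE = mean(mod(rolls, 2) == 0);    % relative frequency of an even result
    fprintf('n = %7d   f_E(n) = %.4f\n', n(j), fE);
end
% f_E(n) settles near 1/2 as n grows, illustrating statistical regularity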

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency, and

3. agree with our intuition (where appropriate).


EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX: An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
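For a numerical cross-check, here is a MATLAB sketch (my illustration) that combines the given counts by inclusion-exclusion:

nS = 28; nF = 26; nG = 16;                 % single-class enrollments
nSF = 12; nSG = 4; nFG = 6; nSFG = 2;      % overlaps
N = 100;
nUnion = nS + nF + nG - nSF - nSG - nFG + nSFG;        % |S u F u G| = 50
pNone = (N - nUnion) / N                               % 1. answer: 0.50
nOne = (nS + nF + nG) - 2*(nSF + nSG + nFG) + 3*nSFG;  % exactly one class: 32
pOne = nOne / N                                        % 2. answer: 0.32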

• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN: The subset operator ⊂ is defined for two sets A and B by

    A ⊂ B if x ∈ A ⇒ x ∈ B


Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN: The union of A and B is a set defined by

    A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN: The intersection of A and B is defined by

    A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. A^c or Ā (read "A complement")

DEFN: The complement of a set A in a sample space Ω is defined by

    A^c = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (A^c)^c = A

3. A ⊂ B ⇒ B^c ⊂ A^c

4. A ∪ A^c = Ω

5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)

6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)


• One of the most important relations for sets is when sets are mutually exclusive.

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if ________

This is very important, so I will emphasize it again: mutually exclusive is a ________

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.


4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.


9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs

• Thus, the event class can be written as ________


EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation:

1.

2.

3.

4.

EX: The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?


2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω_3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,

    P(a_i) = 1/|Ω_3| = 1/8 for any outcome a_i

Let A_2 = {exactly 2 heads occur in 3 tosses}. Then ________

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, ..., o_K} and |E| = K), then ________


B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

    P(a_i) = 1/|Ω| = 1/4 for any outcome a_i

Then ________

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

    P([a, b]) = ________

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].


• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)


EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
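A quick numerical check (my sketch; the corollary P(A ∪ B) = P(A) + P(B) − P(A ∩ B) and the complement corollary do all the work):

pCig = 0.28; pCigar = 0.07; pBoth = 0.05;
pSmoke = pCig + pCigar - pBoth     % 1. P(A u B) = 0.30
pNeither = 1 - pSmoke              % 2. complement: 0.70
pCigarOnly = pCigar - pBoth        % 3. cigars but not cigarettes: 0.02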


PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the ________ of ________

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either ________ or ________

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples: ________

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then for any x such that a ≤ x ≤ b, x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.


• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets: ________

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are ________ to which we assign ________

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.


(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A_1 = the event that heads occurs on the first toss.
Express A_1 as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = the event that an odd number of heads occurs.
Express O as a set of outcomes.

2. DEFN: F is the event class, which is a ________ to which we assign probability.

• If Ω is discrete, then F can be taken to be the ________ of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection M of subsets of Ω forms a field under the binary operations ∪ and ∩ if:

(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M


DEFN: A field M forms a σ-algebra or σ-field if, for any E_1, E_2, ... ∈ M,

    ⋃_{i=1}^{∞} E_i ∈ M   (5)

• Note that, by property (c) of fields, (5) can be interchanged with

    ⋂_{i=1}^{∞} E_i ∈ M   (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a ________ that maps all members of ________ onto ________

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. ________

II. ________

III. ________

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)


(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, ..., A_n are pairwise mutually exclusive, then

    P(⋃_{k=1}^{n} A_k) = ∑_{k=1}^{n} P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = ∑_{j=1}^{n} P(A_j) − ∑_{j<k} P(A_j ∩ A_k) + ··· + (−1)^(n+1) P(A_1 ∩ A_2 ∩ ··· ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.


EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown below so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

    P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

What if A and B are independent?


Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are given below.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are ________ (abbreviated s.i.) if and only if ________

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a ________

What type of relation is mutually exclusive? ________

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c
– A^c and B
– A^c and B^c

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32.  2. (a) 2/3, (b) 1/3, (c) 3/4.


Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

    P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
         = P(A ∩ B) + P(A ∩ B^c)
         = P(A)P(B) + P(A ∩ B^c)

Thus

    P(A ∩ B^c) = P(A) − P(A)P(B)
               = P(A)[1 − P(B)]
               = P(A)P(B^c)

• For N events to be s.i., we require

    P(A_i ∩ A_k) = P(A_i)P(A_k)
    P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
    ⋮
    P(A_1 ∩ A_2 ∩ ··· ∩ A_N) = P(A_1)P(A_2)···P(A_N)

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, ..., A_N ∈ A are pairwise statistically independent if and only if

    P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j


Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
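A short MATLAB check of these answers by direct enumeration of the 2^5 = 32 equally likely birth sequences (my sketch; column 1 is taken to be the eldest child, 1 = boy, 0 = girl; the element-wise comparisons rely on implicit expansion, R2016b or later):

seqs = dec2bin(0:31) - '0';                          % all 32 sequences of 5 births
pSame = mean(all(seqs == 1, 2) | all(seqs == 0, 2))  % 1. = 1/16
p3B2G = mean(all(seqs == [1 1 1 0 0], 2))            % 2. = 1/32
p2G   = mean(all(seqs(:, 1:2) == 0, 2))              % 3. = 1/4
pGirl = mean(any(seqs == 0, 2))                      % 4. = 31/32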


RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a ________
– Statistical independence is a ________

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, ________ is true.

Consider the following:


Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A,
• E_B be the event that the selected computer is from manufacturer B,
• E_D be the event that the selected computer is defective.

Find:

P(E_A) =     P(E_B) =     P(E_D) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(E_D|E_B) =


– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram: [figure: events A and B overlapping within sample space S]

If we condition on B having occurred, then we can form the new Venn diagram: [figure: only the portion A ∩ B of A remains within B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

    P(A|B) = P(A ∩ B)/P(B), for P(B) > 0


Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space, (S, A, P(·|B)).

Check the axioms:

1.
    P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

    P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) ≥ 0

    ⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

    P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
               = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

    P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
               = P(A|B) + P(C|B)

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

P(E_D|E_A) =

P(E_D|E_B) =

P(E_A|E_D) =

P(E_B|E_D) =


EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers are given below.

EX:

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains:

• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = the event that a white ball is chosen on draw i.

What is P(W_2|W_1)?
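A Monte Carlo sketch of this conditional probability (my illustration; counting gives the exact answer 2/4 = 1/2):

trials = 1e5; n12 = 0; n1 = 0;
balls = [0 0 1 1 1];                 % 0 = black, 1 = white
for t = 1:trials
    draw = randperm(5);              % a random order of the 5 balls
    w1 = balls(draw(1)) == 1;        % white on the first draw?
    w2 = balls(draw(2)) == 1;        % white on the second draw?
    n1  = n1 + w1;
    n12 = n12 + (w1 && w2);
end
pEst = n12 / n1                      % estimates P(W2|W1) = 1/2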

Answers: (a) 1/3, (b) 1/5, (c) 1.


STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then ________

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)


Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

    ⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)

    ⇒ P(A ∩ B) = P(B|A)P(A)   (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

    P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
                 = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets {A_i} partitions the sample space Ω if and only if ________

{A_i} is also said to be a partition of Ω.


1. Venn Diagram:

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability: If {A_i} partitions Ω, then ________

Example:


EX: Binary Communication System

A transmitter (Tx) sends A_0 or A_1:

• A_0 is sent with prob. P(A_0) = 2/5

• A_1 is sent with prob. P(A_1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B_0 or B_1.

• The channel is a noisy channel that determines the probabilities of transitions between {A_0, A_1} and {B_0, B_1}:

– The transitions are conditional probabilities P(B_j|A_i) that can be specified on a channel transition diagram.

[Channel transition diagram: A_0 → B_0 with prob. 5/6, A_0 → B_1 with prob. 1/6, A_1 → B_1 with prob. 7/8, A_1 → B_0 with prob. 1/8]

• The receiver must use a decision rule to determine, from the output B_0 or B_1, whether the symbol that was sent was most likely A_0 or A_1.

• Let's start by considering 2 possible decision rules:

1. Always decide A_1.

2. When we receive B_0, decide A_0; when we receive B_1, decide A_1.

Problem: Determine the probability of error for these two decision rules.
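A sketch of the two error probabilities via the law of total probability (my check, using the crossover probabilities read from the diagram above):

pA0 = 2/5; pA1 = 3/5;
p10 = 1/6;                 % crossover P(B1|A0), from the diagram
p01 = 1/8;                 % crossover P(B0|A1)
pE1 = pA0                  % Rule 1 (always decide A1): error iff A0 sent, = 0.40
pE2 = p10*pA0 + p01*pA1    % Rule 2: error on a channel crossover, = 17/120 ≈ 0.14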


EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize the error probability.

• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A_0 when A_1 is transmitted plus the probability that we decide A_1 when A_0 is transmitted:

    P(E) = P(H_10 ∩ A_0) + P(H_11 ∩ A_0) + P(H_01 ∩ A_1) + P(H_00 ∩ A_1)

• Note that, at this point, our decision rule may be randomized. I.e., it may result in rules of the form: When B_0 is received, decide A_0 with probability η and decide A_1 with probability 1 − η.

• Since we only apply hypothesis H_ij when B_j is received, we can write

    P(E) = P(H_10 ∩ A_0 ∩ B_0) + P(H_11 ∩ A_0 ∩ B_1) + P(H_01 ∩ A_1 ∩ B_1) + P(H_00 ∩ A_1 ∩ B_0)

• We apply the chain rule to write this as

    P(E) = P(H_10|A_0 ∩ B_0)P(A_0 ∩ B_0) + P(H_11|A_0 ∩ B_1)P(A_0 ∩ B_1) + P(H_01|A_1 ∩ B_1)P(A_1 ∩ B_1) + P(H_00|A_1 ∩ B_0)P(A_1 ∩ B_0)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.

• Furthermore, let P(H_00|B_0) = q_0 and P(H_11|B_1) = q_1.

• Then

    P(E) = (1 − q_0)P(A_0 ∩ B_0) + q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1) + q_0 P(A_1 ∩ B_0)

• The choices of q_0 and q_1 can be made independently, so P(E) is minimized by finding the q_0 and q_1 that minimize

    (1 − q_0)P(A_0 ∩ B_0) + q_0 P(A_1 ∩ B_0)   and   q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1)

• We can further expand these using the chain rule as

    (1 − q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0)   and   q_1 P(A_0|B_1)P(B_1) + (1 − q_1)P(A_1|B_1)P(B_1)


• P(B_0) and P(B_1) are constants that do not depend on our decision rule, so finally we wish to find q_0 and q_1 to minimize

    (1 − q_0)P(A_0|B_0) + q_0 P(A_1|B_0)   (9)
    q_1 P(A_0|B_1) + (1 − q_1)P(A_1|B_1)   (10)

• Both of these are linear functions of q_0 and q_1, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q_0 = 1 if P(A_0|B_0) > P(A_1|B_0), and setting q_0 = 0 otherwise.

• Similarly, (10) is minimized by setting q_1 = 1 if P(A_1|B_1) > P(A_0|B_1), and setting q_1 = 0 otherwise.

• These rules can be summarized as follows:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).


• General technique:

If {A_i} is a partition of Ω, and we know P(B_j|A_i) and P(A_i), then

    P(A_i|B_j) = P(A_i ∩ B_j)/P(B_j) = P(B_j|A_i)P(A_i) / ∑_k P(B_j|A_k)P(A_k),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' rule.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A_0) = 2/5, p_01 = 1/8, and p_10 = 1/6.
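A sketch of this calculation in MATLAB (my own check; it interprets p_10 = P(B_1|A_0) and p_01 = P(B_0|A_1), consistent with the transition diagram, and uses implicit expansion, R2016b or later):

pA = [2/5, 3/5];                      % a priori probabilities [P(A0), P(A1)]
PBgA = [5/6, 1/8;                     % row j+1, column i+1 holds P(Bj|Ai)
        1/6, 7/8];
pB = PBgA * pA';                      % law of total probability: [P(B0); P(B1)]
PAgB = (PBgA .* pA) ./ pB             % Bayes' rule: row j is [P(A0|Bj), P(A1|Bj)]
[~, map] = max(PAgB, [], 2)           % MAP decision per received Bj (1 -> A0, 2 -> A1)
% gives P(A0|B0) ≈ 0.82 and P(A1|B1) ≈ 0.89, so the MAP rule matches decision rule 2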


• Often we may not know the a priori probabilities P(A_0) and P(A_1).

• In this case, we typically treat the input symbols as equally likely: P(A_0) = P(A_1) = 0.5.

• Under this assumption, P(A_0|B_0) = cP(B_0|A_0) and P(A_1|B_0) = cP(B_0|A_1), so the decision rule becomes:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers are given below.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.
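These can all be checked by brute force in MATLAB, since there are only 6^5 = 7776 outcomes (my sketch; it classifies each outcome by its sorted pattern of repeated faces):

rolls = dec2base(0:6^5-1, 6) - '0' + 1;            % all 7776 outcomes of 5 dice
counts = sort(histc(rolls, 1:6, 2), 2, 'descend'); % per-outcome face counts, largest first
pat = counts(:, 1:2);                              % the two largest counts identify the hand
pNoTwoAlike = mean(pat(:,1) == 1)                  % (a) 0.0926
pOnePair    = mean(pat(:,1) == 2 & pat(:,2) == 1)  % (b) 0.4630
pTwoPair    = mean(pat(:,1) == 2 & pat(:,2) == 2)  % (c) 0.2315
pThreeAlike = mean(pat(:,1) == 3 & pat(:,2) == 1)  % (d) 0.1543
pFullHouse  = mean(pat(:,1) == 3 & pat(:,2) == 2)  % (e) 0.0386
pFourAlike  = mean(pat(:,1) == 4)                  % (f) 0.0193
pFiveAlike  = mean(pat(:,1) == 5)                  % (g) 0.0008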

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is ________

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.


The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Thus n_1 = n_2 = ··· = n_k = |A| ≡ n.

⇒ the number of distinct ordered k-tuples is ________

2. Sampling without replacement and with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Then n_1 = n, n_2 = n − 1, ..., n_k = n − (k − 1).

⇒ the number of distinct ordered k-tuples is ________

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n_1 = n, n_2 = n − 1, ..., n_{n−1} = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples = ________

Useful formula: For large n,

    n! ≈ √(2π) n^(n + 1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
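A quick check of this approximation against gamma(n+1) (my sketch):

n = 20;
exact = gamma(n + 1)                          % 20! via the gamma function
approx = sqrt(2*pi) * n^(n + 0.5) * exp(-n)   % Stirling's approximation
relErr = (exact - approx) / exact             % about 0.4% for n = 20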


4. Sampling without replacement and without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First, determine the # of ordered subsets: ________

Now, how many different orders are there for any set of k objects? ________

Let C^n_k = the number of combinations of size k from a set of n objects:

    C^n_k = (# ordered subsets)/(# orderings) = ________

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, ..., k_J is given by the multinomial coefficient

    n! / (k_1! k_2! ··· k_J!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

    72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

    72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61


• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

    C(k + n − 1, k)


EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated, and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

Answer: 0.0794.


EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.


2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
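A Monte Carlo sketch of the three strategies (my illustration, not part of the notes):

trials = 1e5; wins = zeros(1, 3);    % 1 = never switch, 2 = always switch, 3 = coin flip
for t = 1:trials
    doors = 1:3;
    car = randi(3); pick = randi(3);
    % the host opens a goat door that is neither the pick nor the car
    hostChoices = doors(doors ~= pick & doors ~= car);
    host = hostChoices(randi(numel(hostChoices)));
    switched = doors(doors ~= pick & doors ~= host);   % the remaining door
    wins(1) = wins(1) + (pick == car);
    wins(2) = wins(2) + (switched == car);
    if rand < 0.5, final = switched; else, final = pick; end
    wins(3) = wins(3) + (final == car);
end
wins / trials                        % approximately [1/3, 2/3, 1/2]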

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the ________ of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

    P(A) = ________

For s.i. subexperiments,

    P(A) = ________

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.


• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is ________

• The number of ways that k successes can occur in n places is ________

• Thus, the probability of k successes on n trials is

    p_n(k) = ________   (11)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

    b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = ∑_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

    φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

    Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)


• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

    Q(x) = 1 − Φ(x)   (15)
         = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

    b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)),   (17)

where q = 1 − p.

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

    P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
                   = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

    p(m) = P(m trials to first success)
         = P(1st m − 1 trials are failures ∩ mth trial is success)


Then

    p(m) = ________

Also, P(# trials > k) = ________

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
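A quick check using the geometric law (my sketch, treating a successful delivery as the "success," with probability 0.9):

pErr = 0.1;
pTwo   = pErr * (1 - pErr)   % exactly 2: first transmission fails, second succeeds = 0.09
pMore2 = pErr^2              % more than 2: the first two transmissions both fail = 0.01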

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.


bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

• Then

lim_(n→∞) (α^k / k!) (1 − α/n)^(n−k) = (α^k / k!) e^(−α),

which is the Poisson law.

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– the number of misprints on a group of pages in a book

– the number of people in a community that live to be 100 years old

– the number of wrong telephone numbers that are dialed in a day

– the number of α-particles discharged in a fixed period of time from some radioactive material

– the number of earthquakes per year

– the number of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the average number of events per (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, . . .
         = 0,  otherwise

EX: Phone calls arriving at a switching office

If a switching center receives 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
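For the switching-office example, λ = 90 calls/hour and t = 10 min = 1/6 hour, so α = λt = 15. A quick MATLAB check (my own sketch, base MATLAB only):

    alpha = 90*(10/60);                        % expected number of calls in 10 minutes
    P20 = exp(-alpha)*alpha^20/factorial(20)   % P(N = 20), approximately 0.0418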

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3 Find and plot FZ(z)

Ans:

F_Z(z) = 0,                        z ≤ 0
       = z²/(2b²),                 0 < z ≤ b
       = [b² − (2b − z)²/2]/b²,    b < z < 2b
       = 1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line, R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is

F_X(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_(h→0⁺) F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf: omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = lim_(ε→0) P[x − ε < X ≤ x] = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates how MATLAB can be used to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
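If you want to check your conclusions, here is a consolidated sketch of the same steps with the expected answers noted in comments (same parameter values as above):

    u = rand(1, 1000000);   % Uniform(0,1) samples
    hist(u, 100)            % flat histogram: uniform density
    v = -log(u);            % v is exponential with lambda = 1 (inverse-CDF transform)
    hist(v, 100)            % decaying histogram matching lambda*exp(-lambda*x)
    w = sqrt(v);            % w is Rayleigh (here 2*sigma^2 = 1, i.e., sigma^2 = 1/2)
    hist(w, 100)            % compare with the Rayleigh pdf later in these notes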

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

f_X(x) = dF_X(x)/dx

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that the outcome x never occurs, but that its occurrence is extremely unlikely.

DEFN

If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus a value of f_X(x) can be assigned at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 0,          x < a
       = 1/(b − a),  a ≤ x ≤ b
       = 0,          x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

p_X(x) = P(X = x)

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, . . . , 6
         = 0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, . . .
         = 0,        otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
  = 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,      x = 1
         = 1 − p,  x = 0
         = 0,      x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes in n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n
         = 0,                               otherwise
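As a quick illustration (my own sketch, base MATLAB only), this pmf can be evaluated and plotted directly from the formula:

    n = 10; p = 0.2;
    k = 0:n;
    pmf = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
    stem(k, pmf)    % reproduces the Binomial(10, 0.2) pmf shown below
    sum(pmf)        % sanity check: the pmf sums to 1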

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x vs. P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x vs. F_X(x)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p,  k = 1, 2, . . .
         = 0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x vs. P(X = x)]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x vs. F_X(x)]

4 Poisson RV

bull Models events that occur randomly in space or time

• Let λ = the average number of events per (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, . . .
         = 0,                  otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x)]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x vs. F_X(x)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x)]

IMPORTANT CONTINUOUS RVS

1. Uniform RV – covered in an example above.

2 Exponential RV

bull Characterized by single parameter λ

L7-10

• Density function:

f_X(x) = 0,           x < 0
       = λ e^(−λx),   x ≥ 0

• Distribution function:

F_X(x) = 0,             x < 0
       = 1 − e^(−λx),   x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

• DeMoivre–Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and µ = np; then

P(k successes in n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined (for x ≥ 0) as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
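A short numerical example with assumed values (my own sketch): let X be Gaussian with µ = 10 and σ² = 25, and compute P(12 < X ≤ 16):

    Q = @(x) 0.5*erfc(x/sqrt(2));             % Q-function via base MATLAB's erfc
    mu = 10; sigma = 5;                       % note: sigma, not sigma^2, in the argument
    P = Q((12-mu)/sigma) - Q((16-mu)/sigma)   % Q(0.4) - Q(1.2), approximately 0.2295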

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
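The answers to (b) can be verified numerically, since P(|X − µ| > kσ) = 2Q(k) = erfc(k/√2). A one-line MATLAB check (my own sketch):

    erfc((1:5)/sqrt(2))   % approximately 0.3173, 0.0455, 0.0027, 6.33e-5, 5.73e-7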

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

• But, in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_(i=1)^(n) Xᵢ²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
       = 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
       = 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]

• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number of degrees of freedom, n = 2m)

F_Y(y) = 1 − e^(−y/(2σ²)) Σ_(k=0)^(m−1) (1/k!) [y/(2σ²)]^k,  y ≥ 0
       = 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
       = 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]

• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
       = 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
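This construction is easy to verify by simulation. A minimal MATLAB sketch (my own illustration): generate the independent real and imaginary parts with randn and histogram the magnitude:

    sigma = 1;
    x = sigma*randn(1, 1000000);   % real part, zero-mean Gaussian
    y = sigma*randn(1, 1000000);   % imaginary part, independent of x
    r = sqrt(x.^2 + y.^2);         % magnitude: Rayleigh distributed
    hist(r, 100)                   % compare with f_R(r) = (r/sigma^2) exp(-r^2/(2 sigma^2))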

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
       = 0,                                        ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = dF_X(x|A)/dx

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution of your total wait time?

L8-9

2. the conditional distribution of your remaining wait time?
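A sketch of the solution, for checking your work: let X be the total wait and A = {X > 2}. For x > 2,

F_X(x|A) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = (e^(−2) − e^(−x))/e^(−2) = 1 − e^(−(x−2)),

and F_X(x|A) = 0 for x ≤ 2. The remaining wait R = X − 2 then has F_R(r|A) = 1 − e^(−r) for r ≥ 0, i.e., the same exponential distribution you started with. This is the memoryless property of the exponential random variable.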

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, . . . , Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + · · · + F_X(x|Aₙ)P(Aₙ)

and

f_X(x) = Σ_(i=1)^(n) f_X(x|Aᵢ)P(Aᵢ)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead:

P(A|X = x) = lim_(Δx→0) P(A | x < X ≤ x + Δx)

= lim_(Δx→0) {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)

= lim_(Δx→0) {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

= [f_X(x|A)/f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board


S-2

Other References

• Alberto Leon-Garcia, Probability and Random Processes for Electrical Engineering, Prentice-Hall (was Addison-Wesley), 2nd ed., 1994 (ISBN 0-201-50037-X).

• Athanasios Papoulis and S. Unnikrishna Pillai, Probability, Random Variables and Stochastic Processes, McGraw Hill, 4th ed., 2002 (ISBN 0-07-112256-7).

Grader: Byonghyok Choi (aru0080@ufl.edu)

Course Topics (as time allows)

• Probability spaces and axioms of probability

• Combinatorial (counting) analysis

• Conditional probability, total probability, and Bayes' rule

• Statistical independence

• Sequential experiments

• Single random variables and types of random variables

• Important random variables

• Distribution and density functions

• Functions of one random variable

• Expected value of a random variable

• Mean, variance, standard deviation, Nth moment, Nth central moment

• Markov and Chebyshev inequalities

• Transform methods: characteristic and generating functions, Laplace transform

• Generating random variables

• Multiple random variables

• Joint and marginal distribution and density functions

• Functions of several random variables

• Joint moments and joint characteristic functions

• Conditional expected value

• Laws of large numbers and the central limit theorem

• Random processes

• Mean, autocorrelation, and autocovariance functions

• Stationarity

• Time-invariant filtering of random processes

• Power spectral density

• Matched filters and Wiener optimum systems

S-3

bull Markov chains

bull Examples of practical applications of the theory

Goals and Objectives: Upon completion of this course, the student should be able to:

• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers.

• Determine probabilities based on counting (lottery tickets, etc.).

• Calculate probabilities of events from the density or distribution functions for random variables.

• Classify random variables based on their density or distribution functions.

• Know the density and distribution functions for common random variables.

• Determine random variables from definitions based on the underlying probability space.

• Determine the density and distribution functions for functions of random variables using several different techniques presented in class.

• Calculate expected values for random variables.

• Determine whether events, random variables, or random processes are statistically independent.

• Use inequalities to find bounds for probabilities that might otherwise be difficult to evaluate.

• Use transform methods to simplify solving some problems that would otherwise be difficult.

• Evaluate probabilities involving multiple random variables or functions of multiple random variables.

• Classify random processes based on their time support and value support.

• Simulate random variables and random processes.

• Classify random processes based on stationarity.

• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter.

• Evaluate the power spectral density for wide-sense stationary random processes.

• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise.

• Determine the steady-state probabilities for a Markov chain.

Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant messaging exchanges,

S-4

discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90% is guaranteed an A, > 80% is guaranteed a B, etc.

Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.

No formal project is required but, as mentioned above, students will be required to use MATLAB in solving some homework problems.

When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.

Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.

Academic Honesty

All students admitted to the University of Florida have signed a statement of academic honesty committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.

This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.

Honor statements on tests must be signed in order to receive any credit for that test

I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.

Collaboration on homework is permitted, unless explicitly prohibited, provided that:

1 Collaboration is restricted to students currently in this course

2 Collaboration must be a shared effort

3. Each student must write up his/her homework independently.

S-5

4. On problems involving MATLAB programs, each student should write his/her own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.

I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.

L1-1

EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later, we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.

L1-2

What is a random experiment?

Q: What do we mean by random?

Output is unpredictable in some sense.

DEFN

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by:

1. an experimental procedure, and
2. a set of outcomes and measurement restrictions.

DEFN

An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN

The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX: Tossing a coin
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω = {H, T}

EX: Rolling a 6-sided die
A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω = {1, 2, 3, 4, 5, 6}

EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω = {even, odd}

L1-3

If the outcome of an experiment is random, how can we apply quantitative techniques to it? This is the theory of probability.

People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1 Probability as Intuition

2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3 Probability as a Measure of Frequency of Occurrence

4 Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1 (the probability that something occurs is 1).

Since p_H = p_T, we have p_H = p_T = 1/2.

EX Rolling a fair 6-sided die

L1-4

Let pᵢ = Prob{top face is i}. Then

Σ_(i=1)^(6) pᵢ = 1
⇒ 6p₁ = 1
⇒ p₁ = 1/6

So pᵢ = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}? Since the outcomes 1 and 2 cannot occur simultaneously, Prob{1 or 2} = p₁ + p₂ = 1/3.

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally-likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die?
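A sketch of the solution: by the basic principle of counting, there are 6 · 6 = 36 equally likely outcomes. The outcomes with no 1 or 2 on either roll number 4 · 4 = 16, so

P(1 or 2 on either roll) = (36 − 16)/36 = 20/36 = 5/9.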

L1-5

What are some problems with defining probability in this way

1. It requires equally likely outcomes – thus it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur":

DEFN

The relative frequency of outcome k of a random experiment is

f_k(n) = N_k(n)/n

• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_(n→∞) f_k(n) = p_k (a constant), for all k.

DEFN

For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ N_k(n) ≤ n, for all k,

because N_k(n) is just the number of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ N_k(n)/n = f_k(n) ≤ 1, for all k = 1, . . . , K    (1)

L1-6

• If 1, 2, . . . , K are all of the possible outcomes, then

Σ_(k=1)^(K) N_k(n) = n.

Again, dividing by n yields

Σ_(k=1)^(K) f_k(n) = 1    (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

N_E(n) = N₂(n) + N₄(n) + N₆(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N₂(n) + N₄(n) + N₆(n)]/n = f₂(n) + f₄(n) + f₆(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n)    (3)

bull In some sense we can define the probability of an event as its long-term relative frequency

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency, and

3. agree with our intuition (where appropriate).

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand, and be able to apply, all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint It may help to draw a Venn diagram

bull Want to determine probabilities of events (combinations of outcomes)

bull Each event is a set rArr need operations on sets

1. A ⊂ B (read "A is a subset of B")

DEFN

The subset operator ⊂ is defined for two sets A and B by:

A ⊂ B if x ∈ A ⇒ x ∈ B

L2a-2

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN

The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN

The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. Aᶜ or Ā (read "A complement")

DEFN

The complement of a set A in a sample space Ω is defined by

Aᶜ = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of the following:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Aᶜ)ᶜ = A

3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ

4. A ∪ Aᶜ = Ω

5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)

6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)

L2a-3

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a set relationship.

SAMPLE SPACES

bull Given the following experiment descriptions write a mathematical expression for the samplespace Compare and contrast the experiments and sample spaces

1 EX Roll a fair 6-sided die and note the number on the top face

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3 EX Toss a coin 3 times and note the sequence of outcomes

L2a-4

4 EX Toss a coin 3 times and note the number of heads

5 EX Roll a fair die 3 times and note the number of even results

6 EX Roll a fair 6-sided die 3 times and note the number of sixes

7 EX Simultaneously toss a quarter and a dime and note the outcomes

8 EX Simultaneously toss two identical quarters and note the outcomes

L2a-5

9 EX Pick a number at random between zero and one inclusive

10 EX Measure the lifetime of a computer chip

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is the event class, denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs: {H}

– Tails occurs: {T}

– Heads or Tails occurs: {H, T}

– Neither Heads nor Tails occurs: ∅

• Thus, the event class can be written as

F = {∅, {H}, {T}, {H, T}}

L2a-6

EX Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
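A sketch of the solution, using inclusion–exclusion:

|S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50,

so P(no language class) = (100 − 50)/100 = 1/2. For exactly one class: |S only| = 28 − 12 − 4 + 2 = 14, |F only| = 26 − 12 − 6 + 2 = 10, |G only| = 16 − 4 − 6 + 2 = 8, giving P(exactly one class) = 32/100 = 0.32.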

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX Consider the following examples

A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Assuming equally likely outcomes,

P(aᵢ) = 1/|Ω₃| = 1/8, for any outcome aᵢ.

Let A₂ = {exactly 2 heads occurs in 3 tosses} = {HHT, HTH, THH}. Then

P(A₂) = |A₂|/|Ω₃| = 3/8.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, . . . , o_K} and |E| = K), then

P(E) = K/|Ω|.

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(aᵢ) = 1/|Ω| = 1/4, for any outcome aᵢ.

Then P(exactly 2 heads) = 1/4, which disagrees with the 3/8 found above: the outcomes in this sample space are not physically equally likely, so the equally-likely assumption must be applied with care.

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN

For a sample space Ω that is an interval, Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = |[a, b]|/|Ω| = (b − a)/(B − A).

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
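A sketch of the solution, for checking your work: let A = {smokes cigarettes} and B = {smokes cigars}.

1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.28 + 0.07 − 0.05 = 0.30, so 30% smoke.

2. P(neither) = 1 − P(A ∪ B) = 0.70, so 70%.

3. P(B ∩ Aᶜ) = P(B) − P(A ∩ B) = 0.07 − 0.05 = 0.02, so 2%.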

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are both in an interval I, then every x with a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX

Example of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN

Events are sets of outcomes to which we assign probability.

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top face.
Let L₄ = the event that the result is less than or equal to 4.
Express L₄ as a set of outcomes: L₄ = {1, 2, 3, 4}

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events: ∅, {E}, {O}, {E, O}

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A₁ = event that heads occurs on first toss.
Express A₁ as a set of outcomes: A₁ = {HHH, HHT, HTH, HTT}

(d) EX

Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes: O = {1, 3}

2

DEFN

F is the event class, which is a set of subsets of Ω (a σ-algebra) to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ consisting only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore, those subsets cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN

A field M forms a σ-algebra or σ-field if, for any E₁, E₂, . . . ∈ M,

∪_(i=1)^(∞) Eᵢ ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_(i=1)^(∞) Eᵢ ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN

The probability measure, denoted by P, is a function that maps all members of F onto the interval [0, 1].

Axioms of Probability

bull We specify a minimal set of rules that P must obey

I. P(A) ≥ 0 for all A ∈ F

II. P(Ω) = 1

III. If A₁, A₂, . . . are pairwise mutually exclusive (Aᵢ ∩ Aⱼ = ∅ for i ≠ j), then P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A₁, A₂, . . . , Aₙ are pairwise mutually exclusive, then

P(∪_(k=1)^(n) A_k) = Σ_(k=1)^(n) P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

P(∪_(k=1)^(n) A_k) = Σ_(k=1)^(n) P(A_k) − Σ_(j<k) P(A_j ∩ A_k) + · · · + (−1)^(n+1) P(A₁ ∩ A₂ ∩ · · · ∩ Aₙ)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

L3-2

Motivation – continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a relationship defined through the probability measure, not through the sets themselves.

What type of relation is mutually exclusive? A set relationship.

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4.

L3-3

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ), and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus,

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., we require

P(Aᵢ ∩ A_k) = P(Aᵢ)P(A_k)
P(Aᵢ ∩ A_k ∩ A_m) = P(Aᵢ)P(A_k)P(A_m)
. . .
P(A₁ ∩ A₂ ∩ · · · ∩ A_N) = P(A₁)P(A₂) · · · P(A_N)

(It is not sufficient to have P(Aᵢ ∩ A_k) = P(Aᵢ)P(A_k) for all i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A₁, A₂, . . . , A_N ∈ A are pairwise statistically independent if and only if P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) for all i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
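A sketch of the solutions (which match the answers on the earlier page): each child's sex is an s.i. event with probability 1/2, so

1. P(all same sex) = P(all boys) + P(all girls) = (1/2)⁵ + (1/2)⁵ = 1/16.

2. P(3 eldest boys, others girls) = (1/2)⁵ = 1/32.

3. P(2 oldest girls) = (1/2)² = 1/4.

4. P(at least one girl) = 1 − P(all boys) = 1 − (1/2)⁵ = 31/32.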

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship.

– Statistical independence is a relationship involving the probability measure.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. Two events cannot be both s.i. and m.e., except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, (3) is true: if A ∩ B = ∅ and P(A ∩ B) = P(A)P(B), then P(A)P(B) = 0, so at least one of the events must have probability zero.

Consider the following:

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD₁, BD₂, BN}

Let

• E_A be the event that the selected computer is from manufacturer A,

• E_B be the event that the selected computer is from manufacturer B, and

• E_D be the event that the selected computer is defective.

Find:

P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred). Since one of the two manufacturer-A computers is defective, P(E_D|E_A) = 1/2.

Ω = {AD, AN, BD₁, BD₂, BN}

Find:

– P(E_D|E_B) = 2/3

L3-7

– P(E_A|E_D) = 1/3

– P(E_B|E_D) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B overlapping inside the sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, with A ∩ B inside it]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space, (S, A, P(·|B)).

Check the axioms:

1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check the previous example: Ω = {AD, AN, BD₁, BD₂, BN}

P(E_D|E_A) = P(E_D ∩ E_A)/P(E_A) = (1/5)/(2/5) = 1/2

P(E_D|E_B) = P(E_D ∩ E_B)/P(E_B) = (2/5)/(3/5) = 2/3

P(E_A|E_D) = P(E_A ∩ E_D)/P(E_D) = (1/5)/(3/5) = 1/3

P(E_B|E_D) = P(E_B ∩ E_D)/P(E_D) = (2/5)/(3/5) = 2/3

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wᵢ = event that a white ball is chosen on draw i.

What is P(W₂|W₁)?
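A sketch of the solution: given W₁, one white ball has been removed, so 4 balls remain, of which 2 are white. Thus P(W₂|W₁) = 2/4 = 1/2. (Formally, P(W₂|W₁) = P(W₁ ∩ W₂)/P(W₁) = [(3/5)(2/4)]/(3/5) = 1/2.)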

Answers to the coin problem: (a) 1/3; (b) 1/5; (c) 1.

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities

Which of the following statements is true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

The answer is (c): conditioning on B can either increase or decrease the probability of A.

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)   (8)

• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)

= P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, … partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and collectively exhaustive (∪i Ai = Ω).

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = Σi P(B|Ai)P(Ai)

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends one of two symbols, A0 or A1.

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
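A sketch of the arithmetic in MATLAB (my own check, not part of the original notes; the transition probabilities are read off the diagram above):

pA   = [2/5 3/5];               % priors P(A0), P(A1)
pBgA = [5/6 1/6;                % row i+1 = [P(B0|Ai) P(B1|Ai)]
        1/8 7/8];
Pe1 = pA(1)                     % Rule 1 (always decide A1): error iff A0 sent, = 0.4
Pe2 = pBgA(1,2)*pA(1) + pBgA(2,1)*pA(2)   % Rule 2: crossover errors, = 17/120 ~ 0.1417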

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

bull Since we only apply hypothesis Hij when Bj is received we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

bull We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0) and   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1).   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

bull General technique

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
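A sketch of this computation in MATLAB (the outer-product division is used instead of newer implicit expansion so it runs on older releases; array names are illustrative):

pA   = [2/5 3/5];               % priors P(A0), P(A1)
pBgA = [5/6 1/6; 1/8 7/8];      % pBgA(i+1,j+1) = P(Bj|Ai)
pAB  = diag(pA)*pBgA;           % joint probabilities P(Ai and Bj)
pB   = sum(pAB,1);              % total probability: [P(B0) P(B1)] = [49/120 71/120]
pAgB = pAB./(ones(2,1)*pB)      % Bayes: column j holds [P(A0|Bj); P(A1|Bj)]
[mx, dec] = max(pAgB); dec - 1  % MAP decisions: decide A0 on B0, A1 on B1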

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A for all i = 1, 2, …, k.

Thus n1 = n2 = … = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k

2. Sampling without replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A for all i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1)

3. Permutations of n distinct objects. Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)

Result is an n-tuple.

The number of orderings is the number of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ number of permutations = number of distinct ordered n-tuples

= n(n − 1) · · · (2)(1)

= n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n) (Stirling's approximation).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
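For example (a quick sketch; recent MATLAB releases also provide factorial(n)):

n = 10;
gamma(n+1)                           % = 3628800 = 10!
sqrt(2*pi)*n^(n+0.5)*exp(-n)         % Stirling approximation, ~3598695.6 (within ~1%)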

L5a-3

4. Sampling without replacement & without ordering. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the number of ordered subsets: n(n − 1) · · · (n − k + 1) = n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let Cn,k = the number of combinations of size k from a set of n objects:

Cn,k = (number of ordered subsets)/(number of orderings) = n!/(k!(n − k)!)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groupings.

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6 (a multinomial coefficient)

– What is the prob of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034
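A sketch of the exact value together with a simulation check (histc counts how many times each face appears):

factorial(12)/(2^6 * 6^12)           % exact: 12!/[(2!)^6 6^12] ~ 3.44e-3
N = 100000; hits = 0;
for k = 1:N
    rolls = ceil(6*rand(1,12));      % 12 rolls of a fair die
    if all(histc(rolls,1:6) == 2), hits = hits + 1; end
end
hits/N                               % relative frequency, close to the exact value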

5. Sampling with replacement and without ordering. Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

(k + n − 1 choose k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX By Request: The Game Show (Monty Hall) Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1 Never switch

Answer to the packet-decoding question: 0.0794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
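Here is one possible Monte Carlo sketch of all three strategies (illustrative code; the host is modeled as always opening a goat door other than your pick):

N = 100000; wins = zeros(1,3);
for k = 1:N
    car = ceil(3*rand); pick = ceil(3*rand);
    doors = 1:3;
    goats = doors(doors ~= pick & doors ~= car);     % doors the host may open
    host = goats(1);                                 % host opens a goat door
    other = doors(doors ~= pick & doors ~= host);    % the door you would switch to
    wins(1) = wins(1) + (pick == car);               % strategy 1: never switch
    wins(2) = wins(2) + (other == car);              % strategy 2: always switch
    if rand < 0.5, fin = other; else fin = pick; end
    wins(3) = wins(3) + (fin == car);                % strategy 3: coin flip
end
wins/N                                               % ~ [1/3 2/3 1/2]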

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments, Ω = Ω1 × Ω2 × · · · × ΩN.

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F, such that for s.i. subexperiments,

P(A) = P1(B1)P2(B2) · · · PN(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is (n choose k).

• Thus, the probability of k successes on n trials is

pn(k) = (n choose k) p^k (1 − p)^(n−k)   (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
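A sketch of the direct binomial computation in MATLAB (nchoosek is a standard built-in):

n = 100; p = 0.01;
s = 0;
for k = 0:2
    s = s + nchoosek(n,k)*p^k*(1-p)^(n-k);   % P(0, 1, or 2 bit errors)
end
1 - s                                         % P(3 or more errors) ~ 0.0794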

L5-4

• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

bull Let

b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

bull An equivalent formulation is the following

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^n Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

bull The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)

= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

bull For fixed integers α and β

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]

= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

bull I will give you a Q-function table to use with each exam
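A sketch comparing the continuity-corrected Gaussian approximation to the exact binomial sum, using MATLAB's built-in erfc (which follows the standard definition, so Q(x) = erfc(x/√2)/2):

n = 100; p = 0.5; q = 1-p; alpha = 45; beta = 55;
Qf = @(x) 0.5*erfc(x/sqrt(2));                % standard Q-function
Qf((alpha-n*p-0.5)/sqrt(n*p*q)) - Qf((beta-n*p+0.5)/sqrt(n*p*q))   % approximation
s = 0;
for k = alpha:beta, s = s + nchoosek(n,k)*p^k*q^(n-k); end
s                                             % exact P(45 <= Sn <= 55)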

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success) = P({1st m − 1 trials are failures} ∩ {mth trial is success})

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, …

Also, P(number of trials > k) = (1 − p)^k

EX

A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
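A sketch of the geometric-law arithmetic (here "success" means a packet is delivered correctly, so p = 0.9):

p = 0.9;
(1-p)^(2-1)*p     % P(exactly 2 transmissions required) = 0.09
(1-p)^2           % P(more than 2 transmissions required) = 0.01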

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so the expected number of calls per minute, np = 1/2, is constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

(n choose k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– number of misprints on a group of pages in a book

– number of people in a community that live to be 100 years old

– number of wrong telephone numbers that are dialed in a day

– number of α-particles discharged in a fixed period of time from some radioactive material

– number of earthquakes per year

– number of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

bull Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, …
         = 0, otherwise

EX Phone calls arriving at a switching office: If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
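A sketch of the computation: the rate is λ = 90 calls/hr and the observation window is t = 1/6 hr, so α = λt = 15.

lambda = 90; t = 10/60;
alpha = lambda*t;                             % = 15
alpha^20*exp(-alpha)/factorial(20)            % P(N = 20) ~ 0.042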

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3 Find and plot FZ(z)

Ans

FZ(z) = 0, z ≤ 0
      = z²/(2b²), 0 < z ≤ b
      = [b² − (2b − z)²/2]/b², b < z < 2b
      = 1, z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable X defined on a probability space (Ω, F, P) is a function from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, b]}.

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})

bull FX(x) is a prob measure

bull Properties of FX

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0+} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x−).

bull If FX(x) is not a continuous function of x then from above

FX(x) − FX(x−) = P[x− < X ≤ x]
= lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

bull Properties

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x−) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0, x < a
      = 1/(b − a), a ≤ x ≤ b
      = 0, x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX Roll a fair 6-sided die. X = number on top face.

P(X = x) = 1/6, x = 1, 2, …, 6
         = 0, otherwise

EX Flip a fair coin until heads occurs. X = number of flips.

P(X = x) = (1/2)^x, x = 1, 2, …
         = 0, otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X = 1, s ∈ A
  = 0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p, x = 1
         = 1 − p, x = 0
         = 0, x ≠ 0, 1

2 Binomial RV

• A binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = number of successes. Then the pmf of X is given by

P[X = k] = (n choose k) p^k (1 − p)^(n−k), k = 0, 1, …, n
         = 0, otherwise

L7-6

• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

L7-7

• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3 Geometric RV

• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p, k = 1, 2, …
         = 0, otherwise

• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]

L7-8

• The cdfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]

4 Poisson RV

bull Models events that occur randomly in space or time

• Let λ = the number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, …
         = 0, otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV (covered in the example above)

2 Exponential RV

bull Characterized by single parameter λ

L7-10

• Density function:

fX(x) = 0, x < 0
      = λe^(−λx), x ≥ 0

• Distribution function:

FX(x) = 0, x < 0
      = 1 − e^(−λx), x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, μ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)

≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for the three parameter pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
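A numerical sketch using MATLAB's erfc, with made-up parameter values for illustration (Q(x) = erfc(x/√2)/2):

mu = 5; sigma = 2; a = 4; b = 9;
Qf = @(x) 0.5*erfc(x/sqrt(2));
Qf((a-mu)/sigma) - Qf((b-mu)/sigma)           % P(4 < X <= 9) ~ 0.669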

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question Answers appear at the bottom of the page

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

Answers to the Motivation problem: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
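These follow from P[|X − μ| > kσ] = 2Q(k); a one-line check in MATLAB:

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)       % ~ [0.317 4.55e-2 2.70e-3 6.34e-5 5.75e-7]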

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

bull The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

bull The density of a central chi-square random variable is given by

fY(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
      = 0, y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
      = 0, y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
      = 0, y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0
      = 0, r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)), r ≥ 0
      = 0, r < 0

• The received signal in a dense multipath radio environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
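A simulation sketch of this relationship (the hist-based normalization is used so the code runs on older MATLAB releases):

N = 1000000; sigma = 1;
x = sigma*randn(N,1); y = sigma*randn(N,1);      % real and imaginary parts
r = sqrt(x.^2 + y.^2);                           % amplitude (magnitude)
[cnt, ctr] = hist(r, 100); bw = ctr(2)-ctr(1);
bar(ctr, cnt/(N*bw)); hold on                    % estimated pdf
rr = 0:0.05:5;
plot(rr, (rr/sigma^2).*exp(-rr.^2/(2*sigma^2)), 'r')   % Rayleigh pdf overlay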

7 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)}, ℓ ≥ 0
      = 0, ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A)

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time
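A simulation sketch of part 2 (the exponential RV is memoryless, so the remaining wait should again look exponential with λ = 1):

N = 1000000; lambda = 1;
x = -log(rand(N,1))/lambda;        % exponential(lambda) samples (inverse-CDF method)
rem2 = x(x > 2) - 2;               % remaining wait, given you already waited 2 minutes
mean(rem2)                         % ~ 1/lambda, the same as the unconditioned mean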

TOTAL PROBABILITY AND BAYESrsquo THEOREM

• If A1, A2, …, An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

= [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

bull Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayes' Theorem

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board


S-3

bull Markov chains

bull Examples of practical applications of the theory

Goals and Objectives Upon completion of this course the student should be able to

• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers.

bull Determine probabilities based on counting (lottery tickets etc)

• Calculate probabilities of events from the density or distribution functions for random variables.

bull Classify random variables based on their density or distribution functions

bull Know the density and distribution functions for common random variables

bull Determine random variables from definitions based on the underlying probability space

• Determine the density and distribution functions for functions of random variables using several different techniques presented in class.

bull Calculate expected values for random variables

• Determine whether events, random variables, or random processes are statistically independent.

• Use inequalities to find bounds for probabilities that might otherwise be difficult to evaluate.

• Use transform methods to simplify solving some problems that would otherwise be difficult.

• Evaluate probabilities involving multiple random variables or functions of multiple random variables.

bull Classify random processes based on their time support and value support

bull Simulate random variables and random processes

bull Classify random processes based on stationarity

• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter.

bull Evaluate the power spectral density for wide-sense stationary random processes

• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise.

bull Determine the steady state probabilities for a Markov chain

Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant messaging exchanges,

S-4

discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90% is guaranteed an A, > 80% is guaranteed a B, etc.

Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.

No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.

When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.

Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.

Academic Honesty

All students admitted to the University of Florida have signed a statement of academic honesty, committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.

This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.

Honor statements on tests must be signed in order to receive any credit for that test.

I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.

Collaboration on homework is permitted unless explicitly prohibited provided that

1 Collaboration is restricted to students currently in this course

2 Collaboration must be a shared effort

3 Each student must write up hisher homework independently

S-5

4. On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.

I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.

L1-1

EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems

1. EX A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar, but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
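If you want to sanity-check your counting afterwards, here is a relative-frequency sketch in MATLAB (illustrative only):

N = 1000000;
r1 = ceil(6*rand(N,1));            % one roll per experiment
mean(r1 <= 2)                      % problem 1: should settle near 2/6
r2 = ceil(6*rand(N,2));            % two rolls per experiment
mean(any(r2 <= 2, 2))              % problem 2: a 1 or 2 on either roll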

L1-2

What is a random experiment

Q: What do we mean by random?

Output is unpredictable in some sense

DEFN

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN

An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN

The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX

Tossing a coin: A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω = {H, T}

EX Rolling a 6-sided die: A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω = {1, 2, 3, 4, 5, 6}

EX A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω = {even, odd}

L1-3

If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1 Probability as Intuition

2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3 Probability as a Measure of Frequency of Occurrence

4 Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

bull Some experiments are said to be ldquofairrdquo

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

bull Given an experiment with equally likely outcomes we can determine the probabilities easily

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points!

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX Tossing a fair coin

Let pH = Prob{heads}, pT = Prob{tails}.

Then pH + pT = 1 (the probability that something occurs is 1).

Since pH = pT, pH = pT = 1/2.

EX Rolling a fair 6-sided die

L1-4

Let pi = Prob{top face is i}. Then

Σ_{i=1}^{6} pi = 1

⇒ 6p1 = 1

⇒ p1 = 1/6

So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX Rolling a fair 6-sided die

What is Prob{1 or 2}? p1 + p2 = 1/6 + 1/6 = 1/3

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally-likely outcomes on repeated trials, but we have to be careful.

EX Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die?

L1-5

What are some problems with defining probability in this way

1. Requires equally likely outcomes – thus it only applies to a small set of random phenomena.

2. Can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

bull Let Nk(n) = the number of times the outcome is k

bull Then we can tabulate Nk(n) for various values of k and n

• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"

DEFN

The relative frequency of outcome k of a random experiment is

fk(n) = Nk(n)/n

• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_{n→∞} fk(n) = pk (a constant) ∀k

DEFN

For experiments with statistical regularity as defined above pk is called theprobability of outcome k

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ Nk(n) ≤ n, ∀k,

because Nk(n) is just the number of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, …, K   (1)

L1-6

• If 1, 2, …, K are all of the possible outcomes, then

Σ_{k=1}^{K} Nk(n) = n

Again, dividing by n yields

Σ_{k=1}^{K} fk(n) = 1   (2)

Consider the event E that an even number occurs

What can we say about the number of times E is observed in n trials?

NE(n) = N2(n) + N4(n) + N6(n)

What have we assumed in developing this equation

Then dividing by n

fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

fC(n) = fA(n) + fB(n)   (3)

bull In some sense we can define the probability of an event as its long-term relative frequency

What are some problems with this definition

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1 be useful for solving real problems

2 agree with our interpretation of probability as relative frequency

3 agree with our intuition (where appropriate)

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
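Once the Venn diagram is drawn, the counts follow from inclusion-exclusion; a MATLAB sketch of the arithmetic:

nS = 28; nF = 26; nG = 16; nSF = 12; nSG = 4; nFG = 6; nSFG = 2;
nAny = nS + nF + nG - nSF - nSG - nFG + nSFG;   % inclusion-exclusion: 50 students
(100 - nAny)/100                                % part 1: P(no language class) = 0.5
nTwoPlus = nSF + nSG + nFG - 2*nSFG;            % students in two or more classes: 18
(nAny - nTwoPlus)/100                           % part 2: P(exactly one class) = 0.32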

bull Want to determine probabilities of events (combinations of outcomes)

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN

The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

L2a-2

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).

2. A ∪ B (read "A union B" or "A or B")

DEFN

The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN

The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. Aᶜ or Ā (read "A complement")

DEFN

The complement of a set A in a sample space Ω is defined by

Aᶜ = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Aᶜ)ᶜ = A

3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ

4. A ∪ Aᶜ = Ω

5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)

6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)

L2a-3

bull One of the most important relations for sets is when sets are mutually exclusive

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a statement about sets: A ∩ B = ∅.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX Roll a fair 6-sided die and note the number on the top face.

2. EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3 EX Toss a coin 3 times and note the sequence of outcomes

L2a-4

4 EX Toss a coin 3 times and note the number of heads

5 EX Roll a fair die 3 times and note the number of even results

6 EX Roll a fair 6-sided die 3 times and note the number of sixes

7 EX Simultaneously toss a quarter and a dime and note the outcomes

8 EX Simultaneously toss two identical quarters and note the outcomes

L2a-5

9 EX Pick a number at random between zero and one inclusive

10 EX Measure the lifetime of a computer chip

MORE ON EVENTS

bull Recall that an event A is a subset of the sample space Ω

• The set of all events is called the event class, denoted F or A.

EX Flipping a coin

The possible events are

ndash Heads occurs

ndash Tails occurs

ndash Heads or Tails occurs

ndash Neither Heads nor Tails occurs

• Thus, the event class can be written as F = {∅, {H}, {T}, {H, T}}.

L2a-6

EX Rolling a 6-sided die

There are many possible events such as

1 The number 3 occurs

2 An even number occurs

3 No number occurs

4 Any number occurs

• EX Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX Consider the following examples

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω3| = 1/8 for any outcome ai

Let A2 = {exactly 2 heads occur in 3 tosses} = {HHT, HTH, THH}. Then

P(A2) = |A2|/|Ω3| = 3/8

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, …, oK} and |E| = K), then P(E) = K/|Ω|.

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai

Then P({2 heads}) = 1/4, which contradicts the 3/8 found in part A; the outcomes of this experiment are not actually equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets on which cardinality is simplest to measure: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A) = |[a, b]|/|Ω|

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].


• For a typical continuous sample space, the prob of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.

(Does not mean that outcome does not ever occur, only that it occurs very infrequently.)
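A quick MATLAB sketch (my addition, assuming Ω = [0, 1]) of these two facts: intervals get probability proportional to length, while single points get relative frequency 0:

x = rand(1, 1000000);        % uniform draws on [0,1]
mean(x >= 0.2 & x <= 0.5)    % ≈ |[0.2,0.5]|/|Ω| = 0.3
mean(x == 0.5)               % a single outcome: relative frequency 0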


EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1 What percentage of males smoke?

2 What percentage of males smoke neither cigars nor cigarettes?

3 What percentage smoke cigars but not cigarettes?


PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.


• Intervals can also be either finite, infinite, or partially infinite.

EX

Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN

Events are subsets of the sample space to which we assign probability.

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = {even outcome}, O = {odd outcome}.
List all events.


(c) EX

Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes.

(d) EX

Toss a coin 3 times and note the number of heads.
Let O = {odd number of heads occurs}.
Express O as a set of outcomes.

2.

DEFN

F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M


DEFN

A field M forms a σ-algebra or σ-field if for any E1, E2, ... ∈ M,

∪(i=1 to ∞) Ei ∈ M     (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩(i=1 to ∞) Ei ∈ M     (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.

DEFN

The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for all A ∈ F

II. P(Ω) = 1

III. If A1, A2, ... are pairwise mutually exclusive, then P(∪(i=1 to ∞) Ai) = Σ(i=1 to ∞) P(Ai)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(∪(k=1 to n) Ak) = Σ(k=1 to n) P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪(k=1 to n) Ak) = Σ(j) P(Aj) − Σ(j<k) P(Aj ∩ Ak) + · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An)

(Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, ....) Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
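As a quick sanity check on corollaries (a) and (e), here is a MATLAB sketch (my addition) applied to the smoking problem from the Motivation, with A = {smokes cigarettes} and B = {smokes cigars}:

PA = 0.28; PB = 0.07; PAandB = 0.05;
PAorB = PA + PB - PAandB     % corollary (e): fraction that smokes, 0.30
1 - PAorB                    % corollary (a): smoke neither, 0.70
PB - PAandB                  % cigars but not cigarettes: 0.02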


EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1 Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

What if A and B are independent?


Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2 The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probability relation.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   2. (a) 2/3 (b) 1/3 (c) 3/4


Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)

• For N events to be s.i., require

P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
⋮
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.


Examples of Applying Statistical Independence

EX For a couple having 5 children, compute the probabilities of the following events:

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl
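A MATLAB simulation sketch (my addition; columns 1–3 are taken as the 3 eldest) to check these against the footnote answers:

n = 1000000;
kids = rand(n, 5) < 0.5;                          % 1 = boy, 0 = girl
mean(all(kids, 2) | all(~kids, 2))                % 1: same sex, ≈ 1/16
mean(all(kids(:,1:3), 2) & all(~kids(:,4:5), 2))  % 2: ≈ 1/32
mean(all(~kids(:,1:2), 2))                        % 3: 2 oldest girls, ≈ 1/4
mean(any(~kids, 2))                               % 4: at least one girl, ≈ 31/32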


RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation.

– Statistical independence is a probability relation.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following:


Conditional Probability: Consider an example.

EX Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A

• EB be the event that the selected computer is from manufacturer B

• ED be the event that the selected computer is defective

Find:

P(EA) = 2/5    P(EB) = 3/5    P(ED) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA) (the conditional probability of event ED given that event EA occurred). Here P(ED|EA) = 1/2.

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) = 2/3

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Figure: events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Figure: B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0


Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B)

and P(A ∩ B) ≥ 0, P(B) ≥ 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX Check prev example: Ω = {AD, AN, BD, BD, BN}

P(ED|EA) = P(ED ∩ EA)/P(EA) = (1/5)/(2/5) = 1/2

P(ED|EB) = P(ED ∩ EB)/P(EB) = (2/5)/(3/5) = 2/3

P(EA|ED) = P(EA ∩ ED)/P(ED) = (1/5)/(3/5) = 1/3

P(EB|ED) = P(EB ∩ ED)/P(ED) = (2/5)/(3/5) = 2/3


EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

2. (a) 1/3 (b) 1/5 (c) 1


STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)


Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)     (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)     (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets {Ai} partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and exhaustive (∪i Ai = Ω).

{Ai} is also said to be a partition of Ω.


1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then

P(B) = Σi P(B|Ai)P(Ai)

Example


EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}:

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
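A short MATLAB sketch (my addition, using the transition probabilities above and the Law of Total Probability) of the two error probabilities:

PA0 = 2/5; PA1 = 3/5;
p01 = 1/8;               % P(B1|A0)
p10 = 1/6;               % P(B0|A1)
Pe1 = PA0                % rule 1, always decide A1: error iff A0 sent, 0.4
Pe2 = p01*PA0 + p10*PA1  % rule 2, decide A_j when B_j received: 0.15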


EX Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)


• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0) and     (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)         (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).


• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' rule.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
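A MATLAB sketch (my addition; the matrix layout is my choice, and the elementwise broadcasting assumes a reasonably recent MATLAB):

PA   = [2/5; 3/5];               % a priori P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];       % P(Bj|Ai): rows Ai, columns Bj
PB   = PA.' * PBgA;              % Law of Total Probability: P(B0), P(B1)
PAgB = (PBgA .* PA) ./ PB;       % Bayes' rule: P(Ai|Bj)
[~, imax] = max(PAgB)            % MAP rule: decide A0 for B0, A1 for B1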


• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008


Result is k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ... = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is n^k

2. Sampling w/out replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).


4. Sample w/out replacement & w/out order

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets)/(# orderings) = n! / [(n − k)! k!]

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61


• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with replacement and w/out ordering

Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

( k + n − 1 choose k )
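A few MATLAB one-liners (my addition) to check these counting formulas numerically; gammaln is used so the 72-student example keeps its precision:

nchoosek(5 + 6 - 1, 5)                           % with replacement, w/out order: C(10,5) = 252
exp(gammaln(73) - 24*gammaln(4))                 % 72!/(3!)^24 ≈ 1.3e85
exp(gammaln(73) - 24*gammaln(4) - gammaln(25))   % .../24! ≈ 2.1e61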


EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

4. 0.0794


2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) = P(B1, B2, ..., BN)

For s.i. subexperiments:

P(A) = P(B1)P(B2) · · · P(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.


• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is (n choose k).

• Thus, the probability of k successes on n trials is

pn(k) = (n choose k) p^k (1 − p)^{n−k}     (11)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
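A MATLAB sketch (my addition) of the packet problem using (11); binocdf from the Statistics Toolbox would do the same in one call:

n = 100; p = 1e-2; k = 0:2;
Pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
1 - sum(Pk)     % P(3 or more errors) ≈ 0.0794, matching the footnote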


• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = (n choose k) p^k (1 − p)^{n−k}     (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ(i=1 to n) Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]     (13)

and distribution function

Φ(x) = (1/√(2π)) ∫(−∞ to x) exp[−t²/2] dt     (14)


• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)     (15)
     = (1/√(2π)) ∫(x to ∞) exp[−t²/2] dt     (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))     (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k

EX

A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
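A two-line MATLAB check (my addition; here "success" is a correctly received packet, so p = 0.9):

p = 0.9;
(1-p)^(2-1) * p     % exactly 2 transmissions: 0.09
(1-p)^2             % more than 2 transmissions: 0.01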

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.


• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

(n choose k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

lim(n→∞) (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.


• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^{−α},  k = 0, 1, ...
           { 0,                o.w.

EX Phone calls arriving at a switching office

If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
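A MATLAB sketch (my addition) of this calculation:

lambda = 90/60;       % calls per minute
alpha = lambda*10;    % expected calls in 10 minutes: 15
k = 20;
alpha^k/factorial(k)*exp(-alpha)    % P(N = 20) ≈ 0.0418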

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = { 0,                        z ≤ 0
        { z²/(2b²),                 0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,    b < z < 2b
        { 1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable is defined on a probability space (Ω, F, P) as a mapping from Ω to the real line.


EX


• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1


2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim(h→0⁺) FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim(ε→0) P[x − ε < X ≤ x]
               = P[X = x]


EX Examples


EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)


EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB:

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x):

fX(x) = dFX(x)/dx

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫(−∞ to x) fX(t) dt

3. ∫(−∞ to +∞) fX(x) dx = 1

4. P(a < X ≤ b) = ∫(a to b) fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise continuous function with finite integral

∫(−∞ to +∞) g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus a value of fX(x) will be assigned to every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = { 0,          x < a
        { 1/(b − a),  a ≤ x ≤ b
        { 0,          x > b


DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

pX(x) = P(X = x)

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, ..., 6
           { 0,    o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, ...
           { 0,        o.w.


IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
           { 0,                               o.w.


• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x]


• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, ...
           { 0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs, P(X = x) vs. x]


• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs, FX(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^{−α},  k = 0, 1, ...
           { 0,                o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs, P(X = x) vs. x]


• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs, FX(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf, P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.


• Density function:

fX(x) = { 0,          x < 0
        { λ e^{−λx},  x ≥ 0

• Distribution function:

FX(x) = { 0,            x < 0
        { 1 − e^{−λx},  x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

(n choose k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, μ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k}
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}


DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫(−∞ to x) (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫(−∞ to x) (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫(x to ∞) (1/√(2π)) exp{−t²/2} dt


– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫(0 to π/2) exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
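A MATLAB sketch (my addition; MATLAB has no built-in Q function, but Q(x) = (1/2) erfc(x/√2)):

Q = @(x) 0.5*erfc(x/sqrt(2));
mu = 5; sigma = 1; a = 4; b = 7;
Q((a-mu)/sigma) - Q((b-mu)/sigma)   % P(4 < X <= 7) ≈ 0.8186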


EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
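Since P[|X − μ| > kσ] = 2Q(k), a one-line MATLAB check (my addition) against the footnote answers:

Q = @(x) 0.5*erfc(x/sqrt(2));
2*Q(1:5)    % ≈ 0.317, 4.55e-2, 2.70e-3, 6.34e-5, 5.7e-7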


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ(i=1 to n) Xi²     (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = { y^{(n/2)−1} e^{−y/(2σ²)} / [σ^n 2^{n/2} Γ(n/2)],  y ≥ 0
        { 0,                                                y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫(0 to ∞) t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = { (1/(2σ²)) e^{−y/(2σ²)},  y ≥ 0
        { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.


• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

FY(y) = { 1 − e^{−y/(2σ²)} Σ(k=0 to m−1) (1/k!) (y/(2σ²))^k,  y ≥ 0
        { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
        { 0,                     r < 0


• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
        { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = { (1/(ℓ√(2πσ²))) exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
        { 0,                                        ℓ < 0


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = d FX(x|A)/dx

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
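A MATLAB sketch (my addition) of the punchline: the exponential is memoryless, so the remaining wait has the same exponential(1) distribution as the original wait:

w = -log(rand(1, 1000000));    % exponential(1) wait times
rem = w(w > 2) - 2;            % remaining wait, given 2 minutes already
mean(rem > 1)                  % ≈ exp(-1) ≈ 0.368 = P(W > 1)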

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ(i=1 to n) fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim(Δx→0) P(A|x < X ≤ x + Δx)

           = lim(Δx→0) {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

           = lim(Δx→0) {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A)

if fX(x|A) and fX(x) exist.


Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫(−∞ to ∞) P(A|X = x) fX(x) dx = [∫(−∞ to ∞) fX(x|A) dx] P(A)

⇒ P(A) = ∫(−∞ to ∞) P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫(−∞ to ∞) P(A|X = t) fX(t) dt

Examples on board


discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90 is guaranteed an A, > 80 is guaranteed a B, etc.

Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.

No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.

When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.

Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.

Academic Honesty

All students admitted to the University of Florida have signed a statement of academic honesty committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.

This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.

Honor statements on tests must be signed in order to receive any credit for that test.

I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.

Collaboration on homework is permitted unless explicitly prohibited, provided that:

1 Collaboration is restricted to students currently in this course

2 Collaboration must be a shared effort

3 Each student must write up his/her homework independently


4 On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.

I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.


EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of anexperiment are equally likely (for instance we often say that a coin or die is ldquofairrdquo We willshow that under this assumption the probability of an event is the ratio of the number of possibleoutcomes in the event to the total number of possible outcomes

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar, but they serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events; try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
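A quick MATLAB sanity check of the counting (one possible sketch, in the spirit of the MATLAB exercises used later in these notes):

% Experiment 1: one roll; favorable outcomes are {1, 2}
rolls = 1:6;
p1 = sum(rolls <= 2) / numel(rolls)        % 2/6 = 1/3

% Experiment 2: two rolls; favorable pairs have a 1 or 2 on either roll
[r1, r2] = meshgrid(1:6, 1:6);             % all 36 ordered pairs
favorable = (r1 <= 2) | (r2 <= 2);
p2 = sum(favorable(:)) / numel(favorable)  % 20/36 = 5/9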


What is a random experiment?

Q: What do we mean by random?

The output is unpredictable in some sense.

DEFN

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by:

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN

An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN

The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX: Tossing a coin

A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω =

EX: Rolling a 6-sided die

A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω =

EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω =

If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about the probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1 (the probability that something occurs is 1).

Since p_H = p_T, we get p_H = p_T = 1/2.

EX: Rolling a fair 6-sided die

Let p_i = Prob{top face is i}. Then

∑_{i=1}^{6} p_i = 1 ⇒ 6p_1 = 1 ⇒ p_1 = 1/6.

So p_i = 1/6 for i = 1, 2, ..., 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die?

What are some problems with defining probability in this way?

1. It requires equally likely outcomes; thus, it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes; this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur?"

DEFN

The relative frequency of outcome k of a random experiment is

f_k(n) = N_k(n)/n.

• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_{n→∞} f_k(n) = p_k (a constant) for all k.

DEFN

For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.
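A short MATLAB experiment (one possible sketch) that exhibits statistical regularity for a fair die:

% Relative frequencies f_k(n) for n simulated rolls of a fair die
n = 10000;
x = ceil(6*rand(1, n));          % n rolls, each uniform on {1,...,6}
f = zeros(1, 6);
for k = 1:6
    f(k) = sum(x == k) / n;      % relative frequency of outcome k
end
disp(f)                          % each entry should be near 1/6

Increasing n makes the entries of f cluster more tightly around 1/6, which is the convergence described above.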

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ N_k(n) ≤ n for all k,

because N_k(n) is just the number of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ N_k(n)/n = f_k(n) ≤ 1, for all k = 1, ..., K.   (1)

• If 1, 2, ..., K are all of the possible outcomes, then

∑_{k=1}^{K} N_k(n) = n.

Again, dividing by n yields

∑_{k=1}^{K} f_k(n) = 1.   (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

N_E(n) = N_2(n) + N_4(n) + N_6(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n).

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n).   (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency, and

3. agree with our intuition (where appropriate).

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
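If you want to check your answers numerically, here is one possible MATLAB sketch. The student labels below are hypothetical, chosen only to be consistent with the class sizes given in the problem:

% Hypothetical rosters: 1-2 take all three; 3-12 Spanish+French only;
% 13-14 Spanish+German only; 15-18 French+German only; 19-32 Spanish only;
% 33-42 French only; 43-50 German only; 51-100 take no language class
S = [1:12, 13:14, 19:32];                 % 28 students
F = [1:12, 15:18, 33:42];                 % 26 students
G = [1:2, 13:18, 43:50];                  % 16 students
inAny = union(S, union(F, G));
pNone = (100 - numel(inAny)) / 100        % should be 0.50
onlyS = setdiff(S, union(F, G));
onlyF = setdiff(F, union(S, G));
onlyG = setdiff(G, union(S, F));
pOne = (numel(onlyS) + numel(onlyF) + numel(onlyG)) / 100   % should be 0.32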

• We want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ we need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN

The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B.

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (This is useful for proofs.)

2. A ∪ B (read "A union B" or "A or B")

DEFN

The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}.

3. A ∩ B (read "A intersect B" or "A and B")

DEFN

The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}.

4. A^c or Ā (read "A complement")

DEFN

The complement of a set A in a sample space Ω is defined by

A^c = {x | x ∈ Ω and x ∉ A}.

• Use Venn diagrams to convince yourself of the following (a numerical spot-check appears after this list):

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (A^c)^c = A

3. A ⊂ B ⇒ B^c ⊂ A^c

4. A ∪ A^c = Ω

5. (A ∩ B)^c = A^c ∪ B^c (De Morgan's Law 1)

6. (A ∪ B)^c = A^c ∩ B^c (De Morgan's Law 2)
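A minimal MATLAB sketch that spot-checks the two De Morgan laws on random subsets of a small universe:

% Random subsets of a small universe
Omega = 1:20;
A = Omega(rand(1, 20) < 0.5);
B = Omega(rand(1, 20) < 0.5);
Ac = setdiff(Omega, A);                    % complement of A
Bc = setdiff(Omega, B);                    % complement of B
isequal(setdiff(Omega, intersect(A, B)), union(Ac, Bc))   % returns 1
isequal(setdiff(Omega, union(A, B)), intersect(Ac, Bc))   % returns 1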

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if ____.

This is very important, so I will emphasize it again: mutually exclusive is a ____.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus, the event class can be written as

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation:

1.

2.

3.

4.

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.

Ω3 = {HHH, HHT, HTH, ..., TTT}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω3| = 1/8 for any outcome a_i.

Let A2 = {exactly 2 heads occur in 3 tosses}. Then

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, ..., o_K} and |E| = K), then

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.

Ω = {0, 1, 2, 3}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.

Then

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome had some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) =

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX

A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
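If you want to check your answers, one way to work the problem with the corollaries (using C for cigarettes and R for cigars, labels introduced here for illustration) is:

1. P(C ∪ R) = P(C) + P(R) − P(C ∩ R) = 0.28 + 0.07 − 0.05 = 0.30, i.e., 30 percent.

2. P((C ∪ R)^c) = 1 − P(C ∪ R) = 1 − 0.30 = 0.70, i.e., 70 percent.

3. P(R ∩ C^c) = P(R) − P(R ∩ C) = 0.07 − 0.05 = 0.02, i.e., 2 percent.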

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.

DEFN

The sample space, denoted by Ω, is the ____ of ____.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either ____ or ____.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples:

DEFN

A continuous set is not countable.

DEFN

If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be finite, infinite, or partially infinite.

EX

Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN

Events are ____ to which we assign ____.

Examples based on previous examples of sample spaces:

(a) EX

Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

(c) EX

Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = the event that heads occurs on the first toss.
Express A1 as a set of outcomes.

(d) EX

Toss a coin 3 times and note the number of heads.
Let O = an odd number of heads occurs.
Express O as a set of outcomes.

2.

DEFN

F is the event class, which is a ____ to which we assign probability.

• If Ω is discrete, then F can be taken to be the ____ of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M

DEFN

A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,

⋃_{i=1}^{∞} E_i ∈ M.   (5)

• Note that by property (c) of fields, (5) can be interchanged with

⋂_{i=1}^{∞} E_i ∈ M.   (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.

DEFN

The probability measure, denoted by P, is a ____ that maps all members of ____ onto ____.

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I.

II.

III.

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(⋃_{k=1}^{n} A_k) = ∑_{k=1}^{n} P(A_k).

The proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = ∑_{j} P(A_j) − ∑_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. The proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. One implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if ____.

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a ____.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c).

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c).

• For N events to be s.i., we require

P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k,
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m) for all distinct i, k, m,
...
P(A_1 ∩ A_2 ∩ ··· ∩ A_N) = P(A_1)P(A_2) ··· P(A_N).

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A_1, A_2, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) for all i ≠ j.

Examples of Applying Statistical Independence

EX

For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a ____.

– Statistical independence is a ____.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, ____ is true.

Consider the following:

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A,

• EB be the event that the selected computer is from manufacturer B,

• ED be the event that the selected computer is defective.

Find:

P(EA) =    P(EB) =    P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(ED|EA) (the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) =

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Figure: sets A and B overlapping inside the sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Figure: B as the new sample space, containing the region A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B].

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =
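A Monte Carlo sketch of this example in MATLAB (an illustration; the exact values follow from counting the sample space above):

% Simulate random computer selections: code each computer as
% [manufacturer, defective], with A = 1, B = 2
lab = [1 1; 1 0; 2 1; 2 1; 2 0];           % AD, AN, BD, BD, BN
n = 100000;
pick = lab(ceil(5*rand(n, 1)), :);
EA = pick(:, 1) == 1;                       % from manufacturer A
ED = pick(:, 2) == 1;                       % defective
pED_given_EA = sum(EA & ED) / sum(EA)       % should be near 1/2
pEA_given_ED = sum(EA & ED) / sum(ED)       % should be near 1/3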

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


EX: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = the event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers: 2. (a) 1/3 (b) 1/5 (c) 1

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A).   (8)

• Equations (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

EX: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C)P(B|C)P(C)

Partitions and Total Probability

DEFN

A collection of sets {A_i} partitions the sample space Ω if and only if ____.

{A_i} is also said to be a partition of Ω.

1. Venn diagram:

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability

If {A_i} partitions Ω, then

Example:
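One possible worked example (continuing the urn experiment from earlier in this lecture; this may differ from the example worked in class): find P(W2), the probability that the second ball drawn is white. The events W1 (white first) and B1 (black first) partition Ω, so

P(W2) = P(W2|W1)P(W1) + P(W2|B1)P(B1)
      = (2/4)(3/5) + (3/4)(2/5)
      = 6/20 + 6/20 = 3/5.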

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: A0 → B0 with probability 5/6, A0 → B1 with probability 1/6; A1 → B1 with probability 7/8, A1 → B0 with probability 1/8]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
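A Monte Carlo check of the two rules in MATLAB (one possible sketch; the exact answers can also be computed with the law of total probability):

% Binary channel: P(A1) = 3/5, P(B1|A0) = 1/6, P(B0|A1) = 1/8
n = 1e6;
A = rand(n, 1) < 3/5;                 % true when A1 is sent
B = A;                                % start from the transmitted symbol
flip0 = rand(n, 1) < 1/6;             % A0 -> B1 crossovers
flip1 = rand(n, 1) < 1/8;             % A1 -> B0 crossovers
B(~A) = flip0(~A);                    % if A0 sent, receive B1 w.p. 1/6
B(A) = ~flip1(A);                     % if A1 sent, receive B0 w.p. 1/8
pe1 = mean(~A)                        % Rule 1 errs whenever A0 sent: ~2/5
pe2 = mean(B ~= A)                    % Rule 2: ~17/120, about 0.1417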


EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized; i.e., it may result in rules of the form: "When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η."

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0).

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0).

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0).

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)   and   q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)   and   q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)

and

q1 P(A0|B1) + (1 − q1)P(A1|B1).   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints; i.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide that Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

• General technique:

If {A_i} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is ____.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide that Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX

Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is ____.

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: 3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
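A few of these answers can be spot-checked in MATLAB by direct counting (a sketch):

total = 6^5;                               % 7776 equally likely rolls
pNoTwoAlike = 6*5*4*3*2 / total            % 0.0926
% Full house: a value for the triple (6 ways), a value for the pair
% (5 ways), and nchoosek(5,3) placements of the triple among five dice
pFullHouse = 6*5*nchoosek(5, 3) / total    % 0.0386
pFiveAlike = 6 / total                     % 0.0008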

The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.

Thus n_1 = n_2 = ··· = n_k = |A| ≡ n.

⇒ The number of distinct ordered k-tuples is ____.

2. Sampling without replacement and with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.

Then n_1 = n, n_2 = n − 1, ..., n_k = n − (k − 1).

⇒ The number of distinct ordered k-tuples is ____.

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n_1 = n, n_2 = n − 1, ..., n_{n−1} = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples = ____.

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

4. Sampling without replacement and without ordering

Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets.

Now, how many different orders are there for any set of k objects?

Let C_{n,k} = the number of combinations of size k from a set of n objects:

C_{n,k} = (# of ordered subsets)/(# of orderings) = ____

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, ..., k_J is given by the multinomial coefficient.

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^{24} ≈ 1.3 × 10^{85}.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^{24} (24!)] ≈ 2.1 × 10^{61}.

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the probability of this occurring?

5. Sampling with replacement and without ordering

Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k).

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

Answer: 4. 0.0794

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space: Ω = the ____ of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) =

For s.i. subexperiments,

P(A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is ____.

• The number of ways that k successes can occur in n places is ____.

• Thus, the probability of k successes on n trials is

p_n(k) = ____.   (11)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
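A direct MATLAB computation for this packet problem (one possible sketch):

% P(3 or more errors) = 1 - P(0, 1, or 2 errors), X ~ Binomial(100, 0.01)
n = 100; p = 0.01;
k = 0:2;
pDecode = sum(arrayfun(@(j) nchoosek(n, j), k) .* p.^k .* (1-p).^(n-k));
pFail = 1 - pDecode                        % approximately 0.0794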

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}.   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables {X_i}.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = ∑_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.   (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.   (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is a success)

Then

p(m) = ____

Also, P(# trials > k) = ____

EX

A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
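For checking your work, a sketch of the computation (treating each transmission as an independent Bernoulli trial with success probability 0.9, which is the assumption implied by the problem):

P(exactly 2 transmissions) = P(1st fails, 2nd succeeds) = (0.1)(0.9) = 0.09

P(more than 2 transmissions) = P(first 2 transmissions fail) = (0.1)² = 0.01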

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/minute is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n, k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}.

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α},  k = 0, 1, ...
           0,                otherwise

EX: Phone calls arriving at a switching office

If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
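A quick MATLAB evaluation of this example (a sketch; note that α = λt = 90 · (1/6) = 15 for a 10-minute interval):

alpha = 15; k = 20;
pk = alpha^k * exp(-alpha) / factorial(k)  % approximately 0.0418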

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = 0,                       z ≤ 0
         z²/(2b²),                0 < z ≤ b
         [b² − (2b − z)²/2]/b²,   b < z < 2b
         1,                       z ≥ 2b

4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)
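An empirical MATLAB check of F_Z (a sketch; b = 1 is an arbitrary choice for illustration):

b = 1; n = 100000;
x = b*rand(n, 1); y = b*rand(n, 1);   % uniform throws in the square
z = x + y;
zz = linspace(0, 2*b, 200);
Femp = arrayfun(@(t) mean(z <= t), zz);
plot(zz, Femp)                        % compare with the piecewise F_Z above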

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a ____ from ____ to ____.


EX


• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the ____ (____), denoted ____, is ____.

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing; i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right; i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from the above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].

EX: Examples

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The ____ (____) f_X(x) of a random variable X is the derivative (which may not exist at some places) of ____.

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Proof omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.

DEFN

If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ____.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 0,          x < a
         1/(b − a),  a ≤ x ≤ b
         0,          x > b

DEFN

A ____ has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is ____.

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, ..., 6
           0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, ...
           0,        otherwise

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
    0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p,      x = 1
           1 − p,  x = 0
           0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
           0,                          otherwise

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p,  k = 1, 2, ...
           0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α},  k = 0, 1, ...
           0,                otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf]

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

• Obtainable as a limit of the geometric RV.
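Because F_X above is invertible in closed form, exponential samples are easy to generate from uniform ones. A minimal MATLAB sketch (the inverse-CDF method; λ = 1 is an example value):

% If U ~ Uniform(0,1), then X = -log(U)/lambda is Exponential(lambda)
lambda = 1;
U = rand(1, 1e5);
X = -log(U) / lambda;
fprintf('sample mean %.3f (theory: 1/lambda = %.3f)\n', mean(X), 1/lambda);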

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, and 4; x vs. f_X(x) and F_X(x)]

3 Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre-Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

(n choose k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k}
                                     ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]

L7-11

DEFN: A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
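MATLAB has no built-in Q-function, but Q(x) = (1/2)erfc(x/√2) is a standard identity using the built-in complementary error function. A minimal sketch that also cross-checks the finite-range form above numerically (x = 1.5 is an arbitrary test point):

% Q-function via erfc, checked against the finite-range integral form
Q = @(x) 0.5 * erfc(x / sqrt(2));
x = 1.5;
phi = linspace(0, pi/2, 1001);
craig = trapz(phi, exp(-x^2 ./ (2*sin(phi).^2))) / pi;
fprintf('Q(%.1f): erfc form %.6f, integral form %.6f\n', x, Q(x), craig);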

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5


Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
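Those answers follow from symmetry, P[X < µ] = 1/2, and P[|X − µ| > kσ] = 2Q(k). A one-line MATLAB check of part (b):

% Verify P[|X - mu| > k*sigma] = 2*Q(k) for k = 1..5
Q = @(x) 0.5 * erfc(x / sqrt(2));
for k = 1:5
    fprintf('k=%d: 2Q(k) = %.3g\n', k, 2*Q(k));
end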

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
         { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as:

Γ(p) = ∫₀^∞ t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, and 8 degrees of freedom, σ² = 1; x vs. f_X(x)]

• The CDF for a central chi-square RV can be found through repeated integration by parts; for an even number of degrees of freedom, n = 2m, it is

F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
         { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
         { 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
         { 0,                  r < 0

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
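The construction above is easy to verify empirically. A minimal MATLAB sketch (σ = 1 and the sample size are illustrative):

% sqrt(X1^2 + X2^2) with s.i. N(0,sigma^2) parts should be Rayleigh
sigma = 1; N = 1e5;
R = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);
% Compare the empirical CDF with 1 - exp(-r^2/(2 sigma^2)) at a few points
for r = [0.5 1 2]
    fprintf('r=%.1f: empirical %.4f, theory %.4f\n', ...
            r, mean(R <= r), 1 - exp(-r^2/(2*sigma^2)));
end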

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)],  ℓ ≥ 0
         { 0,                                       ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[{X ≤ x} ∩ A] / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., A_n form a partition of Ω, then

F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ··· + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

= lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)

= lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

= [f_X(x|A) / f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board


S-5

4. On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.

I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.

L1-1

EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.

L1-2

What is a random experiment?

Q: What do we mean by random?

Output is unpredictable in some sense

DEFN

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN

An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN

The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX

Tossing a coin: A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω =

EX: Rolling a 6-sided die. A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω =

EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω =

L1-3

If the outcome of an experiment is random, how can we apply quantitative techniques to it? This is the theory of probability.

People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1 Probability as Intuition

2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3 Probability as a Measure of Frequency of Occurrence

4 Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1 (the probability that something occurs is 1).

Since p_H = p_T, p_H = p_T = 1/2.

EX Rolling a fair 6-sided die

L1-4

Let p_i = Prob{top face is i}. Then

Σ_{i=1}^{6} p_i = 1
⇒ 6p₁ = 1
⇒ p₁ = 1/6

So p_i = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die?
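Since the 36 outcome pairs are equally likely, the counting can be checked by brute-force enumeration. A small MATLAB sketch (which also reveals the count, so try the problem by hand first):

% Enumerate the 36 equally likely outcomes of two die rolls and count
% those in which a 1 or a 2 appears on at least one roll
count = 0;
for r1 = 1:6
    for r2 = 1:6
        if r1 <= 2 || r2 <= 2
            count = count + 1;
        end
    end
end
fprintf('P = %d/36 = %.4f\n', count, count/36);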

L1-5

What are some problems with defining probability in this way?

1. Requires equally likely outcomes – thus it only applies to a small set of random phenomena.
2. Can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times the outcome is k in n trials.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur?"

DEFN

The relative frequency of outcome k of a random experiment is

f_k(n) = N_k(n) / n

• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_{n→∞} f_k(n) = p_k (a constant) ∀k

DEFN

For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ N_k(n) ≤ n ∀k

because N_k(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ N_k(n)/n = f_k(n) ≤ 1, ∀k = 1, ..., K   (1)

L1-6

• If 1, 2, ..., K are all of the possible outcomes, then

Σ_{k=1}^{K} N_k(n) = n

Again, dividing by n yields

Σ_{k=1}^{K} f_k(n) = 1   (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

N_E(n) = N₂(n) + N₄(n) + N₆(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N₂(n) + N₄(n) + N₆(n)]/n = f₂(n) + f₄(n) + f₆(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n)   (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1 be useful for solving real problems

2 agree with our interpretation of probability as relative frequency

3 agree with our intuition (where appropriate)

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to be able to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.

• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN: The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

L2a-2

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN: The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN: The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. A^c or Ā (read "A complement")

DEFN: The complement of a set A in a sample space Ω is defined by

A^c = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (A^c)^c = A
3. A ⊂ B ⇒ B^c ⊂ A^c
4. A ∪ A^c = Ω
5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)
6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)

L2a-3

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if

A ∩ B = ∅

This is very important, so I will emphasize it again: mutually exclusive is a

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

L2a-4

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

L2a-5

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus the event class can be written as

L2a-6

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation.

1

2

3

4

EX The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX Consider the following examples

A. A coin is tossed 3 times, and the sequence of H and T is noted.

Ω₃ = {HHH, HHT, HTH, ..., TTT}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω₃| = 1/8 for any outcome a_i

Let A₂ = {exactly 2 heads occurs in 3 tosses}. Then

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, ..., o_K} and |E| = K), then

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.

Ω = {0, 1, 2, 3}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω| = 1/4 for any outcome a_i

Then

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A)

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.

(Does not mean that outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes.

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are in an interval I, then for any x with a ≤ x ≤ b, x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX

Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are subsets of the sample space to which we assign probability.

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top face. Let L₄ = the event that the result is less than or equal to 4. Express L₄ as a set of outcomes.

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A₁ = event that heads occurs on first toss. Express A₁ as a set of outcomes.

(d) EX

Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes.

2

DEFN

F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M

L2-5

DEFN: A field M forms a σ-algebra or σ-field if, for any E₁, E₂, ... ∈ M,

∪_{i=1}^{∞} E_i ∈ M   (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_{i=1}^{∞} E_i ∈ M   (6)

bull We require that the event class F be a σ-algebra

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN: The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for all A ∈ F

II. P(Ω) = 1

III. If A₁, A₂, ... are pairwise mutually exclusive, then P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A₁, A₂, ..., A_n are pairwise mutually exclusive, then

P(∪_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪_{k=1}^{n} A_k) = Σ_{j=1}^{n} P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A₁ ∩ A₂ ∩ ··· ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

bull Events that arise from completely separate random phenomena are statistically independent

It is important to note that statistical independence is a

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34

L3-3

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅.

Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)

• For N events to be s.i., require

P(A_i ∩ A_k) = P(A_i)P(A_k)
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
⋮
P(A₁ ∩ A₂ ∩ ··· ∩ A_N) = P(A₁)P(A₂)···P(A_N)

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀i ≠ k.)

Weaker Condition Pairwise StatisticalIndependence

DEFN

N events A₁, A₂, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a

– Statistical independence is a

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, (3) is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A

• E_B be the event that the selected computer is from manufacturer B

• E_D be the event that the selected computer is defective

Find

P (EA) = P (EB) = P (ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(E_D|E_A) (meaning the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD, BD, BN}

Find

– P(E_D|E_B) =

L3-7

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram

[Venn diagram: overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B becomes the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space, (S, A, P(·|B)).

Check the axioms

1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) ≥ 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check prev example: Ω = {AD, AN, BD, BD, BN}

P(E_D|E_A) =

P(E_D|E_B) =

P(E_A|E_D) =

P(E_B|E_D) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W₂|W₁)?

Answers to the gambler problem: (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)   (8)

• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A₁, A₂, ... partitions the sample space Ω if and only if

{A_i} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {A_i} partitions Ω, then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with prob P(A0) = 2/5.

• A1 is sent with prob P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]

• The receiver must use a decision rule to determine from the output, B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = [P(Bj|Ai)P(Ai)] / [Σ_k P(Bj|Ak)P(Ak)],

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' rule.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
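The two-coin problem is a direct application of Bayes' rule. A minimal MATLAB sketch of the computation (F denotes the event that the fair coin was chosen; the conditional probabilities follow from counting heads on a fair vs. two-headed coin):

% Two-coin problem via Bayes' rule; P(F) = 1/2 a priori
pF = 0.5;
% (a) P(F | H):   P(H|F) = 1/2,  P(H|not F) = 1
pa = (0.5*pF)   / (0.5*pF   + 1*(1-pF));     % = 1/3
% (b) P(F | HH):  P(HH|F) = 1/4, P(HH|not F) = 1
pb = (0.25*pF)  / (0.25*pF  + 1*(1-pF));     % = 1/5
% (c) P(F | HHT): the two-headed coin can never show tails
pc = (0.125*pF) / (0.125*pF + 0*(1-pF));     % = 1
fprintf('(a) %.4f (b) %.4f (c) %.4f\n', pa, pb, pc);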

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is n₁n₂···n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Thus n₁ = n₂ = ··· = n_k = |A| ≡ n

⇒ number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Then n₁ = n, n₂ = n − 1, ..., n_k = n − (k − 1)

⇒ number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula: For large n,

n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
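A quick sketch comparing the exact factorial with Stirling's approximation above (n = 10 is an arbitrary test value):

% n! via gamma(n+1), compared with Stirling's approximation
n = 10;
exact    = gamma(n+1);                        % = factorial(n) = 3628800
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);
fprintf('n! = %g, Stirling = %g, ratio = %.4f\n', exact, stirling, stirling/exact);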

L5a-3

4. Sampling w/out Replacement & w/out Ordering.
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently, "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C_{n,k} = the number of combinations of size k from a set of n objects:

C_{n,k} = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!] = (n choose k)

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)²⁴ ≈ 1.3 × 10⁸⁵

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)²⁴ (24!)] ≈ 2.1 × 10⁶¹ unordered groups

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sampling with Replacement and w/out Ordering.
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

((k + n − 1) choose k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX: By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1 Never switch

(Answer to the packet-decoding question above: 0.0794.)

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
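Before working the probabilities, the strategies can be compared with a quick Monte Carlo simulation. A minimal MATLAB sketch (it assumes the usual rules: the host always opens a goat door different from the player's pick):

% Monte Carlo comparison of Monty Hall strategies
N = 1e5; wins_stay = 0; wins_switch = 0;
for i = 1:N
    car  = ceil(3*rand);                         % door hiding the car
    pick = ceil(3*rand);                         % player's first pick
    wins_stay   = wins_stay   + (pick == car);
    wins_switch = wins_switch + (pick ~= car);   % switching wins iff first pick was a goat
end
fprintf('never switch: %.3f, always switch: %.3f\n', wins_stay/N, wins_switch/N);
% The coin-flip strategy wins with the average of the two rates.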

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B₁, B₂, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P (A) =

For s.i. subexperiments,

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is (n choose k).

• Thus the probability of k successes on n trials is

p_n(k) = (n choose k) p^k (1 − p)^{n−k}   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
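The packet fails to decode when 3 or more of the 100 bits are in error, so the answer is one minus the probability of 0, 1, or 2 errors. A minimal MATLAB check:

% P(decode failure) = P(3 or more bit errors in n = 100 bits), p = 0.01
n = 100; p = 0.01;
P0to2 = 0;
for k = 0:2
    P0to2 = P0to2 + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
fprintf('P(decode failure) = %.4f\n', 1 - P0to2);   % about 0.0794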

L5-4

• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = (n choose k) p^k (1 − p)^{n−k}   (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^{m−1} p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

(n choose k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, ...
           { 0,                  otherwise

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
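A numerical sketch in base MATLAB (no toolboxes needed; here α = λt = 90 · (1/6) = 15):

lambda = 90;           % calls per hour
t      = 10/60;        % 10 minutes, in hours
alpha  = lambda*t;     % alpha = 15
k      = 20;
P = alpha^k * exp(-alpha) / factorial(k)    % approximately 0.0418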

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following questions. The answers are shown so that you can check your work.

EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = { 0,                       z ≤ 0
         { z²/(2b²),                0 < z ≤ b
         { [b² − (2b − z)²/2]/b²,   b < z < 2b
         { 1,                       z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
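A Monte Carlo sanity check of F_Z(z) in MATLAB (a sketch; b = 1 is an arbitrary choice of mine):

b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);          % sum of the two dart coordinates
z = linspace(0, 2*b, 200);
Femp = arrayfun(@(t) mean(Z <= t), z);  % empirical CDF
F = zeros(size(z));
i1 = z > 0 & z <= b;  F(i1) = z(i1).^2/(2*b^2);
i2 = z > b & z < 2*b; F(i2) = (b^2 - (2*b - z(i2)).^2/2)/b^2;
F(z >= 2*b) = 1;
plot(z, Femp, z, F, '--')               % the two curves should overlap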

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line, R.

EX: (example worked in class)

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN:
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P [X = −∞] = 0 and P [X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN:
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is

F_X(x) = P ({ω : X(ω) ≤ x}) = P (X ≤ x)

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P (a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_(h→0⁺) F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)

Pf: omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P [x⁻ < X ≤ x]
                 = lim_(ε→0) P [x − ε < X ≤ x]
                 = P [X = x]

EX: Examples (on board)

EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w = sqrt(v);

Again, plot the histogram with hist(w, 100) and compare it to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates how to use MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN:
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:

f_X(x) = dF_X(x)/dx

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_(−∞)^(x) f_X(t) dt

3. ∫_(−∞)^(+∞) f_X(x) dx = 1

4. P (a < X ≤ b) = ∫_(a)^(b) f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_(−∞)^(+∞) g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P [X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN:
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus f_X(x) can be assigned a value at every point x when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN:
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by

f_X(x) = { 0,           x < a
         { 1/(b − a),   a ≤ x ≤ b
         { 0,           x > b

DEFN:
A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN:
For a discrete RV, the probability mass function (pmf) is p_X(x) = P (X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P (X = x) = { 1/6,  x = 1, 2, . . . , 6
            { 0,    o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P (X = x) = { (1/2)^x,  x = 1, 2, . . .
            { 0,        o.w.

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P ({X(s) = 1}) = P ({s | s ∈ A}) = P (A) ≜ p

So,

P (X = x) = { p,      x = 1
            { 1 − p,  x = 0
            { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P [X = k] = { (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n
            { 0,                               o.w.
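A sketch of evaluating this pmf in base MATLAB (nchoosek is built-in; n = 10 and p = 0.2 match the first plot below):

n = 10; p = 0.2;
k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
stem(k, pmf)      % compare with the Binomial(10, 0.2) pmf plot
sum(pmf)          % should be 1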

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Plots: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) vs. x for x = 0, 1, . . . , 10]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Plots: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; F_X(x) vs. x for x = 0, 1, . . . , 10]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Plot: Binomial(100, 0.2) pmf; P(X = x) vs. x for x = 0 to 100]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P [X = k] = { (1 − p)^(k−1) p,  k = 1, 2, . . .
            { 0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Plots: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) vs. x]

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Plots: Geometric(0.1) CDF and Geometric(0.7) CDF; F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P (N = k) = { (α^k/k!) e^(−α),  k = 0, 1, . . .
            { 0,                o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Plots: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) vs. x]

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Plots: Poisson(0.75) CDF and Poisson(3) CDF; F_X(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Plot: Poisson(20) pmf; P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter, λ.

• Density function:

f_X(x) = { 0,           x < 0
         { λ e^(−λx),   x ≥ 0

• Distribution function:

F_X(x) = { 0,             x < 0
         { 1 − e^(−λx),   x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Plots: Exponential pdfs f_X(x) and CDFs F_X(x) for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre–Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}

• Let σ² = npq and µ = np; then

P (k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                      ≈ [1/√(2πσ²)] exp{−(k − µ)²/(2σ²)}

DEFN:
A Gaussian random variable X has density function

f_X(x) = [1/√(2πσ²)] exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² > 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P (X ≤ x) = ∫_(−∞)^(x) [1/√(2πσ²)] exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Plots: Gaussian pdfs f_X(x) and CDFs F_X(x) for the three (µ, σ²) pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_(−∞)^(x) [1/√(2π)] exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_(x)^(∞) [1/√(2π)] exp{−t²/2} dt

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_(0)^(π/2) exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P (a < X ≤ b) = P (X > a) − P (X > b)
              = Q((a − µ)/σ) − Q((b − µ)/σ)
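MATLAB has no built-in Q-function, but the exact identity Q(x) = (1/2) erfc(x/√2) gives one immediately. A sketch (the values of µ, σ, a, and b are mine, chosen for illustration):

Q = @(x) 0.5*erfc(x/sqrt(2));   % standard Q-function via the built-in erfc
mu = 5; sigma = 1;              % example parameters
a = 4; b = 7;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma)   % P(a < X <= b), ~0.8186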

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. The answers are shown below so that you can check your work.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P [X < µ]

(b) P [|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.

Examples with Gaussian Random Variables

EX: The Motivation problem
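These answers follow from symmetry: P [|X − µ| > kσ] = 2Q(k). A quick MATLAB check (same erfc-based Q-function as above):

Q = @(x) 0.5*erfc(x/sqrt(2));
k = 1:5;
2*Q(k)    % 0.3173  0.0455  0.0027  6.3342e-05  5.7330e-07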

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Plot: Laplacian pdf f_X(x) with c = 2, for −3 ≤ x ≤ 3]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But, in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_(i=1)^(n) Xᵢ²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
         { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as:

Γ(p) = ∫_(0)^(∞) t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Plot: chi-square density functions f_X(x) with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^(−y/(2σ²)) Σ_(k=0)^(m−1) (1/k!) [y/(2σ²)]^k,  y ≥ 0
         { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
         { 0,                     r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Plot: Rayleigh pdf f_R(r), σ² = 1, for 0 ≤ r ≤ 5]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
         { 0,                  r < 0

• The received signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
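A MATLAB sketch of that construction (σ = 1 assumed): square and sum two s.i. zero-mean Gaussians, take the square root, and the histogram should match the Rayleigh pdf.

sigma = 1; N = 1e5;
X1 = sigma*randn(1,N); X2 = sigma*randn(1,N);     % i.i.d. zero-mean Gaussians
R = sqrt(X1.^2 + X2.^2);                          % should be Rayleigh
[counts, centers] = hist(R, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(N*binw))                     % histogram scaled to a density
hold on
r = linspace(0, 5, 200);
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)), 'r') % Rayleigh pdf overlay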

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
         { 0,                                       ℓ < 0

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P ({X ≤ x} ∩ A)/P (A)

DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = dF_X(x|A)/dx

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
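For checking your work (a sketch using the definition above with A = {X > 2} and F_X(x) = 1 − e^(−x)):

F_X(x | X > 2) = P ({X ≤ x} ∩ {X > 2})/P (X > 2) = [F_X(x) − F_X(2)]/[1 − F_X(2)] = 1 − e^(−(x−2)), x > 2,

and the remaining wait W = X − 2 has F_W(w | X > 2) = 1 − e^(−w), w ≥ 0 — the same exponential law, which is the memoryless property of the exponential RV.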

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, . . . , Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A₁)P (A₁) + F_X(x|A₂)P (A₂) + · · · + F_X(x|Aₙ)P (Aₙ)

and

f_X(x) = Σ_(i=1)^(n) f_X(x|Aᵢ)P (Aᵢ)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P (X = x) = 0, so the previous definition of conditional prob. will not work:

P (A|X = x) = lim_(Δx→0) P (A | x < X ≤ x + Δx)

= lim_(Δx→0) { [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] } P (A)

= lim_(Δx→0) { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P (A)

= [f_X(x|A)/f_X(x)] P (A)

if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

P (A|X = x) = [f_X(x|A)/f_X(x)] P (A)

⇒ P (A|X = x) f_X(x) = f_X(x|A) P (A)

⇒ ∫_(−∞)^(∞) P (A|X = x) f_X(x) dx = ∫_(−∞)^(∞) f_X(x|A) dx · P (A)

⇒ P (A) = ∫_(−∞)^(∞) P (A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = [P (A|X = x)/P (A)] f_X(x)
         = P (A|X = x) f_X(x) / ∫_(−∞)^(∞) P (A|X = t) f_X(t) dt

Examples on board


EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.

Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems:

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later, we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.

What is a random experiment?

Q: What do we mean by random?

The output is unpredictable in some sense.

DEFN:
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by:

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN:
An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN:
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX:
Tossing a coin
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω = {H, T}

EX: Rolling a 6-sided die
A 6-sided die is rolled, and the number on the top face is observed.

• Q: What are the outcomes?

Ω = {1, 2, 3, 4, 5, 6}

EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω = {even, odd}

If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1
(the probability that something occurs is 1).

Since p_H = p_T, p_H = p_T = 1/2.

EX: Rolling a fair 6-sided die

Let pᵢ = Prob{top face is i}. Then

Σ_(i=1)^(6) pᵢ = 1
⇒ 6p₁ = 1
⇒ p₁ = 1/6

So pᵢ = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally-likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob. that we observe a 1 or 2 on either roll of the die?

What are some problems with defining probability in this way?

1. It requires equally likely outcomes – thus, it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur?"

DEFN:
The relative frequency of outcome k of a random experiment is

f_k(n) = N_k(n)/n

• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN:
An experiment possesses statistical regularity if

lim_(n→∞) f_k(n) = p_k (a constant) ∀k

DEFN:
For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ N_k(n) ≤ n ∀k,

because N_k(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ N_k(n)/n = f_k(n) ≤ 1, ∀k = 1, . . . , K    (1)

• If 1, 2, . . . , K are all of the possible outcomes, then

Σ_(k=1)^(K) N_k(n) = n

Again, dividing by n yields

Σ_(k=1)^(K) f_k(n) = 1    (2)

Consider the event E that an even number occurs (on a die roll).

What can we say about the number of times E is observed in n trials?

N_E(n) = N₂(n) + N₄(n) + N₆(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N₂(n) + N₄(n) + N₆(n)]/n = f₂(n) + f₄(n) + f₆(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n)    (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems

2. agree with our interpretation of probability as relative frequency

3. agree with our intuition (where appropriate)
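A quick MATLAB illustration of statistical regularity (a sketch; the rolls are generated with ceil(6*rand) so it runs on older releases that lack randi):

n = 10000;
rolls = ceil(6*rand(1, n));        % simulate n fair die rolls
fk = zeros(1, 6);
for k = 1:6
    fk(k) = sum(rolls == k)/n;     % relative frequency of outcome k
end
fk                                 % each entry should be near 1/6
fE = fk(2) + fk(4) + fk(6)         % relative frequency of the event "even"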

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX:
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.

• We want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ we need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN:
The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A.
(useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN:
The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN:
The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. Aᶜ or Ā (read "A complement")

DEFN:
The complement of a set A in a sample space Ω is defined by

Aᶜ = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Aᶜ)ᶜ = A

3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ

4. A ∪ Aᶜ = Ω

5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)

6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)

• One of the most important relations for sets is when sets are mutually exclusive:

DEFN:
Two sets A and B are said to be mutually exclusive or disjoint if

A ∩ B = ∅

This is very important, so I will emphasize it again: mutually exclusive is a set relation.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus, the event class can be written as

F = {∅, {H}, {T}, {H, T}}

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
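For checking your work (by inclusion–exclusion, using the counts from the problem statement): |S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50, so P (no class) = (100 − 50)/100 = 0.5. The numbers of students taking only one language are 28 − 12 − 4 + 2 = 14 (Spanish), 26 − 12 − 6 + 2 = 10 (French), and 16 − 4 − 6 + 2 = 8 (German), so P (exactly one class) = 32/100 = 0.32.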

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Assuming equally likely outcomes,

P (aᵢ) = 1/|Ω₃| = 1/8 for any outcome aᵢ.

Let A₂ = {exactly 2 heads occurs in 3 tosses}. Then

A₂ = {HHT, HTH, THH}, so P (A₂) = |A₂|/|Ω₃| = 3/8.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, . . . , o_K} and |E| = K), then

P (E) = K/|Ω|.

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P (aᵢ) = 1/|Ω| = 1/4 for any outcome aᵢ.

Then P (2 heads) = 1/4, which disagrees with the 3/8 found above – the outcomes in this sample space are not equally likely, so the equally-likely assumption must be justified by the underlying experiment.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome had some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN:
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P ([a, b]) = (b − a)/(B − A)

• Note that P ({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
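For checking your work (a sketch using the corollaries derived below): P (smokes) = 0.28 + 0.07 − 0.05 = 0.30, i.e., 30 percent; P (neither) = 1 − 0.30 = 0.70, i.e., 70 percent; and P (cigars but not cigarettes) = 0.07 − 0.05 = 0.02, i.e., 2 percent.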

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.
DEFN:
The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN:
A discrete set is either finite or countably infinite.

DEFN:
A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX:
Examples:

DEFN:
A continuous set is not countable.

DEFN:
If a and b > a are in an interval I, then every x with a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be either finite, infinite, or partially infinite.

EX:
Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are sets of outcomes to which we assign probability.

Examples, based on previous examples of sample spaces:

(a) EX:
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX:
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

(c) EX:
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = the event that heads occurs on the first toss.
Express A1 as a set of outcomes.

(d) EX:
Toss a coin 3 times and note the number of heads.
Let O = an odd number of heads occurs.
Express O as a set of outcomes.

2.
DEFN:
F is the event class, which is a set of events (subsets of Ω) to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; these subsets therefore cannot be events.

DEFN:
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

DEFN:
A field M forms a σ-algebra or σ-field if, for any E₁, E₂, . . . ∈ M,

∪_(i=1)^(∞) Eᵢ ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_(i=1)^(∞) Eᵢ ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one σ-algebra possible.

• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.
DEFN:
The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P (A) ≥ 0 for all A ∈ F

II. P (Ω) = 1

III. If A and B are mutually exclusive (A ∩ B = ∅), then P (A ∪ B) = P (A) + P (B); more generally, for pairwise mutually exclusive A₁, A₂, . . ., P (∪ᵢ Aᵢ) = Σᵢ P (Aᵢ)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P (Aᶜ) = 1 − P (A)

(b) P (A) ≤ 1

(c) P (∅) = 0

(d) If A₁, A₂, . . . , Aₙ are pairwise mutually exclusive, then

P (∪_(k=1)^(n) A_k) = Σ_(k=1)^(n) P (A_k)

Proof is by induction.

(e) P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

Proof and example on board.

(f)

P (∪_(k=1)^(n) A_k) = Σ_(k=1)^(n) P (A_k) − Σ_(j<k) P (A_j ∩ A_k) + · · · + (−1)^(n+1) P (A₁ ∩ A₂ ∩ · · · ∩ Aₙ)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . . Proof is by induction.

(g) If A ⊂ B, then P (A) ≤ P (B).
Proof on board.

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the page so that you can check your work.

EX:
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P (A|B) = P (A ∩ B)/P (B), for P (B) > 0.

What if A and B are independent?

Motivation – continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX:
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN:
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P (A ∩ B) = P (A)P (B)

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probability relation.

What type of relation is mutually exclusive? (A set relation.)

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32. 2. (a) 2/3, (b) 1/3, (c) 3/4.

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅.
Thus,

P (A) = P [(A ∩ B) ∪ (A ∩ Bᶜ)]
      = P (A ∩ B) + P (A ∩ Bᶜ)
      = P (A)P (B) + P (A ∩ Bᶜ)

Thus,

P (A ∩ Bᶜ) = P (A) − P (A)P (B)
           = P (A)[1 − P (B)]
           = P (A)P (Bᶜ)

• For N events to be s.i., we require

P (Aᵢ ∩ A_k) = P (Aᵢ)P (A_k)
P (Aᵢ ∩ A_k ∩ A_m) = P (Aᵢ)P (A_k)P (A_m)
. . .
P (A₁ ∩ A₂ ∩ · · · ∩ A_N) = P (A₁)P (A₂) · · · P (A_N)

(It is not sufficient to have P (Aᵢ ∩ A_k) = P (Aᵢ)P (A_k) ∀i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN:
N events A₁, A₂, . . . , A_N ∈ A are pairwise statistically independent if and only if P (Aᵢ ∩ A_j) = P (Aᵢ)P (A_j) ∀i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events (worked answers follow the list):

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
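For checking your work against the footnoted answers (each birth is an s.i. trial with probability 1/2 for each sex): P (all same sex) = 2(1/2)⁵ = 1/16; P (3 eldest boys, rest girls) = (1/2)⁵ = 1/32; P (2 oldest girls) = (1/2)² = 1/4; P (at least one girl) = 1 − P (all boys) = 1 − (1/2)⁵ = 31/32.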

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation.

– Statistical independence is a probability relation.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.?

2. s.i. ⇒ m.e.?

3. Two events cannot be both s.i. and m.e., except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, 3 is true.

Consider the following:

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD₁, BD₂, BN}

Let:

• E_A be the event that the selected computer is from manufacturer A

• E_B be the event that the selected computer is from manufacturer B

• E_D be the event that the selected computer is defective

Find:

P (E_A) = 2/5,  P (E_B) = 3/5,  P (E_D) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P (E_D|E_A)
(meaning the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD₁, BD₂, BN}

Find:

– P (E_D|E_A) = 1/2

– P (E_D|E_B) = 2/3

– P (E_A|E_D) = 1/3

– P (E_B|E_D) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B overlapping inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, with A ∩ B inside it]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN:
For A ∈ A and B ∈ A, the conditional probability of event A given that event B occurred is

P (A|B) = P (A ∩ B)/P (B), for P (B) > 0

Claim: If P (B) > 0, the conditional probability P (·|B) forms a valid probability space on the original sample space: (S, A, P (·|B)).

Check the axioms:

1.
P (S|B) = P (S ∩ B)/P (B) = P (B)/P (B) = 1

2. Given A ∈ A,

P (A|B) = P (A ∩ B)/P (B),

and P (A ∩ B) ≥ 0, P (B) > 0
⇒ P (A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P (A ∪ C|B) = P [(A ∪ C) ∩ B]/P [B]
            = P [(A ∩ B) ∪ (C ∩ B)]/P [B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P (A ∪ C|B) = P [A ∩ B]/P [B] + P [C ∩ B]/P [B]
            = P (A|B) + P (C|B)

EX: Check the previous example: Ω = {AD, AN, BD₁, BD₂, BN}

P (E_D|E_A) = P (E_D ∩ E_A)/P (E_A) = (1/5)/(2/5) = 1/2

P (E_D|E_B) = P (E_D ∩ E_B)/P (E_B) = (2/5)/(3/5) = 2/3

P (E_A|E_D) = P (E_A ∩ E_D)/P (E_D) = (1/5)/(3/5) = 1/3

P (E_B|E_D) = P (E_B ∩ E_D)/P (E_D) = (2/5)/(3/5) = 2/3

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX:
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wᵢ = the event that a white ball is chosen on draw i.

What is P (W₂|W₁)? (After a white ball is removed, 4 balls remain, 2 of them white, so P (W₂|W₁) = 2/4 = 1/2.)

Answers: (a) 1/3, (b) 1/5, (c) 1.

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P (B) > 0 and A and B are s.i., then

P (A|B) = P (A ∩ B)/P (B) = P (A)P (B)/P (B) = P (A)

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P (A|B) ≥ P (A)

(b) P (A|B) ≤ P (A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P (A|B) = P (A ∩ B)/P (B)

⇒ P (A ∩ B) = P (A|B)P (B)    (7)

and P (B|A) = P (A ∩ B)/P (A)

⇒ P (A ∩ B) = P (B|A)P (A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P (A ∩ B ∩ C) = [P (A ∩ B ∩ C)/P (B ∩ C)] · [P (B ∩ C)/P (C)] · P (C)
              = P (A|B ∩ C) P (B|C) P (C)

Partitions and Total Probability

DEFN:
A collection of sets {Aᵢ} partitions the sample space Ω if and only if

∪ᵢ Aᵢ = Ω and Aᵢ ∩ A_j = ∅ for all i ≠ j.

{Aᵢ} is also said to be a partition of Ω.

1. Venn Diagram:

[Venn diagram: Ω partitioned into A₁, A₂, . . . , with an arbitrary set B overlapping the partition]

2. Expressing an arbitrary set B using a partition of Ω:

B = ∪ᵢ (B ∩ Aᵢ)

Law of Total Probability

If {Aᵢ} partitions Ω, then

P (B) = Σᵢ P (B|Aᵢ)P (Aᵢ)

Example:

EX: Binary Communication System

A transmitter (Tx) sends A₀ or A₁:

• A₀ is sent with prob. P (A₀) = 2/5

• A₁ is sent with prob. P (A₁) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B₀ or B₁.

• The channel is a noisy channel that determines the probabilities of transitions between {A₀, A₁} and {B₀, B₁}:

– The transitions are conditional probabilities P (B_j|Aᵢ) that can be specified on a channel transition diagram:

[Channel transition diagram: P (B₀|A₀) = 7/8, P (B₁|A₀) = 1/8, P (B₁|A₁) = 5/6, P (B₀|A₁) = 1/6]

• The receiver must use a decision rule to determine from the output (B₀ or B₁) whether the symbol that was sent was most likely A₀ or A₁.

• Let's start by considering 2 possible decision rules:

1. Always decide A₁.

2. When we receive B₀, decide A₀; when we receive B₁, decide A₁.

Problem: Determine the probability of error for these two decision rules.

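For checking your work, a numerical sketch (my arithmetic, using the transition probabilities above): rule 1 errs whenever A₀ is sent; rule 2 errs on the two cross transitions.

PA0 = 2/5; PA1 = 3/5;
p01 = 1/8;   % P(B1 | A0)
p10 = 1/6;   % P(B0 | A1)
Pe_rule1 = PA0                   % always decide A1: error iff A0 sent -> 0.40
Pe_rule2 = p01*PA0 + p10*PA1     % decide A0 on B0, A1 on B1 -> 0.15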

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_ij be the hypothesis (i.e., we decide) that Aᵢ was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A₀ when A₁ is transmitted, plus the probability that we decide A₁ when A₀ is transmitted:

P (E) = P (H₁₀ ∩ A₀) + P (H₁₁ ∩ A₀) + P (H₀₁ ∩ A₁) + P (H₀₀ ∩ A₁)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: "When B₀ is received, decide A₀ with probability η and decide A₁ with probability 1 − η."

• Since we only apply hypothesis H_ij when B_j is received, we can write

P (E) = P (H₁₀ ∩ A₀ ∩ B₀) + P (H₁₁ ∩ A₀ ∩ B₁) + P (H₀₁ ∩ A₁ ∩ B₁) + P (H₀₀ ∩ A₁ ∩ B₀)

• We apply the chain rule to write this as

P (E) = P (H₁₀|A₀ ∩ B₀)P (A₀ ∩ B₀) + P (H₁₁|A₀ ∩ B₁)P (A₀ ∩ B₁)
      + P (H₀₁|A₁ ∩ B₁)P (A₁ ∩ B₁) + P (H₀₀|A₁ ∩ B₀)P (A₁ ∩ B₀)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P (H_ij|A_k ∩ B_j) = P (H_ij|B_j) for all i, j, k.

• Furthermore, let P (H₀₀|B₀) = q₀ and P (H₁₁|B₁) = q₁.

• Then

P (E) = (1 − q₀)P (A₀ ∩ B₀) + q₁P (A₀ ∩ B₁) + (1 − q₁)P (A₁ ∩ B₁) + q₀P (A₁ ∩ B₀)

• The choices of q₀ and q₁ can be made independently, so P (E) is minimized by finding the q₀ and q₁ that minimize

(1 − q₀)P (A₀ ∩ B₀) + q₀P (A₁ ∩ B₀)   and   q₁P (A₀ ∩ B₁) + (1 − q₁)P (A₁ ∩ B₁)

• We can further expand these using the chain rule as

(1 − q₀)P (A₀|B₀)P (B₀) + q₀P (A₁|B₀)P (B₀)   and   q₁P (A₀|B₁)P (B₁) + (1 − q₁)P (A₁|B₁)P (B₁)

• P (B₀) and P (B₁) are constants that do not depend on our decision rule, so finally we wish to find q₀ and q₁ to minimize

(1 − q₀)P (A₀|B₀) + q₀P (A₁|B₀)    (9)
q₁P (A₀|B₁) + (1 − q₁)P (A₁|B₁)    (10)

• Both of these are linear functions of q₀ and q₁, so the minimizing values must be at the endpoints. I.e., either qᵢ = 0 or qᵢ = 1.

⇒ Our decision rule is not randomized.

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

• General technique:

If {Aᵢ} is a partition of Ω, and we know P (B_j|Aᵢ) and P (Aᵢ), then

P (Aᵢ|B_j) = P (Aᵢ ∩ B_j)/P (B_j) = P (B_j|Aᵢ)P (Aᵢ) / Σ_k P (B_j|A_k)P (A_k),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P (A₀) = 2/5, p₀₁ = 1/8, and p₁₀ = 1/6.
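A numerical sketch of the calculation in MATLAB (variable names are mine; p01 = P(B₁|A₀) and p10 = P(B₀|A₁)):

PA0 = 2/5; PA1 = 3/5;
p01 = 1/8; p10 = 1/6;              % cross-transition probabilities
PB0 = (1-p01)*PA0 + p10*PA1;       % P(B0) = 0.45, by total probability
PB1 = p01*PA0 + (1-p10)*PA1;       % P(B1) = 0.55
PA0gB0 = (1-p01)*PA0/PB0           % 7/9: so decide A0 when B0 is received
PA1gB0 = p10*PA1/PB0               % 2/9
PA0gB1 = p01*PA0/PB1               % 1/11
PA1gB1 = (1-p10)*PA1/PB1           % 10/11: so decide A1 when B1 is received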

• Often we may not know the a priori probabilities P (A₀) and P (A₁).

• In this case, we typically treat the input symbols as equally likely: P (A₀) = P (A₁) = 0.5.

• Under this assumption, P (A₀|B₀) = cP (B₀|A₀) and P (A₁|B₀) = cP (B₀|A₁), so the decision rule becomes:

Given that B_j is received, decide Aᵢ was transmitted if Aᵢ maximizes P (B_j|Aᵢ).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, . . . , x_k) that are possible, if xᵢ is an element of a set with nᵢ distinct members, is

n₁n₂ · · · n_k

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

Result is a k-tuple (x₁, x₂, . . . , x_k), where xᵢ ∈ A ∀i = 1, 2, . . . , k.

Thus n₁ = n₂ = · · · = n_k = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x₁, x₂, . . . , x_k), where xᵢ ∈ A ∀i = 1, 2, . . . , k.

Then n₁ = n, n₂ = n − 1, . . . , n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, . . . , n_(n−1) = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n,

n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sampling without replacement & without ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 · 24!] ≈ 2.1 × 10^61 (evaluated in MATLAB below)
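Numbers like 72! overflow nothing in the log domain, so in MATLAB it is safest to evaluate such counts with gammaln (a sketch of mine, not part of the original notes):

logOrdered   = gammaln(73) - 24*gammaln(4);   % log of 72!/(3!)^24
logUnordered = logOrdered - gammaln(25);      % also divide by 24!
exp(logOrdered)      % about 1.3e85, as above
exp(logUnordered)    % about 2.1e61, as above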

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is C(k + n − 1, k).

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX By Request: The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

⁴ 0.0794

L5-2

2. Always switch

3. Flip a fair coin and switch if it comes up heads (a simulation sketch follows below)
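Before working the answers out analytically, you can estimate them with a quick MATLAB simulation. The sketch below assumes the host always opens a goat door other than your pick; all names are arbitrary:

nTrials = 1e5; wins = zeros(1,3);
for t = 1:nTrials
    car = ceil(3*rand); pick = ceil(3*rand);
    wins(1) = wins(1) + (pick == car);   % never switch
    wins(2) = wins(2) + (pick ~= car);   % always switch: wins iff first pick was a goat
    if rand < 0.5                        % coin flip: switch on heads
        wins(3) = wins(3) + (pick ~= car);
    else
        wins(3) = wins(3) + (pick == car);
    end
end
wins / nTrials    % approximately [1/3 2/3 1/2]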

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments: Ω = Ω1 × Ω2 × · · · × ΩN.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B1, B2, ..., BN)

For s.i. subexperiments,

P(A) = P(B1)P(B2) · · · P(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n, k) = n!/[k!(n − k)!].

• Thus the probability of k successes on n trials is

pn(k) = C(n, k) p^k (1 − p)^(n−k)  (1.1)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
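A direct MATLAB evaluation of this packet-error probability (a sketch; compare with footnote 4 on page L5-1):

n = 100; p = 0.01; k = 0:2;
terms = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
Perr = 1 - sum(terms)    % P(3 or more bit errors), approximately 0.0794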

L5-4

• When n is large, the probability in (1.1) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)  (1.2)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]  (1.3)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt  (1.4)

L5-5

• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)  (1.5)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt  (1.6)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))  (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
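Using the geometric law just derived with per-transmission success probability p = 1 − 0.1 = 0.9, a two-line MATLAB check (a sketch) is:

p = 0.9;                        % prob a transmission succeeds
Pexactly2 = (1-p)^(2-1) * p     % exactly 2 transmissions needed: 0.09
Pmore2    = (1-p)^2             % more than 2 transmissions needed: 0.01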

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, ...
           0, otherwise

EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
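A sketch of the computation in MATLAB (here λ = 90 calls/hr = 1.5 calls/min, t = 10 min, so α = λt = 15):

alpha = 15; k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % P(N = 20), approximately 0.042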

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0, z ≤ 0
        z²/(2b²), 0 < z ≤ b
        [b² − (2b − z)²/2]/b², b < z < 2b
        1, z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous; a simulation sketch follows below)
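You can verify the answer for FZ(z) by simulation. The following MATLAB sketch (b = 1 is my arbitrary choice) overlays the empirical CDF of Z on the piecewise expression above:

b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);              % sum of the two dart coordinates
z = linspace(0, 2*b, 200);
Femp = arrayfun(@(zz) mean(Z <= zz), z);    % empirical CDF
Fan  = (z <= b).*z.^2/(2*b^2) + (z > b).*(1 - (2*b - z).^2/(2*b^2));
plot(z, Femp, z, Fan, '--')                 % the two curves should overlay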

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P(X ≤ x)

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB:

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
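For reference, here is the whole experiment as one script, with comments identifying what each transformation produces; the identifications are standard derived-distribution results (consistent with the Rayleigh discussion in Lecture 8), and the script itself is a sketch:

u = rand(1, 1000000);    % one million U(0,1) samples
hist(u, 100)             % flat histogram: uniform density
v = -log(u);             % v turns out to be exponential with lambda = 1
hist(v, 100)             % decaying-exponential histogram
w = sqrt(v);             % w turns out to be Rayleigh (see Lecture 8)
hist(w, 100)             % Rayleigh-shaped histogram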

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x):

fX(x) = dFX(x)/dx

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0, x < a
        1/(b − a), a ≤ x ≤ b
        0, x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

PX(x) = P(X = x)

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, ..., 6
           0, otherwise

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, ...
           0, otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1, s ∈ A
    0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p, x = 1
           1 − p, x = 0
           0, x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, ..., n
           0, otherwise

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf, P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF, FX(x) vs. x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2. (A MATLAB sketch for computing such a pmf follows.)

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]
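A sketch for computing and plotting a binomial pmf like the one in the figure; working with gammaln avoids overflow in the binomial coefficient for large n:

n = 100; p = 0.2; k = 0:n;
logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p);
stem(k, exp(logpmf))     % bell shape centered near np = 20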

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p, k = 1, 2, ...
           0, otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf, P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF, FX(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, ...
           0, otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf, P(X = x) vs. x]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF, FX(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf, P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = 0, x < 0
        λe^(−λx), x ≥ 0

• Distribution function:

FX(x) = 0, x < 0
        1 − e^(−λx), x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                    ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² > 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
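MATLAB has no built-in Q function, but it does provide erfc, and a one-line conversion gives Q (a sketch; note the argument scaling, which is exactly the mismatch between the book's erf and the standard one):

Q = @(x) 0.5 * erfc(x / sqrt(2));   % Q(x) in terms of the standard erfc
Q([1 2 3])                          % approximately 0.1587, 0.0228, 0.0013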

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5⁵

Examples with Gaussian Random Variables

EX The Motivation problem

⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
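The two-sided tail probabilities in part (b) reduce to P[|X − μ| > kσ] = 2Q(k), independent of μ and σ², which the following MATLAB sketch confirms:

Q = @(x) 0.5 * erfc(x / sqrt(2));
2 * Q(1:5)     % approximately 0.317, 4.55e-2, 2.70e-3, 6.3e-5, 5.7e-7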

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2, fX(x) vs. x]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²  (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
        0, y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
        0, y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found through repeated integration by parts; for an even number of degrees of freedom n = 2m, it is

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
        0, y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0
        0, r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1, fR(r) vs. r]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)), r ≥ 0
        0, r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
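The following MATLAB sketch illustrates this: the magnitude of a complex Gaussian sample has a Rayleigh-shaped histogram (σ² = 1 is my arbitrary choice):

N = 1e5; sigma = 1;
X = sigma*randn(1,N); Y = sigma*randn(1,N);   % i.i.d. real and imaginary parts
R = sqrt(X.^2 + Y.^2);                        % amplitude of X + jY
hist(R, 100)                                  % compare with fR(r) above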

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)}, ℓ ≥ 0
        0, ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board


L1-2

What is a random experiment?

Q: What do we mean by "random"?

Output is unpredictable in some sense

DEFN

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by:

1. an experimental procedure
2. a set of outcomes and measurement restrictions

DEFN

An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN

The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX

Tossing a coin:
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?

Ω = {H, T}

EX Rolling a 6-sided die:
A 6-sided die is rolled and the number on the top face is observed.

• Q: What are the outcomes?

Ω = {1, 2, 3, 4, 5, 6}

EX A 6-sided die is rolled and whether the top face is even or odd is observed.

• Q: What are the outcomes?

Ω = {even, odd}

L1-3

If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX Tossing a fair coin

Let pH = Prob{heads}, pT = Prob{tails}.

Then pH + pT = 1 (the probability that something occurs is 1).

Since pH = pT, we get pH = pT = 1/2.

EX Rolling a fair 6-sided die

L1-4

Let pi = Prob{top face is i}. Then

Σ_{i=1}^{6} pi = 1

⇒ 6p1 = 1

⇒ p1 = 1/6

So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX Rolling a fair 6-sided die

What is Prob{1 or 2}? P({1, 2}) = 1/6 + 1/6 = 1/3.

The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die?

L1-5

What are some problems with defining probability in this way?

1. It requires equally likely outcomes – thus it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let Nk(n) = the number of times the outcome is k in n trials.

• Then we can tabulate Nk(n) for various values of k and n.

• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"

DEFN

The relative frequency of outcome k of a random experiment is

fk(n) = Nk(n)/n

• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_{n→∞} fk(n) = pk (a constant) ∀k

DEFN

For experiments with statistical regularity as defined above, pk is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ Nk(n) ≤ n, ∀k

because Nk(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, ..., K  (1)

L1-6

• If 1, 2, ..., K are all of the possible outcomes, then

Σ_{k=1}^{K} Nk(n) = n

Again, dividing by n yields

Σ_{k=1}^{K} fk(n) = 1  (2)

Consider the event E that an even number occurs (on a die roll).

What can we say about the number of times E is observed in n trials?

NE(n) = N2(n) + N4(n) + N6(n)

What have we assumed in developing this equation?

Then, dividing by n,

fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

fC(n) = fA(n) + fB(n)  (3)

bull In some sense we can define the probability of an event as its long-term relative frequency

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems

2. agree with our interpretation of probability as relative frequency

3. agree with our intuition (where appropriate)

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.

• We want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ we need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN

The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

L2a-2

Note that A = B is included in A ⊂ B:

A = B if and only if A ⊂ B and B ⊂ A
(useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN

The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN

The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. A^c or Ā (read "A complement")

DEFN

The complement of a set A in a sample space Ω is defined by

A^c = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (A^c)^c = A

3. A ⊂ B ⇒ B^c ⊂ A^c

4. A ∪ A^c = Ω

5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)

6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)

L2a-3

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a set relationship.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX Roll a fair 6-sided die and note the number on the top face.

2. EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

L2a-4

4 EX Toss a coin 3 times and note the number of heads

5 EX Roll a fair die 3 times and note the number of even results

6 EX Roll a fair 6-sided die 3 times and note the number of sixes

7 EX Simultaneously toss a quarter and a dime and note the outcomes

8 EX Simultaneously toss two identical quarters and note the outcomes

L2a-5

9 EX Pick a number at random between zero and one inclusive

10 EX Measure the lifetime of a computer chip

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is the event class, denoted F or A.

EX Flipping a coin

The possible events are:

– Heads occurs: {H}

– Tails occurs: {T}

– Heads or Tails occurs: {H, T} = Ω

– Neither Heads nor Tails occurs: ∅

• Thus, the event class can be written as F = {∅, {H}, {T}, Ω}.

L2a-6

EX Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX Consider the following examples:

A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω3| = 1/8 for any outcome ai

Let A2 = {exactly 2 heads occurs in 3 tosses} = {HHT, HTH, THH}. Then

P(A2) = |A2|/|Ω3| = 3/8

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then

P(E) = K/|Ω|

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai

Then P(2 heads) = 1/4, which contradicts the value 3/8 found above, so the outcomes in this sample space are not actually equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A)

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )

bull The elements of a probability space for a random experiment are

1.

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples: the set of all integers; the set of nonnegative even integers.

DEFN

A continuous set is not countable.

DEFN

An interval I has the property that if a and b > a are in I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX

Examples of continuous sets: [0, 1]; [0, ∞); (−∞, ∞).

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN

Events are subsets of the sample space to which we assign probability.

Examples, based on previous examples of sample spaces:

(a) EX

Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events: ∅, {E}, {O}, {E, O} = Ω

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}

(d) EX

Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes: O = {1, 3}

2.

DEFN

F is the event class, which is a σ-algebra of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M

L2-5

DEFN

A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,

∪_{i=1}^{∞} Ei ∈ M  (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_{i=1}^{∞} Ei ∈ M  (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.

DEFN

The probability measure, denoted by P, is a function that maps all members of F onto the interval [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A and B are mutually exclusive (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B), and likewise for any countable collection of pairwise mutually exclusive events.

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

P(∪_{k=1}^{n} Ak) = Σ_{j} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, ... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P(A ∩ B) = P(A)P(B)

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probability relationship.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c), and (A ∩ B) ∩ (A ∩ B^c) = ∅.
Thus,

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)

Thus,

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)

• For N events to be s.i., we require

P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
...
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship.

– Statistical independence is a probability relationship.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following:

L3-6

Conditional Probability: Consider an example.

EX Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD1, BD2, BN}

Let

• EA be the event that the selected computer is from manufacturer A;

• EB be the event that the selected computer is from manufacturer B;

• ED be the event that the selected computer is defective.

Find:

P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD1, BD2, BN}

Find:

– P(ED|EB) = 2/3

L3-7

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Figure: Venn diagram of events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Figure: B as the new sample space, containing the region A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B)

and P(A ∩ B) ≥ 0, P(B) ≥ 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX Check the previous example. Ω = {AD, AN, BD1, BD2, BN}

P(ED|EA) = P(ED ∩ EA)/P(EA) = (1/5)/(2/5) = 1/2

P(ED|EB) = P(ED ∩ EB)/P(EB) = (2/5)/(3/5) = 2/3

P(EA|ED) = P(EA ∩ ED)/P(ED) = (1/5)/(3/5) = 1/3

P(EB|ED) = P(EB ∩ ED)/P(ED) = (2/5)/(3/5) = 2/3

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)

Read Section 17 in the textbook

Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin

Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)? After a white ball is removed, 2 of the 4 remaining balls are white, so P(W2|W1) = 2/4 = 1/2.

² (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring

bull Letrsquos evaluate that using conditional probability

bull If P (B) gt 0 and A and B are si then

(When A and B are si knowledge of B does not change the prob of A (and vice versa))

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C)P(B|C)P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if

{Ai} is also said to be a partition of Ω.

L4-4

1. Venn Diagram:

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability

If {Ai} partitions Ω, then

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0, A1:

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0, B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)  and    (9)
q1P(A0|B1) + (1 − q1)P(A1|B1).       (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0) and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1) and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is

EX:

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
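A MATLAB sketch of this computation (an added illustration, not part of the original notes; the variable names and matrix layout are my own). The complements P(B0|A0) = 7/8 and P(B1|A1) = 5/6 follow from the given p01 and p10:

PA   = [2/5 3/5];                    % a priori probabilities P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];           % row i: [P(B0|Ai) P(B1|Ai)]
PB   = PA * PBgA;                    % total probability: [P(B0) P(B1)]
PAB  = repmat(PA', 1, 2) .* PBgA;    % joint probabilities P(Ai and Bj)
PAgB = PAB ./ repmat(PB, 2, 1);      % a posteriori probabilities P(Ai|Bj)
[mx, dec] = max(PAgB)                % MAP: when Bj received, decide A(dec(j)-1)

Running this gives P(A0|B0) ≈ 0.78 and P(A1|B1) ≈ 0.91, so the MAP rule here decides A0 on B0 and A1 on B1.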

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.
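One way to sanity-check the counting answers in the footnote below is by simulation. Here is a short MATLAB sketch (my addition, not from the original notes) that estimates the seven probabilities by classifying the pattern of repeated faces in many simulated rolls:

N = 100000;                            % number of simulated poker-dice rolls
counts = zeros(1,7);                   % tallies for (a) through (g), in order
for t = 1:N
    m = sort(histc(ceil(6*rand(1,5)), 1:6), 'descend');  % face multiplicities
    if m(1) == 1,              counts(1) = counts(1) + 1;  % no two alike
    elseif m(1)==2 && m(2)==1, counts(2) = counts(2) + 1;  % one pair
    elseif m(1)==2 && m(2)==2, counts(3) = counts(3) + 1;  % two pair
    elseif m(1)==3 && m(2)==1, counts(4) = counts(4) + 1;  % three alike
    elseif m(1)==3 && m(2)==2, counts(5) = counts(5) + 1;  % full house
    elseif m(1) == 4,          counts(6) = counts(6) + 1;  % four alike
    else                       counts(7) = counts(7) + 1;  % five alike
    end
end
counts/N                               % compare with the footnote answers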

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is the k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus, n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is the k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., nn−1 = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples

=

=

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
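For example (an added sketch; the value n = 50 is just for illustration), Stirling's formula is already accurate to a fraction of a percent for moderate n:

n = 50;
exact    = gamma(n+1);                        % n! = 50! ~ 3.04e64
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);  % the approximation above
relerr   = (exact - stirling)/exact           % roughly 1/(12n), about 0.17% here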

L5a-3

4. Sampling w/out Replacement & w/out Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets:

Now, how many different orders are there for any set of k objects?

Let Cnk = the number of combinations of size k from a set of n objects.

Cnk = (# ordered subsets)/(# orderings) =

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient:

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61.

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sampling with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is C(k + n − 1, k).

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX: By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

4. 0.0794

L5-2

2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
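A simulation sketch for strategy 2 (my addition, not from the notes). The key observation encoded below is that switching wins exactly when the initial pick was a goat:

N = 100000; wins = 0;
for t = 1:N
    car  = ceil(3*rand);         % door hiding the car
    pick = ceil(3*rand);         % contestant's initial (random) choice
    % The host opens a goat door that is neither the pick nor the car, so
    % after switching, the contestant wins if and only if pick ~= car.
    wins = wins + (pick ~= car);
end
wins/N                           % approaches 2/3; never switching gives 1/3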

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the ________ of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) =

For s.i. subexperiments,

P(A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is

• The number of ways that k successes can occur in n places is

• Thus, the probability of k successes on n trials is

pn(k) =     (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
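A numerical check of this problem in MATLAB (added here; not part of the notes):

n = 100; p = 0.01; k = 0:2;
terms = zeros(size(k));
for j = k
    terms(j+1) = nchoosek(n,j) * p^j * (1-p)^(n-j);  % P(exactly j errors)
end
Perr = 1 - sum(terms)            % P(3 or more errors), about 0.0794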

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if trial i is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
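As an added illustration (not from the notes), the following MATLAB sketch compares the exact binomial probability with this Q-function approximation; Q(x) is expressed through MATLAB's erfc, and the values chosen for n, p, α, and β are arbitrary:

n = 100; p = 0.2; q = 1-p; alpha = 15; beta = 25;
exact = 0;
for k = alpha:beta
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);  % sum of binomial pmf terms
end
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) in terms of the built-in erfc
approx = Qf((alpha - n*p - 0.5)/sqrt(n*p*q)) - Qf((beta - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                   % the two values should agree closely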

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is a success)

L5-6

Then

p(m) =

Also, P(# trials > k) =

EX:

A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
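A quick numerical sketch (my addition): with packet error probability 0.1, a "success" in the geometric law has probability p = 0.9, so

p = 0.9;                         % probability a given transmission succeeds
Pexactly2  = (1-p)^(2-1) * p     % p(m) with m = 2: 0.1*0.9 = 0.09
Pmorethan2 = (1-p)^2             % first two transmissions both fail: 0.01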

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, ...
           { 0,                otherwise

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
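A numerical sketch of this example (added; not from the notes): λ = 90 calls/hr and t = 10 min = 1/6 hr give α = λt = 15, so

alpha = 90*(10/60);                          % alpha = lambda*t = 15
P20 = alpha^20/factorial(20) * exp(-alpha)   % P(N = 20), about 0.042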

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = { 0,                       z ≤ 0
        { z²/(2b²),                0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,   b < z < 2b
        { 1,                       z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a ________ from ________ to ________.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the ________ (________), denoted ________, is

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).

(The value at a jump discontinuity is the value after the jump.)

Pf: omitted

• If FX(x) is a continuous function of x, then F(x) = F(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
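One more step you can try (an added suggestion, not part of the original exercise): scale the histogram so that it has unit area, which lets you lay a candidate density directly on top of it. For the uniform case:

[counts, centers] = hist(u, 100);
binw = centers(2) - centers(1);              % histogram bin width
bar(centers, counts/(sum(counts)*binw));     % area-normalized histogram
hold on; plot([0 1], [1 1], 'r'); hold off   % uniform(0,1) density f(x) = 1

The same normalization works for v and w once you have identified their densities.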

PROBABILITY DENSITY FUNCTION

DEFN

The ________ (________) fX(x) of a random variable X is the derivative (which may not exist at some places) of ________.

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf: omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = F(x) − F(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ________.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = { 0,           x < a
        { 1/(b − a),   a ≤ x ≤ b
        { 0,           x > b

L7-4

DEFN

A ________ has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, ..., 6
           { 0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, ...
           { 0,        otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
           { 0,                          otherwise

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: pmfs P(X = x) vs. x for Binomial(10, 0.2) and Binomial(10, 0.5)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: CDFs FX(x) vs. x for Binomial(10, 0.2) and Binomial(10, 0.5)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: pmf P(X = x) vs. x for Binomial(100, 0.2)]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, ...
           { 0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: pmfs P(X = x) vs. x for Geometric(0.1) and Geometric(0.7)]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: CDFs FX(x) vs. x for Geometric(0.1) and Geometric(0.7)]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, ...
           { 0,                otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: pmfs for Poisson(0.75) and Poisson(3)]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: CDFs for Poisson(0.75) and Poisson(3)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: pmf for Poisson(20)]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = { 0,           x < 0
        { λ e^(−λx),   x ≥ 0

• Distribution function:

FX(x) = { 0,             x < 0
        { 1 − e^(−λx),   x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
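MATLAB has no built-in Q-function, but it is easily built from erfc. A small sketch (my addition; the values of μ, σ, a, and b below are arbitrary examples):

Qf = @(x) 0.5*erfc(x/sqrt(2));              % Q(x) = (1/2) erfc(x/sqrt(2))
mu = 10; sigma = 5; a = 12; b = 20;
Pab = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)   % P(a < X <= b)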

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55 × 10^(−2), 2.70 × 10^(−3), 6.34 × 10^(−5), and 5.75 × 10^(−7), respectively
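Part (b) above can be checked in one line, since by symmetry P[|X − μ| > kσ] = 2Q(k), independent of μ and σ (an added check, not from the notes):

Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) via the built-in erfc
2*Qf(1:5)                        % 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7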

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = { [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
        { 0,                                                   y < 0

where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = { [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
        { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
        { 0,                                                   y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two s.i. zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
        { 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
        { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
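This is easy to see by simulation. The sketch below (my addition) builds the magnitude of complex Gaussian samples and histograms it; the shape should match the Rayleigh pdf above:

sigma = 1; N = 100000;
x = sigma*randn(1,N); y = sigma*randn(1,N);  % s.i. real and imaginary parts
r = sqrt(x.^2 + y.^2);                       % magnitude |x + jy|
hist(r, 100)                                 % compare with the Rayleigh pdf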

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
        { 0,                                       ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) =

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) =

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX:

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A)

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board


L1-3

If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory

Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair":

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let pH = Prob{heads}, pT = Prob{tails}.

Then pH + pT = 1 (the probability that something occurs is 1).

Since pH = pT, pH = pT = 1/2.

EX: Rolling a fair 6-sided die

L1-4

Let pi = Prob{top face is i}. Then

Σ_{i=1}^{6} pi = 1

⇒ 6p1 = 1

⇒ p1 = 1/6

So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob. that we observe a 1 or 2 on either roll of the die?

What are some problems with defining probability in this way?

1. It requires equally likely outcomes – thus, it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes – this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let Nk(n) = the number of times the outcome is k in n trials.

• Then we can tabulate Nk(n) for various values of k and n.

• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"

DEFN

The relative frequency of outcome k of a random experiment is

fk(n) = Nk(n)/n

• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.

DEFN

An experiment possesses statistical regularity if

lim_{n→∞} fk(n) = pk (a constant) ∀k

DEFN

For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
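This convergence is easy to observe in MATLAB (an added sketch, not part of the original notes): the relative frequency of rolling a 3 with a fair die settles toward 1/6 as n grows.

nvals = 10.^(1:6); f3 = zeros(size(nvals));
for i = 1:length(nvals)
    rolls = ceil(6*rand(1, nvals(i)));   % nvals(i) fair-die rolls
    f3(i) = sum(rolls == 3)/nvals(i);    % f_k(n) = N_k(n)/n for k = 3
end
[nvals; f3]                              % f3 approaches 1/6 = 0.1667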

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ Nk(n) ≤ n ∀k,

because Nk(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, ..., K    (1)

L1-6

• If 1, 2, ..., K are all of the possible outcomes, then

Σ_{k=1}^{K} Nk(n) = n

Again, dividing by n yields

Σ_{k=1}^{K} fk(n) = 1    (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)

What have we assumed in developing this equation?

Then, dividing by n,

fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

fC(n) = fA(n) + fB(n)    (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency,

3. agree with our intuition (where appropriate).

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX:

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
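Once the events are expressed with unions, intersections, and complements, the counting is simple arithmetic. A MATLAB sketch of that arithmetic, using inclusion-exclusion, is below (an added illustration you can use to check your answer; it computes the final numbers):

nS = 28; nF = 26; nG = 16;               % single-class enrollments
nSF = 12; nSG = 4; nFG = 6; nSFG = 2;    % overlaps
nUnion = nS+nF+nG - nSF-nSG-nFG + nSFG;  % inclusion-exclusion: |S u F u G|
Pnone  = 1 - nUnion/100                  % part 1
nOne   = nS+nF+nG - 2*(nSF+nSG+nFG) + 3*nSFG;  % students in exactly one class
Pone   = nOne/100                        % part 2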

• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN

The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

L2a-2

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN

The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN

The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. Ac or Ā (read "A complement")

DEFN

The complement of a set A in a sample space Ω is defined by

Ac = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Ac)c = A

3. A ⊂ B ⇒ Bc ⊂ Ac

4. A ∪ Ac = Ω

5. (A ∩ B)c = Ac ∪ Bc (DeMorgan's Law 1)

6. (A ∪ B)c = Ac ∩ Bc (DeMorgan's Law 2)

L2a-3

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if

This is very important, so I will emphasize it again: mutually exclusive is a ________.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

L2a-4

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

L2a-5

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is the event class, denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus, the event class can be written as

L2a-6

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs.

2. An even number occurs.

3. No number occurs.

4. Any number occurs.

• EX: Express the events listed in the example above in set notation:

1.

2.

3.

4.

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω3| = 1/8 for any outcome ai.

Let A2 = {exactly 2 heads occurs in 3 tosses}.
Then

Note that, for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai.

Then

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets on which cardinality is simplest to measure: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) =

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.

DEFN

The sample space, denoted by Ω, is the ________ of ________.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either ________ or ________.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX:

Examples:

DEFN

A continuous set is not countable.

DEFN

If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX:

Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN

Events are ________ to which we assign ________.

Examples, based on previous examples of sample spaces:

(a) EX:

Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX:

Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX:

Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on the first toss.
Express A1 as a set of outcomes.

(d) EX:

Toss a coin 3 times and note the number of heads.
Let O = {odd number of heads occurs}.
Express O as a set of outcomes.

2.

DEFN

F is the event class, which is a ________ to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore, these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Ec ∈ M

L2-5

DEFN

A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,

∪_{i=1}^{∞} Ei ∈ M    (5)

• Note that, by property (c) of fields, the union in (5) can be interchanged with

∩_{i=1}^{∞} Ei ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.

DEFN

The probability measure, denoted by P, is a ________ that maps all members of ________ onto ________.

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I.

II.

III.

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Ac) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

P(∪_{k=1}^{n} Ak) = Σ_{j=1}^{n} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + ··· + (−1)^(n+1) P(A1 ∩ A2 ∩ ··· ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX:

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relation: it depends on the probability measure P, not just on the sets themselves.

(What type of relation is mutual exclusiveness? A set relation.)

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32. 2. (a) 2/3 (b) 1/3 (c) 3/4.

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ), and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ).

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ).

• For N events to be s.i., we require

P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k,
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am), ...,
P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2)···P(AN).

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN
N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

Examples of Applying Statistical Independence

EX
For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
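Since the 32 birth sequences are equally likely, these answers can be checked by brute-force enumeration. A minimal MATLAB sketch (variable names are illustrative):

seqs = dec2bin(0:31) - '0';                                     % all 2^5 sequences; 1 = boy, 0 = girl
pa = mean(all(seqs == 1, 2) | all(seqs == 0, 2))                % all same sex: 2/32 = 1/16
pb = mean(all(seqs(:,1:3) == 1, 2) & all(seqs(:,4:5) == 0, 2))  % 3 eldest boys, rest girls: 1/32
pc = mean(all(seqs(:,1:2) == 0, 2))                             % 2 oldest girls: 1/4
pd = mean(any(seqs == 0, 2))                                    % at least one girl: 31/32

The results agree with the answers given in the footnote earlier.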

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation (A ∩ B = ∅, regardless of the probability measure).

– Statistical independence is a probabilistic relation (it depends on the probability measure P).

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. Two events cannot be both s.i. and m.e., except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, (3) is true: if A and B are mutually exclusive and both have positive probability, then P(A ∩ B) = 0 ≠ P(A)P(B) > 0, so they cannot be independent. (The degenerate cases are those in which P(A) = 0 or P(B) = 0.)

Consider the following:

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A,

• EB be the event that the selected computer is from manufacturer B,

• ED be the event that the selected computer is defective.

Find:

P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5.

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred). Here P(ED|EA) = 1/2.

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) = 2/3

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: overlapping events A and B inside the sample space S.]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B becomes the new sample space, with A ∩ B the only part of A remaining.]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B), for P(B) > 0.

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1.

2. Given A ∈ A:

P(A|B) = P(A ∩ B) / P(B),

and P(A ∩ B) ≥ 0, P(B) > 0, so P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅:

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B].

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).

EX: Check the previous example. Ω = {AD, AN, BD, BD, BN}

P(ED|EA) = (1/5)/(2/5) = 1/2

P(ED|EB) = (2/5)/(3/5) = 2/3

P(EA|ED) = (1/5)/(3/5) = 1/3

P(EB|ED) = (2/5)/(3/5) = 2/3

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
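A worked sketch, consistent with the answers given earlier (B = brown-eyed gene, b = blue-eyed gene):

Since Smith's sister has blue eyes (bb), both of Smith's parents must be Bb. Given that Smith has brown eyes, his genotype is equally likely to be BB, Bb, or bB, so

(a) P(Smith carries b) = 2/3.

The first child is blue-eyed only if Smith is Bb and passes b (prob. 1/2), since the blue-eyed wife always passes b:

(b) P(first child blue) = (2/3)(1/2) = 1/3.

(c) P(2nd brown | 1st brown) = P(both brown)/P(1st brown) = [(1/3)(1) + (2/3)(1/4)] / [(1/3)(1) + (2/3)(1/2)] = (1/2)/(2/3) = 3/4.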

EEL 5544 Noise in Linear Systems: Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)? (After a white ball is removed, 2 of the remaining 4 balls are white, so P(W2|W1) = 2/4 = 1/2.)

Answers to the gambler problem: (a) 1/3 (b) 1/5 (c) 1.

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

(The answer is (c): conditioning on B can either increase or decrease the probability of A, as the defective-computer example shows.)

Chain rule for expanding intersections

Note that

P(A|B) = P(A ∩ B)/P(B) ⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and

P(B|A) = P(A ∩ B)/P(A) ⇒ P(A ∩ B) = P(B|A)P(A).   (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and their union is Ω (∪_i Ai = Ω).

{Ai} is also said to be a partition of Ω.

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = Σ_i P(B|Ai)P(Ai).

Example:

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with prob. P(A0) = 2/5.

• A1 is sent with prob. P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6.]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
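As a check, consistent with the transition probabilities above, the two error probabilities follow from the law of total probability:

Rule 1 (always decide A1): an error occurs exactly when A0 is sent, so
P(E) = P(A0) = 2/5 = 0.4.

Rule 2 (decide A0 on B0, A1 on B1): an error occurs when the channel crosses over, so
P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/8)(2/5) + (1/6)(3/5) = 1/20 + 1/10 = 3/20 = 0.15.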

EX: Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0).

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0).

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0).

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)

and

q1P(A0|B1) + (1 − q1)P(A1|B1).   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' rule (Bayes' theorem).

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
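A worked sketch (here p01 = P(B1|A0) and p10 = P(B0|A1), matching the transition diagram above):

P(B0) = P(B0|A0)P(A0) + P(B0|A1)P(A1) = (7/8)(2/5) + (1/6)(3/5) = 9/20,  P(B1) = 11/20.

P(A0|B0) = (7/8)(2/5)/(9/20) = 7/9,  P(A1|B0) = 2/9  ⇒ decide A0 when B0 is received.
P(A0|B1) = (1/8)(2/5)/(11/20) = 1/11,  P(A1|B1) = 10/11  ⇒ decide A1 when B1 is received.

The resulting MAP error probability is P(E) = P(A1|B0)P(B0) + P(A0|B1)P(B1) = (2/9)(9/20) + (1/11)(11/20) = 3/20, the same as decision rule 2 considered earlier.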

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a constant c, so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
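A worked sketch using Bayes' rule, with F = {fair coin selected}:

(a) P(F|H) = P(H|F)P(F) / [P(H|F)P(F) + P(H|Fᶜ)P(Fᶜ)] = (1/2)(1/2) / [(1/2)(1/2) + (1)(1/2)] = 1/3.

(b) P(F|HH) = (1/4)(1/2) / [(1/4)(1/2) + (1)(1/2)] = 1/5.

(c) The two-headed coin cannot show tails, so P(F|HHT) = 1.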

EEL 5544 Noise in Linear Systems: Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 · n2 ··· nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008.

The result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula: For large n,

n! ≈ √(2π) n^{n + 1/2} e^{−n}.

MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
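For example, a quick comparison of the exact factorial with Stirling's approximation (illustrative values):

n = 10;
exact = gamma(n+1)                           % 10! = 3628800
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n)  % approx. 3.5987e+06
stirling/exact                               % about 0.992; the ratio approaches 1 as n grows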

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets)/(# orderings) = n! / [(n − k)! k!].

DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

n! / (k1! k2! ··· kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61.

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? By the multinomial coefficient, 12!/(2!)^6 = 7,484,400.

– What is the prob. of this occurring? 12!/[(2!)^6 6^12] ≈ 0.0034.

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k) = (k + n − 1)! / [k! (n − 1)!].
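Two of the poker-dice answers can be checked with these counting rules. A minimal MATLAB sketch (variable names are illustrative):

total = 6^5;                                  % equally likely ordered outcomes
pNoTwoAlike = 6*5*4*3*2 / total               % sampling w/out replacement: 0.0926
pFullHouse = 6 * 5 * nchoosek(5,3) / total    % triple value, pair value, positions: 0.0386
% nchoosek(5,3) counts which 3 of the 5 dice show the triple.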

EEL 5544 Noise in Linear Systems: Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

Answer to the packet question above: ≈ 0.0794.

2. Always switch

3. Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any outcome in Ω is of the form (b1, b2, ..., bN), where bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that, for A = B1 × B2 × ··· × BN,

P(A) = the joint probability of the subexperiment events B1, ..., BN (which in general must be specified for the combined experiment).

For s.i. subexperiments,

P(A) = P1(B1) P2(B2) ··· PN(BN).

Bernoulli Trials

DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n,k).

• Thus, the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^{n−k}.   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? (1/2)³ = 1/8.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
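Using (11), the packet fails to decode with probability 1 minus the probability of 0, 1, or 2 bit errors. A minimal MATLAB check:

n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);   % P(k bit errors)
end
Pfail = 1 - Pok    % approximately 0.0794, matching the answer given earlier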

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}.   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.   (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.   (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• It can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].

• Note that the book defines an error function, erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
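In MATLAB, Q(x) can be computed as 0.5*erfc(x/sqrt(2)), so this approximation is easy to compare with the exact binomial sum. A sketch with illustrative values n = 100, p = 0.2:

n = 100; p = 0.2; q = 1-p;
a = 15; b = 25;                       % evaluate P(a <= Sn <= b)
Qf = @(x) 0.5*erfc(x/sqrt(2));        % the Q-function via MATLAB's erfc
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);
end
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                        % both approximately 0.83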

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, ....

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
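A worked check: here a "success" is a correct packet reception, so p = 1 − 0.1 = 0.9. Then

P(exactly 2 transmissions) = (0.1)(0.9) = 0.09,
P(more than 2 transmissions) = (0.1)² = 0.01.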

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 per minute is a constant. The maximum possible number of calls/hour is then 60n, which grows without bound.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n,k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^{n−k} ≈ (α^k / k!) (1 − α/n)^{n−k}.

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α}, k = 0, 1, ...; 0, o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
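Here α = λt = (90 calls/hr)(1/6 hr) = 15, so P(N = 20) = 15²⁰ e⁻¹⁵ / 20!. A one-line MATLAB check:

P20 = exp(-15) * 15^20 / factorial(20)   % approximately 0.042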

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0, z ≤ 0;
FZ(z) = z²/(2b²), 0 < z ≤ b;
FZ(z) = 1 − (2b − z)²/(2b²), b < z < 2b;
FZ(z) = 1, z ≥ 2b.

4. Specify the type (discrete, continuous, or mixed) of Z. (continuous)
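The answer for FZ(z) can be checked by simulation. A minimal MATLAB sketch with b = 1 (illustrative):

b = 1; N = 1e5;
z = sort(b*rand(1,N) + b*rand(1,N));     % samples of Z = sum of the two coordinates
Femp = (1:N)/N;                          % empirical CDF
Fth = zeros(1,N);
i = z <= b;  Fth(i) = z(i).^2/(2*b^2);
i = z > b;   Fth(i) = 1 - (2*b - z(i)).^2/(2*b^2);
plot(z, Femp, z, Fth, '--')              % the two curves should coincide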

RANDOM VARIABLES (RVs)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.

EX

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P({ω : X(ω) ≤ x}) = P(X ≤ x).

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted.

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].

EX: Examples

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, or mixed) of Z. (continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf. omitted.

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 1/(b − a), a ≤ x ≤ b; 0, otherwise.

DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, ..., 6; 0, o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, ...; 0, o.w.

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1 if s ∈ A; X = 0 if s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p, x = 1; 1 − p, x = 0; 0, x ≠ 0, 1.

2. Binomial RV

• A binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n,k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n; 0, o.w.

• The pmfs for two binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x.]

• The CDFs for two binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x.]

• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf.]

3. Geometric RV

• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p, k = 1, 2, ...; 0, o.w.

• The pmfs for two geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs.]

• The CDFs for two geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs.]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α}, k = 0, 1, ...; 0, o.w.

• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs.]

• The CDFs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs.]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf.]

IMPORTANT CONTINUOUS RVs

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter, λ.

• Density function:

fX(x) = λe^{−λx}, x ≥ 0; 0, x < 0.

• Distribution function:

FX(x) = 1 − e^{−λx}, x ≥ 0; 0, x < 0.

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4.]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}.

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}.

L7-11

DEFN
A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt.

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt.

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ).

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX
Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ], for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
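By symmetry, P[X < μ] = 1/2, and P[|X − μ| > kσ] = 2Q(k), regardless of μ and σ². A one-line MATLAB check of the answers (agreeing with the values above to rounding):

Qf = @(x) 0.5*erfc(x/sqrt(2));
twoQ = 2*Qf(1:5)    % 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7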


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2.]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0; 0, y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;  Γ(3/2) = √π/2.

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0; 0, y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: central chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1.]

• The CDF for a central chi-square RV can be found through repeated integration by parts (for n = 2m even) to be

FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0; 0, y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0; 0, r < 0.

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1.]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^{−r²/(2σ²)}, r ≥ 0; 0, r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the magnitude of the waveform is a Rayleigh random variable.
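This is easy to see numerically. A minimal MATLAB sketch generating Rayleigh samples from two independent Gaussian components (σ = 1):

N = 1e5;
r = sqrt(randn(1,N).^2 + randn(1,N).^2);   % magnitude of a complex Gaussian sample
hist(r, 100)    % compare the shape with f_R(r) = r exp(-r^2/2), r >= 0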

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)}, ℓ ≥ 0; 0, ℓ < 0.

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A) / P(A).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A).

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
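A worked sketch with A = {X > 2}, where X is the total wait time (exponential, λ = 1):

FX(x | X > 2) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = (e⁻² − e⁻ˣ)/e⁻² = 1 − e^{−(x−2)}, x > 2 (and 0 for x ≤ 2).

So the remaining wait time R = X − 2 has FR(r | X > 2) = 1 − e^{−r}, r ≥ 0: it is again exponential with λ = 1. This is the memoryless property of the exponential RV.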

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai).

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly, P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A) / fX(x)] P(A),

if fX(x|A) and fX(x) exist.

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx.

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt.

Examples on board.


Let pi = Prob{top face is i}. Then

Σ_{i=1}^{6} pi = 1 ⇒ 6p1 = 1 ⇒ p1 = 1/6.

So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}? p1 + p2 = 1/3.

The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the prob. that we observe a 1 or 2 on either roll of the die?

What are some problems with defining probability in this way?

1. It requires equally likely outcomes; thus, it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes; this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let Nk(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate Nk(n) for various values of k and n.

• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"

DEFN
The relative frequency of outcome k of a random experiment is

fk(n) = Nk(n)/n.

• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.

DEFN
An experiment possesses statistical regularity if

lim_{n→∞} fk(n) = pk (a constant) ∀k.

DEFN
For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
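Statistical regularity is easy to visualize in MATLAB. A minimal sketch for a fair die (variable names are illustrative):

n = 10000;
rolls = ceil(6*rand(1,n));              % n fair die rolls
f1 = cumsum(rolls == 1) ./ (1:n);       % relative frequency of outcome 1 after each trial
plot(1:n, f1)                           % f_1(n) settles near p_1 = 1/6 as n grows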

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ Nk(n) ≤ n ∀k,

because Nk(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, ..., K.   (1)

• If 1, 2, ..., K are all of the possible outcomes, then

Σ_{k=1}^{K} Nk(n) = n.

Again, dividing by n yields

Σ_{k=1}^{K} fk(n) = 1.   (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

NE(n) = N2(n) + N4(n) + N6(n)

What have we assumed in developing this equation?

Then, dividing by n,

fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n).

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

fC(n) = fA(n) + fB(n).   (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency,

3. agree with our intuition (where appropriate).

EEL 5544 Noise in Linear Systems: Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX

An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.

• We want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ we need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN
The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B.

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (This is useful for proofs.)

2. A ∪ B (read "A union B" or "A or B")

DEFN
The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}.

3. A ∩ B (read "A intersect B" or "A and B")

DEFN
The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}.

4. Aᶜ (read "A complement")

DEFN
The complement of a set A in a sample space Ω is defined by

Aᶜ = {x | x ∈ Ω and x ∉ A}.

• Use Venn diagrams to convince yourself of the following:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Aᶜ)ᶜ = A

3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ

4. A ∪ Aᶜ = Ω

5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)

6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a set relation.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is the event class, denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus, the event class can be written as

F = {∅, {H}, {T}, {H, T}}.

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs.

2. An even number occurs.

3. No number occurs.

4. Any number occurs.

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
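A worked check using inclusion-exclusion:

|S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50, so P(no class) = (100 − 50)/100 = 1/2.

Students in exactly one class: Spanish only = 28 − 12 − 4 + 2 = 14; French only = 26 − 12 − 6 + 2 = 10; German only = 16 − 4 − 6 + 2 = 8. So P(exactly one class) = (14 + 10 + 8)/100 = 0.32.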

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω3| = 1/8 for any outcome ai.

Let A2 = {exactly 2 heads occur in 3 tosses} = {HHT, HTH, THH}. Then

P(A2) = |A2|/|Ω3| = 3/8.

Note that, for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then

P(E) = K/|Ω|.

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai.

Then P(2 heads) = 1/4, which contradicts the value of 3/8 found in part A: the outcomes in this sample space are not actually equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets on which cardinality is simplest to measure: the intervals.

DEFN
For a sample space Ω that is an interval, Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A).

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞.

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX
A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
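A worked check with C1 = {smokes cigarettes} and C2 = {smokes cigars}:

1. P(C1 ∪ C2) = 0.28 + 0.07 − 0.05 = 0.30, i.e., 30%.
2. P((C1 ∪ C2)ᶜ) = 1 − 0.30 = 0.70, i.e., 70%.
3. P(C2 ∩ C1ᶜ) = P(C2) − P(C1 ∩ C2) = 0.07 − 0.05 = 0.02, i.e., 2%.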

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.

DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN
A discrete set is either finite or countably infinite.

DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX
Examples:

DEFN
A continuous set is not countable.

DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open.

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be finite, infinite, or partially infinite.

EX: Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are ________ to which we assign ________.

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L = the event that the result is less than or equal to 4.
Express L as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A_1 = event that heads occurs on the first toss.
Express A_1 as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2.

DEFN: F is the event class, which is a ________ to which we assign probability.

• If Ω is discrete, then F can be taken to be the ________ of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M

L2-5

DEFN: A field M forms a σ-algebra or σ-field if, for any E_1, E_2, … ∈ M,

⋃_{i=1}^{∞} E_i ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

⋂_{i=1}^{∞} E_i ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (ℝ), the Borel field consists of all unions and intersections of intervals of ℝ.

3.

DEFN: The probability measure, denoted by P, is a ________ that maps all members of ________ onto ________.

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I.

II.

III.

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, …, A_n are pairwise mutually exclusive, then

P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = Σ_{j=1}^{n} P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, …. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
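The corollaries lend themselves to quick numerical checks. The following MATLAB sketch (my own illustration, not part of the original notes) estimates both sides of corollary (e) by simulation for a die roll, with A = {result is even} and B = {result ≤ 2}:

% Monte Carlo check of P(A U B) = P(A) + P(B) - P(A n B)
N = 1e6;                        % number of trials
x = ceil(6*rand(1, N));         % fair 6-sided die rolls
A = (mod(x, 2) == 0);           % event A: result is even
B = (x <= 2);                   % event B: result <= 2
lhs = mean(A | B);              % estimate of P(A U B)
rhs = mean(A) + mean(B) - mean(A & B);
fprintf('P(A U B) = %.4f, P(A)+P(B)-P(A n B) = %.4f\n', lhs, rhs);

Both printed values should be close to the exact answer 4/6, since A ∪ B = {1, 2, 4, 6}.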

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are ________ (abbreviated s.i.) if and only if ________

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

• It is important to note that statistical independence is a ________

• What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32    2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)

• For N events to be s.i., we require

P(A_i ∩ A_k) = P(A_i)P(A_k)
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
⋮
P(A_1 ∩ A_2 ∩ ··· ∩ A_N) = P(A_1)P(A_2)···P(A_N)

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, …, A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j.
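A standard example (my addition, not from the notes) of why pairwise independence is strictly weaker: toss two fair coins and let A = {first toss is heads}, B = {second toss is heads}, C = {exactly one toss is heads}. Each pair of events is s.i., but P(A ∩ B ∩ C) = 0 ≠ P(A)P(B)P(C) = 1/8. A quick MATLAB check:

% Pairwise s.i. but not mutually s.i.
N = 1e6;
t1 = rand(1, N) < 0.5;              % first toss is heads
t2 = rand(1, N) < 0.5;              % second toss is heads
A = t1;  B = t2;  C = xor(t1, t2);  % C: exactly one head
fprintf('P(AB)  = %.3f vs P(A)P(B)     = %.3f\n', mean(A & B), mean(A)*mean(B));
fprintf('P(AC)  = %.3f vs P(A)P(C)     = %.3f\n', mean(A & C), mean(A)*mean(C));
fprintf('P(ABC) = %.3f vs P(A)P(B)P(C) = %.3f\n', mean(A & B & C), mean(A)*mean(B)*mean(C));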

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a ________

– Statistical independence is a ________

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, ________ is true.

Consider the following:

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A

• E_B be the event that the selected computer is from manufacturer B

• E_D be the event that the selected computer is defective

Find:

P(E_A) =     P(E_B) =     P(E_D) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(E_D|E_B) =

L3-7

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about theexperiment outcome

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[figure: Venn diagram of overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[figure: the sample space reduced to B, showing the region A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B) / P(B),

and P(A ∩ B) ≥ 0, P(B) ≥ 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

P(E_D|E_A) =

P(E_D|E_B) =

P(E_A|E_D) =

P(E_B|E_D) =

L3-9

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W_2|W_1)?

2. (a) 1/3 (b) 1/5 (c) 1
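Before working P(W_2|W_1) analytically, you can estimate it by simulation. This MATLAB sketch (mine, not part of the notes) draws two balls without replacement and estimates the conditional relative frequency:

% Estimate P(W2 | W1) for the urn {B1, B2, W3, W4, W5}
N = 1e5;
w1 = false(1, N); w2 = false(1, N);
for i = 1:N
    urn = [0 0 1 1 1];          % 0 = black, 1 = white
    k = ceil(5*rand);           % first draw
    w1(i) = (urn(k) == 1);
    urn(k) = [];                % remove the drawn ball
    m = ceil(4*rand);           % second draw from the remaining 4
    w2(i) = (urn(m) == 1);
end
fprintf('P(W2|W1) approx %.3f\n', sum(w1 & w2)/sum(w1));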

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then ________

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B) / P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B) / P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C) / P(B ∩ C)] · [P(B ∩ C) / P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets {A_i} partitions the sample space Ω if and only if ________

{A_i} is also said to be a partition of Ω.

L4-4

1. Venn Diagram:

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability

If {A_i} partitions Ω, then ________

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A_0, A_1:

• A_0 is sent with prob. P(A_0) = 2/5

• A_1 is sent with prob. P(A_1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B_0, B_1.

• The channel is a noisy channel that determines the probabilities of transitions between {A_0, A_1} and {B_0, B_1}.

– The transitions are conditional probabilities P(B_j|A_i) that can be specified on a channel transition diagram:

[figure: channel transition diagram with P(B_0|A_0) = 5/6, P(B_1|A_0) = 1/6, P(B_1|A_1) = 7/8, P(B_0|A_1) = 1/8]

• The receiver must use a decision rule to determine from the output (B_0 or B_1) whether the symbol that was sent was most likely A_0 or A_1.

• Let's start by considering 2 possible decision rules:

1. Always decide A_1.

2. When we receive B_0, decide A_0; when we receive B_1, decide A_1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX: Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A_0 when A_1 is transmitted plus the probability that we decide A_1 when A_0 is transmitted:

P(E) = P(H_10 ∩ A_0) + P(H_11 ∩ A_0) + P(H_01 ∩ A_1) + P(H_00 ∩ A_1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B_0 is received, decide A_0 with probability η and decide A_1 with probability 1 − η.

• Since we only apply hypothesis H_ij when B_j is received, we can write

P(E) = P(H_10 ∩ A_0 ∩ B_0) + P(H_11 ∩ A_0 ∩ B_1) + P(H_01 ∩ A_1 ∩ B_1) + P(H_00 ∩ A_1 ∩ B_0)

• We apply the chain rule to write this as

P(E) = P(H_10|A_0 ∩ B_0)P(A_0 ∩ B_0) + P(H_11|A_0 ∩ B_1)P(A_0 ∩ B_1) + P(H_01|A_1 ∩ B_1)P(A_1 ∩ B_1) + P(H_00|A_1 ∩ B_0)P(A_1 ∩ B_0)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.

• Furthermore, let P(H_00|B_0) = q_0 and P(H_11|B_1) = q_1.

• Then

P(E) = (1 − q_0)P(A_0 ∩ B_0) + q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1) + q_0 P(A_1 ∩ B_0)

• The choices of q_0 and q_1 can be made independently, so P(E) is minimized by finding q_0 and q_1 that minimize

(1 − q_0)P(A_0 ∩ B_0) + q_0 P(A_1 ∩ B_0)  and
q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1)

• We can further expand these using the chain rule as

(1 − q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0)  and
q_1 P(A_0|B_1)P(B_1) + (1 − q_1)P(A_1|B_1)P(B_1)

L4-8

• P(B_0) and P(B_1) are constants that do not depend on our decision rule, so finally we wish to find q_0 and q_1 to minimize

(1 − q_0)P(A_0|B_0) + q_0 P(A_1|B_0)  and    (9)
q_1 P(A_0|B_1) + (1 − q_1)P(A_1|B_1)    (10)

• Both of these are linear functions of q_0 and q_1, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q_0 = 1 if P(A_0|B_0) > P(A_1|B_0), and setting q_0 = 0 otherwise.

• Similarly, (10) is minimized by setting q_1 = 1 if P(A_1|B_1) > P(A_0|B_1), and setting q_1 = 0 otherwise.

• These rules can be summarized as follows:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).

L4-9

• General technique:

If {A_i} is a partition of Ω, and we know P(B_j|A_i) and P(A_i), then

________

where the last step follows from the Law of Total Probability.

The expression in the box is ________

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A_0) = 2/5, p_01 = 1/8, and p_10 = 1/6.
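If you want to check your in-class computation numerically, the sketch below (my own, not from the notes) applies Bayes' rule with the Law of Total Probability to the given numbers, using the convention p_ij = P(B_i|A_j):

% A posteriori probabilities for the binary channel
pA = [2/5, 3/5];                  % a priori P(A0), P(A1)
pBgivenA = [5/6, 1/8;             % row 1: P(B0|A0), P(B0|A1)
            1/6, 7/8];            % row 2: P(B1|A0), P(B1|A1)
pB = pBgivenA * pA';              % total probability: P(B0); P(B1)
pAgivenB = (pBgivenA .* repmat(pA, 2, 1)) ./ repmat(pB, 1, 2);
disp(pAgivenB)                    % entry (j+1, i+1) is P(Ai|Bj)
% The MAP rule picks, for each received Bj, the Ai with the larger entry in row j+1.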

L4-10

• Often we may not know the a priori probabilities P(A_0) and P(A_1).

• In this case, we typically treat the input symbols as equally likely: P(A_0) = P(A_1) = 0.5.

• Under this assumption, P(A_0|B_0) = cP(B_0|A_0) and P(A_1|B_0) = cP(B_0|A_1), so the decision rule becomes:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, …, x_k) that are possible, if x_i is an element of a set with n_i distinct members, is ________

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
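As a check on answer (a), here is a short MATLAB sketch (my own, not part of the notes) that computes P[no two alike] exactly by counting and also estimates it by simulation:

% Poker dice: P(no two alike) on five dice
p_exact = (6*5*4*3*2) / 6^5;        % ordered favorable outcomes / total outcomes
N = 1e5; hits = 0;
for i = 1:N
    roll = ceil(6*rand(1, 5));      % roll five fair dice
    if numel(unique(roll)) == 5     % all five values distinct
        hits = hits + 1;
    end
end
fprintf('exact %.4f, simulated %.4f\n', p_exact, hits/N);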

L5a-2

The result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A ∀ i = 1, 2, …, k.

Thus n_1 = n_2 = ··· = n_k = |A| ≡ n

⇒ the number of distinct ordered k-tuples is ________

2. Sampling w/out replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A ∀ i = 1, 2, …, k.

Then n_1 = n, n_2 = n − 1, …, n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is ________

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from the set without replacement until no objects are left in the set.

Then n_1 = n, n_2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = ________ = ________

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
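For example (a quick illustration of mine, not from the notes), you can compare the exact factorial against Stirling's approximation:

% n! via gamma(n+1) vs. Stirling's approximation
n = 10;
exact    = gamma(n + 1);                         % 10! = 3628800
stirling = sqrt(2*pi) * n^(n + 0.5) * exp(-n);   % approximately 3598696
fprintf('exact %.0f, Stirling %.0f, ratio %.4f\n', exact, stirling, stirling/exact);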

L5a-3

4. Sampling w/out Replacement & w/out Ordering

Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: ________

Now, how many different orders are there for any set of k objects? ________

Let C(n,k) = the number of combinations of size k from a set of n objects.

C(n,k) = (# ordered subsets) / (# orderings) = ________

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, …, k_J is given by the multinomial coefficient ________

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61
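Numbers like 72!/(3!)^24 overflow double precision if computed directly, so a log-domain computation is natural. The sketch below (mine, not part of the notes) uses gammaln, since gammaln(n+1) = ln(n!):

% Count the group assignments in the log domain to avoid overflow
logOrdered   = gammaln(73) - 24*gammaln(4);      % log of 72!/(3!)^24
logUnordered = logOrdered - gammaln(25);         % divide by 24!
fprintf('with distinct projects:  about 10^%.1f\n', logOrdered/log(10));
fprintf('with identical projects: about 10^%.1f\n', logUnordered/log(10));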

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sampling with Replacement and w/out Ordering

Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is C(k + n − 1, k).

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX: By Request: The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

4. 0.0794

L5-2

2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
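A Monte Carlo sketch (my own, not from the notes) for the first two strategies; the coin-flip strategy wins with the average of the two estimated probabilities. It uses the fact that switching wins exactly when the initial pick was wrong:

% Monty Hall: estimate P(win) for never-switch and always-switch
N = 1e5; winStay = 0; winSwitch = 0;
for i = 1:N
    car  = ceil(3*rand);            % door hiding the car
    pick = ceil(3*rand);            % contestant's initial pick
    winStay   = winStay   + (pick == car);
    winSwitch = winSwitch + (pick ~= car);
end
fprintf('never switch: %.3f, always switch: %.3f\n', winStay/N, winSwitch/N);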

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the ________ of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, …, B_N), where B_i ∈ Ω_i.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P(A) = ________

For s.i. subexperiments:

P(A) = ________

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is ________

• The number of ways that k successes can occur in n places is ________

• Thus, the probability of k successes on n trials is

p_n(k) = ________    (11)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
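A direct numerical evaluation of the packet example (a sketch of mine, not part of the notes):

% P(packet fails) = P(3 or more bit errors), n = 100, p = 0.01
n = 100; p = 0.01; P2orFewer = 0;
for k = 0:2
    P2orFewer = P2orFewer + nchoosek(n, k) * p^k * (1-p)^(n-k);
end
fprintf('P(decode failure) = %.4f\n', 1 - P2orFewer);   % about 0.0794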

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables {X_i}.

– For each Bernoulli trial i, let X_i = 1 if the trial results in a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
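To see how accurate the approximation is, the following sketch (mine, not from the notes) compares the exact binomial probability P[α ≤ S_n ≤ β] against the continuity-corrected Q-function form, using the identity Q(x) = erfc(x/√2)/2:

% Exact vs. Gaussian-approximate P(alpha <= Sn <= beta)
n = 100; p = 0.2; q = 1 - p; a = 15; b = 25;
Qf = @(x) 0.5*erfc(x/sqrt(2));        % Q-function via erfc
exact = 0;
for k = a:b
    exact = exact + nchoosek(n, k) * p^k * q^(n-k);
end
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
fprintf('exact %.4f, Gaussian approx %.4f\n', exact, approx);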

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = ________

Also, P(# trials > k) = ________

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
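A simulation sketch for the ARQ example (my own, not part of the notes); each transmission succeeds with probability p = 0.9:

% Geometric law: number of ARQ transmissions until first success
N = 1e5; p = 0.9; count = zeros(1, N);
for i = 1:N
    m = 1;
    while rand > p              % packet error with probability 0.1
        m = m + 1;
    end
    count(i) = m;
end
fprintf('P(exactly 2 transmissions)   approx %.4f\n', mean(count == 2));
fprintf('P(more than 2 transmissions) approx %.4f\n', mean(count > 2));

The estimates can be checked against the geometric law you fill in above.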

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals that are 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/minute is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^{n−k} ≈ (α^k / k!) (1 − α/n)^{n−k}

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, …
           { 0,                  o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
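One way to check the in-class computation for the example above (a sketch of mine, not from the notes): here λ = 90 calls/hr = 1.5 calls/min and t = 10 min, so α = λt = 15:

% Poisson probability of 20 calls in a 10-minute interval
lambda = 90/60;              % calls per minute
t = 10; alpha = lambda*t;    % alpha = 15
k = 20;
P = exp(-alpha) * alpha^k / factorial(k);
fprintf('P(N = 20) = %.4f\n', P);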

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = { 0,                          z ≤ 0
         { z²/(2b²),                   0 < z ≤ b
         { [b² − (2b − z)²/2] / b²,    b < z < 2b
         { 1,                          z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
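The answer for F_Z(z) can be verified empirically. This sketch (my own, not part of the notes) uses b = 1, estimates the CDF from samples of Z, and overlays the analytical answer:

% Empirical vs. analytical CDF of Z = sum of dart coordinates, b = 1
b = 1; N = 1e5;
z = b*rand(1, N) + b*rand(1, N);     % Z = X + Y, uniform dart on [0,b]^2
zs = sort(z);
Femp = (1:N)/N;                      % empirical CDF at the sorted samples
Fana = zeros(1, N);
lo = (zs <= b);                      % analytical piecewise CDF
Fana(lo)  = zs(lo).^2/(2*b^2);
Fana(~lo) = (b^2 - (2*b - zs(~lo)).^2/2)/b^2;
plot(zs, Femp, zs, Fana, '--');
xlabel('z'); ylabel('F_Z(z)'); legend('empirical', 'analytical');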

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a ________ from ________ to ________.

L6-2

EX:

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on ℝ contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus it is convenient to define a function that assigns probabilities to sets of this form:

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the ________ (________), denoted ________, is ________

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf: omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]

L6-5

EX: Examples:

L6-6

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB:

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to show how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The ________ (________) f_X(x) of a random variable X is the derivative (which may not exist at some places) of ________

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ________

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = { 0,           x < a
         { 1/(b − a),   a ≤ x ≤ b
         { 0,           x > b

L7-4

DEFN: A ________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is ________

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, …, 6
           { 0,    o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, …
           { 0,        o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, …, n
           { 0,                         o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[figures: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[figures: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, …
           { 0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[figures: Geometric(0.1) pmf and Geometric(0.7) pmf]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[figures: Geometric(0.1) CDF and Geometric(0.7) CDF]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, …
           { 0,                  o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[figures: Poisson(0.75) pmf and Poisson(3) pmf]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[figures: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV
(covered in the example above)

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = { 0,           x < 0
         { λ e^{−λx},   x ≥ 0

• Distribution function:

F_X(x) = { 0,             x < 0
         { 1 − e^{−λx},   x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[figures: exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN: A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[figures: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ)
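Base MATLAB does not include Q(x) directly, but it follows from erfc. A small sketch (mine, not from the notes); the values of mu, sigma, and the interval endpoints are arbitrary choices for illustration:

% Q-function and a Gaussian interval probability
Qf = @(x) 0.5*erfc(x/sqrt(2));       % Q(x) in terms of erfc
mu = 10; sigma = 5;                  % note: sigma, not sigma^2
Pab = Qf((12 - mu)/sigma) - Qf((15 - mu)/sigma);   % P(12 < X <= 15)
fprintf('P(12 < X <= 15) = %.4f\n', Pab);
disp(2*Qf(1:5))                      % two-sided tail probs 2Q(k), k = 1..5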

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ], for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55 × 10^{−2}, 2.70 × 10^{−3}, 6.34 × 10^{−5}, and 5.75 × 10^{−7}, respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
         { 0,                                                  y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { (1/(2σ²)) e^{−y/(2σ²)},  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
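A quick simulation (my own, not from the notes) confirming the n = 2 case: the sum of the squares of two zero-mean, unit-variance Gaussians is exponential with mean 2σ² = 2:

% Sum of squares of two N(0,1) RVs is exponential
N = 1e5;
y = randn(1, N).^2 + randn(1, N).^2;
fprintf('sample mean %.3f (exponential mean 2*sigma^2 = 2)\n', mean(y));
hist(y, 100)     % shape should match (1/(2*sigma^2)) exp(-y/(2*sigma^2))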

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (here written for an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
         { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
         { 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
         { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
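A simulation sketch of this model (mine, not part of the notes): the magnitude of a complex Gaussian sample with i.i.d. real and imaginary parts is Rayleigh:

% Rayleigh amplitude from a complex Gaussian sample
N = 1e5; sigma = 1;
g = sigma*(randn(1, N) + 1j*randn(1, N));   % complex Gaussian
r = abs(g);                                 % Rayleigh amplitudes
hist(r, 100)     % compare against (r/sigma^2) exp(-r^2/(2*sigma^2)), up to scaling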

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
         { 0,                                       ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = ________

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = ________

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
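A simulation sketch (mine, not from the notes) for part 2, illustrating the memoryless property of the exponential: given that you have waited 2 minutes, the remaining wait is again Exp(1):

% Memorylessness: remaining wait given X > 2 is again Exp(1)
N = 1e6;
x = -log(rand(1, N));              % Exp(1) samples via the inverse-CDF method
remWait = x(x > 2) - 2;            % remaining wait after the first 2 minutes
fprintf('P(remaining > 1) approx %.3f, exp(-1) = %.3f\n', mean(remWait > 1), exp(-1));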

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ··· + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] } P(A)
           = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)
           = [f_X(x|A) / f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board.

Page 10: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L1-5

What are some problems with defining probability in this way

1 Requires equally likely outcomes ndash thus it only applies to a small set of random phenomena

2 Can only deal with a finite number of outcomesndash this again limits its usefulness

Probability as a Measure of Frequency of Occurrence

bull Consider a random experiment that has K possible outcomes K lt infin

bull Let Nk(n) = the number of times the outcome is k

bull Then we can tabulate Nk(n) for various values of k and n

bull We can reduce the dependence on n by dividing N(k) by n to find out ldquohow often did koccurrdquo

DEFN

The relative frequency of outcome k of a random experiment is

fk(n) =Nk(n)

n

bull Observation In our previous experiments as n gets large fk(n) converges to some constantvalue

DEFN

An experiment possesses statistical regularity if

limnrarrinfin

fk(n) = pk (a constant)forallk

DEFN

For experiments with statistical regularity as defined above pk is called theprobability of outcome k

PROPERTIES OF RELATIVE FREQUENCYbull Note that

0 le Nk(n) le n forallk

because Nk(n) is just the of times outcome k occurs in n trials

Dividing by n yields

0 le Nk(n)

n= fk(n) le 1 forallk = 1 K (1)

L1-6

bull If 1 2 K are all of the possible outcomes thenKsum

k=1

Nk(n) = n

Again dividing by n yieldsKsum

k=1

fk(n) = 1 (2)

Consider the event E that an even number occurs

What can we say about the number of times E is observed in n trialsNE(n) = N2(n) + N4(n) + N6(n)

What have we assumed in developing this equation

Then dividing by n

fE(n) =NE(n)

n=

N2(n) + N4(n) + N6(n)

n= f2(n) + f4(n) + f6(n)

bull General propertyIf A and B are 2 events that cannot occur simultaneously and C is theevent that either A or B occurs then

fC(n) = fA(n) + fB(n) (3)

bull In some sense we can define the probability of an event as its long-term relative frequency

What are some problems with this definition

1 It is not clear when and in what sense the limit exists2 It is not possible to perform an experiment an infinite number of times so the probabilities

can never be known exactly3 We cannot use this definition if the experiment cannot be repeated

We need a mathematical model of probability that is not based on a particularapplication or interpretationHowever any such model should

1 be useful for solving real problems

2 agree with our interpretation of probability as relative frequency

3 agree with our intuition (where appropriate)

L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events which are sets of outcomes Therefore it isnecessary to be able to understand and be able to apply all the ldquotoolsrdquo (operators) and conceptsof sets

Read the following material and Sections 14 and 15 in the textbook Then watch the onlineLecture 2a in preparation for Lecture 2

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2problem 12)

EX

An elementary school is offering 3 language classes one in Spanish one inFrench and one in German These classes are open to any of the 100 studentsin the school There are 28 students in the Spanish class 26 in the French classand 16 in the German class There are 12 students that are in both Spanish andFrench 4 that are in both Spanish and German and 6 that are in both Frenchand German In addition there are 2 students taking all 3 classes

1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses

2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class

To solve the above problem let S F and G denote the events that a student is taking SpanishFrench and German respectively Start by expressing the desired composite event in terms ofunions intersections and complements of S F and G Then evaluate the number of items inthe set The probability of the compound event is the number of items in the set divided by thetotal population (100)

Hint It may help to draw a Venn diagram

bull Want to determine probabilities of events (combinations of outcomes)

bull Each event is a set rArr need operations on sets

1 A sub B (read ldquoA is a subset of Brdquo)

DEFN

The subset operator sub is defined for two sets A and B by

A sub B if x isin A rArr x isin B

L2a-2

Note that A = B is included in A sub B

A = B if and only if A sub B and B sub A(useful for proofs)

2 A cupB (read ldquoA union Brdquo or ldquoA or Brdquo)

DEFN

The union of A and B is a set defined by

A cupB = x |x isin A or x isin B

3 A capB (read ldquoA intersect Brdquo or ldquoA and Brdquo)

DEFN

The intersection of A and B is defined by

A capB = AB = x |x isin A and x isin B

4 Ac or A (read ldquoA complementrdquo)

DEFN

The complement of a set A in a sample space Ω is defined by

Ac = x |x isin Ω and x isin A

bull Use Venn diagrams to convince yourself of

1 (A capB) sub A and (A capB) sub B

2(A)

= A

3 A sub B rArr B sub A

4 A cup A = Ω

5 A capB = A cupB (DeMorganrsquos Law 1)

6 A cupB = A capB (DeMorganrsquos Law 2)

L2a-3

bull One of the most important relations for sets is when sets are mutually exclusive

DEFN

Two sets A and B are said to be mutually exclusive or disjoint if

This is very important so I will emphasize it again mutually exclusive isa

SAMPLE SPACES

bull Given the following experiment descriptions write a mathematical expression for the samplespace Compare and contrast the experiments and sample spaces

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs.

2. An even number occurs.

3. No number occurs.

4. Any number occurs.

• EX: Express the events listed in the example above in set notation.

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times and the sequence of H and T is noted.

Ω_3 = {HHH, HHT, HTH, ..., TTT}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω_3| = 1/8 for any outcome a_i.

Let A2 = {exactly 2 heads occurs in 3 tosses}. Then

P(A2) = |A2|/|Ω_3| = 3/8.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, ..., o_K} and |E| = K), then

P(E) = K/|Ω|.
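A quick MATLAB check of the 3-toss example (an illustrative sketch, not from the original notes): enumerating all 8 equally likely sequences and counting those with exactly 2 heads gives 3/8.

seqs = dec2bin(0:7) == '1';      % rows are the 8 equally likely outcomes
numHeads = sum(seqs, 2);         % number of heads in each sequence
P_A2 = mean(numHeads == 2)       % 3/8 = 0.375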

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.

Ω = {0, 1, 2, 3}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.

Then P(2 heads) = 1/4, which disagrees with the 3/8 found above — the outcomes of this experiment are not, in fact, equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome had some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A).

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)
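The interpretation above can be illustrated in MATLAB (a sketch with arbitrary interval endpoints, not from the original notes): the relative frequency of landing in [a, b] approaches (b − a)/(B − A) even though each individual outcome has probability 0.

A = 0; B = 2; a = 0.5; b = 1.25;
x = A + (B - A) * rand(1, 1e6);      % one million draws, uniform on [A,B]
relFreq = mean(x >= a & x <= b)      % approaches (1.25-0.5)/2 = 0.375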

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
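A worked sketch for checking your answers, using the corollaries: P(cigarettes ∪ cigars) = 0.28 + 0.07 − 0.05 = 0.30, so 30% smoke; by the complement corollary, 70% smoke neither; and P(cigars ∩ no cigarettes) = 0.07 − 0.05 = 0.02, so 2% smoke cigars but not cigarettes.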

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples:

DEFN: A continuous set is a set that is not countable.

DEFN: An interval I is a set such that if a and b > a are in I, then every x with a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open.

– A closed interval [a, b] contains the endpoints a and b.

– An open interval (a, b) does not contain the endpoints a and b.

– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be finite, infinite, or partially infinite.

EX: Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?

– What is the probability that the result is ≤ 2?

DEFN: Events are sets of outcomes to which we assign probabilities.

Examples based on previous examples of sample spaces

(a) EX: Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.

(c) EX: Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads. Let O = the event that an odd number of heads occurs. Express O as a set of outcomes.

2. DEFN: F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore those subsets cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if

(a) ∅ ∈ M and Ω ∈ M;

(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;

(c) if E ∈ M, then E^c ∈ M.

DEFN: A field M forms a σ-algebra or σ-field if, for any E_1, E_2, ... ∈ M,

∪_{i=1}^{∞} E_i ∈ M.    (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_{i=1}^{∞} E_i ∈ M.    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a set function that maps all members of F onto the interval [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F.

II. P(Ω) = 1.

III. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B). (More generally, for a countable collection of pairwise mutually exclusive events A_1, A_2, ..., P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, ..., A_n are pairwise mutually exclusive, then

P(∪_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k).

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪_{k=1}^{n} A_k) = Σ_j P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probability relationship.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32.  2. (a) 2/3, (b) 1/3, (c) 3/4.

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c).

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c).

• For N events to be s.i., we require

P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k,

P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m) for all distinct i, k, m,

...

P(A_1 ∩ A_2 ∩ ··· ∩ A_N) = P(A_1)P(A_2)···P(A_N).

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) for all i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
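Since the births are independent, each answer is a product of per-child probabilities. A short MATLAB sketch (illustrative, not from the original notes), which reproduces the answers on the previous page:

p = 0.5;                          % P(boy) = P(girl) = 1/2
P_same   = p^5 + (1-p)^5          % (1) all same sex: 1/16
P_3b2g   = p^3 * (1-p)^2          % (2) 3 eldest boys, rest girls: 1/32
P_2girls = (1-p)^2                % (3) two oldest girls: 1/4
P_geq1g  = 1 - p^5                % (4) at least one girl: 31/32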

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship.

– Statistical independence is a probability relationship.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.?

2. s.i. ⇒ m.e.?

3. Two events cannot be both s.i. and m.e., except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, 3 is true.

Consider the following:

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;

• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer. (The two defective B machines are distinct outcomes.)

Ω = {AD, AN, BD1, BD2, BN}

Let:

• E_A be the event that the selected computer is from manufacturer A,

• E_B be the event that the selected computer is from manufacturer B,

• E_D be the event that the selected computer is defective.

Find:

P(E_A) = ____, P(E_B) = ____, P(E_D) = ____.

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD1, BD2, BN}

Find:

– P(E_D|E_B) = ____

– P(E_A|E_D) = ____

– P(E_B|E_D) = ____

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram. [Figure: sets A and B overlapping inside a sample space S.]

If we condition on B having occurred, then we can form a new Venn diagram. [Figure: B becomes the new sample space, containing A ∩ B.]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A and B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B].

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).

EX: Check the previous example. Ω = {AD, AN, BD1, BD2, BN}.

P(E_D|E_A) = P(E_D ∩ E_A)/P(E_A) = (1/5)/(2/5) = 1/2

P(E_D|E_B) = (2/5)/(3/5) = 2/3

P(E_A|E_D) = (1/5)/(3/5) = 1/3

P(E_B|E_D) = (2/5)/(3/5) = 2/3
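The counting behind these numbers can be checked with a short MATLAB enumeration (an illustrative sketch; the encoding of the five machines is mine, not from the notes):

mfr = 'AABBB';                    % manufacturer of each of the 5 machines
def = [1 0 1 1 0];                % 1 = defective
P_D_given_A = sum(def(mfr=='A')) / sum(mfr=='A')    % 1/2
P_D_given_B = sum(def(mfr=='B')) / sum(mfr=='B')    % 2/3
P_A_given_D = sum(mfr(def==1)=='A') / sum(def)      % 1/3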

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EX: Drawing two consecutive balls without replacement

An urn contains:

• two black balls, numbered 1 and 2;

• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let W_i = the event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities

Which of the following statements is true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B),    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A).    (8)

• Equations (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

EX: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C).

Partitions and Total Probability

DEFN: A collection of sets A_1, A_2, ... partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (A_i ∩ A_j = ∅ for i ≠ j) and ∪_i A_i = Ω.

{A_i} is also said to be a partition of Ω.

1. Venn diagram: [on board]

2. Expressing an arbitrary set using a partition of Ω: any event B can be written as B = ∪_i (B ∩ A_i), a union of mutually exclusive pieces.

Law of Total Probability

If {A_i} partitions Ω, then

P(B) = Σ_i P(B ∩ A_i) = Σ_i P(B|A_i)P(A_i).

Example: [on board]

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6.]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
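A worked sketch, assuming the transition probabilities read off the diagram above: Rule 1 errs exactly when A0 is sent, so P(E) = P(A0) = 2/5 = 0.4. Rule 2 errs when the channel flips the symbol, so by the law of total probability, P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/8)(2/5) + (1/6)(3/5) = 3/20 = 0.15.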


EX: Optimal Detection for the Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)    (9)

and

q1P(A0|B1) + (1 − q1)P(A1|B1).    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σ_k P(Bj|Ak)P(Ak)],

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
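A MATLAB sketch of this computation (illustrative; it assumes the notation p01 = P(B1|A0) and p10 = P(B0|A1), consistent with the diagram above):

PA   = [2/5 3/5];                      % priors P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];             % rows A0,A1; columns B0,B1
PB   = PA * PBgA;                      % total probability: P(B0), P(B1)
PAgB = diag(PA) * PBgA ./ repmat(PB, 2, 1);   % Bayes: P(Ai|Bj)
[~, map] = max(PAgB)                   % MAP decision for each Bj

This gives P(A0|B0) = 7/9 and P(A1|B1) = 10/11, so the MAP rule decides A0 when B0 is received and A1 when B1 is received.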

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
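A worked sketch of part (a) using Bayes' theorem: P(fair|H) = P(H|fair)P(fair) / [P(H|fair)P(fair) + P(H|two-headed)P(two-headed)] = (1/2)(1/2) / [(1/2)(1/2) + (1)(1/2)] = 1/3. Part (b) is the same computation with P(HH|fair) = 1/4, giving 1/5. For part (c), tails is impossible with the two-headed coin, so the posterior probability of the fair coin is 1.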

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, ..., x_k) that are possible if x_i is an element of a set with n_i distinct members is n_1 n_2 ··· n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering.

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.

Thus n_1 = n_2 = ··· = n_k = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement and with ordering.

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.

Then n_1 = n, n_2 = n − 1, ..., n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects.

Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)

The result is an n-tuple.

The number of orderings is the number of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n_1 = n, n_2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ the number of permutations = the number of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula (Stirling's approximation): for large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

4. Sampling without replacement and without ordering.

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the number of ordered subsets: n!/(n − k)!.

Now, how many different orderings are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (number of ordered subsets)/(number of orderings) = n! / [k!(n − k)!].

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, ..., k_J (with k_1 + k_2 + ··· + k_J = n) is given by the multinomial coefficient

n! / (k_1! k_2! ··· k_J!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61.

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6 = 7,484,400

– What is the probability of this occurring? [12!/(2!)^6] / 6^12 ≈ 0.0034

5. Sampling with replacement and without ordering.

Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k).
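The counting formulas above are easy to evaluate in MATLAB (an illustrative sketch, not from the original notes):

nWays = factorial(12) / factorial(2)^6     % arrangements: 7,484,400
pAll  = nWays / 6^12                       % approx 0.0034
nchoosek(10, 4)                            % combinations: C(10,4) = 210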

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiments can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (by request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

Answer to the packet problem: 0.0794.

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.
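A Monte Carlo sketch of the three strategies in MATLAB (illustrative, not from the original notes); the simulated winning probabilities approach 1/3, 2/3, and 1/2, respectively:

N = 1e5; wins = zeros(1, 3);
for t = 1:N
    car = randi(3); pick = randi(3);
    doors = 1:3;
    opts = doors(doors ~= pick & doors ~= car);   % goat doors host may open
    host = opts(randi(numel(opts)));
    other = doors(doors ~= pick & doors ~= host); % the other closed door
    wins(1) = wins(1) + (pick == car);            % never switch
    wins(2) = wins(2) + (other == car);           % always switch
    if rand < 0.5, final = other; else, final = pick; end
    wins(3) = wins(3) + (final == car);           % switch on coin flip
end
wins / N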

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability space for N sequential experiments:

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B_1 ∩ B_2 ∩ ··· ∩ B_N).

For s.i. subexperiments,

P(A) = P(B_1)P(B_2)···P(B_N).

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n, k).

• Thus the probability of k successes on n trials is

p_n(k) = C(n, k) p^k (1 − p)^{n−k}.    (11)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? (1/2)^3 = 1/8.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
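A MATLAB sketch of the packet problem (illustrative, not from the original notes): the packet fails when 3 or more of the 100 independent bits are in error.

p = 0.01; n = 100; k = 0:2;
Pok  = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k));
Perr = 1 - Pok                 % approx 0.0794, matching the earlier answer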

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}.    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.    (14)

• Engineers often prefer an alternative to (14), called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.    (16)

• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
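In MATLAB, Q(x) can be computed from the complementary error function (an illustrative sketch; the relation Q(x) = (1/2)erfc(x/√2) is standard):

Qf = @(x) 0.5 * erfc(x / sqrt(2));
n = 100; p = 0.5; q = 1 - p;
% P(45 <= Sn <= 55) using the continuity-corrected approximation above
Papprox = Qf((45 - n*p - 0.5)/sqrt(n*p*q)) - Qf((55 - n*p + 0.5)/sqrt(n*p*q))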

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is a success)

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, ....

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
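A worked sketch, treating each transmission as an independent Bernoulli trial with success probability p = 0.9 (success = packet delivered correctly): P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first two both fail) = (0.1)² = 0.01.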

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 per minute remains constant. The maximum possible number of calls/hour is now 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n, k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}.

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities — see pages 46 and 47 in the book for details.
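A sketch of the switching-office example in MATLAB (illustrative, not from the original notes): here α = λt = 90 × (1/6) = 15.

alpha = 90 / 6;                                 % 10 minutes = 1/6 hour
P20 = alpha^20 * exp(-alpha) / factorial(20)    % approx 0.0418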

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = 0 for z ≤ 0;
F_Z(z) = z²/(2b²) for 0 < z ≤ b;
F_Z(z) = [b² − (2b − z)²/2]/b² for b < z < 2b;
F_Z(z) = 1 for z ≥ 2b.

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)

RANDOM VARIABLES (RVs)

• What is a random variable?

• A random variable X defined on a probability space (Ω, F, P) is a function (mapping) from Ω to the real line R.

EX:

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) for every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x0]}.

• Thus it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P(X ≤ x).

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1.

2. F_X(−∞) = 0 and F_X(∞) = 1.

3. F_X(x) is monotonically nondecreasing, i.e., a ≤ b ⇒ F_X(a) ≤ F_X(b).

4. P(a < X ≤ b) = F_X(b) − F_X(a).

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].

EX: Examples

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
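The dart experiment is easy to simulate in MATLAB (an illustrative sketch, not from the original notes); the empirical CDF should match the piecewise F_Z(z) given earlier.

b = 1; N = 1e5;
Z = b*rand(1, N) + b*rand(1, N);   % sum of the two uniform coordinates
plot(sort(Z), (1:N)/N);            % empirical CDF, no toolboxes required
xlabel('z'); ylabel('F_Z(z)');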

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
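For checking your conclusions (a sketch, not stated in the original notes): since u is uniform on (0, 1), v = −log(u) satisfies P(V > v) = P(U < e^{−v}) = e^{−v}, so v is an exponential random variable with λ = 1; and w = √v has F_W(w) = 1 − e^{−w²}, a Rayleigh distribution (with 2σ² = 1).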

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x).

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞.

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt.

3. ∫_{−∞}^{+∞} f_X(x) dx = 1.

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b.

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Proof omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x), and thus a value will be assigned to f_X(x) at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is p_X(x) = P(X = x).

EX: Roll a fair 6-sided die; let X = the number on the top face.

P(X = x) = 1/6 for x = 1, 2, ..., 6, and 0 otherwise.

EX: Flip a fair coin until heads occurs; let X = the number of flips.

P(X = x) = (1/2)^x for x = 1, 2, ..., and 0 otherwise.

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by X = 1 if s ∈ A, and X = 0 if s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p for x = 1; 1 − p for x = 0; and 0 for x ≠ 0, 1.

2. Binomial RV

• A binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = the number of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, ..., n, and 0 otherwise.

• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The CDFs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = the number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p for k = 1, 2, ..., and 0 otherwise.

• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]

• The CDFs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf]

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVs

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter, λ.

• Density function:

f_X(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0.

• Distribution function:

F_X(x) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0.

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}.

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}.

DEFN: A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three parameter sets]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt.

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt.

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ).
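A MATLAB sketch of this interval computation (illustrative; the parameter values are arbitrary):

Qf = @(x) 0.5 * erfc(x / sqrt(2));   % Q(x) = 1 - Phi(x)
mu = 5; sigma = 2; a = 4; b = 9;
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(4 < X <= 9), approx 0.669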

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^{−2}, 2.70 × 10^{−3}, 6.34 × 10^{−5}, and 5.75 × 10^{−7}, respectively.


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,

where Γ(p) is the gamma function, defined as:

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0;

Γ(p) = (p − 1)!, p an integer > 0;

Γ(1/2) = √π;

Γ(3/2) = (1/2)√π.

• Note that for n = 2,

f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (for even n, with m = n/2) can be found through repeated integration by parts to be

F_Y(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k for y ≥ 0, and 0 for y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)} for ℓ ≥ 0, and 0 for ℓ < 0.

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = dF_X(x|A)/dx.

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
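A worked sketch of part 1, conditioning on A = {X > 2}: for x > 2, F_X(x|X > 2) = P(2 < X ≤ x)/P(X > 2) = (e^{−2} − e^{−x})/e^{−2} = 1 − e^{−(x−2)} (and 0 for x ≤ 2). The remaining wait time X − 2 therefore has CDF 1 − e^{−t} for t > 0 — the same exponential law as the original wait, which is the memoryless property of the exponential RV.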

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, ..., A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ··· + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i).

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

= lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)

= lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} P(A)

= [f_X(x|A) / f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx.

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:

f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt.

Examples on board.

Page 11: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L1-6

bull If 1 2 K are all of the possible outcomes thenKsum

k=1

Nk(n) = n

Again dividing by n yieldsKsum

k=1

fk(n) = 1 (2)

Consider the event E that an even number occurs.

What can we say about the number of times E is observed in n trials?

N_E(n) = N_2(n) + N_4(n) + N_6(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n).

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n).    (3)
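The relative-frequency interpretation is easy to explore numerically in MATLAB (the course tool). Below is a minimal sketch, with variable names of my own choosing, that rolls a fair die n times and checks that f_E(n) = f_2(n) + f_4(n) + f_6(n), all of which approach 1/2 as n grows:

% Relative frequencies for a fair die (minimal sketch)
n = 100000;                        % number of trials
rolls = ceil(6*rand(1, n));        % each roll is uniform on {1,...,6}
fk = zeros(1, 6);                  % relative frequency of each face
for k = 1:6
    fk(k) = sum(rolls == k) / n;
end
fE = sum(mod(rolls, 2) == 0) / n;  % relative frequency of E = {even outcome}
fprintf('f_E(n) = %.4f, f_2 + f_4 + f_6 = %.4f\n', fE, fk(2) + fk(4) + fk(6));

The two printed values agree exactly, since both count the same rolls; this is property (3) in action.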

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency,

3. agree with our intuition (where appropriate).


EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all of the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX: An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.

• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN: The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B.

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).

2. A ∪ B (read "A union B" or "A or B")

DEFN: The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}.

3. A ∩ B (read "A intersect B" or "A and B")

DEFN: The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}.

4. A^c or Ā (read "A complement")

DEFN: The complement of a set A in a sample space Ω is defined by

A^c = {x | x ∈ Ω and x ∉ A}.

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (A^c)^c = A

3. A ⊂ B ⇒ B^c ⊂ A^c

4. A ∪ A^c = Ω

5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)

6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a set relationship (a property of the sets themselves).

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin.

The possible events are:

– Heads occurs: {H}
– Tails occurs: {T}
– Heads or Tails occurs: {H, T} = Ω
– Neither Heads nor Tails occurs: ∅

• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.

EX: Rolling a 6-sided die.

There are many possible events, such as:

1. The number 3 occurs.

2. An even number occurs.

3. No number occurs.

4. Any number occurs.

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3 language problem.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω_3 = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,

P(a_i) = 1/|Ω_3| = 1/8 for any outcome a_i.

Let A_2 = {exactly 2 heads occurs in 3 tosses}. Then

P(A_2) = |A_2|/|Ω_3| = 3/8.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, …, o_K} and |E| = K), then

P(E) = K/|Ω|.

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.

Then P(exactly 2 heads) = 1/4, which disagrees with the 3/8 found above: the equally-likely assumption is not valid for this sample space.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A).

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(Does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples:

DEFN: A continuous set is not countable.

DEFN: An interval I is a set such that if a and b > a are in I, then every x with a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are subsets of the sample space to which we assign probability.

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events: ∅, {E}, {O}, {E, O}.

(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on the first toss.
Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes: O = {1, 3}.

2. DEFN: F is the event class, which is a collection of subsets of Ω (the events) to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, even if Ω itself has measure (probability) 1, we can construct a new set Ω′ out of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M;
(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) if E ∈ M, then E^c ∈ M.

DEFN: A field M forms a σ-algebra or σ-field if, for any E_1, E_2, … ∈ M,

∪_{i=1}^{∞} E_i ∈ M.    (5)

• Note that, by property (c) of fields, the union in (5) can be interchanged with intersection:

∩_{i=1}^{∞} E_i ∈ M.    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F.

II. P(Ω) = 1.

III. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B); more generally, for a countable collection of pairwise mutually exclusive events, P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, …, A_n are pairwise mutually exclusive, then

P(∪_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k).

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪_{k=1}^{n} A_k) = Σ_{j=1}^{n} P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + · · · + (−1)^{n+1} P(A_1 ∩ A_2 ∩ · · · ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, …. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown below so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

Motivation (continued)

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of this passage.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relationship (a property of the probability measure).

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c), and that (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c).

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c).

• For N events to be s.i., we require

P(A_i ∩ A_k) = P(A_i)P(A_k)
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
⋮
P(A_1 ∩ A_2 ∩ · · · ∩ A_N) = P(A_1)P(A_2) · · · P(A_N)

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, …, A_N ∈ A are pairwise statistically independent if and only if

P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship.

– Statistical independence is a probabilistic relationship.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, (3) is true.

Consider the following:

Conditional Probability: Consider an example.

EX: Defective computers in a lab.

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD1, BD2, BN}

Let

• E_A be the event that the selected computer is from manufacturer A;

• E_B be the event that the selected computer is from manufacturer B;

• E_D be the event that the selected computer is defective.

Find:

P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5.

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD1, BD2, BN}

Find:

– P(E_D|E_B) =

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B].

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).

EX: Check the previous example, Ω = {AD, AN, BD1, BD2, BN}:

P(E_D|E_A) = P(E_D ∩ E_A)/P(E_A) = (1/5)/(2/5) = 1/2

P(E_D|E_B) = P(E_D ∩ E_B)/P(E_B) = (2/5)/(3/5) = 2/3

P(E_A|E_D) = P(E_A ∩ E_D)/P(E_D) = (1/5)/(3/5) = 1/3

P(E_B|E_D) = P(E_B ∩ E_D)/P(E_D) = (2/5)/(3/5) = 2/3

EX: The genetics (eye color) problem.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of this passage.

EX:

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EX: Drawing two consecutive balls without replacement.

An urn contains:

• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers: 2. (a) 1/3 (b) 1/5 (c) 1

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

EX: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets A_1, A_2, … partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (A_i ∩ A_j = ∅ for i ≠ j) and ∪_i A_i = Ω.

{A_i} is also said to be a partition of Ω.

1. Venn Diagram:

2. Expressing an arbitrary set B using a partition of Ω:

B = ∪_i (B ∩ A_i), where the sets B ∩ A_i are mutually exclusive.

Law of Total Probability

If {A_i} partitions Ω, then

P(B) = Σ_i P(B|A_i) P(A_i).

Example:

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}:

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B0|A1) = 1/8, P(B1|A1) = 7/8]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.


EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A1 when A0 is transmitted plus the probability that we decide A0 when A1 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: "When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η."

• Since we only apply hypothesis H_ij when B_j is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)    (9)

q1 P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The boxed expression is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
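A quick numerical check of this example is easy in MATLAB. The sketch below uses variable names of my own, with p01 = P(B0|A1) and p10 = P(B1|A0) read off the transition diagram above; it computes the a posteriori probabilities, the MAP decisions, and the resulting error probability:

% A posteriori probabilities and MAP rule for the binary channel
PA   = [2/5, 3/5];            % a priori probabilities P(A0), P(A1)
PBgA = [5/6, 1/6;             % P(B0|A0)  P(B1|A0)
        1/8, 7/8];            % P(B0|A1)  P(B1|A1)
Pjoint = diag(PA) * PBgA;     % joint probabilities P(Ai and Bj)
PB     = sum(Pjoint, 1);      % law of total probability: P(B0), P(B1)
PAgB   = Pjoint ./ (ones(2,1) * PB);   % Bayes' theorem: P(Ai|Bj)
[mx, idx] = max(PAgB);        % MAP: for each Bj, pick the Ai maximizing P(Ai|Bj)
Pe = sum((1 - mx) .* PB);     % error probability of the MAP rule
fprintf('Decide A%d for B0, A%d for B1; P(E) = %.4f\n', idx(1)-1, idx(2)-1, Pe);

For these numbers, the MAP rule decides A0 when B0 is received and A1 when B1 is received, with P(E) = 17/120 ≈ 0.142.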

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a constant c that does not depend on the symbol, so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins.

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of this passage.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is

n1 · n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker-dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008

The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement and with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's approximation).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
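As a quick check of both the MATLAB technique and Stirling's formula, here is a minimal sketch (n = 10 is my arbitrary choice) comparing n! computed via gamma(n+1) with the approximation above:

% Compare n! (via the gamma function) with Stirling's approximation
n = 10;
exact    = gamma(n + 1);                       % n! = gamma(n+1)
stirling = sqrt(2*pi) * n^(n + 0.5) * exp(-n); % Stirling's approximation
fprintf('n! = %.6g, Stirling = %.6g, ratio = %.4f\n', ...
        exact, stirling, exact/stirling);

For n = 10 the ratio is about 1.008, and it approaches 1 as n grows.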

4. Sampling without replacement and without ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently, "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# of ordered subsets)/(# of orderings) = n!/[k!(n − k)!].

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ (with k1 + k2 + · · · + kJ = n) is given by the multinomial coefficient

n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61.
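These counts overflow double-precision factorials, so in MATLAB it is safer to work with log-factorials via gammaln; a small sketch reproducing the two numbers above:

% Count group assignments using log-factorials to avoid overflow
logN  = gammaln(73) - 24*gammaln(4);       % log(72!/(3!)^24)
fprintf('ordered groups   ~ 10^%.1f\n', logN/log(10));
logNu = logN - gammaln(25);                % divide by 24! for unordered groups
fprintf('unordered groups ~ 10^%.1f\n', logNu/log(10));

This prints exponents of about 85.1 and 61.3, matching the values above.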

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? By the multinomial coefficient, 12!/(2!)^6 = 7,484,400.

– What is the probability of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034.

5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k).

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9-1.12 in the textbook.

Try to answer the following question; the answer is given so that you can check your work.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Answer: 0.0794)

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch.

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.
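A Monte Carlo check is a good sanity test here. The minimal MATLAB sketch below (variable names are mine) simulates all three strategies:

% Monte Carlo simulation of the Monty Hall problem (minimal sketch)
trials = 100000;
wins = zeros(1, 3);                      % wins for the three strategies
doors = 1:3;
for t = 1:trials
    car  = ceil(3*rand);                 % door hiding the car
    pick = ceil(3*rand);                 % contestant's initial pick
    % Host opens a goat door that is neither the pick nor the car
    avail = doors(doors ~= pick & doors ~= car);
    host  = avail(ceil(numel(avail)*rand));
    sw = doors(doors ~= pick & doors ~= host);   % the switch door
    wins(1) = wins(1) + (pick == car);           % strategy 1: never switch
    wins(2) = wins(2) + (sw == car);             % strategy 2: always switch
    if rand < 0.5                                % strategy 3: coin flip
        wins(3) = wins(3) + (sw == car);
    else
        wins(3) = wins(3) + (pick == car);
    end
end
fprintf('P(win): never %.3f, always %.3f, coin %.3f\n', wins/trials);

The estimates come out near 1/3, 2/3, and 1/2, respectively.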

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability space for N sequential experiments:

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments, Ω = Ω1 × Ω2 × · · · × ΩN.

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B1, B2, …, BN) (the joint probability of the subexperiment events).

For s.i. subexperiments,

P(A) = P(B1)P(B2) · · · P(BN).

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once with outcomes of success or failure.

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n, k).

• Thus, the probability of k successes on n trials is

p_n(k) = C(n, k) p^k (1 − p)^{n−k}.    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? p_3(3) = (1/2)^3 = 1/8.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (See the sketch below.)
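A minimal MATLAB sketch for the packet question; it just sums the binomial pmf in (11) for k = 0, 1, 2 and takes the complement:

% P(packet fails) = P(3 or more bit errors in 100 bits), p = 0.01
n = 100; p = 0.01;
Pok = 0;                          % P(0, 1, or 2 errors)
for k = 0:2
    Pok = Pok + nchoosek(n, k) * p^k * (1-p)^(n-k);
end
fprintf('P(decode failure) = %.4f\n', 1 - Pok);   % approx 0.0794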

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}.    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.    (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that, for large n,

b(k; n, p) ≈ [1/√(npq)] φ((k − np)/√(npq)),    (17)

where q = 1 − p.

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
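MATLAB has no built-in Q-function outside the Communications Toolbox, so it is conventional to build one from erfc. The sketch below uses that to compare an exact binomial sum with the continuity-corrected Gaussian approximation above; the particular n, p, α, β are arbitrary choices of mine:

% Gaussian approximation to a sum of binomial probabilities
n = 100; p = 0.2; q = 1 - p;
alpha = 15; beta = 25;                  % want P(alpha <= Sn <= beta)
Qf = @(x) 0.5*erfc(x/sqrt(2));          % Q-function built from erfc
exact = 0;
for k = alpha:beta                      % sum b(k; n, p) via log-factorials
    exact = exact + exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
                        + k*log(p) + (n-k)*log(q));
end
approx = Qf((alpha - n*p - 0.5)/sqrt(n*p*q)) - Qf((beta - n*p + 0.5)/sqrt(n*p*q));
fprintf('exact = %.4f, Gaussian approximation = %.4f\n', exact, approx);

Both values come out near 0.83 for these parameters.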

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, ….

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
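For the ARQ question, "success" is a correctly received packet, so p = 0.9; a direct numerical check of the formulas above:

% Geometric law for ARQ retransmissions: success probability p = 0.9
p = 0.9;
P2     = (1 - p)^(2-1) * p;     % exactly 2 transmissions: 0.1*0.9 = 0.09
Pmore2 = (1 - p)^2;             % more than 2 transmissions: 0.1^2 = 0.01
fprintf('P(2 transmissions) = %.3f, P(>2) = %.3f\n', P2, Pmore2);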

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 call/minute is a constant. The maximum possible number of calls/minute is then n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n, k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}.

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, …; and 0 otherwise.

EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval? (A numerical sketch follows.)

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
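For the switching-office example, α = λt = 90 calls/hr × (1/6) hr = 15, so a quick MATLAB check (using log-factorials for numerical safety) is:

% Poisson probability of 20 calls in 10 minutes at 90 calls/hour
lambda = 90;            % calls per hour
t      = 10/60;         % observation window in hours
alpha  = lambda * t;    % alpha = 15
k      = 20;
P20 = exp(k*log(alpha) - alpha - gammaln(k+1));   % alpha^k e^{-alpha} / k!
fprintf('P(N = 20) = %.4f\n', P20);               % approx 0.0418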

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1-2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = 0, z ≤ 0;
       = z²/(2b²), 0 < z ≤ b;
       = [b² − (2b − z)²/2]/b², b < z < 2b;
       = 1, z ≥ 2b.

4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)
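The answer is easy to sanity-check by simulation; a minimal MATLAB sketch (b = 1 is my arbitrary choice) compares the empirical CDF of Z = X + Y with the formula above at one test point:

% Empirical check of the dart CDF F_Z for Z = X + Y, with b = 1
b = 1;  n = 100000;
Z = b*rand(1, n) + b*rand(1, n);       % dart coordinates are uniform on [0, b]
z = 1.5;                               % test point with b < z < 2b
empirical = mean(Z <= z);
theory    = (b^2 - (2*b - z)^2/2) / b^2;
fprintf('P(Z <= %.1f): empirical %.4f, formula %.4f\n', z, empirical, theory);

Both values come out near 0.875 for this test point.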

RANDOM VARIABLES (RVs)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from the sample space Ω to the real line R.

EX:

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x0]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P(X ≤ x).

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., lim_{h→0, h>0} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from the above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].

EX: Examples:

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, or mixed) of Z. (continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3-2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:

hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v=-log(u);

Check the density by plotting a histogram:

hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes:

hist(w,100)

What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Proof omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 1/(b − a), a ≤ x ≤ b; and 0 otherwise.

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

P_X(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, …, 6; and 0 otherwise.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, …; and 0 otherwise.

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1 if s ∈ A; X = 0 if s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p, x = 1; 1 − p, x = 0; and 0 for x ≠ 0, 1.

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, …, n; and 0 otherwise.

• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf, P(X = x) vs. x]

• The CDFs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF, F_X(x) vs. x]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p, k = 1, 2, …; and 0 otherwise.

• The pmfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf, P(X = x) vs. x]

• The CDFs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF, F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, …; and 0 otherwise.

• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf, P(N = x) vs. x]

• The CDFs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF, F_N(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf, P(N = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:

f_X(x) = λ e^{−λx}, x ≥ 0; and 0 for x < 0.

• Distribution function:

F_X(x) = 1 − e^{−λx}, x ≥ 0; and 0 for x < 0.

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}.

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ [1/√(2πσ²)] exp{−(k − μ)²/(2σ²)}.

DEFN: A Gaussian random variable X has density function

f_X(x) = [1/√(2πσ²)] exp{−(x − μ)²/(2σ²)},

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt.

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt.

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ).
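In MATLAB it is conventional to build Q(x) from erfc using the relation Q(x) = (1/2) erfc(x/√2), which follows from the definitions above. A small sketch computing an interval probability (μ, σ, a, b are arbitrary values of mine):

% Interval probability for a Gaussian RV using the Q-function
Q = @(x) 0.5*erfc(x/sqrt(2));      % Q(x) = 1 - Phi(x)
mu = 5; sigma = 2;                 % example mean and standard deviation
a = 4; b = 9;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma);
fprintf('P(%g < X <= %g) = %.4f\n', a, b, P);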

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question; the answers are given below.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
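These tail probabilities follow directly from the Q-function, since P[|X − μ| > kσ] = 2Q(k); a one-line MATLAB check using the same erfc-based Q as above:

% P(|X - mu| > k*sigma) = 2*Q(k) for a Gaussian RV
Q = @(x) 0.5*erfc(x/sqrt(2));
k = 1:5;
fprintf('k = %d: P = %.3g\n', [k; 2*Q(k)]);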


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2, f_X(x) vs. x]

5. Chi-square RV

• Let X be a Gaussian random variable. Then Y = X² is a chi-square random variable.

• But, in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0; and 0 for y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;
Γ(3/2) = (1/2)√π.

• Note that for n = 2,

f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0; and 0 for y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
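The n = 2 special case is easy to verify numerically. A minimal MATLAB sketch (σ² = 1 is my assumption here) compares a histogram-based density estimate of X1² + X2² against the exponential density above:

% Sum of squares of two zero-mean, unit-variance Gaussians vs. exponential pdf
n = 100000;
Y = randn(1, n).^2 + randn(1, n).^2;     % central chi-square, 2 degrees of freedom
[counts, centers] = hist(Y, 100);
binw = centers(2) - centers(1);
fhat = counts / (n * binw);              % normalize histogram to a density
plot(centers, fhat, 'o', centers, 0.5*exp(-centers/2), '-');
legend('histogram estimate', '(1/2)e^{-y/2}');

The two curves should lie on top of each other, confirming the exponential form.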

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• For an even number of degrees of freedom, n = 2m, the CDF for a central chi-square RV can be found through repeated integration by parts to be

F_Y(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0; and 0 for y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf f_R(r) vs. r, σ² = 1]

• The CDF for a Rayleigh RV is

F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}

• The received signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
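A one-line MATLAB check of that last point (a sketch):

sigma = 1; N = 1e6;
r = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));   % magnitude of a complex Gaussian
hist(r,100)            % shape matches f_R(r) = (r/sigma^2)exp(-r^2/(2 sigma^2))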

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0. \end{cases}

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[X \le x \mid A] = \frac{P[\{X \le x\} \cap A]}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
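For checking your in-class work, here is the standard computation (my note, not part of the original page): with T the total wait and F_T(x) = 1 − e^{−x},

F_T(x \mid T > 2) = \frac{F_T(x) - F_T(2)}{1 - F_T(2)} = 1 - e^{-(x-2)}, \quad x > 2,

and the remaining wait R = T − 2 has F_R(r) = 1 − e^{−r}, r ≥ 0, i.e., it is exponential with λ = 1 again: the memoryless property of the exponential RV.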

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, …, An form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)

and

f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

P(A|X=x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
         = \lim_{\Delta x \to 0} \frac{F_X(x+\Delta x|A) - F_X(x|A)}{F_X(x+\Delta x) - F_X(x)}\, P(A)
         = \lim_{\Delta x \to 0} \frac{[F_X(x+\Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x+\Delta x) - F_X(x)]/\Delta x}\, P(A)
         = \frac{f_X(x|A)}{f_X(x)}\, P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X=x) = \frac{f_X(x|A)}{f_X(x)} P(A)
⇒ P(A|X=x) f_X(x) = f_X(x|A) P(A)
⇒ \int_{-\infty}^{\infty} P(A|X=x) f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x|A)\,dx \cdot P(A)
⇒ P(A) = \int_{-\infty}^{\infty} P(A|X=x) f_X(x)\,dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = \frac{P(A|X=x)}{P(A)} f_X(x) = \frac{P(A|X=x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X=t) f_X(t)\,dt}

Examples on board
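One quick numerical illustration (my own example, not one of the board examples): let X ~ Uniform(0,1) be the unknown bias of a coin and A = {the coin shows heads}, so P(A|X = x) = x. The continuous law of total probability gives P(A) = ∫₀¹ x dx = 1/2, which a Monte Carlo sketch confirms:

N = 1e6;
x = rand(1,N);               % draw the bias X
heads = rand(1,N) < x;       % flip once given that bias
P_A = mean(heads)            % ~ 0.5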


L2a-1

EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand, and be able to apply, all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX: An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
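A quick sanity check of the counting by inclusion–exclusion (a sketch; divide each count by 100 for the probabilities):

S = 28; F = 26; G = 16; SF = 12; SG = 4; FG = 6; SFG = 2;
atLeastOne = S + F + G - SF - SG - FG + SFG;             % |S u F u G| = 50
P_none = (100 - atLeastOne)/100                          % 0.50
exactlyOne = (S-SF-SG+SFG) + (F-SF-FG+SFG) + (G-SG-FG+SFG);
P_one = exactlyOne/100                                   % 0.32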

• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN: The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B.

L2a-2

Note that A = B is included in A ⊂ B:

A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).

2. A ∪ B (read "A union B" or "A or B")

DEFN: The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}.

3. A ∩ B (read "A intersect B" or "A and B")

DEFN: The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}.

4. Aᶜ or Ā (read "A complement")

DEFN: The complement of a set A in a sample space Ω is defined by

Aᶜ = {x | x ∈ Ω and x ∉ A}.

• Use Venn diagrams to convince yourself of:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (Aᶜ)ᶜ = A

3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ

4. A ∪ Aᶜ = Ω

5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)

6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)

L2a-3

• One of the most important relations for sets is when sets are mutually exclusive.

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a statement about the sets themselves (a set relationship), not about their probabilities.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1 EX Roll a fair 6-sided die and note the number on the top face

2 EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd

3 EX Toss a coin 3 times and note the sequence of outcomes

L2a-4

4 EX Toss a coin 3 times and note the number of heads

5 EX Roll a fair die 3 times and note the number of even results

6 EX Roll a fair 6-sided die 3 times and note the number of sixes

7 EX Simultaneously toss a quarter and a dime and note the outcomes

8 EX Simultaneously toss two identical quarters and note the outcomes

L2a-5

9 EX Pick a number at random between zero and one inclusive

10 EX Measure the lifetime of a computer chip

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.

L2a-6

EX Rolling a 6-sided die

There are many possible events such as

1 The number 3 occurs

2 An even number occurs

3 No number occurs

4 Any number occurs

bull EX Express the events listed in the example above in set notation

1

2

3

4

EX: The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,

P(a_i) = \frac{1}{|\Omega_3|} = \frac{1}{8} \text{ for any outcome } a_i.

Let A₂ = {exactly 2 heads occurs in 3 tosses}. Then

P(A_2) = \frac{|A_2|}{|\Omega_3|} = \frac{3}{8}.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, …, o_K} and |E| = K), then P(E) = K/|Ω|.

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(a_i) = \frac{1}{|\Omega|} = \frac{1}{4} \text{ for any outcome } a_i.

Then P(2 heads) would be 1/4, which contradicts the 3/8 found in part A: the outcomes of this experiment are not in fact equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞·p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = \frac{b - a}{B - A}.

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then any x with a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are subsets of the sample space to which we assign probability.

Examples based on previous examples of sample spaces

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L₄ = the event that the result is less than or equal to 4.
Express L₄ as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A₁ = the event that heads occurs on the first toss.
Express A₁ as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2. DEFN: F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M;
(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) if E ∈ M, then Eᶜ ∈ M.

L2-5

DEFN: A field M forms a σ-algebra or σ-field if, for any E₁, E₂, … ∈ M,

\bigcup_{i=1}^{\infty} E_i \in M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

\bigcap_{i=1}^{\infty} E_i \in M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A₁, A₂, … are mutually exclusive (Aᵢ ∩ Aⱼ = ∅ for i ≠ j), then P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A₁, A₂, …, Aₙ are pairwise mutually exclusive, then

P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{j=1}^{n} P(A_j) - \sum_{j<k} P(A_j \cap A_k) + \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
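The smoking problem from the Motivation is a direct application of corollaries (a) and (e); a MATLAB check (a sketch):

PA = 0.28; PB = 0.07; PAB = 0.05;     % A = cigarettes, B = cigars
P_smoke   = PA + PB - PAB             % corollary (e): 0.30
P_neither = 1 - P_smoke               % corollary (a): 0.70
P_cigars_only = PB - PAB              % 0.02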

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.

What if A and B are independent?

L3-2

Motivation (continued)

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if

P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relationship: a property of the probability measure, not of the sets themselves.

What type of relation is mutually exclusive? (A set relationship.)

Claim: If A and B are si events, then the following pairs of events are also si:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32; 2. (a) 2/3, (b) 1/3, (c) 3/4.

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are si events, then A and Bᶜ are also si events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)] = P(A ∩ B) + P(A ∩ Bᶜ) = P(A)P(B) + P(A ∩ Bᶜ).

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(Bᶜ).

• For N events to be si, we require

P(Aᵢ ∩ Aₖ) = P(Aᵢ)P(Aₖ) for all i ≠ k,
P(Aᵢ ∩ Aₖ ∩ Aₘ) = P(Aᵢ)P(Aₖ)P(Aₘ) for all distinct i, k, m,
⋮
P(A₁ ∩ A₂ ∩ ⋯ ∩ A_N) = P(A₁)P(A₂)⋯P(A_N).

(It is not sufficient to have P(Aᵢ ∩ Aₖ) = P(Aᵢ)P(Aₖ) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A₁, A₂, …, A_N ∈ A are pairwise statistically independent if and only if P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) ∀ i ≠ j.

(For example, toss a fair coin twice and let A = {first toss is H}, B = {second toss is H}, C = {exactly one H}. These events are pairwise si, but P(A ∩ B ∩ C) = 0 ≠ 1/8, so they are not si.)

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship.

– Statistical independence is a probabilistic relationship.

• How does mutual exclusiveness relate to statistical independence?

1. me ⇒ si

2. si ⇒ me

3. two events cannot be both si and me, except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A,
• E_B be the event that the selected computer is from manufacturer B,
• E_D be the event that the selected computer is defective.

Find:

P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5.

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(E_D|E_B) =

L3-7

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, with A ∩ B inside it]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1

2. Given A ∈ A,

P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{and } P(A \cap B) \ge 0,\ P(B) \ge 0 \ \Rightarrow\ P(A|B) \ge 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A \cup C|B) = \frac{P[(A \cup C) \cap B]}{P[B]} = \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]}

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A \cup C|B) = \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]} = P(A|B) + P(C|B).

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

P(E_D|E_A) = \frac{P(E_D \cap E_A)}{P(E_A)} = \frac{1/5}{2/5} = 1/2

P(E_D|E_B) = \frac{2/5}{3/5} = 2/3

P(E_A|E_D) = \frac{1/5}{3/5} = 1/3

P(E_B|E_D) = \frac{2/5}{3/5} = 2/3

L3-9

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wᵢ = the event that a white ball is chosen on draw i.

What is P(W₂|W₁)? (After a white ball is removed, 2 of the remaining 4 balls are white, so P(W₂|W₁) = 2/4 = 1/2.)

Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.
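The gambler answers follow from Bayes' theorem applied flip by flip; a MATLAB sketch of the sequential updates:

pF = 1/2;                                     % prior P(fair)
pF = (1/2)*pF / ((1/2)*pF + 1*(1-pF))         % (a) after one head: 1/3
pF = (1/2)*pF / ((1/2)*pF + 1*(1-pF))         % (b) after a second head: 1/5
pF = (1/2)*pF / ((1/2)*pF + 0*(1-pF))         % (c) a tail rules out the 2-headed coin: 1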

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are si, then

P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).

(When A and B are si, knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements is true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that

P(A|B) = \frac{P(A \cap B)}{P(B)} \ \Rightarrow\ P(A \cap B) = P(A|B)P(B)    (7)

and

P(B|A) = \frac{P(A \cap B)}{P(A)} \ \Rightarrow\ P(A \cap B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\,P(B|C)\,P(C)

Partitions and Total Probability

DEFN: A collection of sets A₁, A₂, … partitions the sample space Ω if and only if

⋃ᵢ Aᵢ = Ω and Aᵢ ∩ Aⱼ = ∅ for all i ≠ j.

{Aᵢ} is also said to be a partition of Ω.

L4-4

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Aᵢ} partitions Ω, then for any event B,

P(B) = \sum_i P(B|A_i)\,P(A_i).

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
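For reference, the two error probabilities can be computed with the law of total probability (a MATLAB sketch; compare with your worked answers on the next two pages):

pA0 = 2/5; pA1 = 3/5;                 % priors
p01 = 1/8; p10 = 1/6;                 % P(B1|A0) and P(B0|A1)
Pe_rule1 = pA0                        % always decide A1: error iff A0 sent, 0.40
Pe_rule2 = p01*pA0 + p10*pA1          % decide Aj when Bj received: 0.15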

L4-6

L4-7

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize the error probability.

• Let H_ij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A1 when A0 is transmitted, plus the probability that we decide A0 when A1 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis H_ij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide H_ij based on Bj; it has no direct knowledge of Ak. So P(H_ij|Ak ∩ Bj) = P(H_ij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)    (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)},

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
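A numerical check of this example (a sketch, taking p01 = P(B1|A0) and p10 = P(B0|A1)):

pA   = [2/5; 3/5];                 % priors P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];         % P(Bj|Ai): row i is Ai, column j is Bj
joint = PBgA .* (pA*[1 1]);        % P(Ai n Bj) by the chain rule
pB    = sum(joint)                 % total probability: [0.45 0.55]
PAgB  = joint ./ ([1;1]*pB)        % Bayes: P(A0|B0) = 7/9, P(A1|B1) = 10/11
% MAP rule: decide A0 when B0 is received and A1 when B1 is received.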

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, …, x_k) that are possible, if xᵢ is an element of a set with nᵢ distinct members, is n₁n₂⋯n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

L5a-2

The result is a k-tuple (x₁, x₂, …, x_k), where xᵢ ∈ A ∀ i = 1, 2, …, k.

Thus n₁ = n₂ = ⋯ = n_k = |A| ≡ n

⇒ the number of distinct ordered k-tuples is nᵏ.

2. Sampling without replacement & with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

The result is a k-tuple (x₁, x₂, …, x_k), where xᵢ ∈ A ∀ i = 1, 2, …, k.

Then n₁ = n, n₂ = n − 1, …, n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n−1)⋯(n−k+1) = \frac{n!}{(n-k)!}.

3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

The result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n−1)⋯(2)(1) = n!

Useful formula: For large n, n! \approx \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}.

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sampling without replacement & without ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n−k)!.

Now, how many different orders are there for any set of k objects? k!

Let C_{n,k} = the number of combinations of size k from a set of n objects:

C_{n,k} = \frac{\#\ \text{ordered subsets}}{\#\ \text{orderings}} = \frac{n!}{k!(n-k)!} = \binom{n}{k}

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of sizes k₁, k₂, …, k_J (with k₁ + ⋯ + k_J = n) is given by the multinomial coefficient

\frac{n!}{k_1!\,k_2!\cdots k_J!}

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}.

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)⁶, a multinomial coefficient.

– What is the probability of this occurring? \frac{12!}{(2!)^6\, 6^{12}} \approx 0.0034

5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

\binom{k+n-1}{k}
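Two of the poker-dice answers as quick MATLAB checks (a sketch using the counting rules above):

total = 6^5;
P_no_two_alike = (6*5*4*3*2)/total          % ordered sampling w/out replacement: 0.0926
P_full_house = 6*5*nchoosek(5,3)/total      % triple value, pair value, positions: 0.0386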

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX (by request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

Answer to the packet problem: ≈ 0.0794.

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
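A Monte Carlo sketch of the three strategies (expect win probabilities near 1/3, 2/3, and 1/2):

N = 1e5; wins = [0 0 0];
for t = 1:N
    car = ceil(3*rand); pick = ceil(3*rand);
    goats = setdiff(1:3, [pick car]);          % door(s) the host may open
    host = goats(ceil(numel(goats)*rand));     % host opens a goat door
    other = setdiff(1:3, [pick host]);         % the door you could switch to
    wins(1) = wins(1) + (pick == car);                    % never switch
    wins(2) = wins(2) + (other == car);                   % always switch
    if rand < 0.5, final = other; else final = pick; end
    wins(3) = wins(3) + (final == car);                   % coin-flip switch
end
wins/N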

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then an event A takes the form (B₁, B₂, …, B_N), where Bᵢ is an event in Ωᵢ.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B₁, B₂, …, B_N).

For si subexperiments,

P(A) = P(B₁)P(B₂)⋯P(B_N).

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is pᵏ(1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is \binom{n}{k}.

• Thus the probability of k successes on n trials is

p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
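A direct computation in MATLAB using (11) (a sketch; base functions only):

n = 100; p = 0.01;
P0to2 = arrayfun(@(k) nchoosek(n,k)*p^k*(1-p)^(n-k), 0:2);
P_fail = 1 - sum(P0to2)        % ~ 0.0794, matching the posted answer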

L5-4

• When n is large, the probability in (11) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xᵢ.

– For each Bernoulli trial i, let Xᵢ = 1 if trial i is a success, and Xᵢ = 0 otherwise.

– Let S_n = \sum_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable, known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right]    (13)

and distribution function

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt    (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\left(\frac{k - np}{\sqrt{npq}}\right)    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
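A quick MATLAB comparison of the continuity-corrected approximation against the exact binomial sum (a sketch with my own numbers n = 100, p = 0.5, α = 45, β = 55):

n = 100; p = 0.5; q = 1-p; a = 45; b = 55;
exact = sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), a:b));
Qf = @(x) 0.5*erfc(x/sqrt(2));
approx = Qf((a-n*p-0.5)/sqrt(n*p*q)) - Qf((b-n*p+0.5)/sqrt(n*p*q));
[exact approx]                          % both ~ 0.728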

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, …

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals that are 1/nth of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 1/2 (the average number of calls per minute) is a constant. The maximum possible number of calls/hour is now 60n.

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{ow} \end{cases}

EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
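A one-line check (a sketch): here α = λt = 90 × (10/60) = 15, so

alpha = 15; k = 20;
P = alpha^k * exp(-alpha) / factorial(k)    % ~ 0.0418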

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - \frac{(2b-z)^2}{2}}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
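A MATLAB sketch that checks this CDF empirically (my own plotting code):

b = 2; N = 1e5;
z = sort(b*rand(1,N) + b*rand(1,N));    % samples of Z = X + Y
plot(z, (1:N)/N); hold on               % empirical CDF
zz = linspace(0, 2*b, 201);
F = (zz<=b).*zz.^2/(2*b^2) + (zz>b).*(1-(2*b-zz).^2/(2*b^2));
plot(zz, F, 'r--')                      % the piecewise F_Z(z) above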

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R (the real line).

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x′ | x′ ∈ (−∞, x]}.

• Thus it is convenient to define a function that assigns probabilities to sets of this form:

DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P[X \le x] = P(\{\omega : X(\omega) \le x\}).

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) iff a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].

L6-5

EX Examples

L6-6

EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions. (For checking your work: v = −ln u is exponential with λ = 1, and w = √v is Rayleigh with 2σ² = 1.)

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

f_X(x) = \frac{d}{dx} F_X(x)

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt

3. \int_{-\infty}^{+\infty} f_X(x)\,dx = 1

L7-3

4. P(a < X ≤ b) = \int_{a}^{b} f_X(x)\,dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

\int_{-\infty}^{+\infty} g(x)\,dx = c, \quad 0 < c < +\infty,

then f_X(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}

L7-4

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

P(X = x) = F_X(x) − F_X(x⁻).

EX: Roll a fair 6-sided die; X = # on top face.

P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{ow} \end{cases}

EX: Flip a fair coin until heads occurs; X = # of flips.

P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{ow} \end{cases}

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p.

So

P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}

2. Binomial RV

• A binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{ow} \end{cases}

L7-6

• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]

• The CDFs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, F_X(x) vs. x]

L7-7

• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]

3. Geometric RV

• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = the number of trials required. Then the pmf of X is given by

P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{ow} \end{cases}

• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs, P(X = x) vs. x]

L7-8

• The CDFs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs, F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{ow} \end{cases}

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs, P(X = x) vs. x]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs, F_X(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf, P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter, λ.

L7-10

• Density function:

f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}

• Distribution function:

F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs f_X(x) and CDFs F_X(x) vs. x for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,

\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}

• Let σ² = npq and µ = np; then

P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}

L7-11

DEFN: A Gaussian random variable X has density function

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x-\mu)^2}{2\sigma^2}\right\}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t-\mu)^2}{2\sigma^2}\right\} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs f_X(x) and CDFs F_X(x) vs. x for the three (µ, σ²) pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.73×10⁻⁷, respectively.
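By symmetry of the Gaussian density, P[|X − μ| > kσ] = 2Q(k), so the answers in (b) can be checked numerically (a sketch, using the same erfc-based Q as before):

Q = @(x) 0.5*erfc(x/sqrt(2));
2*Q(1:5)    % approx 0.317, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7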


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,

  where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

  [Figure: Laplacian pdf f_X(x) for c = 2]
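One convenient generator for Laplacian samples (an illustrative sketch, not from the notes) uses the fact that the difference of two independent exponential RVs with parameter c has exactly the density above:

c = 2;
u1 = rand(1,1000000); u2 = rand(1,1000000);
x = (log(u2) - log(u1))/c;          % Exp(c) minus Exp(c) is Laplacian with parameter c
[n,t] = hist(x,100);
bar(t, n/(sum(n)*(t(2)-t(1))))      % compare with (c/2)*exp(-c*|x|)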

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But, in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then

  Y = \sum_{i=1}^{n} X_i^2     (18)

  is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

  f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)} y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

  where Γ(p) is the gamma function, defined as

  Γ(p) = \int_0^\infty t^{p-1} e^{-t} dt, p > 0
  Γ(p) = (p − 1)!, p an integer > 0
  Γ(1/2) = \sqrt{\pi}
  Γ(3/2) = \frac{1}{2}\sqrt{\pi}

• Note that for n = 2,

  f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

  which is an exponential density.

• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

  [Figure: chi-square density functions f_X(x) with n = 1, 2, 4, and 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found through repeated integration by parts. For n = 2m degrees of freedom (n even), it is

  F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
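A direct simulation of (18) is easy (a sketch; n = 2, σ² = 1, and the sample size are example choices; n = 2 also lets you check the exponential special case just noted):

n = 2; sigma = 1; N = 1000000;
X = sigma*randn(n,N);          % n si zero-mean Gaussian samples per column
Y = sum(X.^2, 1);              % one chi-square sample per column, n degrees of freedom
mean(Y)                        % approx n*sigma^2, positive even though each X_i is zero-mean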

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = \sqrt{Y}.

• Then R is a Rayleigh RV with pdf

  f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

  [Figure: Rayleigh pdf f_R(r) for σ² = 1]

• The CDF for a Rayleigh RV is

  F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
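The multipath model just described gives a direct way to simulate Rayleigh amplitudes (a sketch; σ² = 1 and the sample size are example choices):

sigma = 1; N = 1000000;
g = sigma*randn(1,N) + 1j*sigma*randn(1,N);  % complex Gaussian, si real and imaginary parts
r = abs(g);                                  % amplitude, sqrt(X^2 + Y^2)
[n,t] = hist(r,100);
bar(t, n/(sum(n)*(t(2)-t(1))))               % compare with (r/sigma^2)*exp(-r^2/(2*sigma^2))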

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
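Generating lognormal samples is immediate from the definition (a sketch; μ = 0 and σ = 1 are example values):

mu = 0; sigma = 1;                        % example values
l = exp(mu + sigma*randn(1,1000000));     % L = e^X with X Gaussian
hist(log(l),100)                          % ln(L) is Gaussian again, as a check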

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
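Before working part 2 analytically, a quick simulation suggests what to expect (a sketch): conditioned on having waited more than two minutes, the remaining wait is again exponential with λ = 1 (the memoryless property).

t = -log(rand(1,1000000));    % wait times, exponential with lambda = 1
w = t(t > 2) - 2;             % remaining wait, given you already waited 2 minutes
mean(w)                       % approx 1 minute, the same as the unconditional mean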

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then

  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)

  and

  f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

  P(A|X = x) = \lim_{\Delta x \to 0} P(A \,|\, x < X \le x + \Delta x)
             = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)} P(A)
             = \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x} P(A)
             = \frac{f_X(x|A)}{f_X(x)} P(A),

  if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

  P(A|X = x) = \frac{f_X(x|A)}{f_X(x)} P(A)
  \Rightarrow P(A|X = x) f_X(x) = f_X(x|A) P(A)
  \Rightarrow \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \; P(A)
  \Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx

  (Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

  f_X(x|A) = \frac{P(A|X = x)}{P(A)} f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}

Examples on board.
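One illustrative example of these two formulas (mine, not one of the board examples): suppose that, given X = x, an event A (say, a bit error) occurs with probability P(A|X = x) = x, where X is uniform on [0, 1] so that f_X(x) = 1 there. Then the continuous law of total probability gives

  P(A) = \int_0^1 x \cdot 1 \, dx = \frac{1}{2},

and the continuous Bayes' theorem gives f_X(x|A) = x \cdot 1/(1/2) = 2x for 0 ≤ x ≤ 1: observing an error shifts the density of X toward larger values.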



IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

• The CDF for a Gaussian RV with mean µ and variance σ² is

  F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

  Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

  P(a < X ≤ b) = P(X > a) − P(X > b)
               = Q((a − µ)/σ) − Q((b − µ)/σ)
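MATLAB has no built-in Q function, but Q(x) = (1/2) erfc(x/√2) in terms of the standard complementary error function, so interval probabilities are one-liners. A sketch (µ, σ, a, b are example values, not from the notes):

  % Sketch: P(a < X <= b) for X ~ N(mu, sigma^2) via the Q-function.
  Qf = @(x) 0.5*erfc(x/sqrt(2));        % Q(x) in terms of MATLAB's erfc
  mu = 5; sigma = 1; a = 4; b = 7;      % example values
  p = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)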

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. The answers are shown below.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
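Since P[|X − µ| > kσ] = 2Q(k) by symmetry, the tabulated answers above can be checked numerically; a sketch:

  % Sketch: verify the answers P[|X - mu| > k*sigma] = 2*Q(k), k = 1..5.
  Qf = @(x) 0.5*erfc(x/sqrt(2));
  k = 1:5;
  tails = 2*Qf(k)   % approx 0.317, 4.55e-2, 2.70e-3, 6.34e-5, 5.7e-7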


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,

  where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

  [Figure: "Laplacian pdf" — x vs. f_X(x)]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

  Y = Σ_{i=1}^{n} X_i²     (18)

  is a chi-square random variable with n degrees of freedom.

bull The chi-square random variable is usually classified as either central or non-central

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

  f_Y(y) = { [1 / (σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
           { 0,                                                    y < 0

  where Γ(p) is the gamma function, defined as

  Γ(p) = ∫₀^∞ t^{p−1} e^{−t} dt,  p > 0
  Γ(p) = (p − 1)!,  p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π

• Note that for n = 2,

  f_Y(y) = { [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
           { 0,                        y < 0

  which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
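The n = 2 case is easy to verify by simulation; the following sketch overlays a normalized histogram of X₁² + X₂² on the exponential density above.

  % Sketch: Monte Carlo check that X1^2 + X2^2 (s.i. N(0, sigma^2)) is
  % exponential with density exp(-y/(2*sigma^2))/(2*sigma^2).
  sigma = 1; N = 1e6;
  y = (sigma*randn(N,1)).^2 + (sigma*randn(N,1)).^2;
  [counts, centers] = hist(y, 100);
  binw = centers(2) - centers(1);
  plot(centers, counts/(N*binw), centers, exp(-centers/(2*sigma^2))/(2*sigma^2))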

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

  [Figure: "chi-square density functions with n degrees of freedom, σ² = 1" — x vs. f_X(x), curves labeled n = 1, 2, 4, 8]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

  F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
           { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

  f_R(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
           { 0,                      r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

  [Figure: "Rayleigh PDF (σ² = 1)" — r vs. f_R(r)]

• The CDF for a Rayleigh RV is

  F_R(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
           { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
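A quick simulation sketch of this claim: take the magnitude of complex Gaussian samples and compare the normalized histogram with the Rayleigh pdf above.

  % Sketch: |complex Gaussian| is Rayleigh (s.i. real/imag parts, each N(0, sigma^2)).
  sigma = 1; N = 1e6;
  r = abs(sigma*randn(N,1) + 1j*sigma*randn(N,1));
  [counts, centers] = hist(r, 100);
  binw = centers(2) - centers(1);
  plot(centers, counts/(N*binw), ...
       centers, (centers/sigma^2).*exp(-centers.^2/(2*sigma^2)))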

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)² / (2σ²)],  ℓ > 0
           { 0,                                          otherwise

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = d F_X(x|A) / dx

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
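The remaining wait time turns out to be memoryless, which can be checked by simulation; a sketch:

  % Sketch: given W > 2, the *remaining* wait of an exponential(1) RV
  % is again exponential(1) (memorylessness).
  w = -log(rand(1e6,1));       % exponential, lambda = 1
  rem = w(w > 2) - 2;          % remaining wait, conditioned on W > 2
  mean(rem)                    % approx 1, the unconditional mean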

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., A_n form a partition of Ω, then

  F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ··· + F_X(x|A_n)P(A_n)

and

  f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

             = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] } P(A)

             = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)

             = [f_X(x|A) / f_X(x)] P(A)

  if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

  P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [ ∫_{−∞}^{∞} f_X(x|A) dx ] P(A)

  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

  (Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

  f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board


bull One of the most important relations for sets is when sets are mutually exclusive

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if

  A ∩ B = ∅

This is very important, so I will emphasize it again: mutually exclusive is a set relationship.

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.

9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs: {H}

– Tails occurs: {T}

– Heads or Tails occurs: {H, T} = Ω

– Neither Heads nor Tails occurs: ∅

• Thus the event class can be written as F = {∅, {H}, {T}, Ω}.

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs.

2. An even number occurs.

3. No number occurs.

4. Any number occurs.

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times and the sequence of H and T is noted.

   Ω₃ = {HHH, HHT, HTH, ..., TTT}

   Assuming equally likely outcomes,

   P(a_i) = 1/|Ω₃| = 1/8 for any outcome a_i.

   Let A₂ = {exactly 2 heads occur in 3 tosses}. Then

   P(A₂) = |A₂|/|Ω₃| = 3/8.

   Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, ..., o_K} and |E| = K), then

   P(E) = K/|Ω|.

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.

   Ω = {0, 1, 2, 3}

   Assuming equally likely outcomes,

   P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.

   Then P(2 heads) = 1/4 under this model — but part A gives 3/8, so the outcomes in this sample space are not actually equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets on which cardinality is simplest to measure: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

  P([a, b]) = (b − a) / (B − A)

• Note that P({c}) = |[c, c]| / |Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

bull The elements of a probability space for a random experiment are

1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

   The sample space is also known as the universal set or reference set.

   Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then every x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

  – A closed interval [a, b] contains the endpoints a and b.

  – An open interval (a, b) does not contain the endpoints a and b.

  – An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

  – What is the probability that the result is even?

  – What is the probability that the result is ≤ 2?

DEFN: Events are sets of outcomes to which we assign probability.

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face. Let L₄ = the event that the result is less than or equal to 4. Express L₄ as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.

(c) EX: Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A₁ = event that heads occurs on the first toss. Express A₁ as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads. Let O = {odd number of heads occurs}. Express O as a set of outcomes.

2. DEFN: F is the event class, which is a collection of subsets of Ω (the events) to which we assign probability.

   • If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

   • If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ consisting only of subsets of Ω that has measure (probability) 2.

   • The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:

(a) ∅ ∈ M and Ω ∈ M

(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M

(c) If E ∈ M, then E^c ∈ M

DEFN: A field M forms a σ-algebra or σ-field if, for any E₁, E₂, ... ∈ M,

  ⋃_{i=1}^{∞} E_i ∈ M     (5)

• Note that by property (c) of fields, (5) can be interchanged with

  ⋂_{i=1}^{∞} E_i ∈ M     (6)

bull We require that the event class F be a σ-algebra

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A₁, A₂, ... are mutually exclusive events, then P(⋃_i A_i) = Σ_i P(A_i)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A₁, A₂, ..., A_n are pairwise mutually exclusive, then

    P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)

    Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

    Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = Σ_{j=1}^{n} P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A₁ ∩ A₂ ∩ ··· ∩ A_n)

    Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown below so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

Answers: (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

  P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0.

What if A and B are independent?

Motivation (continued)

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are given below.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

Answers: (a) 2/3 (b) 1/3 (c) 3/4.

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

  P(A ∩ B) = P(A) P(B)

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relationship.

What type of relation is mutually exclusive? A set relationship.

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B^c

– A^c and B

– A^c and B^c

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

  P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
       = P(A ∩ B) + P(A ∩ B^c)
       = P(A)P(B) + P(A ∩ B^c)

Thus

  P(A ∩ B^c) = P(A) − P(A)P(B)
             = P(A)[1 − P(B)]
             = P(A)P(B^c)

• For N events to be s.i., require

  P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k

  P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m) for all distinct i, k, m

  ⋮

  P(A₁ ∩ A₂ ∩ ··· ∩ A_N) = P(A₁)P(A₂)···P(A_N)

  (It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A₁, A₂, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

  – Mutual exclusiveness is a set relationship.

  – Statistical independence is a probabilistic relationship.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective

• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

  Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A

• E_B be the event that the selected computer is from manufacturer B

• E_D be the event that the selected computer is defective

Find:

  P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• EX: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

  Ω = {AD, AN, BD, BD, BN}

Find:

  – P(E_D|E_B) = 2/3

  – P(E_A|E_D) = 1/3

  – P(E_B|E_D) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

  [Venn diagram: events A and B overlapping inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

  [Venn diagram: B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

  P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0.

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

   P(A|B) = P(A ∩ B)/P(B),

   and P(A ∩ B) ≥ 0, P(B) > 0

   ⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

   P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
              = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

   Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

   P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
              = P(A|B) + P(C|B)

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

  P(E_D|E_A) = P(E_D ∩ E_A)/P(E_A) = (1/5)/(2/5) = 1/2

  P(E_D|E_B) = (2/5)/(3/5) = 2/3

  P(E_A|E_D) = (1/5)/(3/5) = 1/3

  P(E_B|E_D) = (2/5)/(3/5) = 2/3

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear below.

EX:

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Answers: (a) 1/3 (b) 1/5 (c) 1.

Ex: Drawing two consecutive balls without replacement

An urn contains:

• two black balls, numbered 1 and 2

• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W₂|W₁)?

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

  P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

  ⇒ P(A ∩ B) = P(A|B)P(B)     (7)

and P(B|A) = P(A ∩ B)/P(A)

  ⇒ P(A ∩ B) = P(B|A)P(A)     (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

  P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
               = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets A₁, A₂, ... partitions the sample space Ω if and only if

  ⋃_i A_i = Ω and A_i ∩ A_j = ∅ for all i ≠ j.

{A_i} is also said to be a partition of Ω.

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {A_i} partitions Ω, then for any event B,

  P(B) = Σ_i P(B|A_i) P(A_i)

Example:

EX: Binary Communication System

A transmitter (Tx) sends A₀ or A₁.

• A₀ is sent with prob. P(A₀) = 2/5

• A₁ is sent with prob. P(A₁) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B₀ or B₁.

• The channel is a noisy channel that determines the probabilities of transitions between {A₀, A₁} and {B₀, B₁}.

  – The transitions are conditional probabilities P(B_j|A_i) that can be specified on a channel transition diagram.

  [Channel transition diagram: P(B₀|A₀) = 7/8, P(B₁|A₀) = 1/8, P(B₀|A₁) = 1/6, P(B₁|A₁) = 5/6]

• The receiver must use a decision rule to determine from the output (B₀ or B₁) whether the symbol that was sent was most likely A₀ or A₁.

• Let's start by considering 2 possible decision rules:

1. Always decide A₁.

2. When B₀ is received, decide A₀; when B₁ is received, decide A₁.

Problem: Determine the probability of error for these two decision rules.


EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A₀ when A₁ is transmitted, plus the probability that we decide A₁ when A₀ is transmitted:

  P(E) = P(H₁₀ ∩ A₀) + P(H₁₁ ∩ A₀) + P(H₀₁ ∩ A₁) + P(H₀₀ ∩ A₁)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: "When B₀ is received, decide A₀ with probability η and decide A₁ with probability 1 − η."

• Since we only apply hypothesis H_ij when B_j is received, we can write

  P(E) = P(H₁₀ ∩ A₀ ∩ B₀) + P(H₁₁ ∩ A₀ ∩ B₁) + P(H₀₁ ∩ A₁ ∩ B₁) + P(H₀₀ ∩ A₁ ∩ B₀)

• We apply the chain rule to write this as

  P(E) = P(H₁₀|A₀ ∩ B₀)P(A₀ ∩ B₀) + P(H₁₁|A₀ ∩ B₁)P(A₀ ∩ B₁)
       + P(H₀₁|A₁ ∩ B₁)P(A₁ ∩ B₁) + P(H₀₀|A₁ ∩ B₀)P(A₁ ∩ B₀)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.

• Furthermore, let P(H₀₀|B₀) = q₀ and P(H₁₁|B₁) = q₁.

• Then

  P(E) = (1 − q₀)P(A₀ ∩ B₀) + q₁P(A₀ ∩ B₁) + (1 − q₁)P(A₁ ∩ B₁) + q₀P(A₁ ∩ B₀)

• The choices of q₀ and q₁ can be made independently, so P(E) is minimized by finding the q₀ and q₁ that minimize

  (1 − q₀)P(A₀ ∩ B₀) + q₀P(A₁ ∩ B₀)  and  q₁P(A₀ ∩ B₁) + (1 − q₁)P(A₁ ∩ B₁)

• We can further expand these using the chain rule as

  (1 − q₀)P(A₀|B₀)P(B₀) + q₀P(A₁|B₀)P(B₀)  and  q₁P(A₀|B₁)P(B₁) + (1 − q₁)P(A₁|B₁)P(B₁)

• P(B₀) and P(B₁) are constants that do not depend on our decision rule, so finally we wish to find q₀ and q₁ to minimize

  (1 − q₀)P(A₀|B₀) + q₀P(A₁|B₀)     (9)

  q₁P(A₀|B₁) + (1 − q₁)P(A₁|B₁)     (10)

• Both of these are linear functions of q₀ and q₁, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.

  ⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q₀ = 1 if P(A₀|B₀) > P(A₁|B₀), and setting q₀ = 0 otherwise.

• Similarly, (10) is minimized by setting q₁ = 1 if P(A₁|B₁) > P(A₀|B₁), and setting q₁ = 0 otherwise.

• These rules can be summarized as follows:

  Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).

• General technique:

If {A_i} is a partition of Ω and we know P(B_j|A_i) and P(A_i), then

  P(A_i|B_j) = P(B_j|A_i)P(A_i) / P(B_j) = P(B_j|A_i)P(A_i) / Σ_k P(B_j|A_k)P(A_k)

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A₀) = 2/5, p₀₁ = 1/8, and p₁₀ = 1/6.
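A numerical sketch of this calculation in MATLAB, assuming the labeling p01 = P(B₁|A₀) and p10 = P(B₀|A₁) (consistent with the transition diagram above):

  % Sketch: a posteriori probabilities for the binary channel example.
  PA0 = 2/5; PA1 = 3/5;
  p01 = 1/8; p10 = 1/6;            % assumed: p01 = P(B1|A0), p10 = P(B0|A1)
  PB0 = (1-p01)*PA0 + p10*PA1;     % law of total probability
  PB1 = p01*PA0 + (1-p10)*PA1;
  PA0gB0 = (1-p01)*PA0/PB0         % P(A0|B0) = 7/9 > 1/2: decide A0 on B0
  PA0gB1 = p01*PA0/PB1             % P(A0|B1) = 1/11 < 1/2: decide A1 on B1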

• Often we may not know the a priori probabilities P(A₀) and P(A₁).

• In this case, we typically treat the input symbols as equally likely: P(A₀) = P(A₁) = 0.5.

• Under this assumption, P(A₀|B₀) = cP(B₀|A₀) and P(A₁|B₀) = cP(B₀|A₁), so the decision rule becomes:

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear below.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is n₁n₂···n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering

   Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

   Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀ i = 1, 2, ..., k.

   Thus n₁ = n₂ = ··· = n_k = |A| ≡ n

   ⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering

   Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

   Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀ i = 1, 2, ..., k.

   Then n₁ = n, n₂ = n − 1, ..., n_k = n − (k − 1)

   ⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects

   Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

   Result is an n-tuple.

   The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

   Then n₁ = n, n₂ = n − 1, ..., n_{n−1} = 2, n_n = 1

   ⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

   Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's approximation).

   MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

4. Sampling without replacement & without ordering

   Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

   Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

   First determine the # of ordered subsets: n!/(n − k)!

   Now, how many different orders are there for any set of k objects? k!

   Let C(n,k) = the number of combinations of size k from a set of n objects:

   C(n,k) = (# ordered subsets)/(# orderings) = n! / [(n − k)! k!]

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, ..., k_J is given by the multinomial coefficient

  n! / (k₁! k₂! ··· k_J!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

  72! / (3!)²⁴ ≈ 1.3 × 10⁸⁵

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

  72! / [(3!)²⁴ (24!)] ≈ 2.1 × 10⁶¹ unordered groups
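Numbers like 72! are too large for naive evaluation, but the gamma(n+1) technique above extends to log form: log n! = gammaln(n+1) in MATLAB. A sketch of both counts:

  % Sketch: evaluate the group-assignment counts via log-factorials.
  logM = gammaln(73) - 24*gammaln(4);    % log( 72!/(3!)^24 )
  exp(logM)                              % approx 1.3e85 (ordered groups)
  exp(logM - gammaln(25))                % divide by 24!: approx 2.1e61 (unordered)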

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

  – How many total ways can this occur? 12!/(2!)⁶

  – What is the prob. of this occurring? [12!/(2!)⁶] / 6¹² ≈ 0.0034
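The same log-factorial trick evaluates this probability directly; a sketch:

  % Sketch: P(each face exactly twice in 12 rolls) = 12!/((2!)^6 * 6^12).
  p = exp(gammaln(13) - 6*gammaln(3) - 12*log(6))   % approx 0.0034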

5. Sampling with replacement and without ordering

   Question: How many ways are there to choose k objects from a set of n objects without regard to order?

   • The total number of arrangements is

     C(k + n − 1, k)

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears below.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

Answer: 0.0794.

EX (By Request): The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

  – Behind one door is a car.

  – Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

   • Then any A ∈ Ω is of the form (B₁, B₂, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that P(A) is consistent with the probabilities assigned by the subexperiments.

   For s.i. subexperiments,

   P(A) = P₁(B₁)P₂(B₂)···P_N(B_N)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n,k).

• Thus, the probability of k successes on n trials is

  p_n(k) = C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n     (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
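A sketch of the exact computation for the packet example (sum the binomial probabilities of 0, 1, or 2 errors and take the complement; gammaln keeps the binomial coefficients stable):

  % Sketch: exact P(packet fails) = P(3 or more errors in 100 bits), p = 0.01.
  n = 100; p = 0.01; k = 0:2;
  logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
           + k*log(p) + (n-k)*log(1-p);
  Pfail = 1 - sum(exp(logpmf))          % approx 0.0794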

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms get very small.

For large n we can approximate the binomial probabilities using either

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

  b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}     (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

  – Consider a sequence of variables X_i.

  – For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

  – Let S_n = Σ_{i=1}^{n} X_i.

  Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

  φ(x) = (1/√(2π)) exp[−x²/2]     (13)

and distribution function

  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt     (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

  Q(x) = 1 − Φ(x)     (15)

       = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt     (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

• By the DeMoivre-Laplace theorem, it can be shown that for large n,

  b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))     (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

  P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]

                 = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
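For the 100-bit packet example, the Gaussian approximation with the 0.5 continuity correction can be compared against the exact 0.0794; a sketch (note that np = 1 is small here, so the fit is rough — this is really the Poisson regime):

  % Sketch: Gaussian approximation to P[3 <= S_100 <= 100] with continuity correction.
  n = 100; p = 0.01; q = 1 - p;
  Qf = @(x) 0.5*erfc(x/sqrt(2));
  Papprox = Qf((3 - n*p - 0.5)/sqrt(n*p*q)) - Qf((100 - n*p + 0.5)/sqrt(n*p*q))
  % approx 0.066, vs. the exact 0.0794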

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

  p(m) = (1 − p)^{m−1} p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 calls/minute is a constant. Then the maximum possible number of calls per minute is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

  C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

  C(n,k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

  lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

  which is the Poisson law.

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

  – calls coming in to a switching center

  – packets arriving at a queue in a network

  – processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

  – # of misprints on a group of pages in a book

  – # of people in a community that live to be 100 years old

  – # of wrong telephone numbers that are dialed in a day

  – # of α-particles discharged in a fixed period of time from some radioactive material

  – # of earthquakes per year

  – # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

  P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, 2, ...
             { 0,                  otherwise

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
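A sketch of the computation: 10 minutes is 1/6 hour, so α = λt = 90/6 = 15, and

  % Sketch: P(N = 20) for a Poisson RV with alpha = 15, evaluated stably.
  alpha = 90/6; k = 20;
  Pk = exp(-alpha + k*log(alpha) - gammaln(k+1))    % approx 0.042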

• The Gaussian approximation can also be applied to Poisson probabilities — see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

   Ans:

   F_Z(z) = { 0,                         z ≤ 0
            { z²/(2b²),                  0 < z ≤ b
            { [b² − (2b − z)²/2] / b²,   b < z < 2b
            { 1,                         z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping from the sample space Ω to the real line R.

EX

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

bull We will only assign probabilities to Borel sets of the real line

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

  F_X(x) = P(X ≤ x) = P({ξ : X(ξ) ≤ x})

• F_X(x) is a prob. measure.

bull Properties of FX

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing; i.e., if a ≤ b, then F_X(a) ≤ F_X(b).

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right; i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

  F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]

EX: Examples

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

  u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

  hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

  v = -log(u);

Check the density of the result by plotting a histogram:

  hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

  w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
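To make the identification quantitative, the histogram can be normalized into a density estimate and overlaid on a candidate pdf; a sketch for the v = −log(u) step (the candidate is the exponential(1) density):

  % Sketch: compare the normalized histogram of v with the exponential(1) pdf.
  u = rand(1, 1e6);
  v = -log(u);
  [counts, centers] = hist(v, 100);
  binw = centers(2) - centers(1);
  plot(centers, counts/(numel(v)*binw), centers, exp(-centers))
  legend('histogram estimate', 'exponential(1) pdf')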

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

  f_X(x) = d F_X(x) / dx

bull Properties

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx,  a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

   ∫_{−∞}^{+∞} g(x) dx = c,  0 < c < +∞,

   then f_X(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

  f_X(x) = { 0,           x < a
           { 1/(b − a),   a ≤ x ≤ b
           { 0,           x > b

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

  P_X(x) = P(X = x)

EX: Roll a fair 6-sided die; X = # on the top face.

  P(X = x) = { 1/6,  x = 1, 2, ..., 6
             { 0,    otherwise

EX: Flip a fair coin until heads occurs; X = # of flips.

  P(X = x) = { (1/2)^x,  x = 1, 2, ...
             { 0,        otherwise


IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

  X = { 1,  s ∈ A
      { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

  P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

  P(X = x) = { p,      x = 1
             { 1 − p,  x = 0
             { 0,      x ≠ 0, 1

2 Binomial RV

• A binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

  P[X = k] = { C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
             { 0,                          otherwise

• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

  [Figure: "Binomial (10, 0.2) pmf" and "Binomial (10, 0.5) pmf" — x vs. P(X = x)]

• The cdfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

  [Figure: "Binomial (10, 0.2) CDF" and "Binomial (10, 0.5) CDF" — x vs. F_X(x)]

• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

  [Figure: "Binomial (100, 0.2) pmf" — x vs. P(X = x)]

3 Geometric RV

• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

  P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, ...
             { 0,                otherwise

• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

  [Figure: "Geometric (0.1) pmf" and "Geometric (0.7) pmf" — x vs. P(X = x)]

• The cdfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.

  [Figure: "Geometric (0.1) CDF" and "Geometric (0.7) CDF" — x vs. F_X(x)]

4 Poisson RV

bull Models events that occur randomly in space or time

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

  P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, 2, ...
             { 0,                  otherwise

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working test problems.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
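Answer (b) reduces to P[|X − μ| > kσ] = 2Q(k), independent of μ and σ², so a Monte Carlo check with standard Gaussians is easy; a sketch (the k = 4 and k = 5 estimates are noisy even at this sample size):

% Estimate P(|X| > k) for X standard Gaussian and compare with 2Q(k)
N = 1e7;
x = randn(1,N);
for k = 1:5
    fprintf('k=%d: estimate %.3g, 2Q(k) = %.3g\n', ...
            k, mean(abs(x) > k), erfc(k/sqrt(2)));
end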

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp(−c|x|), −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf fX(x) with c = 2, plotted on −3 ≤ x ≤ 3.]

5 Chi-square RV

bull Let X be a Gaussian random variable

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

bull The chi-square random variable is usually classified as either central or non-central

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
      = 0,                                                  y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
      = 0,                        y < 0,

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions fX(x) with n = 1, 2, 4, 8 degrees of freedom and σ² = 1, plotted on 0 ≤ x ≤ 14.]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
      = 0,                                                   y < 0
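For even n this CDF is easy to sanity-check by simulation, building Y from s.i. Gaussians as in (18); a minimal sketch with n = 4, σ² = 1, and an arbitrary test point y = 3:

% Central chi-square, n degrees of freedom, as a sum of squared Gaussians
n = 4; m = n/2; sigma2 = 1;
Y = sum(randn(n,100000).^2, 1);   % 100000 samples of Y
y = 3;
empirical = mean(Y <= y);
k = 0:m-1;
formula = 1 - exp(-y/(2*sigma2))*sum((y/(2*sigma2)).^k ./ factorial(k));
[empirical formula]               % the two values should agree closely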

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
      = 0,                      r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf fR(r) with σ² = 1, plotted on 0 ≤ r ≤ 5.]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
      = 0,                   r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
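That model also gives a direct way to generate Rayleigh samples in MATLAB: take the magnitude of a complex Gaussian with s.i. real and imaginary parts; a minimal sketch with σ² = 1:

% Rayleigh samples as the magnitude of a complex Gaussian
x = randn(1,100000);              % real part
y = randn(1,100000);              % imaginary part
r = sqrt(x.^2 + y.^2);            % Rayleigh, sigma^2 = 1
hist(r,100)                       % shape should match r*exp(-r^2/2)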

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)],  ℓ ≥ 0
      = 0,                                        ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = d/dx FX(x|A)

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, . . . , An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] } P(A)

           = lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)

           = [fX(x|A) / fX(x)] P(A)

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x) / P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board
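A standard example of this type (an illustration, not one of the board examples): let X ~ uniform on [0, 1] be the unknown heads-probability of a coin, and let A = {heads on one flip}, so P(A|X = x) = x. The continuous law of total probability gives P(A) = ∫₀¹ x dx = 1/2, and the continuous Bayes' theorem gives fX(x|A) = x/(1/2) = 2x on [0, 1]. A numerical check in MATLAB:

% Continuous law of total probability for the random-coin example
PA = integral(@(x) x.*1, 0, 1)    % P(A) = 1/2 (use quad on older MATLAB)
xx = linspace(0,1,101);
plot(xx, xx/PA)                   % posterior density f_X(x|A) = 2x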


L2a-4

4 EX Toss a coin 3 times and note the number of heads

5 EX Roll a fair die 3 times and note the number of even results

6 EX Roll a fair 6-sided die 3 times and note the number of sixes

7 EX Simultaneously toss a quarter and a dime and note the outcomes

8 EX Simultaneously toss two identical quarters and note the outcomes

L2a-5

9 EX Pick a number at random between zero and one inclusive

10 EX Measure the lifetime of a computer chip

MORE ON EVENTS

bull Recall that an event A is a subset of the sample space Ω

• The set of all events is denoted F or A.

EX Flipping a coin

The possible events are

ndash Heads occurs

ndash Tails occurs

ndash Heads or Tails occurs

ndash Neither Heads nor Tails occurs

• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.

L2a-6

EX Rolling a 6-sided die

There are many possible events such as

1 The number 3 occurs

2 An even number occurs

3 No number occurs

4 Any number occurs

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX Consider the following examples

A A coin is tossed 3 times and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, . . . , TTT}
Assuming equally likely outcomes,

P(aᵢ) = 1/|Ω₃| = 1/8 for any outcome aᵢ.

Let A₂ = exactly 2 heads occurs in 3 tosses. Then A₂ = {HHT, HTH, THH}, so P(A₂) = 3/8.

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, . . . , oK} and |E| = K), then P(E) = K/|Ω|.

L2a-8

B Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(aᵢ) = 1/|Ω| = 1/4 for any outcome aᵢ.

Then P(exactly 2 heads) = 1/4, which contradicts the value 3/8 found above; the four outcomes in this Ω are not equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN

For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A)

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞.

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(Does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2
Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1 What percentage of males smoke

2 What percentage of males smoke neither cigars nor cigarettes

3 What percentage smoke cigars but not cigarettes

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

bull The elements of a probability space for a random experiment are

1

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties discrete or continuous

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence withthe integers

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are in an interval I, then every x with a ≤ x ≤ b is in I.

• Intervals can be either open, closed, or half-open.

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX

Example of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN
Events are subsets of the sample space (sets of outcomes) to which we assign probability.

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}.

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events: ∅, {E}, {O}, {E, O}.

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}.

(d) EX

Toss a coin 3 times and note the number of heads. Let O = an odd number of heads occurs. Express O as a set of outcomes: O = {1, 3}.

2

DEFN

F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN

A field M forms a σ-algebra, or σ-field, if for any E1, E2, . . . ∈ M,

∪_{i=1}^{∞} Ei ∈ M   (5)

• Note that by property (c) of fields, (5) can be interchanged with

∩_{i=1}^{∞} Ei ∈ M   (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one σ-algebra possible.

• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN

The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A1, A2, . . . are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j), then P(∪_{i} Ai) = Σ_{i} P(Ai)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, . . . , An are pairwise mutually exclusive, then

P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪_{k=1}^{n} Ak) = Σ_{j=1}^{n} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . . Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
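As an illustration of corollaries (a) and (e), here is one way to set up the Motivation problem, with A = smokes cigarettes and B = smokes cigars:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.28 + 0.07 − 0.05 = 0.30, so 30 percent smoke.
P(neither) = P((A ∪ B)ᶜ) = 1 − 0.30 = 0.70, so 70 percent smoke neither.
P(cigars but not cigarettes) = P(B) − P(A ∩ B) = 0.07 − 0.05 = 0.02, so 2 percent.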

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

1

STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relationship, determined by the probability measure.

What type of relation is mutually exclusive? A set relationship.

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ), and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., we require

P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k,
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m,
. . .
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, . . . , AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events (one way to work these out is sketched after the list):

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl
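One way to fill in the board work, using the s.i. of the children's sexes (each specific sequence of 5 sexes has probability (1/2)⁵; compare the footnote answers given earlier):

1. P(all same sex) = P(all boys) + P(all girls) = (1/2)⁵ + (1/2)⁵ = 1/16
2. P(3 eldest are boys, others girls) = (1/2)⁵ = 1/32
3. P(2 oldest are girls) = (1/2)² = 1/4
4. P(at least one girl) = 1 − P(all boys) = 1 − (1/2)⁵ = 31/32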

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relationship: A ∩ B = ∅.

– Statistical independence is a probabilistic relationship: P(A ∩ B) = P(A)P(B).

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, (3) is true.

Consider the following

L3-6

Conditional Probability Consider an example

EX: Defective computers in a lab

A computer lab contains

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD1, BD2, BN}

Let

bull EA be the event that the selected computer is from manufacturer A

bull EB be the event that the selected computer is from manufacturer B

bull ED be the event that the selected computer is defective

Find

P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(ED|EA) (the conditional probability of event ED given that event EA occurred). Here P(ED|EA) = 1/2.

Ω = {AD, AN, BD1, BD2, BN}

Find:

– P(ED|EB) = 2/3

L3-7

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Figure: Venn diagram of events A and B inside sample space S.]

If we condition on B having occurred, then we can form the new Venn diagram:

[Figure: Venn diagram restricted to B, showing the region A ∩ B.]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN

For A isin A B isin A the conditional probability of event A given that event Boccurred is

P (A|B) =P (A capB)

P (B) for P (B) gt 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms

1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B), and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check the previous example, Ω = {AD, AN, BD1, BD2, BN}:

P(ED|EA) = 1/2

P(ED|EB) = 2/3

P(EA|ED) = 1/3

P(EB|ED) = 2/3

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

2

Ex: Drawing two consecutive balls without replacement.

An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = the event that a white ball is chosen on draw i.

What is P(W2|W1)? After a white ball is removed, 2 of the 4 remaining balls are white, so P(W2|W1) = 2/4 = 1/2.

2. (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P (A capB cap C) =P (A capB cap C)

P (B cap C)middot P (B cap C)

P (C)middot P (C)

= P (A|B cap C)P (B|C)P (C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, . . . partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and ∪_{i} Ai = Ω.

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = Σ_{i} P(B|Ai)P(Ai)

Example:

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Figure: channel transition diagram with P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6.]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0) and   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_{k} P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins.

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Thus n1 = n2 = · · · = nk = |A| ≡ n.

⇒ the number of distinct ordered k-tuples is nᵏ.

2. Sampling without replacement & with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1).

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, . . . , n_{n−1} = 2, nn = 1.

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n,
n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).

L5a-3

4. Sampling without replacement & without ordering.
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n(n − 1) · · · (n − k + 1).

Now, how many different orders are there for any set of k objects? k!

Let Cnk = the number of combinations of size k from a set of n objects:

Cnk = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!) = (n choose k)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient

n!/(k1! k2! · · · kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)²⁴ ≈ 1.3 × 10⁸⁵

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)²⁴ (24!)] ≈ 2.1 × 10⁶¹
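Counts like these overflow factorial in double precision, so in MATLAB it is natural to work with log-factorials via gammaln (since gammaln(n+1) = ln n!); a sketch checking the two numbers above:

% log10 of 72!/(3!)^24 and of 72!/((3!)^24 * 24!)
logN1 = gammaln(73) - 24*gammaln(4);   % ln[72!/(3!)^24]
logN2 = logN1 - gammaln(25);           % also divide by 24!
[logN1 logN2]/log(10)                  % approx [85.1 61.3]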

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6.

– How many total ways can this occur? 12!/(2!)⁶, a multinomial coefficient.

– What is the probability of this occurring? [12!/(2!)⁶]/6¹² ≈ 3.4 × 10⁻³.

5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects, without regard to order?

• The total number of arrangements is

(k + n − 1 choose k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

4

EX (by request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

4. 0.0794

L5-2

2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that, for s.i. subexperiments,

P(A) = P1(B1) P2(B2) · · · PN(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is (n choose k).

• Thus the probability of k successes on n trials is

pn(k) = (n choose k) p^k (1 − p)^(n−k)   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
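A sketch of the packet computation in MATLAB, summing the pmf (11) directly (binocdf from the Statistics Toolbox would also work but is not needed):

% P(3 or more errors in n = 100 bits with bit error prob p = 0.01)
n = 100; p = 0.01; k = 0:2;
P2 = sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k));
Pfail = 1 - P2               % approx 0.0794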

L5-4

• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, . . .

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
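For instance, with packet error probability 0.1, each transmission is a Bernoulli trial with success probability p = 0.9, so the geometric law gives

P(exactly 2 transmissions) = (1 − p)^(2−1) p = (0.1)(0.9) = 0.09
P(more than 2 transmissions) = (1 − p)² = (0.1)² = 0.01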

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 per minute is constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

(n choose k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, . . .
         = 0,                  otherwise

EX: Phone calls arriving at a switching office. If a switching center receives 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
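A sketch of the computation: λ = 90 calls/hr = 1.5 calls/min, so α = λt = 15 for a 10-minute interval, and

P(N = 20) = (15²⁰/20!) e^(−15) ≈ 0.0418

or, in MATLAB:

alpha = 15;
p = alpha^20/factorial(20)*exp(-alpha)   % approx 0.0418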

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0,                          z ≤ 0
      = z²/(2b²),                   0 < z ≤ b
      = [b² − (2b − z)²/2]/b²,      b < z < 2b
      = 1,                          z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})

• FX(x) is a probability measure.

bull Properties of FX

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If FX(x) is a continuous function of x, then F(x) = F(x⁻).

• If FX(x) is not a continuous function of x, then from the above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to show how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x):

fX(x) = d/dx FX(x)

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Proof omitted.

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = F(x) − F(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0,          x < a
      = 1/(b − a),  a ≤ x ≤ b
      = 0,          x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

PX(x) = P(X = x)

EX: Roll a fair 6-sided die; X = # on the top face.

P(X = x) = 1/6,  x = 1, 2, . . . , 6
         = 0,    otherwise

EX: Flip a fair coin until heads occurs; X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, . . .
         = 0,        otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
  = 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,      x = 1
         = 1 − p,  x = 0
         = 0,      x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n
         = 0,                               otherwise

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, plotted for x = 0, 1, . . . , 10; vertical axis P(X = x).]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, plotted for x = 0, 1, . . . , 10; vertical axis FX(x).]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, plotted for x = 0 to 100; vertical axis P(X = x).]

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p,  k = 1, 2, . . .
         = 0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs, plotted for x = 1, . . . , 11; vertical axis P(X = x).]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; vertical axis FX(x).]

4 Poisson RV

bull Models events that occur randomly in space or time

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, . . .
         = 0,                  otherwise

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^-2, 2.70 × 10^-3, 6.34 × 10^-5, and 5.75 × 10^-7, respectively.
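These answers are easy to check numerically: P[|X − μ| > kσ] = 2Q(k) regardless of μ and σ², so a one-line MATLAB check is

  Q = @(x) 0.5 * erfc(x / sqrt(2));
  2 * Q(1:5)    % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07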


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs f_X(x)]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then

  Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

  f_Y(y) = { y^{(n/2)−1} e^{−y/(2σ²)} / (σ^n 2^{n/2} Γ(n/2)),   y ≥ 0
           { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

  Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt, p > 0
  Γ(p) = (p − 1)!, p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π

• Note that for n = 2,

  f_Y(y) = { (1/(2σ²)) e^{−y/(2σ²)},   y ≥ 0
           { 0,                        y < 0

which is an exponential density.

• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found through repeated integration by parts; for an even number of degrees of freedom, n = 2m, it is

  F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,   y ≥ 0
           { 0,                                                   y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

  f_R(r) = { (r/σ²) e^{−r²/(2σ²)},   r ≥ 0
           { 0,                      r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1); r vs f_R(r)]

• The CDF for a Rayleigh RV is

  F_R(r) = { 1 − e^{−r²/(2σ²)},   r ≥ 0
           { 0,                   r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
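A minimal MATLAB sketch of exactly this construction (σ² = 1 and the sample size are assumptions): build Rayleigh samples as the magnitude of two si zero-mean Gaussians and compare the histogram with the pdf above.

  N = 100000;
  X = randn(1, N); Y = randn(1, N);        % si N(0,1) components
  R = sqrt(X.^2 + Y.^2);                   % Rayleigh samples, sigma^2 = 1
  [counts, centers] = hist(R, 100);
  binw = centers(2) - centers(1);
  bar(centers, counts / (N * binw));       % normalized histogram
  hold on;
  r = 0:0.01:5;
  plot(r, r .* exp(-r.^2 / 2), 'r');       % f_R(r) = r e^{-r^2/2}
  hold off;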

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = { (1/(ℓ√(2πσ²))) exp(−(ln ℓ − μ)²/(2σ²)),   ℓ ≥ 0
           { 0,                                         ℓ < 0

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A) / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
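Before working this analytically, it can help to see the answer empirically. A minimal MATLAB Monte Carlo sketch (the sample size and the extra wait t = 1 minute are assumptions) compares the conditional probability of waiting more than t additional minutes with the unconditioned probability:

  lambda = 1; N = 1000000;
  x = -log(rand(1, N)) / lambda;       % exponential wait times
  t = 1;                               % assumed additional wait, minutes
  pCond = sum(x > 2 + t) / sum(x > 2)  % P(X > 2 + t | X > 2)
  pUncond = sum(x > t) / N             % P(X > t): same value (memoryless)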

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ⋯ + F_X(x|A_n)P(A_n)

and

  f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

  P(A | X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
               = lim_{Δx→0} [F_X(x + Δx | A) − F_X(x | A)] / [F_X(x + Δx) − F_X(x)] · P(A)
               = lim_{Δx→0} {[F_X(x + Δx | A) − F_X(x | A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
               = [f_X(x | A) / f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

  P(A | X = x) = [f_X(x|A) / f_X(x)] P(A)
  ⇒ P(A | X = x) f_X(x) = f_X(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A | X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
  ⇒ P(A) = ∫_{−∞}^{∞} P(A | X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

  f_X(x|A) = [P(A | X = x) / P(A)] f_X(x)
           = P(A | X = x) f_X(x) / ∫_{−∞}^{∞} P(A | X = t) f_X(t) dt

Examples on board.
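As one illustration (not one of the board examples; the mixture model below is invented for this sketch), MATLAB's trapz can verify the continuous law of total probability numerically. Let A occur with P(A) = 1/2, with X ~ N(1, 1) given A and X ~ N(−1, 1) given A^c:

  x = -10:0.001:10;
  fA  = exp(-(x - 1).^2 / 2) / sqrt(2*pi);   % f_X(x|A)
  fAc = exp(-(x + 1).^2 / 2) / sqrt(2*pi);   % f_X(x|A^c)
  fX  = 0.5*fA + 0.5*fAc;                    % total probability for densities
  PA_given_x = 0.5*fA ./ fX;                 % continuous Bayes: P(A|X = x)
  PA = trapz(x, PA_given_x .* fX)            % integrates back to P(A) = 0.5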


9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.

EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation:

1. {3}

2. {2, 4, 6}

3. ∅

4. Ω = {1, 2, 3, 4, 5, 6}

EX: The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
   Ω_3 = {HHH, HHT, HTH, …, TTT}
   Assuming equally likely outcomes,

   P(a_i) = 1/|Ω_3| = 1/8 for any outcome a_i

   Let A_2 = {exactly 2 heads occurs in 3 tosses}. Then P(A_2) = |A_2|/|Ω_3| = 3/8.

   Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, …, o_K} and |E| = K), then P(E) = K/|Ω|.

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
   Ω = {0, 1, 2, 3}
   Assuming equally likely outcomes,

   P(a_i) = 1/|Ω| = 1/4 for any outcome a_i

   Then …

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome had some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets on which cardinality is simplest to measure: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

  P([a, b]) = (b − a)/(B − A)

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then any x such that a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are sets of outcomes to which we assign probabilities.

Examples, based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
    Let L4 = the event that the result is less than or equal to 4.
    Express L4 as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
    Let E = even outcome, O = odd outcome.
    List all events.

(c) EX: Toss a coin 3 times and note the sequence of outcomes.
    Let H = heads, T = tails.
    Let A_1 = event that heads occurs on first toss.
    Express A_1 as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
    Let O = odd number of heads occurs.
    Express O as a set of outcomes.

2. DEFN: F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M;
(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) if E ∈ M, then E^c ∈ M.

DEFN: A field M forms a σ-algebra, or σ-field, if for any E_1, E_2, … ∈ M,

  ⋃_{i=1}^{∞} E_i ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

  ⋂_{i=1}^{∞} E_i ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN: The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A_1, A_2, … are mutually exclusive (A_i ∩ A_j = ∅ for i ≠ j), then P(⋃_i A_i) = Σ_i P(A_i)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, …, A_n are pairwise mutually exclusive, then

  P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)

  Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

  Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = Σ_j P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ⋯ + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n)

  Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, …. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
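As a quick numerical check of corollaries (a) and (e) on the Motivation problem (a minimal MATLAB sketch, treating the given percentages as probabilities):

  Pcig = 0.28; Pcigar = 0.07; Pboth = 0.05;
  Psmoke     = Pcig + Pcigar - Pboth   % corollary (e): 0.30
  Pneither   = 1 - Psmoke              % corollary (a): 0.70
  PcigarOnly = Pcigar - Pboth          % cigars but not cigarettes: 0.02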

EEL 5544 Noise in Linear Systems, Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

What if A and B are independent?

Motivation (continued)

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (si) if and only if

  P(A ∩ B) = P(A)P(B)

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relation.

What type of relation is mutual exclusiveness?

Claim: If A and B are si events, then the following pairs of events are also si:

– A and B^c

– A^c and B

– A^c and B^c

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32.  2. (a) 2/3, (b) 1/3, (c) 3/4.

Proof: It is sufficient to consider the first case, i.e., show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

  P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
       = P(A ∩ B) + P(A ∩ B^c)
       = P(A)P(B) + P(A ∩ B^c)

Thus

  P(A ∩ B^c) = P(A) − P(A)P(B)
             = P(A)[1 − P(B)]
             = P(A)P(B^c)

• For N events to be si, we require

  P(A_i ∩ A_k) = P(A_i)P(A_k)
  P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
  ⋮
  P(A_1 ∩ A_2 ∩ ⋯ ∩ A_N) = P(A_1)P(A_2)⋯P(A_N)

(for all distinct sets of indices; it is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k).

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, …, A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) for all i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events (see the sketch after this list):

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
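A minimal MATLAB Monte Carlo sketch (sample size assumed) for checking events 1 and 4 against the footnoted answers:

  N = 100000;
  kids = rand(5, N) < 0.5;                  % each column: 5 children, 1 = boy
  pSameSex = mean(all(kids) | all(~kids))   % approx 1/16
  pOneGirl = mean(any(~kids))               % approx 31/32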

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation.

– Statistical independence is a probabilistic relation.

• How does mutual exclusiveness relate to statistical independence?

1. me ⇒ si

2. si ⇒ me

3. Two events cannot be both si and me, except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, 3 is true.

Consider the following.

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

  Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A

• E_B be the event that the selected computer is from manufacturer B

• E_D be the event that the selected computer is defective

Find:

  P(E_A) =    P(E_B) =    P(E_D) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

  Ω = {AD, AN, BD, BD, BN}

Find:

– P(E_D|E_B) =

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B overlapping inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

  P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

  P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
             = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

  P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
             = P(A|B) + P(C|B)

EX: Check the previous example. Ω = {AD, AN, BD, BD, BN}. (A sketch for checking these appears below.)

  P(E_D|E_A) =

  P(E_D|E_B) =

  P(E_A|E_D) =

  P(E_B|E_D) =
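A minimal MATLAB sketch that evaluates these four conditional probabilities by direct counting over the five equally likely outcomes:

  mfr = 'AABBB';                 % manufacturer of each computer
  def = [1 0 1 1 0];             % 1 = defective (AD AN BD BD BN)
  P_ED_EA = sum(def & mfr=='A') / sum(mfr=='A')   % P(E_D|E_A) = 1/2
  P_ED_EB = sum(def & mfr=='B') / sum(mfr=='B')   % P(E_D|E_B) = 2/3
  P_EA_ED = sum(def & mfr=='A') / sum(def)        % P(E_A|E_D) = 1/3
  P_EB_ED = sum(def & mfr=='B') / sum(def)        % P(E_B|E_D) = 2/3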

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems, Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

  Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W_2|W_1)?

Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are si, then

  P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are si, knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that

  P(A|B) = P(A ∩ B)/P(B) ⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and

  P(B|A) = P(A ∩ B)/P(A) ⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

  P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
               = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets A_1, A_2, … partitions the sample space Ω if and only if

  ⋃_i A_i = Ω and A_i ∩ A_j = ∅ for all i ≠ j

{A_i} is also said to be a partition of Ω.

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {A_i} partitions Ω, then for any event B,

  P(B) = Σ_i P(B|A_i)P(A_i)

Example:

EX: Binary Communication System

A transmitter (Tx) sends A_0 or A_1.

• A_0 is sent with probability P(A_0) = 2/5.

• A_1 is sent with probability P(A_1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B_0 or B_1.

• The channel is a noisy channel that determines the probabilities of transitions between {A_0, A_1} and {B_0, B_1}.

– The transitions are conditional probabilities P(B_j|A_i) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B_0|A_0) = 5/6, P(B_1|A_0) = 1/6, P(B_1|A_1) = 7/8, P(B_0|A_1) = 1/8]

• The receiver must use a decision rule to determine from the output B_0 or B_1 whether the symbol that was sent was most likely A_0 or A_1.

• Let's start by considering 2 possible decision rules:

1. Always decide A_1.

2. When B_0 is received, decide A_0; when B_1 is received, decide A_1.

Problem: Determine the probability of error for these two decision rules. (A numerical check appears below.)
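A minimal MATLAB sketch of the two error probabilities, using total probability with the transition values above:

  PA0 = 2/5; PA1 = 3/5;
  pB1_A0 = 1/6; pB0_A1 = 1/8;            % channel crossover probabilities
  Pe_rule1 = PA0                         % always decide A1: wrong iff A0 sent, 0.4
  Pe_rule2 = pB1_A0*PA0 + pB0_A1*PA1     % rule 2: 17/120 = 0.1417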

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A_1 when A_0 is transmitted plus the probability that we decide A_0 when A_1 is transmitted:

  P(E) = P(H_10 ∩ A_0) + P(H_11 ∩ A_0) + P(H_01 ∩ A_1) + P(H_00 ∩ A_1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: when B_0 is received, decide A_0 with probability η and decide A_1 with probability 1 − η.

• Since we only apply hypothesis H_ij when B_j is received, we can write

  P(E) = P(H_10 ∩ A_0 ∩ B_0) + P(H_11 ∩ A_0 ∩ B_1) + P(H_01 ∩ A_1 ∩ B_1) + P(H_00 ∩ A_1 ∩ B_0)

• We apply the chain rule to write this as

  P(E) = P(H_10|A_0 ∩ B_0)P(A_0 ∩ B_0) + P(H_11|A_0 ∩ B_1)P(A_0 ∩ B_1)
       + P(H_01|A_1 ∩ B_1)P(A_1 ∩ B_1) + P(H_00|A_1 ∩ B_0)P(A_1 ∩ B_0)

• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.

• Furthermore, let P(H_00|B_0) = q_0 and P(H_11|B_1) = q_1.

• Then

  P(E) = (1 − q_0)P(A_0 ∩ B_0) + q_1P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1) + q_0P(A_1 ∩ B_0)

• The choices of q_0 and q_1 can be made independently, so P(E) is minimized by finding q_0 and q_1 that minimize

  (1 − q_0)P(A_0 ∩ B_0) + q_0P(A_1 ∩ B_0) and q_1P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1)

• We can further expand these using the chain rule as

  (1 − q_0)P(A_0|B_0)P(B_0) + q_0P(A_1|B_0)P(B_0) and q_1P(A_0|B_1)P(B_1) + (1 − q_1)P(A_1|B_1)P(B_1)

• P(B_0) and P(B_1) are constants that do not depend on our decision rule, so finally we wish to find q_0 and q_1 to minimize

  (1 − q_0)P(A_0|B_0) + q_0P(A_1|B_0)    (9)

and

  q_1P(A_0|B_1) + (1 − q_1)P(A_1|B_1)    (10)

• Both of these are linear functions of q_0 and q_1, so the minimizing values must be at the endpoints, i.e., either q_i = 0 or q_i = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q_0 = 1 if P(A_0|B_0) > P(A_1|B_0), and setting q_0 = 0 otherwise.

• Similarly, (10) is minimized by setting q_1 = 1 if P(A_1|B_1) > P(A_0|B_1), and setting q_1 = 0 otherwise.

• These rules can be summarized as follows:

  Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).

• General technique: if {A_i} is a partition of Ω and we know P(B_j|A_i) and P(A_i), then

  P(A_i|B_j) = P(A_i ∩ B_j)/P(B_j) = P(B_j|A_i)P(A_i) / Σ_k P(B_j|A_k)P(A_k),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A_0) = 2/5, p_01 = 1/8, and p_10 = 1/6. (A numerical sketch appears below.)
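A minimal MATLAB sketch for this example (the arithmetic only, with the transition probabilities read from the diagram above):

  PA0 = 2/5; PA1 = 3/5;
  PB0_A0 = 5/6; PB1_A0 = 1/6;          % transitions from A0
  PB0_A1 = 1/8; PB1_A1 = 7/8;          % transitions from A1
  PB0 = PB0_A0*PA0 + PB0_A1*PA1;       % law of total probability
  PB1 = PB1_A0*PA0 + PB1_A1*PA1;
  PA0_B0 = PB0_A0*PA0 / PB0            % 0.8163 -> decide A0 when B0 received
  PA1_B1 = PB1_A1*PA1 / PB1            % 0.8873 -> decide A1 when B1 received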

• Often we may not know the a priori probabilities P(A_0) and P(A_1).

• In this case, we typically treat the input symbols as equally likely: P(A_0) = P(A_1) = 0.5.

• Under this assumption, P(A_0|B_0) = cP(B_0|A_0) and P(A_1|B_0) = cP(B_0|A_1), so the decision rule becomes:

  Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EEL 5544 Noise in Linear Systems, Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, …, x_k) that are possible, if x_i is an element of a set with n_i distinct members, is

  n_1 n_2 ⋯ n_k

SPECIAL CASES

1. Sampling with replacement and with ordering

   Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

   Result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A for all i = 1, 2, …, k.

   Thus n_1 = n_2 = ⋯ = n_k = |A| ≡ n

   ⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement and with ordering

   Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

   Result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A for all i = 1, 2, …, k.

   Then n_1 = n, n_2 = n − 1, …, n_k = n − (k − 1)

   ⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects

   Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)

   Result is an n-tuple.

   The number of orderings is the number of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

   Then n_1 = n, n_2 = n − 1, …, n_{n−1} = 2, n_n = 1

   ⇒ number of permutations = number of distinct ordered n-tuples = n(n − 1)⋯(2)(1) = n!

   Useful formula: For large n,

     n! ≈ √(2π) n^{n + 1/2} e^{−n}

   MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
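A one-line check of the approximation (n = 10 assumed):

  n = 10;
  exact = gamma(n + 1)                          % n! = 3628800
  approx = sqrt(2*pi) * n^(n + 0.5) * exp(-n)   % approx 3.599e6, about 0.8% low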

4. Sampling without replacement and without ordering

   Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

   Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

   First determine the number of ordered subsets: n!/(n − k)!

   Now, how many different orders are there for any set of k objects? k!

   Let C(n,k) = the number of combinations of size k from a set of n objects:

     C(n,k) = (number of ordered subsets)/(number of orderings) = n!/[(n − k)! k!]

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of sizes k_1, k_2, …, k_J (with k_1 + k_2 + ⋯ + k_J = n) is given by the multinomial coefficient

  n!/(k_1! k_2! ⋯ k_J!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

  Using the multinomial coefficient, there are

    72!/(3!)^24 ≈ 1.3 × 10^85

  Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

  There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

    72!/[(3!)^24 24!] ≈ 2.1 × 10^61

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (A sketch of the computation appears below.)

  One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

  – How many total ways can this occur?

  – What is the probability of this occurring?
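A minimal MATLAB sketch of the count and the probability, using the multinomial coefficient:

  ways = factorial(12) / factorial(2)^6   % 12!/(2!)^6 = 7484400 arrangements
  p = ways / 6^12                         % divide by total outcomes: approx 0.0034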

5. Sampling with replacement and without ordering

   Question: How many ways are there to choose k objects from a set of n objects without regard to order?

   • The total number of arrangements is C(k + n − 1, k).

EEL 5544 Noise in Linear Systems, Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^-2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX: (By Request) The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch appears after the list):

1. Never switch

Answer to the packet question above: 0.0794.

2. Always switch

3. Flip a fair coin and switch if it comes up heads
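A minimal MATLAB Monte Carlo sketch of the three strategies (trial count assumed). It uses the fact that the host always opens a goat door, so switching wins exactly when the first pick was wrong:

  N = 100000;
  car  = ceil(3 * rand(1, N));          % door hiding the car
  pick = ceil(3 * rand(1, N));          % contestant's first choice
  pNever  = mean(pick == car)           % approx 1/3
  pAlways = mean(pick ~= car)           % approx 2/3
  heads = rand(1, N) < 0.5;             % strategy 3: switch on heads
  pCoin = mean((heads & pick ~= car) | (~heads & pick == car))  % approx 1/2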

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, …, B_N), where B_i ∈ Ω_i.

2. Event class: F = σ-algebra on Ω.

3. P(·) on F such that

  P(A) =

For si subexperiments,

  P(A) = P_1(B_1)P_2(B_2)⋯P_N(B_N)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n,k).

• Thus, the probability of k successes on n trials is

  p_n(k) = C(n,k) p^k (1 − p)^{n−k}    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^-2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A sketch appears below.)
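A minimal MATLAB sketch for the packet question, summing the binomial pmf in (11) for k = 0, 1, 2 and taking the complement:

  n = 100; p = 0.01;
  Pok = 0;
  for k = 0:2
      Pok = Pok + nchoosek(n, k) * p^k * (1 - p)^(n - k);
  end
  Pfail = 1 - Pok      % approx 0.0794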

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms get very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

  b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success, and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

  φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

  Q(x) = 1 − Φ(x)    (15)
       = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

  b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

  P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
                 = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

  (A numerical sketch appears below.)

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

  p(m) = P(m trials to first success)
       = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

  p(m) = (1 − p)^{m−1} p, m = 1, 2, …

Also,

  P(number of trials > k) = (1 − p)^k

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (A sketch appears below.)
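A minimal sketch of the arithmetic (the success probability per transmission is 1 − 0.1 = 0.9):

  pe = 0.1;
  pExactly2  = pe * (1 - pe)     % first fails, second succeeds: 0.09
  pMoreThan2 = pe^2              % first two transmissions both fail: 0.01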

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability,

  C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

  C(n,k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

  lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– number of misprints on a group of pages in a book

– number of people in a community that live to be 100 years old

– number of wrong telephone numbers that are dialed in a day

– number of α-particles discharged in a fixed period of time from some radioactive material

– number of earthquakes per year

– number of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average number of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

  P(N = k) = { (α^k/k!) e^{−α},   k = 0, 1, …
             { 0,                 otherwise

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval? (A sketch appears below.)
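A minimal MATLAB sketch of this computation, with α = λt = 90 × (10/60) = 15:

  lambda = 90; t = 10/60;                      % rate per hour; interval in hours
  alpha = lambda * t;                          % 15
  k = 20;
  P = alpha^k / factorial(k) * exp(-alpha)     % approx 0.0418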

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

  F_Z(z) = { 0,                         z ≤ 0
           { z²/(2b²),                  0 < z ≤ b
           { [b² − (2b − z)²/2]/b²,     b < z < 2b
           { 1,                         z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable is defined on a probability space (Ω, F, P) as a mapping from Ω to R.


EX


• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the (cumulative) distribution function (CDF), denoted F_X(x), is

  F_X(x) = P(X ≤ x)

• F_X(x) is a probability measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) for a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

   (The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

  F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x]
                   = P[X = x]

EX: Examples

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands. (A simulation sketch appears below.)

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
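A minimal MATLAB Monte Carlo sketch for checking F_Z(z) (the values b = 1 and test point z = 1.5 are assumptions):

  b = 1; N = 1000000;
  Z = b*rand(1, N) + b*rand(1, N);        % sum of the two coordinates
  z = 1.5;                                % test point with b < z < 2b
  empirical = mean(Z <= z)                % approx 0.875
  exact = (b^2 - (2*b - z)^2/2) / b^2     % 0.875, from the answer above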

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

  u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

  hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

  v = -log(u);

Check the density by plotting a histogram:

  hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:

  w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x).

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = ∫_a^b f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

     ∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

   then f_X(x) = g(x)/c is a valid pdf.

Proof omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus a value of f_X(x) can be assigned at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

  f_X(x) = { 0,             x < a
           { 1/(b − a),     a ≤ x ≤ b
           { 0,             x > b

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

  p_X(x) = P(X = x)

EX: Roll a fair 6-sided die; X = number on top face.

  P(X = x) = { 1/6,   x = 1, 2, …, 6
             { 0,     otherwise

EX: Flip a fair coin until heads occurs; X = number of flips.

  P(X = x) = { (1/2)^x,   x = 1, 2, …
             { 0,         otherwise

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

  X = { 1,   s ∈ A
      { 0,   s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

  P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

  P(X = x) = { p,       x = 1
             { 1 − p,   x = 0
             { 0,       x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = number of successes. Then the pmf of X is given by

  P[X = k] = { C(n,k) p^k (1 − p)^{n−k},   k = 0, 1, …, n
             { 0,                          otherwise

  (A simulation sketch appears below.)
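A minimal MATLAB sketch of the sum-of-Bernoullis definition (the values n = 10, p = 0.2, k = 3, and the sample size are assumptions): simulate the Binomial RV as a sum of Bernoullis and compare the relative frequency with the pmf.

  n = 10; p = 0.2; N = 100000;
  X = sum(rand(n, N) < p);                    % each column gives one sample
  pmfEst   = mean(X == 3)                     % empirical P(X = 3)
  pmfExact = nchoosek(n, 3) * p^3 * (1-p)^7   % 0.2013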

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs F_X(x)]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs P(X = x)]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

  P[X = k] = { (1 − p)^{k−1} p,   k = 1, 2, …
             { 0,                 otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs P(X = x)]

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs F_X(x)]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average number of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

  P(N = k) = { (α^k/k!) e^{−α},   k = 0, 1, …
             { 0,                 otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs P(X = x)]

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs F_X(x)]

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2.]
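A hedged MATLAB sketch for generating Laplacian samples, using the standard fact that the difference of two independent exponential RVs with the same parameter is Laplacian (c = 2 is assumed to match the figure):

c = 2;                            % assumed parameter, as in the figure
u1 = rand(1,1000000); u2 = rand(1,1000000);
x = (-log(u1) + log(u2))/c;       % Exp(c) minus Exp(c) gives a Laplacian(c) RV
hist(x,100)                       % symmetric density, sharply peaked at 0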

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n are si Gaussian random variables with identical variances σ², then

$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

$$f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

where Γ(p) is the gamma function, defined as

Γ(p) = ∫₀^∞ t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

which is an exponential density.

• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
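As a quick sanity check of the n = 2 case, a MATLAB sketch (σ² = 1 and the sample size are assumed):

sigma = 1;
y = (sigma*randn(1,1000000)).^2 + (sigma*randn(1,1000000)).^2;  % sum of squares of 2 si Gaussians
hist(y,100)      % decaying exponential shape
mean(y)          % approximately 2*sigma^2 = 2, the exponential mean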

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1.]

• The CDF for a central chi-square RV (here for an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1.]

• The CDF for a Rayleigh RV is

$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

• The received signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
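A minimal MATLAB sketch of this construction (σ² = 1 and the sample size are assumed), generating Rayleigh samples as the magnitude of a complex Gaussian:

r = sqrt(randn(1,1000000).^2 + randn(1,1000000).^2);  % sqrt of a chi-square with 2 dof
hist(r,100)    % shape matches r*exp(-r^2/2), the Rayleigh pdf with sigma^2 = 1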

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$
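A quick MATLAB sketch (µ = 0 and σ² = 1 are assumed for illustration) that generates lognormal samples directly from the definition X = ln L:

mu = 0; sigma = 1;                       % assumed parameters
l = exp(mu + sigma*randn(1,1000000));    % if X = ln L is Gaussian, then L = exp(X)
hist(l,100)                              % nonnegative, heavy-tailed density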

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) =

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) =

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is:

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
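One way to set up part 1 (a sketch, not the in-class solution): with A = {X > 2} and the exponential CDF FX(x) = 1 − e^(−x) for x ≥ 0,

$$F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \begin{cases} \dfrac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, & x > 2 \\ 0, & x \le 2 \end{cases}$$

This also suggests the answer to part 2: the remaining wait time X − 2 is again exponential with λ = 1 (the memoryless property).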

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, …, An form a partition of Ω, then

$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$

and

$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\,P(A_i)$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.

$$P(A|X=x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\,P(A)$$
$$= \lim_{\Delta x \to 0} \frac{\big[F_X(x + \Delta x \mid A) - F_X(x \mid A)\big]/\Delta x}{\big[F_X(x + \Delta x) - F_X(x)\big]/\Delta x}\,P(A)$$
$$= \frac{f_X(x \mid A)}{f_X(x)}\,P(A)$$

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

$$P(A|X=x) = \frac{f_X(x|A)}{f_X(x)}\,P(A)$$
$$\Rightarrow\; P(A|X=x)\,f_X(x) = f_X(x|A)\,P(A)$$
$$\Rightarrow\; \int_{-\infty}^{\infty} P(A|X=x)\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x|A)\,dx\; P(A)$$
$$\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X=x)\,f_X(x)\,dx$$

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

$$f_X(x|A) = \frac{P(A|X=x)}{P(A)}\,f_X(x) = \frac{P(A|X=x)\,f_X(x)}{\int_{-\infty}^{\infty} P(A|X=t)\,f_X(t)\,dt}$$

Examples on board


L2a-6

EX Rolling a 6-sided die

There are many possible events such as

1 The number 3 occurs

2 An even number occurs

3 No number occurs

4 Any number occurs

bull EX Express the events listed in the example above in set notation

1

2

3

4

EX: The 3-language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

L2a-7

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES

IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω₃| = 1/8 for any outcome ai.

Let A₂ = {exactly 2 heads occur in 3 tosses}. Then

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, …, o_K} and |E| = K), then

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai.

Then

EXTENSION OF EQUALLY LIKELY OUTCOMES

TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) =

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.

How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1 What percentage of males smoke

2 What percentage of males smoke neither cigars nor cigarettes

3 What percentage smoke cigars but not cigarettes

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

bull The elements of a probability space for a random experiment are

1

DEFN

The sample space, denoted by Ω, is the            of

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN
A discrete set is either            or           .

DEFN

A set is countably infinite if it can be put into one-to-one correspondence withthe integers

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are in an interval I, then for any x with a ≤ x ≤ b, x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX
Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN
Events are            to which we assign           .

Examples based on previous examples of sample spaces:

(a) EX
Roll a fair 6-sided die and note the number on the top face.
Let L₄ = the event that the result is less than or equal to 4.
Express L₄ as a set of outcomes.

(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A₁ = event that heads occurs on the first toss.
Express A₁ as a set of outcomes.

(d) EX
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2

DEFN

F is the event class, which is a            to which we assign probability.

• If Ω is discrete, then F can be taken to be the            of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN
A field M forms a σ-algebra, or σ-field, if for any E₁, E₂, … ∈ M,

$$\bigcup_{i=1}^{\infty} E_i \in M \qquad (5)$$

• Note that by property (c) of fields, (5) can be interchanged with

$$\bigcap_{i=1}^{\infty} E_i \in M \qquad (6)$$

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN

The probability measure, denoted by P, is a            that maps all members of            onto           .

Axioms of Probability

bull We specify a minimal set of rules that P must obey

I

II

III

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, …, An are pairwise mutually exclusive, then

$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k)$$

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k) - \sum_{j<k} P(A_j \cap A_k) + \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, …. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.$$

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?


STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are                      (abbreviated        ) if and only if

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a                      .

What type of relation is mutually exclusive?

Claim: If A and B are si events, then the following pairs of events are also si:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

1. (a) 1/16; (b) 1/32; (c) 1/4; (d) 31/32.   2. (a) 2/3; (b) 1/3; (c) 3/4.

L3-3

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are si events, then A and Bᶜ are also si events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be si, we require

P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
⋮
P(A1 ∩ A2 ∩ ⋯ ∩ AN) = P(A1)P(A2)⋯P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN
N events A1, A2, …, AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a

– Statistical independence is a

• How does mutual exclusiveness relate to statistical independence?

1. me ⇒ si

2. si ⇒ me

3. two events cannot be both si and me, except in some degenerate cases

4. none of the above

• Perhaps surprisingly,            is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer, and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A

• EB be the event that the selected computer is from manufacturer B

• ED be the event that the selected computer is defective

Find:

P(EA) =    P(EB) =    P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) =

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, with A ∩ B inside it]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0$$

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check the previous example. Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =
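As a side check, the equally likely outcomes can be counted in MATLAB (a sketch; the two-letter codes are those defined above):

omega = {'AD','AN','BD','BD','BN'};         % equally likely outcomes
isA = cellfun(@(s) s(1)=='A', omega);       % event EA
isD = cellfun(@(s) s(2)=='D', omega);       % event ED
P_ED_given_EA = sum(isA & isD)/sum(isA)     % 1/2
P_ED_given_EB = sum(~isA & isD)/sum(~isA)   % 2/3
P_EA_given_ED = sum(isA & isD)/sum(isD)     % 1/3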

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

2. (a) 1/3; (b) 1/5; (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are si, then

(When A and B are si, knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

$$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\,P(B|C)\,P(C)$$

Partitions and Total Probability

DEFN

A collection of sets A1, A2, … partitions the sample space Ω if and only if

{Ai} is also said to be a partition of Ω.

L4-4

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}:

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: A0 → B0 with prob. 5/6, A0 → B1 with prob. 1/6, A1 → B1 with prob. 7/8, A1 → B0 with prob. 1/8.]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
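A small numerical sketch of this problem (using the transition probabilities from the diagram above and the law of total probability):

PA0 = 2/5; PA1 = 3/5;           % a priori probabilities
p10 = 1/6;                      % P(B1|A0)
p01 = 1/8;                      % P(B0|A1)
Pe_rule1 = PA0                  % rule 1 errs exactly when A0 is sent: 0.4
Pe_rule2 = p10*PA0 + p01*PA1    % rule 2: 17/120, approximately 0.142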

L4-6

L4-7

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)  and    (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1).    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

bull General technique

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is

EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
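One possible MATLAB sketch of this calculation (the matrix layout is a convenience choice, not part of the problem statement):

PA = [2/5 3/5];                        % a priori probabilities P(A0), P(A1)
PBgA = [5/6 1/8; 1/6 7/8];             % PBgA(j,i) = P(B_{j-1} | A_{i-1})
PB = PBgA*PA.';                        % law of total probability: [P(B0); P(B1)]
PAgB = (PBgA.*repmat(PA,2,1))./repmat(PB,1,2)  % Bayes: row j gives P(Ai | B_{j-1})
% Row for B0: [40/49 9/49] => decide A0.  Row for B1: [8/71 63/71] => decide A1.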

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926; (b) 0.4630; (c) 0.2315; (d) 0.1543; (e) 0.0386; (f) 0.0193; (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.

Thus n1 = n2 = ⋯ = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples

=

=

Useful formula: For large n,

$$n! \approx \sqrt{2\pi}\; n^{n + 1/2}\, e^{-n}$$

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sampling without Replacement & without Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets:

Now, how many different orders are there for any set of k objects?

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets)/(# orderings) =

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient:

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}$$

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}$$

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sampling with Replacement and without Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

$$\binom{k + n - 1}{k}$$
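For the motivating poker-dice problem, a Monte Carlo sketch (the trial count is an arbitrary assumption) can be checked against the footnote answers:

N = 200000;                          % assumed number of simulated rolls
rolls = ceil(6*rand(N,5));           % five fair dice per trial
cnt = histc(rolls, 1:6, 2);          % per-trial counts of each face (N-by-6)
P_onepair = sum(max(cnt,[],2)==2 & sum(cnt==2,2)==1)/N   % approx. 0.4630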

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

4. 0.0794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the            of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class: F = σ-algebra on Ω

3. P(·) on F such that

P(A) =

For si subexperiments:

P(A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is

• The number of ways that k successes can occur in n places is

• Thus, the probability of k successes on n trials is

pn(k) =    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
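A direct MATLAB evaluation of this packet example (summing the binomial probabilities of 0, 1, or 2 errors):

n = 100; p = 0.01; k = 0:2;
Pok = sum(arrayfun(@(kk) nchoosek(n,kk), k).*p.^k.*(1-p).^(n-k));
Pfail = 1 - Pok      % approximately 0.0794, matching the footnote answer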

L5-4

• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (12)$$

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if trial i is a success, and Xi = 0 otherwise.

– Let Sn = Σᵢ₌₁ⁿ Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (13)$$

and distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \qquad (14)$$

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

$$Q(x) = 1 - \Phi(x) \qquad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \qquad (16)$$

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
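As an illustration (the values of n, p, α, and β are assumed), the continuity-corrected approximation can be compared with the exact binomial sum in MATLAB:

n = 100; p = 0.5; q = 1-p;
alpha = 45; beta = 55; k = alpha:beta;
Pexact = sum(arrayfun(@(kk) nchoosek(n,kk), k).*p.^k.*q.^(n-k));
Qf = @(x) 0.5*erfc(x/sqrt(2));
Papprox = Qf((alpha-n*p-0.5)/sqrt(n*p*q)) - Qf((beta-n*p+0.5)/sqrt(n*p*q));
[Pexact Papprox]     % both approximately 0.728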

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) =

Also, P(# trials > k) =

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 (calls per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,

$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$

• Then

$$\lim_{n \to \infty} \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
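As a quick numerical sketch of the switching-office example (α = 90 calls/hr × 1/6 hr = 15), the Poisson probability can be evaluated directly in MATLAB:

alpha = 15; k = 20;                      % rate-time product and desired count
P = alpha^k/factorial(k)*exp(-alpha)     % P(N = 20), approximately 0.0418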

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - (2b - z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a            from            to           .

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the
(                    ), denoted            , is

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) iff a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to show how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN
The            (        ) fX(x) of a random variable X is the derivative (which may not exist at some places) of

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_a^b fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

$$\int_{-\infty}^{+\infty} g(x)\,dx = c, \quad 0 < c < +\infty,$$

then fX(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way a value of fX(x) can be assigned at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

$$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$$

L7-4

DEFN
A            has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN
For a discrete RV, the probability mass function (pmf) is

EX: Roll a fair 6-sided die. X = # on top face.

$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$

EX: Flip a fair coin until heads occurs. X = # of flips.

$$P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as

$$P(X = 1) = P\big(X(s) = 1\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p$$

So

$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf (left) and Binomial(10, 0.5) pmf (right); x vs. P(X = x).]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF (left) and Binomial(10, 0.5) CDF (right); x vs. FX(x).]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf.]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

$$P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf (left) and Geometric(0.7) pmf (right).]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF (left) and Geometric(0.7) CDF (right).]

4 Poisson RV

bull Models events that occur randomly in space or time

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf (left) and Poisson(3) pmf (right).]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF (left) and Poisson(3) CDF (right).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf.]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:

$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs (left) and CDFs (right) for λ = 0.25, 1, 4.]

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape


rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 18: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L2a-7

2. If a student is chosen at random, what is the probability that he or she is taking exactly one language class?

MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX Consider the following examples:

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, . . . , TTT}
Assuming equally likely outcomes,

P(ai) = 1/|Ω3| = 1/8 for any outcome ai

Let A2 = {exactly 2 heads occurs in 3 tosses}. Then

P(A2) = |A2|/|Ω3| = 3/8

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, . . . , oK} and |E| = K), then

P(E) = K/|Ω| = |E|/|Ω|
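As a quick sanity check of this counting formula, here is a minimal MATLAB sketch (the variable names are mine); it enumerates the 8 equally likely sequences and counts the favorable ones:

% Enumerate all 2^3 equally likely sequences; 1 = heads, 0 = tails
seqs = dec2bin(0:7) - '0';          % 8-by-3 array of outcomes
K = sum(sum(seqs, 2) == 2);         % outcomes with exactly 2 heads
P_A2 = K / size(seqs, 1)            % |A2|/|Omega_3| = 3/8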

L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,

P(ai) = 1/|Ω| = 1/4 for any outcome ai

Then P(2 heads) = 1/4. But compare with part A, where P(exactly 2 heads) = 3/8: the outcomes in this Ω are not actually equally likely.

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = (b − a)/(B − A) = |[a, b]|/|Ω|

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(Does not mean that outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate.

EX
A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN
A discrete set is either finite or countably infinite.

DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN

A continuous set is not countable

DEFN
If a and b > a are in an interval I, then any x satisfying a ≤ x ≤ b is also in I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX
Examples of continuous sets:

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN
Events are subsets of the sample space to which we assign probability.

Examples based on previous examples of sample spaces:

(a) EX
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.

(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes.

(d) EX
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2. DEFN
F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M;
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) If E ∈ M, then Eᶜ ∈ M.

L2-5

DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, . . . ∈ M,

∪_{i=1}^∞ Ei ∈ M    (5)

• Note that, by property (c) of fields, (5) can be interchanged with

∩_{i=1}^∞ Ei ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3. DEFN
The probability measure, denoted by P, is a function that maps all members of F onto the interval [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B); and for a countably infinite collection of pairwise mutually exclusive events A1, A2, . . . , P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, . . . , An are pairwise mutually exclusive, then

P(∪_{k=1}^n Ak) = ∑_{k=1}^n P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(∪_{k=1}^n Ak) = ∑_k P(Ak) − ∑_{j<k} P(Aj ∩ Ak) + · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . . Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.

L3-1

EEL 5544 Noise in Linear Systems, Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

What if A and B are independent?

L3-2

Motivation (continued)

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if

P(A ∩ B) = P(A)P(B)

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relation: a property of the probability measure, not of the sets themselves.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32    2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., we require

P(Ai ∩ Ak) = P(Ai)P(Ak), for all i ≠ k
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am), for all distinct i, k, m
. . .
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k. A counterexample follows the definition below.)

Weaker Condition: Pairwise Statistical Independence

DEFN
N events A1, A2, . . . , AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.
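A standard counterexample (my construction, not from the notes) shows in MATLAB that pairwise s.i. does not imply s.i.:

% Two fair coin flips, four equally likely outcomes; 1 = heads
% A = {first flip heads}, B = {second flip heads}, C = {flips agree}
omega = [0 0; 0 1; 1 0; 1 1];
A = omega(:,1) == 1;  B = omega(:,2) == 1;  C = omega(:,1) == omega(:,2);
P = @(E) mean(E);                    % probability = fraction of equally likely outcomes
[P(A & B), P(A)*P(B)]                % 0.25 = 0.25: pairwise s.i. (same for A,C and B,C)
[P(A & B & C), P(A)*P(B)*P(C)]       % 0.25 vs 0.125: NOT statistically independent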

L3-4

Examples of Applying Statistical Independence

EX
For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation (A ∩ B = ∅).

– Statistical independence is a probabilistic relation (P(A ∩ B) = P(A)P(B)).

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, (3) is true.

Consider the following:

L3-6

Conditional Probability: Consider an example.

EX Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A,

• EB be the event that the selected computer is from manufacturer B,

• ED be the event that the selected computer is defective.

Find:

P(EA) =     P(EB) =     P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) =

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B becomes the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B)

and P(A ∩ B) ≥ 0, P(B) ≥ 0
⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX Check prev example: Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems, Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = the event that a white ball is chosen on draw i.

What is P(W2|W1)?

2. (a) 1/3 (b) 1/5 (c) 1
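A Monte Carlo sketch of this example (the loop and variable names are mine; balls 1–2 are black, 3–5 white) estimates the answer as a ratio of relative frequencies:

N = 1e5; w1 = false(N,1); w2 = false(N,1);
for i = 1:N
    order = randperm(5);            % a random drawing order for the 5 balls
    w1(i) = order(1) >= 3;          % white on first draw
    w2(i) = order(2) >= 3;          % white on second draw
end
P_est = sum(w1 & w2) / sum(w1)      % estimate of P(W2|W1); should be near 2/4 = 0.5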

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN
A collection of sets A1, A2, . . . partitions the sample space Ω if and only if

∪_i Ai = Ω and Ai ∩ Aj = ∅ for all i ≠ j.

{Ai} is also said to be a partition of Ω.

L4-4

1. Venn diagram:

[Venn diagram: Ω divided into disjoint regions A1, A2, . . . , An]

2. Expressing an arbitrary set using a partition of Ω:

B = ∪_i (B ∩ Ai), where the sets B ∩ Ai are pairwise mutually exclusive.

Law of Total Probability

If {Ai} partitions Ω, then

P(B) = ∑_i P(B ∩ Ai) = ∑_i P(B|Ai)P(Ai)

Example:

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}:

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B0|A1) = 1/8, P(B1|A1) = 7/8]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules. (A quick numerical check appears below.)
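A sketch of the arithmetic in MATLAB, using the numbers on the transition diagram above (variable names are mine):

PA0 = 2/5; PA1 = 3/5;               % a priori probabilities
PB1gA0 = 1/6; PB0gA1 = 1/8;         % crossover (error) transition probabilities
Pe_rule1 = PA0                      % always decide A1: wrong whenever A0 was sent, = 0.4
Pe_rule2 = PB1gA0*PA0 + PB0gA1*PA1  % decide A0 on B0, A1 on B1: 17/120, about 0.142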

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)    (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / ∑_k P(Bj|Ak)P(Ak)

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
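A sketch of the computation in MATLAB (the matrix layout is my choice: rows index B0, B1; columns index A0, A1):

PA = [2/5 3/5];                     % P(A0), P(A1)
PBgA = [5/6 1/8;                    % row 1: P(B0|A0), P(B0|A1)
        1/6 7/8];                   % row 2: P(B1|A0), P(B1|A1)
PB = PBgA * PA';                    % total probability: P(B0) = 49/120, P(B1) = 71/120
PAgB = (PBgA .* repmat(PA, 2, 1)) ./ repmat(PB, 1, 2)
% row 1: P(A0|B0) = 40/49, P(A1|B0) = 9/49  -> decide A0 when B0 is received
% row 2: P(A0|B1) = 8/71,  P(A1|B1) = 63/71 -> decide A1 when B1 is received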

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems, Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible, if xi is an element of a set with ni distinct members, is

n1 · n2 · · · nk

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
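Parts (a) and (b) of the poker-dice problem can be checked by direct counting in MATLAB (a sketch; the counting decomposition in the comment is mine):

total = 6^5;                                       % equally likely outcomes for 5 dice
P_no_two_alike = (6*5*4*3*2) / total               % 720/7776, about 0.0926
P_one_pair = 6 * nchoosek(5,2) * (5*4*3) / total   % about 0.4630
% one pair: pair face (6 ways) * positions of the pair (C(5,2)) * distinct faces for the rest (5*4*3)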

L5a-2

Result is k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is n^k

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, . . . , n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1). (A quick comparison with Stirling's formula follows.)
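For example (a sketch; n = 52 is an arbitrary choice):

n = 52;
exact = gamma(n + 1);                          % 52! ~ 8.07e67
stirling = sqrt(2*pi) * n^(n + 0.5) * exp(-n); % the approximation above
rel_err = (exact - stirling) / exact           % about 0.16%, i.e., roughly 1/(12n)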

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets)/(# orderings) = n!/[(n − k)! k!]

DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient

n!/(k1! k2! · · · kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24)!] ≈ 2.1 × 10^61
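Both counts overflow double precision, so in MATLAB it is natural to work with gammaln instead (a sketch):

log10_ordered = (gammaln(73) - 24*gammaln(4)) / log(10)    % 72!/(3!)^24 -> ~85.1
log10_unordered = log10_ordered - gammaln(25)/log(10)      % also /24!  -> ~61.3
% i.e., about 1.3e85 and 2.1e61, matching the values quoted above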

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)

L5-1

EEL 5544 Noise in Linear Systems, Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

4. 0.0794

L5-2

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.

SEQUENTIAL EXPERIMENTS

DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B1, B2, . . . , BN)

For s.i. subexperiments,

P(A) = P(B1)P(B2) · · · P(BN)

Bernoulli Trials

DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n,k).

• Thus the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^{n−k}    (11)

Examples:

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A numerical check appears below.)
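A numerical check with base MATLAB (nchoosek is not vectorized, hence arrayfun; a sketch):

n = 100; p = 0.01; k = 0:2;
terms = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
P_no_decode = 1 - sum(terms)        % about 0.0794, matching the footnote answer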

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = ∑_{i=1}^n Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp(−x²/2)    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^x exp(−t²/2) dt    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_x^∞ exp(−t²/2) dt    (16)

• In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, . . .

Also, P(# trials > k) = P(1st k trials are failures) = (1 − p)^k

EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 call/minute is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability

C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α}

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^{−α},  k = 0, 1, . . .
           { 0,                otherwise

EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (A numerical check appears below.)
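A quick check (a sketch): 90 calls/hr is λ = 1.5 calls/minute, so α = λt = 15 for t = 10 minutes.

alpha = 15; k = 20;
P_N20 = exp(-alpha) * alpha^k / factorial(k)   % about 0.042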

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = { 0,                       z ≤ 0
         { z²/(2b²),                0 < z ≤ b
         { [b² − (2b − z)²/2]/b²,   b < z < 2b
         { 1,                       z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
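A Monte Carlo sketch of the dart example (b = 1 and the test point z = 1.5 are my choices):

b = 1; N = 1e5;
Z = b*rand(1, N) + b*rand(1, N);    % sum of the two uniform coordinates
z = 1.5;                            % a point in (b, 2b)
[mean(Z <= z), (b^2 - (2*b - z)^2/2)/b^2]   % empirical vs. derived F_Z(z); both ~ 0.875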

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping from the sample space Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P[X ≤ x]

• F_X(x) is a prob measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) iff a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]

L6-5

EX Examples

L6-6

EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB:

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:

f_X(x) = dF_X(x)/dx

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^x f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_a^b f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = { 0,          x < a
         { 1/(b − a),  a ≤ x ≤ b
         { 0,          x > b

L7-4

DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN
For a discrete RV, the probability mass function (pmf) is

P_X(x) = P(X = x)

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, . . . , 6
           { 0,    o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, . . .
           { 0,        o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, . . . , n
           { 0,                         o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; F_X(x) vs. x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, . . .
           { 0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^{−α},  k = 0, 1, . . .
           { 0,                o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; F_X(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = { 0,           x < 0
         { λ e^{−λx},   x ≥ 0

• Distribution function:

F_X(x) = { 0,             x < 0
         { 1 − e^{−λx},   x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN
A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^x (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^x (1/√(2π)) exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_x^∞ (1/√(2π)) exp(−t²/2) dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_0^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration, which is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
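In MATLAB, Q(x) can be evaluated through the standard identity Q(x) = (1/2) erfc(x/√2), which checks the footnote answers above (a sketch):

Qf = @(x) 0.5*erfc(x/sqrt(2));
two_sided = 2*Qf(1:5)     % P[|X - mu| > k*sigma] = 2Q(k): 0.317, 4.55e-2, 2.70e-3, ...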

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; f_X(x) vs. x]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

Y = ∑_{i=1}^n Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
         { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^{−y/(2σ²)} ∑_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
         { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
         { 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; f_R(r) vs. r]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
         { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable. (A short simulation follows.)
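A simulation sketch tying these facts together (σ² = 1; the sample size is my choice):

N = 1e5;
Y = randn(1, N).^2 + randn(1, N).^2;   % central chi-square, n = 2: exponential, mean 2*sigma^2
R = sqrt(Y);                           % Rayleigh
[mean(Y), 2]                           % sample mean vs. theory
hist(R, 100)                           % compare the shape with the Rayleigh pdf above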

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{-\dfrac{(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.$$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x|A) = \frac{d}{dx} F_X(x|A).$$

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., Aₙ form a partition of Ω, then
$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\,P(A_i).$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,
$$\begin{aligned} P(A|X = x) &= \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)\\ &= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)\\ &= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)\\ &= \frac{f_X(x|A)}{f_X(x)}\, P(A) \end{aligned}$$
if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that
$$\begin{aligned} P(A|X = x) &= \frac{f_X(x|A)}{f_X(x)}\,P(A)\\ \Rightarrow\ P(A|X = x)\,f_X(x) &= f_X(x|A)\,P(A)\\ \Rightarrow\ \int_{-\infty}^{\infty} P(A|X = x)\,f_X(x)\,dx &= \int_{-\infty}^{\infty} f_X(x|A)\,dx \; P(A)\\ \Rightarrow\ P(A) &= \int_{-\infty}^{\infty} P(A|X = x)\,f_X(x)\,dx \end{aligned}$$
(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:
$$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\,f_X(x) = \frac{P(A|X = x)\,f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\,f_X(t)\,dt}$$

Examples on board


L2a-8

B. Suppose a coin is tossed 3 times, but only the number of heads is noted. Then Ω = {0, 1, 2, 3}. Assuming equally likely outcomes,
$$P(a_i) = \frac{1}{|\Omega|} = \frac{1}{4} \quad \text{for any outcome } a_i.$$

Then

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval, Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
$$P([a, b]) = \frac{b - a}{B - A}.$$

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

L2a-9

• For a typical continuous sample space, the prob. of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2
Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.

DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either finite or countably infinite.

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then if a ≤ x ≤ b, x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN: Events are sets of outcomes to which we assign probability.

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L₄ = the event that the result is less than or equal to 4.
Express L₄ as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.

L2-4

(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A₁ = event that heads occurs on the first toss.
Express A₁ as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2.

DEFN: F is the event class, which is a σ-algebra of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN: A field M forms a σ-algebra or σ-field if, for any E₁, E₂, ... ∈ M,
$$\bigcup_{i=1}^{\infty} E_i \in M. \qquad (5)$$

• Note that by property (c) of fields, the union in (5) can be interchanged with
$$\bigcap_{i=1}^{\infty} E_i \in M. \qquad (6)$$

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3.

DEFN: The probability measure, denoted by P, is a function that maps all members of F onto the interval [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A₁, A₂, ... are pairwise mutually exclusive, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A₁, A₂, ..., Aₙ are pairwise mutually exclusive, then
$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k).$$
Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.

(f)
$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{j=1}^{n} P(A_j) - \sum_{j<k} P(A_j \cap A_k) + \cdots + (-1)^{(n+1)} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
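Corollary (e) is easy to sanity-check numerically. A minimal MATLAB sketch, using two illustrative die events (A = even result, B = result ≤ 2; both choices are arbitrary):

n = 1e6;
x = ceil(6*rand(1, n));          % n rolls of a fair 6-sided die
A = (mod(x, 2) == 0);            % event: result is even
B = (x <= 2);                    % event: result <= 2
lhs = mean(A | B);               % relative frequency of the union
rhs = mean(A) + mean(B) - mean(A & B);
disp([lhs rhs])                  % both approach P(A or B) = 4/6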

L3-1

EEL 5544 Noise in Linear Systems, Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.$$

What if A and B are independent?

L3-2

Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ 𝒜 and B ∈ 𝒜 are statistically independent (abbreviated s.i.) if and only if
$$P(A \cap B) = P(A)P(B).$$

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probabilistic relation. What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and B̄

– Ā and B

– Ā and B̄

Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B̄ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B̄) and (A ∩ B) ∩ (A ∩ B̄) = ∅. Thus
$$\begin{aligned} P(A) &= P[(A \cap B) \cup (A \cap \bar{B})]\\ &= P(A \cap B) + P(A \cap \bar{B})\\ &= P(A)P(B) + P(A \cap \bar{B}). \end{aligned}$$
Thus
$$\begin{aligned} P(A \cap \bar{B}) &= P(A) - P(A)P(B)\\ &= P(A)[1 - P(B)]\\ &= P(A)P(\bar{B}). \end{aligned}$$

• For N events to be s.i., we require
$$\begin{aligned} P(A_i \cap A_k) &= P(A_i)P(A_k)\\ P(A_i \cap A_k \cap A_m) &= P(A_i)P(A_k)P(A_m)\\ &\ \ \vdots\\ P(A_1 \cap A_2 \cap \cdots \cap A_N) &= P(A_1)P(A_2)\cdots P(A_N) \end{aligned}$$
for all distinct choices of indices.
(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN: N events A₁, A₂, ..., A_N ∈ 𝒜 are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
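Because the 32 birth sequences are equally likely, these four probabilities can be checked by brute-force enumeration. A minimal MATLAB sketch (coding girls as 1 and boys as 0, and taking the first three columns as the three eldest, are arbitrary labeling choices):

seqs = dec2bin(0:31) - '0';   % 32 x 5 matrix of all birth sequences; 1 = girl
pSame   = mean(all(seqs == 0, 2) | all(seqs == 1, 2));                % 1/16
p3Boys  = mean(all(seqs(:,1:3) == 0, 2) & all(seqs(:,4:5) == 1, 2));  % 1/32
p2Girls = mean(all(seqs(:,1:2) == 1, 2));                             % 1/4
pGirl   = mean(any(seqs == 1, 2));                                    % 31/32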

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set relation.

– Statistical independence is a probabilistic relation.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. Two events cannot be both s.i. and m.e., except in some degenerate cases.

4. None of the above.

• Perhaps surprisingly, 3 is true.

Consider the following:

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer (the two defective B computers are distinguished by subscripts):

Ω = {AD, AN, BD₁, BD₂, BN}

Let

• E_A be the event that the selected computer is from manufacturer A,

• E_B be the event that the selected computer is from manufacturer B,

• E_D be the event that the selected computer is defective.

Find:

P(E_A) =        P(E_B) =        P(E_D) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD₁, BD₂, BN}

Find:

– P(E_D|E_B) =

L3-7

– P(E_A|E_D) =

– P(E_B|E_D) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Figure: Venn diagram of overlapping events A and B within sample space S.]

If we condition on B having occurred, then we can form the new Venn diagram:

[Figure: Venn diagram restricted to B, showing A ∩ B inside B.]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ 𝒜, B ∈ 𝒜, the conditional probability of event A given that event B occurred is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.$$

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, 𝒜, P(·|B)).

Check the axioms:

1.
$$P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$

2. Given A ∈ 𝒜,
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
and P(A ∩ B) ≥ 0, P(B) ≥ 0,
$$\Rightarrow P(A|B) \ge 0.$$

3. If A ∈ 𝒜, C ∈ 𝒜, and A ∩ C = ∅,
$$\begin{aligned} P(A \cup C|B) &= \frac{P[(A \cup C) \cap B]}{P[B]}\\ &= \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]} \end{aligned}$$
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
$$\begin{aligned} P(A \cup C|B) &= \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]}\\ &= P(A|B) + P(C|B). \end{aligned}$$

EX: Check the prev. example: Ω = {AD, AN, BD₁, BD₂, BN}

P(E_D|E_A) =

P(E_D|E_B) =

P(E_A|E_D) =

P(E_B|E_D) =

L3-9

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems, Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W₂|W₁)?

Answers (gambler problem): (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \;\Rightarrow\; P(A \cap B) = P(A|B)P(B) \qquad (7)$$
and
$$P(B|A) = \frac{P(A \cap B)}{P(A)} \;\Rightarrow\; P(A \cap B) = P(B|A)P(A). \qquad (8)$$

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:
$$\begin{aligned} P(A \cap B \cap C) &= \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C)\\ &= P(A|B \cap C)\, P(B|C)\, P(C) \end{aligned}$$

Partitions and Total Probability

DEFN: A collection of sets A₁, A₂, ... partitions the sample space Ω if and only if the sets are pairwise mutually exclusive and their union is Ω. {A_i} is also said to be a partition of Ω.

L4-4

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability

If {A_i} partitions Ω, then
$$P(B) = \sum_i P(B|A_i)\,P(A_i).$$

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A₀ or A₁:

• A₀ is sent with prob. P(A₀) = 2/5.

• A₁ is sent with prob. P(A₁) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B₀ or B₁.

• The channel is a noisy channel that determines the probabilities of transitions between {A₀, A₁} and {B₀, B₁}:

– The transitions are conditional probabilities P(B_j|A_i) that can be specified on a channel transition diagram.

[Figure: channel transition diagram with P(B₀|A₀) = 7/8, P(B₁|A₀) = 1/8, P(B₀|A₁) = 1/6, P(B₁|A₁) = 5/6.]

• The receiver must use a decision rule to determine from the output (B₀ or B₁) whether the symbol that was sent was most likely A₀ or A₁.

• Let's start by considering 2 possible decision rules:

1. Always decide A₁.

2. When we receive B₀, decide A₀; when we receive B₁, decide A₁.

Problem: Determine the probability of error for these two decision rules.
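A Monte Carlo sketch of this system in MATLAB, assuming the crossover probabilities read off the diagram are P(B₁|A₀) = 1/8 and P(B₀|A₁) = 1/6 (consistent with the example worked on page L4-9):

n = 1e6;
a = (rand(1, n) < 3/5);            % transmitted symbol index: 1 (A1) w.p. 3/5, else 0 (A0)
p01 = 1/8; p10 = 1/6;              % crossover probabilities
flip = (~a & (rand(1, n) < p01)) | (a & (rand(1, n) < p10));
b = xor(a, flip);                  % received symbol index
peRule1 = mean(a == 0);            % rule 1 (always decide A1) errs whenever A0 was sent: 2/5
peRule2 = mean(b ~= a);            % rule 2 (decide A_j on B_j): 2/5*1/8 + 3/5*1/6 = 0.15
disp([peRule1 peRule2])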

L4-6

L4-7

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_{ij} be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A₁ when A₀ is transmitted plus the probability that we decide A₀ when A₁ is transmitted:
$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)$$

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B₀ is received, decide A₀ with probability η and decide A₁ with probability 1 − η.

• Since we only apply hypothesis H_{ij} when B_j is received, we can write
$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)$$

• We apply the chain rule to write this as
$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)$$

• The receiver can only decide H_{ij} based on B_j; it has no direct knowledge of A_k. So P(H_{ij}|A_k ∩ B_j) = P(H_{ij}|B_j) for all i, j, k.

• Furthermore, let P(H₀₀|B₀) = q₀ and P(H₁₁|B₁) = q₁.

• Then
$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)$$

• The choices of q₀ and q₁ can be made independently, so P(E) is minimized by finding q₀ and q₁ that minimize
$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad\text{and}\quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1).$$

• We can further expand these using the chain rule as
$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad\text{and}\quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1).$$

L4-8

• P(B₀) and P(B₁) are constants that do not depend on our decision rule, so finally we wish to find q₀ and q₁ to minimize
$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \quad\text{and} \qquad (9)$$
$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1). \qquad (10)$$

• Both of these are linear functions of q₀ and q₁, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q₀ = 1 if P(A₀|B₀) > P(A₁|B₀), and setting q₀ = 0 otherwise.

• Similarly, (10) is minimized by setting q₁ = 1 if P(A₁|B₁) > P(A₀|B₁), and setting q₁ = 0 otherwise.

• These rules can be summarized as follows:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).

L4-9

• General technique:

If {A_i} is a partition of Ω, and we know P(B_j|A_i) and P(A_i), then
$$P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)},$$
where the last step follows from the Law of Total Probability.

The boxed expression is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A₀) = 2/5, p₀₁ = 1/8, and p₁₀ = 1/6.
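A quick numeric version of this example in MATLAB (a sketch; the transition matrix encodes P(B_j|A_i) with p₀₁ = 1/8 and p₁₀ = 1/6, as assumed above):

pA = [2/5; 3/5];               % priors P(A0), P(A1)
T = [7/8 1/8; 1/6 5/6];        % row i+1, column j+1 holds P(Bj|Ai)
joint = T .* [pA pA];          % joint probabilities P(Ai and Bj)
pB = sum(joint);               % law of total probability: [P(B0) P(B1)]
post = joint ./ [pB; pB];      % Bayes' theorem: P(Ai|Bj); each column sums to 1
[mx, idx] = max(post);         % MAP rule: on Bj, decide the Ai with the larger posterior

Here post(:,1) is approximately [0.78; 0.22] and post(:,2) is approximately [0.09; 0.91], so this MAP receiver decides A₀ on B₀ and A₁ on B₁.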

L4-10

• Often we may not know the a priori probabilities P(A₀) and P(A₁).

• In this case, we typically treat the input symbols as equally likely: P(A₀) = P(A₁) = 0.5.

• Under this assumption, P(A₀|B₀) = cP(B₀|A₀) and P(A₁|B₀) = cP(B₀|A₁), so the decision rule becomes:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems, Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, ..., x_k) that are possible if x_i is an element of a set with n_i distinct members is
$$n_1 n_2 \cdots n_k.$$

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀ i = 1, 2, ..., k.

Thus n₁ = n₂ = ··· = n_k = |A| ≡ n.

⇒ The number of distinct ordered k-tuples is $n^k$.

2. Sampling w/out replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀ i = 1, 2, ..., k.

Then n₁ = n, n₂ = n − 1, ..., n_k = n − (k − 1).

⇒ The number of distinct ordered k-tuples is
$$n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}.$$

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, ..., n_{n−1} = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula: For large n,
$$n! \approx \sqrt{2\pi}\, n^{n + 1/2}\, e^{-n}.$$

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sample w/out Replacement & w/out Order

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:
$$C(n, k) = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{(n-k)!\,k!} = \binom{n}{k}.$$

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, ..., k_J (with k₁ + k₂ + ··· + k_J = n) is given by the multinomial coefficient
$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}.$$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are
$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}.$$

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}.$$

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring? (A numeric check appears below.)

5. Sample with Replacement and w/out Ordering

Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is
$$\binom{k + n - 1}{k}.$$
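For the 12-roll die example above, the favorable count is the multinomial coefficient 12!/(2!)⁶, out of 6¹² equally likely outcomes. A MATLAB sketch of the exact value together with a simulation check (the loop size is an arbitrary choice):

pExact = factorial(12) / 2^6 / 6^12      % 12!/(2!)^6 / 6^12, approx 3.4e-3
n = 1e5; hits = 0;
for t = 1:n
    rolls = ceil(6*rand(1, 12));         % 12 rolls of a fair die
    if all(histc(rolls, 1:6) == 2), hits = hits + 1; end
end
pSim = hits / n                          % should be near pExact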

L5-1

EEL 5544 Noise in Linear Systems, Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

Answer (packet question): 0.0794

L5-2

2. Always switch.

3. Flip a fair coin and switch if it comes up heads. (A simulation sketch follows.)
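A simulation sketch of the game in MATLAB (door numbering is arbitrary; the coin-flip strategy simply averages the two pure strategies):

n = 1e5; winStay = 0; winSwitch = 0;
for t = 1:n
    car = ceil(3*rand); pick = ceil(3*rand);
    doors = 1:3;
    host = doors(doors ~= pick & doors ~= car);   % goat doors the host may open
    host = host(ceil(numel(host)*rand));          % host opens one of them
    other = doors(doors ~= pick & doors ~= host); % the door you would switch to
    winStay = winStay + (pick == car);
    winSwitch = winSwitch + (other == car);
end
disp([winStay winSwitch]/n)    % approaches 1/3 and 2/3; the coin flip gives 1/2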

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B₁, B₂, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that P(A) is consistent with the probabilities assigned in the individual subexperiments.

For s.i. subexperiments:
$$P(A) = P(B_1)P(B_2)\cdots P(B_N).$$

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is
$$p^k (1-p)^{n-k}.$$

• The number of ways that k successes can occur in n places is
$$\binom{n}{k}.$$

• Thus the probability of k successes on n trials is
$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}. \qquad (11)$$

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
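The packet question is a direct application of (11); a minimal MATLAB sketch:

p = 1e-2; n = 100;
pk = zeros(1, 3);
for k = 0:2
    pk(k+1) = nchoosek(n, k) * p^k * (1-p)^(n-k);  % P(exactly k bit errors)
end
pFail = 1 - sum(pk)            % P(3 or more errors), approx 0.0794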

L5-4

• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms get very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let
$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}. \qquad (12)$$

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (13)$$
and distribution function
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt. \qquad (14)$$

L5-5

• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
$$Q(x) = 1 - \Phi(x) \qquad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt. \qquad (16)$$

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the DeMoivre-Laplace theorem (see Lecture 7), it can be shown that for large n,
$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\; \phi\!\left(\frac{k - np}{\sqrt{npq}}\right). \qquad (17)$$

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to deal with sums of binomial probabilities.

• For fixed integers α and β,
$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then
$$p(m) = (1 - p)^{m-1}\, p, \quad m = 1, 2, \ldots$$

Also,
$$P(\#\text{ trials} > k) = (1 - p)^k.$$

EX: A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 (the average number of calls per minute) is a constant. The maximum possible number of calls/hour is now 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:
$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,
$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}.$$

• Then
$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$
which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
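The switching-office question above reduces to a single Poisson evaluation; a MATLAB sketch:

lambda = 90;        % calls per hour
t = 10/60;          % 10-minute window, in hours
alpha = lambda*t;   % expected number of calls: 15
k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % P(N = 20), approx 0.042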

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:
$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - (2b - z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable X defined on a probability space (Ω, F, P) is a mapping from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) for every Borel set of real numbers B ∈ 𝓑, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, b].

• Thus it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is
$$F_X(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}).$$

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., $\lim_{h \to 0^+} F_X(b + h) = F_X(b)$.

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,
$$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x].$$

L6-5

EX Examples

L6-6

EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
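The dart experiment is also a one-liner to simulate. A minimal MATLAB sketch with b = 1 (an arbitrary choice) whose output can be compared against the answer for F_Z(z) given in the Motivation:

b = 1; n = 1e5;
z = b*rand(1, n) + b*rand(1, n);   % Z = sum of the two dart coordinates
zs = sort(z);
plot(zs, (1:n)/n)                  % empirical CDF; compare with F_Z(z)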

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density of the transformed values by plotting a histogram (note that the argument is now v):

hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:

w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):
$$f_X(x) = \frac{d}{dx} F_X(x).$$

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$

L7-3

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$$\int_{-\infty}^{+\infty} g(x)\,dx = c, \quad 0 < c < +\infty,$$
then f_X(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is defined at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
$$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$$

L7-4

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is
$$p_X(x) = P(X = x).$$

EX: Roll a fair 6-sided die; let X = # on the top face.
$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$

EX: Flip a fair coin until heads occurs; let X = # of flips.
$$P(X = x) = \begin{cases} \left(\tfrac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ 𝒜 is considered a "success."

• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p.$$

So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) versus x.]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; F_X(x) versus x.]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf.]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf.]

L7-8

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF.]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf.]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF.]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf.]

IMPORTANT CONTINUOUS RVS

1. Uniform RV (covered in the example above)

2. Exponential RV

• Characterized by a single parameter, λ.

L7-10

• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4.]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}.$$

• Let σ² = npq and µ = np; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}.$$

L7-11

DEFN: A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}$$
with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,$$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt.$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt.$$

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi, \quad x \ge 0.$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right).$$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right).$$
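Both expressions for Q(x) are easy to evaluate numerically. A MATLAB sketch comparing the erfc form with the finite-range form above (the small positive lower limit is just to avoid dividing by zero at φ = 0):

x = 0:0.5:3;
Q1 = 0.5*erfc(x/sqrt(2));                 % Q(x) = 1 - Phi(x)
Q2 = zeros(size(x));
for i = 1:numel(x)
    Q2(i) = quad(@(phi) exp(-x(i)^2 ./ (2*sin(phi).^2)) / pi, 1e-9, pi/2);
end
disp([Q1; Q2])                            % the two rows agree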


Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6


L2a-9

• For a typical continuous sample space, the prob of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq. = 0.

(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)

L2-1

EEL 5544 Lecture 2
Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1

DEFN

The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN

A discrete set is either finite or countably infinite.

DEFN

A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b > a are in an interval I, then every x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.

L2-3

• Intervals can also be either finite, infinite, or partially infinite.

EX

Example of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN
Events are sets of outcomes to which we assign probabilities.

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = event that heads occurs on the first toss. Express A1 as a set of outcomes.

(d) EX

Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes.

2

DEFN

F is the event class, which is a collection of subsets of Ω to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ consisting only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these sets cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN

A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,

⋃_{i=1}^{∞} Ei ∈ M    (5)

• Note that by property (c) of fields, the union in (5) can be interchanged with an intersection:

⋂_{i=1}^{∞} Ei ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN

The probability measure, denoted by P, is a set function that maps all members of F onto the interval [0, 1].

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. P(A) ≥ 0 for every event A ∈ F

II. P(Ω) = 1

III. If A1, A2, ... are pairwise mutually exclusive, then P(⋃_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + ··· + (−1)^(n+1) P(A1 ∩ A2 ∩ ··· ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
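For instance, the smokers problem from the Motivation can be solved directly from these corollaries. Let C = {smokes cigarettes} and G = {smokes cigars}, so P(C) = 0.28, P(G) = 0.07, and P(C ∩ G) = 0.05. By (e), P(C ∪ G) = 0.28 + 0.07 − 0.05 = 0.30, so 30 percent smoke; by (a), the percentage that smokes neither is 1 − 0.30 = 0.70, or 70 percent; and the percentage that smokes cigars but not cigarettes is P(G) − P(G ∩ C) = 0.07 − 0.05 = 0.02, or 2 percent.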

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word “independence”?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.

Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

1

STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a relation between the probabilities of events, not between the events themselves.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., we require

P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m
⋮
P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2)···P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set-theoretic relation between events.

– Statistical independence is a relation between the probabilities of events.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let:

• EA be the event that the selected computer is from manufacturer A

• EB be the event that the selected computer is from manufacturer B

• ED be the event that the selected computer is defective

Find:

P(EA) =     P(EB) =     P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find

– P(ED|EB) =

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: sample space S containing overlapping events A and B]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms

1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) ≥ 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check prev example. Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =
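A quick simulation sketch in MATLAB (the labels and variable names are illustrative): each of the 5 computers is equally likely, and the conditional probabilities are estimated as ratios of relative frequencies, mirroring P(ED|EA) = P(ED ∩ EA)/P(EA).

% label the computers 1..5: 1 = AD, 2 = AN, 3 = BD, 4 = BD, 5 = BN
N = 1e5;
c = ceil(5*rand(1,N));                 % computer selected on each trial
A = (c <= 2);                          % event EA: manufacturer A
D = (c == 1) | (c == 3) | (c == 4);    % event ED: defective
P_D_given_A = sum(A & D)/sum(A)        % should be near 1/2
P_A_given_D = sum(A & D)/sum(D)        % should be near 1/3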

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

2. (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and ⋃i Ai = Ω.

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B, P(B) = Σi P(B|Ai)P(Ai).

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends one of two symbols, A0 or A1.

• A0 is sent with prob. P(A0) = 2/5.

• A1 is sent with prob. P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)   and   q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)   and   q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)    (9)

q1 P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
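For checking your work, the numbers work out as follows. With p01 = P(B1|A0) = 1/8 and p10 = P(B0|A1) = 1/6: P(B0) = (7/8)(2/5) + (1/6)(3/5) = 9/20 and P(B1) = (1/8)(2/5) + (5/6)(3/5) = 11/20, so P(A0|B0) = (7/20)/(9/20) = 7/9 > P(A1|B0) = 2/9, and P(A1|B1) = (1/2)/(11/20) = 10/11 > P(A0|B1) = 1/11. The MAP rule is therefore: decide A0 when B0 is received, and decide A1 when B1 is received. Its error probability is P(E) = P(A1 ∩ B0) + P(A0 ∩ B1) = 1/10 + 1/20 = 3/20.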

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = c·P(B0|A0) and P(A1|B0) = c·P(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 · n2 ··· nk.

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1).

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_(n−1) = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1)···(2)(1)
= n!

Useful formula: For large n,

n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sample without Replacement & without Order

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently, “How many subsets of size k (called a combination of size k) are there from a set of size n?”

First determine the # of ordered subsets: n(n − 1)···(n − k + 1) = n!/(n − k)!

Now, how many different orderings are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ (with k1 + k2 + ··· + kJ = n) is given by the multinomial coefficient

n! / (k1! k2! ··· kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)²⁴ ≈ 1.3 × 10⁸⁵

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/((3!)²⁴ · 24!) ≈ 2.1 × 10⁶¹
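Both counts can be checked numerically with the gamma(n+1) technique above (a quick MATLAB sketch):

gamma(73)/gamma(4)^24                  % 72!/(3!)^24, about 1.3e85
gamma(73)/(gamma(4)^24*gamma(25))      % 72!/((3!)^24 * 24!), about 2.1e61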

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and without Ordering

Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1 Never switch

4. 0.0794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
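A quick MATLAB simulation sketch (variable names are illustrative; it relies on the fact that the host always opens a goat door, so switching wins exactly when the initial pick is wrong):

N = 1e5;
car  = ceil(3*rand(1,N));            % door hiding the car
pick = ceil(3*rand(1,N));            % contestant's initial pick
p_stay   = mean(pick == car)         % never switch: about 1/3
p_switch = mean(pick ~= car)         % always switch: about 2/3
sw = rand(1,N) < 0.5;                % coin flip decides whether to switch
p_coin = mean((sw & (pick ~= car)) | (~sw & (pick == car)))   % about 1/2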

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class: F = a σ-algebra on Ω.

3. P(·) on F such that P(A) is consistent with the probabilities assigned by the subexperiments.

For s.i. subexperiments,

P(A) = P1(B1)P2(B2)···PN(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n, k).

• Thus, the probability of k successes on n trials is

pn(k) = C(n, k) p^k (1 − p)^(n−k)    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
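The packet question can be answered numerically (a MATLAB sketch: sum the probabilities of 0, 1, or 2 errors and subtract from 1):

n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
Perr = 1 - Pok     % about 0.0794, matching the footnote on the previous page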

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n we can approximate the binomial probabilities using either

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the DeMoivre–Laplace theorem (see Lecture 7), it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
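As a concrete check (a sketch; here the Q-function is built from MATLAB's standard erfc): for Sn binomial with n = 100, p = 0.5 and [α, β] = [45, 55],

n = 100; p = 0.5; q = 1 - p;
a = 45; b = 55;
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);
end
Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q(x) = 1 - Phi(x)
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                     % both about 0.73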

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
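For checking your answers: with success probability 0.9 per transmission, the geometric law gives p(2) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first two both fail) = (0.1)² = 0.01.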

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 calls per minute is a constant. Then the maximum possible number of calls per minute is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, ...
           0,  otherwise

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
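For checking your answer: λ = 90 calls/hr and t = 1/6 hr, so α = λt = 15, and P(N = 20) = 15²⁰ e⁻¹⁵/20! ≈ 0.042. A MATLAB one-liner (sketch): exp(-15)*15^20/factorial(20).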

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0,  z ≤ 0
        z²/(2b²),  0 < z ≤ b
        [b² − (2b − z)²/2]/b²,  b < z < 2b
        1,  z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
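The CDF above is easy to check empirically (a MATLAB sketch with b = 1, where the formula gives FZ(1.2) = 1 − (0.8)²/2 = 0.68):

b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);   % sum of the two dart coordinates
mean(Z <= 1.2)                    % empirical CDF at z = 1.2, about 0.68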

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R (the real line).

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P(X ≤ x).

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:

hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:

w = sqrt(v);
hist(w,100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf), fX(x), of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative piecewise continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0,  x < a
        1/(b − a),  a ≤ x ≤ b
        0,  x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, ..., 6
           0,  o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, ...
           0,  o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a “success.”

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
    0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,  x = 1
           1 − p,  x = 0
           0,  x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
           0,  o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p,  k = 1, 2, ...
           0,  o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]

L7-8

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]

4 Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, ...
           0,  o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. FX(x)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x)]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2 Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = 0,  x < 0
        λe^(−λx),  x ≥ 0

• Distribution function:

FX(x) = 0,  x < 0
        1 − e^(−λx),  x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
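• Note the connection to the MATLAB exercise at the start of this lecture: if U is uniform on (0, 1), then X = −ln(U)/λ is exponential with parameter λ (the inverse-CDF method). A quick sketch for general λ:

lambda = 4;
x = -log(rand(1,1000000))/lambda;   % exponential(lambda) samples
hist(x,100)                          % compare with the pdf below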

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x)]

3 Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three parameter pairs; x vs. fX(x) and FX(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
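These can be reproduced with a short MATLAB sketch, since P[|X − µ| > kσ] = 2Q(k) and Q can be built from the standard erfc function:

Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) = 1 - Phi(x)
2*Qf(1:5)                         % compare with the footnote values above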

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2; x vs. fX(x)]

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = y^((n/2)−1) e^(−y/(2σ²)) / (σⁿ 2^(n/2) Γ(n/2)),  y ≥ 0
        0,  y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
        0,  y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. fX(x)]

• The CDF for a central chi-square RV can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
        0,  y < 0,

where m = n/2 (for an even number n of degrees of freedom).
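Both the density and this CDF are easy to sanity-check by simulation (a MATLAB sketch for n = 4 and σ = 1, where m = 2 and the formula gives FY(2) = 1 − e⁻¹(1 + 1) ≈ 0.264):

n = 4; N = 100000;
Y = sum(randn(n,N).^2, 1);   % each column is a sum of n squared unit Gaussians
mean(Y <= 2)                  % empirical CDF at y = 2, about 0.264
hist(Y,100)                   % shape should match the n = 4 pdf above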

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
        0,  r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; r vs. fR(r)]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
        0,  r < 0

• A radio signal received in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
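This is easy to see numerically (a MATLAB sketch with σ = 1, generating R as the magnitude of a complex Gaussian; the CDF above gives P(R ≤ 1) = 1 − e^(−1/2) ≈ 0.393):

R = sqrt(randn(1,100000).^2 + randn(1,100000).^2);   % sigma = 1
hist(R,100)                % compare with the Rayleigh pdf above
mean(R <= 1)               % about 0.393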

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = (1/(ℓ√(2πσ²))) exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
        0,  ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A)

• Then FX(x|A) is a distribution function and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
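For checking your work (a sketch using the definitions above with A = {X > 2}, where X is the total wait time): FX(x|A) = [FX(x) − FX(2)]/[1 − FX(2)] = 1 − e^(−(x−2)) for x > 2 (and 0 otherwise), and the remaining wait time W = X − 2 has FW(w|A) = 1 − e^(−w), w ≥ 0, i.e., exponential with λ = 1 again (the memoryless property of the exponential RV).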

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board
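One example in the spirit of those done on board (an illustrative sketch): let X be uniform on [0, 1], interpreted as the unknown bias of a coin, and let A = {heads on a single flip}, so P(A|X = x) = x. The continuous Law of Total Probability gives P(A) = ∫_{0}^{1} x · 1 dx = 1/2, and the continuous version of Bayes' theorem gives fX(x|A) = x · 1/(1/2) = 2x on [0, 1]: observing heads shifts the density toward larger biases.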

Page 21: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L2-1

EEL 5544 Lecture 2Motivation

Read Sections 15 in the textbook

The Axioms of Probability and their Corollaries are important tools that simplify solvingprobability problems Anyone working with probability should learn these by heart and knowwhen to apply them

Try to solve the following problem (a modified version of S Ross A First Course in Probability7th ed Ch 2 prob 11) using the Corollaries to the Axioms of Probability where appropriate

EXA total of 28 percent of American males smoke cigarettes 7 percent smokecigars and 5 percent smoke both cigars and cigarettes

1 What percentage of males smoke

2 What percentage of males smoke neither cigars nor cigarettes

3 What percentage smoke cigars but not cigarettes

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )

bull The elements of a probability space for a random experiment are

1

DEFN

The sample space denoted by Ω is the of

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties discrete or continuous

DEFN

A discrete set is either or

DEFN

A set is countably infinite if it can be put into one-to-one correspondence withthe integers

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b gt a are in an interval I then if a le x le b x isin I

bull Intervals can be either open closed or half-open

ndash A closed interval [a b] contains the endpoints a and bndash An open interval (a b) does not contain the endpoints a and bndash An interval can be half-open such as (a b] which does not contain a or [a b)

which does not contain b

L2-3

bull Intervals can also be either finite infinite or partially infinite

EX

Example of continuous sets

bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like

ndash What is the probability that the result is evenndash What is the probability that the result is le 2

DEFNEvents are to which we assign

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes

(d) EX

Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes

2

DEFN

F is the event class, which is a ______ to which we assign probability.

• If Ω is discrete, then F can be taken to be the ______ of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore, these things cannot be events.

DEFN

A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M

L2-5

DEFN

A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,

⋃_{i=1}^{∞} Ei ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

⋂_{i=1}^{∞} Ei ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.

3

DEFN

The probability measure, denoted by P, is a ______ that maps all members of ______ onto ______.

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I

II

III

Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(Aᶜ) = 1 − P(A)

L2-6

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A1, A2, ..., An are pairwise mutually exclusive, then

P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f)

P(⋃_{k=1}^{n} Ak) = Σ_{j=1}^{n} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + ··· + (−1)^(n+1) P(A1 ∩ A2 ∩ ··· ∩ An)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, etc. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
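As a quick numerical illustration of corollary (e), here is a Monte Carlo sketch in MATLAB (the die and the event choices are an arbitrary example, not from the notes):

% Check P(A ∪ B) = P(A) + P(B) − P(A ∩ B) for one roll of a fair die,
% with A = {result even} and B = {result ≤ 2}.
N = 1e6;
x = randi(6, 1, N);                      % N simulated die rolls
A = (mod(x, 2) == 0);
B = (x <= 2);
lhs = mean(A | B);                       % ≈ P(A ∪ B) = 4/6
rhs = mean(A) + mean(B) - mean(A & B);   % also ≈ 4/6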

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX

1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B) / P(B), for P(B) > 0

What if A and B are independent?

L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

1

STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if ______

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a ______.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32    2. (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅.
Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., require:

P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k

P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m

⋮

P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2)···P(AN)

(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a ______

– Statistical independence is a ______

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, ______ is true.

Consider the following

L3-6

Conditional Probability Consider an example

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A

• EB be the event that the selected computer is from manufacturer B

• ED be the event that the selected computer is defective

Find

P(EA) =        P(EB) =        P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(ED|EA)
(means the conditional probability of event ED given that event EA occurred)

Ω = {AD, AN, BD, BD, BN}

Find

– P(ED|EB) =

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B becomes the new reference set, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B), for P(B) > 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms

1.

P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B) / P(B)

and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅:

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check prev. example: Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

2. (a) 1/3 (b) 1/5 (c) 1
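A sketch of the urn computation: P(W2|W1) = P(W1 ∩ W2)/P(W1) = [(3/5)(2/4)]/(3/5) = 2/4 = 1/2; after one white ball is removed, 2 of the 4 remaining balls are white.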

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C) / P(B ∩ C)] · [P(B ∩ C) / P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if ______

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then

Example

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]

• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
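A sketch of the answers (using the transition probabilities on the diagram above): Rule 1 errs exactly when A0 is sent, so P(E) = P(A0) = 2/5 = 0.4. Rule 2 errs when A0 is sent and B1 is received, or when A1 is sent and B0 is received, so P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/6)(2/5) + (1/8)(3/5) = 17/120 ≈ 0.142.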

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point, our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0), and    (9)
q1P(A0|B1) + (1 − q1)P(A1|B1).    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is ______

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
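A MATLAB sketch of this computation (assuming, per the transition diagram above, that p01 = P(B0|A1) = 1/8 and p10 = P(B1|A0) = 1/6):

PA = [2/5 3/5];               % a priori probabilities [P(A0) P(A1)]
PBgA = [5/6 1/6;              % row i: [P(B0|Ai) P(B1|Ai)]
        1/8 7/8];
PB = PA * PBgA;               % total probability: [P(B0) P(B1)] = [49/120 71/120]
PAgB = (PBgA .* PA.') ./ PB;  % Bayes' theorem: entry (i,j) = P(Ai|Bj)
[~, map] = max(PAgB)          % map = [1 2]: decide A0 on B0, A1 on B1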

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is ______

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is ______

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is ______

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_(n−1) = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = ______ = ______

Useful formula: For large n,

n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
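A quick sketch checking the approximation (the choice n = 52 is an arbitrary example):

n = 52;
exact = gamma(n+1);                          % 52! ≈ 8.07e67
approx = sqrt(2*pi) * n^(n+0.5) * exp(-n);   % Stirling's formula
relerr = (exact - approx)/exact              % ≈ 0.0016, i.e., 0.16 percent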

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets.

Now, how many different orders are there for any set of k objects?

Let Cnk = the number of combinations of size k from a set of n objects:

Cnk = (# ordered subsets) / (# orderings) = ______

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient ______

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3×10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1×10^61

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?
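A sketch of the computation: the multinomial coefficient gives 12!/(2!)^6 = 7,484,400 orderings, each sequence of 12 rolls has probability 6^(−12), and so P = 7,484,400/6^12 ≈ 0.0034.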

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

( k + n − 1 choose k )

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiments can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX: By Request: The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

bull Compute the probabilities of winning for the following three strategies

1 Never switch

4. 0.0794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
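For checking your work (the standard result): never switching wins with probability 1/3, always switching wins with probability 2/3, and the coin-flip strategy wins with probability (1/2)(1/3) + (1/2)(2/3) = 1/2.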

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space:
Ω = the ______ of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) = ______

For s.i. subexperiments,

P(A) = ______

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is ______

• The number of ways that k successes can occur in n places is ______

• Thus, the probability of k successes on n trials is

pn(k) = ______    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
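A MATLAB sketch of this computation:

% P[packet fails] = P[3 or more bit errors], n = 100 bits, p = 0.01
n = 100; p = 0.01;
k = 0:2;
C = arrayfun(@(kk) nchoosek(n, kk), k);      % binomial coefficients
Pfail = 1 - sum(C .* p.^k .* (1-p).^(n-k))   % ≈ 0.0794, matching the footnote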

L5-4

• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
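A sketch comparing the exact binomial probability with this approximation (the values n = 100, p = 0.5, α = 40, β = 60 are an arbitrary test case):

n = 100; p = 0.5; q = 1 - p;
a = 40; b = 60;
k = a:b;
exact = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* q.^(n-k));
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q-function via the built-in erfc
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                   % both ≈ 0.965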

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = ______

Also, P(# trials > k) = ______

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
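A sketch of the answers: with success probability p = 0.9 per transmission, P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first 2 both fail) = (0.1)² = 0.01.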

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. The maximum possible number of calls/hour is then 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability,

(n choose k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
    (α^k/k!) e^(−α),   k = 0, 1, ...
    0,                 o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
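A sketch of the computation: here λ = 90 calls/hr and t = 1/6 hr, so α = λt = 15.

alpha = 90 * (10/60);                          % α = λt = 15
P20 = alpha^20 / factorial(20) * exp(-alpha)   % ≈ 0.0418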

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) =
    0,                        z ≤ 0
    z²/(2b²),                 0 < z ≤ b
    [b² − (2b − z)²/2]/b²,    b < z < 2b
    1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
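A Monte Carlo sketch of part 3 (with b = 1 chosen for the check):

b = 1; N = 1e6;
Z = b*rand(1, N) + b*rand(1, N);   % sum of the two dart coordinates
z = [0.5 1.0 1.5];
empirical = mean(Z.' <= z)         % ≈ [0.125 0.5 0.875], matching FZ(z)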

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable X defined on a probability space (Ω, F, P) is a ______ from ______ to ______.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the ______ ( ______ ), denoted ______, is ______

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., lim_{h→0, h>0} FX(b + h) = FX(b).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The ______ ( ______ ) fX(x) of a random variable X is the derivative (which may not exist at some places) of ______

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ______

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) =
    0,            x < a
    1/(b − a),    a ≤ x ≤ b
    0,            x > b

L7-4

DEFN

A ______ has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is ______

EX: Roll a fair 6-sided die.
X = # on top face

P(X = x) =
    1/6,    x = 1, 2, ..., 6
    0,      o.w.

EX: Flip a fair coin until heads occurs.
X = # of flips

P(X = x) =
    (1/2)^x,    x = 1, 2, ...
    0,          o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X =
    1,    s ∈ A
    0,    s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) =
    p,        x = 1
    1 − p,    x = 0
    0,        x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] =
    (n choose k) p^k (1 − p)^(n−k),    k = 0, 1, ..., n
    0,                                 o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] =
    (1 − p)^(k−1) p,    k = 1, 2, ...
    0,                  o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
    (α^k/k!) e^(−α),    k = 0, 1, ...
    0,                  o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) =
    0,            x < 0
    λ e^(−λx),    x ≥ 0

• Distribution function:

FX(x) =
    0,               x < 0
    1 − e^(−λx),     x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (and npq is not small), then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
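A MATLAB sketch of this computation (erfc is built in, so Q can be defined by hand; the numbers μ = 5, σ = 1, a = 4, b = 7 are an arbitrary example):

Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) = 1 − Φ(x)
mu = 5; sigma = 1;
a = 4; b = 7;
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(4 < X ≤ 7) ≈ 0.8186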

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) =
    y^((n/2)−1) e^(−y/(2σ²)) / (σ^n 2^(n/2) Γ(n/2)),    y ≥ 0
    0,                                                  y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) =
    (1/(2σ²)) e^(−y/(2σ²)),    y ≥ 0
    0,                         y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
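A Monte Carlo sketch of this fact for σ² = 1 (so the mean should be 2σ² = 2):

N = 1e6;
y = randn(1, N).^2 + randn(1, N).^2;   % sum of squares of two s.i. N(0,1) RVs
mean(y)                                % ≈ 2, the mean of the exponential
histogram(y, 'Normalization', 'pdf')   % compare with (1/2)exp(−y/2), y ≥ 0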

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) =
    1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,    y ≥ 0
    0,                                                    y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) =
    (r/σ²) e^(−r²/(2σ²)),    r ≥ 0
    0,                       r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) =
    1 − e^(−r²/(2σ²)),    r ≥ 0
    0,                    r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) =
    (1/(ℓ√(2πσ²))) exp{−(ln ℓ − μ)²/(2σ²)},    ℓ ≥ 0
    0,                                         ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = ______

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = ______

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
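A sketch of part 1 (with A = {X > 2}): for x > 2,

FX(x|X > 2) = P(2 < X ≤ x)/P(X > 2) = (e^(−2) − e^(−x))/e^(−2) = 1 − e^(−(x−2)),

and FX(x|X > 2) = 0 for x ≤ 2. For part 2, the remaining wait X − 2 therefore has the same exponential (λ = 1) distribution as a fresh wait: this is the memoryless property of the exponential RV.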

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A) / fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A) / fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x) / P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board.

Page 22: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L2-2

PROBABILITY SPACES

We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )

bull The elements of a probability space for a random experiment are

1

DEFN

The sample space denoted by Ω is the of

The sample space is also known as the universal set or reference set

Sample spaces come in two basic varieties discrete or continuous

DEFN

A discrete set is either or

DEFN

A set is countably infinite if it can be put into one-to-one correspondence withthe integers

EX

Examples

DEFN

A continuous set is not countable

DEFN

If a and b gt a are in an interval I then if a le x le b x isin I

bull Intervals can be either open closed or half-open

ndash A closed interval [a b] contains the endpoints a and bndash An open interval (a b) does not contain the endpoints a and bndash An interval can be half-open such as (a b] which does not contain a or [a b)

which does not contain b

L2-3

bull Intervals can also be either finite infinite or partially infinite

EX

Example of continuous sets

bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like

ndash What is the probability that the result is evenndash What is the probability that the result is le 2

DEFNEvents are to which we assign

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes

(d) EX

Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes

2

DEFN

F is the event class which is ato which we assign probability

bull If Ω is discrete then F can be taken to be the of Ω (the set of everysubset of Ω)

bull If Ω is continuous then weird things can happen if we consider every subset of Ω Forinstance if Ω itself has measure (probability) 1 we can construct a new set Ωprime thatconsists only of subsets of Ω that has measure (probability) 2

bull The solution is to not assign a measure (ie probability) to some subsets of Ω andtherefore these things cannot be events

DEFN

A collection of subsets of Ω forms a field M under the binaryoperations cup and cap if(a) empty isin M and Ω isinM(b) If EisinM and F isinM then E cup F isinM and E cap F isinM(c) If E isinM then Ec isinM

L2-5

DEFN

A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1

Ei isinM (5)

bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1

Ei isinM (6)

bull We require that the event class F be a σ-algebra

bull For discrete sets the set of all subset of Ω is a σ-algebra

bull For continuous sets there may be more than one possible σ-algebra possible

bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R

3

DEFN

The probability measure denoted by P is a that maps allmembers of onto

Axioms of Probability

bull We specify a minimal set of rules that P must obey

I

II

III

Corollaries

bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F

(a) P (Ac) = 1minus P (A)

L2-6

(b) P (A) le 1

(c) P (empty) = 0

(d) If A1 A2 An are pairwise mutually exclusive then

P

(n⋃

k=1

Ak

)=

nsumk=1

P (Ak)

Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)

Proof and example on board(f)

P

(n⋃

k=1

Ak

)=

nsumk=1

P (Aj)minussumjltk

P (Aj cap Ak) + middot middot middot

+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)

Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction

(g) If A sub B then P (A) le P (B)Proof on board

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo

In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa

However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work

EX

1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity

We define the conditional probability of an event A given that event B occurred as

P (A|B) =P (A capB)

P (B) for P (B) gt 0

What if A and B are independent

L3-2

Motivation-continued

Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page

EX

2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes

1

STATISTICAL INDEPENDENCE

DEFN

Two events A isin A and B isin A are abbreviated if and only if

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a property of the probability measure P.

What type of relation is mutually exclusive?

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– $A$ and $\bar{B}$

– $\bar{A}$ and $B$

– $\bar{A}$ and $\bar{B}$

¹ 1 (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   2 (a) 2/3 (b) 1/3 (c) 3/4

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if $A$ and $B$ are s.i. events, then $A$ and $\bar{B}$ are also s.i. events. The other cases follow directly.

Note that $A = (A \cap B) \cup (A \cap \bar{B})$ and $(A \cap B) \cap (A \cap \bar{B}) = \emptyset$. Thus

$$P(A) = P[(A \cap B) \cup (A \cap \bar{B})] = P(A \cap B) + P(A \cap \bar{B}) = P(A)P(B) + P(A \cap \bar{B})$$

Thus

$$P(A \cap \bar{B}) = P(A) - P(A)P(B) = P(A)[1 - P(B)] = P(A)P(\bar{B})$$

• For N events to be s.i., require

$$P(A_i \cap A_k) = P(A_i)P(A_k)$$
$$P(A_i \cap A_k \cap A_m) = P(A_i)P(A_k)P(A_m)$$
$$\vdots$$
$$P(A_1 \cap A_2 \cap \cdots \cap A_N) = P(A_1)P(A_2) \cdots P(A_N)$$

(It is not sufficient to have $P(A_i \cap A_k) = P(A_i)P(A_k)\ \forall i \neq k$.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, . . . , AN ∈ A are pairwise statistically independent if and only if $P(A_i \cap A_j) = P(A_i)P(A_j)\ \forall i \neq j$.

L3-4

Examples of Applying Statistical Independence

EX

For a couple having 5 children, compute the probabilities of the following events (a simulation sketch follows this list):

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
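A minimal MATLAB simulation of this example (not in the original notes). It assumes each child is a girl with probability 1/2, independently, and that row 1 of the array is the eldest child:

N = 1e6;                                  % number of simulated families
girls = rand(5, N) < 0.5;                 % 1 = girl, 0 = boy; row 1 = eldest
nGirls = sum(girls, 1);
p1 = mean(nGirls == 0 | nGirls == 5);               % all same sex, approx 1/16
p2 = mean(all(~girls(1:3,:), 1) & all(girls(4:5,:), 1));  % BBBGG, approx 1/32
p3 = mean(all(girls(1:2,:), 1));                    % 2 oldest girls, approx 1/4
p4 = mean(nGirls >= 1);                             % at least one girl, approx 31/32
fprintf('%.4f  %.4f  %.4f  %.4f\n', p1, p2, p3, p4);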

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set-theoretic relationship between events.

– Statistical independence is a property of the probability measure P.

• How does mutual exclusiveness relate to statistical independence?

1. m.e. ⇒ s.i.

2. s.i. ⇒ m.e.

3. two events cannot be both s.i. and m.e., except in some degenerate cases

4. none of the above

• Perhaps surprisingly, 3 is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.

Ω = {AD, AN, BD, BD, BN}

Let

bull EA be the event that the selected computer is from manufacturer A

bull EB be the event that the selected computer is from manufacturer B

bull ED be the event that the selected computer is defective

Find

P (EA) = P (EB) = P (ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA) (means the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find

– P(ED|EB) =

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[figure: Venn diagram of overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[figure: Venn diagram with B as the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0$$

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1.

$$P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$

2. Given A ∈ A,

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

and $P(A \cap B) \ge 0$, $P(B) > 0$

$$\Rightarrow P(A|B) \ge 0$$

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

$$P(A \cup C|B) = \frac{P[(A \cup C) \cap B]}{P[B]} = \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]}$$

Note that $A \cap C = \emptyset \Rightarrow (A \cap B) \cap (C \cap B) = \emptyset$, so

$$P(A \cup C|B) = \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]} = P(A|B) + P(C|B)$$

EX: Check prev example. Ω = {AD, AN, BD, BD, BN}

P (ED|EA) =

P (ED|EB) =

P (EA|ED) =

P (EB|ED) =
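Because the sample space is small and its outcomes are equally likely, these can be checked by brute-force counting. A MATLAB sketch (not from the notes; the two arrays encode the five computers):

mfr = 'AABBB';                          % manufacturer of each computer
def = [1 0 1 1 0];                      % 1 = defective, 0 = non-defective
EA = (mfr == 'A'); EB = (mfr == 'B'); ED = (def == 1);
fprintf('P(EA)=%.2f P(EB)=%.2f P(ED)=%.2f\n', mean(EA), mean(EB), mean(ED));
fprintf('P(ED|EA) = %.4f\n', sum(ED & EA)/sum(EA));   % = 1/2
fprintf('P(ED|EB) = %.4f\n', sum(ED & EB)/sum(EB));   % = 2/3
fprintf('P(EA|ED) = %.4f\n', sum(EA & ED)/sum(ED));   % = 1/3
fprintf('P(EB|ED) = %.4f\n', sum(EB & ED)/sum(ED));   % = 2/3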

L3-9

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

2

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

² (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A)$$

(When A and B are s.i., knowledge of B does not change the prob of A (and vice versa).)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(A|B)P(B) \quad (7)$$

and

$$P(B|A) = \frac{P(A \cap B)}{P(A)} \Rightarrow P(A \cap B) = P(B|A)P(A) \quad (8)$$

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

$$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)P(B|C)P(C)$$

Partitions and Total Probability

DEFN

A collection of sets A1, A2, . . . partitions the sample space Ω if and only if

$$A_i \cap A_j = \emptyset \ \forall i \neq j \quad \text{and} \quad \bigcup_{i} A_i = \Omega$$

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

$$P(B) = \sum_{i} P(B|A_i)P(A_i)$$

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When receive B0, decide A0; when receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)$$

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)$$

• We apply the chain rule to write this as

$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)$$

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)$$

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad \text{and} \quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1)$$

• We can further expand these using the chain rule as

$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad \text{and} \quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1)$$

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \quad (9)$$
$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1) \quad (10)$$

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

$$P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)}$$

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A numerical sketch follows.)
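A MATLAB sketch of the computation (not part of the notes; it assumes the transition probabilities read off the channel diagram above):

PA   = [2/5 3/5];            % a priori probs P(A0), P(A1)
PBgA = [7/8 1/8;             % row i+1, col j+1 holds P(Bj|Ai)
        1/6 5/6];
PB   = PA * PBgA;            % law of total probability: [P(B0) P(B1)]
joint = repmat(PA', 1, 2) .* PBgA;      % entry (i,j) = P(Ai n Bj)
PAgB  = joint ./ repmat(PB, 2, 1);      % Bayes: entry (i,j) = P(Ai|Bj)
disp(PAgB)                   % columns: [7/9 1/11; 2/9 10/11]
[mx, map] = max(PAgB);       % MAP rule: map(j) = decision when Bj received
% Here map = [1 2]: decide A0 on B0 and A1 on B1, with
% P(E) = P(A1 n B0) + P(A0 n B1) = 0.10 + 0.05 = 0.15.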

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a constant c, so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
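One way to organize the computation is as a sequential Bayes update, where each posterior becomes the prior for the next flip. A MATLAB sketch (not from the notes):

pFair = 1/2;                                   % prior P(fair coin)
for obs = {'H', 'H', 'T'}
    if strcmp(obs{1}, 'H')
        num = pFair * 1/2;                     % P(H|fair) P(fair)
        den = num + (1 - pFair) * 1;           % + P(H|two-headed) P(two-headed)
    else
        num = pFair * 1/2;                     % P(T|fair) P(fair)
        den = num + (1 - pFair) * 0;           % two-headed coin cannot show tails
    end
    pFair = num / den;                         % posterior becomes the new prior
    fprintf('after %s: P(fair) = %.4f\n', obs{1}, pFair);
end
% Produces 1/3, 1/5, then 1, matching the answers in the notes.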

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX

Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is

$$n_1 n_2 \cdots n_k$$

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

³ (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
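These answers can be checked by simulation. A MATLAB sketch (not in the notes) that classifies each roll of five dice by its pattern of face multiplicities:

N = 2e5; counts = zeros(1, 7);
for t = 1:N
    d = ceil(6*rand(1, 5));                 % one roll of five dice
    m = sort(histc(d, 1:6), 'descend');     % face multiplicities, largest first
    if     isequal(m(1:2), [1 1]), k = 1;   % no two alike
    elseif isequal(m(1:2), [2 1]), k = 2;   % one pair
    elseif isequal(m(1:2), [2 2]), k = 3;   % two pair
    elseif isequal(m(1:2), [3 1]), k = 4;   % three alike
    elseif isequal(m(1:2), [3 2]), k = 5;   % full house
    elseif m(1) == 4,              k = 6;   % four alike
    else
        k = 7;                              % five alike
    end
    counts(k) = counts(k) + 1;
end
disp(counts/N)   % compare with (a)-(g) above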

L5a-2

Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is $n^k$.

2. Sampling w/out replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.

Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is $n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$.

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, . . . , nn−1 = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples

$$= n(n-1)\cdots(2)(1) = n!$$

Useful formula: For large n,

$$n! \approx \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}$$

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sample w/out Replacement & w/out Order

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: $\frac{n!}{(n-k)!}$

Now, how many different orders are there for any set of k objects? $k!$

Let $C^n_k$ = the number of combinations of size k from a set of n objects:

$$C^n_k = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{(n-k)!\,k!} = \binom{n}{k}$$

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient

$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}$$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}$$

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

$$\frac{72!}{(3!)^{24}\, 24!} \approx 2.1 \times 10^{61} \text{ unordered groups}$$
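Numbers this large overflow a direct factorial computation in double precision, but they are easy to evaluate in MATLAB through logarithms of the gamma function (a sketch, not from the notes):

logOrdered   = gammaln(73) - 24*gammaln(4);     % log(72!/(3!)^24)
logUnordered = logOrdered - gammaln(25);        % also divide by 24!
fprintf('ordered:   10^%.1f\n', logOrdered/log(10));    % about 10^85.1
fprintf('unordered: 10^%.1f\n', logUnordered/log(10));  % about 10^61.3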

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and w/out Ordering

Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

$$\binom{k+n-1}{k}$$

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch.

⁴ 0.0794

L5-2

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.
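A minimal MATLAB simulation of the three strategies (not part of the notes; the host is assumed to pick uniformly among the goat doors he is allowed to open):

N = 1e5; wins = zeros(1, 3);
for t = 1:N
    car  = floor(3*rand) + 1;             % door hiding the car
    pick = floor(3*rand) + 1;             % contestant's initial choice
    doors = 1:3;
    goat  = doors(doors ~= pick & doors ~= car);   % doors host may open
    open  = goat(floor(numel(goat)*rand) + 1);     % host opens one of them
    other = doors(doors ~= pick & doors ~= open);  % door offered by switching
    coin  = rand < 0.5;                   % strategy 3: switch on heads
    wins(1) = wins(1) + (pick == car);                          % never switch
    wins(2) = wins(2) + (other == car);                         % always switch
    wins(3) = wins(3) + (coin & other == car | ~coin & pick == car);
end
disp(wins/N)    % approx [1/3  2/3  1/2]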

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any outcome in Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P(A) =

For s.i. subexperiments:

$$P(A) = P(B_1)P(B_2) \cdots P(B_N)$$

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is $p^k(1-p)^{n-k}$.

• The number of ways that k successes can occur in n places is $\binom{n}{k}$.

• Thus, the probability of k successes on n trials is

$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k} \quad (11)$$

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
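A direct MATLAB computation for this example (a sketch, not from the notes), using eq. (11):

n = 100; p = 0.01; k = 0:2;
pmf = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
pFail = 1 - sum(pmf)       % P(3 or more bit errors) = 0.0794
% (With the Statistics Toolbox, this is 1 - binocdf(2, 100, 0.01).)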

L5-4

• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \quad (12)$$

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \quad (13)$$

and distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \quad (14)$$

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

$$Q(x) = 1 - \Phi(x) \quad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \quad (16)$$

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\left(\frac{k - np}{\sqrt{npq}}\right) \quad (17)$$

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

$$P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
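MATLAB has no built-in Q(x) outside the Communications Toolbox's qfunc, but the standard identity Q(x) = ½ erfc(x/√2) gives it in one line. The following sketch (not from the notes; the numbers n, p, α, β are made up for illustration) compares the approximation above with an exact binomial sum:

Q = @(x) 0.5*erfc(x/sqrt(2));        % standard identity for the Q-function
n = 100; p = 0.2; q = 1 - p;         % illustrative binomial parameters
a = 15; b = 25; k = a:b;             % evaluate P[15 <= Sn <= 25]
approx = Q((a - n*p - 0.5)/sqrt(n*p*q)) - Q((b - n*p + 0.5)/sqrt(n*p*q));
exact  = sum(exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
              + k*log(p) + (n-k)*log(q)));     % sum of b(k; n, p)
fprintf('approx = %.4f, exact = %.4f\n', approx, exact);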

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

$$p(m) = (1-p)^{m-1} p, \quad m = 1, 2, \ldots$$

Also,

$$P(\#\text{ trials} > k) = (1-p)^k$$

EX

A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
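Here each transmission is a Bernoulli trial with success probability p = 0.9 (the packet gets through). A one-line check in MATLAB (a sketch, not from the notes):

p = 0.9;                      % per-transmission success probability
pExactly2  = (1-p)^1 * p      % P(exactly 2 transmissions) = 0.09
pMoreThan2 = (1-p)^2          % P(more than 2 transmissions) = 0.01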

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,

$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$

• Then

$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events / (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

EX: Phone calls arriving at a switching office

If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
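Using the Poisson law with λ = 1.5 calls/minute and t = 10 minutes, so α = 15 (a MATLAB sketch, not from the notes):

alpha = 1.5 * 10;  k = 20;                     % alpha = lambda * t
P20 = alpha^k / factorial(k) * exp(-alpha)     % about 0.0418
% (With the Statistics Toolbox, this is poisspdf(20, 15).)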

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - \frac{(2b-z)^2}{2}}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
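The answer for F_Z is easy to check by simulation. A MATLAB sketch (not in the notes), taking b = 1 and one test point z in (b, 2b):

b = 1; N = 1e6;
Z = b*rand(1, N) + b*rand(1, N);          % sum of the two landing coordinates
z = 1.5;                                  % a test point with b < z < 2b
estimate = mean(Z <= z)                   % Monte Carlo estimate of F_Z(z)
exact = (b^2 - (2*b - z)^2/2) / b^2       % from the piecewise answer: 0.875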

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

$$F_X(x) = P(X \le x)$$

• F_X(x) is a prob measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., $F_X(b) = \lim_{h \to 0^+} F_X(b + h)$.

(The value at a jump discontinuity is the value after the jump.)

Pf omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

$$F_X(x) - F_X(x^-) = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]$$

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

$$f_X(x) = \frac{d}{dx} F_X(x)$$

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

L7-3

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx$, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

$$\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad 0 < c < +\infty,$$

then $f_X(x) = \frac{g(x)}{c}$ is a valid pdf.

Pf omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus f_X(x) can be assigned a value at every point x when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

$$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$$

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

$$p_X(x) = P(X = x)$$

EX: Roll a fair 6-sided die; X = # on top face.

$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$

EX: Flip a fair coin until heads occurs; X = # of flips.

$$P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as

$$P(X = 1) = P(X(s) = 1) = P(\{s \,|\, s \in A\}) = P(A) \triangleq p$$

So

$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \neq 0, 1 \end{cases}$$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

$$P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.

[figure: Geometric(0.1) pmf and Geometric(0.7) pmf]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[figure: Geometric(0.1) CDF and Geometric(0.7) CDF]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events / (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[figure: Poisson(0.75) pmf and Poisson(3) pmf]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[figure: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in example.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:

$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (with npq not too small), then with q = 1 − p,

$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}$$

• Let σ² = npq and µ = np; then

$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}$$

L7-11

DEFN

A Gaussian random variable X has density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}$$

with parameters (mean) µ and (variance) σ² > 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,$$

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt$$

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi$$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right)$$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right)$$

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX

Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:⁵

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
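These answers can be reproduced with the erfc-based Q-function (a MATLAB sketch, not from the notes):

Q = @(x) 0.5*erfc(x/sqrt(2));   % standard identity Q(x) = erfc(x/sqrt(2))/2
k = 1:5;
twoSided = 2*Q(k)               % P[|X - mu| > k*sigma] for k = 1..5
% P[X < mu] = 1/2 by the symmetry of the Gaussian density about mu.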

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

$$Y = \sum_{i=1}^{n} X_i^2 \quad (18)$$

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2}\, \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

where Γ(p) is the gamma function, defined as

$$\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0$$
$$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0$$
$$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$$

• Note that for n = 2,

$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number of degrees of freedom, n = 2m)

$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

$$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
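This construction is also how Rayleigh samples are usually generated: take the root-sum-square of two independent zero-mean Gaussians. A MATLAB sketch (not from the notes), with σ² = 1:

sigma = 1; N = 1e6;
R = sqrt((sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2);
hist(R, 100)          % shape should match f_R(r) above
mean(R <= 1)          % compare with F_R(1) = 1 - exp(-1/2), about 0.3935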

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time
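The second part illustrates the memoryless property of the exponential distribution: conditioned on X > 2, the remaining wait X − 2 is again exponential with the same λ. A MATLAB sketch (not from the notes):

lambda = 1; N = 1e6;
X = -log(rand(1, N))/lambda;          % exponential(lambda) total wait times
remWait = X(X > 2) - 2;               % remaining wait, given already waited 2 min
fprintf('P(rem > 1) = %.4f, exp(-lambda) = %.4f\n', mean(remWait > 1), exp(-1));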

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, . . . , An form a partition of Ω, then

$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$

and

$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i)$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead, take a limit:

$$P(A|X = x) = \lim_{\Delta x \to 0} P(A \,|\, x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)$$
$$= \frac{f_X(x|A)}{f_X(x)}\, P(A)$$

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 23: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L2-3

bull Intervals can also be either finite infinite or partially infinite

EX

Example of continuous sets

bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like

ndash What is the probability that the result is evenndash What is the probability that the result is le 2

DEFNEvents are to which we assign

Examples based on previous examples of sample spaces

(a) EX

Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes

(b) EX

Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events

L2-4

(c) EX

Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes

(d) EX

Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes

2

DEFN

F is the event class which is ato which we assign probability

bull If Ω is discrete then F can be taken to be the of Ω (the set of everysubset of Ω)

bull If Ω is continuous then weird things can happen if we consider every subset of Ω Forinstance if Ω itself has measure (probability) 1 we can construct a new set Ωprime thatconsists only of subsets of Ω that has measure (probability) 2

bull The solution is to not assign a measure (ie probability) to some subsets of Ω andtherefore these things cannot be events

DEFN

A collection of subsets of Ω forms a field M under the binaryoperations cup and cap if(a) empty isin M and Ω isinM(b) If EisinM and F isinM then E cup F isinM and E cap F isinM(c) If E isinM then Ec isinM

L2-5

DEFN

A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1

Ei isinM (5)

bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1

Ei isinM (6)

bull We require that the event class F be a σ-algebra

bull For discrete sets the set of all subset of Ω is a σ-algebra

bull For continuous sets there may be more than one possible σ-algebra possible

bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R

3

DEFN

The probability measure denoted by P is a that maps allmembers of onto

Axioms of Probability

bull We specify a minimal set of rules that P must obey

I

II

III

Corollaries

bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F

(a) P (Ac) = 1minus P (A)

L2-6

(b) P (A) le 1

(c) P (empty) = 0

(d) If A1 A2 An are pairwise mutually exclusive then

P

(n⋃

k=1

Ak

)=

nsumk=1

P (Ak)

Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)

Proof and example on board(f)

P

(n⋃

k=1

Ak

)=

nsumk=1

P (Aj)minussumjltk

P (Aj cap Ak) + middot middot middot

+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)

Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction

(g) If A sub B then P (A) le P (B)Proof on board

L3-1

EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo

In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa

However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work

EX

1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents

(a) All children are of the same sex

(b) The 3 eldest are boys and the others are girls

(c) The 2 oldest are girls

(d) There is at least one girl

If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity

We define the conditional probability of an event A given that event B occurred as

P (A|B) =P (A capB)

P (B) for P (B) gt 0

What if A and B are independent

L3-2

Motivation-continued

Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page

EX

2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes

1

STATISTICAL INDEPENDENCE

DEFN

Two events A isin A and B isin A are abbreviated if and only if

bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs

bull Events that arise from completely separate random phenomena are statistically independent

It is important to note that statistical independence is a

What type of relation is mutually exclusive

Claim If A and B are si events then the following pairs of events are also si

ndash A and B

ndash A and B

ndash A and B

11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34

L3-3

Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly

Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus

P (A) = P [(A capB) cup (A capB)]

= P (A capB) + P (A capB)

= P (A)P (B) + P (A capB)

Thus

P (A capB) = P (A)minus P (A)P (B)

= P (A)[1minus P (B)]

= P (A)P (B)

bull For N events to be si require

P (Ai cap Ak) = P (Ai)P (Ak)

P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)

P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)

(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)

Weaker Condition Pairwise StatisticalIndependence

DEFN

N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j

L3-4

Examples of Applying Statistical Independence

EXFor a couple having 5 children compute the probabilities of the followingevents

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness

bull Remember

ndash Mutual exclusiveness is a

ndash Statistical independence is a

bull How does mutual exclusiveness relate to statistical independence

1 me rArr si

2 si rArr me

3 two events cannot be both si and me except in some degenerate cases

4 none of the above

bull Perhaps surprisingly is true

Consider the following

L3-6

Conditional Probability Consider an example

EX EX Defective computers in a lab

A computer lab contains

bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective

A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer

Ω = AD AN BD BD BN

Let

bull EA be the event that the selected computer is from manufacturer A

bull EB be the event that the selected computer is from manufacturer B

bull ED be the event that the selected computer is defective

Find

P (EA) = P (EB) = P (ED) =

bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective

bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective

We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)

Ω = AD AN BD BD BN

Find

ndash P (ED|EB) =

L3-7

ndash P (EA|ED) =

ndash P (EB|ED) =

We need a systematic way of determining probabilities given additional information about theexperiment outcome

CONDITIONAL PROBABILTY

Consider the Venn diagram

A

B

S

If we condition on B having occurred then we can form the new Venn diagram

B

A B

This diagram suggests that if A capB = empty then if B occurs A could not have occurred

Similarly if B sub A then if B occurs the diagram suggests that A must have occurred

A definition of conditional probability that agrees with these and other observations is

DEFN

For A isin A B isin A the conditional probability of event A given that event Boccurred is

P (A|B) =P (A capB)

P (B) for P (B) gt 0

L3-8

Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))

Check the axioms

1

P (S|B) =P (S capB)

P (B)=

P (B)

P (B)= 1

2 Given A isin A

P (A|B) =P (A capB)

P (B)

and P (A capB) ge 0 P (B) ge 0

rArr P (A|B) ge 0

3 If A sub A C sub A and A cap C = empty

P (A cup C|B) =P [(A cup C) capB]

P [B]

=P [(A capB) cup (C capB)]

P [B]

Note that A cap C = empty rArr (A capB) cap (C capB) = empty so

P (A cup C|B) =P [A capB]

P [B]+

P [C capB]

P [B]

= P (A|B) + P (C|B)

EX Check prev example Ω =

AD AN BD BD BN

P (ED|EA) =

P (ED|EB) =

P (EA|ED) =

P (EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)

Read Section 17 in the textbook

Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = the event that a white ball is chosen on draw i.

What is P(W2|W1)?
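A Monte Carlo sketch of this experiment in MATLAB (estimating the conditional probability as a ratio of relative frequencies is the assumption of the sketch; the ball numbering follows the example):

N = 100000;
w1 = false(1, N);  w2 = false(1, N);
for trial = 1:N
    draw = randperm(5);          % random ordering of balls 1..5; first two entries are the draws
    w1(trial) = draw(1) >= 3;    % white on first draw
    w2(trial) = draw(2) >= 3;    % white on second draw
end
P_est = sum(w1 & w2)/sum(w1)     % should be near 2/4 = 1/2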

2. (a) 1/3  (b) 1/5  (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are si, then

(When A and B are si, knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)   (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)

= P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If Ai partitions Ω then

Example

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6; P(B1|A1) = 7/8, P(B0|A1) = 1/8]

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When receive B0, decide A0; when receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
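For reference, both error probabilities can be computed directly in MATLAB. The sketch below assumes the transition labeling read off the diagram above (p10 = P(B1|A0) = 1/6 and p01 = P(B0|A1) = 1/8); work the problem by hand on the next two pages before checking against it.

PA0 = 2/5;   PA1 = 3/5;
p10 = 1/6;   % P(B1|A0)
p01 = 1/8;   % P(B0|A1)

Pe_rule1 = PA0                   % always decide A1: error iff A0 was sent -> 0.4
Pe_rule2 = p10*PA0 + p01*PA1     % decide A0 on B0, A1 on B1 -> 17/120, about 0.142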

L4-6

L4-7

EX: Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)   and   q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)   and   q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)

q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is

EX:

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
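A MATLAB sketch of the computation (assuming, as before, the labeling p10 = P(B1|A0) and p01 = P(B0|A1); the fractions in the comments are what the hand calculation should give):

PA0 = 2/5;   PA1 = 3/5;
p10 = 1/6;   p01 = 1/8;

PB0 = (1 - p10)*PA0 + p01*PA1;    % law of total probability: P(B0) = 49/120
PB1 = p10*PA0 + (1 - p01)*PA1;    % P(B1) = 71/120

PA0_B0 = (1 - p10)*PA0/PB0        % = 40/49 -> decide A0 when B0 is received
PA1_B0 = p01*PA1/PB0              % = 9/49
PA0_B1 = p10*PA0/PB1              % = 8/71
PA1_B1 = (1 - p01)*PA1/PB1        % = 63/71 -> decide A1 when B1 is received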

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is

SPECIAL CASES

1. Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926  (b) 0.4630  (c) 0.2315  (d) 0.1543  (e) 0.0386  (f) 0.0193  (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is

2. Sampling without replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is

3. Permutations of n distinct objects. Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples

=

=

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
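A quick sketch comparing exact factorials against Stirling's approximation (the particular test values of n are chosen here just for illustration):

n = [5 10 20 50];
exact    = gamma(n + 1);                          % n!
stirling = sqrt(2*pi) .* n.^(n + 0.5) .* exp(-n); % Stirling's approximation
relerr   = (exact - stirling)./exact              % shrinks roughly like 1/(12n)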

L5a-3

4. Sample without Replacement & without Order. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets.

Now, how many different orders are there for any set of k objects?

Let Cn,k = the number of combinations of size k from a set of n objects.

Cn,k = (# ordered subsets)/(# orderings)

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to order the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and without Ordering. Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

C(k + n − 1, k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?


EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch appears after the list):

1. Never switch

4. 0.0794

L5-2

2. Always switch

3. Flip a fair coin, and switch if it comes up heads
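A Monte Carlo sketch of the three strategies in MATLAB (door numbering and trial count are arbitrary choices made here):

N = 100000;
wins = zeros(1, 3);                     % [never switch, always switch, coin flip]
for trial = 1:N
    car  = ceil(3*rand);                % door hiding the car
    pick = ceil(3*rand);                % contestant's initial pick
    opts = setdiff(1:3, [pick car]);    % goat doors the host may open
    open = opts(ceil(numel(opts)*rand));
    other = setdiff(1:3, [pick open]);  % the door offered by switching
    wins(1) = wins(1) + (pick  == car); % never switch
    wins(2) = wins(2) + (other == car); % always switch
    if rand < 0.5                       % coin flip: switch on heads
        wins(3) = wins(3) + (other == car);
    else
        wins(3) = wins(3) + (pick == car);
    end
end
wins/N                                  % expect about [1/3, 2/3, 1/2]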

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the ____ of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) =

For si subexperiments,

P(A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is

• The number of ways that k successes can occur in n places is

• Thus, the probability of k successes on n trials is

pn(k) =   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
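A direct computation of this tail probability in MATLAB (a sketch; the loop is just one way to accumulate the three binomial terms):

n = 100;  p = 0.01;
P012 = 0;
for k = 0:2
    P012 = P012 + nchoosek(n, k) * p^k * (1-p)^(n-k);  % P(exactly k bit errors)
end
P_fail = 1 - P012    % about 0.0794, matching the footnoted answer on the Motivation page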

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable (the Central Limit Theorem, to be discussed later).

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)

= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

• By applying the Central Limit Theorem (in its DeMoivre–Laplace form), it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]

= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function, erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success)

= P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) =

Also, P(# trials > k) =

EX:

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
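Once the blanks above are filled in (the geometric law gives p(m) = (1 − p)^{m−1} p and P(# trials > k) = (1 − p)^k, with p the per-transmission success probability), the ARQ answers are a two-line MATLAB computation; treating a transmission as a "success" when the packet gets through is the convention assumed here.

p = 0.9;  q = 1 - p;     % packet error probability 0.1 -> success probability 0.9
P_exactly2 = q * p       % fail once, then succeed: 0.09
P_more2    = q^2         % first two transmissions both fail: 0.01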

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability

C(n, k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α}

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
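A sketch of the computation in MATLAB (the interval conversion is the key step):

lambda = 90;  t = 10/60;                       % rate per hour, interval in hours
alpha  = lambda*t;                             % expected # of calls = 15
k = 20;
P20 = exp(-alpha) * alpha^k / factorial(k)     % about 0.042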

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0 for z ≤ 0;
FZ(z) = z²/(2b²) for 0 < z ≤ b;
FZ(z) = [b² − (2b − z)²/2]/b² for b < z < 2b;
FZ(z) = 1 for z ≥ 2b.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable is defined on a probability space (Ω, F, P) as a ____ from ____ to ____.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ____ ( ____ ), denoted ____, is

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]

= lim_{ε→0} P[x − ε < X ≤ x]

= P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
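One way to confirm the answer to the last question is to overlay the scaled histogram of w on the density it should follow. With v exponential (λ = 1), w = √v has pdf fW(w) = 2w e^{−w²} for w ≥ 0, a Rayleigh density with 2σ² = 1 (scaling the histogram into a density estimate, as below, is a standard trick and not from the text):

u = rand(1, 1000000);
w = sqrt(-log(u));
[counts, centers] = hist(w, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(numel(w)*binw), 1)        % histogram scaled to a density
hold on
r = linspace(0, 4, 200);
plot(r, 2*r.*exp(-r.^2), 'r', 'LineWidth', 2)  % theoretical Rayleigh pdf
hold off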

PROBABILITY DENSITY FUNCTION

DEFN

The ____ ( ____ ) fX(x) of a random variable X is the derivative (which may not exist at some places) of ____

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ____

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise (x < a or x > b).

L7-4

DEFN

A ____ has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6 for x = 1, 2, ..., 6, and 0 otherwise.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x for x = 1, 2, ..., and 0 otherwise.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X = 1 if s ∈ A, and X = 0 if s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p for x = 1; 1 − p for x = 0; and 0 for x ≠ 0, 1.

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, ..., n, and 0 otherwise.

L7-6

• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: pmfs of Binomial(10, 0.2) and Binomial(10, 0.5)]

• The cdfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: CDFs of Binomial(10, 0.2) and Binomial(10, 0.5)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: pmf of Binomial(100, 0.2)]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p for k = 1, 2, ..., and 0 otherwise.

• The pmfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: pmfs of Geometric(0.1) and Geometric(0.7)]

L7-8

• The cdfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: CDFs of Geometric(0.1) and Geometric(0.7)]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.

• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: pmfs of Poisson(0.75) and Poisson(3)]

L7-9

• The cdfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: CDFs of Poisson(0.75) and Poisson(3)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: pmf of Poisson(20)]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter, λ.

L7-10

• Density function:

fX(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0.

• Distribution function:

FX(x) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0.

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs and CDFs for λ = 0.25, 1, and 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
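In MATLAB, qfunc is only available with the Communications Toolbox, but Q(x) can always be built from the base erfc function. A sketch (the numerical values µ = 10, σ² = 25 are just an illustration):

Q = @(x) 0.5*erfc(x/sqrt(2));    % Q(x) = 1 - Phi(x)

mu = 10;  sigma = 5;             % note: sigma, not sigma^2
P = Q((12 - mu)/sigma) - Q((15 - mu)/sigma)   % P(12 < X <= 15), about 0.186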

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5


Examples with Gaussian Random Variables

EX: The Motivation problem

5. (a) 1/2  (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are si Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫₀^∞ t^{p−1} e^{−t} dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,

which is an exponential density.

• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square pdfs for n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be

FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0, and 0 for y < 0

(here m = n/2, for an even number n of degrees of freedom).

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of 2 independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)} for ℓ ≥ 0, and 0 for ℓ < 0.

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) =

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) =

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX:

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
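The second answer is the memoryless property of the exponential RV, which is easy to see numerically. A MATLAB sketch (using the −log(rand) transformation from Lecture 7 to generate exponential(1) samples):

X = -log(rand(1, 1000000));    % exponential(1) wait times
rem = X(X > 2) - 2;            % remaining wait, given you already waited 2 minutes
mean(rem)                      % about 1, the same as the unconditioned mean
mean(X)                        % about 1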

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] } P(A)

= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)

= [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board



3 two events cannot be both si and me except in some degenerate cases

4 none of the above

bull Perhaps surprisingly is true

Consider the following

L3-6

Conditional Probability Consider an example

EX EX Defective computers in a lab

A computer lab contains

bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective

A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer

Ω = AD AN BD BD BN

Let

bull EA be the event that the selected computer is from manufacturer A

bull EB be the event that the selected computer is from manufacturer B

bull ED be the event that the selected computer is defective

Find

P (EA) = P (EB) = P (ED) =

bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective

bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective

We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)

Ω = AD AN BD BD BN

Find

ndash P (ED|EB) =

L3-7

ndash P (EA|ED) =

ndash P (EB|ED) =

We need a systematic way of determining probabilities given additional information about theexperiment outcome

CONDITIONAL PROBABILTY

Consider the Venn diagram

A

B

S

If we condition on B having occurred then we can form the new Venn diagram

B

A B

This diagram suggests that if A capB = empty then if B occurs A could not have occurred

Similarly if B sub A then if B occurs the diagram suggests that A must have occurred

A definition of conditional probability that agrees with these and other observations is

DEFN

For A isin A B isin A the conditional probability of event A given that event Boccurred is

P (A|B) =P (A capB)

P (B) for P (B) gt 0

L3-8

Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))

Check the axioms

1

P (S|B) =P (S capB)

P (B)=

P (B)

P (B)= 1

2 Given A isin A

P (A|B) =P (A capB)

P (B)

and P (A capB) ge 0 P (B) ge 0

rArr P (A|B) ge 0

3 If A sub A C sub A and A cap C = empty

P (A cup C|B) =P [(A cup C) capB]

P [B]

=P [(A capB) cup (C capB)]

P [B]

Note that A cap C = empty rArr (A capB) cap (C capB) = empty so

P (A cup C|B) =P [A capB]

P [B]+

P [C capB]

P [B]

= P (A|B) + P (C|B)

EX Check prev example Ω =

AD AN BD BD BN

P (ED|EA) =

P (ED|EB) =

P (EA|ED) =

P (EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)

Read Section 17 in the textbook

Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin

2

Ex Drawing two consecutive balls without replacement

An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5

Ω = B1 B2 W3 W4 W5

Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw

Let Wi= event that white ball chosen on draw i

What is P (W2|W1)

2 (a) 13 (b) 15 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring

bull Letrsquos evaluate that using conditional probability

bull If P (B) gt 0 and A and B are si then

(When A and B are si knowledge of B does not change the prob of A (and vice versa))

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P (A capB cap C) =P (A capB cap C)

P (B cap C)middot P (B cap C)

P (C)middot P (C)

= P (A|B cap C)P (B|C)P (C)

Partitions and Total Probability

DEFN

A collection of sets A1 A2 partitions the sample space Ω if and only if

Ai is also said to be a partition of Ω

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If Ai partitions Ω then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 A1

bull A0 is sent with prob P (A0) = 25

bull A1 is sent with prob P (A1) = 35

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1

and B0 B1

ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram

56A

B

B

0

1

0A

1

78

16

18

bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
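Since these parts are worked on the board, a Monte Carlo sanity check may be useful. The sketch below is my own construction; it estimates the answer to part (a), and parts (b) and (c) follow by conditioning on longer flip sequences.

% Monte Carlo check of part (a): P(fair coin | first flip shows heads)
N = 1e6;
fair  = rand(1,N) < 0.5;                 % 1 if the fair coin was selected
heads = rand(1,N) < (0.5*fair + ~fair);  % fair coin: P(H) = 1/2; two-headed coin: P(H) = 1
est = sum(fair & heads)/sum(heads);      % conditional relative frequency
fprintf('P(fair | heads) approx %.3f (exact: 1/3)\n', est)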


EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

³ (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
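As a check on the footnote answers, the poker-dice probabilities can also be estimated by simulation. This is a minimal Monte Carlo sketch (my own, not from the notes); it classifies each roll by its pattern of repeated faces.

% Monte Carlo estimate of the poker-dice probabilities
N = 1e6;
counts = zeros(1,7);                         % one bin per hand type, (a) through (g)
for i = 1:N
    roll = randi(6,1,5);                     % five fair dice
    m = sort(histc(roll,1:6), 'descend');    % multiplicities of each face, largest first
    if     m(1)==1,            counts(1) = counts(1)+1;  % no two alike
    elseif m(1)==2 && m(2)==1, counts(2) = counts(2)+1;  % one pair
    elseif m(1)==2 && m(2)==2, counts(3) = counts(3)+1;  % two pair
    elseif m(1)==3 && m(2)==1, counts(4) = counts(4)+1;  % three alike
    elseif m(1)==3 && m(2)==2, counts(5) = counts(5)+1;  % full house
    elseif m(1)==4,            counts(6) = counts(6)+1;  % four alike
    else                       counts(7) = counts(7)+1;  % five alike
    end
end
disp(counts/N)   % compare with (a)-(g) in the footnote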

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n + 1/2) e^(−n) (Stirling's approximation).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
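For example, a minimal check of Stirling's approximation (this snippet is illustrative only):

n = 52;
exact    = gamma(n+1);                        % n! computed via the gamma function
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);  % Stirling's approximation
fprintf('n! = %.4e, Stirling = %.4e, ratio = %.6f\n', exact, stirling, stirling/exact)

The ratio is very close to 1 (the relative error of Stirling's formula is roughly 1/(12n)).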

4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First, determine the # of ordered subsets: n!/(n − k)!

Now, how many different orderings are there for any set of k objects? k!

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! ··· kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groups
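A quick MATLAB check of these two counts (a sketch; for larger problems, work in the log domain with gammaln to avoid overflow):

% Check the two group-assignment counts from the example above
N1 = gamma(73) / gamma(4)^24;   % 72!/(3!)^24: groups with distinct projects, approx 1.3e85
N2 = N1 / gamma(25);            % divide by 24! when the groups are interchangeable, approx 2.1e61
fprintf('N1 = %.2e, N2 = %.2e\n', N1, N2)
% Log-domain alternative: exp(gammaln(73) - 24*gammaln(4) - gammaln(25))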

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k) = (k + n − 1)!/[k!(n − 1)!]


EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX: By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

⁴ 0.0794

2. Always switch

3. Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments:

1. The combined sample space Ω = Ω1 × Ω2 × ··· × ΩN, the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that P(A) is consistent with the subexperiment probabilities; for s.i. subexperiments,

P(A) = P(B1)P(B2)···P(BN)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n,k).

• Thus, the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^(n−k)   (11)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
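A direct evaluation of this packet-failure probability using (11) in MATLAB (a sketch; the result should match the footnote answer, 0.0794):

% P(packet fails) = P(3 or more bit errors in n = 100 bits)
n = 100; p = 1e-2;
k = 0:2;
pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
fprintf('P(decode failure) = %.4f\n', 1 - sum(pk))   % approx 0.0794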

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

• Engineers often prefer an alternative to (14), called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
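The accuracy of the continuity-corrected approximation above is easy to check numerically. A minimal MATLAB sketch (my own; it builds the Q-function from the base erfc function, since qfunc requires the Communications Toolbox):

% Exact binomial probability P(alpha <= Sn <= beta) vs. Gaussian approximation
n = 100; p = 0.2; q = 1 - p;
a = 15; b = 25;                          % the fixed integers alpha and beta
k = a:b;
logpk = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k*log(p) + (n-k)*log(q);
exact = sum(exp(logpk));                 % exact sum of binomial probabilities
Qf = @(x) 0.5*erfc(x/sqrt(2));           % Q-function via erfc
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
fprintf('exact = %.4f, Gaussian approx = %.4f\n', exact, approx)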

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX:

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
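Here a "success" is an error-free transmission, so p = 0.9, and both answers follow directly from the formulas above (a one-line MATLAB check, my own sketch):

p = 0.9;                      % per-transmission success probability
p2     = (1-p)^(2-1) * p;     % P(exactly 2 transmissions) = 0.1*0.9 = 0.09
pMore2 = (1-p)^2;             % P(more than 2 transmissions) = 0.1^2 = 0.01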

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that at most one call can come in during any 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so the average number of calls per minute, np = 0.5, is a constant. The maximum possible number of calls/hour grows with n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, ...
           { 0,                otherwise

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
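Here λ = 90 calls/hr and t = 10/60 hr, so α = 15. A MATLAB evaluation (a sketch; the log-domain form avoids overflow for large k):

lambda = 90; t = 10/60; alpha = lambda*t;        % alpha = lambda*t = 15
k = 20;
Pk = exp(k*log(alpha) - alpha - gammaln(k+1));   % alpha^k e^(-alpha) / k!
fprintf('P(N = 20) = %.4f\n', Pk)                % approx 0.0418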

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = { 0,                       z ≤ 0
        { z²/(2b²),                0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,   b < z < 2b
        { 1,                       z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
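A simulation check of this CDF in MATLAB (a sketch; b = 1 is an arbitrary choice):

% Empirical vs. analytic CDF of Z = X + Y for a dart uniform on a b-by-b square
b = 1; N = 1e5;
z = sort(b*rand(1,N) + b*rand(1,N));    % sorted samples of Z
Femp = (1:N)/N;                         % empirical CDF at the sorted samples
Fan = @(z) (z > 0 & z <= b).*z.^2/(2*b^2) ...
    + (z > b & z < 2*b).*(b^2 - (2*b - z).^2/2)/b^2 + (z >= 2*b);
plot(z, Femp, z, Fan(z), '--'); xlabel('z'); ylabel('F_Z(z)')
legend('empirical', 'analytic', 'Location', 'southeast')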

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.

EX: (worked on the board)

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, b]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P(X ≤ x)

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]

EX: Examples (worked on the board)

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)


EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf. omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = { 0,          x < a
        { 1/(b − a),  a ≤ x ≤ b
        { 0,          x > b

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, ..., 6
           { 0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, ...
           { 0,        otherwise

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
           { 0,                         otherwise

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x-axis x = 0, 1, ..., 10, y-axis P(X = x).]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x-axis x, y-axis FX(x).]
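Plots like these are easy to regenerate. A base-MATLAB sketch (binopdf from the Statistics Toolbox would also work):

% Recreate the Binomial(10, 0.2) pmf and CDF plots
n = 10; p = 0.2;
k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
subplot(1,2,1); stem(k, pmf); xlabel('x'); ylabel('P(X=x)'); title('Binomial(10,0.2) pmf')
subplot(1,2,2); stairs(k, cumsum(pmf)); xlabel('x'); ylabel('F_X(x)'); title('Binomial(10,0.2) CDF')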

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x-axis x = 0, ..., 100, y-axis P(X = x).]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, ...
           { 0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x-axis x, y-axis P(X = x).]

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x-axis x, y-axis FX(x).]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, ...
           { 0,                otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; x-axis x, y-axis P(X = x).]

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; x-axis x, y-axis FX(x).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x-axis x = 0, ..., 100, y-axis P(X = x).]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in example above.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:

fX(x) = { 0,          x < 0
        { λe^(−λx),   x ≥ 0

• Distribution function:

FX(x) = { 0,            x < 0
        { 1 − e^(−λx),  x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x-axis x, y-axes fX(x) and FX(x).]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

DEFN: A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs above; x-axis x, y-axes fX(x) and FX(x).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
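In MATLAB, the Q-function can be built from the base erfc function (the toolbox function qfunc, where available, is equivalent). A sketch of the interval computation above, with arbitrary example parameters:

Q = @(x) 0.5*erfc(x/sqrt(2));                % Q-function via the complementary error function
mu = 5; sigma = 2; a = 4; b = 9;             % example values (my own choices)
Pab = Q((a - mu)/sigma) - Q((b - mu)/sigma); % P(a < X <= b)
fprintf('P(%g < X <= %g) = %.4f\n', a, b, Pab)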

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5⁵

Examples with Gaussian Random Variables

EX: The Motivation problem

⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x-axis x, y-axis fX(x).]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = { [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
        { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = { [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
        { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x-axis x, y-axis fX(x).]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
        { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
        { 0,                     r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; x-axis r, y-axis fR(r).]

• The CDF for a Rayleigh RV is

FR(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
        { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
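This construction is easy to verify numerically, in the spirit of the Lecture 7 MATLAB exercise. A sketch (σ² = 1 assumed):

% R = sqrt(X1^2 + X2^2) for independent zero-mean, unit-variance Gaussians
N = 1e6;
r = sqrt(randn(1,N).^2 + randn(1,N).^2);
[counts, centers] = hist(r, 100);
binw = centers(2) - centers(1);
plot(centers, counts/(N*binw), centers, centers.*exp(-centers.^2/2), '--')
xlabel('r'); legend('histogram estimate', 'Rayleigh pdf (\sigma^2 = 1)')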

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
        { 0,                                        ℓ < 0

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A)/P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A)

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX:

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board.

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

• When n is large, the probability in (1.1) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}    (1.2)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (1.3)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (1.4)


• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (1.5)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (1.6)

• In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

(A numerical check appears after this list.)

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
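• A quick MATLAB check of the continuity-corrected approximation (my own sketch, not from the notes; the test values n = 100, p = 0.5, α = 45, β = 55 are arbitrary, and Q(x) is built from base MATLAB's erfc):

Q = @(x) 0.5*erfc(x/sqrt(2));                 % Q(x) = P(standard Gaussian > x)
n = 100; p = 0.5; q = 1 - p;
a = 45; b = 55;                               % P[45 <= Sn <= 55]
exact  = sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), a:b));
approx = Q((a - n*p - 0.5)/sqrt(n*p*q)) - Q((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                                % both approx 0.729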

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)


Then

p(m) = (1 − p)^{m−1} p,   m = 1, 2, . . .

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (A worked answer follows.)
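A worked answer (my own sketch, taking "success" = correct packet delivery, so p = 0.9):

P(exactly 2 transmissions) = p(2) = (1 − p) p = (0.1)(0.9) = 0.09

P(more than 2 transmissions) = P(# trials > 2) = (1 − p)² = (0.1)² = 0.01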

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day.
Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.


• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so the mean number of calls per minute, np = 0.5, is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability

C(n, k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (α^k / k!) (1 − α/n)^{n−k}

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena


• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α},   k = 0, 1, . . .
           0,                   o.w.

EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval? (A numerical answer follows.)
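A numerical answer (my own check, using the Poisson law above with λ = 90/60 = 1.5 calls/minute and t = 10 minutes, so α = λt = 15):

alpha = (90/60)*10;                           % alpha = lambda*t = 15
P = alpha^20 * exp(-alpha) / factorial(20)    % P(N = 20), approx 0.0418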

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0,                        z ≤ 0
        z²/(2b²),                 0 < z ≤ b
        [b² − (2b − z)²/2] / b²,  b < z < 2b
        1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous) (A simulation check appears below.)
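A quick Monte Carlo check of the answer in MATLAB (my own sketch; the value b = 2 is an arbitrary choice):

b = 2; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);                % sum of the two dart coordinates
z = linspace(0, 2*b, 201);
Femp = arrayfun(@(t) mean(Z <= t), z);        % empirical CDF of Z
Fthy = (z.^2/(2*b^2)).*(z <= b) + (1 - (2*b - z).^2/(2*b^2)).*(z > b);
max(abs(Femp - Fthy))                         % small; sampling error only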

RANDOM VARIABLES (RVs)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.


EX


• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the (cumulative) distribution function (CDF) of X, denoted FX(x), is

FX(x) = P({ω : X(ω) ≤ x}) = P(X ≤ x)

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1


2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing; i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right; i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x]
              = P[X = x]


EX Examples


EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)


EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?


Now let's do one more transformation:
w = sqrt(v);

Check the density by plotting a histogram:
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions. (A small extension follows below.)
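One way to push the exercise a little further (my own sketch, not part of the original handout): scale the histogram of v into a density estimate and overlay the exponential pdf it should match.

u = rand(1, 1000000);
v = -log(u);
[counts, centers] = hist(v, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(numel(v)*binw), 1)       % histogram scaled to a density
hold on
plot(centers, exp(-centers), 'r')             % exponential (lambda = 1) pdf
hold off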

PROBABILITY DENSITY FUNCTION

DEFN The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x):

fX(x) = dFX(x)/dx

• Properties:

1. fX(x) ≥ 0,   −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1


4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx,   a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c,   −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf. omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way a value is assigned to fX(x) at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0,          x < a
        1/(b − a),  a ≤ x ≤ b
        0,          x > b


DEFN A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, . . . , 6
           0,    o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, . . .
           0,        o.w.


IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
    0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,      x = 1
           1 − p,  x = 0
           0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, . . . , n
           0,                          o.w.


• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x).]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x).]


• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x).]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p,  k = 1, 2, . . .
           0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x).]


• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x).]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α},  k = 0, 1, . . .
           0,                  o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x).]


• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. FX(x).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x).]

IMPORTANT CONTINUOUS RVS

1. Uniform RV (covered in the example above)

2. Exponential RV

• Characterized by a single parameter λ.


• Density function:

fX(x) = 0,          x < 0
        λ e^{−λx},  x ≥ 0

• Distribution function:

FX(x) = 0,            x < 0
        1 − e^{−λx},  x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV. (A quick simulation check follows below.)
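• A quick MATLAB check of the CDF form (my own sketch, reusing the −log(u) transform from the Lecture 7 exercise; λ = 4 and the test point x = 0.5 are arbitrary choices):

lambda = 4;
x = -log(rand(1, 1e6))/lambda;                % exponential(lambda) samples
mean(x <= 0.5)                                % empirical P(X <= 0.5)
1 - exp(-lambda*0.5)                          % FX(0.5), approx 0.8647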

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x).]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k}
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}


DEFN A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25); x vs. fX(x) and FX(x).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt


– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)


EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:⁵

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

⁵ (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
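These answers are quick to check numerically (my own sketch, not part of the notes; it uses P[|X − μ| > kσ] = 2Q(k) and builds Q from base MATLAB's erfc):

Q = @(x) 0.5*erfc(x/sqrt(2));
2*Q(1:5)    % approx 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7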


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},   −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. fX(x).]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.


• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
        0,                                                  y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,   p > 0
Γ(p) = (p − 1)!,   p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
        0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.


• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. fX(x).]

• The CDF for a central chi-square RV can be found through repeated integration by parts to be (for an even number n = 2m of degrees of freedom)

FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
        0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
        0,                     r < 0


• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1); r vs. fR(r).]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^{−r²/(2σ²)},  r ≥ 0
        0,                  r < 0

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (envelope) of the waveform is a Rayleigh random variable. (A quick simulation follows below.)
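• A small MATLAB illustration of that last point (my own sketch; σ = 1 and the test point r = 1.5 are arbitrary choices):

sigma = 1; N = 1e6;
X = sigma*randn(1,N); Y = sigma*randn(1,N);   % s.i. real and imaginary parts
R = abs(X + 1j*Y);                            % envelope of the complex Gaussian
mean(R <= 1.5)                                % empirical CDF at r = 1.5
1 - exp(-1.5^2/(2*sigma^2))                   % Rayleigh CDF, approx 0.6753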

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
        0,                                       ℓ < 0


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = d FX(x|A) / dx

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time? (A worked sketch follows below.)
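A worked sketch (my own addition, applying the definitions above with total wait time T ~ exponential(λ = 1) and conditioning event A = {T > 2}):

1. FT(t | T > 2) = P({T ≤ t} ∩ {T > 2}) / P(T > 2)
                = (e^{−2} − e^{−t}) / e^{−2}
                = 1 − e^{−(t−2)},   t > 2 (and 0 for t ≤ 2)

2. For the remaining wait R = T − 2:

FR(r | T > 2) = 1 − e^{−r},   r ≥ 0,

i.e., exponential with λ = 1 again: the exponential RV is memoryless.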

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, . . . , An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai) P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A)

if fX(x|A) and fX(x) exist.


Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board



bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h) = F_X(b⁺).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of v by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
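One way to make the comparison concrete (a minimal sketch, not in the original notes, assuming v from the commands above; it normalizes the histogram into an empirical density and overlays the λ = 1 exponential pdf):

[counts, centers] = hist(v, 100);             % bin counts and bin centers
binWidth = centers(2) - centers(1);
bar(centers, counts/(sum(counts)*binWidth))   % empirical density of v
hold on
x = linspace(0, max(v), 200);
plot(x, exp(-x), 'r')                         % exponential pdf with lambda = 1
hold off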

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):  f_X(x) = dF_X(x)/dx.

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way a density value can be assigned at every point x when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 0,         x < a
         1/(b − a), a ≤ x ≤ b
         0,         x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is p_X(x) = P(X = x).

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, …, 6
           0,   otherwise

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, …
           0,       otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1, s ∈ A
    0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So,

P(X = x) = p,     x = 1
           1 − p, x = 0
           0,     x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = (n choose k) p^k (1 − p)^{n−k}, k = 0, 1, …, n
           0,                              otherwise

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf – P(X = x) vs. x]
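For reference, a short base-MATLAB sketch (an addition, not from the notes) that reproduces these pmf values:

n = 10; p = 0.2;
k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
stem(k, pmf)          % matches the Binomial(10, 0.2) pmf figure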

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF – F_X(x) vs. x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf – P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p, k = 1, 2, …
           0,               otherwise
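A step worth making explicit (standard geometric-series algebra, added here for completeness): the probability that more than k trials are required is

P(X > k) = Σ_{j=k+1}^{∞} (1 − p)^{j−1} p = (1 − p)^k,

since the sum is a geometric series with first term (1 − p)^k p and ratio (1 − p).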

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf – P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF – F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt

• Let N = the # of events in time (or space) t

• Then

P(N = k) = (α^k / k!) e^{−α}, k = 0, 1, 2, …
           0,                 otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf – P(X = x) vs. x]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF – F_X(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf – P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = 0,        x < 0
         λe^{−λx}, x ≥ 0

• Distribution function:

F_X(x) = 0,           x < 0
         1 − e^{−λx}, x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.
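A quick empirical check (a sketch added here; it reuses the v = -log(u) construction from the Lecture 7 exercise, which produces exponential RVs with λ = 1):

u = rand(1,1000000);
v = -log(u);                 % exponential with lambda = 1
x = 2;
[mean(v > x), exp(-x)]       % empirical vs. exact P(X > x)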

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4 – f_X(x) and F_X(x) vs. x]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,

(n choose k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k}
                                     ≈ [1/√(2πσ²)] exp{−(k − μ)²/(2σ²)}
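A numerical comparison (an illustrative sketch; the values n = 100, p = 0.2, k = 20 are chosen here, not taken from the notes):

n = 100; p = 0.2; q = 1 - p; k = 20;
exact = nchoosek(n, k) * p^k * q^(n-k)     % nchoosek may warn that this is inexact
approx = exp(-(k - n*p)^2/(2*n*p*q)) / sqrt(2*pi*n*p*q)
% both are approximately 0.099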

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = [1/√(2πσ²)] exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² > 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25) – f_X(x) and F_X(x) vs. x]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} [1/√(2π)] exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
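In MATLAB, the Q-function can be evaluated with the built-in erfc, since Q(x) = (1/2) erfc(x/√2). A minimal sketch (the values μ = 5, σ = 2, a = 4, b = 8 are illustrative, not from the notes):

Q = @(x) 0.5*erfc(x/sqrt(2));              % Q-function via base-MATLAB erfc
mu = 5; sigma = 2; a = 4; b = 8;
p = Q((a - mu)/sigma) - Q((b - mu)/sigma)  % P(a < X <= b), approx 0.625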

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5


Examples with Gaussian Random Variables

EX The Motivation problem

Ans: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
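These values can be reproduced in MATLAB (a sketch using the identity P[|X − μ| > kσ] = 2Q(k) = erfc(k/√2)):

k = 1:5;
p = erfc(k/sqrt(2))   % 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7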

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf (c = 2) – f_X(x) vs. x]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
         0,                                                y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
         0,                      y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
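A simulation sketch of this fact (added here; assumes unit-variance components, so E[Y] = 2σ² = 2 and P(Y > y) = e^{−y/2}):

y = randn(1,100000).^2 + randn(1,100000).^2;   % chi-square, n = 2, sigma^2 = 1
[mean(y), mean(y > 3), exp(-3/2)]              % mean ~2; empirical vs. exact P(Y > 3)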

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1 – f_X(x) vs. x]

• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number n = 2m of degrees of freedom)

F_Y(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
         0,                                                 y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
         0,                    r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1) – f_R(r) vs. r]

• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^{−r²/(2σ²)}, r ≥ 0
         0,                 r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the magnitude of the waveform is a Rayleigh random variable.
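A simulation sketch of that model (added here; assumes unit-variance real and imaginary parts, i.e., σ² = 1):

g = randn(1,100000) + 1j*randn(1,100000);  % complex Gaussian gain, s.i. parts
r = abs(g);                                % envelope
hist(r, 100)                               % compare with f_R(r) = r*exp(-r^2/2)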

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)}, ℓ ≥ 0
         0,                                      ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
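A sketch of where question 2 leads, using the memoryless property of the exponential (added here as a check; W is the total wait and R = W − 2 the remaining wait):

P(R > r | W > 2) = P(W > 2 + r)/P(W > 2) = e^{−(2+r)}/e^{−2} = e^{−r}, r ≥ 0,

so the remaining wait is again exponential with λ = 1.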

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + … + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)

           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

           = [f_X(x|A)/f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of Law of Total Probability)
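A tiny worked instance (illustrative, not from the notes): if X is uniform on [0, 1] and P(A|X = x) = x, then P(A) = ∫_{0}^{1} x dx = 1/2.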

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)


L3-2

Motivation-continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX

The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?²

STATISTICAL INDEPENDENCE

DEFN

Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a property of the probability measure, not a set-theoretic property of the events.

What type of relation is mutual exclusiveness? (A set-theoretic one: A ∩ B = ∅.)

Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:

– A and Bᶜ

– Aᶜ and B

– Aᶜ and Bᶜ

¹(a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32.  ²(a) 2/3, (b) 1/3, (c) 3/4.

L3-3

Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
     = P(A ∩ B) + P(A ∩ Bᶜ)
     = P(A)P(B) + P(A ∩ Bᶜ)

Thus

P(A ∩ Bᶜ) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bᶜ)

• For N events to be s.i., require

P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
⋮
P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2)···P(AN)

for all sets of distinct indices. (It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)

Weaker Condition: Pairwise Statistical Independence

DEFN

N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.

L3-4

Examples of Applying Statistical Independence

EX
For a couple having 5 children, compute the probabilities of the following events:¹

1 All children are of the same sex.

2 The 3 eldest are boys and the others are girls.

3 The 2 oldest are girls.

4 There is at least one girl.
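The four answers (footnote 1) can be checked by brute-force enumeration. A minimal MATLAB sketch, assuming each child is independently a boy or a girl with probability 1/2:

% Enumerate all 2^5 equally likely sex patterns (1 = boy, 0 = girl)
patterns = dec2bin(0:31) - '0';                                % 32-by-5 array of 0s and 1s
p_same  = mean(all(patterns == 1, 2) | all(patterns == 0, 2))  % 1/16
p_bbbgg = mean(all(patterns == repmat([1 1 1 0 0], 32, 1), 2)) % 1/32
p_gg    = mean(patterns(:,1) == 0 & patterns(:,2) == 0)        % 1/4
p_girl  = mean(any(patterns == 0, 2))                          % 31/32

Since the 32 patterns are equally likely, each probability is just the fraction of patterns in the event.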

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a set-theoretic relationship (A ∩ B = ∅).

– Statistical independence is a relationship defined through the probability measure (P(A ∩ B) = P(A)P(B)).

• How does mutual exclusiveness relate to statistical independence?

1 m.e. ⇒ s.i.

2 s.i. ⇒ m.e.

3 two events cannot be both s.i. and m.e., except in some degenerate cases

4 none of the above

• Perhaps surprisingly, (3) is true.

Consider the following

L3-6

Conditional Probability: Consider an example.

EX Defective computers in a lab

A computer lab contains

• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A,

• EB be the event that the selected computer is from manufacturer B,

• ED be the event that the selected computer is defective.

Find:

P(EA) = 2/5,  P(EB) = 3/5,  P(ED) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(ED|EA) (the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) = 2/3

L3-7

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, with A ∩ B inside it]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0.

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms

1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B)

and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX Check prev. example: Ω = {AD, AN, BD, BD, BN}

P(ED|EA) = 1/2

P(ED|EB) = 2/3

P(EA|ED) = 1/3

P(EB|ED) = 2/3
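These values can also be verified by counting over the five equally likely outcomes. A small MATLAB sketch (the variable names are mine, not from the notes):

mfr = ['A' 'A' 'B' 'B' 'B'];        % manufacturer of each computer
st  = ['D' 'N' 'D' 'D' 'N'];        % D = defective, N = non-defective
P_ED_given_EA = sum(mfr == 'A' & st == 'D') / sum(mfr == 'A')   % 1/2
P_ED_given_EB = sum(mfr == 'B' & st == 'D') / sum(mfr == 'B')   % 2/3
P_EA_given_ED = sum(mfr == 'A' & st == 'D') / sum(st == 'D')    % 1/3
P_EB_given_ED = sum(mfr == 'B' & st == 'D') / sum(st == 'D')    % 2/3

Each line is just the definition P(A|B) = P(A ∩ B)/P(B) applied to equally likely outcomes.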

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?²

Ex Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

²(a) 1/3, (b) 1/5, (c) 1.

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that

P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and

P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex Intersection of 3 events:

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and their union is Ω.

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω: B = ∪i (B ∩ Ai), a disjoint union.

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = Σi P(B ∩ Ai) = Σi P(B|Ai)P(Ai).

Example:

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with prob. P(A0) = 2/5.

• A1 is sent with prob. P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1 Always decide A1.

2 When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
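For checking your work, the two error probabilities follow directly from total probability. A MATLAB sketch, reading the crossover probabilities P(B1|A0) = 1/8 and P(B0|A1) = 1/6 from the diagram above:

pA0 = 2/5;  pA1 = 3/5;      % a priori probabilities
p01 = 1/8;  p10 = 1/6;      % crossover probabilities P(B1|A0), P(B0|A1)
% Rule 1: always decide A1 -> error exactly when A0 was sent
Pe1 = pA0                   % 0.4
% Rule 2: decide A0 on B0, A1 on B1 -> error on either crossover
Pe2 = p01*pA0 + p10*pA1     % 0.15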

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)    (9)

and

q1P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
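A sketch of this computation in MATLAB (a numerical check, not a substitute for the board work; MATLAB indices are 1-based, so row/column 1 corresponds to A0/B0):

pA   = [2/5; 3/5];                 % a priori P(A0); P(A1)
PBgA = [7/8 1/8; 1/6 5/6];         % PBgA(i,j) = P(Bj|Ai), from the transition diagram
joint = PBgA .* repmat(pA, 1, 2);  % joint(i,j) = P(Ai and Bj)
PB    = sum(joint, 1);             % total probability: P(B0) = 9/20, P(B1) = 11/20
PAgB  = joint ./ repmat(PB, 2, 1)  % a posteriori: [7/9 1/11; 2/9 10/11]
[mx, decision] = max(PAgB, [], 1)  % MAP: decide A0 on B0, A1 on B1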

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.

SPECIAL CASES

1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

³(a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n.

⇒ number of distinct ordered k-tuples is n^k.

2 Sampling w/out replacement and with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1).

⇒ number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.

3 Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)···(2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n) = √(2πn) (n/e)^n.

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
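A quick sanity check of this technique, together with the Stirling approximation above (sketch):

n = 10;
factorial(n)                    % 3628800
gamma(n + 1)                    % identical: gamma(n+1) = n!
sqrt(2*pi*n) * (n/exp(1))^n     % Stirling: about 3598696, within 1%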

L5a-3

4 Sampling w/out replacement and w/out ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let Cnk = the number of combinations of size k from a set of n objects:

Cnk = (# ordered subsets)/(# orderings) = n! / [k!(n − k)!] ≜ C(n, k)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

n! / (k1! k2! ··· kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61 (unordered) groups.

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring? (See the numerical check after this list.)

5 Sampling with replacement and w/out ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is C(k + n − 1, k).
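A numerical check of the die-rolling example above (a sketch using the multinomial coefficient):

% Each of the 6 faces appears exactly twice in 12 rolls
ways = factorial(12) / factorial(2)^6   % 7484400 orderings
prob = ways / 6^12                      % about 0.0034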

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX By Request: The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1 Never switch.

⁴0.0794.

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1 The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2 Event class F: σ-algebra on Ω.

3 P(·) on F such that

P(A) = P(B1)P(B2|B1)···P(BN|B1 ∩ ··· ∩ B_{N−1}).

For s.i. subexperiments,

P(A) = P(B1)P(B2)···P(BN).

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n, k).

• Thus, the probability of k successes on n trials is

pn(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n.    (11)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
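A numerical sketch of this problem using (11); the result matches the footnote answer 0.0794:

n = 100; p = 1e-2;
k = 0:2;                         % packet decodes with 0, 1, or 2 bit errors
pk = arrayfun(@(m) nchoosek(n, m), k) .* p.^k .* (1-p).^(n-k);
Pe = 1 - sum(pk)                 % about 0.0794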

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k).    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.    (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (α^k / k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^(n−k) = (α^k / k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events / (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, ...;  0, o.w.

EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
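A sketch of the computation (assuming the rate applies uniformly, so α = λt = 90 × (1/6) = 15):

lambda = 90;               % calls per hour
t = 10/60;                 % 10 minutes, in hours
alpha = lambda * t;        % expected number of calls: 15
k = 20;
P = alpha^k / factorial(k) * exp(-alpha)   % about 0.0418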

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3 Find and plot FZ(z).

Ans:

FZ(z) = 0,                      z ≤ 0
        z²/(2b²),               0 < z ≤ b
        [b² − (2b − z)²/2]/b²,  b < z < 2b
        1,                      z ≥ 2b

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P(X ≤ x).

• FX(x) is a prob. measure.

• Properties of FX:

1 0 ≤ FX(x) ≤ 1

L6-4

2 FX(−∞) = 0 and FX(∞) = 1

3 FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) whenever a ≤ b.

4 P(a < X ≤ b) = FX(b) − FX(a)

5 FX(x) is continuous from the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z.

2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3 Find and plot FZ(z).

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:

hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:

w = sqrt(v);
hist(w, 100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).

• Properties:

1 fX(x) ≥ 0, −∞ < x < ∞

2 FX(x) = ∫_{−∞}^{x} fX(t) dt

3 ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4 P(a < X ≤ b) = ∫_a^b fX(x) dx,  a ≤ b

5 If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c,  0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0,          x < a
        1/(b − a),  a ≤ x ≤ b
        0,          x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, ..., 6;  0, o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, ...;  0, o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1, s ∈ A;  0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p, x = 1;  1 − p, x = 0;  0, x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, ..., n;  0, o.w.
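The pmfs plotted below can be reproduced directly from this formula; a base-MATLAB sketch (binopdf from the Statistics Toolbox would also work):

n = 10; p = 0.2;
k = 0:n;
pmf = arrayfun(@(m) nchoosek(n, m), k) .* p.^k .* (1-p).^(n-k);
stem(k, pmf); xlabel('x'); ylabel('P(X = x)');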

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p, k = 1, 2, ...;  0, o.w.

• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]

4 Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events / (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α), k = 0, 1, ...;  0, o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. FX(x)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x)]

IMPORTANT CONTINUOUS RVS

1 Uniform RV: covered in example.

2 Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = 0, x < 0;  λe^(−λx), x ≥ 0

• Distribution function:

FX(x) = 0, x < 0;  1 − e^(−λx), x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x)]

3 Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. fX(x) and FX(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_0^{π/2} exp[−x²/(2 sin²φ)] dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
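MATLAB has no built-in Q(x), but it follows from the complementary error function via Q(x) = (1/2) erfc(x/√2). A sketch with an interval-probability example (the numbers µ = 5, σ = 1, a = 4, b = 7 are mine, for illustration):

Q = @(x) 0.5 * erfc(x / sqrt(2));           % standard Gaussian tail probability
mu = 5; sigma = 1;
P = Q((4 - mu)/sigma) - Q((7 - mu)/sigma)   % P(4 < X <= 7) = Q(-1) - Q(2), about 0.8186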

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:⁵

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

⁵(a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
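The footnote values for (b) are just 2Q(k), which one line of MATLAB confirms (sketch):

k = 1:5;
twoQ = erfc(k / sqrt(2))   % 0.3173  4.55e-2  2.70e-3  6.33e-5  5.73e-7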

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. fX(x)]

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0;  0, y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0;  0, y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. fX(x)]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0;  0, y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0;  0, r < 0.

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. fR(r)]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)), r ≥ 0;  0, r < 0.

• The complex envelope of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
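This is easy to see by simulation; a sketch with σ² = 1 in each real dimension:

N = 1e6;
z = randn(1, N) + 1j*randn(1, N);   % complex Gaussian, s.i. real and imaginary parts
r = abs(z);                         % amplitude of the waveform
hist(r, 100)                        % shape matches fR(r) = r*exp(-r^2/2) for sigma^2 = 1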

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)], ℓ > 0;  0, ℓ ≤ 0.

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P[{X ≤ x} ∩ A] / P(A).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A).

• Then FX(x|A) is a distribution function and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1 the conditional distribution for your total wait time?

L8-9

2 the conditional distribution for your remaining wait time?
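A simulation sketch for this example. It illustrates the memoryless property of the exponential RV: conditioned on X > 2, the remaining wait X − 2 is again exponential with λ = 1:

x = -log(rand(1, 1e6));   % exponential(1) wait times (as in the Lecture 7 MATLAB exercise)
cond = x(x > 2);          % keep only the trials where you waited more than 2 minutes
hist(cond - 2, 100)       % remaining wait: again exponential(1)
mean(cond - 2)            % about 1, the unconditional mean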

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A) / fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board

Page 29: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L3-3

Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly

Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus

P (A) = P [(A capB) cup (A capB)]

= P (A capB) + P (A capB)

= P (A)P (B) + P (A capB)

Thus

P (A capB) = P (A)minus P (A)P (B)

= P (A)[1minus P (B)]

= P (A)P (B)

bull For N events to be si require

P (Ai cap Ak) = P (Ai)P (Ak)

P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)

P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)

(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)

Weaker Condition Pairwise StatisticalIndependence

DEFN

N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j

L3-4

Examples of Applying Statistical Independence

EXFor a couple having 5 children compute the probabilities of the followingevents

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness

bull Remember

ndash Mutual exclusiveness is a

ndash Statistical independence is a

bull How does mutual exclusiveness relate to statistical independence

1 me rArr si

2 si rArr me

3 two events cannot be both si and me except in some degenerate cases

4 none of the above

bull Perhaps surprisingly is true

Consider the following

L3-6

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

(BD is listed twice because the two defective computers from manufacturer B are distinct outcomes.)

Let

• EA be the event that the selected computer is from manufacturer A

• EB be the event that the selected computer is from manufacturer B

• ED be the event that the selected computer is defective

Find:

P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?

We denote this prob as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) = 2/3

L3-7

– P(EA|ED) = 1/3

– P(EB|ED) = 2/3

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: sample space S containing overlapping events A and B]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B becomes the new sample space, containing A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B)/P(B), for P(B) > 0

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅:

P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B] = P[(A ∩ B) ∪ (C ∩ B)]/P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B] = P(A|B) + P(C|B)

EX: Check the previous example, Ω = {AD, AN, BD, BD, BN}:

P(ED|EA) = (1/5)/(2/5) = 1/2

P(ED|EB) = (2/5)/(3/5) = 2/3

P(EA|ED) = (1/5)/(3/5) = 1/3

P(EB|ED) = (2/5)/(3/5) = 2/3

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers to the coin question: (a) 1/3, (b) 1/5, (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are s.i., knowledge of B does not change the prob of A (and vice versa).)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

(The answer is (c): conditioning on B can either raise or lower the probability of A.)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B) (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A) (8)

• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets A1, A2, ... partitions the sample space Ω if and only if

(i) the sets are mutually exclusive: Ai ∩ Aj = ∅ for all i ≠ j, and
(ii) the sets cover Ω: ∪i Ai = Ω.

{Ai} is also said to be a partition of Ω.

L4-4

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω: B = (B ∩ A1) ∪ (B ∩ A2) ∪ ···, a disjoint union

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = Σi P(B|Ai)P(Ai)

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]

• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.

L4-6

L4-7

EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)

q1P(A0|B1) + (1 − q1)P(A1|B1) (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A MATLAB sketch of this calculation follows.)
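A minimal MATLAB sketch of this calculation, assuming p01 = P(B0|A1) and p10 = P(B1|A0) as in the transition diagram above (variable names are illustrative; the element-wise expansion requires MATLAB R2016b or later):

pA = [2/5, 3/5];               % a priori probs P(A0), P(A1)
PBgA = [5/6, 1/8;              % entry (j+1, i+1) holds P(Bj|Ai)
        1/6, 7/8];
pB = PBgA * pA.';              % law of total probability: P(B0), P(B1)
PAgB = (PBgA .* pA) ./ pB;     % Bayes' theorem: P(Ai|Bj) in row j, column i
[~, iMAP] = max(PAgB, [], 2);  % MAP rule: argmax over i for each received Bj

This gives P(A0|B0) = 40/49 and P(A0|B1) = 8/71, so the MAP rule decides A0 when B0 is received and A1 when B1 is received.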

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) ··· (n − k + 1) = n!/(n − k)!

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) ··· (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
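A quick sanity check of the approximation (a sketch; for nonnegative integers, the built-in factorial(n) is equivalent to gamma(n+1)):

n = 52;
exact = gamma(n+1);                            % 52! ≈ 8.07e67
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);   % Stirling's approximation
relError = (exact - stirling)/exact            % ≈ 0.0016, i.e., about 0.16%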

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let Cn,k = the number of combinations of size k from a set of n objects:

Cn,k = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!) = C(n,k)

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ (with k1 + k2 + ··· + kJ = n) is given by the multinomial coefficient

n!/(k1! k2! ··· kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6 = 7,484,400

– What is the prob of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch

Answer to the packet question: ≈ 0.0794

L5-2

2. Always switch

3. Flip a fair coin and switch if it comes up heads
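A Monte Carlo sketch of the three strategies (assuming the host always opens a goat door different from your pick; variable names are illustrative):

N = 1e5;
car  = randi(3, 1, N);            % door hiding the car
pick = randi(3, 1, N);            % contestant's initial pick
winNever  = mean(pick == car)     % never switch: ≈ 1/3
winAlways = mean(pick ~= car)     % always switch: wins iff first pick was wrong, ≈ 2/3
flip = rand(1, N) < 0.5;          % coin-flip strategy: switch on heads
winCoin = mean((flip & pick ~= car) | (~flip & pick == car))   % ≈ 1/2

The key observation encoded in winAlways: if you always switch, you win exactly when your first pick was a goat, which happens with probability 2/3.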

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) is consistent with the probabilities defined for the subexperiments

For s.i. subexperiments:

P(A) = P1(B1) P2(B2) ··· PN(BN)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n,k).

• Thus the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^(n−k) (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? (Answer: (1/2)^3 = 1/8.)

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A MATLAB check follows.)
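A sketch of the computation using (11), treating bit errors as independent Bernoulli trials with p = 0.01:

n = 100; p = 0.01; k = 0:2;
pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
pFail = 1 - sum(pk)      % P(3 or more errors) ≈ 0.0794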

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson Law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k) (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2] (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x) (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the DeMoivre-Laplace approximation, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to deal with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
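A sketch comparing this approximation with the exact binomial tail from the packet example, writing Q via the base-MATLAB erfc (the identity Q(x) = (1/2) erfc(x/√2) avoids any toolbox dependence):

Qf = @(x) 0.5*erfc(x/sqrt(2));
n = 100; p = 0.01; q = 1 - p;
approx = Qf((3 - n*p - 0.5)/sqrt(n*p*q)) - Qf((100 - n*p + 0.5)/sqrt(n*p*q))
% ≈ 0.066 vs. the exact 0.0794: with p this small and npq ≈ 1, the Poisson
% law of Section 1.10 is the better approximation; the Gaussian form shines
% when npq is large.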

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (Answers: (0.1)(0.9) = 0.09 and (0.1)² = 0.01, respectively.)

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so the expected number of calls per minute, np = 1/2, is constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.
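A quick numerical sketch of this limit for α = 0.5 and k = 2:

alpha = 0.5; k = 2;
for n = [10 100 1000]
    pBinom = nchoosek(n,k) * (alpha/n)^k * (1 - alpha/n)^(n-k);
    fprintf('n = %4d: %.6f\n', n, pBinom);    % approaches the Poisson value
end
alpha^k/factorial(k) * exp(-alpha)             % Poisson limit, ≈ 0.0758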

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α), k = 0, 1, ...; 0, o.w. }

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (Here λ = 1.5 calls/minute, so α = λt = 15 and P(N = 20) = 15^20 e^(−15)/20! ≈ 0.042; see the sketch below.)
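A one-line check of that number in base MATLAB (avoiding toolbox functions such as poisspdf):

alpha = 15;                                   % (90/60 calls per minute)(10 minutes)
p20 = alpha^20 * exp(-alpha) / factorial(20)  % ≈ 0.0418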

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = { 0, z ≤ 0;  z²/(2b²), 0 < z ≤ b;  [b² − (2b − z)²/2]/b², b < z < 2b;  1, z ≥ 2b }

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function (mapping) from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus it is convenient to define a function that assigns probabilities to sets of this form:

DEFN: If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (cdf) of X, denoted FX(x), is

FX(x) = P(X ≤ x)

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻) for every x.

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = lim_{ε→0⁺} P[x − ε < X ≤ x] = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:

hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:

w = sqrt(v);

Again compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = { 0, x < a;  1/(b − a), a ≤ x ≤ b;  0, x > b }

L7-4

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6, x = 1, 2, ..., 6;  0, o.w. }

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x, x = 1, 2, ...;  0, o.w. }

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1, s ∈ A;  0, s ∉ A }

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p, x = 1;  1 − p, x = 0;  0, x ≠ 0, 1 }

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^(n−k), k = 0, 1, ..., n;  0, o.w. }

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p, k = 1, 2, ...;  0, o.w. }

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs, P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs, FX(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α), k = 0, 1, ...;  0, o.w. }

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example above

2. Exponential RV

• Characterized by a single parameter λ

L7-10

• Density function:

fX(x) = { 0, x < 0;  λ e^(−λx), x ≥ 0 }

• Distribution function:

FX(x) = { 0, x < 0;  1 − e^(−λx), x ≥ 0 }

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs fX(x) and CDFs FX(x) for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN: A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working exam problems!

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
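A sketch of such computations in base MATLAB, writing Q(x) via the built-in erfc (qfunc exists only in the Communications Toolbox); the values of mu, sigma, a, and b are illustrative:

Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q(x) = (1/2) erfc(x/sqrt(2))
mu = 5; sigma = 1;
Pab = Qf((6 - mu)/sigma) - Qf((8 - mu)/sigma)   % P(6 < X <= 8) = Q(1) - Q(3) ≈ 0.157
tails = 2*Qf(1:5)                 % P(|X - mu| > k*sigma) = 2Q(k), k = 1..5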

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi² (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = { [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0;  0, y < 0 }

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = { [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0;  0, y < 0 }

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found through repeated integration by parts to be (for an even number of degrees of freedom, n = 2m)

FY(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0;  0, y < 0 }

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = { (r/σ²) e^(−r²/(2σ²)), r ≥ 0;  0, r < 0 }

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = { 1 − e^(−r²/(2σ²)), r ≥ 0;  0, r < 0 }

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
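A sketch in the spirit of the Lecture 7 MATLAB exercise, generating Rayleigh samples as the magnitude of two i.i.d. zero-mean Gaussians:

sigma = 1;
x = sigma*randn(1,100000);  y = sigma*randn(1,100000);
r = sqrt(x.^2 + y.^2);      % Rayleigh samples with parameter sigma^2
hist(r,100)                 % compare shape with f_R(r) = (r/sigma^2) exp(-r^2/(2 sigma^2))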

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0;  0, ℓ < 0 }

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A)/P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = dFX(x|A)/dx

• Then FX(x|A) is a distribution function and fX(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time? (For x > 2: FX(x|X > 2) = [FX(x) − FX(2)]/[1 − FX(2)] = 1 − e^(−(x−2)), and 0 for x ≤ 2.)

L8-9

2. the conditional distribution for your remaining wait time? (By memorylessness, the remaining wait R = X − 2 given X > 2 is again exponential with λ = 1: FR(r) = 1 − e^(−r), r ≥ 0.)
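A quick empirical check of this memorylessness in MATLAB, in the style of the Lecture 7 exercise (variable names are illustrative):

t = -log(rand(1,1000000));   % Exponential(1) samples via the inverse-CDF method
tRem = t(t > 2) - 2;         % remaining wait given you have already waited 2 minutes
mean(tRem)                   % ≈ 1, the unconditional mean
hist(tRem,100)               % same exponential shape as the original samples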

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead:

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board

Page 30: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L3-4

Examples of Applying Statistical Independence

EXFor a couple having 5 children compute the probabilities of the followingevents

1 All children are of the same sex

2 The 3 eldest are boys and the others are girls

3 The 2 oldest are girls

4 There is at least one girl

L3-5

RELATION BETWEEN STATISTICALLY INDEPENDENT AND

MUTUALLY EXCLUSIVE EVENTS

bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness

bull Remember

ndash Mutual exclusiveness is a

ndash Statistical independence is a

bull How does mutual exclusiveness relate to statistical independence

1 me rArr si

2 si rArr me

3 two events cannot be both si and me except in some degenerate cases

4 none of the above

bull Perhaps surprisingly is true

Consider the following

L3-6

Conditional Probability Consider an example

EX EX Defective computers in a lab

A computer lab contains

bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective

A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer

Ω = AD AN BD BD BN

Let

bull EA be the event that the selected computer is from manufacturer A

bull EB be the event that the selected computer is from manufacturer B

bull ED be the event that the selected computer is defective

Find

P (EA) = P (EB) = P (ED) =

bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective

bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective

We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)

Ω = AD AN BD BD BN

Find

ndash P (ED|EB) =

L3-7

ndash P (EA|ED) =

ndash P (EB|ED) =

We need a systematic way of determining probabilities given additional information about theexperiment outcome

CONDITIONAL PROBABILTY

Consider the Venn diagram

A

B

S

If we condition on B having occurred then we can form the new Venn diagram

B

A B

This diagram suggests that if A capB = empty then if B occurs A could not have occurred

Similarly if B sub A then if B occurs the diagram suggests that A must have occurred

A definition of conditional probability that agrees with these and other observations is

DEFN

For A isin A B isin A the conditional probability of event A given that event Boccurred is

P (A|B) =P (A capB)

P (B) for P (B) gt 0

L3-8

Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))

Check the axioms

1

P (S|B) =P (S capB)

P (B)=

P (B)

P (B)= 1

2 Given A isin A

P (A|B) =P (A capB)

P (B)

and P (A capB) ge 0 P (B) ge 0

rArr P (A|B) ge 0

3 If A sub A C sub A and A cap C = empty

P (A cup C|B) =P [(A cup C) capB]

P [B]

=P [(A capB) cup (C capB)]

P [B]

Note that A cap C = empty rArr (A capB) cap (C capB) = empty so

P (A cup C|B) =P [A capB]

P [B]+

P [C capB]

P [B]

= P (A|B) + P (C|B)

EX Check prev example Ω =

AD AN BD BD BN

P (ED|EA) =

P (ED|EB) =

P (EA|ED) =

P (EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)

Read Section 17 in the textbook

Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin

2

Ex Drawing two consecutive balls without replacement

An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5

Ω = B1 B2 W3 W4 W5

Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw

Let Wi= event that white ball chosen on draw i

What is P (W2|W1)

2 (a) 13 (b) 15 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring

bull Letrsquos evaluate that using conditional probability

bull If P (B) gt 0 and A and B are si then

(When A and B are si knowledge of B does not change the prob of A (and vice versa))

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P (A capB cap C) =P (A capB cap C)

P (B cap C)middot P (B cap C)

P (C)middot P (C)

= P (A|B cap C)P (B|C)P (C)

Partitions and Total Probability

DEFN

A collection of sets A1 A2 partitions the sample space Ω if and only if

Ai is also said to be a partition of Ω

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If Ai partitions Ω then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 A1

bull A0 is sent with prob P (A0) = 25

bull A1 is sent with prob P (A1) = 35

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1

and B0 B1

ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram

56A

B

B

0

1

0A

1

78

16

18

bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61
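These factorials are too large for a calculator, so one way to check the arithmetic in MATLAB is to work with log-factorials via gammaln (log n! = gammaln(n+1)); a sketch:

logA = gammaln(73) - 24*gammaln(4);   % log( 72!/(3!)^24 )
exp(logA)                             % about 1.3e85 (different projects)
exp(logA - gammaln(25))               % about 2.1e61 (same project)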

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and without Ordering. Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)
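(As a quick illustration with small numbers: choosing k = 2 scoops from n = 3 ice-cream flavors, repetition allowed and order ignored, gives C(2 + 3 − 1, 2) = C(4, 2) = 6 distinct combinations.)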

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated, and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

⁴ 0.0794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space: Ω = the ______ of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, . . . , B_N), where B_i ∈ Ω_i.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) = ______

For si subexperiments:

P(A) = ______

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is ______

• The number of ways that k successes can occur in n places is ______

• Thus, the probability of k successes on n trials is

p_n(k) = ______   (1.1)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
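A sketch of the numerical computation in MATLAB, summing the probabilities of 0, 1, or 2 bit errors (this reproduces the footnoted answer from the previous page):

n = 100; p = 0.01; k = 0:2;
Pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
Pfail = 1 - sum(Pk)        % approx 0.0794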

L5-4

• When n is large, the probability in (1.1) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms get very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (1.2)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (1.3)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (1.4)

L5-5

• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (1.5)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (1.6)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
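A numerical sanity check of this approximation in MATLAB (a sketch; the values n = 100, p = 0.2, α = 15, β = 25 are just an illustration, and Q(x) is built from erfc):

n = 100; p = 0.2; q = 1-p; a = 15; b = 25; k = a:b;
exact = sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* q.^(n-k));
Qf = @(x) 0.5*erfc(x/sqrt(2));           % Q(x) = (1/2)erfc(x/sqrt(2))
approx = Qf((a-n*p-0.5)/sqrt(n*p*q)) - Qf((b-n*p+0.5)/sqrt(n*p*q));
[exact approx]                           % both approx 0.83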

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = ______

Also, P(# trials > k) = ______

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
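A sketch of the answers, once the blanks above are filled in with the standard geometric results p(m) = (1 − p)^(m−1) p and P(# trials > k) = (1 − p)^k: here a "success" is a correct delivery, so p = 0.9, giving

P(exactly 2 transmissions) = (0.1)(0.9) = 0.09
P(more than 2 transmissions) = (0.1)² = 0.01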

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.
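A numerical check of this limit in MATLAB (a sketch with α = 0.5 assumed):

alpha = 0.5; k = 0:5;
poiss = alpha.^k ./ factorial(k) * exp(-alpha);   % Poisson probabilities
n = 1000; p = alpha/n;
binom = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
[binom; poiss]                                    % rows agree closely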

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

(The following examples are adapted from A First Course in Probability by Sheldon Ross.)

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, . . .
           { 0,                otherwise

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
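A sketch of the computation: 90 calls/hr is λ = 1.5 calls/minute, so over t = 10 minutes, α = λt = 15, and

alpha = 15; k = 20;
P = alpha^k/factorial(k)*exp(-alpha)   % approx 0.0418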

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = { 0,                      z ≤ 0
         { z²/(2b²),               0 < z ≤ b
         { [b² − (2b − z)²/2]/b²,  b < z < 2b
         { 1,                      z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
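A Monte Carlo sanity check of F_Z in MATLAB (a sketch, with b = 1 assumed for concreteness):

b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);   % sum of the two dart coordinates
mean(Z <= 0.5*b)                 % approx (0.5)^2/2 = 0.125
mean(Z <= 1.5*b)                 % approx 1 - (0.5)^2/2 = 0.875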

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a ______ from ______ to ______.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ______ ( ______ ), denoted ______, is ______

• F_X(x) is a prob measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., a ≤ b implies F_X(a) ≤ F_X(b).

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Pf: omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
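For reference, the second step above is an instance of the inverse-CDF method: if U is uniform on (0, 1), then V = −ln U has P(V ≤ v) = P(U ≥ e^(−v)) = 1 − e^(−v) for v ≥ 0, the exponential CDF with λ = 1. Likewise, W = √V has F_W(w) = P(V ≤ w²) = 1 − e^(−w²), which matches the Rayleigh CDF (introduced in Lecture 8) with 2σ² = 1.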

PROBABILITY DENSITY FUNCTION

DEFN

The ______ ( ______ ) f_X(x) of a random variable X is the derivative (which may not exist at some places) of ______

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ______.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = { 0,           x < a
         { 1/(b − a),   a ≤ x ≤ b
         { 0,           x > b

L7-4

DEFN

A ______ has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, . . . , 6
           { 0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, . . .
           { 0,        otherwise

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n
           { 0,                         otherwise

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: stem plots of the Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: plots of the Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x)]
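A sketch of how stem plots like these can be generated in MATLAB (no toolbox functions assumed):

n = 10; p = 0.2; k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
stem(k, pmf)                           % the pmf plot
xlabel('x'); ylabel('P(X=x)'); title('Binomial (10,0.2) pmf');
% stairs(k, cumsum(pmf)) produces the corresponding CDF plot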

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: stem plot of the Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, . . .
           { 0,                otherwise

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: stem plots of the Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: plots of the Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x)]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, . . .
           { 0,                otherwise

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: stem plots of the Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: plots of the Poisson(0.75) and Poisson(3) CDFs; x vs. F_X(x)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: stem plot of the Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = { 0,          x < 0
         { λe^(−λx),   x ≥ 0

• Distribution function:

F_X(x) = { 0,             x < 0
         { 1 − e^(−λx),   x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: exponential pdfs f_X(x) and CDFs F_X(x) for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs f_X(x) and CDFs F_X(x) for the three (μ, σ²) pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
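As a quick numerical illustration (a sketch; MATLAB has no built-in Q(x), and the values μ = 5, σ = 1, a = 4, b = 7 are just assumed for the example):

Qf = @(x) 0.5*erfc(x/sqrt(2));            % Q(x) = (1/2)erfc(x/sqrt(2))
mu = 5; sigma = 1; a = 4; b = 7;
P = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)   % P(4 < X <= 7), approx 0.8186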

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX⁵: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf f_X(x) for c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But, in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, . . . , n, are si Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²   (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
         { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)! for p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions f_X(x) with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
         { 0,                                                  y < 0

(here m = n/2, for an even number n of degrees of freedom)

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
         { 0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf f_R(r) for σ² = 1]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
         { 0,                  r < 0

• The radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
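A quick MATLAB illustration of this last point (a sketch; compare the histogram with the Rayleigh pdf above, here with σ² = 1):

N = 1e6; sigma = 1;
R = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));  % magnitude of a complex Gaussian
hist(R,100)                                       % matches the Rayleigh shape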

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications, we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
         { 0,                                       ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = ______

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = ______

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
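A sketch of the answers, with A = {X > 2} (so P(A) = e⁻²): for the total wait time, F_X(x | X > 2) = [F_X(x) − F_X(2)]/[1 − F_X(2)] = 1 − e^(−(x−2)) for x > 2 (and 0 otherwise); for the remaining wait time W = X − 2, this gives F_W(w) = 1 − e^(−w), w ≥ 0, so the remaining wait is again exponential with λ = 1 (the memoryless property of the exponential RV).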

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, . . . , A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + · · · + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

           = lim_{Δx→0} ( [F_X(x + Δx|A) − F_X(x|A)]/Δx ) / ( [F_X(x + Δx) − F_X(x)]/Δx ) · P(A)

           = [f_X(x|A) / f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [ ∫_{−∞}^{∞} f_X(x|A) dx ] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board



0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 32: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains

• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer or N for a non-defective computer:

Ω = {AD, AN, BD, BD, BN}

Let

• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective

Find:

P(EA) =        P(EB) =        P(ED) =

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?

We denote this prob. as P(ED|EA) (the conditional probability of event ED given that event EA occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(ED|EB) =

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: overlapping events A and B inside sample space S]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B alone, containing the region A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is

DEFN:

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

   P(A|B) = P(A ∩ B)/P(B), and P(A ∩ B) ≥ 0, P(B) > 0

   ⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅:

   P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
              = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

   Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

   P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
              = P(A|B) + P(C|B)

EX: Check prev. example. Ω = {AD, AN, BD, BD, BN}

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =
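Since the five outcomes are equally likely, these blanks reduce to counting; a short MATLAB enumeration (my own sketch, not part of the original notes) confirms, e.g., P(ED|EA) = 1/2 and P(EA|ED) = 1/3:

% Enumerate the five equally likely outcomes of the computer-lab example
mfr = {'A','A','B','B','B'};   % manufacturer of each computer
def = [ 1   0   1   1   0 ];   % 1 = defective, 0 = non-defective
isA = strcmp(mfr,'A');
P_ED_given_EA = sum(def & isA)/sum(isA)    % = 1/2
P_ED_given_EB = sum(def & ~isA)/sum(~isA)  % = 2/3
P_EA_given_ED = sum(def & isA)/sum(def)    % = 1/3
P_EB_given_ED = sum(def & ~isA)/sum(def)   % = 2/3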

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX:

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?


Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN:

A collection of sets A1, A2, … partitions the sample space Ω if and only if

{Ai} is also said to be a partition of Ω.

1. Venn Diagram:

2. Expressing an arbitrary set using a partition of Ω:

Law of Total Probability

If {Ai} partitions Ω, then

Example:

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1.

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.

EX: Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)    (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

where the last step follows from the Law of Total Probability.

The expression in the box is

EX:

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
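For concreteness, here is a short MATLAB sketch (mine, not part of the notes) that carries out this calculation; it gives P(A0|B0) = 7/9 and P(A1|B1) = 10/11, so the MAP receiver decides A0 on B0 and A1 on B1:

% A posteriori probabilities for the example channel
PA = [2/5; 3/5];                  % priors P(A0), P(A1)
PB_given_A = [7/8 1/8;            % row i+1 = [P(B0|Ai) P(B1|Ai)]
              1/6 5/6];
PAB = diag(PA)*PB_given_A;        % joint probabilities P(Ai and Bj)
PB  = sum(PAB,1);                 % P(Bj), by total probability
Ppost = PAB ./ repmat(PB,2,1);    % P(Ai|Bj); each column sums to 1
[~, decision] = max(Ppost)        % MAP decision per received symbol: [1 2]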

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
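A worked sketch (mine; the notes do this on the board). Let F = {fair coin selected}, with P(F) = 1/2. By Bayes' theorem:

(a) P(F|H) = P(H|F)P(F) / [P(H|F)P(F) + P(H|F')P(F')] = (1/2)(1/2) / [(1/2)(1/2) + (1)(1/2)] = 1/3

(b) P(F|HH) = (1/4)(1/2) / [(1/4)(1/2) + (1)(1/2)] = 1/5

(c) The two-headed coin cannot show tails, so P(F|HHT) = 1.

These match the answers given with the Lecture 4 motivation problem.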

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_(n−1) = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples
=
=

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine # of ordered subsets.

Now, how many different orders are there for any set of k objects?

Let Cnk = the number of combinations of size k from a set of n objects.

Cnk = (# ordered subsets) / (# orderings) =
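A quick MATLAB aside (mine, in the spirit of the gamma(n+1) note above): binomial coefficients are available directly, and a log-gamma form avoids overflow for large n:

% Binomial coefficients in MATLAB
nchoosek(10,3)                                  % C(10,3) = 120
% For large n, work in the log domain with gammaln:
exp(gammaln(101) - gammaln(4) - gammaln(98))    % C(100,3) = 161700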

DEFN:

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient:

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

C(k + n − 1, k)

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

Answer to the packet question: ≈ 0.0794

2. Always switch

3. Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN:

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the _______ of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) =

For s.i. subexperiments:

P(A) =

Bernoulli Trials

DEFN:

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is

• The number of ways that k successes can occur in n places is

• Thus, the probability of k successes on n trials is

pn(k) =    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
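A numerical check in MATLAB (my sketch, not part of the notes), summing the binomial probabilities of 0, 1, or 2 errors and taking the complement:

% P(packet fails) = P(3 or more bit errors in 100 bits), p = 0.01
n = 100; p = 0.01;
k = 0:2;
P_ok = sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k));
P_fail = 1 - P_ok     % ~ 0.0794, matching the footnote answer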

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• It can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)),  where q = 1 − p    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) =

Also, P(# trials > k) =

EX:

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
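A quick check (mine) with the geometric law, per-transmission success probability p = 0.9:

% Geometric law for the ARQ example
p = 0.9;                    % success probability per transmission
P_two  = (1-p)^(2-1) * p    % exactly 2 transmissions: 0.1*0.9 = 0.09
P_more = (1-p)^2            % more than 2 transmissions: 0.1^2 = 0.01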

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, …
           { 0,                o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
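Numerically (my sketch; poisspdf from the Statistics Toolbox would also work):

% Poisson probability for the switching-office example
lambda = 90/60;          % calls per minute
t = 10;                  % observation window, minutes
alpha = lambda*t;        % alpha = 15
k = 20;
P = exp(-alpha)*alpha^k/factorial(k)   % ~ 0.0418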

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = { 0,                        z ≤ 0
         { z²/(2b²),                 0 < z ≤ b
         { [b² − (2b − z)²/2]/b²,    b < z < 2b
         { 1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
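A quick Monte Carlo check of the answer in MATLAB (my sketch, with b = 1):

% Empirical CDF of Z = X + Y for a uniform dart in the unit square
b = 1; N = 1e6;
z  = sum(b*rand(2,N));              % N samples of Z
zq = 1.5;                           % query point, with b < zq < 2b
F_emp = mean(z <= zq)               % empirical F_Z(zq)
F_thy = 1 - (2*b - zq)^2/(2*b^2)    % theory: 0.875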

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a _______ from _______ to _______.

EX:

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN:

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on ℝ contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN:

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the _______ ( _______ ), denoted _______, is _______

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b⁺) = lim_{h→0⁺} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted.

• If F_X(x) is a continuous function of x, then F(x) = F(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]

EX: Examples

EX:

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.

PROBABILITY DENSITY FUNCTION

DEFN:

The _______ ( _______ ) f_X(x) of a random variable X is the derivative (which may not exist at some places) of _______

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf. omitted.

• Note that if f(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F(x) − F(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN:

If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a _______

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN:

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = { 0,          x < a
         { 1/(b − a),  a ≤ x ≤ b
         { 0,          x > b

DEFN:

A _______ has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN:

For a discrete RV, the probability mass function (pmf) is _______

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, …, 6
           { 0,    o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, …
           { 0,        o.w.

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So,

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, …, n
           { 0,                          o.w.

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, …
           { 0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, …
           { 0,                o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf]

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:

f_X(x) = { 0,          x < 0
         { λe^(−λx),   x ≥ 0

• Distribution function:

F_X(x) = { 0,              x < 0
         { 1 − e^(−λx),    x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

DEFN:

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
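A numerical aside (my sketch, not from the notes): MATLAB has no built-in Q(x), but it follows directly from the built-in erfc (qfunc in the Communications Toolbox computes the same thing):

% Interval probability for a Gaussian RV via the Q-function
Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q(x) = 1 - Phi(x)
mu = 5; sigma = 1;                % example values (mine)
a = 4; b = 7;
P = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)   % P(4 < X <= 7) ~ 0.8186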

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^(−2), 2.70 × 10^(−3), 6.34 × 10^(−5), and 5.75 × 10^(−7), respectively.


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
         { 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0

Γ(p) = (p − 1)!,  p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = { (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
         { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (here with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
         { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
         { 0,                     r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
         { 0,                  r < 0

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
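This is easy to see numerically (my sketch, in the style of the Lecture 7 exercise):

% Magnitude of a complex Gaussian is Rayleigh
sigma = 1; N = 1e6;
g = sigma*(randn(1,N) + 1i*randn(1,N));   % s.i. real and imaginary parts
r = abs(g);                               % Rayleigh samples (parameter sigma)
hist(r,100)                               % compare the shape with f_R(r) above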

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
         { 0,                                        ℓ < 0

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω):

DEFN:

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) =

DEFN:

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) =

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX:

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
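A worked sketch (mine; the notes do this on the board). Let X be the total wait and A = {X > 2}. For x > 2,

F_X(x|A) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = [e^(−2) − e^(−x)] / e^(−2) = 1 − e^(−(x−2)),

and F_X(x|A) = 0 for x ≤ 2. The remaining wait W = X − 2 then satisfies P(W ≤ w | A) = 1 − e^(−w), w ≥ 0, which is the same exponential distribution as the original wait (the memoryless property of the exponential RV).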

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$

and

$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i).$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

$$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)$$
$$= \frac{f_X(x|A)}{f_X(x)}\, P(A),$$

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

$$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)$$
$$\Rightarrow P(A|X = x) f_X(x) = f_X(x|A) P(A)$$
$$\Rightarrow \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \cdot P(A)$$
$$\Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx.$$

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

$$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}.$$

Examples on board
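One illustrative example (an addition, with a simple uniform model assumed for concreteness): let X be uniform on [0, 1] and suppose P(A|X = x) = x. Then

$$P(A) = \int_0^1 x \cdot 1\, dx = \frac{1}{2}, \qquad f_X(x|A) = \frac{x \cdot 1}{1/2} = 2x, \quad 0 \le x \le 1,$$

so conditioning on A skews the density of X toward larger values, as expected.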

L3-7

– P(EA|ED) =

– P(EB|ED) =

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram:

[Venn diagram: events A and B inside the sample space S.]

If we condition on B having occurred, then we can form the new Venn diagram:

[Venn diagram: B as the new sample space, containing A ∩ B.]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN

For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0.$$

L3-8

Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1.
$$P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

2. Given A ∈ A,

$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$

and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0.

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

$$P(A \cup C|B) = \frac{P[(A \cup C) \cap B]}{P[B]} = \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]}.$$

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

$$P(A \cup C|B) = \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]} = P(A|B) + P(C|B).$$

EX: Check prev. example, Ω = {AD, AN, BD, BN}:

P(ED|EA) =

P(ED|EB) =

P(EA|ED) =

P(EB|ED) =

L3-9

EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?²

Ex: Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

² (a) 1/3 (b) 1/5 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$

(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)

Relating Conditional and Unconditional Probs.

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(A|B)P(B) \qquad (7)$$

and

$$P(B|A) = \frac{P(A \cap B)}{P(A)} \Rightarrow P(A \cap B) = P(B|A)P(A). \qquad (8)$$

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events:

$$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\, P(B|C)\, P(C).$$

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if

$$\bigcup_i A_i = \Omega \quad \text{and} \quad A_i \cap A_j = \emptyset \ \text{ for } i \ne j.$$

{Ai} is also said to be a partition of Ω.

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then

$$P(B) = \sum_i P(B|A_i)\, P(A_i).$$

Example:

L4-5

EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8.]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
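A small MATLAB sketch of the computation (an addition to the notes; it assumes the transition probabilities from the diagram above):

PA = [2/5 3/5];                    % P(A0), P(A1)
PBgA = [5/6 1/6; 1/8 7/8];         % row i, column j: P(Bj|Ai)
Pe1 = PA(1)                        % rule 1 errs exactly when A0 is sent: 2/5
Pe2 = PBgA(1,2)*PA(1) + PBgA(2,1)*PA(2)   % rule 2: cross transitions, = 17/120

So rule 2 reduces the error probability from 0.4 to about 0.142.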

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1).$$

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0).$$

• We apply the chain rule to write this as

$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0).$$

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0).$$

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad \text{and} \quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1).$$

• We can further expand these using the chain rule as

$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad \text{and} \quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1).$$

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \qquad (9)$$

and

$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1). \qquad (10)$$

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

$$P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)},$$

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 · n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

³ (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = ... = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n−1)···(n−k+1) = n!/(n−k)!.

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n(n−1) = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n−1)···(2)(1) = n!

Useful formula: For large n,

$$n! \approx \sqrt{2\pi}\, n^{n + 1/2}\, e^{-n}.$$

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
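For instance (a quick check I am adding; the value n = 52 is an arbitrary choice):

n = 52;
exact = gamma(n+1);                          % n! via the gamma function
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n); % the approximation above
stirling/exact                               % about 0.9984, within 0.2%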

L5a-3

4. Sample w/out Replacement & w/out Order

Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n−k)!.

Now, how many different orders are there for any set of k objects? k!

Let Cn,k = the number of combinations of size k from a set of n objects:

$$C_{n,k} = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}.$$

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}.$$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}.$$

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}.$$

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring? (A numerical check follows below.)
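A sketch of the numbers (an addition; the blanks above are left for working in the notes):

ways = factorial(12) / factorial(2)^6;   % multinomial: 12!/(2!)^6 = 7,484,400
p = ways / 6^12                          % all 6^12 outcomes equally likely: ~0.0034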

5. Sample with Replacement and w/out Ordering

Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

$$\binom{k + n - 1}{k}.$$

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX: By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

⁴ 0.0794

L5-2

2. Always switch

3. Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P(A) =

For s.i. subexperiments,

$$P(A) = P_1(B_1)\, P_2(B_2) \cdots P_N(B_N).$$

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1−p)^(n−k).

• The number of ways that k successes can occur in n places is the binomial coefficient.

• Thus, the probability of k successes on n trials is

$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}. \qquad (11)$$

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
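A sketch of the computation in MATLAB (an addition), using (11) and the complement of "2 or fewer errors":

p = 0.01; n = 100; Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
Pe = 1 - Pok          % approx 0.0794, matching the footnoted answer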

L5-4

• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}. \qquad (12)$$

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[ \frac{-x^2}{2} \right] \qquad (13)$$

and distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[ \frac{-t^2}{2} \right] dt. \qquad (14)$$

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

$$Q(x) = 1 - \Phi(x) \qquad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[ \frac{-t^2}{2} \right] dt. \qquad (16)$$

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left( \frac{k - np}{\sqrt{npq}} \right). \qquad (17)$$

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[ \frac{\beta - np + 0.5}{\sqrt{npq}} \right] - \Phi\!\left[ \frac{\alpha - np - 0.5}{\sqrt{npq}} \right] = Q\!\left[ \frac{\alpha - np - 0.5}{\sqrt{npq}} \right] - Q\!\left[ \frac{\beta - np + 0.5}{\sqrt{npq}} \right].$$

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

$$p(m) = P(m \text{ trials to first success}) = P(\text{1st } m{-}1 \text{ trials are failures} \,\cap\, m\text{th trial is success})$$

L5-6

Then

$$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$$

Also,

$$P(\#\text{ trials} > k) = (1-p)^k.$$

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
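A sketch of the answer (an addition): each transmission succeeds with p = 0.9, so the number of transmissions is geometric, and

$$P(\text{exactly } 2) = (0.1)(0.9) = 0.09, \qquad P(\text{more than } 2) = P(\text{first 2 fail}) = (0.1)^2 = 0.01.$$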

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (calls per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability,

$$\binom{n}{k} p^k (1-p)^{n-k}.$$

• If we let np = α be a constant, then for large n,

$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left( 1 - \frac{\alpha}{n} \right)^{n-k}.$$

• Then

$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left( 1 - \frac{\alpha}{n} \right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
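A sketch of the computation (an addition): here λ = 90 calls/hr and t = 1/6 hr, so α = λt = 15, and in MATLAB,

alpha = 15; k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % about 0.0418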

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - \frac{(2b - z)^2}{2}}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:

(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and

(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

$$F_X(x) = P(X \le x).$$

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing; i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right; i.e., $\lim_{h \to 0^+} F_X(b + h) = F_X(b)$.

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

$$F_X(x) - F_X(x^-) = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x].$$

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:

hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v = -log(u);

Check the density by plotting a histogram:

hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:

w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
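For reference, the first transformation above is an instance of the inverse-CDF method. A short sketch (an addition to the notes; the value λ = 2 is an arbitrary choice):

% If U ~ uniform(0,1), then X = -log(U)/lambda has CDF 1 - exp(-lambda*x),
% i.e., X is exponential with parameter lambda.
lambda = 2;
x = -log(rand(1,1000000))/lambda;
hist(x,100)     % compare against the shape of lambda*exp(-lambda*x)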

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

L7-3

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx$, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

$$\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad -\infty < c < +\infty,$$

then

$$f_X(x) = \frac{g(x)}{c}$$

is a valid pdf.

Pf. omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

$$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b - a}, & a \le x \le b \\ 0, & x > b. \end{cases}$$

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

$$p_X(x) = P(X = x).$$

EX: Roll a fair 6-sided die. X = # on top face.

$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$

EX: Flip a fair coin until heads occurs. X = # of flips.

$$P(X = x) = \begin{cases} \left( \tfrac{1}{2} \right)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A. \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as

$$P(X = 1) = P\big(X(s) = 1\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p.$$

So

$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1. \end{cases}$$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) versus x.]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; FX(x) versus x.]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) versus x.]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

$$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) versus x.]

L7-8

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; FX(x) versus x.]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; P(X = x) versus x.]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; FX(x) versus x.]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) versus x.]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example.

2. Exponential RV

• Characterized by a single parameter, λ.

L7-10

• Density function:

$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:

$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4.]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,

$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{ \frac{-(k - np)^2}{2npq} \right\}.$$

• Let σ² = npq, µ = np; then

$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(k - \mu)^2}{2\sigma^2} \right\}.$$

L7-11

DEFN

A Gaussian random variable X has density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(x - \mu)^2}{2\sigma^2} \right\},$$

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(t - \mu)^2}{2\sigma^2} \right\} dt,$$

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt.$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt.$$

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{ -\frac{x^2}{2\sin^2\phi} \right\} d\phi.$$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

$$F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) = 1 - Q\left( \frac{x - \mu}{\sigma} \right).$$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left( \frac{a - \mu}{\sigma} \right) - Q\left( \frac{b - \mu}{\sigma} \right).$$
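For example (a worked instance I am adding; the numbers µ = 10, σ² = 25 are illustrative):

$$P(5 < X \le 15) = Q\left( \frac{5 - 10}{5} \right) - Q\left( \frac{15 - 10}{5} \right) = Q(-1) - Q(1) = 1 - 2Q(1) \approx 0.683,$$

using Q(−x) = 1 − Q(x) and Q(1) ≈ 0.1587.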

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5⁵

Examples with Gaussian Random Variables

EX The Motivation problem

⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf (c = 2); fX(x) versus x.]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 34: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L3-8

Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))

Check the axioms

1

P (S|B) =P (S capB)

P (B)=

P (B)

P (B)= 1

2 Given A isin A

P (A|B) =P (A capB)

P (B)

and P (A capB) ge 0 P (B) ge 0

rArr P (A|B) ge 0

3 If A sub A C sub A and A cap C = empty

P (A cup C|B) =P [(A cup C) capB]

P [B]

=P [(A capB) cup (C capB)]

P [B]

Note that A cap C = empty rArr (A capB) cap (C capB) = empty so

P (A cup C|B) =P [A capB]

P [B]+

P [C capB]

P [B]

= P (A|B) + P (C|B)

EX Check prev example Ω =

AD AN BD BD BN

P (ED|EA) =

P (ED|EB) =

P (EA|ED) =

P (EB|ED) =

L3-9

EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene

(b) What is the probability that the Smithsrsquo first child will have blue eyes

(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)

Read Section 17 in the textbook

Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin

2

Ex Drawing two consecutive balls without replacement

An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5

Ω = B1 B2 W3 W4 W5

Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw

Let Wi= event that white ball chosen on draw i

What is P (W2|W1)

2 (a) 13 (b) 15 (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring

bull Letrsquos evaluate that using conditional probability

bull If P (B) gt 0 and A and B are si then

(When A and B are si knowledge of B does not change the prob of A (and vice versa))

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P (A capB cap C) =P (A capB cap C)

P (B cap C)middot P (B cap C)

P (C)middot P (C)

= P (A|B cap C)P (B|C)P (C)

Partitions and Total Probability

DEFN

A collection of sets A1 A2 partitions the sample space Ω if and only if

Ai is also said to be a partition of Ω

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If Ai partitions Ω then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 A1

bull A0 is sent with prob P (A0) = 25

bull A1 is sent with prob P (A1) = 35

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1

and B0 B1

ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram

56A

B

B

0

1

0A

1

78

16

18

bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1 Never switch

(Answer to the packet-decoding question: 0.0794)

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
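A simulation sketch of the first two strategies in MATLAB (the trial count is an arbitrary illustrative choice; switching wins exactly when the initial pick was wrong):

N = 100000; wins_stay = 0; wins_switch = 0;
for trial = 1:N
    car = ceil(3*rand);                        % door hiding the car
    pick = ceil(3*rand);                       % contestant's initial pick
    wins_stay = wins_stay + (pick == car);     % never switch: win iff the pick was right
    wins_switch = wins_switch + (pick ~= car); % always switch: win iff the pick was wrong
end
[wins_stay wins_switch]/N                      % approximately [1/3 2/3]

The coin-flip strategy averages the two, so it wins with probability 1/2.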

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any $A \in \Omega$ is of the form $(B_1, B_2, \ldots, B_N)$, where $B_i \in \Omega_i$.

2. Event class F: σ-algebra on Ω.

3. $P(\cdot)$ on F such that $P(A)$ is consistent with the probability laws of the subexperiments.

For s.i. (statistically independent) subexperiments,

$P(A) = P_1(B_1)\, P_2(B_2) \cdots P_N(B_N)$

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and $n-k$ failures) is $p^k(1-p)^{n-k}$.

• The number of ways that k successes can occur in n places is $\binom{n}{k}$.

• Thus, the probability of k successes on n trials is

$p_n(k) = \binom{n}{k}\, p^k (1-p)^{n-k}$ (1.1)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
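As a sketch, the answer can be computed directly from (1.1) in MATLAB:

n = 100; p = 1e-2;
Pdecode = 0;
for k = 0:2
    Pdecode = Pdecode + nchoosek(n,k) * p^k * (1-p)^(n-k);  % P(0, 1, or 2 errors)
end
Pfail = 1 - Pdecode               % approximately 0.0794, matching the footnote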

L5-4

• When n is large, the probability in (1.1) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n we can approximate the binomial probabilities using either

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

$b(k; n, p) = \binom{n}{k}\, p^k (1-p)^{n-k}$ (1.2)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables $X_i$.

– For each Bernoulli trial i, let $X_i = 1$ if the trial is a success, and $X_i = 0$ otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then $b(k; n, p)$ is the probability that $S_n = k$.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

$\varphi(x) = \dfrac{1}{\sqrt{2\pi}} \exp\left[\dfrac{-x^2}{2}\right]$ (1.3)

and distribution function

$\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\dfrac{-t^2}{2}\right] dt$ (1.4)

L5-5

• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

$Q(x) = 1 - \Phi(x)$ (1.5)
$\quad\;\; = \dfrac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\dfrac{-t^2}{2}\right] dt$ (1.6)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

$b(k; n, p) \approx \dfrac{1}{\sqrt{npq}}\, \varphi\!\left(\dfrac{k - np}{\sqrt{npq}}\right)$ (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\dfrac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\dfrac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\dfrac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\dfrac{\beta - np + 0.5}{\sqrt{npq}}\right]$
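As an illustration, the approximation (with its ±0.5 continuity corrections) can be checked against the exact binomial sum; a MATLAB sketch assuming n = 100, p = 0.5, and the interval 45 ≤ Sn ≤ 55:

n = 100; p = 0.5; q = 1-p; a = 45; b = 55;
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);
end
Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q-function via the complementary error function
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                    % both approximately 0.729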

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

$p(m) = P(m \text{ trials to first success}) = P(\text{1st } m-1 \text{ trials are failures} \,\cap\, m\text{th trial is success})$

L5-6

Then

$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$

Also, $P(\#\text{ trials} > k) = (1-p)^k$.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
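A sketch of this computation in MATLAB (the per-transmission success probability is p = 0.9, since the packet error probability is 0.1):

p = 0.9;
P_exactly_2 = (1-p)*p             % one failure, then a success: 0.09
P_more_than_2 = (1-p)^2           % first two transmissions both fail: 0.01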

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls comes in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls per minute is n.

• As $n \to \infty$, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability

$\binom{n}{k}\, p^k (1-p)^{n-k}$

• If we let np = α be a constant, then for large n,

$\binom{n}{k}\, p^k (1-p)^{n-k} \approx \dfrac{1}{k!}\,\alpha^k \left(1 - \dfrac{\alpha}{n}\right)^{n-k}$

• Then

$\lim_{n \to \infty} \dfrac{1}{k!}\,\alpha^k \left(1 - \dfrac{\alpha}{n}\right)^{n-k} = \dfrac{\alpha^k}{k!}\, e^{-\alpha},$

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$

EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
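A sketch of the computation in MATLAB (λ = 90/60 = 1.5 calls/minute, t = 10 minutes, so α = λt = 15):

lambda = 90/60; t = 10; alpha = lambda*t;
k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % approximately 0.042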

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: $\Omega_Z = \{z \mid 0 \le z \le 2b\}$)

2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.

3. Find and plot $F_Z(z)$.

Ans:

$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - (2b-z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers $B \in \mathcal{B}$, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) $P[X = -\infty] = 0$ and $P[X = +\infty] = 0$.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form $(-\infty, x]$.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted $F_X(x)$, is

$F_X(x) = P(X \le x)$

• $F_X(x)$ is a prob. measure.

• Properties of $F_X$:

1. $0 \le F_X(x) \le 1$

L6-4

2. $F_X(-\infty) = 0$ and $F_X(\infty) = 1$

3. $F_X(x)$ is monotonically nondecreasing, i.e., $F_X(a) \le F_X(b)$ if $a \le b$

4. $P(a < X \le b) = F_X(b) - F_X(a)$

5. $F_X(x)$ is continuous on the right, i.e., $F_X(b) = \lim_{h \to 0^+} F_X(b + h)$

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If $F_X(x)$ is a continuous function of x, then $F_X(x) = F_X(x^-)$.

• If $F_X(x)$ is not a continuous function of x, then from the above,

$F_X(x) - F_X(x^-) = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]$

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.

3. Find and plot $F_Z(z)$.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
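The answer can be checked by simulation; a minimal MATLAB sketch with b = 1 and one test point z:

b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);        % sum of the two dart coordinates
z = 1.5;                              % a test point with b < z < 2b
empirical = mean(Z <= z)              % relative frequency of {Z <= z}
analytic = (b^2 - (2*b - z)^2/2)/b^2  % 0.875 from the CDF found earlier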

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of the new random variables by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);
hist(w,100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) $f_X(x)$ of a random variable X is the derivative (which may not exist at some places) of $F_X(x)$:

$f_X(x) = \dfrac{d}{dx} F_X(x)$

• Properties:

1. $f_X(x) \ge 0$, $-\infty < x < \infty$

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

L7-3

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx$, $a \le b$

5. If g(x) is a nonnegative piecewise continuous function with finite integral

$\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad 0 < c < +\infty,$

then $f_X(x) = \dfrac{g(x)}{c}$ is a valid pdf.

Pf omitted

• Note that if $f_X(x)$ exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If $F_X(x)$ is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way, $f_X(x)$ can be assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is $P_X(x) = P(X = x)$.

EX Roll a fair 6-sided die. X = # on top face.

$P(X = x) = \begin{cases} \dfrac{1}{6}, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$

EX Flip a fair coin until heads occurs. X = # of flips.

$P(X = x) = \begin{cases} \left(\dfrac{1}{2}\right)^{x}, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event $A \in \mathcal{A}$ is considered a "success."

• A Bernoulli RV X is defined by

$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$

• The pmf for a Bernoulli RV X can be found formally as

$P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p$

So

$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

$P[X = k] = \begin{cases} \binom{n}{k}\, p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf (left) and Binomial(10, 0.5) pmf (right); horizontal axis x, vertical axis P(X = x).]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF (left) and Binomial(10, 0.5) CDF (right); horizontal axis x, vertical axis F_X(x).]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; horizontal axis x, vertical axis P(X = x).]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf (left) and Geometric(0.7) pmf (right); horizontal axis x, vertical axis P(X = x).]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF (left) and Geometric(0.7) CDF (right); horizontal axis x, vertical axis F_X(x).]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf (left) and Poisson(3) pmf (right); horizontal axis x, vertical axis P(X = x).]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF (left) and Poisson(3) CDF (right); horizontal axis x, vertical axis F_X(x).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; horizontal axis x, vertical axis P(X = x).]
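The pmfs in the figures above are easy to regenerate; a MATLAB sketch using the pmf formulas directly (stem produces the discrete plots, and no toolbox functions are needed):

k = 0:10; n = 10; p = 0.2;
pmf_bin = zeros(size(k));
for i = 1:length(k)
    kk = k(i);
    pmf_bin(i) = nchoosek(n,kk) * p^kk * (1-p)^(n-kk);  % Binomial(10,0.2) pmf
end
m = 1:11; pg = 0.7;
pmf_geo = (1-pg).^(m-1) * pg;                           % Geometric(0.7) pmf
alpha = 3;
pmf_poi = alpha.^k .* exp(-alpha) ./ factorial(k);      % Poisson(3) pmf
subplot(1,3,1); stem(k, pmf_bin); title('Binomial(10,0.2) pmf');
subplot(1,3,2); stem(m, pmf_geo); title('Geometric(0.7) pmf');
subplot(1,3,3); stem(k, pmf_poi); title('Poisson(3) pmf');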

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$

• Distribution function:

$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs (left) and CDFs (right) for λ = 0.25, 1, 4.]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 - p,

$\binom{n}{k}\, p^k q^{n-k} \approx \dfrac{1}{\sqrt{2\pi npq}} \exp\left\{\dfrac{-(k - np)^2}{2npq}\right\}$

• Let $\sigma^2 = npq$, $\mu = np$; then

$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k}\, p^k q^{n-k} \approx \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\dfrac{-(k - \mu)^2}{2\sigma^2}\right\}$

L7-11

DEFN

A Gaussian random variable X has density function

$f_X(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\dfrac{-(x - \mu)^2}{2\sigma^2}\right\}$

with parameters (mean) µ and (variance) $\sigma^2 \ge 0$.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\dfrac{-(t - \mu)^2}{2\sigma^2}\right\} dt,$

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs (left) and CDFs (right) for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$\Phi(x) = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi}} \exp\left\{\dfrac{-t^2}{2}\right\} dt$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$Q(x) = \int_{x}^{\infty} \dfrac{1}{\sqrt{2\pi}} \exp\left\{\dfrac{-t^2}{2}\right\} dt$

L7-12

– Note that $Q(x) = 1 - \Phi(x)$.

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$Q(x) = \dfrac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\dfrac{-x^2}{2\sin^2\phi}\right\} d\phi$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

$F_X(x) = \Phi\left(\dfrac{x - \mu}{\sigma}\right) = 1 - Q\left(\dfrac{x - \mu}{\sigma}\right)$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\dfrac{a - \mu}{\sigma}\right) - Q\left(\dfrac{b - \mu}{\sigma}\right)$
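MATLAB has no built-in Q-function, but it follows from the identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; a sketch of an interval probability computed this way (the values of µ, σ, a, and b are illustrative choices):

Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q-function from the complementary error function
mu = 5; sigma = 1; a = 4; b = 7;
P = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)    % P(4 < X <= 7), approximately 0.82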

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) $P[X < \mu]$

(b) $P[|X - \mu| > k\sigma]$ for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

(Answers: (a) 1/2; (b) 0.317, $4.55 \times 10^{-2}$, $2.70 \times 10^{-3}$, $6.34 \times 10^{-5}$, and $5.75 \times 10^{-7}$, respectively.)
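These follow from symmetry and $P[|X - \mu| > k\sigma] = 2Q(k)$, which can be checked numerically in MATLAB (a sketch using the erfc identity for the Q-function):

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)     % approximately 0.317, 0.0455, 0.0027, 6.3e-5, 5.7e-7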

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$f_X(x) = \dfrac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$

where $c > 0$.

bull The pdf for a Laplacian RV with c = 2 is shown below

[Figure: Laplacian pdf with c = 2.]

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then $Y = X^2$ is a chi-square random variable.

L8-5

• But, in fact, chi-square random variables are much more general. For instance, if $X_i$, $i = 1, 2, \ldots, n$, are s.i. Gaussian random variables with identical variances σ², then

$Y = \sum_{i=1}^{n} X_i^2$ (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

$f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2}\, \Gamma(n/2)}\, y^{(n/2)-1}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$

where Γ(p) is the gamma function, defined as

$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\, dt, \quad p > 0$
$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0$
$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$

• Note that for n = 2,

$f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: Chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1.]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \dfrac{1}{k!} \left(\dfrac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let $R = \sqrt{Y}$.

• Then R is a Rayleigh RV with pdf

$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1.]

• The CDF for a Rayleigh RV is

$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
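This construction is easy to verify numerically; a MATLAB sketch that builds Rayleigh samples from two s.i. zero-mean Gaussians (σ = 1) and overlays the histogram estimate on the pdf above:

N = 1e6; sigma = 1;
R = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);   % Rayleigh by construction
[counts, centers] = hist(R, 100);
binw = centers(2) - centers(1);
plot(centers, counts/(N*binw)); hold on;                   % normalized histogram (density estimate)
r = 0:0.01:5;
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)));              % Rayleigh pdf for comparison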

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\dfrac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event $A \in F$ with $P(A) \ne 0$, the conditional distribution of X given A is

$F_X(x|A) = P(X \le x \mid A) = \dfrac{P(\{X \le x\} \cap A)}{P(A)}$

DEFN

For an event $A \in F$ with $P(A) \ne 0$, the conditional density of X given A is

$f_X(x|A) = \dfrac{d}{dx} F_X(x|A)$

• Then $F_X(x|A)$ is a distribution function, and $f_X(x|A)$ is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
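A worked sketch of part 1, using $A = \{X > 2\}$ and the exponential CDF above with λ = 1: for $x > 2$,

$F_X(x \mid X > 2) = \dfrac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \dfrac{F_X(x) - F_X(2)}{1 - F_X(2)} = \dfrac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}$

(and 0 for $x \le 2$). For part 2, the remaining wait time $X - 2$ is thus again exponential with λ = 1; this is the memoryless property of the exponential RV.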

TOTAL PROBABILITY AND BAYES' THEOREM

• If $A_1, A_2, \ldots, A_n$ form a partition of Ω, then

$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$

and

$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\, P(A_i)$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly, P(X = x) = 0, so the previous definition of conditional prob. will not work:

$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$
$\qquad = \lim_{\Delta x \to 0} \dfrac{F_X(x + \Delta x \,|\, A) - F_X(x \,|\, A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$
$\qquad = \lim_{\Delta x \to 0} \dfrac{[F_X(x + \Delta x \,|\, A) - F_X(x \,|\, A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)$
$\qquad = \dfrac{f_X(x|A)}{f_X(x)}\, P(A)$

if $f_X(x|A)$ and $f_X(x)$ exist.

L8-10

Implication of Point Conditioning

• Note that

$P(A|X = x) = \dfrac{f_X(x|A)}{f_X(x)}\, P(A)$
$\Rightarrow P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)$
$\Rightarrow \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \; P(A)$
$\Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx$

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

$f_X(x|A) = \dfrac{P(A|X = x)}{P(A)}\, f_X(x) = \dfrac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\, dt}$

Examples on board



bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 36: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-1

EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EX Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers: (a) 1/3, (b) 1/5, (c) 1

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent (s.i.), then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).

(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)

Relating Conditional and Unconditional Probs

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)

and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN

A collection of sets A1, A2, ... partitions the sample space Ω if and only if
(i) ∪i Ai = Ω, and
(ii) Ai ∩ Aj = ∅ for all i ≠ j.

{Ai} is also said to be a partition of Ω.

L4-4

1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then

P(B) = Σi P(B|Ai)P(Ai).

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends one of two symbols, A0 or A1.

• A0 is sent with prob P(A0) = 2/5.

• A1 is sent with prob P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8; P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine from the output, B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules. (A quick numerical check follows.)
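As a sanity check, here is a short MATLAB sketch that evaluates both error probabilities directly from the numbers on the transition diagram above (nothing else is assumed):

% Error probabilities for the two candidate decision rules
pA0 = 2/5; pA1 = 3/5;               % a priori probabilities
pB1gA0 = 1/8; pB0gA1 = 1/6;         % channel crossover probabilities
Pe_rule1 = pA0                      % rule 1 (always decide A1): error iff A0 sent, = 0.4
Pe_rule2 = pB1gA0*pA0 + pB0gA1*pA1  % rule 2 (decide Ai from Bi), = 0.15

Rule 2 does much better; the following pages derive the rule that minimizes the error probability.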

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
and
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A MATLAB check follows.)
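A hedged MATLAB sketch of this computation (the convention assumed here is p01 = P(B1|A0) and p10 = P(B0|A1), matching the transition diagram on page L4-5):

% A posteriori probabilities and MAP decisions for the example channel
pA = [2/5 3/5];                          % P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];               % row i = [P(B0|A_{i-1}) P(B1|A_{i-1})]
pB = pA*PBgA;                            % [P(B0) P(B1)] by total probability
post = (diag(pA)*PBgA)./repmat(pB,2,1)   % post(i,j) = P(A_{i-1}|B_{j-1})
[~,map] = max(post);                     % MAP decision for each received Bj
map - 1                                  % decide A0 when B0, A1 when B1

The posteriors come out as P(A0|B0) = 7/9 and P(A0|B1) = 1/11, so the MAP rule decides A0 on B0 and A1 on B1.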

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 · n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n(n−1) = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1). A short example follows.
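For instance (a minimal sketch; factorial and nchoosek are standard MATLAB built-ins):

% Counting formulas in MATLAB
n = 10; k = 3;
gamma(n+1)                       % n! = 3628800 (same as factorial(n))
factorial(n)/factorial(n-k)      % ordered samples w/out replacement: 720
nchoosek(n,k)                    % combinations of size k: 120
6/6^5                            % poker-dice P[five alike] = 0.0008, answer (g)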

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n,k) = the number of combinations of size k from a set of n objects.

C(n,k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61.

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6 (a multinomial coefficient)

– What is the prob of this occurring? [12!/(2!)^6] / 6^12 ≈ 0.0034 (see the sketch below)

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

C(k + n − 1, k) = (k + n − 1)! / (k!(n − 1)!).
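A one-line MATLAB check of the dice example above, using only the multinomial coefficient:

% P(each face appears exactly twice in 12 rolls of a fair die)
factorial(12)/factorial(2)^6/6^12    % = [12!/(2!)^6]/6^12 ≈ 0.0034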

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch.

Answer to the packet question: 0.0794

L5-2

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.
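Before working these out, you can estimate all three probabilities with a short Monte Carlo simulation. This MATLAB sketch assumes only the rules stated above (the host always opens a goat door that is not your pick):

% Monte Carlo estimates for the three Monty Hall strategies
N = 1e5;
wins = zeros(1,3);                      % [never switch, always switch, coin flip]
for trial = 1:N
    car  = randi(3);                    % door hiding the car
    pick = randi(3);                    % contestant's initial pick
    doors = 1:3;
    goatdoors = doors(doors ~= pick & doors ~= car);
    host = goatdoors(randi(numel(goatdoors)));     % host opens a goat door
    other = doors(doors ~= pick & doors ~= host);  % the remaining closed door
    wins(1) = wins(1) + (pick == car);
    wins(2) = wins(2) + (other == car);
    if rand < 0.5
        wins(3) = wins(3) + (other == car);        % coin said switch
    else
        wins(3) = wins(3) + (pick == car);         % coin said stay
    end
end
wins/N                                  % approx [1/3 2/3 1/2]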

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments, Ω = Ω1 × Ω2 × · · · × ΩN.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) = P(B1 × B2 × · · · × BN)

For s.i. subexperiments,

P(A) = P1(B1) P2(B2) · · · PN(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n,k) = n!/(k!(n − k)!).

• Thus, the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^(n−k)   (11)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads? P = (1/2)^3 = 1/8.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A numerical check appears below.)
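A direct MATLAB check of this example using (11) (only the numbers stated above are assumed):

% P(packet fails) = P(3 or more bit errors in n = 100 bits), p = 0.01
n = 100; p = 1e-2;
p_le2 = 0;
for k = 0:2
    p_le2 = p_le2 + nchoosek(n,k)*p^k*(1-p)^(n-k);   % P(k errors)
end
1 - p_le2            % approx 0.0794, the footnote answer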

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
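A sketch comparing the exact binomial sum with this approximation, with Q implemented through the built-in erfc (the n, p, α, β values are an arbitrary illustration):

% Exact vs. approximate P[alpha <= Sn <= beta] for n = 100, p = 0.5
n = 100; p = 0.5; q = 1-p; a = 45; b = 55;
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k)*p^k*q^(n-k);   % nchoosek may warn for large n
end
Qf = @(x) 0.5*erfc(x/sqrt(2));                   % Q(x) via erfc
approx = Qf((a-n*p-0.5)/sqrt(n*p*q)) - Qf((b-n*p+0.5)/sqrt(n*p*q));
[exact approx]                                   % both approx 0.7287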

• Note that the book defines an error function, erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (See the sketch below.)
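A two-line MATLAB check with the geometric law (here a "success" is a correct delivery, so the success probability is 0.9):

% ARQ example: packet error probability 0.1
pe = 0.1;
pe*(1-pe)        % exactly 2 transmissions: first fails, second succeeds = 0.09
pe^2             % more than 2 transmissions: first two both fail = 0.01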

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day.
Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. The maximum possible number of calls per minute is now n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability, C(n,k) p^k (1 − p)^(n−k).

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
  (α^k/k!) e^(−α),  k = 0, 1, ...
  0,                o.w.

EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (See the sketch below.)

• The Gaussian approximation can also be applied to Poisson probabilities — see pages 46 and 47 in the book for details.
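A quick MATLAB evaluation of the switching-office example (α = λt, with λ = 1.5 calls/minute and t = 10 minutes):

% Poisson probability of exactly 20 calls in 10 minutes at 90 calls/hour
lambda = 90/60;                        % calls per minute
alpha  = lambda*10;                    % alpha = lambda*t = 15
alpha^20*exp(-alpha)/factorial(20)     % approx 0.0418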

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) =
  0,                       z ≤ 0
  z²/(2b²),                0 < z ≤ b
  [b² − (2b − z)²/2]/b²,   b < z < 2b
  1,                       z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R (the real line).

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P({ω : X(ω) ≤ x}) = P(X ≤ x).

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0+} FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x−).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x−) = P[x− < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
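If you want to check your answer numerically, the sketch below estimates FZ(z) by simulation (b = 1 is an arbitrary choice):

% Empirical CDF of Z = sum of the dart's coordinates, b = 1
b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);
z = linspace(0, 2*b, 201);
F = arrayfun(@(t) mean(Z <= t), z);         % empirical F_Z(z)
plot(z, F); xlabel('z'); ylabel('F_Z(z)')   % compare with the piecewise answer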

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB:

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram (hist(w, 100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x−) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus a value of fX(x) can be assigned at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) =
  0,           x < a
  1/(b − a),   a ≤ x ≤ b
  0,           x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is

pX(x) = P(X = x).

EX Roll a fair 6-sided die.
X = # on top face

P(X = x) =
  1/6,  x = 1, 2, ..., 6
  0,    o.w.

EX Flip a fair coin until heads occurs.
X = # of flips

P(X = x) =
  (1/2)^x,  x = 1, 2, ...
  0,        o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by

X =
  1,  s ∈ A
  0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So,

P(X = x) =
  p,      x = 1
  1 − p,  x = 0
  0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] =
  C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
  0,                         o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf — P(X = x) vs. x, x = 0, 1, ..., 10]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF — FX(x) vs. x, x = 0, 1, ..., 10]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf — P(X = x) vs. x, x = 0, 1, ..., 100]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] =
  (1 − p)^(k−1) p,  k = 1, 2, ...
  0,                o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf — P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF — FX(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
  (α^k/k!) e^(−α),  k = 0, 1, ...
  0,                o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf — P(N = k) vs. k]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF — FX(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf — P(X = x) vs. x, x = 0, 1, ..., 100]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in an example above.

2. Exponential RV

• Characterized by a single parameter, λ.

L7-10

• Density function:

fX(x) =
  0,          x < 0
  λe^(−λx),   x ≥ 0

• Distribution function:

FX(x) =
  0,             x < 0
  1 − e^(−λx),   x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs fX(x) and CDFs FX(x) for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre–Laplace Theorem: If n is large (npq ≫ 1), then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs fX(x) and CDFs FX(x) for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
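In MATLAB, the Communications Toolbox provides qfunc; without it, the identity Q(x) = 0.5·erfc(x/√2) with the built-in erfc gives the same thing. A sketch (the µ, σ, a, b values are an arbitrary illustration):

% Interval probability for X ~ N(mu, sigma^2) using the Q-function
Qf = @(x) 0.5*erfc(x/sqrt(2));
mu = 5; sigma = 1; a = 4; b = 7;
Qf((a-mu)/sigma) - Qf((b-mu)/sigma)   % P(4 < X <= 7) approx 0.8186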

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
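By symmetry, P[X < µ] = 1/2 and P[|X − µ| > kσ] = 2Q(k), so the answers above can be reproduced with the same erfc-based sketch as before:

% P[|X - mu| > k*sigma] = 2*Q(k), independent of mu and sigma
Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)        % matches the answers above up to rounding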

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf fX(x) with c = 2, −3 ≤ x ≤ 3]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) =
  [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
  0,                                                  y < 0

where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) =
  [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
  0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions fX(x) with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) =
  1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
  0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of 2 independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) =
  (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
  0,                     r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf fR(r), σ² = 1, 0 ≤ r ≤ 5]

• The CDF for a Rayleigh RV is

FR(r) =
  1 − e^(−r²/(2σ²)),  r ≥ 0
  0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
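The square-root construction above is easy to verify numerically; a sketch with σ² = 1:

% Rayleigh samples from two independent zero-mean Gaussians, sigma^2 = 1
N = 1e6;
R = sqrt(randn(1,N).^2 + randn(1,N).^2);   % sqrt of a chi-square with 2 dof
mean(R > 1)                                % approx exp(-1/2) = 0.6065, from the CDF above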

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications, we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) =
  [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
  0,                                       ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A)/P(A).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A).

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
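After you work these out, you can check them by simulation; the exponential RV is memoryless, which the hedged sketch below illustrates for λ = 1:

% Memorylessness check for the waiting-time example, lambda = 1
X = -log(rand(1,1e6));   % exponential(1) samples, as in the Lecture 7 exercise
rem = X(X > 2) - 2;      % remaining wait, given you already waited 2 minutes
mean(rem)                % approx 1, the unconditioned mean wait
mean(rem > 1)            % approx exp(-1) = 0.368: remaining wait is again exponential(1)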

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] } P(A)

           = lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board

Page 37: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-2

STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring

bull Letrsquos evaluate that using conditional probability

bull If P (B) gt 0 and A and B are si then

(When A and B are si knowledge of B does not change the prob of A (and vice versa))

Relating Conditional and Unconditional Probs

Which of the following statements are true

(a) P (A|B) ge P (A)

(b) P (A|B) le P (A)

(c) Not necessarily (a) or (b)

L4-3

Chain rule for expanding intersections

Note that P (A|B) =P (A capB)

P (B)

rArr P (A capB) = P (A|B)P (B) (7)

and P (B|A) =P (A capB)

P (A)

rArr P (A capB) = P (B|A)P (A) (8)

bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents

bull The chain rule can be easily generalized to more than two events

Ex Intersection of 3 events

P (A capB cap C) =P (A capB cap C)

P (B cap C)middot P (B cap C)

P (C)middot P (C)

= P (A|B cap C)P (B|C)P (C)

Partitions and Total Probability

DEFN

A collection of sets A1 A2 partitions the sample space Ω if and only if

Ai is also said to be a partition of Ω

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If Ai partitions Ω then

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 A1

bull A0 is sent with prob P (A0) = 25

bull A1 is sent with prob P (A1) = 35

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1

and B0 B1

ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram

56A

B

B

0

1

0A

1

78

16

18

bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.3
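One way to check answers like these is a quick Monte Carlo simulation in MATLAB (a sketch):

% Sketch: Monte Carlo estimate of the poker-dice probabilities
N = 1e5; counts = zeros(1,7);
for trial = 1:N
    d = ceil(6*rand(1,5));             % roll five dice
    m = sort(histc(d,1:6),'descend');  % face multiplicities, largest first
    if m(1) == 1
        k = 1;                         % no two alike
    elseif m(1) == 2 && m(2) == 1
        k = 2;                         % one pair
    elseif m(1) == 2 && m(2) == 2
        k = 3;                         % two pair
    elseif m(1) == 3 && m(2) == 1
        k = 4;                         % three alike
    elseif m(1) == 3 && m(2) == 2
        k = 5;                         % full house
    elseif m(1) == 4
        k = 6;                         % four alike
    else
        k = 7;                         % five alike
    end
    counts(k) = counts(k) + 1;
end
counts/N                               % compare to the answers in the footnote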

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 · n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = ... = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, nn = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)  (Stirling's formula).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
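For example (a sketch comparing the two formulas for n = 10):

% Sketch: Stirling's approximation vs. the exact factorial
n = 10;
exact = gamma(n+1)                        % 3628800
stirling = sqrt(2*pi)*n^(n+0.5)*exp(-n)   % about 3598696, within roughly 0.8%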

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let Cnk = the number of combinations of size k from a set of n objects:

Cnk = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/((3!)^24 (24!)) ≈ 2.1 × 10^61 groups

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring?

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is C(k + n − 1, k).

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

4

EX (By Request): The Game Show/Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch

4. 0.0794

L5-2

2. Always switch

3. Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω

3. P(·) on F such that

P(A) =

For s.i. (statistically independent) subexperiments,

P(A) = P(B1)P(B2) · · · P(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n,k) = n!/(k!(n − k)!).

• Thus, the probability of k successes on n trials is

pn(k) = C(n,k) p^k (1 − p)^(n−k)   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
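A minimal MATLAB check of this answer (a sketch):

% Sketch: P(3 or more bit errors in 100 bits) with p = 0.01
n = 100; p = 0.01; P2orFewer = 0;
for k = 0:2
    P2orFewer = P2orFewer + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
1 - P2orFewer                % about 0.0794, matching the footnote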

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n we can approximate the binomial probabilities using either

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (12)

• Recall: b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
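A minimal MATLAB check (a sketch; the success probability per transmission is 1 − 0.1 = 0.9):

% Sketch: geometric probabilities for the ARQ example
p = 0.9;
P_exactly_2 = (1-p)^(2-1) * p    % 0.09
P_more_than_2 = (1-p)^2          % 0.01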

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day.
Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 calls/minute is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability C(n,k) p^k (1 − p)^(n−k).

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.
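A short MATLAB sketch illustrating the convergence (hypothetical values α = 0.5, k = 2):

% Sketch: binomial(n, alpha/n) probabilities approach the Poisson law
alpha = 0.5; k = 2;
poissonP = alpha^k/factorial(k)*exp(-alpha)   % limiting value, about 0.0758
for n = [10 100 1000]
    p = alpha/n;
    nchoosek(n,k) * p^k * (1-p)^(n-k)         % approaches poissonP
end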

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, ...
         = 0,  o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
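A minimal MATLAB check (a sketch):

% Sketch: Poisson probability for the switching-office example
lambda = 90/60;                    % calls per minute
t = 10; alpha = lambda*t;          % alpha = 15 calls expected
k = 20;
alpha^k/factorial(k)*exp(-alpha)   % about 0.0418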

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0,  z ≤ 0
      = z²/(2b²),  0 < z ≤ b
      = (b² − (2b − z)²/2)/b²,  b < z < 2b
      = 1,  z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
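A quick MATLAB sanity check of the answer in part 3 (a sketch with the hypothetical choice b = 1):

% Sketch: Monte Carlo check of F_Z(z) for the dart example
b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);   % sum of the two dart coordinates
z = 1.5;                         % a test point in (b, 2b)
mean(Z <= z)                     % empirical F_Z(z)
(b^2 - (2*b - z)^2/2)/b^2        % formula above: 0.875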

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a mapping from Ω to R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P[X ≤ x] = P({ξ : X(ξ) ≤ x})

• FX(x) is a prob measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

L6-4

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
              = lim_{ε→0} P[x − ε < X ≤ x]
              = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, plot the histogram (hist(w,100)) and compare it to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0,  −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx,  a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c,  −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) is defined at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0,  x < a
      = 1/(b − a),  a ≤ x ≤ b
      = 0,  x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, ..., 6
         = 0,  o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, ...
         = 0,  o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
  = 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,  x = 1
         = 1 − p,  x = 0
         = 0,  x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
         = 0,  o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p,  k = 1, 2, ...
         = 0,  o.w.

• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, ...
         = 0,  o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = 0,  x < 0
      = λe^(−λx),  x ≥ 0

• Distribution function:

FX(x) = 0,  x < 0
      = 1 − e^(−λx),  x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and cdfs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ,  x ≥ 0.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b)
            = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5. (a) 1/2 (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively
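The probabilities in (b) are P[|X − µ| > kσ] = 2Q(k), which a one-line MATLAB sketch can verify against the footnote:

% Sketch: P(|X - mu| > k*sigma) = 2*Q(k) for k = 1..5
Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)        % compare to the footnote values above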

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = (1/(σⁿ 2^(n/2) Γ(n/2))) y^(n/2 − 1) e^(−y/(2σ²)),  y ≥ 0
      = 0,  y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
      = 0,  y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (here with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
      = 0,  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
      = 0,  r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
      = 0,  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled via a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
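This is easy to see numerically; a short MATLAB sketch in the style of the Lecture 7 exercise:

% Sketch: Rayleigh RV as the magnitude of a complex Gaussian
N = 1e5; sigma = 1;
X = sigma*randn(1,N); Y = sigma*randn(1,N);   % s.i. zero-mean Gaussians
R = sqrt(X.^2 + Y.^2);                        % magnitude of X + jY
hist(R,100)   % compare shape to f_R(r) = (r/sigma^2) exp(-r^2/(2 sigma^2))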

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = (1/(ℓ√(2πσ²))) exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
      = 0,  ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A)

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
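A Monte Carlo sketch (reusing the exponential generator from Lecture 7) hints at the answer to part 2, the memoryless property:

% Sketch: P(X > 2 + x | X > 2) = P(X > x) for an exponential(1) wait
N = 1e6; X = -log(rand(1,N));   % exponential(1) samples
x = 1.5;
mean(X > 2 + x)/mean(X > 2)     % conditional probability
exp(-x)                         % P(X > x): the two should agree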

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ... + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim_{∆x→0} P(A | x < X ≤ x + ∆x)
           = lim_{∆x→0} {[FX(x + ∆x|A) − FX(x|A)] / [FX(x + ∆x) − FX(x)]} P(A)
           = lim_{∆x→0} {[FX(x + ∆x|A) − FX(x|A)]/∆x} / {[FX(x + ∆x) − FX(x)]/∆x} · P(A)
           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board


L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 39: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-4

1 Venn Diagram

2 Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {Ai} partitions Ω, then for any event B,

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + ··· + P(B|An)P(An)

Example

L4-5

EX Binary Communication System

A transmitter (Tx) sends one of two symbols, A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]

• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1

• Let's start by considering 2 possible decision rules:

1 Always decide A1

2 When receive B0, decide A0; when receive B1, decide A1

Problem Determine the probability of error for these two decision rules
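The two rules can be checked numerically. A minimal MATLAB sketch (added here for illustration, not part of the original notes), using the priors and transition probabilities given above:

% Error probabilities of the two candidate decision rules
PA = [2/5 3/5];                  % P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];       % PBgA(i,j) = P(B_{j-1} | A_{i-1})
Pe1 = PA(1)                      % Rule 1 (always decide A1): error iff A0 sent -> 0.4
Pe2 = PBgA(1,2)*PA(1) + PBgA(2,1)*PA(2)   % Rule 2: error on the crossovers -> 0.15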

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0) and
q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0) and
q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0) and (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1) (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1

⇒ Our decision rule is not randomized

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0) and setting q0 = 0 otherwise

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1) and setting q1 = 0 otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured

• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

• Problem: we don't know P(Ai|Bj)

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai)

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
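A short MATLAB sketch of this computation (an added illustration under the probabilities stated above, not part of the original handout):

% A posteriori probabilities and the MAP rule for the binary channel
PA = [2/5 3/5];                         % a priori probabilities
PBgA = [7/8 1/8; 1/6 5/6];              % PBgA(i,j) = P(B_{j-1} | A_{i-1})
PB = PA * PBgA;                         % Law of Total Probability: P(B0), P(B1)
joint = PBgA .* repmat(PA', 1, 2);      % joint(i,j) = P(A_{i-1}) P(B_{j-1} | A_{i-1})
PAgB = joint ./ repmat(PB, 2, 1);       % Bayes: columns are [P(A0|Bj); P(A1|Bj)]
[pmax, mapRule] = max(PAgB)             % expect: decide A0 for B0 (7/9), A1 for B1 (10/11)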

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1)

• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
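A numerical sketch of the Bayes updates (added for illustration; sequential updating is one way to organize the computation):

% Two-coin gambler: posterior probability that the chosen coin is fair
pFair = 1/2;                                          % prior
pFair = pFair*(1/2) / (pFair*(1/2) + (1-pFair)*1)     % (a) after one head: 1/3
pFair = pFair*(1/2) / (pFair*(1/2) + (1-pFair)*1)     % (b) after a second head: 1/5
% (c) A tail is impossible for the two-headed coin, so the posterior jumps to 1.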

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk

SPECIAL CASES

1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.

(Answers to the poker dice problem: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008)

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A for all i = 1, 2, ..., k

Thus n1 = n2 = ··· = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k

2 Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A for all i = 1, 2, ..., k

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1) ··· (n − k + 1) = n!/(n − k)!

3 Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) ··· (2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1)
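For example (a quick sketch; the value n = 52 is an arbitrary choice):

n = 52;
exact = gamma(n+1);                            % 52! computed as gamma(n+1)
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);   % Stirling's approximation
(exact - stirling)/exact                       % relative error is only about 0.0016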

L5a-3

4 Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)

DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! ··· kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groups

L5a-4

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob of this occurring? (See the sketch below.)
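A hedged sketch of the computation (the answers are not given in the original notes; the count follows from the multinomial coefficient above):

% P(each face exactly twice in 12 rolls of a fair die)
ways = factorial(12)/factorial(2)^6    % 12!/(2!)^6 = 7,484,400 arrangements
p = ways / 6^12                        % approximately 0.0034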

5 Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is C(k + n − 1, k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request) The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1 Never switch

(Answer to the packet-decoding question above: ≈ 0.0794)

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads
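The three strategies are easy to compare by simulation. A MATLAB sketch (an added illustration; it assumes the prize door, your initial pick, and the coin flip are all uniformly random):

% Monte Carlo comparison of the three Monty Hall strategies
N = 100000; wins = zeros(1,3);
for trial = 1:N
    prize = ceil(3*rand); pick = ceil(3*rand);
    doors = 1:3;
    goatDoors = doors(doors ~= pick & doors ~= prize);  % doors the host may open
    opened = goatDoors(ceil(numel(goatDoors)*rand));
    switchPick = doors(doors ~= pick & doors ~= opened);
    wins(1) = wins(1) + (pick == prize);                % 1: never switch
    wins(2) = wins(2) + (switchPick == prize);          % 2: always switch
    if rand < 0.5, final = switchPick; else final = pick; end
    wins(3) = wins(3) + (final == prize);               % 3: coin flip
end
wins/N    % approximately [1/3  2/3  1/2]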

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1 The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi

2 Event class F: a σ-algebra on Ω

3 P(·) on F, consistent with the probability laws of the subexperiments

For s.i. (statistically independent) subexperiments,

P(A) = P1(B1) P2(B2) ··· PN(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes of success or failure.

L5-3

• Let p denote the probability of success

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}

• The number of ways that k successes can occur in n places is C(n, k)

• Thus the probability of k successes on n trials is

pn(k) = C(n, k) p^k (1 − p)^{n−k} (11)

Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
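The footnote answer can be reproduced directly from (11). A MATLAB sketch (added for illustration):

% P(packet does not decode) = P(3 or more bit errors out of 100)
n = 100; p = 0.01; Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
Pfail = 1 - Pok    % approximately 0.0794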

L5-4

• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms get very small.

For large n we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k} (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise

– Let Sn = Σ_{i=1}^{n} Xi

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2] (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
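A quick comparison of the exact sum and this approximation (a sketch with arbitrarily chosen n, p, α, and β; gammaln is used to evaluate the binomial coefficients stably):

% Exact binomial P[alpha <= Sn <= beta] vs. the Q-function approximation
n = 100; p = 0.2; q = 1-p; a = 15; b = 25;
exact = 0;
for k = a:b
    exact = exact + exp(gammaln(n+1)-gammaln(k+1)-gammaln(n-k+1) ...
                        + k*log(p) + (n-k)*log(q));
end
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) in base MATLAB
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                   % the two values agree closely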

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^{m−1} p, m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
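With success probability 0.9 per transmission, the geometric law above gives the answers directly (a sketch added here; p below is the success probability):

p = 0.9;                 % probability a transmission succeeds
pExactly2 = (1-p)^1 * p  % exactly 2 transmissions: 0.09
pMoreThan2 = (1-p)^2     % more than 2 transmissions: 0.01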

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant per minute. The maximum possible number of calls/hour is then 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability C(n, k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (α^k/k!)(1 − α/n)^{n−k}

• Then

lim_{n→∞} (α^k/k!)(1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the average # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt

• Let N = the # of events in time (or space) t

• Then

P(N = k) =
(α^k/k!) e^{−α}, k = 0, 1, ...
0, o.w.

EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
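A sketch of the computation using the Poisson law (α = λt with λ = 90 calls/hr and t = 1/6 hr):

alpha = 90 * (1/6); k = 20;
P20 = alpha^k / factorial(k) * exp(-alpha)   % approximately 0.042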

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3 Find and plot FZ(z).

Ans:

FZ(z) =
0, z ≤ 0
z²/(2b²), 0 < z ≤ b
[b² − (2b − z)²/2]/b², b < z < 2b
1, z ≥ 2b

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a mapping from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, a].

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is

FX(x) = P(X ≤ x)

bull FX(x) is a prob measure

• Properties of FX:

1 0 ≤ FX(x) ≤ 1

L6-4

2 FX(−∞) = 0 and FX(∞) = 1

3 FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) iff a ≤ b

4 P(a < X ≤ b) = FX(b) − FX(a)

5 FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z.

2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3 Find and plot FZ(z).

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
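The answer from the earlier Motivation version of this example can be checked by simulation. A MATLAB sketch (an added illustration; b = 1 is an arbitrary choice):

% Empirical vs. analytical CDF of Z for the dart example
b = 1; N = 100000;
z = b*rand(1,N) + b*rand(1,N);                 % Z = sum of the two coordinates
zg = linspace(0, 2*b, 101);
empF = arrayfun(@(t) mean(z <= t), zg);        % empirical CDF
thF = (zg.^2/(2*b^2)).*(zg <= b) + (1 - (2*b - zg).^2/(2*b^2)).*(zg > b);
plot(zg, empF, zg, thF, '--')                  % the two curves should coincide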

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of v by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).

bull Properties

1 fX(x) ≥ 0, −∞ < x < ∞

2 FX(x) = ∫_{−∞}^{x} fX(t) dt

3 ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4 P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5 If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then fX(x) = g(x)/c is a valid pdf.

Pf omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) =
0, x < a
1/(b − a), a ≤ x ≤ b
0, x > b

L7-4

DEFN

A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is PX(x) = P(X = x).

EX Roll a fair 6-sided die. X = # on top face.

P(X = x) =
1/6, x = 1, 2, ..., 6
0, o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) =
(1/2)^x, x = 1, 2, ...
0, o.w.

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event A ∈ A is considered a "success"

• A Bernoulli RV X is defined by

X =
1, s ∈ A
0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) =
p, x = 1
1 − p, x = 0
0, x ≠ 0, 1

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs

• Let X = # of successes. Then the pmf of X is given by

P[X = k] =
C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n
0, o.w.

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; FX(x) vs. x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
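Plots like the ones above are easy to regenerate. A base-MATLAB sketch (added here; nchoosek is used so no toolbox is needed):

% Binomial(10, 0.5) pmf
n = 10; p = 0.5; k = 0:n; pmf = zeros(size(k));
for i = 1:length(k)
    pmf(i) = nchoosek(n, k(i)) * p^k(i) * (1-p)^(n-k(i));
end
stem(k, pmf); xlabel('x'); ylabel('P(X = x)'); title('Binomial(10, 0.5) pmf');
% cumsum(pmf) gives the CDF values at the jump points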

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success

• X = number of trials required. Then the pmf of X is given by

P[X = k] =
(1 − p)^{k−1} p, k = 1, 2, ...
0, o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) vs. x]

L7-8

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; FX(x) vs. x]

4 Poisson RV

• Models events that occur randomly in space or time

• Let λ = the average # of events/(unit of space or time)

• Consider observing some period of time or space of length t, and let α = λt

• Let N = the # of events in time (or space) t

• Then

P(N = k) =
(α^k/k!) e^{−α}, k = 0, 1, ...
0, o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) vs. x]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; FX(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1 Uniform RV — covered in example above

2 Exponential RV

bull Characterized by single parameter λ

L7-10

• Density function:

fX(x) =
0, x < 0
λe^{−λx}, x ≥ 0

• Distribution function:

FX(x) =
0, x < 0
1 − e^{−λx}, x ≥ 0

• The book's definition substitutes λ = 1/µ

bull Obtainable as limit of geometric RV

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below

[Figure: Exponential pdfs and Exponential CDFs for λ = 0.25, 1, 4; fX(x) and FX(x) vs. x]

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

• DeMoivre-Laplace Theorem: If n is large (and p is not too close to 0 or 1), then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]

L7-11

DEFN
A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25):

[Figure: Gaussian pdfs and Gaussian CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt

L7-12

– Note that Q(x) = 1 − Φ(x)

– I will be supplying you with a Q-function table and a list of approximations to Q(x)

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ, x ≥ 0

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
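These answers follow from the symmetry of the Gaussian pdf and the identity P[|X − µ| > kσ] = 2Q(k). A sketch in base MATLAB (Q written via erfc, since a dedicated Q-function may not be available):

Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q-function
2*Qf(1:5)                        % 0.3173, 0.0455, 0.0027, 6.33e-05, 5.73e-07
% and P[X < mu] = 1/2 by symmetry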

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0

bull The pdf for a Laplacian RV with c = 2 is shown below

[Figure: Laplacian pdf with c = 2; fX(x) vs. x]

5 Chi-square RV

bull Let X be a Gaussian random variable

• Then Y = X² is a chi-square random variable

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi² (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable

• The density of a central chi-square random variable is given by

fY(y) =
[1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
0, y < 0

where Γ(p) is the gamma function, defined as:

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) =
[1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
0, y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; fX(x) vs. x]

• The CDF for a central chi-square RV (here for an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) =
1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
0, y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance

• Let R = √Y

• Then R is a Rayleigh RV with pdf

fR(r) =
(r/σ²) e^{−r²/(2σ²)}, r ≥ 0
0, r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below

[Figure: Rayleigh pdf (σ² = 1); fR(r) vs. r]

• The CDF for a Rayleigh RV is

FR(r) =
1 − e^{−r²/(2σ²)}, r ≥ 0
0, r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
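This construction also gives a quick way to generate and check Rayleigh samples. A MATLAB sketch (an added illustration with σ = 1):

% Rayleigh samples from two independent zero-mean Gaussians
sigma = 1; N = 100000;
R = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);
[cnt, ctr] = hist(R, 50); dw = ctr(2)-ctr(1);
bar(ctr, cnt/(N*dw)); hold on                      % normalized histogram
r = linspace(0, 5, 200);
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)), 'r'); hold off   % Rayleigh pdf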

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels

• Then L is a lognormal random variable with pdf given by

fL(ℓ) =
[1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0
0, ℓ < 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P({X ≤ x} ∩ A)/P(A)

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = (d/dx) FX(x|A)

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time
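Both parts turn on the memorylessness of the exponential distribution. A numerical sketch (added here), reusing the inverse-transform trick from the Lecture 7 MATLAB exercise:

lambda = 1;
w = -log(rand(1,100000))/lambda;     % exponential(1) wait times
wrem = w(w > 2) - 2;                 % remaining wait, given you already waited 2 min
mean(wrem)                           % approximately 1/lambda = 1 (memoryless)
% i.e., F(w | W > 2) = 1 - exp(-lambda*(w - 2)) for w > 2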

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai) P(Ai)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

= lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

= [fX(x|A)/fX(x)] P(A)

if fX(x|A) and fX(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt

Examples on board

Page 40: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-5

EX Binary Communication System

A transmitter (Tx) sends A0 A1

bull A0 is sent with prob P (A0) = 25

bull A1 is sent with prob P (A1) = 35

bull A receiver (Rx) processes the output of the channel into one of two values B0 B1

bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1

and B0 B1

ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram

56A

B

B

0

1

0A

1

78

16

18

bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1

bull Letrsquos start by considering 2 possible decision rules

1 Always decide A1

2 When receive B0 decide A0 when receive B1 decide A1

Problem Determine the probability of error for these two decision rules

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = · · · = nk = |A| ≡ n.

⇒ The number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1).

⇒ The number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_(n−1) = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1) · · · (2)(1)
= n!

Useful formula (Stirling): For large n, n! ≈ √(2π) n^(n+1/2) e^(−n).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
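
For example, comparing the two for n = 10 (a quick check, nothing more):

n = 10;
exact    = gamma(n+1)                      % n! = 3628800
stirling = sqrt(2*pi)*n^(n+0.5)*exp(-n)    % Stirling's approximation
relError = (exact - stirling)/exact        % under 1% already at n = 10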


4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First, determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!).

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups
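
Note that 72! overflows double precision, so in MATLAB it is safer to work on a log scale with gammaln (a sketch of the idea):

% log of 72!/(3!)^24 and of 72!/((3!)^24 (24!)), using gammaln(n+1) = log(n!)
logOrdered   = gammaln(73) - 24*gammaln(4);
logUnordered = logOrdered - gammaln(25);
[logOrdered logUnordered]/log(10)      % ≈ [85.1 61.3], i.e., ~1.3e85 and ~2.1e61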


• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?
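
A quick numerical check of the counting (a multinomial coefficient over 6^12 equally likely outcomes):

nFavorable = factorial(12)/factorial(2)^6;   % 12!/(2!)^6 orderings
p = nFavorable/6^12                          % ≈ 3.4e-3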

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, without regard to order?

• The total number of arrangements is C(k + n − 1, k).

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

• Suppose you're on a game show, and you're given the choice of three doors:

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch

⁴0.0794


2. Always switch

3. Flip a fair coin and switch if it comes up heads
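
A simulation sketch of the three strategies (variable names are arbitrary; the host's particular choice of goat door does not enter the win/lose bookkeeping below):

N = 1e5;  wins = zeros(1,3);
for i = 1:N
    car  = ceil(3*rand);       % door hiding the car
    pick = ceil(3*rand);       % contestant's initial pick
    % host opens a goat door, so switching wins exactly when the pick was a goat
    wins(1) = wins(1) + (pick == car);      % strategy 1: never switch
    wins(2) = wins(2) + (pick ~= car);      % strategy 2: always switch
    if rand < 0.5                           % strategy 3: switch on heads
        wins(3) = wins(3) + (pick ~= car);
    else
        wins(3) = wins(3) + (pick == car);
    end
end
wins/N      % ≈ [1/3  2/3  1/2]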

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B1)P(B2|B1) · · · P(BN|B1, B2, ..., BN−1).

For s.i. (statistically independent) subexperiments,

P(A) = P(B1)P(B2) · · · P(BN).

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.


• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n,k).

• Thus the probability of k successes on n trials is

p_n(k) = C(n,k) p^k (1 − p)^(n−k).    (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
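
One way to compute this in MATLAB without any toolboxes (summing the binomial pmf directly):

n = 100;  p = 0.01;
Pok = 0;                                   % P(0, 1, or 2 bit errors)
for k = 0:2
    Pok = Pok + nchoosek(n,k)*p^k*(1-p)^(n-k);
end
Pfail = 1 - Pok                            % ≈ 0.0794 (the footnote answer)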


• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n,k) p^k (1 − p)^(n−k).    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_(i=1)^n Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_(−∞)^x exp[−t²/2] dt.    (14)

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_x^∞ exp[−t²/2] dt.    (16)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam. (A MATLAB sketch using the standard erfc follows.)
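
MATLAB has no built-in Q(x), but the standard erfc gives Q(x) = erfc(x/√2)/2. A sketch comparing the exact binomial sum with the Gaussian approximation (the numbers n, p, α, β here are just an example):

Qf = @(x) 0.5*erfc(x/sqrt(2));             % standard Q-function
n = 100;  p = 0.1;  q = 1-p;  s = sqrt(n*p*q);
alpha = 8;  beta = 12;
exact = 0;
for k = alpha:beta
    exact = exact + nchoosek(n,k)*p^k*q^(n-k);
end
approx = Qf((alpha - n*p - 0.5)/s) - Qf((beta - n*p + 0.5)/s);
[exact approx]                             % the two should be close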

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success.
What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
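
A two-line check of the geometric law here, with success probability p = 1 − 0.1 = 0.9 per transmission:

p = 0.9;
P_exactly_2   = (1-p)^(2-1)*p     % (1-p)^(m-1) p with m = 2  -> 0.09
P_more_than_2 = (1-p)^2           % first two transmissions fail -> 0.01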

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day.
Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.


• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^(n−k).

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k).

• Then

lim_(n→∞) (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.


• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, ...;  0 o.w.

EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
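
A sketch of the computation for the example above (λ = 90/60 calls per minute, t = 10 minutes, so α = λt = 15):

alpha = (90/60)*10;                        % = 15
k = 20;
P = alpha^k/factorial(k)*exp(-alpha)       % ≈ 0.042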


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0, z ≤ 0;
        z²/(2b²), 0 < z ≤ b;
        [b² − (2b − z)²/2]/b², b < z < 2b;
        1, z ≥ 2b.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
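
A simulation sketch of this example (b = 1 and z = 1.5 are arbitrary test values):

b = 1;  N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);            % sum of the two coordinates
z = 1.5;
empirical = mean(Z <= z)                  % estimate of F_Z(z)
formula   = (b^2 - (2*b - z)^2/2)/b^2     % branch for b < z < 2b -> 0.875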

RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to R (the real line).


EX


• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is

FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}).

• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1


2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., FX(b) = lim_(h→0⁺) FX(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_(ε→0) P[x − ε < X ≤ x]
               = P[X = x].


EX: Examples


EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)


EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?


Now let's do one more transformation:
w=sqrt(v);
hist(w,100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_(−∞)^x fX(t) dt

3. ∫_(−∞)^(+∞) fX(x) dx = 1


4. P(a < X ≤ b) = ∫_a^b fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise continuous function with finite integral

∫_(−∞)^(+∞) g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.

Pf. omitted

• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point x when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 0, x < a;
        1/(b − a), a ≤ x ≤ b;
        0, x > b.


DEFN

A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, ..., 6;  0 o.w.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, ...;  0 o.w.


IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1, s ∈ A;  0, s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p, x = 1;  1 − p, x = 0;  0, x ≠ 0, 1.

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n,k) p^k (1 − p)^(n−k), k = 0, 1, ..., n;  0 o.w.
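
The pmf is easy to compute and plot directly in MATLAB (a sketch; no Statistics Toolbox needed):

n = 10;  p = 0.2;  k = 0:n;
pmf = zeros(size(k));
for i = 1:length(k)
    kk = k(i);
    pmf(i) = nchoosek(n,kk)*p^kk*(1-p)^(n-kk);   % binomial pmf at kk
end
stem(k, pmf), xlabel('x'), ylabel('P(X=x)')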


• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) versus x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; FX(x) versus x]


• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) versus x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^(k−1) p, k = 1, 2, ...;  0 o.w.

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) versus x]


• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; FX(x) versus x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^(−α), k = 0, 1, ...;  0 o.w.

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) versus x]


• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; FX(x) versus x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) versus x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in example above.

2. Exponential RV

• Characterized by a single parameter λ.


• Density function:

fX(x) = 0, x < 0;  λ e^(−λx), x ≥ 0.

• Distribution function:

FX(x) = 0, x < 0;  1 − e^(−λx), x ≥ 0.

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; fX(x) and FX(x) versus x]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,

C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}.

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}.


DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)},

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_(−∞)^x (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 2.5) are shown below.

[Figure: Gaussian pdfs and CDFs for the three (µ, σ²) pairs; fX(x) and FX(x) versus x]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_(−∞)^x (1/√(2π)) exp{−t²/2} dt.

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_x^∞ (1/√(2π)) exp{−t²/2} dt.


– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_0^(π/2) exp{−x²/(2 sin²φ)} dφ.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ).
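
For example (arbitrary numbers, with Q built from the standard erfc):

Qf = @(x) 0.5*erfc(x/sqrt(2));
mu = 5;  sigma = 2;  a = 4;  b = 9;
P = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)    % Q(-0.5) - Q(2) ≈ 0.669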


EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5⁵

Examples with Gaussian Random Variables

EX: The Motivation problem

⁵(a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
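
These tail probabilities are a one-liner once Q(x) is expressed through the standard erfc, since P[|X − µ| > kσ] = 2Q(k):

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)          % ≈ 0.317, 4.6e-2, 2.7e-3, 6.3e-5, 5.7e-7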


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; fX(x) versus x]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.


• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

Y = Σ_(i=1)^n Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0;  0, y < 0,

where Γ(p) is the gamma function, defined as
Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;
Γ(3/2) = (1/2)√π.

• Note that for n = 2,

fY(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0;  0, y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.


• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; fX(x) versus x]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) = 1 − e^(−y/(2σ²)) Σ_(k=0)^(m−1) (1/k!) (y/(2σ²))^k, y ≥ 0;  0, y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
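
A simulation sketch checking the n = 2 case (sum of squares of two s.i. N(0, σ²) RVs) against the CDF above:

n = 2;  sigma2 = 1;  N = 1e6;  y = 2;
Y = sum((sqrt(sigma2)*randn(n,N)).^2, 1);
empirical = mean(Y <= y)                  % estimate of F_Y(y)
formula   = 1 - exp(-y/(2*sigma2))        % m = 1, so only the k = 0 term: ≈ 0.632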

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0;  0, r < 0.


• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; fR(r) versus r]

• The CDF for a Rayleigh RV is

FR(r) = 1 − e^(−r²/(2σ²)), r ≥ 0;  0, r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
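
A sketch of that model: take a complex Gaussian sample with s.i. real and imaginary parts of variance σ² each, and compare its magnitude with the Rayleigh CDF (r = 1 is an arbitrary test point):

sigma2 = 1;  N = 1e6;  r = 1;
G = sqrt(sigma2)*(randn(1,N) + 1i*randn(1,N));
R = abs(G);                                % magnitude should be Rayleigh
[mean(R <= r)  1 - exp(-r^2/(2*sigma2))]   % empirical vs. F_R(r) ≈ 0.393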

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0;  0, ℓ < 0.


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = dFX(x|A)/dx.

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
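
A quick numerical look at this example (the exponential RV is memoryless, so the remaining wait should again look exponential with λ = 1):

N = 1e6;
X = -log(rand(1,N));              % exponential, lambda = 1 (as in the earlier MATLAB exercise)
rem = X(X > 2) - 2;               % remaining wait, given you have waited 2 minutes
[mean(rem)  mean(X)]              % both ≈ 1, consistent with memorylessness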

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_(i=1)^n fX(x|Ai)P(Ai).

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:

P(A|X = x) = lim_(∆x→0) P(A | x < X ≤ x + ∆x)

           = lim_(∆x→0) [FX(x + ∆x|A) − FX(x|A)] / [FX(x + ∆x) − FX(x)] · P(A)

           = lim_(∆x→0) {[FX(x + ∆x|A) − FX(x|A)]/∆x} / {[FX(x + ∆x) − FX(x)]/∆x} · P(A)

           = [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.


Implication of Point Conditioning

• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_(−∞)^∞ P(A|X = x) fX(x) dx = ∫_(−∞)^∞ fX(x|A) dx · P(A)

⇒ P(A) = ∫_(−∞)^∞ P(A|X = x) fX(x) dx.

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x)

        = P(A|X = x) fX(x) / ∫_(−∞)^∞ P(A|X = t) fX(t) dt.
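
A small numerical sketch of both results (a made-up example: X uniform on [0,1], and given X = x the event A occurs with probability x, so P(A) = 1/2 and fX(x|A) = 2x):

N = 1e6;
X = rand(1,N);
A = rand(1,N) < X;            % P(A | X = x) = x
mean(A)                       % ≈ 1/2, the continuous Law of Total Probability
hist(X(A), 50)                % histogram grows linearly in x, matching f_X(x|A) = 2x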

Examples on board

Page 41: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-6

L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received

bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted

P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)

bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η

bull Since we only apply hypothesis Hij when Bj is received we can write

P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)

+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)

bull We apply the chain rule to write this as

P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)

+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)

bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k

bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1

bull Then

P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)

(1minus q1)P (A1 capB1) + q0P (A1 capB0)

bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0

and q1 that minimize

(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)

bull We can further expand these using the chain rule as

(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x)]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x vs. F_X(x)]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x)]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in an earlier example.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = 0, x < 0; λ e^(−λx), x ≥ 0.

• Distribution function:

F_X(x) = 0, x < 0; 1 − e^(−λx), x ≥ 0.

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)² / (2npq)}

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − µ)² / (2σ²)}
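A numerical sanity check of the approximation (a minimal sketch in base MATLAB; the variable names are mine):

  % DeMoivre-Laplace: exact binomial pmf vs. the Gaussian-shaped approximation
  n = 100; p = 0.2; q = 1 - p; k = 10:30;     % points near the peak at np = 20
  exact  = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* q.^(n-k);
  approx = exp(-(k - n*p).^2 / (2*n*p*q)) / sqrt(2*pi*n*p*q);
  max(abs(exact - approx) ./ exact)           % small relative error near the peak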

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − µ)² / (2σ²)}

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)² / (2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_0^{π/2} exp{−x² / (2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
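MATLAB has no built-in Q-function, but one can be built from the standard complementary error function (not the book's erf) via the identity Q(x) = (1/2)erfc(x/√2). A minimal sketch:

  % Q-function from MATLAB's built-in erfc
  Q = @(x) 0.5 * erfc(x / sqrt(2));
  Q(0)     % returns 0.5
  Q(1)     % returns about 0.1587
  % cross-check against the finite-range form above:
  x = 1.5;
  quad(@(phi) exp(-x^2 ./ (2*sin(phi).^2)) / pi, 0, pi/2)   % = Q(1.5), about 0.0668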

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5.

Examples with Gaussian Random Variables

EX The Motivation problem
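A sketch of how the answers follow from the Q-function: by symmetry P[X < µ] = 1/2, and P[|X − µ| > kσ] = P[X > µ + kσ] + P[X < µ − kσ] = 2Q(k), independent of µ and σ. This is easy to check numerically:

  % answers to the Motivation problem (independent of mu and sigma)
  Q = @(x) 0.5 * erfc(x / sqrt(2));
  Q(0)          % P[X < mu] = 0.5
  2 * Q(1:5)    % compare with the answers in the footnote below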

5: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xᵢ²   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0; 0, y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) = (1/(2σ²)) e^(−y/(2σ²)), y ≥ 0; 0, y < 0,

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]

• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found through repeated integration by parts to be

F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0; 0, y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0; 0, r < 0.

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]

• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^(−r²/(2σ²)), r ≥ 0; 0, r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
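This construction is easy to verify by simulation (a sketch with σ = 1; randn generates zero-mean, unit-variance Gaussians):

  % Rayleigh samples from two independent Gaussians
  N = 1e5;
  R = sqrt(randn(1, N).^2 + randn(1, N).^2);
  hist(R, 100)    % shape should match the Rayleigh pdf shown above
  mean(R)         % theory: sigma*sqrt(pi/2), about 1.2533 for sigma = 1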

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)² / (2σ²)}, ℓ ≥ 0; 0, ℓ < 0.

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[X ≤ x | A] = P[{X ≤ x} ∩ A] / P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
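(A sketch of the solution, for checking against the board work: condition on A = {X > 2}, so P(A) = e⁻². For the total wait time, F_X(x | X > 2) = P(2 < X ≤ x)/P(A) = 1 − e^(−(x−2)) for x > 2, and 0 otherwise. The remaining wait time W = X − 2 then has F_W(w) = 1 − e^(−w) for w ≥ 0, i.e., it is again exponential with λ = 1: the exponential RV is memoryless.)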

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ⋯ + F_X(x|Aₙ)P(Aₙ)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|Aᵢ)P(Aᵢ)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

= lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

= lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

= [f_X(x|A) / f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x) = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board


L4-7

EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)

L4-8

• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)

q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).

L4-9

• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
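(A sketch of the computation, assuming the crossover notation pij = P(Bj|Ai) from earlier in this lecture: P(B0) = (7/8)(2/5) + (1/6)(3/5) = 9/20, so P(A0|B0) = (7/20)/(9/20) = 7/9 and P(A1|B0) = 2/9: decide A0 when B0 is received. Similarly, P(B1) = (1/8)(2/5) + (5/6)(3/5) = 11/20, so P(A0|B1) = (1/20)/(11/20) = 1/11 and P(A1|B1) = 10/11: decide A1 when B1 is received.)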

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
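(A sketch of the answers via Bayes' theorem, with F = {fair coin chosen}: (a) P(F|H) = (1/2)(1/2) / [(1/2)(1/2) + (1/2)(1)] = 1/3. (b) P(F|HH) = (1/2)(1/4) / [(1/2)(1/4) + (1/2)(1)] = 1/5. (c) The two-headed coin cannot show tails, so P(F|HHT) = 1.)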

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.
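One of these is easy to verify by direct counting in MATLAB (a minimal sketch for the full-house case, matching answer (e) in the footnote):

  % P[full house]: pick the value for the triple (6 ways), the value for the
  % pair (5 ways), and which 3 of the 5 dice form the triple
  P_fullhouse = 6 * 5 * nchoosek(5, 3) / 6^5   % = 300/7776, about 0.0386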

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ⋯ nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

3: (a) 0.0926; (b) 0.4630; (c) 0.2315; (d) 0.1543; (e) 0.0386; (f) 0.0193; (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.

Thus n1 = n2 = ⋯ = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)⋯(2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets) / (# orderings) = n! / [k!(n − k)!] = (n choose k)

DEFN

The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient

n! / (k1! k2! ⋯ kJ!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / [(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groups
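These counts overflow double precision, so a convenient check in MATLAB works with logarithms via the built-in gammaln (gammaln(n+1) = ln n!); a minimal sketch:

  % log10 of the two group counts
  log10_ordered   = (gammaln(73) - 24*gammaln(4)) / log(10)   % about 85.1 -> 1.3e85
  log10_unordered = log10_ordered - gammaln(25) / log(10)     % about 61.3 -> 2.1e61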

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?

5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

(k + n − 1 choose k)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

EX (By Request): The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2. Always switch

3. Flip a fair coin, and switch if it comes up heads.

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) =

For s.i. subexperiments,

P(A) = P(B1)P(B2)⋯P(BN)

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is (n choose k).

• Thus, the probability of k successes on n trials is

p_n(k) = (n choose k) p^k (1 − p)^(n−k)   (11)

Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
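A sketch of the computation in MATLAB using (11):

  % P[packet does not decode] = 1 - P[0, 1, or 2 bit errors out of 100]
  n = 100; p = 1e-2; k = 0:2;
  P_fail = 1 - sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k))
  % about 0.0794, matching the footnote answer on page L5-1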

L5-4

• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.

For large n we can approximate the binomial probabilities using either

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)

L5-5

• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
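A numerical sketch of this approximation for n = 100, p = 0.5 (so np = 50 and √(npq) = 5; the variable names are mine):

  % exact binomial sum vs. Gaussian approximation with continuity correction
  Qf = @(x) 0.5 * erfc(x / sqrt(2));
  exact  = sum(arrayfun(@(k) nchoosek(100, k), 45:55)) * 0.5^100
  approx = Qf((45 - 50 - 0.5)/5) - Qf((55 - 50 + 0.5)/5)   % both about 0.73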

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^(m−1) p, m = 1, 2, …

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
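(A sketch: here "success" is a correct delivery, so p = 0.9. P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first 2 both fail) = (0.1)² = 0.01.)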

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of (1/n)th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (the average number of calls per minute) is a constant. The maximum possible number of calls/hour is now 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

(n choose k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

(n choose k) p^k (1 − p)^(n−k) ≈ (α^k / k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^(n−k) = (α^k / k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α), k = 0, 1, …; 0 otherwise.

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
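A sketch: λ = 90 calls/hr and t = 10 min = 1/6 hr, so α = λt = 15 and P(N = 20) = 15^20 e^(−15)/20!, which can be evaluated directly:

  % Poisson probability of 20 calls in a 10-minute interval
  alpha = 15; k = 20;
  P = alpha^k * exp(-alpha) / factorial(k)   % about 0.0418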

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) =
  0,                        z ≤ 0
  z²/(2b²),                 0 < z ≤ b
  [b² − (2b − z)²/2]/b²,    b < z < 2b
  1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P(X ≤ x)

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump)

Pf omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of the new random variables by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram (hist(w, 100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN

The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.

Pf omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 43: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-8

bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize

(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)

bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1

rArr Our decision rule is not randomized

bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise

bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise

bull These rules can be summarized as follows

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)

bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted

bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured

bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule

bull Problem we donrsquot know P (Ai|Bj)

bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)

L4-9

bull General technique

If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then

where the last step follows from the Law of Total Probability

The expression in the box is

EX

Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16

L4-10

bull Often we may not know the a posteriori probabilities P (A0) and P (A1)

bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05

bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes

Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)

bull This is known as the maximum likelihood (ML) decision rule

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin

(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin

(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

• The Gaussian random variable is characterized by a density function
$\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (1.3)$
and distribution function
$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt. \qquad (1.4)$

L5-5

• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by
$Q(x) = 1 - \Phi(x) \qquad (1.5)$
$\phantom{Q(x)} = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt. \qquad (1.6)$

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the DeMoivre–Laplace theorem (see Lecture 7), it can be shown that for large n,
$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \varphi\left(\frac{k - np}{\sqrt{npq}}\right), \qquad (1.7)$
where q = 1 − p.

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,
$P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right].$
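As a sanity check, this approximation can be compared against the exact binomial sum in MATLAB. This is only a sketch; the parameters n, p, α, and β below are arbitrary illustrative choices, and Q(x) is implemented through the standard identity Q(x) = erfc(x/√2)/2:

% Exact P[alpha <= Sn <= beta] vs. the Q-function approximation
n = 100; p = 0.2; q = 1 - p;
alpha = 15; beta = 25;
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q-function via erfc
exact = 0;
for k = alpha:beta
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);
end
approx = Qf((alpha - n*p - 0.5)/sqrt(n*p*q)) - Qf((beta - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                    % the two values should agree closely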

• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

Then
$p(m) = (1-p)^{m-1}\, p, \qquad m = 1, 2, \ldots$

Also, $P(\text{number of trials} > k) = (1-p)^k$.

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
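A sketch of the computation in MATLAB (note that here a "success" is a correct packet delivery, so p = 1 − 0.1 = 0.9):

% Geometric law with success probability p = 0.9
p = 0.9;
P_exactly2  = (1-p)^(2-1) * p    % exactly 2 transmissions: 0.09
P_morethan2 = (1-p)^2            % more than 2 transmissions: 0.01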

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability, $\binom{n}{k} p^k (1-p)^{n-k}$.

• If we let np = α be a constant, then for large n,
$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}.$

• Then
$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$
which is the Poisson law.
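The limit is easy to see numerically. The sketch below (with the illustrative choices α = 0.5 and k = 2) compares the binomial probability with the Poisson law as n grows:

% Binomial(n, alpha/n) probability of k arrivals vs. the Poisson law
alpha = 0.5; k = 2;
poisson = alpha^k / factorial(k) * exp(-alpha);
for n = [10 100 1000]
    p = alpha/n;
    binom = nchoosek(n,k) * p^k * (1-p)^(n-k);
    fprintf('n = %4d: binomial = %.6f, Poisson = %.6f\n', n, binom, poisson);
end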

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– number of misprints on a group of pages in a book

– number of people in a community that live to be 100 years old

– number of wrong telephone numbers that are dialed in a day

– number of α-particles discharged in a fixed period of time from some radioactive material

– number of earthquakes per year

– number of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then
$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise.} \end{cases}$

EX Phone calls arriving at a switching office: If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
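A sketch of the computation in MATLAB (λ = 90 calls/hour and t = 10 minutes = 1/6 hour, so α = λt = 15):

% Poisson probability of k = 20 calls in a 10-minute interval
alpha = 90 * (10/60);
k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % approximately 0.042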

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: $\Omega_Z = \{z \mid 0 \le z \le 2b\}$)

2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.

3. Find and plot $F_Z(z)$.

Ans:
$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - (2b - z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
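The answer for $F_Z(z)$ can also be checked by simulation. The following MATLAB sketch uses the arbitrary choices b = 1 and test point z = 1.5, and compares the empirical CDF with the formula above:

% Monte Carlo check of F_Z(z) for the dart problem
b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);          % sum of the two coordinates
z = 1.5;                                 % a test point with b < z < 2b
empirical = mean(Z <= z)
exact = (b^2 - (2*b - z)^2/2) / b^2      % CDF formula for b < z < 2b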

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function (a mapping) from Ω to the real line R.

EX Examples on board.

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers $B \in \mathcal{B}$, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) $P[X = -\infty] = 0$ and $P[X = +\infty] = 0$.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form $(-\infty, x]$.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted $F_X(x)$, is
$F_X(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}).$

• $F_X(x)$ defines a probability measure.

• Properties of $F_X$:

1. $0 \le F_X(x) \le 1$

2. $F_X(-\infty) = 0$ and $F_X(\infty) = 1$

3. $F_X(x)$ is monotonically nondecreasing; i.e., $F_X(a) \le F_X(b)$ if $a \le b$

4. $P(a < X \le b) = F_X(b) - F_X(a)$

5. $F_X(x)$ is continuous on the right; i.e., $\lim_{h \to 0^+} F_X(b + h) = F_X(b)$

(The value at a jump discontinuity is the value after the jump.)

Proof omitted.

• If $F_X(x)$ is a continuous function of x, then $F_X(x) = F_X(x^-)$.

• If $F_X(x)$ is not a continuous function of x, then from above,
$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x].$

EX Examples on board.

EX

A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.

3. Find and plot $F_Z(z)$.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)


EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX Try the following in MATLAB

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density of the new random variables by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates how to use MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN The probability density function (pdf) $f_X(x)$ of a random variable X is the derivative (which may not exist at some places) of $F_X(x)$:
$f_X(x) = \frac{d}{dx} F_X(x).$

• Properties:

1. $f_X(x) \ge 0$, $-\infty < x < \infty$

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx$, $a \le b$

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$\int_{-\infty}^{+\infty} g(x)\, dx = c, \qquad 0 < c < +\infty,$
then $f_X(x) = g(x)/c$ is a valid pdf.

Proof omitted.

• Note that if $f_X(x)$ exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN If $F_X(x)$ is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way, $f_X(x)$ is assigned a value at every point when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b. \end{cases}$

DEFN A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN For a discrete RV, the probability mass function (pmf) is $p_X(x) = P(X = x)$.

EX Roll a fair 6-sided die. X = number on top face.
$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise.} \end{cases}$

EX Flip a fair coin until heads occurs. X = number of flips.
$P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases}$


IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by
$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A. \end{cases}$

• The pmf for a Bernoulli RV X can be found formally as
$P(X = 1) = P\big(X(s) = 1\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p.$
So

$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1. \end{cases}$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = number of successes. Then the pmf of X is given by
$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise.} \end{cases}$

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x vs. P(X = x).]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x vs. $F_X(x)$.]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x).]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by
$P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases}$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x vs. P(X = x).]

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x vs. $F_X(x)$.]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then
$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise.} \end{cases}$

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x).]

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x vs. $F_X(x)$.]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x).]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:
$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$

• Distribution function:
$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4.]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,
$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}.$

• Let σ² = npq and μ = np; then
$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}.$

DEFN A Gaussian random variable X has density function
$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}$
with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt.$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt.$

– Note that $Q(x) = 1 - \Phi(x)$.

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as
$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi.$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration, which is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is
$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right).$
Note that the denominator above is σ, not σ². Many students use the wrong value on tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right).$
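A MATLAB sketch of such an interval computation (the values of μ, σ, a, and b below are arbitrary illustrative choices; Q(x) is implemented via the standard identity Q(x) = erfc(x/√2)/2):

% P(a < X <= b) for X Gaussian with mean mu and standard deviation sigma
mu = 5; sigma = 2; a = 4; b = 9;
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q-function via erfc
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)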

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. The answers are given below.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) $P[X < \mu]$

(b) $P[|X - \mu| > k\sigma]$ for k = 1, 2, 3, 4, 5

Answers: (a) 1/2; (b) 0.317, $4.55 \times 10^{-2}$, $2.70 \times 10^{-3}$, $6.34 \times 10^{-5}$, and $5.75 \times 10^{-7}$, respectively.

Examples with Gaussian Random Variables

EX The Motivation problem


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \qquad -\infty < x < \infty,$
where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. $f_X(x)$.]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if $X_i$, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
$Y = \sum_{i=1}^{n} X_i^2 \qquad (1.8)$
is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by
$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$
where Γ(p) is the gamma function, defined as
$\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t}\, dt, \; p > 0; \qquad \Gamma(p) = (p-1)! \text{ for } p \text{ an integer} > 0; \qquad \Gamma(1/2) = \sqrt{\pi}; \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}.$

• Note that for n = 2,
$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$
which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. $f_X(x)$.]

• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0. \end{cases}$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let $R = \sqrt{Y}$.

• Then R is a Rayleigh RV with pdf
$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; r vs. $f_R(r)$.]

• The CDF for a Rayleigh RV is
$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
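This connection is easy to verify by simulation. A MATLAB sketch (σ² = 1, checking the CDF at the arbitrary point r = 1):

% R = sqrt(X^2 + Y^2) with X, Y independent zero-mean Gaussians is Rayleigh
sigma = 1; N = 1e6;
X = sigma*randn(1,N); Y = sigma*randn(1,N);
R = sqrt(X.^2 + Y.^2);
empirical = mean(R <= 1)
exact = 1 - exp(-1/(2*sigma^2))   % Rayleigh CDF at r = 1, about 0.3935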

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0. \end{cases}$


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.$

DEFN For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$f_X(x|A) = \frac{d}{dx} F_X(x|A).$

• Then $F_X(x|A)$ is a distribution function, and $f_X(x|A)$ is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
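The second part illustrates the memoryless property of the exponential distribution, which can also be checked by simulation. A MATLAB sketch (λ = 1, comparing P(T > 3 | T > 2) with P(T > 1)):

% Memoryless property of the exponential wait time
lambda = 1; N = 1e6;
T = -log(rand(1,N))/lambda;            % exponential samples, as in Lecture 7
conditional = sum(T > 3) / sum(T > 2)  % P(T > 3 | T > 2)
unconditional = exp(-lambda*1)         % P(T > 1): equal by memorylessness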

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then
$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$
and
$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\, P(A_i).$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$
$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$
$= \lim_{\Delta x \to 0} \frac{\big[F_X(x + \Delta x \mid A) - F_X(x \mid A)\big]/\Delta x}{\big[F_X(x + \Delta x) - F_X(x)\big]/\Delta x}\, P(A)$
$= \frac{f_X(x|A)}{f_X(x)}\, P(A),$
if $f_X(x|A)$ and $f_X(x)$ exist.

Implication of Point Conditioning

• Note that
$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)$
$\Rightarrow\; P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)$
$\Rightarrow\; \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \cdot P(A)$
$\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx$
(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:
$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\, dt}$

Examples on board


• General technique:

If $\{A_i\}$ is a partition of Ω and we know $P(B_j|A_i)$ and $P(A_i)$, then
$P(A_i|B_j) = \frac{P(B_j|A_i)\, P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)\, P(A_i)}{\sum_k P(B_j|A_k)\, P(A_k)},$
where the last step follows from the Law of Total Probability.

The boxed expression is Bayes' theorem.

EX Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A₀) = 2/5, p₀₁ = 1/8, and p₁₀ = 1/6.
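A numerical sketch of this calculation in MATLAB, assuming the crossover convention p₀₁ = P(B₁|A₀) and p₁₀ = P(B₀|A₁) (if the notes define pᵢⱼ the other way, swap the two crossover entries):

% A posteriori probabilities P(Ai|Bj) for the binary channel
PA = [2/5 3/5];                   % P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];        % row i: [P(B0|Ai) P(B1|Ai)]
for j = 1:2
    post = PBgA(:,j)' .* PA;      % P(Bj|Ai)P(Ai)
    post = post / sum(post);      % normalize by P(Bj)
    fprintf('Given B%d: P(A0|.) = %.3f, P(A1|.) = %.3f\n', j-1, post(1), post(2));
end
% The MAP rule decides the Ai with the larger a posteriori probability.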

• Often we may not know the a priori probabilities P(A₀) and P(A₁).

• In this case, we typically treat the input symbols as equally likely: P(A₀) = P(A₁) = 0.5.

• Under this assumption, P(A₀|B₀) = cP(B₀|A₀) and P(A₁|B₀) = cP(B₀|A₁), so the decision rule becomes:

Given that Bⱼ is received, decide Aᵢ was transmitted if Aᵢ maximizes P(Bⱼ|Aᵢ).

• This is known as the maximum likelihood (ML) decision rule.

EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
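These can be worked with Bayes' theorem; a numerical sketch of the successive updates in MATLAB:

% Bayes updates for the gambler's coin: [fair, two-headed]
P  = [0.5 0.5];              % prior on the coin
pH = [0.5 1.0];              % P(heads | coin)
P = P .* pH;        P = P / sum(P);   % after 1st head: P(fair) = 1/3
P = P .* pH;        P = P / sum(P);   % after 2nd head: P(fair) = 1/5
P = P .* (1 - pH);  P = P / sum(P);   % after a tail:   P(fair) = 1
P(1)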

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers are given below.

EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, …, x_k) that are possible if xᵢ is an element of a set with nᵢ distinct members is $n_1 n_2 \cdots n_k$.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.


Result is a k-tuple (x₁, x₂, …, x_k), where xᵢ ∈ A for all i = 1, 2, …, k.

Thus $n_1 = n_2 = \cdots = n_k = |A| \equiv n$

⇒ the number of distinct ordered k-tuples is $n^k$.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x₁, x₂, …, x_k), where xᵢ ∈ A for all i = 1, 2, …, k.

Then n₁ = n, n₂ = n − 1, …, n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is $n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$.

3. Permutations of n distinct objects
Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)

Result is an n-tuple.

The number of orderings is the number of ways we can take the objects one at a time from the set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ number of permutations = number of distinct ordered n-tuples = $n(n-1)\cdots(2)(1) = n!$

Useful formula: For large n, $n! \approx \sqrt{2\pi}\; n^{n + \frac{1}{2}}\, e^{-n}$ (Stirling's formula).

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: how many subsets of size k (called a combination of size k) are there from a set of size n?

First determine the number of ordered subsets: $\frac{n!}{(n-k)!}$.

Now, how many different orders are there for any set of k objects? $k!$

Let $C^n_k$ = the number of combinations of size k from a set of n objects. Then
$C^n_k = \frac{\text{number of ordered subsets}}{\text{number of orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}.$

DEFN The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, …, k_J is given by the multinomial coefficient
$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}.$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are
$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}.$

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to order the 24 groups, so the total number of ways to assign people to groups is
$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}.$

• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6.

– How many total ways can this occur?

– What is the probability of this occurring? (A sketch of the computation follows.)
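A sketch of the computation, using the multinomial coefficient from the definition above (gamma(n+1) would work equally well in place of factorial):

% P(each face appears exactly twice in 12 rolls of a fair die)
ways = factorial(12) / factorial(2)^6;   % multinomial coefficient 12!/(2!)^6
P = ways / 6^12                          % approximately 3.44e-3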

5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is
$\binom{k + n - 1}{k}.$


Page 45: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L4-10

• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins. (A simulation sketch follows the three parts.)

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
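A quick Monte Carlo check of (a) and (b) in MATLAB (a sketch; the variable names are illustrative, not from the original notes). Part (c) needs no simulation: a two-headed coin cannot show tails, so the coin must be fair with probability 1.

N = 1e6;
fair = rand(1,N) < 0.5;                        % 1 if the fair coin was picked
ph = 0.5*fair + 1.0*(~fair);                   % heads prob: 0.5 if fair, 1 if two-headed
flip1 = rand(1,N) < ph;                        % first flip shows heads?
flip2 = rand(1,N) < ph;                        % second flip shows heads?
pa = mean(fair(flip1))                         % P(fair | H) ≈ 1/3
pb = mean(fair(flip1 & flip2))                 % P(fair | HH) ≈ 1/5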

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number.³

COMBINATORICS

• Basic principle of counting: the number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 · n2 · · · nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

³ (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008

L5a-2

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)(n − 2) · · · (2)(1) = n!

Useful formula (Stirling's approximation): for large n,

n! \approx \sqrt{2\pi}\, n^{n + 1/2} e^{-n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).

L5a-3

4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = \frac{\#\ \text{ordered subsets}}{\#\ \text{orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project. (A numerical check appears after this example.)

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

\frac{72!}{(3!)^{24} \, 24!} \approx 2.1 \times 10^{61} \quad \text{(unordered groups)}
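These counts overflow double precision if computed directly with factorial, so a sketch using the base-MATLAB log-gamma function gammaln (ln n! = gammaln(n+1)) to check the two numbers above:

log10_ordered = (gammaln(73) - 24*gammaln(4)) / log(10)        % ≈ 85.1, i.e. ~1.3e85
log10_unordered = log10_ordered - gammaln(25)/log(10)          % ≈ 61.3, i.e. ~2.1e61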

L5a-4

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (A numerical check appears after this list.)

One possible way: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6

– How many total ways can this occur? 12!/(2!)^6

– What is the prob. of this occurring? 12!/((2!)^6 · 6^{12}) ≈ 0.0034

5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is

\binom{k + n - 1}{k}
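A sketch checking the dice example, both exactly and by Monte Carlo (histc is used for the per-face counts; newer MATLAB releases would use histcounts):

p_exact = factorial(12)/(2^6 * 6^12)        % ≈ 0.0034
N = 1e5; hits = 0;
for i = 1:N
    rolls = randi(6, 1, 12);                % 12 rolls of a fair die
    if all(histc(rolls, 1:6) == 2), hits = hits + 1; end
end
hits/N                                      % should be close to p_exact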

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX:

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{-2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴

EX: By Request: The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies. (A simulation sketch follows the list.)

1. Never switch.

⁴ 0.0794

L5-2

2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
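A Monte Carlo sketch of the three strategies. Note that switching wins exactly when the initial pick was a goat, so the host's door need not be modeled explicitly:

N = 1e5; wins = [0 0 0];
for i = 1:N
    car = randi(3); pick = randi(3);
    staywin = (pick == car);                  % never switch: win iff first pick is the car
    switchwin = ~staywin;                     % always switch: win iff first pick was a goat
    coin = rand < 0.5;                        % strategy 3: switch on heads
    coinwin = coin*switchwin + (~coin)*staywin;
    wins = wins + [staywin switchwin coinwin];
end
wins/N                                        % ≈ [1/3 2/3 1/2]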

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments:

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments: Ω = Ω1 × Ω2 × · · · × ΩN.

• Then any product-form event A is of the form (B1, B2, ..., BN), where Bi is an event in Ωi.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that P(A) is consistent with the probabilities assigned to the Bi by the individual subexperiments.

For statistically independent (s.i.) subexperiments,

P(A) = P1(B1) P2(B2) · · · PN(BN)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is \binom{n}{k}.

• Thus, the probability of k successes on n trials is

p_n(k) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (1.1)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? (Answer: (1/2)^3 = 1/8.)

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{-2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A MATLAB check follows.)
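A numerical check of the packet example using (1.1) (a sketch; nchoosek is base MATLAB):

n = 100; p = 1e-2;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);   % P[0, 1, or 2 bit errors]
end
Pfail = 1 - Pok                                      % ≈ 0.0794, matching the footnote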

L5-4

• When n is large, the probability in (1.1) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (1.2)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let S_n = \sum_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left[\frac{-x^2}{2}\right] \qquad (1.3)

and distribution function

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left[\frac{-t^2}{2}\right] dt \qquad (1.4)

L5-5

• Engineers often prefer an alternative to (1.4), called the complementary distribution function, given by

Q(x) = 1 - \Phi(x) \qquad (1.5)
     = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\!\left[\frac{-t^2}{2}\right] dt \qquad (1.6)

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right) \qquad (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right]
= Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]

(The ±0.5 terms are the continuity correction.)

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam. (A MATLAB expression for Q(x) appears below.)
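In MATLAB, Q(x) can be computed from the base built-in erfc as Q(x) = erfc(x/√2)/2. A sketch comparing the exact tail probability from the packet example with the Gaussian approximation above (illustrative only; here np = 1 is small, so the approximation is rough and the Poisson law would fit better):

Qf = @(x) 0.5*erfc(x/sqrt(2));        % Q-function via base-MATLAB erfc
n = 100; p = 0.01; q = 1 - p;
Pexact = 1 - sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), 0:2));   % P[Sn >= 3]
Papprox = Qf((3 - n*p - 0.5)/sqrt(n*p*q)) - Qf((n - n*p + 0.5)/sqrt(n*p*q));
[Pexact Papprox]                      % ≈ [0.0794 0.066]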

GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1-p)^{m-1}\, p, \qquad m = 1, 2, \ldots

Also, P(# trials > k) = (1 − p)^k.

EX: A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (A worked check follows.)
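A quick check of the ARQ example under the geometric law, with success = correct packet, so p = 0.9:

p = 0.9;                       % prob. a transmission succeeds
P2 = (1-p)^(2-1) * p           % exactly 2 transmissions: 0.1*0.9 = 0.09
Pmore = (1-p)^2                % more than 2 transmissions: 0.1^2 = 0.01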

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability:

\binom{n}{k} p^k (1-p)^{n-k}

• If we let np = α be a constant, then for large n,

\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}

• Then

\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha},

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

L5-8

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}

EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval? (A MATLAB check follows.)

• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
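For the switching-office example: in a 10-minute interval, α = λt = 90 × (10/60) = 15, so P(N = 20) can be evaluated directly (a sketch):

alpha = 90*(10/60);                              % expected calls in 10 minutes
P20 = alpha^20 / factorial(20) * exp(-alpha)     % ≈ 0.042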

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals, or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - (2b-z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}

4. Specify the type (discrete, continuous, mixed) of Z. (continuous) (A simulation check follows.)
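A Monte Carlo sketch comparing the empirical CDF of Z with the answer above (b = 1 assumed for illustration):

b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);                     % sum of the two dart coordinates
z = linspace(0, 2*b, 201);
Femp = arrayfun(@(t) mean(Z <= t), z);             % empirical CDF
Fthy = (z.^2/(2*b^2)).*(z <= b) + ((b^2 - (2*b - z).^2/2)/b^2).*(z > b);
max(abs(Femp - Fthy))                              % small, e.g. on the order of 1e-3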

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable X defined on a probability space (Ω, F, P) is a function (mapping) from Ω to the real line R.

L6-2

EX

L6-3

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P[X \le x]

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

L6-4

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b).

(The value at a jump discontinuity is the value after the jump.)

Pf. omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) - F_X(x^-) = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]

L6-5

EX: Examples

L6-6

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created? (For checking: v = −ln u should look exponential with λ = 1.)

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram (hist(w, 100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created? (For checking: w should look Rayleigh with 2σ² = 1.)

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.

PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

f_X(x) = \frac{d}{dx} F_X(x)

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt

3. \int_{-\infty}^{+\infty} f_X(x)\, dx = 1

L7-3

4. P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx, \quad a \le b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad -\infty < c < +\infty,

then f_X(x) = g(x)/c is a valid pdf.

Pf. omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus f_X(x) is assigned a value at every point x when X is a continuous random variable.

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}

L7-4

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

p_X(x) = P(X = x)

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs:

1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by

X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p

So

P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; horizontal axis x, vertical axis P(X = x).]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; horizontal axis x, vertical axis F_X(x).]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; horizontal axis x, vertical axis P(X = x).]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; horizontal axis x, vertical axis P(X = x).]

L7-8

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; horizontal axis x, vertical axis F_X(x).]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; horizontal axis x, vertical axis P(X = x).]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; horizontal axis x, vertical axis F_X(x).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; horizontal axis x, vertical axis P(X = x).]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}

• Distribution function:

F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,

\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\!\left\{\frac{-(k - np)^2}{2npq}\right\}

• Let σ² = npq and μ = np; then

P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}

L7-11

DEFN: A Gaussian random variable X has density function

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs above; horizontal axis x, vertical axes f_X(x) and F_X(x).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\!\left\{\frac{-t^2}{2}\right\} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left\{\frac{-t^2}{2}\right\} dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\!\left\{-\frac{x^2}{2\sin^2\phi}\right\} d\phi

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration, which is often easier to work with.)

• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\!\left(\frac{x - \mu}{\sigma}\right)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a - \mu}{\sigma}\right) - Q\!\left(\frac{b - \mu}{\sigma}\right)

(A MATLAB sketch follows.)
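A sketch of these computations in MATLAB, using the base erfc function (Q(x) = erfc(x/√2)/2; the example values of μ, σ, a, b are illustrative):

Qf = @(x) 0.5*erfc(x/sqrt(2));                              % Q-function
mu = 5; sigma = 1; a = 4; b = 7;
P = Qf((a-mu)/sigma) - Qf((b-mu)/sigma)                     % P(4 < X <= 7) ≈ 0.8186
% numerical check of the finite-range form of Q at x = 1:
integral(@(phi) exp(-1./(2*sin(phi).^2))/pi, 0, pi/2)       % ≈ Qf(1) ≈ 0.1587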

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.⁵

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem. (A MATLAB check follows.)

⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
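The part (b) answers follow from P[|X − μ| > kσ] = 2Q(k); a one-line check (sketch):

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)     % ≈ [0.3173 0.0455 0.0027 6.33e-5 5.73e-7]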

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; horizontal axis x, vertical axis f_X(x).]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

Y = \sum_{i=1}^{n} X_i^2 \qquad (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (1.8), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

where Γ(p) is the gamma function, defined as

\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t} dt, \quad p > 0

\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0

\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}

• Note that for n = 2,

f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; horizontal axis x, vertical axis f_X(x).]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = \sqrt{Y}.

• Then R is a Rayleigh RV with pdf

f_R(r) = \begin{cases} \dfrac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; horizontal axis r, vertical axis f_R(r).]

• The CDF for a Rayleigh RV is

F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable. (A simulation sketch follows.)
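A sketch tying this to the MATLAB exercise of Lecture 7: the magnitude of a complex Gaussian sample should match the Rayleigh density (σ² = 1 assumed for the real and imaginary parts; histogram with the 'pdf' normalization is available in newer MATLAB releases, otherwise use hist):

N = 1e6; sigma = 1;
x = sigma*randn(1,N); y = sigma*randn(1,N);        % I and Q components
r = abs(x + 1j*y);                                 % amplitude
histogram(r, 100, 'Normalization', 'pdf'); hold on;
rr = linspace(0, 5, 200);
plot(rr, rr/sigma^2 .* exp(-rr.^2/(2*sigma^2)))    % Rayleigh pdf overlay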

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\!\left\{-\dfrac{(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = \frac{P[\{X \le x\} \cap A]}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time? (A worked sketch follows.)
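For checking (a sketch of the standard memorylessness argument, with A = {X > 2}):

F_X(x \mid X > 2) = \frac{P[\{X \le x\} \cap \{X > 2\}]}{P[X > 2]} = \frac{F_X(x) - F_X(2)}{1 - F_X(2)} = 1 - e^{-(x-2)}, \quad x > 2

(and 0 otherwise), so the total wait is a shifted exponential; the remaining wait time X − 2, given X > 2, is again exponential with λ = 1. The exponential RV is memoryless.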

TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

F_X(x) = F_X(x|A_1) P(A_1) + F_X(x|A_2) P(A_2) + \cdots + F_X(x|A_n) P(A_n)

and

f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead, take a limit:

P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
           = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)
           = \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)
           = \frac{f_X(x|A)}{f_X(x)}\, P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)
\;\Rightarrow\; P(A|X = x) f_X(x) = f_X(x|A) P(A)
\;\Rightarrow\; \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \cdot P(A)
\;\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}

Examples on board

Page 46: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5a-1

EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky

Read Section 18 in the textbook

Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page

EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities

(a) P [no two alike]

(b) P [one pair]

(c) P [two pair]

(d) P [three alike]

(e) P [full house]

(f) P [four alike]

(g) P [five alike]

Note that a full house is three of one number plus a pair of another number3

COMBINATORICS

bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is

SPECIAL CASES

1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted

3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008

L5a-2

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Thus n1 = n2 = = nk = |A| equiv n

rArr number of distinct ordered k-tuples is

2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n

Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k

Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)

rArr number of distinct ordered k-tuples is

3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)

Result is n-tuple

The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set

Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1

rArr of permutations = of distinct ordered n-tuples

=

=

Useful formula For large nn asymp

radic2πnn+ 1

2 eminusn

MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)

L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; f_X(x) and F_X(x) versus x]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (with npq not too small), then with q = 1 − p,

  C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)² / (2npq)]

• Let σ² = npq and µ = np; then

  P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp[−(k − µ)² / (2σ²)]
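A quick MATLAB comparison of the exact Binomial(100, 0.2) pmf against this approximation (a sketch; the test points k = 10, 15, …, 30 are arbitrary choices of mine):

% Sketch (MATLAB): exact binomial pmf vs. DeMoivre-Laplace approximation.
n = 100; p = 0.2; q = 1 - p;
k = 10:5:30;
exact = zeros(size(k));
for i = 1:length(k)
    exact(i) = nchoosek(n, k(i)) * p^k(i) * q^(n - k(i));
end
approx = exp(-(k - n*p).^2 / (2*n*p*q)) / sqrt(2*pi*n*p*q);
[exact; approx]      % rows agree to within a few percent near the mean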

L7-11

DEFN: A Gaussian random variable X has density function

  f_X(x) = (1/√(2πσ²)) exp[−(x − µ)² / (2σ²)]

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)² / (2σ²)] dt,

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); f_X(x) and F_X(x) versus x]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

  Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

  Q(x) = (1/π) ∫_{0}^{π/2} exp[−x² / (2 sin²φ)] dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

  F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working test problems.

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

  P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
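In MATLAB, Q(x) can be computed exactly from the built-in erfc via the identity Q(x) = (1/2) erfc(x/√2). A sketch of the interval computation above (the values µ = 5, σ = 1, a = 4, b = 6 are example choices of mine):

% Sketch (MATLAB): Q-function via erfc, then an interval probability.
Q = @(x) 0.5 * erfc(x / sqrt(2));     % exact identity, not an approximation
mu = 5; sigma = 1;
a = 4; b = 6;
Pab = Q((a - mu)/sigma) - Q((b - mu)/sigma)   % P(4 < X <= 6), about 0.6827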

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5


Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.33 × 10⁻⁵, and 5.73 × 10⁻⁷, respectively.
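These follow from P[|X − µ| > kσ] = 2Q(k), which is quick to verify numerically (a sketch using the erfc identity for Q given earlier):

% Sketch (MATLAB): verify P(|X - mu| > k*sigma) = 2*Q(k) for k = 1..5.
Q = @(x) 0.5 * erfc(x / sqrt(2));
k = 1:5;
2 * Q(k)     % about 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7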

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; f_X(x) versus x]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

  Y = Σ_{i=1}^{n} X_i²     (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

  f_Y(y) = [1 / (σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0;  0 for y < 0,

where Γ(p) is the gamma function, defined as

  Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0

  Γ(p) = (p − 1)! for p a positive integer

  Γ(1/2) = √π,  Γ(3/2) = (1/2)√π

• Note that for n = 2,

  f_Y(y) = (1/(2σ²)) e^(−y/(2σ²)) for y ≥ 0;  0 for y < 0,

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
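The n = 2 case is easy to check by simulation; the sketch below (my check, with σ² = 1) squares and sums two independent standard Gaussians and compares the sample mean to the exponential mean 2σ² = 2:

% Sketch (MATLAB): sum of squares of two independent N(0,1) RVs
% behaves like an exponential with mean 2*sigma^2 = 2.
x1 = randn(1, 100000);
x2 = randn(1, 100000);
y = x1.^2 + x2.^2;
mean(y)        % close to 2, the mean of an exponential with 1/lambda = 2
hist(y, 100)   % histogram decays like (1/2)*exp(-y/2)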

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; f_X(x) versus x]

• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

  F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0;  0 for y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

  f_R(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0;  0 for r < 0.

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; f_R(r) versus r]

• The CDF for a Rayleigh RV is

  F_R(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0;  0 for r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
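A numerical sanity check of that model (a sketch; σ² = 1 and the test point r = 1 are my choices): the magnitude of a complex Gaussian sample follows the Rayleigh CDF above.

% Sketch (MATLAB): envelope of a complex Gaussian is Rayleigh.
sigma = 1;
z = sigma*randn(1, 100000) + 1j*sigma*randn(1, 100000);
r = abs(z);
mean(r <= 1)                 % compare to F_R(1) below
1 - exp(-1/(2*sigma^2))      % F_R(1) = 1 - exp(-0.5), about 0.3935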

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)] for ℓ ≥ 0;  0 for ℓ < 0.
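Generating lognormal samples follows directly from the definition: exponentiate a Gaussian. A sketch (the values µ = 0 and σ = 0.5 are arbitrary choices of mine):

% Sketch (MATLAB): if X = ln(L) is Gaussian, then L = exp(X) is lognormal.
mu = 0; sigma = 0.5;
L = exp(mu + sigma*randn(1, 100000));
hist(L, 100)     % right-skewed density concentrated near exp(mu)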

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = d/dx F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
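For part 2, the exponential's memoryless property suggests the remaining wait is again exponential with the same λ; a simulation sketch of that claim (my illustration, with λ = 1):

% Sketch (MATLAB): memorylessness of the exponential RV.
% Among waits already longer than 2 minutes, the excess over 2
% is distributed like the original exponential.
x = -log(rand(1, 1000000));      % exponential samples, lambda = 1
excess = x(x > 2) - 2;           % remaining wait given X > 2
[mean(x), mean(excess)]          % both close to 1 = 1/lambda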

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ⋯ + F_X(x|A_n)P(A_n)

and

  f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

             = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] } P(A)

             = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)

             = [f_X(x|A) / f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

  P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

  f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
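As a numerical illustration of the continuous Law of Total Probability (a sketch with an invented setup): let X, uniform on [0, 1], be the bias of a coin and A the event that a single flip lands heads, so P(A|X = x) = x, f_X(x) = 1 on [0, 1], and P(A) = ∫ x dx = 1/2.

% Sketch (MATLAB): continuous law of total probability, numerically.
x = linspace(0, 1, 1001);
PA = trapz(x, x .* ones(size(x)))    % numerical integral, about 0.5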

Examples on board





L5a-3

4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order

Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo

First determine of ordered subsets

Now how many different orders are there for any set of k objects

Let Cnk = the number of combinations of size k from a set of n objects

Cnk =

ordered subsets orderings

=

DEFN

The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient

bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets

bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere

Using the multinomial coefficient there are

72

(3)24 asymp 13times 1085

Order matters because each group gets a different project

bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project

There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is

72

(3)24 (24)asymp 21times 1061 ordered groups

L5a-4

bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die

One possible way 1 1 2 2 3 3 4 4 5 5 6 6

ndash How many total ways can this occur

ndash What is the prob of this occurring

5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order

bull The total number of arrangements is(k + nminus 1

k

)

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$

DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is P(X = x), the probability that the RV takes the value x.

EX: Roll a fair 6-sided die. X = the number on the top face.

$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$

EX: Flip a fair coin until heads occurs. X = the number of flips.

$P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$


IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$

• The pmf for a Bernoulli RV X can be found formally as

$P(X = 1) = P(\{s : X(s) = 1\}) = P(\{s \mid s \in A\}) = P(A) \triangleq p$

So

$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = the number of successes. Then the pmf of X is given by

$P[X = k] = \begin{cases} \binom{n}{k}\, p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$
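As a quick sanity check, the pmf can be evaluated directly in base MATLAB (a sketch; binopdf from the Statistics Toolbox computes the same thing):

n = 10; p = 0.2; k = 0:n;
% binomial coefficients times p^k (1-p)^(n-k)
pmf = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
stem(k, pmf)    % reproduces the Binomial(10, 0.2) pmf plot below
sum(pmf)        % equals 1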

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf — P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF — F_X(x) vs. x]

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf — P(X = x) vs. x]

3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = the number of trials required. Then the pmf of X is given by

$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf — P(X = x) vs. x]

• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF — F_X(x) vs. x]

4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$
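A quick sketch of the pmf in base MATLAB (poisspdf from the Statistics Toolbox is equivalent):

alpha = 3; k = 0:10;
pmf = alpha.^k ./ factorial(k) * exp(-alpha);
stem(k, pmf)    % reproduces the Poisson(3) pmf plot below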

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf — P(X = x) vs. x]

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF — F_X(x) vs. x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf — P(X = x) vs. x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:

$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$

• Distribution function:

$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and p is not too close to 0 or 1, then with q = 1 − p,

$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k - np)^2}{2npq}\right]$

• Let σ² = npq and µ = np; then

$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k - \mu)^2}{2\sigma^2}\right]$

DEFN: A Gaussian random variable X has density function

$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x-\mu)^2}{2\sigma^2}\right]$

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t-\mu)^2}{2\sigma^2}\right] dt,$

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is

$F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = 1 - Q\left(\frac{x-\mu}{\sigma}\right)$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a-\mu}{\sigma}\right) - Q\left(\frac{b-\mu}{\sigma}\right)$
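In MATLAB, the Q-function is easily expressed through erfc (qfunc in the Communications Toolbox is the same function); a sketch of the interval computation above, with illustrative values of µ, σ, a, and b:

Qf = @(x) 0.5 * erfc(x / sqrt(2));            % Q(x) = 1 - Phi(x)
mu = 5; sigma = 1; a = 4; b = 6;              % example values
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(4 < X <= 6) = 1 - 2Q(1), about 0.683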

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question; the answers are given below.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

(Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.)
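These follow from P[X < µ] = 1/2 by symmetry and P[|X − µ| > kσ] = 2Q(k), which can be checked numerically (a sketch using erfc):

k = 1:5;
twoQ = erfc(k / sqrt(2))   % 2*Q(k): 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.7e-7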


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$f_X(x) = \frac{c}{2} \exp(-c|x|), \quad -\infty < x < \infty,$

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

$Y = \sum_{i=1}^{n} X_i^2 \qquad (1.8)$

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$

where Γ(p) is the gamma function, defined as

$\Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0$

$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0$

$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$

• Note that for n = 2,

$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (here with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]

• The CDF for a Rayleigh RV is

$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
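This construction is easy to check by simulation (a sketch with σ² = 1 for the two Gaussian components):

N = 1000000;
r = sqrt(randn(1, N).^2 + randn(1, N).^2);   % R = sqrt(X1^2 + X2^2)
hist(r, 100)    % histogram matches the Rayleigh pdf above with sigma^2 = 1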

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

$F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

$f_X(x|A) = \frac{d}{dx} F_X(x|A)$

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
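A sketch of the computation, conditioning on the event A = {X > 2}: for the total wait time, with x > 2,

$F_X(x \mid X > 2) = \frac{P(2 < X \le x)}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)},$

and F_X(x | X > 2) = 0 for x ≤ 2. The remaining wait time R = X − 2 then has F_R(r) = 1 − e^{−r} for r ≥ 0, again exponential with λ = 1: the exponential RV is memoryless.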

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$

and

$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\, P(A_i)$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$

$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$

$= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x \mid A) - F_X(x \mid A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)$

$= \frac{f_X(x|A)}{f_X(x)}\, P(A)$

if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)$

$\Rightarrow\; P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)$

$\Rightarrow\; \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \; P(A)$

$\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx$

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\, dt}$

Examples on board


EEL 5544 Lecture 5a (continued)

• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?
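A quick numerical check of this example (a sketch: the number of arrangements is the multinomial coefficient 12!/(2!)⁶, and each particular sequence of 12 rolls has probability 6⁻¹²):

ways = factorial(12) / 2^6       % 12!/(2!)^6 arrangements
p = ways / 6^12                  % about 0.0034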

5. Sample with Replacement and w/out Ordering

Question: How many ways are there to choose k objects from a set of n objects without regard to order?

• The total number of arrangements is

$\binom{k + n - 1}{k}$


EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question; the answer is given in parentheses after it.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Ans: 0.0794)

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch


2. Always switch

3. Flip a fair coin and switch if it comes up heads (a simulation sketch follows this list)
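A simulation sketch for strategies 1 and 2 (strategy 3 is just the average of the two); the key observation in the code is that, after the host reveals a goat, staying wins exactly when the initial pick was the car, and switching wins exactly when it was not:

N = 100000;
car  = ceil(3 * rand(1, N));     % door hiding the car
pick = ceil(3 * rand(1, N));     % contestant's initial pick
p_stay   = mean(pick == car)     % about 1/3
p_switch = mean(pick ~= car)     % about 2/3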

SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, …, B_N), where B_i ∈ Ω_i.

2. Event class: F, a σ-algebra on Ω.

3. P(·) on F, assigned consistently with the probabilities of the subexperiments:

P(A) = P(B_1, B_2, …, B_N)

For s.i. (statistically independent) subexperiments,

P(A) = P(B_1) P(B_2) ⋯ P(B_N)

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is $p^k (1-p)^{n-k}$.

• The number of ways that k successes can occur in n places is $\binom{n}{k}$.

• Thus, the probability of k successes on n trials is

$p_n(k) = \binom{n}{k}\, p^k (1-p)^{n-k} \qquad (1.1)$

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
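A sketch of the packet computation in base MATLAB, taking the complement of "at most 2 bit errors":

n = 100; p = 0.01;
P_ok = sum(arrayfun(@(k) nchoosek(n, k) * p^k * (1-p)^(n-k), 0:2));
P_fail = 1 - P_ok                % about 0.0794, matching the answer above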

• When n is large, the probability in (1.1) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson Law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place; or

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

$b(k; n, p) = \binom{n}{k}\, p^k (1-p)^{n-k} \qquad (1.2)$

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by the density function

$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (1.3)$

and distribution function

$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \qquad (1.4)$

• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

$Q(x) = 1 - \Phi(x) \qquad (1.5)$

$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \qquad (1.6)$

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Central Limit Theorem, it can be shown that for large n,

$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\left(\frac{k - np}{\sqrt{npq}}\right) \qquad (1.7)$

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

$P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$
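A numerical sketch of this continuity-corrected approximation (the values of n, p, α, and β are illustrative):

n = 100; p = 0.2; q = 1 - p;
alpha = 15; beta = 25;
exact = sum(arrayfun(@(k) nchoosek(n, k) * p^k * q^(n-k), alpha:beta));
Qf = @(x) 0.5 * erfc(x / sqrt(2));
approx = Qf((alpha - n*p - 0.5) / sqrt(n*p*q)) - Qf((beta - n*p + 0.5) / sqrt(n*p*q));
[exact approx]    % the two agree closely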

• Note that the book defines an error function, erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.

GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$

Also, P(# trials > k) = $(1-p)^k$.

EX: A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
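A quick check against the formulas above, treating a successful packet delivery as a success with p = 0.9: P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = (0.1)² = 0.01.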

THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability,

$\binom{n}{k}\, p^k (1-p)^{n-k}$

• If we let np = α be a constant, then for large n,

$\binom{n}{k}\, p^k (1-p)^{n-k} \approx \frac{\alpha^k}{k!} \left(1 - \frac{\alpha}{n}\right)^{n-k}$

• Then

$\lim_{n \to \infty} \frac{\alpha^k}{k!} \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
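A sketch of the computation: λ = 90 calls/hr and t = 10 min = 1/6 hr, so α = λt = 15 and

$P(N = 20) = \frac{15^{20}}{20!}\, e^{-15} \approx 0.042$

(in MATLAB: 15^20 * exp(-15) / factorial(20)).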

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.


EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question The answers are shown so that you can check your work

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ 1 - \frac{(2b - z)^2}{2b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
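A simulation sketch that overlays the empirical CDF of Z on the answer above (taking b = 1):

b = 1; N = 1000000;
z = b*rand(1, N) + b*rand(1, N);   % sum of the two dart coordinates
plot(sort(z), (1:N)/N)             % empirical CDF
hold on
zz = 0:0.01:2*b;
Fz = (zz <= b) .* zz.^2 / (2*b^2) + (zz > b) .* (1 - (2*b - zz).^2 / (2*b^2));
plot(zz, Fz, '--')                 % analytical F_Z(z)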

RANDOM VARIABLES (RVS)

• What is a random variable?

• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R.

EX

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

$F_X(x) = P(\{\omega : X(\omega) \le x\}) = P[X \le x]$

• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1

2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., $F_X(b) = \lim_{h \to 0^+} F_X(b + h)$.

(The value at a jump discontinuity is the value after the jump.)

(Pf omitted.)

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from the above,

$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]$

EX Examples

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 50: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5-1

EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials

Read Sections 19ndash112 in the textbook

Try to answer the following question The answer appears at the bottom of the page

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

4

EX By Request The Game ShowMonty Hall Problem

Slightly paraphrased from the Ask Marilyn column of Parade magazine

bull Suppose yoursquore on a game show and yoursquore given the choice of three doors

ndash Behind one door is a car

ndash Behind the other doors are goats

bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat

bull The host then offers you the option to switch doors Does it matter if you switch

bull Compute the probabilities of winning for the following three strategies

1 Never switch

400794

L5-2

2 Always switch

3 Flip a fair coin and switch if it comes up heads

SEQUENTIAL EXPERIMENTS

DEFN

A sequential experiment (or combined experiment) consists of a series ofsubexperiments

Prob Space for N Sequential Experiments

1 The combined sample spaceΩ = theof the sample spaces for the subexperiments

bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi

2 Event class F σ-algebra on Ω

3 P (middot) on F such that

P (A) =

For si subexperiments

P (A) =

Bernoulli Trials

DEFN

A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV, the probability mass function (pmf) is $P_X(x) = P(X = x)$.

EX: Roll a fair 6-sided die. Let X = the number on the top face.
$$P(X = x) = \begin{cases} \frac{1}{6}, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}$$

EX: Flip a fair coin until heads occurs. Let X = the number of flips.
$$P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
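As a sanity check on this pmf (added here), the probabilities sum to one by the geometric series:
$$\sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^x = \frac{1/2}{1 - 1/2} = 1.$$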

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

• An event $A \in \mathcal{A}$ is considered a "success."

• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P\big(\{s : X(s) = 1\}\big) = P\big(\{s : s \in A\}\big) = P(A) \triangleq p$$
So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \neq 0, 1 \end{cases}$$

2 Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = the number of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
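A short sketch of evaluating this pmf numerically (assuming base MATLAB; binopdf from the Statistics Toolbox would do the same thing):

n = 10; p = 0.2;
k = 0:n;
pmf = zeros(1, n+1);
for i = 1:n+1
    pmf(i) = nchoosek(n, k(i)) * p^k(i) * (1-p)^(n-k(i));
end
stem(k, pmf);    % reproduces the Binomial(10, 0.2) pmf shown below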

L7-6

• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) versus x]

• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; $F_X(x)$ versus x]

L7-7

• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) versus x]

3 Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = the number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
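A companion fact worth recording (an added line): more than $k$ trials are required exactly when the first $k$ trials are all failures, so
$$P(X > k) = (1-p)^k, \qquad k = 0, 1, 2, \ldots$$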

• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) versus x]

L7-8

• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; $F_X(x)$ versus x]

4 Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$
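The same kind of numerical check used for the binomial works here; a minimal base-MATLAB sketch of the Poisson pmf for α = 3:

alpha = 3;
k = 0:10;
pmf = alpha.^k ./ factorial(k) * exp(-alpha);   % P(N = k) for k = 0,...,10
stem(k, pmf);                                   % compare with the Poisson(3) plot below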

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) versus x]

L7-9

• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; $F_X(x)$ versus x]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) versus x]

IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$
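The distribution function follows from the density by direct integration (one added line of algebra): for $x \ge 0$,
$$F_X(x) = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}.$$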

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]

3 Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large and npq is large, then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{ \frac{-(k - np)^2}{2npq} \right\}$$

• Let $\sigma^2 = npq$ and $\mu = np$; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(k - \mu)^2}{2\sigma^2} \right\}$$

L7-11

DEFN

A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(x - \mu)^2}{2\sigma^2} \right\}$$
with parameters (mean) µ and (variance) $\sigma^2 \ge 0$.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(t - \mu)^2}{2\sigma^2} \right\} dt,$$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt$$

L7-12

– Note that $Q(x) = 1 - \Phi(x)$.

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{ \frac{-x^2}{2\sin^2\phi} \right\} d\phi$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)

• The CDF for a Gaussian RV with mean µ and variance σ² is
$$F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) = 1 - Q\left( \frac{x - \mu}{\sigma} \right)$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left( \frac{a - \mu}{\sigma} \right) - Q\left( \frac{b - \mu}{\sigma} \right)$$
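Base MATLAB does not include a Q-function (qfunc is in the Communications Toolbox), but it can be written with erfc. A sketch of the interval computation above, using made-up example numbers (µ = 10, σ = 5, not from the notes):

Q = @(x) 0.5 * erfc(x / sqrt(2));       % Q(x) = 1 - Phi(x)
mu = 10; sigma = 5;
a = 10; b = 15;
p = Q((a - mu)/sigma) - Q((b - mu)/sigma)
% P(10 < X <= 15) = Q(0) - Q(1), approximately 0.3413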

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 26 through p 85 in the textbook

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) $P[X < \mu]$

(b) $P[|X - \mu| > k\sigma]$ for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem

Answers: (a) 1/2; (b) 0.317, $4.55 \times 10^{-2}$, $2.70 \times 10^{-3}$, $6.34 \times 10^{-5}$, and $5.75 \times 10^{-7}$, respectively.
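A sketch of where these answers come from (reasoning added here): the Gaussian density is symmetric about µ, so $P[X < \mu] = 1/2$; and by standardizing,
$$P[|X - \mu| > k\sigma] = P[X < \mu - k\sigma] + P[X > \mu + k\sigma] = Q(k) + Q(k) = 2Q(k),$$
which for $k = 1$ gives $2Q(1) \approx 2(0.1587) = 0.317$, matching (b).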

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where $c > 0$.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; $f_X(x)$ versus x]

5 Chi-square RV

• Let X be a Gaussian random variable.

• Then $Y = X^2$ is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if $X_i$, $i = 1, 2, \ldots, n$, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$
is a chi-square random variable with n degrees of freedom.

bull The chi-square random variable is usually classified as either central or non-central

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
where $\Gamma(p)$ is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0;$$
$$\Gamma(p) = (p-1)!, \ p \text{ a positive integer}; \qquad \Gamma(1/2) = \sqrt{\pi}; \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}.$$

• Note that for n = 2,
$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]

• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let $R = \sqrt{Y}$.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$
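Where this pdf comes from, in one added step: for $r \ge 0$,
$$F_R(r) = P(\sqrt{Y} \le r) = F_Y(r^2) = 1 - e^{-r^2/(2\sigma^2)},$$
and differentiating with respect to $r$ gives $f_R(r) = \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}$ (this also anticipates the CDF stated below).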

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1); $f_R(r)$ versus r]

• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts; thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(\ln \ell - \mu)^2}{2\sigma^2} \right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given $(\Omega, \mathcal{F}, P)$ and $X: \Omega \rightarrow \mathbb{R}$ (a RV on Ω).

DEFN

For an event $A \in \mathcal{F}$ with $P(A) \neq 0$, the conditional distribution of X given A is
$$F_X(x \mid A) = P[X \le x \mid A] = \frac{P(\{X \le x\} \cap A)}{P(A)}.$$

DEFN

For an event $A \in \mathcal{F}$ with $P(A) \neq 0$, the conditional density of X given A is
$$f_X(x \mid A) = \frac{d}{dx} F_X(x \mid A).$$

• Then $F_X(x \mid A)$ is a distribution function and $f_X(x \mid A)$ is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time
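A hedged setup for part 1 (the complete solution is worked in class): conditioning on $A = \{X > 2\}$, where $P(A) = e^{-2}$ for $\lambda = 1$, gives for the total wait time, when $x > 2$,
$$F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)},$$
and $F_X(x \mid X > 2) = 0$ for $x \le 2$. Part 2 then reveals the memoryless property: given $X > 2$, the remaining wait $X - 2$ is again exponential with $\lambda = 1$.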

TOTAL PROBABILITY AND BAYES' THEOREM

• If $A_1, A_2, \ldots, A_n$ form a partition of Ω, then
$$F_X(x) = F_X(x \mid A_1)P(A_1) + F_X(x \mid A_2)P(A_2) + \cdots + F_X(x \mid A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i) P(A_i).$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly $P(X = x) = 0$, so the previous definition of conditional probability will not work. Instead,
$$P(A \mid X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{\big[F_X(x + \Delta x \mid A) - F_X(x \mid A)\big]/\Delta x}{\big[F_X(x + \Delta x) - F_X(x)\big]/\Delta x}\, P(A)$$
$$= \frac{f_X(x \mid A)}{f_X(x)}\, P(A)$$
if $f_X(x \mid A)$ and $f_X(x)$ exist.

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)} f_X(x) = \frac{P(A \mid X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t) f_X(t)\,dt}.$$
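A quick illustration of both results (an added example, not in the notes): let $X \sim$ Uniform(0,1) be a random coin bias and let $A$ be the event that one flip of the coin shows heads, so $P(A \mid X = x) = x$. Then
$$P(A) = \int_{0}^{1} x \cdot 1\,dx = \frac{1}{2}, \qquad f_X(x \mid A) = \frac{x \cdot 1}{1/2} = 2x, \quad 0 \le x \le 1.$$
Observing a head skews the density of the bias toward larger values, exactly as Bayes' theorem should.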

Examples on board


EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 52: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5-3

bull Let p denote the probability of success

bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is

bull The number of ways that k success can occur in n places is

bull Thus the probability of k successes on n trials is

pn(k) = (11)

Examples

EX Toss a fair coin 3 times What is the probability of exactly 3 heads

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly

L5-4

bull When n is large the probability in () can be difficult to compute as(

nk

)gets very large at

the same time that one or both of the other terms gets very small

For large n we can approximate the binomial probabilities using either

bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place

bull Gaussian approximation applies for large n and k See Section 111 and below

bull Let

b(k n p) =

(n

k

)pk(1minus p)nminusk (12)

bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials

bull An equivalent formulation is the following

ndash Consider sequence of variables Xi

ndash For each Bernoulli trial i let Xi = 1

ndash Let Sn =sumn

i=1 Xi

Then b(k n p) is the probability that Sn = k

bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable

bull The Gaussian random variable is characterized by a density function

φ(x) =1radic2π

exp

[minusx2

2

](13)

and distribution function

Φ(x) =1radic2π

int x

minusinfinexp

[minust2

2

]dt (14)

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs; x versus f_X(x) for λ = 0.25, 1, 4]

[Figure: Exponential CDFs; x versus F_X(x) for λ = 0.25, 1, 4]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (and npq is not too small), then with q = 1 − p,

$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{ \frac{-(k - np)^2}{2npq} \right\}$$

• Let σ² = npq and µ = np; then

$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(k - \mu)^2}{2\sigma^2} \right\}$$

DEFN: A Gaussian random variable X has density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(x - \mu)^2}{2\sigma^2} \right\}$$

with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the (suitably normalized) sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ \frac{-(t - \mu)^2}{2\sigma^2} \right\} dt,$$

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs; x versus f_X(x) for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]

[Figure: Gaussian cdfs; x versus F_X(x) for the same three parameter pairs]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-t^2}{2} \right\} dt$$

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{ \frac{-x^2}{2 \sin^2 \phi} \right\} d\phi$$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
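MATLAB has no built-in Q(x), but one can be formed from the built-in erfc function through the standard identity Q(x) = ½ erfc(x/√2) (the identity is standard, though not stated in the notes):

% Q-function built from the complementary error function
Q = @(x) 0.5 * erfc(x / sqrt(2));
Q(0)    % = 0.5
Q(1)    % approximately 0.1587, matching the Q-function table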

• The CDF for a Gaussian RV with mean µ and variance σ² is

$$F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) = 1 - Q\left( \frac{x - \mu}{\sigma} \right)$$

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

$$P(a < X \leq b) = P(X > a) - P(X > b) = Q\left( \frac{a - \mu}{\sigma} \right) - Q\left( \frac{b - \mu}{\sigma} \right)$$
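As a numerical check of this identity (a sketch; µ = 10 and σ² = 25 are borrowed from the Gaussian figure above, and the endpoints a, b are arbitrary):

% P(a < X <= b) for a Gaussian RV via the Q-function
Q = @(x) 0.5 * erfc(x / sqrt(2));
mu = 10; sigma = 5;     % sigma = sqrt(25)
a = 5; b = 15;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma)    % = Q(-1) - Q(1), about 0.6827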

EEL 5544 Lecture 8

Motivation

The Gaussian random variable is one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
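The answers can be checked numerically: by symmetry, P[|X − µ| > kσ] = 2Q(k), which is one line in MATLAB with the erfc-based Q-function (a sketch, not part of the original notes):

Q = @(x) 0.5 * erfc(x / sqrt(2));
2 * Q(1:5)    % approximately 0.317, 4.55e-2, 2.70e-3, 6.3e-5, 5.7e-7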


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x versus f_X(x)]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, ..., n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

$$Y = \sum_{i=1}^{n} X_i^2 \tag{18}$$

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

$$f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2} \Gamma(n/2)} \, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \geq 0 \\ 0, & y < 0 \end{cases}$$

where Γ(p) is the gamma function, defined as

$$\Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t} \, dt, \quad p > 0$$
$$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0$$
$$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$$

• Note that for n = 2,

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2} \, e^{-y/(2\sigma^2)}, & y \geq 0 \\ 0, & y < 0 \end{cases}$$

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
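This is easy to see empirically in MATLAB (a sketch; randn generates zero-mean, unit-variance Gaussian samples):

% Sum of squares of two independent zero-mean Gaussians is exponential
N = 100000; sigma = 1;
y = (sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2;
hist(y, 100)    % shape matches an exponential density; mean(y) is near 2*sigma^2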

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n degrees of freedom, σ² = 1; x versus f_X(x) for n = 1, 2, 4, 8]

• For an even number of degrees of freedom, n = 2m, the CDF for a central chi-square RV can be found through repeated integration by parts to be

$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^k, & y \geq 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2} \, e^{-r^2/(2\sigma^2)}, & r \geq 0 \\ 0, & r < 0 \end{cases}$$

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1); r versus f_R(r)]

• The CDF for a Rayleigh RV is

$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \geq 0 \\ 0, & r < 0 \end{cases}$$

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
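In MATLAB, Rayleigh samples can be generated exactly as described, as the magnitude of a complex Gaussian (a sketch with σ² = 1 per component, matching the figure above):

% Rayleigh samples as the magnitude of a complex Gaussian
N = 100000;
r = abs(randn(1, N) + 1j*randn(1, N));
hist(r, 100)    % compare the shape to the Rayleigh pdf above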

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell \sqrt{2\pi\sigma^2}} \exp\left\{ \dfrac{-(\ln \ell - \mu)^2}{2\sigma^2} \right\}, & \ell \geq 0 \\ 0, & \ell < 0 \end{cases}$$

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

$$F_X(x \mid A) = \frac{P(\{X \leq x\} \cap A)}{P(A)}$$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

$$f_X(x \mid A) = \frac{d}{dx} F_X(x \mid A)$$

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
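A sketch of the computation for part 1, with A = {X > 2}, using only the definition above and the exponential CDF (part 2 then exhibits the well-known memorylessness of the exponential):

$$F_X(x \mid X > 2) = \frac{P(\{X \leq x\} \cap \{X > 2\})}{P(X > 2)} = \begin{cases} 0, & x \leq 2 \\ \dfrac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, & x > 2 \end{cases}$$

so, given {X > 2}, the remaining wait time X − 2 is again exponential with λ = 1.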

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., Aₙ form a partition of Ω, then

$$F_X(x) = F_X(x \mid A_1) P(A_1) + F_X(x \mid A_2) P(A_2) + \cdots + F_X(x \mid A_n) P(A_n)$$

and

$$f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i) P(A_i)$$

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, take a limit:

$$\begin{aligned}
P(A \mid X = x) &= \lim_{\Delta x \to 0} P(A \mid x < X \leq x + \Delta x) \\
&= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)} \, P(A) \\
&= \lim_{\Delta x \to 0} \frac{\bigl[ F_X(x + \Delta x \mid A) - F_X(x \mid A) \bigr] / \Delta x}{\bigl[ F_X(x + \Delta x) - F_X(x) \bigr] / \Delta x} \, P(A) \\
&= \frac{f_X(x \mid A)}{f_X(x)} \, P(A)
\end{aligned}$$

if f_X(x|A) and f_X(x) exist.

Implication of Point Conditioning

• Note that

$$\begin{aligned}
P(A \mid X = x) &= \frac{f_X(x \mid A)}{f_X(x)} \, P(A) \\
\Rightarrow \; P(A \mid X = x) f_X(x) &= f_X(x \mid A) P(A) \\
\Rightarrow \; \int_{-\infty}^{\infty} P(A \mid X = x) f_X(x) \, dx &= \int_{-\infty}^{\infty} f_X(x \mid A) \, dx \; P(A) \\
\Rightarrow \; P(A) &= \int_{-\infty}^{\infty} P(A \mid X = x) f_X(x) \, dx
\end{aligned}$$

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

$$f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)} \, f_X(x) = \frac{P(A \mid X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t) f_X(t) \, dt}$$
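As a quick illustration of the continuous version of Bayes' theorem (a standard example, not from the notes): let X be uniform on [0, 1] (a random coin bias), and let A be the event that a single flip of that coin comes up heads, so P(A | X = x) = x. Then

$$f_X(x \mid A) = \frac{x \cdot 1}{\int_0^1 t \, dt} = 2x, \quad 0 \leq x \leq 1,$$

so observing a head shifts the density toward larger values of the bias x.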

Examples on board


• When n is large, the probability in () can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \tag{12}$$

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xᵢ.

– For each Bernoulli trial i, let Xᵢ = 1 if the trial is a success, and Xᵢ = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that Sₙ = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[ \frac{-x^2}{2} \right] \tag{13}$$

and distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[ \frac{-t^2}{2} \right] dt \tag{14}$$

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 54: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5-5

bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by

Q(x) = 1minus Φ(x) (15)

=1radic2π

int infin

x

exp

[minust2

2

]dt (16)

bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)

bull By applying the Law of Large Numbers it can be shown that for large n

b(k n p) asymp 1radic

npqφ

(k minus npradic

npq

)(17)

bull Note that () uses the density function φ not the distribution function Φ

bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities

bull For fixed integers α and β

P [α le Sn le β] asymp Φ

[β minus np + 05

radicnpq

]minus Φ

[αminus npminus 05

radicnpq

]Q

[αminus npminus 05

radicnpq

]minusQ

[β minus np + 05

radicnpq

]

bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion

bull I will give you a Q-function table to use with each exam

GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required

p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap

mth trial is success)

L5-6

Then

p(m) =

Also P ( trials gt k) =

EX

A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions

THE POISSON LAW

Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this

bull How should we proceed What should we assume

bull Letrsquos start with a binomial model

bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05

bull Then an average of 30 calls come in per hour with a maximum of 60 callshour

bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour

bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)],  ℓ ≥ 0
         { 0,                                        ℓ < 0
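A minimal MATLAB sketch (assumed parameters μ = 0, σ² = 1) showing that exponentiating a Gaussian produces this density:

mu = 0; sigma2 = 1; N = 1e6;
x = mu + sqrt(sigma2)*randn(1,N);   % Gaussian, e.g., shadowing loss in dB
L = exp(x);                         % lognormal samples
hist(L,200)                         % heavy right tail, support on (0, infinity)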

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
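One way to set up the first part (a sketch; the full answers are worked on the board): with A = {X > 2}, for x > 2,

F_X(x | X > 2) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = (e^{−2} − e^{−x}) / e^{−2} = 1 − e^{−(x−2)},

and F_X(x | X > 2) = 0 for x ≤ 2. The remaining wait time X − 2 therefore has CDF 1 − e^{−r}, r ≥ 0: again exponential with λ = 1, which is the memoryless property of the exponential RV.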

TOTAL PROBABILITY AND BAYESrsquo THEOREM

• If A₁, A₂, ..., Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ··· + F_X(x|Aₙ)P(Aₙ)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

           = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)

           = [f_X(x|A) / f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [ ∫_{−∞}^{∞} f_X(x|A) dx ] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board
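As one concrete (hypothetical) illustration of the continuous Law of Total Probability, a minimal MATLAB sketch: take X ~ N(0,1) and A = {X + N > 1} with N ~ N(0,1) independent of X, so that P(A|X = x) = Q(1 − x); averaging P(A|X = x) over f_X(x) should agree with a direct simulation of P(A):

Nsamp = 1e6;
x = randn(1,Nsamp); n = randn(1,Nsamp);
pA_direct = mean(x + n > 1)            % direct Monte Carlo estimate of P(A)
Qf = @(t) 0.5*erfc(t/sqrt(2));         % Q-function via erfc (assumed identity)
pA_totprob = mean(Qf(1 - x))           % estimate of integral of P(A|X=x) f_X(x) dx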

Page 56: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5-7

bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120

bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n

bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero

bull For finite k the probability of k arrivals is a Binomial probability(n

k

)pk(1minus p)nminusk

bull If we let np = α be a constant then for large n(n

k

)pk(1minus p)nminusk asymp 1

kαk(1minus α

n

)nminusk

bull Then

limnrarrinfin

1

kαk(1minus α

n

)nminusk

=αk

keminusα

which is the Poisson law

bull The Poisson law gives the probability for events that occur randomly in space or time

bull Examples are

ndash calls coming in to a switching center

ndash packets arriving at a queue in a network

ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross

ndash of misprints on a group of pages in a book

ndash of people in a community that live to be 100 years old

ndash of wrong telephone numbers that are dialed in a day

ndash of α-particles discharged in a fixed period of time from some radioactive material

ndash of earthquakes per year

ndash of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 57: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L5-8

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval

bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf (left) and Poisson(3) pmf (right); x vs. P(X = x).]

L7-9

• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF (left) and Poisson(3) CDF (right); x vs. F_X(x).]

• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x vs. P(X = x).]

IMPORTANT CONTINUOUS RVS

1. Uniform RV — covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

f_X(x) =
    0,            x < 0
    λ e^(−λx),    x ≥ 0

• Distribution function:

F_X(x) =
    0,               x < 0
    1 − e^(−λx),     x ≥ 0

• The book’s definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
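Inverting the distribution function gives a standard way to generate exponential samples from uniform ones; a MATLAB sketch (λ = 2 chosen for illustration):

% Sketch: exponential samples by inverting F_X(x) = 1 - exp(-lambda*x)
lambda = 2; N = 100000;
u = rand(1, N);
v = -log(u) / lambda;    % V = -ln(U)/lambda is exponential(lambda)
mean(v)                  % should be near 1/lambda = 0.5
mean(v > 1)              % should be near exp(-lambda) = 0.135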

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs (left) and CDFs (right) for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x).]

3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre–Laplace Theorem: If n is large (with npq ≫ 1), then with q = 1 − p,

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp( −(k − np)² / (2npq) )

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp( −(k − µ)² / (2σ²) )

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp( −(x − µ)² / (2σ²) )

with parameters (mean) µ and (variance) σ² > 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp( −(t − µ)² / (2σ²) ) dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs (left) and cdfs (right) for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x).]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp( −x² / (2 sin²φ) ) dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
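Base MATLAB has no built-in Q(x), but it is a one-liner in terms of erfc, since Q(x) = (1/2) erfc(x/√2); the sketch below also checks the finite-range form numerically at one point:

% Sketch: Q-function via erfc, checked against the finite-range form
Q = @(x) 0.5*erfc(x/sqrt(2));
x = 1.5;
craig = quad(@(phi) exp(-x^2 ./ (2*sin(phi).^2)) / pi, 1e-12, pi/2);
[Q(x) craig]     % both approximately 0.0668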

• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ( (x − µ)/σ ) = 1 − Q( (x − µ)/σ )

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q( (a − µ)/σ ) − Q( (b − µ)/σ )
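A quick MATLAB sketch of this identity (µ = 5 and σ² = 1, matching one of the plotted examples, with the illustrative interval (4, 7]):

% Sketch: P(4 < X <= 7) for a Gaussian with mu = 5, sigma = 1
Q = @(x) 0.5*erfc(x/sqrt(2));
mu = 5; sigma = 1; a = 4; b = 7;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma)   % Q(-1) - Q(2) = 0.8186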

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
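These follow from symmetry, P[X < µ] = 1/2, and from P[|X − µ| > kσ] = 2Q(k); a MATLAB sketch to reproduce them:

% Sketch: P[|X - mu| > k*sigma] = 2*Q(k) for k = 1..5
Q = @(x) 0.5*erfc(x/sqrt(2));
k = 1:5;
2*Q(k)     % approx 0.317, 4.55e-2, 2.70e-3, 6.3e-5, 5.7e-7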

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,

where c > 0.
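One standard construction (a known fact, though not stated in the notes): the difference of two independent exponential(c) RVs is Laplacian with parameter c. A MATLAB sketch with c = 2, matching the plot below:

% Sketch: Laplacian(c) samples as a difference of two independent exponentials
c = 2; N = 100000;
x = (-log(rand(1, N)) + log(rand(1, N))) / c;
mean(abs(x))     % should be near 1/c
hist(x, 100)     % symmetric and peaked at 0, like the pdf below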

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x).]

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, . . . , n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²     (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) =
    ( 1 / (σ^n 2^(n/2) Γ(n/2)) ) y^((n/2)−1) e^(−y/(2σ²)),    y ≥ 0
    0,                                                         y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0

Γ(p) = (p − 1)!, p an integer > 0

Γ(1/2) = √π

Γ(3/2) = (1/2)√π

• Note that for n = 2,

f_Y(y) =
    (1/(2σ²)) e^(−y/(2σ²)),    y ≥ 0
    0,                         y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
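A MATLAB sketch of the n = 2 case (σ² = 1 for illustration), checking the exponential form and the positive mean by simulation:

% Sketch: sum of squares of two zero-mean unit-variance Gaussians
N = 100000; sigma = 1;
Y = (sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2;
mean(Y)        % near n*sigma^2 = 2, the chi-square mean for n = 2
mean(Y > 3)    % near exp(-3/(2*sigma^2)) = 0.223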

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x).]

• The CDF for a central chi-square RV can be found through repeated integration by parts to be (for an even number of degrees of freedom, n = 2m)

F_Y(y) =
    1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) ( y/(2σ²) )^k,    y ≥ 0
    0,                                                      y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) =
    (r/σ²) e^(−r²/(2σ²)),    r ≥ 0
    0,                       r < 0
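A MATLAB sketch generating Rayleigh samples this way (σ² = 1, matching the plot below):

% Sketch: Rayleigh samples as the root-sum-square of two s.i. Gaussians
N = 100000; sigma = 1;
R = sqrt((sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2);
mean(R)         % near sigma*sqrt(pi/2) = 1.2533
mean(R <= 1)    % near 1 - exp(-1/(2*sigma^2)) = 0.3935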

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r).]

• The CDF for a Rayleigh RV is

F_R(r) =
    1 − e^(−r²/(2σ²)),    r ≥ 0
    0,                    r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) =
    ( 1 / (ℓ√(2πσ²)) ) exp( −(ln ℓ − µ)² / (2σ²) ),    ℓ ≥ 0
    0,                                                 ℓ < 0
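A MATLAB sketch that generates lognormal samples directly from the definition (illustrative parameters µ = 0, σ = 1):

% Sketch: lognormal samples as L = exp(X), with X Gaussian(mu, sigma^2)
mu = 0; sigma = 1; N = 100000;
L = exp(mu + sigma*randn(1, N));
mean(log(L))    % near mu, since ln(L) is Gaussian again
mean(L <= 1)    % near 0.5 here, since P[L <= 1] = P[X <= 0] = 1/2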

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω).

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX:

Suppose that the waiting time in minutes at Wendy’s in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
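A simulation sketch of this example: conditioning the exponential(1) wait on {X > 2} leaves the remaining wait exponential(1) again (the memoryless property), which is what the analysis on the board shows:

% Sketch: condition an exponential(1) wait on having waited past 2 minutes
N = 1000000;
x = -log(rand(1, N));    % exponential samples, lambda = 1
rem = x(x > 2) - 2;      % remaining wait, given X > 2
mean(rem)                % near 1, the unconditional mean
mean(rem > 1)            % near exp(-1) = 0.368: memoryless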

TOTAL PROBABILITY AND BAYES’ THEOREM

• If A_1, A_2, . . . , A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + · · · + F_X(x|A_n)P(A_n)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [ F_X(x + Δx|A) − F_X(x|A) ] / [ F_X(x + Δx) − F_X(x) ] · P(A)

           = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)

           = [ f_X(x|A) / f_X(x) ] P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

P(A|X = x) = [ f_X(x|A) / f_X(x) ] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes’ Theorem:

f_X(x|A) = [ P(A|X = x) / P(A) ] f_X(x)

         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
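A numeric MATLAB sketch of both results, under an assumed toy model (not from the notes): X is uniform on [0, 1] and A occurs with probability P(A|X = x) = x, e.g., a coin whose bias is itself random:

% Sketch: continuous total probability and Bayes with P(A|X=x) = x
PA = quad(@(x) x .* 1, 0, 1)     % P(A) = integral of x*f_X(x) dx = 1/2
x0 = 0.25;
post = (x0 * 1) / PA             % f_X(x0|A) = P(A|X=x0) f_X(x0)/P(A) = 2*x0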

Examples on board

Page 58: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-1

EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time

Read Sections 21ndash23 in the textbook

Try to answer the following question The answers are shown so that you can check your work

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

Ans

FZ(z) =

0 z le 0z2

2b2 0 lt z le b

b2minus (2bminusz)2

2

b2 b lt z lt 2b

1 z ge 2b

4 Specify the type (discrete continuous mixed) of Y (continuous)

RANDOM VARIABLES (RVS)

bull What is a random variable

bull We define a random variable is defined on a probability space (ΩF P ) as afrom to

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 59: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-2

EX

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 60: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-3

bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω

bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F

bull We will only assign probabilities to Borel sets of the real line

DEFN

A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB

∆= ξ isin Ω X(ξ) isin

B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0

bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]

bull Thus it is convenient to define a function that assigns probabilities to sets of this form

DEFN

If (ΩF P ) is a prob space with X(ω) a real RV on Ω the

( ) denoted is

bull FX(x) is a prob measure

bull Properties of FX

1 0 le FX(x) le 1

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

[Figure: Poisson(20) pmf; x vs. P(X = x)]

IMPORTANT CONTINUOUS RVS

1 Uniform RV (covered in an earlier example)

2 Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:
f_X(x) = λe^(−λx) for x ≥ 0, and 0 for x < 0

• Distribution function:
F_X(x) = 1 − e^(−λx) for x ≥ 0, and 0 for x < 0

• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV; see also the generation sketch at the end of this subsection.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]
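Since the CDF above is invertible on x ≥ 0, exponential samples can be generated from uniform ones by inverting the CDF. A minimal MATLAB sketch (λ = 4 is an arbitrary choice matching one of the curves):

lambda = 4;
u = rand(1,1e6);
x = -log(u)/lambda;               % inverse CDF: solve u = 1 - exp(-lambda*x); note 1-u is also uniform
[mean(x)  1/lambda]               % sample mean vs. theoretical mean
[mean(x > 0.5)  exp(-lambda*0.5)] % tail probability vs. 1 - F_X(0.5)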

3 Gaussian RV

• For large n (binomial) or large α (Poisson), the pmfs have a bell shape.
• Consider the binomial:

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp(−(k − np)²/(2npq))

• Let σ² = npq and μ = np; then
P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp(−(k − μ)²/(2σ²))
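A quick MATLAB check of this approximation (n = 100, p = 0.2, matching the earlier bell-shaped pmf plot; gammaln is used to evaluate the binomial terms without overflow):

n = 100; p = 0.2; q = 1-p; k = 10:30;   % k values around np = 20
% exact binomial terms via gammaln, i.e. exp(log C(n,k) + k log p + (n-k) log q)
exact = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k*log(p) + (n-k)*log(q));
approx = exp(-(k - n*p).^2 ./ (2*n*p*q)) / sqrt(2*pi*n*p*q);
max(abs(exact - approx))                % small absolute error across the bell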

L7-11

DEFN

A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞,
with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the (suitably normalized) sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp(−(t − μ)²/(2σ²)) dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt

L7-12

– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp(−x²/(2 sin²φ)) dφ, x ≥ 0.
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
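A quick MATLAB comparison of this finite-range form against the erfc-based evaluation Q(x) = erfc(x/√2)/2 (both use only built-in functions):

x = 1.5;
Q_erfc = 0.5*erfc(x/sqrt(2));
Q_craig = integral(@(phi) exp(-x^2 ./ (2*sin(phi).^2))/pi, 0, pi/2);
[Q_erfc Q_craig]                  % both approximately 0.0668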

• The CDF for a Gaussian RV with mean μ and variance σ² is
F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ)
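For example (arbitrary illustrative numbers), with μ = 5, σ = 1, a = 4, b = 6 this gives Q(−1) − Q(1) ≈ 0.683. A MATLAB sketch with a simulation cross-check:

mu = 5; sigma = 1; a = 4; b = 6;
Q = @(x) 0.5*erfc(x/sqrt(2));            % Q-function via built-in erfc
p = Q((a-mu)/sigma) - Q((b-mu)/sigma)    % P(4 < X <= 6) ~ 0.6827
X = mu + sigma*randn(1,1e6);
mean(X > a & X <= b)                     % simulation agrees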

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5


Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.73 × 10⁻⁷, respectively.
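The part (b) answers can be reproduced numerically, since P[|X − μ| > kσ] = 2Q(k) regardless of μ and σ² (a one-line MATLAB check):

k = 1:5;
Q = @(x) 0.5*erfc(x/sqrt(2));     % Q-function via built-in erfc
2*Q(k)                            % 0.3173, 4.55e-2, 2.70e-3, 6.34e-5, 5.73e-7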

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]

5 Chi-square RV

• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, …, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xᵢ²   (18)
is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by
f_Y(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,

where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0;
Γ(p) = (p − 1)! for p a positive integer;
Γ(1/2) = √π; Γ(3/2) = (1/2)√π.

• Note that for n = 2,
f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,
which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
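A short MATLAB simulation of the n = 2 case (σ² = 1; the threshold y0 = 3 is an arbitrary illustrative choice): the sum of squares of two independent zero-mean Gaussians should have the exponential tail given above:

sigma = 1; N = 1e6;
Y = (sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2;   % n = 2 central chi-square
y0 = 3;
[mean(Y > y0)  exp(-y0/(2*sigma^2))]                 % empirical vs. exponential tail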

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]

• The CDF for a central chi-square RV can be found through repeated integration by parts; for an even number of degrees of freedom n = 2m, it is
F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0, and 0 for y < 0.

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.

• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0.

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]

• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
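A minimal MATLAB sketch of that modeling statement (σ² = 1 and the threshold r0 = 1 are illustrative choices): the magnitude of a zero-mean complex Gaussian sample with independent real and imaginary parts is Rayleigh:

sigma = 1; N = 1e6;
Z = sigma*randn(1,N) + 1i*sigma*randn(1,N);   % complex Gaussian samples
R = abs(Z);                                   % amplitude
r0 = 1;
[mean(R > r0)  exp(-r0^2/(2*sigma^2))]        % both near exp(-1/2) ~ 0.6065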

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
f_L(ℓ) = [1/(ℓ√(2πσ²))] exp(−(ln ℓ − μ)²/(2σ²)) for ℓ > 0, and 0 otherwise.

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A) / P(A)

DEFN

For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX

Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time
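For intuition on part 2 (a numerical sketch, not a substitute for the derivation): by the memoryless property of the exponential, the remaining wait given two minutes already waited should again be exponential with λ = 1:

lambda = 1; N = 1e6;
T = -log(rand(1,N))/lambda;          % exponential(1) total wait times
rem = T(T > 2) - 2;                  % remaining wait, conditioned on T > 2
[mean(rem > 1)  exp(-lambda*1)]      % both ~ exp(-1): the memoryless property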

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then
F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ⋯ + F_X(x|Aₙ)P(Aₙ)
and
f_X(x) = Σ_{i=1}^{n} f_X(x|Aᵢ)P(Aᵢ)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
= lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] } · P(A)
= lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)
= [f_X(x|A) / f_X(x)] · P(A),
if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that
P(A|X = x) = [f_X(x|A) / f_X(x)] · P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x) / P(A)] f_X(x) = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board
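One such example, sketched numerically (a hypothetical setup for illustration, not from the notes): let X be uniform on [0, 1] and let A be the event that a coin with bias X comes up heads, so P(A|X = x) = x; the continuous Bayes' theorem then gives f_X(x|A) = 2x:

fX = @(x) double(x >= 0 & x <= 1);                % uniform prior density on [0,1]
PA_given_x = @(x) x;                              % P(A | X = x) = x (hypothetical model)
PA = integral(@(t) PA_given_x(t).*fX(t), 0, 1);   % law of total probability: 1/2
x0 = 0.7;
fX_given_A = PA_given_x(x0)*fX(x0)/PA             % = 2*x0 = 1.4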

Page 61: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-4

2 FX(minusinfin) = 0 and FX(infin) = 1

3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b

4 P (a lt X le b) = FX(b)minus FX(a)

5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)

(The value at a jump discontinuity is the value after the jump)

Pf omitted

bull If FX(x) is continuous function of x thenF (x) = F (xminus)

bull If FX(x) is not a continuous function of x then from above

FX(x)minus FX(xminus) = P [xminus lt X le x]

= limεrarr0

P [xminus ε lt X le x]

= P [X = x]

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 62: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-5

EX Examples

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 63: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L6-6

EX

A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands

1 Describe the sample space of Z

2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin

3 Find and plot FZ(z)

4 Specify the type (discrete continuous mixed) of Y (continuous)

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0


• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n degrees of freedom, σ² = 1; curves for n = 1, 2, 4, and 8.]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \dfrac{1}{k!} \left( \dfrac{y}{2\sigma^2} \right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
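The closed-form central CDF above is easy to sanity-check against a Monte Carlo estimate built directly from definition (18). A sketch, assuming n = 2m = 4 degrees of freedom and σ² = 1 (illustrative choices):

% Monte Carlo check of the central chi-square CDF (even n = 2m)
sigma = 1; m = 2; n = 2*m; N = 100000;
X = sigma * randn(n, N);    % n s.i. zero-mean Gaussians per column
Y = sum(X.^2, 1);           % central chi-square samples with n DOF
y = 5;                      % evaluation point (assumed)
Fhat = mean(Y <= y);        % empirical CDF
k = 0:m-1;
F = 1 - exp(-y/(2*sigma^2)) * sum((y/(2*sigma^2)).^k ./ factorial(k));
[Fhat, F]                   % the two numbers should agree closely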

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = \begin{cases} \dfrac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}


• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, f_R(r) versus r, for σ² = 1.]

• The CDF for a Rayleigh RV is

F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
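A short simulation makes that last point concrete. A sketch, assuming σ² = 1, comparing the empirical CDF of the magnitude of complex Gaussian samples with the Rayleigh CDF above:

% Magnitude of a zero-mean complex Gaussian sample is Rayleigh
sigma = 1; N = 100000;
z = sigma*randn(1, N) + 1j*sigma*randn(1, N);   % s.i. real and imaginary parts
r = abs(z);                                     % Rayleigh samples
r0 = 2;                                         % assumed test point
[mean(r <= r0), 1 - exp(-r0^2/(2*sigma^2))]     % empirical vs. analytical CDF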

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(\ell) = \begin{cases} \dfrac{1}{\ell \sqrt{2\pi\sigma^2}} \exp\left( -\dfrac{(\ln \ell - \mu)^2}{2\sigma^2} \right), & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
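The decibel remark connects directly to the definition: if the loss in dB is Gaussian, the linear-scale loss is lognormal, since ln L = (ln 10 / 10) X_dB is a scaled Gaussian. A sketch, assuming an 8 dB shadowing standard deviation (an illustrative number):

% Lognormal shadowing: Gaussian loss in dB -> lognormal on a linear scale
sdB = 8;                         % assumed dB standard deviation
XdB = sdB * randn(1, 100000);    % shadowing loss in dB
L = 10.^(XdB / 10);              % lognormal samples, since ln L is Gaussian
hist(log(L), 100)                % ln L should look Gaussian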


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
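A sketch of the computations, using the conditional-distribution definition above with λ = 1 and A = {X > 2}:

F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{F_X(x) - F_X(2)}{1 - F_X(2)} = 1 - e^{-(x-2)}, \qquad x > 2

(and 0 for x ≤ 2). For the remaining wait time W = X − 2, given X > 2,

F_W(w) = 1 - e^{-w}, \qquad w \ge 0,

i.e., again exponential with λ = 1: the exponential RV is memoryless.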

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, ..., A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)

and

f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)

= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)} \, P(A)

= \lim_{\Delta x \to 0} \frac{\left[ F_X(x + \Delta x \mid A) - F_X(x \mid A) \right] / \Delta x}{\left[ F_X(x + \Delta x) - F_X(x) \right] / \Delta x} \, P(A)

= \frac{f_X(x|A)}{f_X(x)} \, P(A)

if f_X(x|A) and f_X(x) exist.


Implication of Point Conditioning

• Note that

P(A|X = x) = \frac{f_X(x|A)}{f_X(x)} \, P(A)

\Rightarrow P(A|X = x) \, f_X(x) = f_X(x|A) \, P(A)

\Rightarrow \int_{-\infty}^{\infty} P(A|X = x) f_X(x) \, dx = \int_{-\infty}^{\infty} f_X(x|A) \, dx \; P(A)

\Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x) \, dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = \frac{P(A|X = x)}{P(A)} \, f_X(x) = \frac{P(A|X = x) \, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t) \, dt}

Examples on board
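One such example, sketched here in MATLAB (the model and numbers are assumptions for illustration, tying back to the signal-plus-noise motivation): let A = {S = +1} with P(A) = 1/2, and observe X = S + N with S = ±1 and N ~ N(0, σ²). Then f_X(x|A) is Gaussian with mean +1, and point conditioning gives the posterior probability of the signal.

% Posterior P(S = +1 | X = x) for a binary signal in Gaussian noise
p = 0.5; sigma = 1;                       % assumed prior and noise std
f = @(x, s) exp(-(x - s).^2/(2*sigma^2)) / sqrt(2*pi*sigma^2);  % f_X(x|S=s)
x = 0.8;                                  % assumed observed value
fx = f(x, +1)*p + f(x, -1)*(1 - p);       % law of total probability: f_X(x)
PA_given_x = f(x, +1)*p / fx              % P(A | X = x) = f_X(x|A)P(A)/f_X(x)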

Page 64: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-1

EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)

Discrete random variables can be characterized through their distribution and probability massfunctions

The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables

Read Sections 23ndash25 in the textbook

MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)

EX Try the following in MATLAB

Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)

Check that the density is uniform by plotting a histogramhist(u100)

Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)

Check the density by plotting a histogramhist(u100)

Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 65: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-2

Now letrsquos do one more transformationw=sqrt(v)

Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created

This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions

PROBABILITY DENSITY FUNCTION

DEFN

The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of

bull Properties

1 fX(x) ge 0minusinfin lt x lt infin

2 FX(x) =int x

minusinfin fX(t)dt

3int +infinminusinfin fX(x)dx = 1

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 66: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-3

4 P (a lt X le b) =int b

afX(x)dx a le b

5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin

minusinfing(x)dx = cminusinfin lt c lt +infin

then fX(x) = g(x)c

is a valid pdf

Pf omitted

bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0

bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely

DEFN

If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a

bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned

to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable

Uniform Continuous Random Variables

DEFN

For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by

fX(x) =

0 x lt a

1bminusa

a le x le b

0 x gt b

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 67: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-4

DEFN

A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function

PROBABILITY MASS FUNCTION

DEFN

For a discrete RV the probability mass function (pmf) is

EX Roll a fair 6-sided dieX= on top face

P (X = x) =

16 x = 1 2 6

0 ow

EX Flip a fair coin until heads occursX = of flips

P (X = x) =

(12

)x x = 1 2

0 ow

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
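These answers follow from symmetry (P[X < μ] = 1/2) and from P[|X − μ| > kσ] = 2Q(k); a one-line MATLAB check, using 2Q(k) = erfc(k/√2):

```matlab
% Sketch: P[|X - mu| > k*sigma] = 2*Q(k) = erfc(k/sqrt(2))
k = 1:5;
p = erfc(k/sqrt(2))   % 0.3173, 0.0455, 2.70e-3, 6.33e-5, 5.73e-7 (matches up to rounding)
```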


OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp(-c|x|), \qquad -\infty < x < \infty,$$
where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: "Laplacian pdf" (f_X(x) vs. x) for c = 2]
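One way to see this shape numerically (a sketch, not from the notes): the difference of two independent exponential RVs with parameter c is Laplacian with parameter c, so a histogram of such differences should track the density above.

```matlab
% Sketch: Laplacian(c) as the difference of two independent Exp(c) RVs
c = 2; N = 1e5;
e1 = -log(rand(N,1))/c;  e2 = -log(rand(N,1))/c;   % Exp(c) via inverse transform
x = e1 - e2;
histogram(x, 'Normalization', 'pdf'); hold on;
t = linspace(-3, 3, 200);
plot(t, (c/2)*exp(-c*abs(t)), 'LineWidth', 2);      % f_X(x) = (c/2) e^{-c|x|}
```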

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.


• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$
is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2}\,\Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0; \qquad \Gamma(p) = (p-1)!, \ p \text{ an integer} > 0; \qquad \Gamma(1/2) = \sqrt{\pi}; \quad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}.$$

• Note that for n = 2,
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.


• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: "chi-square density functions with n degrees of freedom, σ² = 1" for n = 1, 2, 4, and 8]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be (a quick numerical check appears after this list)
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
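A Monte Carlo sanity check of the central chi-square CDF above (a sketch; n = 4, σ = 1, and the evaluation point y = 3 are illustrative choices):

```matlab
% Sketch: empirical CDF of a sum of squared Gaussians vs. the closed form (n = 2m)
n = 4; m = n/2; sigma = 1; N = 1e5;
Y = sum((sigma*randn(n, N)).^2, 1);   % central chi-square with n degrees of freedom
y = 3;                                % evaluation point
k = 0:m-1;
Fclosed = 1 - exp(-y/(2*sigma^2)) * sum((y/(2*sigma^2)).^k ./ factorial(k));
Femp = mean(Y <= y);
[Fclosed, Femp]                       % both roughly 0.44 for these choices
```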

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$


• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: "Rayleigh PDF (σ² = 1)" (f_R(r) vs. r)]

• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
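A short MATLAB illustration of that multipath model (a sketch; σ = 1 is assumed):

```matlab
% Sketch: magnitude of a complex Gaussian (independent real/imag parts) is Rayleigh
sigma = 1; N = 1e5;
z = sigma*randn(N,1) + 1j*sigma*randn(N,1);
r = abs(z);
histogram(r, 'Normalization', 'pdf'); hold on;
t = linspace(0, 5, 200);
plot(t, (t/sigma^2).*exp(-t.^2/(2*sigma^2)), 'LineWidth', 2);   % Rayleigh pdf
```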

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[-\dfrac{(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell > 0 \\ 0, & \ell \le 0 \end{cases}$$
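A brief MATLAB check that exponentiating a Gaussian produces this density (a sketch; μ = 0 and σ = 0.5 are illustrative):

```matlab
% Sketch: L = exp(X) with X ~ N(mu, sigma^2), compared against the lognormal pdf
mu = 0; sigma = 0.5; N = 1e5;
L = exp(mu + sigma*randn(N,1));
histogram(L, 'Normalization', 'pdf'); hold on;
l = linspace(0.01, 4, 300);
plot(l, exp(-(log(l) - mu).^2/(2*sigma^2)) ./ (l*sqrt(2*pi*sigma^2)), 'LineWidth', 2);
```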


CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x|A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}$$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x|A) = \frac{d}{dx} F_X(x|A)$$

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?


2. the conditional distribution for your remaining wait time?
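A worked sketch (not part of the original fill-in notes), with A = {X > 2} and F_X(x) = 1 − e^{−x}:
$$F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \qquad x > 2,$$
and 0 for x ≤ 2. Thus the total wait time, conditioned on having already waited two minutes, is a shifted Exp(1), and the remaining wait time X − 2 is again exponential with λ = 1: the exponential RV is memoryless.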

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then
$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\,P(A_i)$$

Point Conditioning

• Suppose we want to evaluate the probability of an event A given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, take a limit:

$$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x) = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{\bigl[F_X(x + \Delta x \mid A) - F_X(x \mid A)\bigr]/\Delta x}{\bigl[F_X(x + \Delta x) - F_X(x)\bigr]/\Delta x}\, P(A) = \frac{f_X(x|A)}{f_X(x)}\, P(A),$$
if f_X(x|A) and f_X(x) exist.


Implication of Point Conditioning

• Note that
$$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A) \;\Rightarrow\; P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)$$
$$\Rightarrow\; \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \; P(A) \;\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx$$
(Continuous Version of the Law of Total Probability)
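A tiny numerical illustration of this identity (a sketch; the conditional probability P(A|X = x) = e^{−x} and the Exp(1) density for X are hypothetical choices, giving P(A) = ∫₀^∞ e^{−2x} dx = 1/2):

```matlab
% Sketch: continuous law of total probability, P(A) = integral of P(A|X=x) f_X(x) dx
PA_given_x = @(x) exp(-x);    % hypothetical conditional probability
fX = @(x) exp(-x);            % X ~ Exp(1), support x >= 0
PA = integral(@(x) PA_given_x(x).*fX(x), 0, Inf)   % returns 0.5
```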

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\, dt}$$

Examples on board

Page 68: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-5

IMPORTANT RANDOM VARIABLES

Discrete RVs

1 Bernoulli RV

bull An event A isin A is considered a ldquosuccessrdquo

bull A Bernoulli RV X is defined by

X =

1 s isin A

0 s isin A

bull The pmf for a Bernoulli RV X can be found formally as

P (X = 1) = P

(X(s) = 1

)= P

(s|s isin A

)= P (A) p

So

P (X = x) =

p x = 1

1minus p x = 0

0 x 6= 0 1

2 Binomial RV

bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials

bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs

bull Let X= of successesThen the pmf of X is given by

P [X = k] =

(nk

)pk(1minus p)nminusk k = 0 1 n

0 ow

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 69: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-6

bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035Binomial PMF (10 02)

P(X

=x)

x0 1 2 3 4 5 6 7 8 9 10

0

005

01

015

02

025

x

P(X

=x)

Binomial (1005) pmf

bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below

0 1 2 3 4 5 6 7 8 9 1001

02

03

04

05

06

07

08

09

1

11Binomial (1002) CDF

x

F X(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Binomial (1005) CDF

x

F X(x

)

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 70: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-7

bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009

01Binomial(10002)

x

P(X

=x)

3 Geometric RV

bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess

bull X = number of trials requiredThen the pmf or X is given by

P [X = k] =

(1minus p)kminus1p k = 1 2 n

0 ow

bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below

1 2 3 4 5 6 7 8 9 10 110

001

002

003

004

005

006

007

008

009

01Geometric(01) pmf

x

P(X

=x)

1 2 3 4 5 6 7 8 9 10 110

01

02

03

04

05

06

07Geometric(07) pmf

x

P(X

=x)

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 71: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-8

bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below

0 5 10 15 20 25 30 35 40 4501

02

03

04

05

06

07

08

09

1Geometric (01) pmf

x

FX(x

)

1 2 3 4 5 6 7 8 9 10 11065

07

075

08

085

09

095

1Geometric(07) CDF

x

FX(x

)

4 Poisson RV

bull Models events that occur randomly in space or time

bull Let λ= the of events(unit of space or time)

bull Consider observing some period of time or space of length t and let α = λt

bull Let N= the events in time (or space) t

bull Then

P (N = k) =

αk

keminusα k = 0 1

0 ow

bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025

03

035

04

045

05Poisson (075) pmf

x

P(Y

=y)

0 1 2 3 4 5 6 7 8 9 100

005

01

015

02

025Poisson(3) pmf

x

FX(x

)

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 72: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-9

bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below

0 1 2 3 4 5 6 7 8 9 1004

05

06

07

08

09

1Poission(075) CDF

x

FX(x

)

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Poisson(3) CDF

x

FX(x

)

bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20

0 10 20 30 40 50 60 70 80 90 1000

001

002

003

004

005

006

007

008

009Poisson(20) pmf

x

P(X

=x)

IMPORTANT CONTINUOUS RVS

1 Uniform RVcovered in example

2 Exponential RV

bull Characterized by single parameter λ

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

  Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt

L7-12

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be written, for x ≥ 0, as

  Q(x) = (1/π) ∫_0^{π/2} exp(−x² / (2 sin²φ)) dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
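In MATLAB, Q(x) is conveniently computed from the built-in complementary error function, since Q(x) = (1/2) erfc(x/√2). The sketch below also checks the finite-range form numerically (x = 1.5 is an arbitrary test point):

% Q-function via erfc, checked against the finite-range form above
Q = @(x) 0.5*erfc(x/sqrt(2));
x = 1.5;
craig = integral(@(phi) exp(-x^2./(2*sin(phi).^2))/pi, 0, pi/2);
fprintf('Q(%.1f) = %.6f, finite-range form = %.6f\n', x, Q(x), craig);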

• The CDF for a Gaussian RV with mean μ and variance σ² is

  F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:

  P(a < X ≤ b) = P(X > a) − P(X > b)
               = Q((a − μ)/σ) − Q((b − μ)/σ)
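For example (μ, σ, a, b below are arbitrary illustrative values), a quick MATLAB check of the one-σ interval:

% P(a < X <= b) for X ~ Gaussian(mu, sigma^2), via the Q-function
Q = @(x) 0.5*erfc(x/sqrt(2));
mu = 5; sigma = 2; a = 3; b = 7;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma);     % = Q(-1) - Q(1)
fprintf('P(%g < X <= %g) = %.4f\n', a, b, P);  % about 0.6827 (one-sigma rule)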

L8-1

EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question; answers appear at the bottom of the page.⁵

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem

⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
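The part (b) values are the tail probabilities 2Q(k), which can be reproduced with a one-liner in MATLAB:

% P(|X - mu| > k*sigma) = 2*Q(k), independent of mu and sigma
Q = @(x) 0.5*erfc(x/sqrt(2));
disp(2*Q(1:5))   % approx. 0.3173  4.55e-02  2.70e-03  6.33e-05  5.73e-07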

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = (c/2) exp(−c|x|),   −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf f_X(x), c = 2; −3 ≤ x ≤ 3.]
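A minimal MATLAB sketch that draws Laplacian samples by inverting the CDF and overlays the density (c = 2 as in the figure; the sample size is illustrative):

% Laplacian samples via inverse-CDF, compared against f(x) = (c/2)exp(-c|x|)
c = 2; N = 1e5;
u = rand(N,1);
x = sign(u - 0.5) .* (-log(1 - 2*abs(u - 0.5))) / c;
histogram(x, 'Normalization', 'pdf'); hold on;
t = linspace(-3, 3, 400);
plot(t, (c/2)*exp(-c*abs(t)), 'LineWidth', 1.5);
xlabel('x'); ylabel('f_X(x)'); title('Laplacian, c = 2');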

5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

  Y = ∑_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

  f_Y(y) = { (1/(σⁿ 2^{n/2} Γ(n/2))) y^{(n/2)−1} e^{−y/(2σ²)},   y ≥ 0
           { 0,                                                  y < 0

where Γ(p) is the gamma function, defined as

  Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt,   p > 0
  Γ(p) = (p − 1)!,   p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π

• Note that for n = 2,

  f_Y(y) = { (1/(2σ²)) e^{−y/(2σ²)},   y ≥ 0
           { 0,                        y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; f_X(x) vs. x, 0 ≤ x ≤ 14.]

• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number of degrees of freedom, n = 2m)

  F_Y(y) = { 1 − e^{−y/(2σ²)} ∑_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,   y ≥ 0
           { 0,                                                   y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
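A Monte Carlo sanity check of the n = 2 case in MATLAB (σ and the sample size are illustrative):

% Sum of squares of two s.i. zero-mean Gaussians vs. the exponential density
sigma = 1; N = 1e5;
Y = sum((sigma*randn(N,2)).^2, 2);            % central chi-square, n = 2
histogram(Y, 'Normalization', 'pdf'); hold on;
y = linspace(0, 12, 400);
plot(y, exp(-y/(2*sigma^2))/(2*sigma^2), 'LineWidth', 1.5);
xlabel('y'); ylabel('f_Y(y)'); legend('samples', 'exponential density');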

6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

  f_R(r) = { (r/σ²) e^{−r²/(2σ²)},   r ≥ 0
           { 0,                      r < 0

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf f_R(r), σ² = 1; 0 ≤ r ≤ 5.]

• The CDF for a Rayleigh RV is

  F_R(r) = { 1 − e^{−r²/(2σ²)},   r ≥ 0
           { 0,                   r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
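This is easy to see numerically; a minimal MATLAB sketch (σ = 1 and the sample size are illustrative):

% Amplitude of a complex Gaussian (s.i. real/imaginary parts) is Rayleigh
sigma = 1; N = 1e5;
g = sigma*randn(N,1) + 1j*sigma*randn(N,1);   % complex Gaussian samples
r = abs(g);                                   % amplitudes
histogram(r, 'Normalization', 'pdf'); hold on;
t = linspace(0, 5, 400);
plot(t, (t/sigma^2).*exp(-t.^2/(2*sigma^2)), 'LineWidth', 1.5);
xlabel('r'); ylabel('f_R(r)'); title('Rayleigh pdf, \sigma^2 = 1');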

7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = { (1/(ℓ√(2πσ²))) exp(−(ln ℓ − μ)² / (2σ²)),   ℓ ≥ 0
           { 0,                                          ℓ < 0
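A quick MATLAB sketch generating lognormal samples as L = e^X with X Gaussian (μ and σ are arbitrary illustrative values):

% Lognormal samples L = exp(X), X ~ Gaussian(mu, sigma^2), vs. the pdf above
mu = 0; sigma = 0.5; N = 1e5;
L = exp(mu + sigma*randn(N,1));
histogram(L, 'Normalization', 'pdf'); hold on;
l = linspace(0.01, 5, 400);
plot(l, exp(-(log(l) - mu).^2/(2*sigma^2))./(l*sqrt(2*pi*sigma^2)), 'LineWidth', 1.5);
xlabel('\ell'); ylabel('f_L(\ell)');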

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = d/dx F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
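A Monte Carlo hint at the answer (a sketch, not a substitute for the derivation done in class): because the exponential is memoryless, the remaining wait should again be exponential with λ = 1.

% Condition on having waited more than 2 minutes; look at the remaining wait
lambda = 1; N = 1e6;
X = -log(rand(N,1))/lambda;              % exponential samples (inverse CDF)
remWait = X(X > 2) - 2;                  % remaining wait given X > 2
fprintf('mean remaining wait = %.3f (1/lambda = %.3f)\n', mean(remWait), 1/lambda);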

TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then

  F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ⋯ + F_X(x|Aₙ)P(Aₙ)

and

  f_X(x) = ∑_{i=1}^{n} f_X(x|A_i) P(A_i)
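For instance, a two-event partition gives a mixture density; a minimal MATLAB sketch (all parameters illustrative):

% f_X(x) = f_X(x|A1)P(A1) + f_X(x|A2)P(A2) with conditionally Gaussian X
gauss = @(x, mu, s2) exp(-(x - mu).^2/(2*s2))/sqrt(2*pi*s2);
x = linspace(-6, 8, 600);
fx = 0.3*gauss(x, -2, 1) + 0.7*gauss(x, 3, 2);
plot(x, fx); xlabel('x'); ylabel('f_X(x)');
fprintf('total area = %.4f\n', trapz(x, fx));   % should be close to 1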

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

             = lim_{Δx→0} ([F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]) · P(A)

             = lim_{Δx→0} ([F_X(x + Δx|A) − F_X(x|A)]/Δx) / ([F_X(x + Δx) − F_X(x)]/Δx) · P(A)

             = (f_X(x|A) / f_X(x)) · P(A),   if f_X(x|A) and f_X(x) exist

L8-10

Implication of Point Conditioning

• Note that

  P(A|X = x) = (f_X(x|A) / f_X(x)) · P(A)

  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = (∫_{−∞}^{∞} f_X(x|A) dx) · P(A)

  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

  f_X(x|A) = (P(A|X = x) / P(A)) · f_X(x)

           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board
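A worked sketch of the continuous versions (a standard example, not one from the notes): let X ~ uniform(0, 1) be an unknown coin bias and A the event that a single flip lands heads, so P(A|X = x) = x.

% Continuous total probability and Bayes: P(A) and f_X(x|A) for a coin bias
PA = integral(@(x) x, 0, 1);    % P(A) = int x * f_X(x) dx = 1/2, f_X(x) = 1
x = linspace(0, 1, 200);
post = x/PA;                    % f_X(x|A) = x * f_X(x) / P(A) = 2x on [0,1]
fprintf('P(A) = %.3f\n', PA);
plot(x, post); xlabel('x'); ylabel('f_X(x|A)');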

Page 73: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-10

bull Density function

fX(x) =

0 x lt 0

λeminusλx x ge 0

bull Distribution function

FX(x) =

0 x lt 0

1minus eminusλx x ge 0

bull The bookrsquos definition substitutes λ = 1micro

bull Obtainable as limit of geometric RV

bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below

0 1 2 3 4 5 6 7 8 9 100

05

1

15

2

25

3

35

4Exponential pdfs

x

f X(x

)

λ=025λ=1

λ=4

0 1 2 3 4 5 6 7 8 9 100

01

02

03

04

05

06

07

08

09

1Exponential CDFs

x

FX(x

)

α=025

α=1

α=4

3 Gaussian RV

bull For large n the binomial and Poisson RVs have a bell shape

bull Consider the binomial

bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n

k

)pkqnminusk asymp 1radic

2πnpqexp

minus(k minus np)2

2npq

bull Let σ2 = npq micro = np then

P ( k successes on n Bernoulli trials)

=

(n

k

)pkqnminusk

asymp 1radic2πσ2

exp

minus(k minus micro)2

2σ2

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 74: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-11

DEFN

A Gaussian random variable X has density function

fX(x) =1radic

2πσ2exp

minus(xminus micro)2

2σ2

with parameters (mean) micro and (variance) σ2 ge 0

bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)

bull The CDF of a Gaussian RV is given by

FX(x) = P (X le x)

=

int x

minusinfin

1radic2πσ2

exp

minus(tminus micro)2

2σ2

dt

which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =

5 σ2 = 1) and (micro = 10 σ2 = 25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08Gaussian pdfs

x

f X(x

)

(micro=15σ2=025)

(micro=5σ2=1)

(micro=10σ2=25)

0 2 4 6 8 10 12 14 16 18 200

01

02

03

04

05

06

07

08

09

1Gaussian cdfs

x

FX(x

)

(micro=15σ=025)

(micro=5σ2=1)

(micro=10σ2=25)

bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1

ndash The CDF for the normalized Gaussian RV is defined as

Φ(x) =

int x

minusinfin

1radic2π

exp

minust2

2

dt

ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by

Q(x) =

int infin

x

1radic2π

exp

minust2

2

dt

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 75: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L7-12

ndash Note that Q(x) = 1minus Φ(x)

ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)

ndash The Q-function can also be defined as

Q(x) =1

π

int π2

0

exp

minus x2

sin2 φ

(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)

bull The CDF for a Gaussian RV with mean micro and variance σ2 is

FX(x) = Φ

(xminus micro

σ

)= 1minusQ

(xminus micro

σ

)

Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests

bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob

P (a lt X le b) = P (X gt a)minus P (X gt b)

= Q

(aminus micro

σ

)minusQ

(bminus micro

σ

)

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 76: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L8-1

EEL 5544 Lectures 8

Motivation

Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise

In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable

Read Sections 26 through p 85 in the textbook

Try to answer the following question Answers appear at the bottom of the page

EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities

(a) P [X lt micro]

(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5

5

Examples with Gaussian Random Variables

EX The Motivation problem

5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 77: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L8-2

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 78: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L8-3

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe

FY (y) =

1minus eminusy(2σ2)

summminus1k=0

1k

(y

2σ2

)k y ge 0

0 y lt 0

bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form

6 Rayleigh RV

bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance

bull Let R =radic

Y

bull Then R is a Rayleigh RV with pdf

fR(r) =

rσ2 e

minusr2(2σ2) r ge 0

0 r lt 0

L8-7

bull The pdf for Rayleigh RV with σ2 = 1 is shown below

0 05 1 15 2 25 3 35 4 45 50

01

02

03

04

05

06

07Rayleigh PDF (σ2=1)

r

f R(r

)

bull The CDF for a Rayleigh RV is

FR(r) =

1minus eminusr2(2σ2) r ge 0

0 r lt 0

bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable

6 Lognormal RV

bull Consider a random variable L such that X = ln L is a Gaussian RV

bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels

bull Then L is a lognormal random variable with pdf given by

fL(`) =

1

`radic

2πσ2exp

minus (ln `minusmicro)2

2σ2

` ge 0

0 ` lt 0

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)

DEFN

For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais

fX(x|A) =

DEFN

For an event A isin F with P (A) 6= 0 the conditional density of X given A is

fX(x|A) =

bull Then FX(x|A) is a distribution function and fX(x|A) is a density function

EX

Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is

1 the conditional distribution for your total wait time

L8-9

2 the conditional distribution for your remaining wait time

TOTAL PROBABILITY AND BAYESrsquo THEOREM

bull If A1 A2 An form a partition of Ω then

FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)

+ + FX(x|An)P (An)

and

fX(x) =nsum

i=1

fX(x|Ai)P (Ai)

Point Conditioning

bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable

bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work

P (A|X = x) = lim∆xrarr0

P (A|x lt X le x + ∆x)

= lim∆xrarr0

FX(x + ∆x|A)minus FX(x|A)

FX(x + ∆x)minus FX(x)P (A)

= lim∆xrarr0

FX(x+∆x|A)minusFX(x|A)∆x

FX(x+∆x)minusFX(x)∆x

P (A)

=fX(x|A)

fX(x)P (A)

if fX(x|A) and fX(x) exist

L8-10

Implication of Point Conditioning

bull Note that

P (A|X = x) =fX(x|A)

fX(x)P (A)

rArr P (A|X = x)fX(x) = fX(x|A)P (A)

rArrint infin

minusinfinP (A|X = x)fX(x)dx =

int infin

minusinfinfX(x|A)dxP (A)

rArr P (A) =

int infin

minusinfinP (A|X = x)fX(x)dx

(Continuous Version of Law of Total Probability)

bull Once we have the Law of Total Probability it is easy to derive the

Continuous Version of Bayersquos Theorem

fX(x|A) =P (A|X = x)

P (A)fX(x)

=P (A|X = x)fX(x)intinfin

minusinfin P (A|X = t)fX(t)dt

Examples on board

Page 79: EEL 5544 - Noise in Linear Systems (Should be called ... · the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not

L8-4

OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4 Laplacian RV

bull The Laplacian random variable is often used to model the difference betweencorrelated data sources

bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables

bull The density function for a Laplacian random variable is

fX(x) =c

2exp minusc|x| minusinfin lt x lt infin

where c gt 0

bull The pdf for a Laplacian RV with c = 2 is shown below

minus3 minus2 minus1 0 1 2 30

01

02

03

04

05

06

07

08

09

1Laplacian pdf

x

f X(x

)

5 Chi-square RV

bull Let X be a Gaussian random variable

bull Then Y = X2 is a chi-square random variable

L8-5

bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then

Y =nsum

i=1

X2i (18)

is a chi-square random variable with n degrees of freedom

bull The chi-square random variable is usually classified as either central or non-central

bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable

bull The density of a central chi-square random variable is given by

fY (y) =

1

σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0

0 y lt 0

where Γ(p) is the gamma function defined asΓ(p) =

intinfin0

tpminus1eminustdt p gt 0

Γ(p) = (pminus 1) p an integer gt 0

Γ(12) =radic

π

Γ(32) = 12

radicπ

bull Note that for n = 2

fY (y) =

1

2σ2 eminusy(2σ2) y ge 0

0 y lt 0

which is an exponential density

bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential

bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0

L8-6

bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow

0 2 4 6 8 10 12 140

01

02

03

04

05

06

07

08

09

1chiminussquare density functions with n degrees of freedom σ2=1

x

f X(x

)n=1

n=2n=4

n=8

L8-5

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, \ldots, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then

    Y = \sum_{i=1}^{n} X_i^2                                   (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

    f_Y(y) = \begin{cases}
        \frac{1}{\sigma^{n} 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\
        0, & y < 0
    \end{cases}

where Γ(p) is the gamma function, defined as

    \Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0
    \Gamma(p) = (p-1)!, \quad p \text{ a positive integer}
    \Gamma(1/2) = \sqrt{\pi}
    \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}

• Note that for n = 2,

    f_Y(y) = \begin{cases}
        \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\
        0, & y < 0
    \end{cases}

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean E[Y] = \sum_{i=1}^{n} E[X_i^2] = nσ² > 0; the simulation sketch below can be used to check this.
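The following is a minimal MATLAB sketch (an added illustration, not part of the original notes) that compares a histogram of Y = \sum_{i=1}^{n} X_i^2 against the central chi-square density above; the values n = 4, σ² = 1, and N = 10^5 samples are arbitrary assumptions.

% Empirical check of the central chi-square density
n = 4; sigma2 = 1; N = 1e5;               % assumed values
X = sqrt(sigma2) * randn(N, n);           % N realizations of n s.i. zero-mean Gaussians
Y = sum(X.^2, 2);                         % central chi-square with n degrees of freedom
y = linspace(0.01, 15, 200);
fY = y.^(n/2 - 1) .* exp(-y / (2*sigma2)) / (sigma2^(n/2) * 2^(n/2) * gamma(n/2));
histogram(Y, 'Normalization', 'pdf'); hold on;    % empirical density
plot(y, fY, 'LineWidth', 2);                      % density formula from above
xlabel('y'); ylabel('f_Y(y)');
% mean(Y) should be close to n*sigma2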

L8-6

• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n degrees of freedom, σ² = 1; curves for n = 1, 2, 4, 8; horizontal axis x, vertical axis f_X(x).]

• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be

    F_Y(y) = \begin{cases}
        1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^k, & y \ge 0 \\
        0, & y < 0
    \end{cases}

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
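As a check on the closed-form CDF above, here is a short MATLAB sketch (again an added illustration; m = 2, i.e., n = 4, and σ² = 1 are assumptions) comparing it with an empirical CDF.

% Closed-form vs. empirical CDF for a central chi-square RV with n = 2m degrees of freedom
m = 2; n = 2*m; sigma2 = 1; N = 1e5;      % assumed values
Y = sigma2 * sum(randn(N, n).^2, 2);      % chi-square samples
y = linspace(0, 15, 200);
k = 0:(m-1);
FY = 1 - exp(-y / (2*sigma2)) .* sum((y.' / (2*sigma2)).^k ./ factorial(k), 2).';
Femp = mean(Y <= y, 1);                   % empirical CDF (implicit expansion, R2016b+)
plot(y, FY, y, Femp, '--'); legend('closed form', 'empirical');
xlabel('y'); ylabel('F_Y(y)');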

6 Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two s.i. zero-mean Gaussian RVs with common variance.

• Let R = \sqrt{Y}.

• Then R is a Rayleigh RV with pdf

    f_R(r) = \begin{cases}
        \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\
        0, & r < 0
    \end{cases}

L8-7

• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh PDF (σ² = 1); horizontal axis r, vertical axis f_R(r).]

• The CDF for a Rayleigh RV is

    F_R(r) = \begin{cases}
        1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\
        0, & r < 0
    \end{cases}

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable; see the sketch below.
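A minimal MATLAB sketch (an added illustration; σ² = 1 and N = 10^5 are assumptions) that forms the amplitude of a complex Gaussian RV and compares its histogram with the Rayleigh pdf above.

% Amplitude of a complex Gaussian RV is Rayleigh
sigma2 = 1; N = 1e5;                                 % assumed values
Z = sqrt(sigma2) * (randn(N, 1) + 1j*randn(N, 1));   % s.i. real and imaginary parts
R = abs(Z);                                          % amplitude
r = linspace(0, 5, 200);
fR = (r / sigma2) .* exp(-r.^2 / (2*sigma2));
histogram(R, 'Normalization', 'pdf'); hold on;       % empirical density
plot(r, fR, 'LineWidth', 2);                         % Rayleigh pdf from above
xlabel('r'); ylabel('f_R(r)');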

7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

    f_L(\ell) = \begin{cases}
        \frac{1}{\ell \sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(\ln \ell - \mu)^2}{2\sigma^2} \right], & \ell > 0 \\
        0, & \ell \le 0
    \end{cases}

where µ and σ² are the mean and variance of X = ln L.
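A minimal MATLAB sketch (an added illustration; µ = 0, σ² = 1, and the plot range are assumptions) that generates lognormal samples as L = e^X, with X Gaussian, and compares them with the pdf above.

% Lognormal samples from a Gaussian exponent
mu = 0; sigma2 = 1; N = 1e5;                    % assumed values
L = exp(mu + sqrt(sigma2) * randn(N, 1));       % X = ln L is Gaussian(mu, sigma2)
l = linspace(0.01, 10, 400);
fL = exp(-(log(l) - mu).^2 / (2*sigma2)) ./ (l * sqrt(2*pi*sigma2));
histogram(L, 'Normalization', 'pdf'); hold on;  % empirical density
plot(l, fL, 'LineWidth', 2); xlim([0 10]);      % lognormal pdf from above
xlabel('\ell'); ylabel('f_L(\ell)');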

L8-8

CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X : Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

    F_X(x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

    f_X(x \mid A) = \frac{d}{dx} F_X(x \mid A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?

(A worked sketch follows.)
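A worked sketch, using the definitions above with T the total wait time, A = {T > 2}, and λ = 1:

    F_T(t \mid A) = \frac{P(\{T \le t\} \cap \{T > 2\})}{P(T > 2)}
                  = \begin{cases} 0, & t \le 2 \\ \frac{e^{-2} - e^{-t}}{e^{-2}} = 1 - e^{-(t-2)}, & t > 2 \end{cases}

For the remaining wait S = T − 2,

    F_S(s \mid A) = P(T \le s + 2 \mid T > 2) = 1 - e^{-s}, \quad s \ge 0,

so the remaining wait is again exponential with λ = 1: the exponential RV is memoryless.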

TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, \ldots, A_n form a partition of Ω, then

    F_X(x) = F_X(x \mid A_1) P(A_1) + F_X(x \mid A_2) P(A_2) + \cdots + F_X(x \mid A_n) P(A_n)

and

    f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i) P(A_i)

Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, take a limit:

    P(A \mid X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)

                    = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)

                    = \lim_{\Delta x \to 0} \frac{\left[ F_X(x + \Delta x \mid A) - F_X(x \mid A) \right] / \Delta x}{\left[ F_X(x + \Delta x) - F_X(x) \right] / \Delta x}\, P(A)

                    = \frac{f_X(x \mid A)}{f_X(x)}\, P(A),

if f_X(x|A) and f_X(x) exist.

L8-10

Implication of Point Conditioning

• Note that

    P(A \mid X = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)

    ⇒ P(A \mid X = x)\, f_X(x) = f_X(x \mid A)\, P(A)

    ⇒ \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x \mid A)\, dx \; P(A)

    ⇒ P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx,

since \int_{-\infty}^{\infty} f_X(x \mid A)\, dx = 1.

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

    f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\, dt}

Examples on board
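One illustration of the kind of example typically worked on the board (the particular example is an added illustration): let X ~ Uniform(0, 1) be the unknown bias of a coin, and let A be the event that a single flip lands heads, so that P(A | X = x) = x. Then

    P(A) = \int_0^1 x \cdot 1\, dx = \frac{1}{2}, \qquad
    f_X(x \mid A) = \frac{P(A \mid X = x)\, f_X(x)}{P(A)} = \frac{x \cdot 1}{1/2} = 2x, \quad 0 \le x \le 1,

so observing a head shifts the density toward larger values of the bias.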
