EEL 5544 - Noise in Linear Systems
(Should be called "Probability and Random Processes for Electrical Engineers")
Dr. John M. Shea
Fall 2008
Pre-requisites: Very strong mathematical skills. Solid understanding of systems theory, including convolution, Fourier transforms, and impulse functions. Knowledge of basic linear algebra, including matrix properties and eigen-decomposition.
Computer requirement: Some problems will require MATLAB. Students may want to purchase the student version of MATLAB, as departmental computer resources are limited. Not being able to get on a computer is not a valid excuse for late work. Web access with the ability to run Java programs is also required.
Meeting time: 9:35–10:25 Monday/Wednesday/Friday
Meeting room: NEB 201
Final exam: The third exam will be given during the final-exam time slot determined by the University, which is December 18th at 10:00 AM – 12:00 noon
E-mail: jshea@ece.ufl.edu
Instant messaging: on AIM, eel5544
Class web page: http://wireless.ece.ufl.edu/eel5544
Personal web page: http://wireless.ece.ufl.edu/jshea
Office: 439 New Engineering Building
Phone: (352) 846-3042
Office hours: 10:40–11:30 Mondays and 1:55–3:50 Wednesdays
Textbook: Henry Stark and John W. Woods, Probability and Random Processes with Applications to Signal Processing, Prentice Hall, 3rd ed., 2002 (ISBN 0-13-020071-9)
Suggested references:
• If you feel like you are having a hard time with basic probability, I suggest:
D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, 2nd ed., 2008 (ISBN 978-1-886529-23-6)
Sheldon Ross, A First Course in Probability, Prentice Hall, 6th ed., 2002 (ISBN 0-13-033851-6)
• For more depth on filtering of random processes:
Michael B. Pursley, Random Processes in Linear Systems, Prentice Hall, 2002 (ISBN 0-13-067391-9)
Other references:
• Alberto Leon-Garcia, Probability and Random Processes for Electrical Engineering, Prentice-Hall (was Addison-Wesley), 2nd ed., 1994 (ISBN 0-201-50037-X)
• Athanasios Papoulis and S. Unnikrishna Pillai, Probability, Random Variables and Stochastic Processes, McGraw Hill, 4th ed., 2002 (ISBN 0-07-112256-7)
Grader: Byonghyok Choi (aru0080@ufl.edu)
Course Topics (as time allows)
• Probability spaces and axioms of probability
• Combinatorial (counting) analysis
• Conditional probability, total probability, and Bayes' rule
• Statistical independence
• Sequential experiments
• Single random variables and types of random variables
• Important random variables
• Distribution and density functions
• Functions of one random variable
• Expected value of a random variable
• Mean, variance, standard deviation, Nth moment, Nth central moment
• Markov and Chebyshev inequalities
• Transform methods: characteristic and generating functions, Laplace transform
• Generating random variables
• Multiple random variables
• Joint and marginal distribution and density functions
• Functions of several random variables
• Joint moments and joint characteristic functions
• Conditional expected value
• Laws of large numbers and the central limit theorem
• Random processes
• Mean, autocorrelation, and autocovariance functions
• Stationarity
• Time-invariant filtering of random processes
• Power spectral density
• Matched filters and Wiener optimum systems
• Markov chains
• Examples of practical applications of the theory
Goals and Objectives: Upon completion of this course, the student should be able to:
• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers.
• Determine probabilities based on counting (lottery tickets, etc.).
• Calculate probabilities of events from the density or distribution functions for random variables.
• Classify random variables based on their density or distribution functions.
• Know the density and distribution functions for common random variables.
• Determine random variables from definitions based on the underlying probability space.
• Determine the density and distribution functions for functions of random variables using several different techniques presented in class.
• Calculate expected values for random variables.
• Determine whether events, random variables, or random processes are statistically independent.
• Use inequalities to find bounds for probabilities that might otherwise be difficult to evaluate.
• Use transform methods to simplify solving some problems that would otherwise be difficult.
• Evaluate probabilities involving multiple random variables or functions of multiple random variables.
• Classify random processes based on their time support and value support.
• Simulate random variables and random processes.
• Classify random processes based on stationarity.
• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter.
• Evaluate the power spectral density for wide-sense stationary random processes.
• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise.
• Determine the steady-state probabilities for a Markov chain.
Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant-messaging exchanges, discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90 is guaranteed an A, > 80 is guaranteed a B, etc.
Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.
No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.
When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.
Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.
Academic Honesty
All students admitted to the University of Florida have signed a statement of academic honesty, committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.
This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.
Honor statements on tests must be signed in order to receive any credit for that test.
I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.
Collaboration on homework is permitted (unless explicitly prohibited), provided that:
1. Collaboration is restricted to students currently in this course.
2. Collaboration must be a shared effort.
3. Each student must write up his/her homework independently.
4. On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.
I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.
EEL 5544 Noise in Linear Systems Lecture 1
RANDOM EXPERIMENTS
Motivation
The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.
Try to answer the following problems:
1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?
2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?
These problems seem similar, but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later, we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.
Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment. (A brute-force check in MATLAB is sketched below.)
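One way to check your counting is to enumerate the outcomes in MATLAB. A minimal sketch, assuming only base MATLAB:

% Experiment 1: one roll; favorable if the top face is 1 or 2.
faces = 1:6;
p1 = sum(faces <= 2) / numel(faces)      % 2/6

% Experiment 2: two rolls; favorable if either roll shows 1 or 2.
[r1, r2] = meshgrid(1:6, 1:6);           % all 36 equally likely outcomes
p2 = sum(r1(:) <= 2 | r2(:) <= 2) / 36   % count favorable, divide by total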
What is a random experiment?
Q: What do we mean by random?
Output is unpredictable in some sense.
DEFN:
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by:
1. an experimental procedure
2. a set of outcomes and measurement restrictions
DEFN:
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN:
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX: Tossing a coin
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω =
EX: Rolling a 6-sided die
A 6-sided die is rolled, and the number on the top face is observed.
• Q: What are the outcomes?
Ω =
EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.
• Q: What are the outcomes?
Ω =
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1. Probability as Intuition
2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3. Probability as a Measure of Frequency of Occurrence
4. Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
• Some experiments are said to be "fair":
– For a fair coin or a fair die toss, the outcomes are equally likely.
– Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
– An outcome or set of outcomes with probability 1 (one) is certain to occur.
– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.
EX: Tossing a fair coin
Let pH = P{heads}, pT = P{tails}.
Then pH + pT = 1 (the probability that something occurs is 1).
Since pH = pT, pH = pT = 1/2.
EX: Rolling a fair 6-sided die
Let p_i = P{top face is i}. Then
Σ_{i=1}^{6} p_i = 1 ⇒ 6p_1 = 1 ⇒ p_1 = 1/6.
So p_i = 1/6 for i = 1, 2, ..., 6.
• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.
EX: Rolling a fair 6-sided die
What is P{1 or 2}?
The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally likely outcomes on repeated trials, but we have to be careful.
EX: Rolling a fair 6-sided die twice
Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die?
What are some problems with defining probability in this way?
1. Requires equally likely outcomes – thus it only applies to a small set of random phenomena.
2. Can only deal with a finite number of outcomes – this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let N_k(n) = the number of times the outcome is k in n trials.
• Then we can tabulate N_k(n) for various values of k and n.
• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur."
DEFN:
The relative frequency of outcome k of a random experiment is
f_k(n) = N_k(n)/n.
• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.
DEFN:
An experiment possesses statistical regularity if
lim_{n→∞} f_k(n) = p_k (a constant) ∀k.
DEFN:
For experiments with statistical regularity as defined above, p_k is called the probability of outcome k. (A simulation of this convergence is sketched below.)
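Statistical regularity is easy to observe numerically. A short sketch, assuming only base MATLAB, plots f_k(n) for the outcome k = 3 of a fair die:

% Relative frequency of rolling a 3 as the number of trials grows.
n = 100000;
rolls = randi(6, 1, n);                  % n simulated fair-die rolls
f3 = cumsum(rolls == 3) ./ (1:n);        % f_3(m) after m trials
semilogx(1:n, f3); hold on;
semilogx([1 n], [1/6 1/6], '--');        % the limiting value p_3 = 1/6
xlabel('trials n'); ylabel('f_3(n)');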
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ N_k(n) ≤ n, ∀k,
because N_k(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ N_k(n)/n = f_k(n) ≤ 1, ∀k = 1, ..., K. (1)
• If 1, 2, ..., K are all of the possible outcomes, then
Σ_{k=1}^{K} N_k(n) = n.
Again, dividing by n yields
Σ_{k=1}^{K} f_k(n) = 1. (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
N_E(n) = N_2(n) + N_4(n) + N_6(n)
What have we assumed in developing this equation?
Then, dividing by n,
f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n).
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
f_C(n) = f_A(n) + f_B(n). (3)
• In some sense, we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1. be useful for solving real problems
2. agree with our interpretation of probability as relative frequency
3. agree with our intuition (where appropriate)
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to be able to understand and be able to apply all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX:
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram. (A counting sketch appears below.)
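If you want to check your Venn-diagram counting, the inclusion-exclusion arithmetic can be scripted. A sketch in MATLAB, using the enrollment numbers from the problem:

% Inclusion-exclusion for the three language classes.
total = 100;
nS = 28; nF = 26; nG = 16;                        % single-class counts
nSF = 12; nSG = 4; nFG = 6; nSFG = 2;             % overlap counts
nUnion = nS + nF + nG - nSF - nSG - nFG + nSFG;   % in at least one class
pNone = (total - nUnion) / total                  % part 1
nOnlyS = nS - nSF - nSG + nSFG;                   % Spanish only
nOnlyF = nF - nSF - nFG + nSFG;                   % French only
nOnlyG = nG - nSG - nFG + nSFG;                   % German only
pExactlyOne = (nOnlyS + nOnlyF + nOnlyG) / total  % part 2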
• Want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ need operations on sets.
1. A ⊂ B (read "A is a subset of B")
DEFN:
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B.
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).
2. A ∪ B (read "A union B" or "A or B")
DEFN:
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}.
3. A ∩ B (read "A intersect B" or "A and B")
DEFN:
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}.
4. A^c (read "A complement")
DEFN:
The complement of a set A in a sample space Ω is defined by
A^c = {x | x ∈ Ω and x ∉ A}.
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (A^c)^c = A
3. A ⊂ B ⇒ B^c ⊂ A^c
4. A ∪ A^c = Ω
5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)
6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)
(A numerical spot-check of the two DeMorgan laws follows.)
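The set operators above map directly onto MATLAB's union, intersect, and setdiff functions, so the DeMorgan laws can be spot-checked numerically. A minimal sketch on a small discrete sample space:

% Spot-check of DeMorgan's laws on Omega = {1,...,10}.
Omega = 1:10;
A = [1 2 3 4]; B = [3 4 5 6];
compl = @(S) setdiff(Omega, S);          % complement within Omega
law1 = isequal(compl(intersect(A, B)), union(compl(A), compl(B)))
law2 = isequal(compl(union(A, B)), intersect(compl(A), compl(B)))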
• One of the most important relations for sets is when sets are mutually exclusive.
DEFN:
Two sets A and B are said to be mutually exclusive or disjoint if
This is very important, so I will emphasize it again: mutually exclusive is a
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1. EX: Roll a fair 6-sided die and note the number on the top face.
2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3. EX: Toss a coin 3 times and note the sequence of outcomes.
4. EX: Toss a coin 3 times and note the number of heads.
5. EX: Roll a fair die 3 times and note the number of even results.
6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.
7. EX: Simultaneously toss a quarter and a dime and note the outcomes.
8. EX: Simultaneously toss two identical quarters and note the outcomes.
9. EX: Pick a number at random between zero and one, inclusive.
10. EX: Measure the lifetime of a computer chip.
MORE ON EVENTS
• Recall that an event A is a subset of the sample space Ω.
• The set of all events is denoted F or A.
EX: Flipping a coin
The possible events are:
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
• Thus, the event class can be written as
EX: Rolling a 6-sided die
There are many possible events, such as:
1. The number 3 occurs
2. An even number occurs
3. No number occurs
4. Any number occurs
• EX: Express the events listed in the example above in set notation:
1.
2.
3.
4.
EX: The 3-language problem
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES
EX: Consider the following examples:
A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω₃ = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,
P(a_i) = 1/|Ω₃| = 1/8 for any outcome a_i.
Let A₂ = {exactly 2 heads occurs in 3 tosses}. Then
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o₁, o₂, ..., o_K} and |E| = K), then
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES
• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN:
For a sample space Ω that is an interval, Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) =
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.
(Does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
(A numeric check using the corollaries is sketched below.)
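As a check on your answers, the corollaries (introduced later in this lecture) reduce this problem to three lines of arithmetic. A sketch, letting A = smokes cigarettes and B = smokes cigars:

% Smoking problem via the corollaries to the axioms.
pA = 0.28; pB = 0.07; pAB = 0.05;        % P(A), P(B), P(A and B)
pSmokes = pA + pB - pAB                  % P(A or B), corollary (e)
pNeither = 1 - pSmokes                   % complement, corollary (a)
pCigarsOnly = pB - pAB                   % cigars but not cigarettes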
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1.
DEFN:
The sample space, denoted by Ω, is the ____ of ____.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN:
A discrete set is either ____ or ____.
DEFN:
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX: Examples:
DEFN:
A continuous set is not countable.
DEFN:
If a and b > a are in an interval I, then if a ≤ x ≤ b, x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX: Examples of continuous sets:
• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN: Events are ____ to which we assign ____.
Examples based on previous examples of sample spaces:
(a) EX:
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.
(b) EX:
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.
(c) EX:
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes.
(d) EX:
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.
2.
DEFN:
F is the event class, which is a ____ to which we assign probability.
• If Ω is discrete, then F can be taken to be the ____ of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN:
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
DEFN:
A field M forms a σ-algebra or σ-field if, for any E₁, E₂, ... ∈ M,
⋃_{i=1}^{∞} E_i ∈ M. (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_{i=1}^{∞} E_i ∈ M. (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3.
DEFN:
The probability measure, denoted by P, is a ____ that maps all members of ____ onto ____.
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I.
II.
III.
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A₁, A₂, ..., A_n are pairwise mutually exclusive, then
P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k).
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(⋃_{k=1}^{n} A_k) = Σ_{j=1}^{n} P(A_j) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A₁ ∩ A₂ ∩ ··· ∩ A_n)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX:
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex.
(b) The 3 eldest are boys and the others are girls.
(c) The 2 oldest are girls.
(d) There is at least one girl.
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.
What if A and B are independent?
Motivation – continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX:
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN:
Two events A ∈ A and B ∈ A are ____ (abbreviated ____) if and only if
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a ____
What type of relation is mutually exclusive?
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– A and B^c
– A^c and B
– A^c and B^c
Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
= P(A ∩ B) + P(A ∩ B^c)
= P(A)P(B) + P(A ∩ B^c).
Thus
P(A ∩ B^c) = P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B^c).
• For N events to be s.i., require:
P(A_i ∩ A_k) = P(A_i)P(A_k)
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
...
P(A₁ ∩ A₂ ∩ ··· ∩ A_N) = P(A₁)P(A₂)···P(A_N).
(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN:
N events A₁, A₂, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀i ≠ j.
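A standard example showing that pairwise s.i. does not imply full s.i.: toss two fair coins, and let A = {first toss is heads}, B = {second toss is heads}, C = {the two tosses match}. A MATLAB sketch that enumerates the four equally likely outcomes:

% Pairwise independent but not mutually independent events.
outcomes = [0 0; 0 1; 1 0; 1 1];          % (toss1, toss2), 1 = heads
A = outcomes(:,1) == 1;
B = outcomes(:,2) == 1;
C = outcomes(:,1) == outcomes(:,2);
p = @(E) mean(E);                         % each outcome has prob 1/4
pairs = [p(A&B) p(A)*p(B); p(A&C) p(A)*p(C); p(B&C) p(B)*p(C)]
triple = [p(A&B&C), p(A)*p(B)*p(C)]       % 1/4 vs 1/8: not mutually s.i.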
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a ____
– Statistical independence is a ____
• How does mutual exclusiveness relate to statistical independence?
1. m.e. ⇒ s.i.
2. s.i. ⇒ m.e.
3. two events cannot be both s.i. and m.e., except in some degenerate cases
4. none of the above
• Perhaps surprisingly, ____ is true.
Consider the following:
Conditional Probability: Consider an example.
EX: Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let:
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) =   P(EB) =   P(ED) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA)
(means the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) =
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Figure: sets A and B overlapping within sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Figure: B becomes the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN:
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space, (S, A, P(·|B)).
Check the axioms:
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0.
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B].
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B).
EX: Check prev example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX:
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)? (A simulation sketch of the urn experiment follows.)
Answers to the gambler problem: (a) 1/3 (b) 1/5 (c) 1
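A quick Monte Carlo check, assuming only base MATLAB (randperm shuffles the urn, which is equivalent to drawing without replacement):

% Estimate P(W2 | W1) for the urn {B1, B2, W3, W4, W5}.
trials = 100000; firstWhite = 0; bothWhite = 0;
urn = [0 0 1 1 1];                        % 0 = black, 1 = white
for t = 1:trials
    draw = urn(randperm(5));              % random draw order
    if draw(1) == 1
        firstWhite = firstWhite + 1;
        bothWhite = bothWhite + (draw(2) == 1);
    end
end
condProb = bothWhite / firstWhite         % should be near 2/4 = 0.5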
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
(When A and B are s.i., knowledge of B does not change the prob of A (and vice versa).)
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B), (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A). (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN:
A collection of sets A₁, A₂, ... partitions the sample space Ω if and only if
{A_i} is also said to be a partition of Ω.
1. Venn Diagram:
2. Expressing an arbitrary set using a partition of Ω:
Law of Total Probability
If {A_i} partitions Ω, then
Example:
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules. (A numeric check is sketched below.)
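For comparison with your in-class solution, a short MATLAB computation using the priors and the transition probabilities from the diagram above:

% Error probabilities for the two candidate decision rules.
pA0 = 2/5; pA1 = 3/5;                    % priors
p10 = 1/6;                               % P(B1|A0)
p01 = 1/8;                               % P(B0|A1)
pErr1 = pA0                              % rule 1: error iff A0 was sent
pErr2 = p10*pA0 + p01*pA1                % rule 2: error on a channel transition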
EX: Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point, our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)
q1P(A0|B1) + (1 − q1)P(A1|B1). (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
• General technique:
If {A_i} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
where the last step follows from the Law of Total Probability.
The expression in the box is
EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A computational sketch follows.)
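One way to organize this computation is a small MATLAB script that applies the law of total probability and Bayes' rule. This sketch assumes MATLAB's implicit array expansion (R2016b or later):

% A posteriori probabilities P(Ai|Bj) for the binary channel.
pA = [2/5 3/5];                          % priors P(A0), P(A1)
pBgA = [5/6 1/6;                         % row i: P(B0|Ai), P(B1|Ai)
        1/8 7/8];
pB = pA * pBgA;                          % total probability: P(B0), P(B1)
post = (pBgA .* pA') ./ pB;              % post(i,j) = P(Ai|Bj)
[~, mapRule] = max(post)                 % MAP choice per column: 1 -> A0, 2 -> A1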
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.
Thus n₁ = n₂ = ··· = n_k = |A| ≡ n
⇒ number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x₁, x₂, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.
Then n₁ = n, n₂ = n − 1, ..., n_k = n − (k − 1)
⇒ number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n₁ = n, n₂ = n − 1, ..., n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
4. Sampling w/out replacement & w/out ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets:
Now, how many different orders are there for any set of k objects?
Let C(n, k) = the number of combinations of size k from a set of n objects.
C(n, k) = (# ordered subsets)/(# orderings) =
DEFN:
The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, ..., k_J is given by the multinomial coefficient.
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85.
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61.
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring? (A one-line numeric check follows.)
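The two questions reduce to a multinomial count over 6^12 equally likely outcomes; a quick check in MATLAB:

% P(each face exactly twice in 12 rolls) = [12!/(2!)^6] / 6^12.
nArrangements = factorial(12) / factorial(2)^6;
pExact = nArrangements / 6^12            % approx 0.0034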
5. Sampling with replacement and w/out ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is
C(k + n − 1, k).
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated, and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX: By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors:
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch.
Answer to the packet question: 0.0794
2. Always switch.
3. Flip a fair coin and switch if it comes up heads.
(A simulation of all three strategies is sketched below.)
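Before working these out analytically, it can help to simulate. A Monte Carlo sketch of all three strategies in MATLAB:

% Monty Hall: estimate the win probability of each strategy.
trials = 100000; wins = zeros(1, 3);     % [never, always, coin flip]
for t = 1:trials
    car = randi(3); pick = randi(3);
    doors = 1:3;
    goatDoors = doors(doors ~= pick & doors ~= car);
    opened = goatDoors(randi(numel(goatDoors)));    % host opens a goat door
    other = doors(doors ~= pick & doors ~= opened); % the switch target
    wins(1) = wins(1) + (pick == car);              % never switch
    wins(2) = wins(2) + (other == car);             % always switch
    if rand < 0.5, final = other; else final = pick; end
    wins(3) = wins(3) + (final == car);             % coin-flip strategy
end
wins / trials                            % approx [1/3, 2/3, 1/2]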
SEQUENTIAL EXPERIMENTS
DEFN:
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space:
Ω = the ____ of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B₁, B₂, ..., B_N), where B_i ∈ Ω_i.
2. Event class F: σ-algebra on Ω
3. P(·) on F such that
P(A) =
For s.i. subexperiments:
P(A) =
Bernoulli Trials
DEFN:
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then, for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
p_n(k) = (11)
Examples:
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A numeric check follows.)
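A direct numeric check using the binomial probabilities and the complement rule, assuming only base MATLAB:

% P(packet fails) = 1 - P(0, 1, or 2 bit errors in 100 bits).
n = 100; p = 0.01;
k = 0:2;
pOK = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k));
pFail = 1 - pOK                          % approx 0.0794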
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}. (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables X_i.
– For each Bernoulli trial i, let X_i = 1 if a success occurs (and X_i = 0 otherwise).
– Let S_n = Σ_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt. (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_x^∞ exp[−t²/2] dt. (16)
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)). (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam. (A worked comparison of the exact and approximate probabilities appears below.)
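As an illustration of the approximation (writing the Q-function with base MATLAB's erfc, so no toolbox is needed), compare the exact and approximate values of P[45 ≤ S_n ≤ 55] for n = 100 fair-coin trials; the specific n, p, α, β here are arbitrary choices for the sketch:

% Gaussian approximation vs. exact binomial sum, n = 100, p = 1/2.
n = 100; p = 0.5; q = 1 - p;
lo = 45; hi = 55;
Q = @(x) 0.5*erfc(x/sqrt(2));            % Q-function via erfc
approx = Q((lo - n*p - 0.5)/sqrt(n*p*q)) - Q((hi - n*p + 0.5)/sqrt(n*p*q))
kk = lo:hi;                              % exact sum of binomial terms
% (nchoosek may warn about floating-point precision for large n; fine here)
exact = sum(arrayfun(@(k) nchoosek(n, k), kk) .* p.^kk .* q.^(n-kk))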
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) =
Also, P(# trials > k) =
EX:
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
C(n, k) p^k (1 − p)^{n−k}.
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}.
• Then
lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ...; 0 otherwise.
EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (A one-line computation follows.)
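With λ = 90 calls/hr and t = 10 minutes, α = λt = 15, so the answer is essentially one line of MATLAB:

% Poisson: P(N = 20) with alpha = 90 * (10/60) = 15.
alpha = 90 * (10/60); k = 20;
pk = alpha^k / factorial(k) * exp(-alpha)   % approx 0.042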
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
F_Z(z) = 0 for z ≤ 0;
F_Z(z) = z²/(2b²) for 0 < z ≤ b;
F_Z(z) = [b² − (2b − z)²/2]/b² for b < z < 2b;
F_Z(z) = 1 for z ≥ 2b.
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
(A simulation check of F_Z is sketched below.)
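The answer for F_Z can be checked by simulating the dart (here with b = 1, an arbitrary choice for the sketch) and comparing the empirical CDF with the formula:

% Empirical vs. analytical CDF of Z = X + Y, dart uniform on [0,b]^2.
b = 1; N = 100000;
Z = b*rand(1, N) + b*rand(1, N);          % sum of the two coordinates
z = linspace(0, 2*b, 200);
Femp = arrayfun(@(zz) mean(Z <= zz), z);  % empirical CDF
Fthy = (z.^2/(2*b^2)).*(z <= b) + (1 - (2*b - z).^2/(2*b^2)).*(z > b);
plot(z, Femp, z, Fthy, '--');
xlabel('z'); ylabel('F_Z(z)'); legend('simulation', 'formula');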
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a ____ from ____ to ____.
EX:
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the __________ ( __________ ), denoted __________, is __________
• F_X(x) is a probability measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., a ≤ b ⇒ F_X(a) ≤ F_X(b)
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
  F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                   = lim_{ε→0} P[x − ε < X ≤ x]
                   = P[X = x]
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
  u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
  hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
  v = -log(u);
Check the density by plotting a histogram:
  hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
  w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates how to use MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
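The commands above, collected into a single runnable script (a sketch; the figure calls are added here so that each histogram appears in its own window):

  % Generate, transform, and histogram random variables
  u = rand(1,1000000);     % one million Uniform(0,1) samples
  figure; hist(u,100)      % roughly flat histogram
  v = -log(u);             % first transformation
  figure; hist(v,100)      % compare to common densities
  w = sqrt(v);             % second transformation
  figure; hist(w,100)      % compare again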
PROBABILITY DENSITY FUNCTION
DEFN
The __________ ( __________ ) f_X(x) of a random variable X is the derivative (which may not exist at some places) of __________
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise continuous function with finite integral
   ∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a __________
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
  f_X(x) = { 0,          x < a
           { 1/(b − a),  a ≤ x ≤ b
           { 0,          x > b
DEFN
A __________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is __________
EX Roll a fair 6-sided die; X = number on top face.
  P(X = x) = { 1/6,  x = 1, 2, ..., 6
             { 0,    otherwise
EX Flip a fair coin until heads occurs; X = number of flips.
  P(X = x) = { (1/2)^x,  x = 1, 2, ...
             { 0,        otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
  X = { 1,  s ∈ A
      { 0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
  P(X = x) = { p,      x = 1
             { 1 − p,  x = 0
             { 0,      x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = number of successes. Then the pmf of X is given by
  P[X = k] = { C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
             { 0,                         otherwise
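A short sketch, using only base MATLAB, that evaluates and plots this pmf (one way to reproduce figures like those below):

  % Binomial(n,p) pmf via nchoosek
  n = 10; p = 0.2;
  k = 0:n;
  pmf = zeros(size(k));
  for i = 1:length(k)
      pmf(i) = nchoosek(n,k(i)) * p^k(i) * (1-p)^(n-k(i));
  end
  stem(k,pmf); xlabel('x'); ylabel('P(X=x)')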
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
  P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, ...
             { 0,               otherwise
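A small simulation sketch to check this pmf, modeling each Bernoulli trial with rand:

  % Simulate Geometric(p): count trials until the first success
  p = 0.7; trials = 100000;
  X = zeros(1,trials);
  for i = 1:trials
      k = 1;
      while rand > p       % failure occurs with probability 1-p
          k = k + 1;
      end
      X(i) = k;
  end
  [mean(X == 1), p]        % compare to p = 0.7
  [mean(X == 2), (1-p)*p]  % compare to (1-p)p = 0.21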
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x)]
4 Poisson RV
bull Models events that occur randomly in space or time
• Let λ = the average number of events per (unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the number of events in time (or space) t.
• Then
  P(N = k) = { (α^k/k!) e^{−α},  k = 0, 1, ...
             { 0,                otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example.
2. Exponential RV
• Characterized by a single parameter, λ.
• Density function:
  f_X(x) = { 0,          x < 0
           { λe^{−λx},   x ≥ 0
• Distribution function:
  F_X(x) = { 0,             x < 0
           { 1 − e^{−λx},   x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and npq is not too small, then with q = 1 − p,
  C(n,k) p^k q^(n−k) ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}
• Let σ² = npq and μ = np; then
  P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k)
                                       ≈ [1/√(2πσ²)] exp{−(k − μ)²/(2σ²)}
DEFN
A Gaussian random variable X has density function
  f_X(x) = [1/√(2πσ²)] exp{−(x − μ)²/(2σ²)}
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
  – The CDF for the normalized Gaussian RV is defined as
    Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp{−t²/2} dt
  – Engineers more commonly use the complementary distribution function, or Q-function, defined by
    Q(x) = ∫_{x}^{∞} [1/√(2π)] exp{−t²/2} dt
  – Note that Q(x) = 1 − Φ(x).
  – I will be supplying you with a Q-function table and a list of approximations to Q(x).
  – The Q-function can also be defined as
    Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
  F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
  P(a < X ≤ b) = P(X > a) − P(X > b)
               = Q((a − μ)/σ) − Q((b − μ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. The answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
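These values can be spot-checked in MATLAB using P[|X − μ| > kσ] = 2Q(k) (a sketch built on erfc from base MATLAB):

  % Check answer (b): P[|X - mu| > k*sigma] = 2*Q(k)
  Q = @(x) 0.5*erfc(x/sqrt(2));
  k = 1:5;
  2*Q(k)    % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07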
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
  f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are si Gaussian random variables with identical variances σ², then
  Y = Σ_{i=1}^{n} X_i²    (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
  f_Y(y) = { [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^{−y/(2σ²)},  y ≥ 0
           { 0,                                                  y < 0
where Γ(p) is the gamma function, defined as
  Γ(p) = ∫_{0}^{∞} t^(p−1) e^{−t} dt, p > 0
  Γ(p) = (p − 1)!, for p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π
• Note that for n = 2,
  f_Y(y) = { [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
           { 0,                       y < 0
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
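A quick simulation sketch of this fact:

  % Sum of squares of two independent zero-mean Gaussians is exponential
  sigma = 1; N = 100000;
  Y = (sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2;
  % empirical P(Y <= 2) vs. the exponential CDF 1 - exp(-y/(2*sigma^2))
  [mean(Y <= 2), 1 - exp(-2/(2*sigma^2))]   % both near 0.632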
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be
  F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
           { 0,                                                  y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
  f_R(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
           { 0,                     r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is
  F_R(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
           { 0,                  r < 0
• The received signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
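A simulation sketch of that model:

  % Magnitude of a complex Gaussian with independent real/imaginary parts
  sigma = 1; N = 100000;
  R = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));
  % empirical P(R <= 1) vs. the Rayleigh CDF 1 - exp(-r^2/(2*sigma^2))
  [mean(R <= 1), 1 - exp(-1/(2*sigma^2))]   % both near 0.393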
7. Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
  f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
           { 0,                                        ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  F_X(x|A) = __________
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  f_X(x|A) = __________
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
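To check your answers numerically, here is a simulation sketch (the exponential is memoryless, so the remaining wait should again look exponential with λ = 1):

  % Conditional wait times for W ~ exponential(lambda = 1), given W > 2
  lambda = 1; N = 1000000;
  W = -log(rand(1,N))/lambda;      % exponential samples via the inverse CDF
  Wrem = W(W > 2) - 2;             % remaining wait, given two minutes elapsed
  [mean(Wrem <= 1), 1 - exp(-lambda*1)]   % memoryless: both near 0.632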
TOTAL PROBABILITY AND BAYESrsquo THEOREM
• If A1, A2, ..., An form a partition of Ω, then
  F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + ··· + F_X(x|An)P(An)
and
  f_X(x) = Σ_{i=1}^{n} f_X(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work.
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
  P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = P(A) ∫_{−∞}^{∞} f_X(x|A) dx
  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
  f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
Goals and Objectives: Upon completion of this course, the student should be able to:
• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers.
• Determine probabilities based on counting (lottery tickets, etc.)
• Calculate probabilities of events from the density or distribution functions for random variables
• Classify random variables based on their density or distribution functions
• Know the density and distribution functions for common random variables
• Determine random variables from definitions based on the underlying probability space
• Determine the density and distribution functions for functions of random variables, using several different techniques presented in class
• Calculate expected values for random variables
• Determine whether events, random variables, or random processes are statistically independent
• Use inequalities to find bounds for probabilities that might otherwise be difficult to evaluate
• Use transform methods to simplify solving some problems that would otherwise be difficult
• Evaluate probabilities involving multiple random variables or functions of multiple random variables
• Classify random processes based on their time support and value support
• Simulate random variables and random processes
• Classify random processes based on stationarity
• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter
• Evaluate the power spectral density for wide-sense stationary random processes
• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise
• Determine the steady-state probabilities for a Markov chain
Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant messaging exchanges, discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90% is guaranteed an A, > 80% is guaranteed a B, etc.
Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.
No formal project is required but, as mentioned above, students will be required to use MATLAB in solving some homework problems.
When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.
Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.
Academic Honesty
All students admitted to the University of Florida have signed a statement of academic honesty, committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.
This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.
Honor statements on tests must be signed in order to receive any credit for that test
I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.
Collaboration on homework is permitted (unless explicitly prohibited), provided that:
1. Collaboration is restricted to students currently in this course.
2. Collaboration must be a shared effort.
3. Each student must write up his/her homework independently.
4. On problems involving MATLAB programs, each student should write his/her own program. Students may discuss the implementations of the programs, but students should not work as a group in writing the programs.
I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.
EEL 5544 Noise in Linear Systems Lecture 1
RANDOM EXPERIMENTS
Motivation
The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.
Try to answer the following problems:
1. EX A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?
2. EX A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?
These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events; try to solve them using counting.
Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
What is a random experiment?
Q: What do we mean by random?
The output is unpredictable in some sense.
DEFN
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by:
1. an experimental procedure, and
2. a set of outcomes and measurement restrictions.
DEFN
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX
Tossing a coin: A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω =
EX Rolling a 6-sided die: A 6-sided die is rolled and the number on the top face is observed.
• Q: What are the outcomes?
Ω =
EX A 6-sided die is rolled, and whether the top face is even or odd is observed.
• Q: What are the outcomes?
Ω =
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
• Some experiments are said to be "fair":
  – For a fair coin or a fair die toss, the outcomes are equally likely.
  – Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
  – An outcome or set of outcomes with probability 1 (one) is certain to occur.
  – An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
  – If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.
EX Tossing a fair coin
Let pH = P{heads}, pT = P{tails}.
Then pH + pT = 1 (the probability that something occurs is 1).
Since pH = pT, we get pH = pT = 1/2.
EX Rolling a fair 6-sided die
Let pi = P{top face is i}. Then
  Σ_{i=1}^{6} pi = 1
  ⇒ 6p1 = 1
  ⇒ p1 = 1/6
So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.
• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.
EX Rolling a fair 6-sided die
What is P{1 or 2}?
The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally likely outcomes on repeated trials, but we have to be careful.
EX Rolling a fair 6-sided die twice
Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die?
What are some problems with defining probability in this way?
1. It requires equally likely outcomes; thus it only applies to a small set of random phenomena.
2. It can only deal with a finite number of outcomes; this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let N_k(n) = the number of times in n trials that the outcome is k.
• Then we can tabulate N_k(n) for various values of k and n.
• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur?"
DEFN
The relative frequency of outcome k of a random experiment is
  f_k(n) = N_k(n)/n
• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.
DEFN
An experiment possesses statistical regularity if
  lim_{n→∞} f_k(n) = p_k (a constant), for all k.
DEFN
For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
  0 ≤ N_k(n) ≤ n for all k,
because N_k(n) is just the number of times outcome k occurs in n trials.
Dividing by n yields
  0 ≤ N_k(n)/n = f_k(n) ≤ 1, for all k = 1, ..., K    (1)
• If 1, 2, ..., K are all of the possible outcomes, then
  Σ_{k=1}^{K} N_k(n) = n
Again, dividing by n yields
  Σ_{k=1}^{K} f_k(n) = 1    (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
  N_E(n) = N_2(n) + N_4(n) + N_6(n)
What have we assumed in developing this equation?
Then, dividing by n,
  f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n)
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
  f_C(n) = f_A(n) + f_B(n)    (3)
bull In some sense we can define the probability of an event as its long-term relative frequency
What are some problems with this definition?
1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1. be useful for solving real problems,
2. agree with our interpretation of probability as relative frequency, and
3. agree with our intuition (where appropriate).
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore, it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
• We want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ we need operations on sets.
1. A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
  A ⊂ B if x ∈ A ⇒ x ∈ B
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A. (useful for proofs)
2. A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
  A ∪ B = {x | x ∈ A or x ∈ B}
3. A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
  A ∩ B = AB = {x | x ∈ A and x ∈ B}
4. A^c or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
  A^c = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (A^c)^c = A
3. A ⊂ B ⇒ B^c ⊂ A^c
4. A ∪ A^c = Ω
5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)
6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)
bull One of the most important relations for sets is when sets are mutually exclusive
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if __________
This is very important, so I will emphasize it again: mutually exclusive is a __________
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1. EX Roll a fair 6-sided die and note the number on the top face.
2. EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
• The set of all events is denoted F or A.
EX Flipping a coin
The possible events are:
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
bull Thus the event class can be written as
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
bull EX Express the events listed in the example above in set notation
1
2
3
4
EX The 3 language problem
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A. A coin is tossed 3 times and the sequence of H and T is noted.
  Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,
  P(ai) = 1/|Ω3| = 1/8 for any outcome ai
Let A2 = {exactly 2 heads occur in 3 tosses}. Then
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
  Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
  P(ai) = 1/|Ω| = 1/4 for any outcome ai
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
  P([a, b]) = __________
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the probability of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.
(This does not mean that the outcome never occurs, only that it occurs very infrequently.)
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the corollaries to the Axioms of Probability where appropriate:
EX A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space, denoted by Ω, is the __________ of __________
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either __________ or __________
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.
• Intervals can be either open, closed, or half-open:
  – A closed interval [a, b] contains the endpoints a and b.
  – An open interval (a, b) does not contain the endpoints a and b.
  – An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
  – What is the probability that the result is even?
  – What is the probability that the result is ≤ 2?
DEFN
Events are __________ to which we assign __________
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads. Let O = the event that an odd number of heads occurs. Express O as a set of outcomes.
2
DEFN
F is the event class, which is a __________ to which we assign probability.
• If Ω is discrete, then F can be taken to be the __________ of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,
  ∪_{i=1}^{∞} Ei ∈ M    (5)
• Note that by property (c) of fields, (5) can be interchanged with
  ∩_{i=1}^{∞} Ei ∈ M    (6)
bull We require that the event class F be a σ-algebra
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one σ-algebra possible.
• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a __________ that maps all members of __________ onto __________
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I
II
III
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, ..., An are pairwise mutually exclusive, then
  P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + ··· + (−1)^(n+1) P(A1 ∩ A2 ∩ ··· ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent?
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are __________ (abbreviated __________) if and only if __________
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a __________
What type of relation is mutually exclusive?
Claim: If A and B are si events, then the following pairs of events are also si:
– A and B^c
– A^c and B
– A^c and B^c
Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c), and (A ∩ B) ∩ (A ∩ B^c) = ∅.
Thus
  P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
       = P(A ∩ B) + P(A ∩ B^c)
       = P(A)P(B) + P(A ∩ B^c)
Thus
  P(A ∩ B^c) = P(A) − P(A)P(B)
             = P(A)[1 − P(B)]
             = P(A)P(B^c)
• For N events to be si, we require
  P(Ai ∩ Ak) = P(Ai)P(Ak)
  P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
  ⋮
  P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2) ··· P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k.)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
Examples of Applying Statistical Independence
EX For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
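A brute-force check in MATLAB by enumerating all 2⁵ equally likely sequences (a sketch; 1 = boy, 0 = girl, column 1 = eldest child):

  % Enumerate all 32 equally likely boy/girl sequences
  seqs = dec2bin(0:31) - '0';                  % 32-by-5 matrix of 0s and 1s
  mean(all(seqs == 1,2) | all(seqs == 0,2))    % (1) all same sex: 1/16
  mean(ismember(seqs,[1 1 1 0 0],'rows'))      % (2) 3 eldest boys, others girls: 1/32
  mean(seqs(:,1) == 0 & seqs(:,2) == 0)        % (3) 2 oldest girls: 1/4
  mean(any(seqs == 0,2))                       % (4) at least one girl: 31/32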
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
  – Mutual exclusiveness is a __________
  – Statistical independence is a __________
• How does mutual exclusiveness relate to statistical independence?
1. me ⇒ si
2. si ⇒ me
3. Two events cannot be both si and me, except in some degenerate cases.
4. None of the above.
• Perhaps surprisingly, __________ is true.
Consider the following
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find:
P(EA) =        P(EB) =        P(ED) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(ED|EA) (the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) =
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILTY
Consider the Venn diagram below, showing events A and B in a sample space S.
[Venn diagram: events A and B in sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B, with the region A ∩ B inside it]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space (S, A, P(·|B)) on the original sample space.
Check the axioms:
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
   P(A|B) = P(A ∩ B)/P(B),
   and P(A ∩ B) ≥ 0, P(B) > 0
   ⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
   P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
              = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
   Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
   P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
              = P(A|B) + P(C|B)
EX Check the previous example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers: 2. (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then __________
(When A and B are si, knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
  ⇒ P(A ∩ B) = P(A|B)P(B)    (7)
and P(B|A) = P(A ∩ B)/P(A)
  ⇒ P(A ∩ B) = P(B|A)P(A)    (8)
• Equations (7) and (8) are chain rules for expanding the probability of the intersection of two events.
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if __________
{Ai} is also said to be a partition of Ω.
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then
Example
EX Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with probability P(A0) = 2/5
• A1 is sent with probability P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
  – The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules.
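As a numeric sketch of what the worked solution should produce (assuming the transition probabilities from the diagram above):

  % Error probabilities for the two candidate decision rules
  PA0 = 2/5; PA1 = 3/5;                  % a priori probabilities
  pB1A0 = 1/6; pB0A1 = 1/8;              % channel transition probabilities
  Pe1 = PA0                              % rule 1 (always decide A1): 0.4
  Pe2 = pB1A0*PA0 + pB0A1*PA1            % rule 2: 17/120, about 0.142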
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A1 when A0 is transmitted, plus the probability that we decide A0 when A1 is transmitted:
  P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
  P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
  P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
  P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
  (1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
  (1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
  (1 − q0)P(A0|B0) + q0P(A1|B0)    (9)
and
  q1P(A0|B1) + (1 − q1)P(A1|B1)    (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
bull General technique
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
__________
where the last step follows from the Law of Total Probability.
The expression in the box is __________
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
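A sketch of the computation in MATLAB (the elementwise operations assume a recent MATLAB with implicit expansion):

  % A posteriori probabilities for the binary channel via Bayes' theorem
  PA = [2/5 3/5];                         % P(A0), P(A1)
  PBgA = [5/6 1/6; 1/8 7/8];              % rows: A0, A1; columns: B0, B1
  PB = PA * PBgA;                         % total probability: P(B0), P(B1)
  PAgB = (PA' .* PBgA) ./ PB              % column j holds [P(A0|Bj); P(A1|Bj)]
  % MAP rule: pick the larger entry in each column -> decide A0 on B0, A1 on B1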
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1)
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
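A check of the answers via Bayes' rule (editor's note, for verifying your work): (a) P(fair|H) = [(1/2)(1/2)] / [(1/2)(1/2) + (1)(1/2)] = 1/3; (b) P(fair|HH) = [(1/4)(1/2)] / [(1/4)(1/2) + (1)(1/2)] = 1/5; (c) the two-headed coin can never show tails, so the coin must be fair: probability 1.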
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.3
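For self-checking, here is a Monte Carlo sketch of these probabilities (editor's illustration; it uses implicit array expansion, R2016b or later; compare the output with the footnote answers below):

% Estimate the poker-dice probabilities (a)-(g) by simulation
N = 1e5;
rolls = randi(6, N, 5);                  % five dice per game
counts = zeros(1,7);                     % tallies for patterns (a)-(g)
for i = 1:N
    m = sort(sum(rolls(i,:).' == 1:6), 'descend');  % face multiplicities
    if m(1)==1, counts(1)=counts(1)+1;              % no two alike
    elseif m(1)==2 && m(2)==1, counts(2)=counts(2)+1;  % one pair
    elseif m(1)==2 && m(2)==2, counts(3)=counts(3)+1;  % two pair
    elseif m(1)==3 && m(2)==1, counts(4)=counts(4)+1;  % three alike
    elseif m(1)==3 && m(2)==2, counts(5)=counts(5)+1;  % full house
    elseif m(1)==4, counts(6)=counts(6)+1;             % four alike
    else counts(7)=counts(7)+1;                        % five alike
    end
end
disp(counts/N)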
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk
SPECIAL CASES
1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3 (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀i = 1, 2, . . . , k
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ number of distinct ordered k-tuples is n^k
2 Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀i = 1, 2, . . . , k
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3 Permutations of n distinct objects
Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)
Result is n-tuple.
The number of orderings is the number of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n(n−1) = 2, nn = 1
⇒ number of permutations = number of distinct ordered n-tuples
= n(n − 1) · · · (2)(1) = n!
Useful formula (Stirling): For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1)
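A quick check of the two formulas above (editor's illustration):

n = 10;
exact    = gamma(n+1);                         % = factorial(10) = 3628800
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);   % Stirling approximation
fprintf('exact = %g   Stirling = %g   ratio = %.4f\n', exact, stirling, stirling/exact);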
L5a-3
4 Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the number of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let Cnk = the number of combinations of size k from a set of n objects. Then
Cnk = (number of ordered subsets)/(number of orderings) = n!/[(n − k)! k!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groupings
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
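An editor's check of this example: the number of favorable arrangements is the multinomial coefficient 12!/(2!)^6, and there are 6^12 equally likely outcomes, so

p = factorial(12) / (2^6 * 6^12)   % = 12!/((2!)^6 * 6^12) ≈ 0.0034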
5 Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?4
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors
– Behind one door is a car
– Behind the other doors are goats
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat
• The host then offers you the option to switch doors. Does it matter if you switch?
bull Compute the probabilities of winning for the following three strategies
1 Never switch
4 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
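A simulation sketch of the three strategies (editor's illustration, not part of the original notes):

% Monte Carlo comparison of the three Monty Hall strategies
N = 1e5; wins = zeros(1,3); doors = 1:3;
for i = 1:N
    car = randi(3); pick = randi(3);
    opts = doors(doors ~= pick & doors ~= car);   % doors the host may open
    host = opts(randi(numel(opts)));              % host opens a goat door
    swap = doors(doors ~= pick & doors ~= host);  % the door offered by switching
    wins(1) = wins(1) + (pick == car);            % 1: never switch
    wins(2) = wins(2) + (swap == car);            % 2: always switch
    if rand < 0.5, choice = swap; else choice = pick; end
    wins(3) = wins(3) + (choice == car);          % 3: coin-flip switch
end
disp(wins/N)   % approximately [1/3  2/3  1/2]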
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1 The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments,
P(A) = P1(B1)P2(B2) · · · PN(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
bull Let p denote the probability of success
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k)
• The number of ways that k successes can occur in n places is (n choose k)
• Thus the probability of k successes on n trials is
pn(k) = (n choose k) p^k (1 − p)^(n−k) (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
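An editor's check of this example using (11): the packet fails when 3 or more of the 100 bits are in error, so

n = 100; p = 0.01; k = 0:2;
Pk = arrayfun(@(kk) nchoosek(n,kk) * p^kk * (1-p)^(n-kk), k);
Pfail = 1 - sum(Pk)   % ≈ 0.0794, matching the footnote answer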
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^(n−k) (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials
• An equivalent formulation is the following:
– Consider a sequence of variables Xi
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise
– Let Sn = Σ_{i=1}^{n} Xi
Then b(k; n, p) is the probability that Sn = k
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)
• Note that (17) uses the density function φ, not the distribution function Φ
• This formula is most useful when it is extended to dealing with sums of binomial probabilities
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^(m−1) p, m = 1, 2, . . .
Also, P(number of trials > k) = (1 − p)^k
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
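An editor's check using the geometric law above, with success probability p = 0.9 per transmission: P(exactly 2 transmissions) = (1 − p)p = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = (1 − p)² = 0.01.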
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
bull How should we proceed What should we assume
• Let's start with a binomial model
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 (calls per minute) is a constant. Then the maximum possible number of calls/hour is 60n
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero
• For finite k, the probability of k arrivals is a Binomial probability
(n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α)
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– number of misprints on a group of pages in a book
– number of people in a community that live to be 100 years old
– number of wrong telephone numbers that are dialed in a day
– number of α-particles discharged in a fixed period of time from some radioactive material
– number of earthquakes per year
– number of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the average number of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt
• Let N = the number of events in time (or space) t
• Then
P(N = k) =
  (α^k/k!) e^(−α), k = 0, 1, . . .
  0, otherwise
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
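An editor's check of this example: here λ = 90 calls/hr and t = 10 minutes = 1/6 hr, so α = λt = 15 and

alpha = 90 * (10/60);                            % α = 15
P20 = alpha^20 * exp(-alpha) / factorial(20)     % ≈ 0.0418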
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
Ans:
FZ(z) =
  0, z ≤ 0
  z²/(2b²), 0 < z ≤ b
  [b² − (2b − z)²/2]/b², b < z < 2b
  1, z ≥ 2b
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {y | y ∈ (−∞, x]}
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(−∞) = 0 and FX(∞) = 1
3 FX(x) is monotonically nondecreasing, i.e., if a ≤ b then FX(a) ≤ FX(b)
4 P(a < X ≤ b) = FX(b) − FX(a)
5 FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻)
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z.
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
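An editor's note on the answers, for checking your histograms: if U is uniform on (0, 1], then P(−ln U ≤ v) = P(U ≥ e^(−v)) = 1 − e^(−v), so V = −ln U is an exponential RV with λ = 1 (this is the inverse-CDF method for generating random variables). Then P(√V ≤ w) = P(V ≤ w²) = 1 − e^(−w²), which is a Rayleigh CDF with 2σ² = 1, so W = √V is a Rayleigh RV with σ² = 1/2.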
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x)
bull Properties
1 fX(x) ≥ 0, −∞ < x < ∞
2 FX(x) = ∫_{−∞}^{x} fX(t) dt
3 ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4 P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5 If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞
then fX(x) = g(x)/c
is a valid pdf
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) =
  0, x < a
  1/(b − a), a ≤ x ≤ b
  0, x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x)
EX: Roll a fair 6-sided die. X = number on top face.
P(X = x) =
  1/6, x = 1, 2, . . . , 6
  0, otherwise
EX: Flip a fair coin until heads occurs. X = number of flips.
P(X = x) =
  (1/2)^x, x = 1, 2, . . .
  0, otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success"
• A Bernoulli RV X is defined by
X =
  1, s ∈ A
  0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) =
  p, x = 1
  1 − p, x = 0
  0, x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs
• Let X = number of successes. Then the pmf of X is given by
P[X = k] =
  (n choose k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n
  0, otherwise
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2
[Plot: Binomial(100, 0.2) pmf]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success
• X = number of trials required. Then the pmf of X is given by
P[X = k] =
  (1 − p)^(k−1) p, k = 1, 2, . . .
  0, otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below
[Plots: Geometric(0.1) and Geometric(0.7) pmfs]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below
[Plots: Geometric(0.1) and Geometric(0.7) CDFs]
4 Poisson RV
bull Models events that occur randomly in space or time
• Let λ = the average number of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt
• Let N = the number of events in time (or space) t
• Then
P(N = k) =
  (α^k/k!) e^(−α), k = 0, 1, . . .
  0, otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below
[Plots: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below
[Plots: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20
[Plot: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1 Uniform RV – covered in example above
2 Exponential RV
bull Characterized by single parameter λ
L7-10
• Density function:
fX(x) =
  0, x < 0
  λe^(−λx), x ≥ 0
• Distribution function:
FX(x) =
  0, x < 0
  1 − e^(−λx), x ≥ 0
• The book's definition substitutes λ = 1/µ
bull Obtainable as limit of geometric RV
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below
[Plots: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
• DeMoivre-Laplace Theorem: If n is large (with npq not too small), then with q = 1 − p,
(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq, µ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}
with parameters (mean) µ and (variance) σ² ≥ 0
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem)
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt
which cannot be evaluated in closed form
• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below
[Plots: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x)
– I will be supplying you with a Q-function table and a list of approximations to Q(x)
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:5
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
5 (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively
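An editor's check of (b): by symmetry, P(|X − µ| > kσ) = 2Q(k), which can be evaluated in MATLAB as

Qf = @(x) 0.5*erfc(x/sqrt(2));
disp(2*Qf(1:5))   % compare with the footnote answers above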
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞
where c > 0
bull The pdf for a Laplacian RV with c = 2 is shown below
[Plot: Laplacian pdf, c = 2]
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi² (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable
• The density of a central chi-square random variable is given by
fY(y) =
  [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
  0, y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) =
  [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
  0, y < 0
which is an exponential density
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0
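A quick MATLAB demonstration of the n = 2 case (editor's illustration):

y = randn(1,1e5).^2 + randn(1,1e5).^2;   % sum of squares of two N(0,1) RVs
hist(y, 100)   % shape matches the exponential density (1/2)*exp(-y/2), y >= 0,
               % i.e., the chi-square density with n = 2 and sigma^2 = 1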
L8-6
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below
[Plot: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found (for n = 2m even) through repeated integration by parts to be
FY(y) =
  1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
  0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance
• Let R = √Y
• Then R is a Rayleigh RV with pdf
fR(r) =
  (r/σ²) e^(−r²/(2σ²)), r ≥ 0
  0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below
[Plot: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) =
  1 − e^(−r²/(2σ²)), r ≥ 0
  0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
7 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels
• Then L is a lognormal random variable with pdf given by
fL(ℓ) =
  [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ > 0
  0, ℓ ≤ 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = d FX(x|A)/dx
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
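A sketch of the computation (editor's note): with FX(x) = 1 − e^(−x) and conditioning event A = {X > 2},
FX(x|A) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = [e^(−2) − e^(−x)]/e^(−2) = 1 − e^(−(x−2)), for x > 2,
so the remaining wait time X − 2 again has the exponential distribution with λ = 1: the exponential RV is memoryless.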
TOTAL PROBABILITY AND BAYESrsquo THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
= [fX(x|A)/fX(x)] P(A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
S-3
bull Markov chains
bull Examples of practical applications of the theory
Goals and Objectives Upon completion of this course the student should be able to
• Recite the axioms of probability; use the axioms and their corollaries to give reasonable answers
bull Determine probabilities based on counting (lottery tickets etc)
bull Calculate probabilities of events from the density or distribution functions for randomvariables
bull Classify random variables based on their density or distribution functions
bull Know the density and distribution functions for common random variables
bull Determine random variables from definitions based on the underlying probability space
bull Determine the density and distribution functions for functions of random variablesusing several different techniques presented in class
bull Calculate expected values for random variables
bull Determine whether events random variables or random processes are statisticallyindependent
bull Use inequalities to find bounds for probabilities that might otherwise be difficult toevaluate
bull Use transform methods to simplify solving some problems that would otherwise bedifficult
bull Evaluate probabilities involving multiple random variables or functions of multiplerandom variables
bull Classify random processes based on their time support and value support
bull Simulate random variables and random processes
bull Classify random processes based on stationarity
• Evaluate the mean, autocovariance, and autocorrelation functions for random processes at the output of a linear filter
• Evaluate the power spectral density for wide-sense stationary random processes
• Give the matched filter solution for a simple signal transmitted in additive white Gaussian noise
bull Determine the steady state probabilities for a Markov chain
Grading: Grading for on-campus students will be based on three exams (30% each) and class participation and quizzes (10%). Grading for EDGE students will be based on three exams (25% each), homework (15%), and class participation and quizzes (10%). The participation score will take into account in-class participation, e-mail or instant messaging exchanges,
S-4
discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90% is guaranteed an A, > 80% is guaranteed a B, etc.
Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.
No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.
When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.
Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.
Academic Honesty
All students admitted to the University of Florida have signed a statement of academic honesty, committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.
This statement is a reminder to uphold your obligation as a student at the University of Florida and to be honest in all work submitted and exams taken in this class and all others.
Honor statements on tests must be signed in order to receive any credit for that test
I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.
Collaboration on homework is permitted unless explicitly prohibited provided that
1 Collaboration is restricted to students currently in this course
2 Collaboration must be a shared effort
3 Each student must write up his/her homework independently
S-5
4 On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.
I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.
L1-1
EEL 5544 Noise in Linear Systems Lecture 1
RANDOM EXPERIMENTS
Motivation
The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.
Try to answer the following problems
1 EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?
2 EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?
These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.
Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
L1-2
What is a random experiment
Q: What do we mean by random?
Output is unpredictable in some sense
DEFN
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by
1 an experimental procedure
2 a set of outcomes and measurement restrictions
DEFN
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX
Tossing a coin: A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω = {H, T}
EX: Rolling a 6-sided die. A 6-sided die is rolled and the number on the top face is observed.
• Q: What are the outcomes?
Ω = {1, 2, 3, 4, 5, 6}
EX: A 6-sided die is rolled and whether the top face is even or odd is observed.
• Q: What are the outcomes?
Ω = {even, odd}
L1-3
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
bull Some experiments are said to be ldquofairrdquo
ndash For a fair coin or a fair die toss the outcomes are equally likely
ndash Equally likely outcomes are common in many situations
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis
bull Given an experiment with equally likely outcomes we can determine the probabilities easily
bull First we need a little more knowledge about probabilities of an experiment with a finitenumber of outcomes
ndash An outcome or set of outcomes with probability 1 (one) is certain to occur
ndash An outcome or set of outcomes with probability 0 (zero) is certain not to occur
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points
bull Using the first two rules we can find the exact probabilities of individual outcomes forexperiments with a finite number of equally likely outcomes
EX Tossing a fair coin
Let pH = Prob heads pT = Prob tails
Then pH + pT = 1(the probability that something occurs is 1)
Since pH = pT pH = pT = 12
EX Rolling a fair 6-sided die
L1-4
Let pi = Prob{top face is i}. Then
Σ_{i=1}^{6} pi = 1
⇒ 6p1 = 1
⇒ p1 = 1/6
So pi = 1/6 for i = 1, 2, 3, 4, 5, 6
bull We can use the probabilities of the outcomes to find probabilities that involve multipleoutcomes
EX Rolling a fair 6-sided die
What is Prob{1 or 2}? Since the outcomes 1 and 2 cannot occur simultaneously, Prob{1 or 2} = p1 + p2 = 1/3
The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally-likely outcomes on repeated trials, but we have to be careful.
EX: Rolling a fair 6-sided die twice
Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die?
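(Editor's worked check: of the 36 equally likely ordered pairs, 4 × 4 = 16 contain neither a 1 nor a 2, so the probability is (36 − 16)/36 = 20/36 = 5/9.)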
L1-5
What are some problems with defining probability in this way?
1 Requires equally likely outcomes – thus it only applies to a small set of random phenomena
2 Can only deal with a finite number of outcomes – this again limits its usefulness
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞
• Let Nk(n) = the number of times in n trials that the outcome is k
• Then we can tabulate Nk(n) for various values of k and n
• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur"
DEFN
The relative frequency of outcome k of a random experiment is
fk(n) = Nk(n)/n
bull Observation In our previous experiments as n gets large fk(n) converges to some constantvalue
DEFN
An experiment possesses statistical regularity if
lim_{n→∞} fk(n) = pk (a constant) ∀k
DEFN
For experiments with statistical regularity as defined above pk is called theprobability of outcome k
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ Nk(n) ≤ n ∀k
because Nk(n) is just the number of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, . . . , K (1)
L1-6
• If 1, 2, . . . , K are all of the possible outcomes, then
Σ_{k=1}^{K} Nk(n) = n
Again, dividing by n yields
Σ_{k=1}^{K} fk(n) = 1 (2)
Consider the event E that an even number occurs
What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation?
Then, dividing by n,
fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
fC(n) = fA(n) + fB(n) (3)
bull In some sense we can define the probability of an event as its long-term relative frequency
What are some problems with this definition
1 It is not clear when and in what sense the limit exists2 It is not possible to perform an experiment an infinite number of times so the probabilities
can never be known exactly3 We cannot use this definition if the experiment cannot be repeated
We need a mathematical model of probability that is not based on a particularapplication or interpretationHowever any such model should
1 be useful for solving real problems
2 agree with our interpretation of probability as relative frequency
3 agree with our intuition (where appropriate)
L2a-1
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1 If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2 If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
bull Want to determine probabilities of events (combinations of outcomes)
• Each event is a set ⇒ need operations on sets
1 A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B
L2a-2
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs)
2 A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
3 A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}
4 Ac or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
Ac = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1 (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2 (Ac)c = A
3 A ⊂ B ⇒ Bc ⊂ Ac
4 A ∪ Ac = Ω
5 (A ∩ B)c = Ac ∪ Bc (DeMorgan's Law 1)
6 (A ∪ B)c = Ac ∩ Bc (DeMorgan's Law 2)
L2a-3
• One of the most important relations for sets is when sets are mutually exclusive
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅
This is very important, so I will emphasize it again: mutually exclusive is a set relationship.
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face
2 EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
• The set of all events is the event class, denoted F or A
EX Flipping a coin
The possible events are
ndash Heads occurs
ndash Tails occurs
ndash Heads or Tails occurs
ndash Neither Heads nor Tails occurs
• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
• EX: Express the events listed in the example above in set notation
1 {3}
2 {2, 4, 6}
3 ∅
4 Ω = {1, 2, 3, 4, 5, 6}
EX The 3 language problem
1 If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
L2a-7
2 If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
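(Editor's worked check via inclusion-exclusion: |S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50, so P(no class) = (100 − 50)/100 = 1/2. Students in exactly one class: Spanish only = 28 − 12 − 4 + 2 = 14, French only = 26 − 12 − 6 + 2 = 10, German only = 16 − 4 − 6 + 2 = 8, giving 32 students and probability 0.32.)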
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai
Let A2 = {exactly 2 heads occur in 3 tosses} = {HHT, HTH, THH}. Then P(A2) = 3/8.
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, . . . , oK} and |E| = K), then P(E) = K/|Ω|
L2a-8
B Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = (b − a)/(B − A)
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B]
L2a-9
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0
(Does not mean that the outcome does not ever occur, only that it occurs very infrequently)
L2-1
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1 What percentage of males smoke?
2 What percentage of males smoke neither cigars nor cigarettes?
3 What percentage smoke cigars but not cigarettes?
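(Editor's worked check, using the corollaries below: P(smoke) = 0.28 + 0.07 − 0.05 = 0.30, i.e., 30%; P(neither) = 1 − 0.30 = 0.70, i.e., 70%; P(cigars but not cigarettes) = 0.07 − 0.05 = 0.02, i.e., 2%.)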
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes.
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties discrete or continuous
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence withthe integers
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then if a ≤ x ≤ b, x ∈ I.
bull Intervals can be either open closed or half-open
– A closed interval [a, b] contains the endpoints a and b
– An open interval (a, b) does not contain the endpoints a and b
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b
L2-3
bull Intervals can also be either finite infinite or partially infinite
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN: Events are sets of outcomes to which we assign probabilities.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = event that heads occurs on first toss. Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}
(d) EX
Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes: O = {1, 3}
2
DEFN
F is the event class, which is a collection of subsets of Ω (a σ-algebra) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω)
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Ec ∈ M
L2-5
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, . . . ∈ M,
⋃_{i=1}^{∞} Ei ∈ M (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_{i=1}^{∞} Ei ∈ M (6)
bull We require that the event class F be a σ-algebra
• For discrete sets, the set of all subsets of Ω is a σ-algebra
• For continuous sets, there may be more than one σ-algebra possible
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a function that maps all members of F onto the interval [0, 1].
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, . . . are pairwise mutually exclusive, then P(⋃_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. The following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(Ac) = 1 − P(A)
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, . . . , An are pairwise mutually exclusive, then
P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f)
P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B)
Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent?
Motivation-continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probabilistic relationship: it is a property of the probability measure P, not of the sets themselves.
What type of relation is mutually exclusive?
Claim: If A and B are si events, then the following pairs of events are also si:
– A and B^c
– A^c and B
– A^c and B^c
1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32  2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
= P(A ∩ B) + P(A ∩ B^c)
= P(A)P(B) + P(A ∩ B^c)
Thus,
P(A ∩ B^c) = P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B^c)
• For N events to be si, we require
P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m
⋮
P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2)···P(AN)
(It is not sufficient to have only P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, …, AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.
Examples of Applying Statistical Independence
EX For a couple having 5 children, compute the probabilities of the following events:
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set-theoretic relationship: A ∩ B = ∅.
– Statistical independence is a probabilistic relationship: P(A ∩ B) = P(A)P(B).
• How does mutual exclusiveness relate to statistical independence?
1. me ⇒ si
2. si ⇒ me
3. two events cannot be both si and me, except in some degenerate cases
4. none of the above
• Perhaps surprisingly, (3) is true.
Consider the following:
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) = 2/5  P(EB) = 3/5  P(ED) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?
We denote this prob. as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) = 2/3
– P(EA|ED) = 1/3
– P(EB|ED) = 2/3
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: events A and B within sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B becomes the new sample space; only A ∩ B remains of A]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A:
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅:
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX Check prev. example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) = 1/2
P(ED|EB) = 2/3
P(EA|ED) = 1/3
P(EB|ED) = 2/3
EX The genetics (eye color) problem:
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
2. (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
(When A and B are si, knowledge of B does not change the prob. of A, and vice versa.)
Relating Conditional and Unconditional Probs.
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B) (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A) (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if
⋃_i Ai = Ω and Ai ∩ Aj = ∅ for all i ≠ j.
{Ai} is also said to be a partition of Ω.
1. Venn Diagram
2. Expressing an arbitrary set using a partition of Ω: for any event B, B = ⋃_i (B ∩ Ai), where the sets B ∩ Ai are mutually exclusive.
Law of Total Probability
If {Ai} partitions Ω, then for any event B,
P(B) = Σ_i P(B|Ai)P(Ai)
Example
EX Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with prob. P(A0) = 2/5
• A1 is sent with prob. P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1)
+ P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
+ P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1)
+ (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)
q1P(A0|B1) + (1 − q1)P(A1|B1) (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σ_k P(Bj|Ak)P(Ak)],
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
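A minimal MATLAB sketch of this calculation (my own, not part of the notes; the variable names are mine) applies Bayes' rule to the channel above:
% a priori probs [P(A0) P(A1)] and channel transitions P(Bj|Ai)
pA = [2/5 3/5];
PBgivenA = [5/6 1/8;    % row 1: P(B0|A0) P(B0|A1)
            1/6 7/8];   % row 2: P(B1|A0) P(B1|A1)
pB = PBgivenA * pA';                                    % total probability: [P(B0); P(B1)]
post = (PBgivenA .* repmat(pA,2,1)) ./ repmat(pB,1,2);  % Bayes: post(j,i) = P(Ai|Bj)
[mx, dec] = max(post, [], 2);                           % MAP decision for each received Bj
The posteriors work out to P(A0|B0) ≈ 0.82 and P(A1|B1) ≈ 0.89, so the MAP rule decides A0 when B0 is received and A1 when B1 is received.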
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
Result is the k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Thus n1 = n2 = ··· = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is the k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1)···(2)(1)
= n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's approximation).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
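A quick check of the approximation (a sketch of my own, not from the notes):
n = 10;
exact = gamma(n+1);                         % n! = 3,628,800
stirl = sqrt(2*pi) * n^(n+0.5) * exp(-n);   % Stirling's approximation
relerr = (exact - stirl)/exact              % about 0.008 (0.8%) for n = 10
The relative error shrinks as n grows, which is why Stirling's formula is useful precisely where direct evaluation of n! becomes impractical.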
4. Sampling w/out Replacement & w/out Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First, determine the # of ordered subsets: n!/(n − k)!.
Now, how many different orderings are there for any set of k objects? k!.
Let Cnk = the number of combinations of size k from a set of n objects.
Cnk = (# of ordered subsets)/(# of orderings per subset)
= [n!/(n − k)!]/k!
= n!/[k!(n − k)!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient
n!/(k1! k2! ··· kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85.
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered arrangements.
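These counts overflow double precision if computed directly; a sketch of my own (base MATLAB only) evaluates them through log-factorials, using gammaln(n+1) = ln(n!):
log10ordered = (gammaln(73) - 24*gammaln(4))/log(10)                  % ~ 85.1
log10unordered = (gammaln(73) - 24*gammaln(4) - gammaln(25))/log(10)  % ~ 61.3
Working in logarithms is the standard trick whenever factorials this large appear in a ratio.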
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob. of this occurring? (See the sketch below.)
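A sketch of the computation (mine, following the multinomial count above): the number of arrangements is 12!/(2!)^6, and each of the 6^12 ordered outcomes is equally likely.
ways = factorial(12) / factorial(2)^6;   % 12!/(2!)^6 = 7,484,400 arrangements
prob = ways / 6^12                       % ~ 0.0034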
5. Sampling with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
(k + n − 1 choose k)
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
4. 0.0794
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments
1. The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.
2. Event class F: a σ-algebra on Ω
3. P(·) on F such that
P(A) = P(B1)P(B2|B1)···P(BN|B1 ∩ ··· ∩ B_{N−1})
For si subexperiments:
P(A) = P1(B1)P2(B2)···PN(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is (n choose k).
• Thus, the probability of k successes on n trials is
pn(k) = (n choose k) p^k (1 − p)^{n−k} (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
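A minimal MATLAB sketch of the computation (my own): the packet fails when 3 or more of the n = 100 independent bits are in error, so we subtract the probabilities of 0, 1, or 2 errors from 1.
n = 100; p = 0.01;
pk = zeros(1,3);
for k = 0:2
    pk(k+1) = nchoosek(n,k) * p^k * (1-p)^(n-k);   % P(exactly k bit errors)
end
pfail = 1 - sum(pk)                                 % ~ 0.0794, matching the footnote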
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^{n−k} (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^n Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^x exp[−t²/2] dt (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_x^∞ exp[−t²/2] dt (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)), (17)
where q = 1 − p.
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, …
Also, P(# of trials > k) = (1 − p)^k.
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
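A sketch of the computation (mine), treating a packet error as a failure so that the success probability is p = 0.9: P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first 2 transmissions both fail) = (0.1)² = 0.01.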
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 calls per minute is constant. The maximum possible number of calls/hour is now 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:
(n choose k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k/k!) e^{−α}, k = 0, 1, …
           { 0, otherwise
EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
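A sketch of the computation (mine; it avoids toolbox functions by using gammaln for the factorial): in 10 minutes the expected number of calls is α = λt = 90 × (10/60) = 15.
alpha = 90 * (10/60);                             % alpha = lambda*t = 15
k = 20;
pk = exp(k*log(alpha) - alpha - gammaln(k+1))     % P(N = 20) ~ 0.0418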
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = { 0, z ≤ 0
        { z²/(2b²), 0 < z ≤ b
        { [b² − (2b − z)²/2]/b², b < z < 2b
        { 1, z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is
FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})
• FX(x) defines a prob. measure on the Borel sets of R.
bull Properties of FX
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0+} FX(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted.
• If FX(x) is a continuous function of x, then FX(x) = FX(x−).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x−) = lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB:
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
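One way to check a guess (a sketch of my own, not part of the exercise) is to normalize the histogram into a density estimate and overlay a candidate pdf:
u = rand(1,1000000);
v = -log(u);
[cnt, ctr] = hist(v, 100);
bw = ctr(2) - ctr(1);
bar(ctr, cnt/(length(v)*bw));        % normalized histogram ~ density estimate
hold on
plot(ctr, exp(-ctr), 'r')            % candidate pdf for comparison
hold off
If the curve tracks the bars, the candidate density is a good match for the transformed data.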
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^x fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
4. P(a < X ≤ b) = ∫_a^b fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x−) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = { 0, x < a
        { 1/(b − a), a ≤ x ≤ b
        { 0, x > b
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX Roll a fair 6-sided die. X = # on top face.
P(X = x) = { 1/6, x = 1, 2, …, 6
           { 0, otherwise
EX Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = { (1/2)^x, x = 1, 2, …
           { 0, otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = { 1, s ∈ A
    { 0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = { p, x = 1
           { 1 − p, x = 0
           { 0, x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = { (n choose k) p^k (1 − p)^{n−k}, k = 0, 1, …, n
           { 0, otherwise
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below:
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below:
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2:
[Figure: Binomial(100, 0.2) pmf]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = { (1 − p)^{k−1} p, k = 1, 2, …
           { 0, otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below:
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below:
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k/k!) e^{−α}, k = 0, 1, …
           { 0, otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below:
[Figure: Poisson(0.75) pmf and Poisson(3) pmf]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below:
[Figure: Poisson(0.75) CDF and Poisson(3) CDF]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20:
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
fX(x) = { 0, x < 0
        { λe^{−λx}, x ≥ 0
• Distribution function:
FX(x) = { 0, x < 0
        { 1 − e^{−λx}, x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below:
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
(n choose k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq, μ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k}
≈ (1/√(2πσ²)) exp[−(k − μ)²/(2σ²)]
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x)
= ∫_{−∞}^x (1/√(2πσ²)) exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below:
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^x (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_x^∞ (1/√(2π)) exp[−t²/2] dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_0^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − μ)/σ) − Q((b − μ)/σ)
EEL 5544 Noise in Linear Systems Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
5. (a) 1/2 (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below:
[Figure: Laplacian pdf with c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^n Xi² (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = { [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
        { 0, y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = { [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
        { 0, y < 0
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below:
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be
FY(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
        { 0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = { (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
        { 0, r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below:
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) = { 1 − e^{−r²/(2σ²)}, r ≥ 0
        { 0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable, with pdf given by
fL(ℓ) = { [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)], ℓ ≥ 0
        { 0, ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = d FX(x|A)/dx
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
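A sketch of the solution (mine, using the memoryless structure of the exponential): with A = {X > 2},
FX(x|A) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = [FX(x) − FX(2)]/[1 − FX(2)] = 1 − e^{−(x−2)} for x > 2 (and 0 otherwise).
If W = X − 2 is the remaining wait time, then FW(w|A) = 1 − e^{−w} for w ≥ 0; i.e., the remaining wait is again exponential with λ = 1, because the exponential distribution is memoryless.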
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^n fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
= lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)
= [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^∞ P(A|X = x) fX(x) dx = [∫_{−∞}^∞ fX(x|A) dx] P(A)
⇒ P(A) = ∫_{−∞}^∞ P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the
Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^∞ P(A|X = t) fX(t) dt
Examples on board
discussions outside of class, etc. I will also give unannounced quizzes that will count toward the participation grade. A grade of > 90% is guaranteed an A, > 80% is guaranteed a B, etc.
Because of limited grading support, most homework sets will not be graded for on-campus students. Homework sets for off-campus students will be graded on a spot-check basis: if I give ten problems, I may only ask the grader to check four or five of them. Homework will be accepted late up to two times, with an automatic 25% reduction in grade.
No formal project is required, but as mentioned above, students will be required to use MATLAB in solving some homework problems.
When students request that a submission (test or homework) be regraded, I reserve the right to regrade the entire submission rather than just a single problem.
Attendance: Attendance is not mandatory. However, students are expected to know all material covered in class, even if it is not in the book. Furthermore, the instructor reserves the right to give unannounced "pop" quizzes with no make-up option. Students who miss such quizzes will receive zeros for that grade. If an exam must be missed, the student must see the instructor and make arrangements in advance, unless an emergency makes this impossible. Approval for make-up exams is much more likely if the student is willing to take the exam early.
Academic Honesty
All students admitted to the University of Florida have signed a statement of academic honesty committing themselves to be honest in all academic work and understanding that failure to comply with this commitment will result in disciplinary action.
This statement is a reminder to uphold your obligation as a student at the University of Floridaand to be honest in all work submitted and exams taken in this class and all others
Honor statements on tests must be signed in order to receive any credit for that test
I understand that many of you will have access to at least some of the homework solutions. Time constraints prohibit me from developing completely new sets of homework problems each semester. Therefore, I can only tell you that homework problems exist for your benefit. It is dishonest to turn in work that is not your own. In creating your homework solution, you should not use the homework solution that I created in a previous year or someone else's homework solution. If I suspect that too many people are turning in work that is not their own, then I will completely remove homework from the course grade.
Collaboration on homework is permitted unless explicitly prohibited provided that
1 Collaboration is restricted to students currently in this course
2 Collaboration must be a shared effort
3 Each student must write up hisher homework independently
4. On problems involving MATLAB programs, each student should write his/her own program. Students may discuss the implementations of the program, but students should not work as a group in writing the programs.
I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.
EEL 5544 Noise in Linear Systems Lecture 1
RANDOM EXPERIMENTS
Motivation
The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that, under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.
Try to answer the following problems:
1. EX A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?
2. EX A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?
These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events; try to solve them using counting.
Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
What is a random experiment?
Q: What do we mean by random?
Output is unpredictable in some sense.
DEFN
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by:
1. an experimental procedure
2. a set of outcomes and measurement restrictions
DEFN
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX
Tossing a coin:
A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω = {H, T}
EX Rolling a 6-sided die:
A 6-sided die is rolled, and the number on the top face is observed.
• Q: What are the outcomes?
Ω = {1, 2, 3, 4, 5, 6}
EX A 6-sided die is rolled, and whether the top face is even or odd is observed.
• Q: What are the outcomes?
Ω = {even, odd}
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
• Some experiments are said to be "fair":
– For a fair coin or a fair die toss, the outcomes are equally likely.
– Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
– An outcome or set of outcomes with probability 1 (one) is certain to occur.
– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.
EX Tossing a fair coin:
Let pH = Prob{heads}, pT = Prob{tails}.
Then pH + pT = 1
(the probability that something occurs is 1).
Since pH = pT, pH = pT = 1/2.
EX Rolling a fair 6-sided die:
Let pi = Prob{top face is i}. Then
Σ_{i=1}^6 pi = 1
⇒ 6p1 = 1
⇒ p1 = 1/6
So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.
• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.
EX Rolling a fair 6-sided die:
What is Prob{1 or 2}?
The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally likely outcomes on repeated trials, but we have to be careful.
EX Rolling a fair 6-sided die twice:
Suppose we roll the die two times. What is the prob. that we observe a 1 or 2 on either roll of the die?
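A brute-force check by enumeration (a sketch of my own; the event occurs on 20 of the 36 equally likely outcome pairs, so the probability is 20/36 = 5/9):
[r1, r2] = meshgrid(1:6, 1:6);       % all 36 equally likely outcome pairs
hit = (r1 <= 2) | (r2 <= 2);         % a 1 or 2 appears on either roll
prob = sum(hit(:))/numel(hit)        % 20/36 ~ 0.556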
What are some problems with defining probability in this way?
1. It requires equally likely outcomes; thus, it only applies to a small set of random phenomena.
2. It can only deal with a finite number of outcomes; this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let Nk(n) = the number of times the outcome is k in n trials.
• Then we can tabulate Nk(n) for various values of k and n.
• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"
DEFN
The relative frequency of outcome k of a random experiment is
fk(n) = Nk(n)/n
• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.
DEFN
An experiment possesses statistical regularity if
lim_{n→∞} fk(n) = pk (a constant) ∀k
DEFN
For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ Nk(n) ≤ n ∀k,
because Nk(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, …, K (1)
• If 1, 2, …, K are all of the possible outcomes, then
Σ_{k=1}^K Nk(n) = n.
Again, dividing by n yields
Σ_{k=1}^K fk(n) = 1 (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation?
Then, dividing by n,
fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
fC(n) = fA(n) + fB(n) (3)
• In some sense, we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1. be useful for solving real problems,
2. agree with our interpretation of probability as relative frequency, and
3. agree with our intuition (where appropriate).
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events which are sets of outcomes Therefore it isnecessary to be able to understand and be able to apply all the ldquotoolsrdquo (operators) and conceptsof sets
Read the following material and Sections 14 and 15 in the textbook Then watch the onlineLecture 2a in preparation for Lecture 2
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2problem 12)
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
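A minimal MATLAB sketch (ours, not part of the original notes; all variable names are ours) that checks the answers by direct counting with inclusion-exclusion:

% Enrollment counts from the problem statement
N = 100;                     % total students
nS = 28; nF = 26; nG = 16;   % single-class counts
nSF = 12; nSG = 4; nFG = 6;  % pairwise counts
nSFG = 2;                    % in all three classes

% (1) P(no class): count students in S u F u G by inclusion-exclusion
atLeastOne = nS + nF + nG - nSF - nSG - nFG + nSFG;
pNone = (N - atLeastOne)/N           % = 0.50

% (2) P(exactly one class): strip both pairwise overlaps from each
% class, adding back the triple overlap that was subtracted twice
onlyS = nS - nSF - nSG + nSFG;
onlyF = nF - nSF - nFG + nSFG;
onlyG = nG - nSG - nFG + nSFG;
pExactlyOne = (onlyS + onlyF + onlyG)/N   % = 0.32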
• Want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ need operations on sets.
1. A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs)
2. A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
3. A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}
4. Aᶜ (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
Aᶜ = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (Aᶜ)ᶜ = A
3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ
4. A ∪ Aᶜ = Ω
5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)
6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)
• One of the most important relations for sets is when sets are mutually exclusive.
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
This is very important, so I will emphasize it again: mutually exclusive is a set relationship.
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face
2. EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
• Recall that an event A is a subset of the sample space Ω.
• The set of all events is denoted F or A.
EX Flipping a coin
The possible events are
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
• EX Express the events listed in the example above in set notation:
1. {3}
2. {2, 4, 6}
3. ∅
4. Ω = {1, 2, 3, 4, 5, 6}
EX The 3 language problem
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai
Let A2 = exactly 2 heads occurs in 3 tosses. Then
P(A2) = |A2|/|Ω3| = 3/8
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then P(E) = K/|Ω|.
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai
Then P(2 heads) = 1/4, which conflicts with the 3/8 computed in part A – the four outcomes in this Ω are not actually equally likely.
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = (b − a)/(B − A)
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the probability of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.
(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
EEL 5544, Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:
EX A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes.
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN
Events are sets of outcomes to which we assign probabilities.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = event that heads occurs on the first toss. Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes.
2
DEFN
F is the event class, which is a collection of subsets of Ω to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,
⋃_{i=1}^{∞} Ei ∈ M (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_{i=1}^{∞} Ei ∈ M (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every A ∈ F
II. P(Ω) = 1
III. If A1, A2, ... are pairwise mutually exclusive, then P(⋃_i Ai) = Σ_i P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(Aᶜ) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, ..., An are pairwise mutually exclusive, then
P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
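As a quick illustration, here is a MATLAB sketch (ours, not from the notes) that applies corollaries (a) and (e) to the cigarette/cigar problem from the Motivation section:

pCig   = 0.28;   % P(smokes cigarettes)
pCigar = 0.07;   % P(smokes cigars)
pBoth  = 0.05;   % P(smokes both)

pSmoke     = pCig + pCigar - pBoth   % corollary (e): P(A u B) = 0.30
pNeither   = 1 - pSmoke              % corollary (a): complement = 0.70
pCigarOnly = pCigar - pBoth          % cigars but not cigarettes = 0.02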
EEL 5544 Noise in Linear Systems, Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent?
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probability relationship.
What type of relation is mutually exclusive? (A set relationship.)
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– A and Bᶜ
– Aᶜ and B
– Aᶜ and Bᶜ
1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
= P(A ∩ B) + P(A ∩ Bᶜ)
= P(A)P(B) + P(A ∩ Bᶜ)
Thus
P(A ∩ Bᶜ) = P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(Bᶜ)
• For N events to be s.i., we require
P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m
⋮
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀i ≠ j.
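A simulation sketch (ours; the events are a standard textbook construction, not from these notes) showing that pairwise statistical independence does not imply full statistical independence. Take two fair coin flips and let A = {flip 1 is heads}, B = {flip 2 is heads}, C = {the flips disagree}:

n = 1e6;
x1 = rand(1,n) < 0.5;   % flip 1
x2 = rand(1,n) < 0.5;   % flip 2
A = x1;  B = x2;  C = xor(x1, x2);

% Each pair is s.i.: the differences below are all near 0
mean(A & B) - mean(A)*mean(B)
mean(A & C) - mean(A)*mean(C)
mean(B & C) - mean(B)*mean(C)

% But P(A n B n C) = 0, while P(A)P(B)P(C) = 1/8
mean(A & B & C)

If both flips are heads, the flips cannot disagree, so the triple intersection is empty even though every pair of events is s.i.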
Examples of Applying Statistical Independence
EX For a couple having 5 children, compute the probabilities of the following events:
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relationship.
– Statistical independence is a probability relationship.
• How does mutual exclusiveness relate to statistical independence?
1. m.e. ⇒ s.i.
2. s.i. ⇒ m.e.
3. Two events cannot be both s.i. and m.e., except in some degenerate cases.
4. None of the above.
• Perhaps surprisingly, (3) is true: if A and B are m.e. with P(A) > 0 and P(B) > 0, then P(A ∩ B) = 0 ≠ P(A)P(B).
Consider the following:
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• EX Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) =
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram of two events A and B in a sample space S.
[Venn diagram: overlapping events A and B inside S]
If we condition on B having occurred, then we can form the new Venn diagram, in which B plays the role of the sample space.
[Venn diagram: B, with the region A ∩ B inside it]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX Check the previous example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) = 1/2
P(ED|EB) = 2/3
P(EA|ED) = 1/3
P(EB|ED) = 2/3
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems, Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EX Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
2. (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B) (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A) (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
EX Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C)P(B|C)P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω.
{Ai} is also said to be a partition of Ω.
1. Venn diagram (on board)
2. Expressing an arbitrary set using a partition of Ω: B = ⋃_i (B ∩ Ai)
Law of Total Probability
If {Ai} partitions Ω, then
P(B) = Σ_i P(B|Ai)P(Ai)
Example (on board)
EX Binary Communication System
A transmitter (Tx) sends one of two symbols, A0 or A1.
• A0 is sent with probability P(A0) = 2/5.
• A1 is sent with probability P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules.
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)
q1P(A0|B1) + (1 − q1)P(A1|B1) (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Bj|Ai)P(Ai)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak)
where the last step follows from the Law of Total Probability.
The boxed expression is Bayes' theorem.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
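A MATLAB sketch (ours; the matrix layout is our assumption, not the notes' notation) of this calculation using the Law of Total Probability and Bayes' theorem:

pA = [2/5, 3/5];             % priors P(A0), P(A1)
PBgivenA = [5/6, 1/6;        % row i+1 = [P(B0|Ai), P(B1|Ai)]
            1/8, 7/8];

joint = [pA(1)*PBgivenA(1,:); pA(2)*PBgivenA(2,:)];  % P(Ai n Bj)
pB = sum(joint, 1);                                  % total probability: P(Bj)
PAgivenB = joint ./ [pB; pB]                         % Bayes: entry (i,j) = P(A_{i-1}|B_{j-1})

% MAP rule: for each column (received Bj), pick the row (Ai) that
% maximizes the a posteriori probability
[mx, mapRule] = max(PAgivenB)    % -> decide A0 when B0, A1 when B1

Running this gives P(A0|B0) ≈ 0.816 and P(A1|B1) ≈ 0.887, so the MAP receiver decides A0 for B0 and A1 for B1.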
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
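A sketch (ours) of the Bayes' theorem calculation for parts (a) and (b); part (c) needs no computation, since a tail is impossible for the two-headed coin:

pFair = 1/2;                 % prior probability of picking the fair coin
for k = 1:2
    pHfair = (1/2)^k;        % P(k heads in a row | fair)
    pHtwo  = 1;              % P(k heads in a row | two-headed)
    post = pHfair*pFair/(pHfair*pFair + pHtwo*(1-pFair));
    fprintf('P(fair | %d heads) = %.4f\n', k, post);   % 1/3, then 1/5
end
% (c): P(fair | a tail is observed) = 1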
EEL 5544 Noise in Linear Systems, Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Thus n1 = n2 = ... = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling without replacement and with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1) · · · (2)(1)
= n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
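A quick check (ours) comparing the exact factorial, the gamma-function route, and Stirling's approximation:

n = 10;
exact    = factorial(n)                   % 3628800
viaGamma = gamma(n+1)                     % same value
stirling = sqrt(2*pi)*n^(n+0.5)*exp(-n)   % approx 3.599e6 (about 0.8% low)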
4. Sampling without replacement and without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:
C(n, k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? 12!/(2!)^6 = 7,484,400
– What is the probability of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
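With these tools, the poker-dice probabilities from the Motivation problem can be computed by direct counting. A MATLAB sketch (ours) follows; compare with the answers in the footnote:

T = 6^5;                                              % total ordered outcomes
pNoTwoAlike = 6*5*4*3*2/T                             % (a) 0.0926
pOnePair    = 6*nchoosek(5,2)*5*4*3/T                 % (b) 0.4630
pTwoPair    = nchoosek(6,2)*(factorial(5)/(2*2))*4/T  % (c) 0.2315
pThreeAlike = 6*nchoosek(5,3)*5*4/T                   % (d) 0.1543
pFullHouse  = 6*nchoosek(5,3)*5/T                     % (e) 0.0386
pFourAlike  = 6*nchoosek(5,4)*5/T                     % (f) 0.0193
pFiveAlike  = 6/T                                     % (g) 0.0008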
EEL 5544 Noise in Linear Systems, Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX By Request: The Game Show (Monty Hall) Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
4. 0.0794
2. Always switch
3. Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that P(A) is consistent with the probability measures of the subexperiments.
For s.i. subexperiments,
P(A) = P1(B1)P2(B2) · · · PN(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n, k).
• Thus the probability of k successes on n trials is
pn(k) = C(n, k) p^k (1 − p)^{n−k} (11)
Examples:
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
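A sketch (ours) that evaluates the packet-failure probability from Eq. (11) by summing the complementary cases:

n = 100;  p = 0.01;
pOK = 0;
for k = 0:2                              % 0, 1, or 2 bit errors still decode
    pOK = pOK + nchoosek(n,k)*p^k*(1-p)^(n-k);
end
pFail = 1 - pOK                          % approx 0.0794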
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k, n, p) = C(n, k) p^k (1 − p)^{n−k} (12)
• Recall that b(k, n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k, n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)
• In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k, n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, ...
Also, P(# trials > k) = (1 − p)^k.
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 call/minute is a constant. Then the maximum possible number of calls per minute is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
C(n, k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α}
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ...; 0 otherwise
EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
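A sketch (ours) for this example: λ = 90 calls/hr = 1.5 calls/min, and over a t = 10 minute window α = λt = 15:

lambda = 90/60;      % calls per minute
t = 10;              % observation window, minutes
alpha = lambda*t;    % = 15
k = 20;
pk = alpha^k*exp(-alpha)/factorial(k)   % P(N = 20), approx 0.0418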
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544, Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = 0 for z ≤ 0; z²/(2b²) for 0 < z ≤ b; [b² − (2b − z)²/2]/b² for b < z < 2b; 1 for z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P({ω : X(ω) ≤ x}) = P[X ≤ x]
• FX(x) is a probability measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0+} FX(b + h).
(The value at a jump discontinuity is the value after the jump.)
Proof omitted.
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
= lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544, Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB:
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞
then fX(x) = g(x)/c is a valid pdf.
Proof omitted.
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus a value will be assigned to fX(x) at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 1/(b − a) for a ≤ x ≤ b; 0 otherwise
DEFN
A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is PX(x) = P(X = x).
EX Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6 for x = 1, 2, ..., 6; 0 otherwise
EX Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, ...; 0 otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1 if s ∈ A; 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p for x = 1; 1 − p for x = 0; 0 for x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, ..., n; 0 otherwise
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]
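The figures above are easy to regenerate. A MATLAB sketch (ours; parameters assumed from the figure captions, no toolbox functions needed) for the n = 10, p = 0.2 case:

n = 10;  p = 0.2;  k = 0:n;
pmf = zeros(size(k));
for i = k
    pmf(i+1) = nchoosek(n,i)*p^i*(1-p)^(n-i);   % Eq. (11)
end
cdf = cumsum(pmf);
subplot(1,2,1); stem(k,pmf);   xlabel('x'); ylabel('P(X=x)'); title('Binomial(10,0.2) pmf');
subplot(1,2,2); stairs(k,cdf); xlabel('x'); ylabel('F_X(x)');  title('Binomial(10,0.2) CDF');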
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p for k = 1, 2, ...; 0 otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ...; 0 otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. FX(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
fX(x) = λe^{−λx} for x ≥ 0; 0 for x < 0
• Distribution function:
FX(x) = 1 − e^{−λx} for x ≥ 0; 0 for x < 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x)]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,
C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq, µ = np; then
P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − µ)/σ) − Q((b − µ)/σ)
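MATLAB has no built-in Q function, but the standard identity Q(x) = erfc(x/√2)/2 (in terms of the standard erfc, not the book's erf) gives a one-line implementation. A sketch (ours; µ, σ, a, b below are arbitrary example values):

Q = @(x) 0.5*erfc(x/sqrt(2));

mu = 5;  sigma = 2;  a = 4;  b = 9;
pInterval = Q((a-mu)/sigma) - Q((b-mu)/sigma)   % P(4 < X <= 9), approx 0.6687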
EEL 5544, Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
5. (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. fX(x)]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi² (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = y^{(n/2)−1} e^{−y/(2σ²)} / [σⁿ 2^{n/2} Γ(n/2)] for y ≥ 0; 0 for y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = (1/(2σ²)) e^{−y/(2σ²)} for y ≥ 0; 0 for y < 0
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0; 0 for y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0; 0 for r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r vs. fR(r)]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0; 0 for r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = (1/(ℓ√(2πσ²))) exp{−(ln ℓ − µ)²/(2σ²)} for ℓ ≥ 0; 0 for ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P({X ≤ x} ∩ A)/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = d/dx FX(x|A)
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
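Before working these out, a simulation sketch (ours; λ = 1 as stated) illustrates the answer to part 2 – the memoryless property of the exponential RV – reusing the -log(rand) generator from the Lecture 7 exercise:

x = -log(rand(1,1e6));        % exponential(lambda = 1) waiting times
cond = x(x > 2) - 2;          % remaining wait, given 2 minutes already waited

mean(x)                       % approx 1: unconditional mean wait
mean(cond)                    % approx 1 again: remaining wait is still exp(1)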
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, ..., An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
= lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
= [fX(x|A)/fX(x)] P(A)
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
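Not in the original notes: a minimal MATLAB check of the continuous law of total probability, using a hypothetical example in which a coin's bias X is uniform on [0, 1] and A = {heads}, so P(A|X = x) = x and P(A) = ∫ x dx = 1/2.

% Continuous law of total probability: P(A) = integral of P(A|X=x) f_X(x) dx.
% Here f_X(x) = 1 on [0,1] and P(A|X = x) = x, so the integral is 1/2.
PA = quad(@(x) x .* 1, 0, 1)    % numerical integration; returns 0.5000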
S-5
4 On problems involving MATLAB programs, each student should write their own program. Students may discuss the implementation of the program, but students should not work as a group in writing the programs.
I have a zero-tolerance policy for cheating in this class. If you talk to anyone other than me during an exam, I will give you a zero. If you plagiarize (copy someone else's words) or otherwise copy someone else's work, I will give you a failing grade for the class. Furthermore, I will be forced to bring academic dishonesty charges against anyone who violates the Honor Code.
L1-1
EEL 5544 Noise in Linear Systems Lecture 1
RANDOM EXPERIMENTS
Motivation
The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.
Try to answer the following problems
1 EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?
2 EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?
These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events – try to solve them using counting.
Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
L1-2
What is a random experiment
Q: What do we mean by random?
Output is unpredictable in some sense
DEFN
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by
1 an experimental procedure
2 a set of outcomes and measurement restrictions
DEFN
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX
Tossing a coin: A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω =
EX: Rolling a 6-sided die. A 6-sided die is rolled and the number on the top face is observed.
• Q: What are the outcomes?
Ω =
EX A 6-sided die is rolled and whether the top face is even or odd is observed
• Q: What are the outcomes?
Ω =
L1-3
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Total Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Total Outcomes (Classical Theory)
• Some experiments are said to be "fair":
– For a fair coin or a fair die toss, the outcomes are equally likely.
– Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
– An outcome or set of outcomes with probability 1 (one) is certain to occur.
– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.
EX Tossing a fair coin
Let pH = Prob{heads}, pT = Prob{tails}.
Then pH + pT = 1 (the probability that something occurs is 1).
Since pH = pT, we have pH = pT = 1/2.
EX Rolling a fair 6-sided die
L1-4
Let pi = Prob{top face is i}. Then
Σ_{i=1}^{6} pi = 1
⇒ 6p1 = 1
⇒ p1 = 1/6
So pi = 1/6 for i = 1, 2, ..., 6.
• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.
EX Rolling a fair 6-sided die
What is Prob{1 or 2}?
The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally likely outcomes on repeated trials, but we have to be careful.
EX: Rolling a fair 6-sided die twice.
Suppose we roll the die two times. What is the prob that we observe a 1 or 2 on either roll of the die? (See the sketch below.)
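Not part of the original notes: a brute-force MATLAB check by enumerating the 36 equally likely outcomes (the exact answer is 1 − (4/6)² = 5/9).

% Enumerate both rolls and count outcomes with a 1 or 2 on either roll.
[r1, r2] = meshgrid(1:6, 1:6);             % all 36 ordered outcomes
favorable = (r1 <= 2) | (r2 <= 2);         % event: a 1 or 2 on either roll
p = sum(favorable(:)) / numel(favorable)   % 20/36 = 0.5556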
L1-5
What are some problems with defining probability in this way
1 Requires equally likely outcomes – thus it only applies to a small set of random phenomena.
2 Can only deal with a finite number of outcomes – this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let Nk(n) = the number of times the outcome is k in n trials.
• Then we can tabulate Nk(n) for various values of k and n.
• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"
DEFN
The relative frequency of outcome k of a random experiment is
fk(n) = Nk(n) / n
• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.
DEFN
An experiment possesses statistical regularity if
lim_{n→∞} fk(n) = pk (a constant) ∀k
DEFN
For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ Nk(n) ≤ n ∀k
because Nk(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, ..., K (1)
L1-6
• If 1, 2, ..., K are all of the possible outcomes, then
Σ_{k=1}^{K} Nk(n) = n
Again dividing by n yields
Σ_{k=1}^{K} fk(n) = 1 (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation?
Then dividing by n,
fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
fC(n) = fA(n) + fB(n) (3)
• In some sense we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1 It is not clear when and in what sense the limit exists.
2 It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3 We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1 be useful for solving real problems
2 agree with our interpretation of probability as relative frequency
3 agree with our intuition (where appropriate)
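Not in the original notes: a short MATLAB experiment illustrating statistical regularity for a fair coin (p = 0.5 is an assumption of the simulation); the relative frequency of heads settles near 0.5 as n grows.

% Relative frequency of heads, f_H(n), after each of nmax fair flips.
nmax = 10000;
flips = rand(1, nmax) < 0.5;        % 1 = heads on each independent flip
fH = cumsum(flips) ./ (1:nmax);     % relative frequency after n flips
plot(1:nmax, fH); xlabel('n'); ylabel('f_H(n)');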
L2a-1
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1 If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2 If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint It may help to draw a Venn diagram
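Not part of the original notes: a numerical check of both answers using the counts from the problem statement and inclusion-exclusion (treated formally in Lecture 2).

% Counts from the problem statement (out of N = 100 students).
nS = 28; nF = 26; nG = 16; nSF = 12; nSG = 4; nFG = 6; nSFG = 2; N = 100;
% Inclusion-exclusion for P(S u F u G); its complement is "no class".
pNone = 1 - (nS + nF + nG - nSF - nSG - nFG + nSFG)/N      % 0.5000
% "Exactly one": remove both pairwise overlaps from each class and add
% back the triple overlap, which the subtraction removed twice.
nOne = (nS-nSF-nSG+nSFG) + (nF-nSF-nFG+nSFG) + (nG-nSG-nFG+nSFG);
pOne = nOne / N                                            % 0.3200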
• Want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ need operations on sets.
1 A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B
L2a-2
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).
2 A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
3 A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}
4 A^c or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
A^c = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1 (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2 (A^c)^c = A
3 A ⊂ B ⇒ B^c ⊂ A^c
4 A ∪ A^c = Ω
5 (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)
6 (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)
L2a-3
• One of the most important relations for sets is when sets are mutually exclusive.
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
This is very important, so I will emphasize it again: mutually exclusive is a set relation (a property of the sets themselves, not of the probability measure).
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face
2 EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
• Recall that an event A is a subset of the sample space Ω.
• The set of all events is denoted F or A.
EX Flipping a coin
The possible events are
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
• Thus the event class can be written as
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
• EX: Express the events listed in the example above in set notation:
1
2
3
4
EX The 3 language problem
1 If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
L2a-7
2 If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai
Let A2 = {exactly 2 heads occur in 3 tosses}. Then
P(A2) = |A2|/|Ω3| = 3/8
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, ..., oK} and |E| = K), then P(E) = K/|Ω|.
L2a-8
B Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai
Then P(2 heads) = 1/4, which disagrees with the 3/8 found in A – the outcomes in this sample space are not equally likely, so the equally-likely assumption cannot be applied blindly.
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• Cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = |[a, b]| / |Ω| = (b − a)/(B − A)
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
L2a-9
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.
(Does not mean that outcome does not ever occur, only that it occurs very infrequently.)
L2-1
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
L2-3
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN: Events are subsets of the sample space to which we assign probability.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.
2
DEFN
F is the event class, which is a σ-algebra of subsets of Ω to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
L2-5
DEFN
A field M forms a σ-algebra or σ-field if for any E1, E2, ... ∈ M,
∪_{i=1}^{∞} Ei ∈ M (5)
• Note that by property (c) of fields, (5) can be interchanged with
∩_{i=1}^{∞} Ei ∈ M (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I P(A) ≥ 0 for all A ∈ F
II P(Ω) = 1
III If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B); more generally, for pairwise mutually exclusive A1, A2, ..., P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, ..., An are pairwise mutually exclusive, then
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f)
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + ··· + (−1)^{n+1} P(A1 ∩ A2 ∩ ··· ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, .... Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1 Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
What if A and B are independent?
L3-2
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2 The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if
P(A ∩ B) = P(A)P(B)
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probability relation (it depends on the probability measure P).
What type of relation is mutually exclusive? A set relation.
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– A and B^c
– A^c and B
– A^c and B^c
1 (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32    2 (a) 2/3 (b) 1/3 (c) 3/4
L3-3
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
= P(A ∩ B) + P(A ∩ B^c)
= P(A)P(B) + P(A ∩ B^c)
Thus
P(A ∩ B^c) = P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B^c)
• For N events to be s.i., require
P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m
⋮
P(A1 ∩ A2 ∩ ··· ∩ AN) = P(A1)P(A2) ··· P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀i ≠ j.
L3-4
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relation.
– Statistical independence is a probability relation.
• How does mutual exclusiveness relate to statistical independence?
1 me ⇒ si
2 si ⇒ me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
• Perhaps surprisingly, 3 is true.
Consider the following:
L3-6
Conditional Probability: Consider an example.
EX: Defective computers in a lab
A computer lab contains
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
(BD is listed twice because manufacturer B has two defective computers.)
Let
• EA be the event that the selected computer is from manufacturer A,
• EB be the event that the selected computer is from manufacturer B,
• ED be the event that the selected computer is defective.
Find:
P(EA) =    P(EB) =    P(ED) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) =
L3-7
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: overlapping events A and B in sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B becomes the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2 Given A ∈ A:
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3 If A ∈ A, C ∈ A, and A ∩ C = ∅:
P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
= P[(A ∩ B) ∪ (C ∩ B)] / P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX: Check prev example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
2 (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B) (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A) (8)
• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and collectively exhaustive (∪_i Ai = Ω).
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then for any event B,
P(B) = Σ_i P(B|Ai)P(Ai)
Example:
L4-5
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5.
• A1 is sent with prob P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1 Always decide A1.
2 When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX: Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)
q1P(A0|B1) + (1 − q1)P(A1|B1) (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak)
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A numerical sketch follows.)
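Not part of the original notes: a quick MATLAB sketch of this calculation, assuming the transition probabilities read off the channel diagram above (P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6).

% A posteriori probabilities for the binary channel via Bayes' theorem.
PA = [2/5; 3/5];                        % priors P(A0), P(A1)
PBgivenA = [7/8 1/8; 1/6 5/6];          % row i gives [P(B0|Ai) P(B1|Ai)]
PB = PA' * PBgivenA;                    % total probability: [P(B0) P(B1)]
joint = PBgivenA .* repmat(PA, 1, 2);   % joint(i,j) = P(Ai and Bj)
PAgivenB = joint ./ repmat(PB, 2, 1)    % PAgivenB(i,j) = P(Ai|Bj)
% Columns give [P(A0|B0); P(A1|B0)] ~ [0.778; 0.222] and
% [P(A0|B1); P(A1|B1)] ~ [0.091; 0.909], so the MAP rule decides
% A0 when B0 is received and A1 when B1 is received.
[mx, map] = max(PAgivenB)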
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.
SPECIAL CASES
1 Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
3 (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Thus n1 = n2 = ··· = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2 Sampling w/out replacement & with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) ··· (n − k + 1) = n!/(n − k)!.
3 Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) ··· (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's formula).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4 Sample w/out replacement & w/out ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!.
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:
C(n, k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
n!/(k1! k2! ··· kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring? (See the sketch below.)
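Not part of the original notes: a two-line numerical evaluation. Each of the 12!/(2!)^6 orderings of the pattern is one of 6^12 equally likely outcomes.

% Each face exactly twice in 12 rolls: multinomial count over 6^12 outcomes.
nWays = factorial(12) / factorial(2)^6;   % 12!/(2!)^6 distinct arrangements
p = nWays / 6^12                          % approx 0.0034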
5 Sample with replacement and w/out ordering.
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k) = (k + n − 1)! / [k!(n − 1)!]
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch.
4 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments:
1 The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2 Event class F: σ-algebra on Ω.
3 P(·) on F such that
P(A) =
For s.i. subexperiments,
P(A) = P(B1)P(B2) ··· P(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n, k).
• Thus the probability of k successes on n trials is
pn(k) = C(n, k) p^k (1 − p)^{n−k} (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A numerical check follows.)
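The 0.0794 answer given in the Motivation can be checked numerically; a minimal MATLAB sketch:

% P(packet fails) = P(3 or more errors) = 1 - sum of the k = 0, 1, 2
% binomial terms with n = 100 trials and bit error probability p = 0.01.
n = 100; p = 0.01;
pOK = 0;
for k = 0:2
    pOK = pOK + nchoosek(n, k) * p^k * (1-p)^(n-k);
end
pFail = 1 - pOK        % approx 0.0794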
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n, k) p^k (1 − p)^{n−k} (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)
• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem (in DeMoivre–Laplace form), it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, ...
Also, P(# trials > k) = (1 − p)^k.
EX
A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (A numerical check follows.)
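Not in the original notes: a two-line check using the geometric law, with per-transmission success probability p = 0.9 (a transmission succeeds when the packet is not in error).

% Geometric law: exactly m transmissions means m-1 failures then a success.
p = 0.9;                          % per-transmission success probability
pExactly2 = (1-p)^(2-1) * p       % 0.0900
pMoreThan2 = (1-p)^2              % 0.0100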
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so the expected number of calls per minute, np = 0.5, is a constant. The maximum possible number of calls/hour is then 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
C(n, k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}
• Then
lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α}
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adopted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, ...
P(N = k) = 0, otherwise
EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (A numerical sketch follows.)
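Not in the original notes: a one-line MATLAB check under the reading that α = λt = 90 × (10/60) = 15.

% Poisson probability of exactly 20 calls in 10 minutes at 90 calls/hour.
alpha = 90 * (10/60);                          % alpha = lambda*t = 15
P20 = alpha^20 * exp(-alpha) / factorial(20)   % approx 0.0418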
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
Ans:
FZ(z) = 0, z ≤ 0
FZ(z) = z²/(2b²), 0 < z ≤ b
FZ(z) = [b² − (2b − z)²/2]/b², b < z < 2b
FZ(z) = 1, z ≥ 2b
4 Specify the type (discrete, continuous, or mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P[X ≤ x] = P[{ω : X(ω) ≤ x}]
• FX(x) is a prob measure.
• Properties of FX:
1 0 ≤ FX(x) ≤ 1
L6-4
2 FX(−∞) = 0 and FX(∞) = 1
3 FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b
4 P(a < X ≤ b) = FX(b) − FX(a)
5 FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted.
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
= lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z.
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
4 Specify the type (discrete, continuous, or mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
Again, compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
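For reference (my reading of the exercise, not stated in the notes): V = −ln U is exponential with λ = 1, and W = √V is Rayleigh with 2σ² = 1. A quick overlay check:

% Overlay the candidate analytic pdfs on normalized histograms.
N = 1e6;  u = rand(1, N);  v = -log(u);  w = sqrt(v);
[cnt, ctr] = hist(v, 100);
bar(ctr, cnt/(N*(ctr(2)-ctr(1)))); hold on;    % histogram scaled to a density
plot(ctr, exp(-ctr), 'r');                     % exponential pdf, lambda = 1
figure;
[cnt, ctr] = hist(w, 100);
bar(ctr, cnt/(N*(ctr(2)-ctr(1)))); hold on;
plot(ctr, 2*ctr.*exp(-ctr.^2), 'r');           % Rayleigh pdf with 2*sigma^2 = 1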
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the CDF:
fX(x) = dFX(x)/dx
• Properties:
1 fX(x) ≥ 0, −∞ < x < ∞
2 FX(x) = ∫_{−∞}^{x} fX(t) dt
3 ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4 P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5 If g(x) is a nonnegative piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞
then fX(x) = g(x)/c is a valid pdf.
Pf omitted.
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0, x < a
fX(x) = 1/(b − a), a ≤ x ≤ b
fX(x) = 0, x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is PX(x) = P[X = x].
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, ..., 6
P(X = x) = 0, otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, ...
P(X = x) = 0, otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success".
• A Bernoulli RV X is defined by
X = 1 if s ∈ A
X = 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p, x = 1
P(X = x) = 1 − p, x = 0
P(X = x) = 0, x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n
P[X = k] = 0, otherwise
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p, k = 1, 2, ...
P[X = k] = 0, otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, ...
P(N = k) = 0, otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in an example above.
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0, x < 0
fX(x) = λe^{−λx}, x ≥ 0
• Distribution function:
FX(x) = 0, x < 0
FX(x) = 1 − e^{−λx}, x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (npq ≫ 1), then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)² / (2npq)}

• Let σ² = npq and μ = np. Then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)² / (2σ²)}
DEFN: A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp{−(x − μ)² / (2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)² / (2σ²)} dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x² / (2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
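MATLAB has no built-in Q-function (qfunc ships with the Communications Toolbox), but it is one line in terms of the built-in erfc. A sketch of the interval probability above, with assumed values μ = 5, σ = 1, a = 4, b = 6 (my addition):

  % P(a < X <= b) for a Gaussian RV via the Q-function
  Q = @(x) 0.5 * erfc(x / sqrt(2));   % Q(x) in terms of built-in erfc
  mu = 5; sigma = 1; a = 4; b = 6;
  prob = Q((a - mu)/sigma) - Q((b - mu)/sigma)   % approx 0.6827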
EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55×10^−2, 2.70×10^−3, 6.34×10^−5, and 5.75×10^−7, respectively.
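These tail probabilities follow from P[|X − μ| > kσ] = 2Q(k); a two-line MATLAB check (my addition):

  % P[|X - mu| > k*sigma] = 2*Q(k), independent of mu and sigma
  Q = @(x) 0.5 * erfc(x / sqrt(2));
  2 * Q(1:5)   % 0.3173, 0.0455, 0.0027, 6.33e-05, 5.73e-07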
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|},  −∞ < x < ∞

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]
5 Chi-square RV
• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are si Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} X_i²    (18)

is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = y^(n/2 − 1) e^(−y/(2σ²)) / (σ^n 2^(n/2) Γ(n/2)),  y ≥ 0
       = 0,                                                y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,

f_Y(y) = (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
       = 0,                       y < 0

which is an exponential density.

• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
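A quick MATLAB sketch (my addition, assuming σ² = 1) that checks the n = 2 case by simulation:

  % Sum of squares of two independent N(0,1) samples is exponential
  N = 1e5;
  y = randn(N, 1).^2 + randn(N, 1).^2;   % central chi-square, n = 2
  hist(y, 100);   % shape should follow (1/2)*exp(-y/2) for sigma^2 = 1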
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: central chi-square pdfs for n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be

F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
       = 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
       = 0,                     r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
       = 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
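A MATLAB sketch (my addition, assuming σ² = 1 per component) connecting the two views, complex Gaussian magnitude and √(chi-square):

  % Magnitude of a complex Gaussian sample is Rayleigh
  N = 1e5; sigma = 1;
  r = abs(sigma*randn(N, 1) + 1j*sigma*randn(N, 1));
  hist(r, 100);   % shape should follow (r/sigma^2)*exp(-r^2/(2*sigma^2))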
7 Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = (1/(ℓ√(2πσ²))) exp{−(ln ℓ − μ)² / (2σ²)},  ℓ ≥ 0
       = 0,                                          ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = ________

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = ________

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + ... + F_X(x|An)P(An)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, condition on the event {x < X ≤ x + Δx} and take a limit:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

           = [f_X(x|A) / f_X(x)] P(A)

if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
EEL 5544 Noise in Linear Systems Lecture 1

RANDOM EXPERIMENTS

Motivation

The most fundamental problems in probability are based on assuming that the outcomes of an experiment are equally likely (for instance, we often say that a coin or die is "fair"). We will show that under this assumption, the probability of an event is the ratio of the number of possible outcomes in the event to the total number of possible outcomes.
Read in the textbook from the Preface to Section 1.4.

Try to answer the following problems:

1. EX: A fair six-sided die is rolled. What is the probability of observing either a 1 or a 2 on the top face?

2. EX: A fair six-sided die is rolled twice. What is the probability of observing either a 1 or a 2 on the top face on either roll?

These problems seem similar but serve to illustrate the difference between outcomes and events. Refer to the book and the lecture notes below for the differences. Later we will reinterpret this problem in terms of mutually exclusive events and independent events. However, try not to solve these problems using any formula you learned previously based on independent events; try to solve them using counting.

Hint: Write down all possible outcomes for each experiment. There are 6 for the first experiment and 36 for the second experiment.
What is a random experiment?

Q: What do we mean by random?

Output is unpredictable in some sense.

DEFN: A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.

A random experiment is specified by

1. an experimental procedure

2. a set of outcomes and measurement restrictions

DEFN: An outcome of a random experiment is a result that cannot be decomposed into other results.

DEFN: The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.

EX: Tossing a coin. A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.

• Q: What are the outcomes?
  Ω = ________

EX: Rolling a 6-sided die. A 6-sided die is rolled and the number on the top face is observed.

• Q: What are the outcomes?
  Ω = ________

EX: A 6-sided die is rolled and whether the top face is even or odd is observed.

• Q: What are the outcomes?
  Ω = ________
If the outcome of an experiment is random, how can we apply quantitative techniques to it?

This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:

1. Probability as Intuition

2. Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

3. Probability as a Measure of Frequency of Occurrence

4. Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)

• Some experiments are said to be "fair."

– For a fair coin or a fair die toss, the outcomes are equally likely.

– Equally likely outcomes are common in many situations.

• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.

• Given an experiment with equally likely outcomes, we can determine the probabilities easily.

• First we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:

– An outcome or set of outcomes with probability 1 (one) is certain to occur.

– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.

– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.

• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin

Let p_H = Prob{heads}, p_T = Prob{tails}.

Then p_H + p_T = 1 (the probability that something occurs is 1).

Since p_H = p_T, we have p_H = p_T = 1/2.
EX: Rolling a fair 6-sided die

Let p_i = Prob{top face is i}. Then

Σ_{i=1}^{6} p_i = 1  ⇒  6p_1 = 1  ⇒  p_1 = 1/6

So p_i = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die

What is Prob{1 or 2}?

The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice

Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die?
What are some problems with defining probability in this way?

1. It requires equally likely outcomes; thus it only applies to a small set of random phenomena.

2. It can only deal with a finite number of outcomes; this again limits its usefulness.

Probability as a Measure of Frequency of Occurrence

• Consider a random experiment that has K possible outcomes, K < ∞.

• Let N_k(n) = the number of times in n trials that the outcome is k.

• Then we can tabulate N_k(n) for various values of k and n.

• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur."

DEFN: The relative frequency of outcome k of a random experiment is

f_k(n) = N_k(n) / n
• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.

DEFN: An experiment possesses statistical regularity if

lim_{n→∞} f_k(n) = p_k (a constant)  ∀k

DEFN: For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.

PROPERTIES OF RELATIVE FREQUENCY

• Note that

0 ≤ N_k(n) ≤ n  ∀k

because N_k(n) is just the # of times outcome k occurs in n trials.

Dividing by n yields

0 ≤ N_k(n)/n = f_k(n) ≤ 1,  ∀k = 1, ..., K    (1)
• If 1, 2, ..., K are all of the possible outcomes, then

Σ_{k=1}^{K} N_k(n) = n

Again, dividing by n yields

Σ_{k=1}^{K} f_k(n) = 1    (2)

Consider the event E that an even number occurs (in the die-rolling experiment).

What can we say about the number of times E is observed in n trials?

N_E(n) = N_2(n) + N_4(n) + N_6(n)

What have we assumed in developing this equation?

Then, dividing by n,

f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n)

• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then

f_C(n) = f_A(n) + f_B(n)    (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.

What are some problems with this definition?

1. It is not clear when and in what sense the limit exists.

2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.

3. We cannot use this definition if the experiment cannot be repeated.

We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should

1. be useful for solving real problems,

2. agree with our interpretation of probability as relative frequency, and

3. agree with our intuition (where appropriate).
EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation

We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):

EX: An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).

Hint: It may help to draw a Venn diagram.
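Once the events are expressed with inclusion-exclusion, the arithmetic is mechanical; a MATLAB sketch of the counts (my addition):

  % Counts for the three-language problem by inclusion-exclusion
  S = 28; F = 26; G = 16; SF = 12; SG = 4; FG = 6; SFG = 2;
  atLeastOne = S + F + G - SF - SG - FG + SFG;   % |S u F u G| = 50
  pNone = (100 - atLeastOne) / 100               % = 0.5
  exactlyOne = (S-SF-SG+SFG) + (F-SF-FG+SFG) + (G-SG-FG+SFG);
  pExactlyOne = exactlyOne / 100                 % = 0.32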
• Want to determine probabilities of events (combinations of outcomes).

• Each event is a set ⇒ we need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN: The subset operator ⊂ is defined for two sets A and B by

A ⊂ B if x ∈ A ⇒ x ∈ B

Note that A = B is included in A ⊂ B.

A = B if and only if A ⊂ B and B ⊂ A. (Useful for proofs.)

2. A ∪ B (read "A union B" or "A or B")

DEFN: The union of A and B is a set defined by

A ∪ B = {x | x ∈ A or x ∈ B}

3. A ∩ B (read "A intersect B" or "A and B")

DEFN: The intersection of A and B is defined by

A ∩ B = AB = {x | x ∈ A and x ∈ B}

4. A^c or Ā (read "A complement")

DEFN: The complement of a set A in a sample space Ω is defined by

A^c = {x | x ∈ Ω and x ∉ A}

• Use Venn diagrams to convince yourself of the following:

1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B

2. (A^c)^c = A

3. A ⊂ B ⇒ B^c ⊂ A^c

4. A ∪ A^c = Ω

5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)

6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)
• One of the most important relations for sets is when sets are mutually exclusive.

DEFN: Two sets A and B are said to be mutually exclusive or disjoint if ________

This is very important, so I will emphasize it again: mutually exclusive is a ________

SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.

1. EX: Roll a fair 6-sided die and note the number on the top face.

2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.

3. EX: Toss a coin 3 times and note the sequence of outcomes.

4. EX: Toss a coin 3 times and note the number of heads.

5. EX: Roll a fair die 3 times and note the number of even results.

6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.

7. EX: Simultaneously toss a quarter and a dime and note the outcomes.

8. EX: Simultaneously toss two identical quarters and note the outcomes.
9. EX: Pick a number at random between zero and one, inclusive.

10. EX: Measure the lifetime of a computer chip.

MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.

• The set of all events is denoted F or A.

EX: Flipping a coin

The possible events are:

– Heads occurs

– Tails occurs

– Heads or Tails occurs

– Neither Heads nor Tails occurs

• Thus the event class can be written as ________
EX: Rolling a 6-sided die

There are many possible events, such as:

1. The number 3 occurs

2. An even number occurs

3. No number occurs

4. Any number occurs

• EX: Express the events listed in the example above in set notation.

1. ________

2. ________

3. ________

4. ________
EX: The 3 language problem

1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?

2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times and the sequence of H and T is noted.

Ω_3 = {HHH, HHT, HTH, ..., TTT}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω_3| = 1/8 for any outcome a_i.

Let A_2 = {exactly 2 heads occur in 3 tosses}. Then ________

Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, ..., o_K} and |E| = K), then ________
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.

Ω = {0, 1, 2, 3}

Assuming equally likely outcomes,

P(a_i) = 1/|Ω| = 1/4 for any outcome a_i.

Then ________

EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.

• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.

• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.

• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.

• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.

DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,

P([a, b]) = ________

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].

• For a typical continuous sample space, the probability of any particular outcome is zero.

Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞

• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.

(This does not mean that the outcome never occurs, only that it occurs very infrequently.)
EEL 5544 Lecture 2

Motivation

Read Section 1.5 in the textbook.

The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:

EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.

1. What percentage of males smoke?

2. What percentage of males smoke neither cigars nor cigarettes?

3. What percentage smoke cigars but not cigarettes?
PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1. DEFN: The sample space, denoted by Ω, is the ________ of ________

The sample space is also known as the universal set or reference set.

Sample spaces come in two basic varieties: discrete or continuous.

DEFN: A discrete set is either ________ or ________

DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples: ________

DEFN: A continuous set is not countable.

DEFN: If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:

– A closed interval [a, b] contains the endpoints a and b.

– An open interval (a, b) does not contain the endpoints a and b.

– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.

EX: Examples of continuous sets: ________

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:

– What is the probability that the result is even?

– What is the probability that the result is ≤ 2?

DEFN: Events are ________ to which we assign ________

Examples based on previous examples of sample spaces:

(a) EX: Roll a fair 6-sided die and note the number on the top face.
Let L_4 = the event that the result is less than or equal to 4.
Express L_4 as a set of outcomes.

(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.
(c) EX: Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A_1 = event that heads occurs on the first toss.
Express A_1 as a set of outcomes.

(d) EX: Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.

2. DEFN: F is the event class, which is a ________ to which we assign probability.

• If Ω is discrete, then F can be taken to be the ________ of Ω (the set of every subset of Ω).

• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.

• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore those subsets cannot be events.

DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if

(a) ∅ ∈ M and Ω ∈ M

(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M

(c) If E ∈ M, then E^c ∈ M

DEFN: A field M forms a σ-algebra or σ-field if for any E_1, E_2, ... ∈ M,

⋃_{i=1}^{∞} E_i ∈ M    (5)

• Note that by property (c) of fields, (5) can be interchanged with

⋂_{i=1}^{∞} E_i ∈ M    (6)

• We require that the event class F be a σ-algebra.

• For discrete sets, the set of all subsets of Ω is a σ-algebra.

• For continuous sets, there may be more than one possible σ-algebra.

• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3. DEFN: The probability measure, denoted by P, is a ________ that maps all members of ________ onto ________

Axioms of Probability

• We specify a minimal set of rules that P must obey:

I. ________

II. ________

III. ________
Corollaries

• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:

(a) P(A^c) = 1 − P(A)

(b) P(A) ≤ 1

(c) P(∅) = 0

(d) If A_1, A_2, ..., A_n are pairwise mutually exclusive, then

P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)

Proof is by induction.

(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof and example on board.

(f) P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k) − Σ_{j<k} P(A_j ∩ A_k) + ... + (−1)^(n+1) P(A_1 ∩ A_2 ∩ ... ∩ A_n)

Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.

(g) If A ⊂ B, then P(A) ≤ P(B).

Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3

Motivation

An important concept in probability is independence. What do you think of when you hear the word "independence"?

In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.

However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.

EX: 1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:

(a) All children are of the same sex.

(b) The 3 eldest are boys and the others are girls.

(c) The 2 oldest are girls.

(d) There is at least one girl.

If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0

What if A and B are independent?
Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?

STATISTICAL INDEPENDENCE

DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if ________

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.

• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a ________

What type of relation is mutually exclusive? ________

Claim: If A and B are si events, then the following pairs of events are also si:

– A and B^c

– A^c and B

– A^c and B^c

Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32. 2. (a) 2/3, (b) 1/3, (c) 3/4.
Proof: It is sufficient to consider the first case, i.e., show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.

Note that A = (A ∩ B) ∪ (A ∩ B^c), and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus

P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)

Thus

P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)

• For N events to be si, we require

P(A_i ∩ A_k) = P(A_i)P(A_k)
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
⋮
P(A_1 ∩ A_2 ∩ ... ∩ A_N) = P(A_1)P(A_2)...P(A_N)

(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence

DEFN: N events A_1, A_2, ..., A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀i ≠ j.

Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events:

1. All children are of the same sex.

2. The 3 eldest are boys and the others are girls.

3. The 2 oldest are girls.

4. There is at least one girl.
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.

• Remember:

– Mutual exclusiveness is a ________

– Statistical independence is a ________

• How does mutual exclusiveness relate to statistical independence?

1. me ⇒ si

2. si ⇒ me

3. two events cannot be both si and me (except in some degenerate cases)

4. none of the above

• Perhaps surprisingly, ________ is true.

Consider the following:
Conditional Probability: Consider an example.

EX: Defective computers in a lab

A computer lab contains:

• two computers from manufacturer A, one of which is defective;

• three computers from manufacturer B, two of which are defective.

A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer. Then

Ω = {AD, AN, BD, BD, BN}

Let

• E_A be the event that the selected computer is from manufacturer A;

• E_B be the event that the selected computer is from manufacturer B;

• E_D be the event that the selected computer is defective.

Find:

P(E_A) = ________  P(E_B) = ________  P(E_D) = ________

• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?

• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?

We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).

Ω = {AD, AN, BD, BD, BN}

Find:

– P(E_D|E_B) = ________
– P(E_A|E_D) = ________

– P(E_B|E_D) = ________

We need a systematic way of determining probabilities given additional information about the experiment outcome.

CONDITIONAL PROBABILITY

Consider the Venn diagram below.

[Venn diagram: events A and B in a sample space S; conditioning on B makes B the new sample space, leaving only A ∩ B]

This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.

Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.

A definition of conditional probability that agrees with these and other observations is:

DEFN: For A ∈ A and B ∈ A, the conditional probability of event A given that event B occurred is

P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:

1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B) / P(B)

and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅,

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)

EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}

P(E_D|E_A) = ________

P(E_D|E_B) = ________

P(E_A|E_D) = ________

P(E_B|E_D) = ________
EX: The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

EX: Drawing two consecutive balls without replacement

An urn contains:

• two black balls, numbered 1 and 2;

• three white balls, numbered 3, 4, and 5.

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let W_i = event that a white ball is chosen on draw i.

What is P(W_2|W_1)?

Answers: (a) 1/3, (b) 1/5, (c) 1.
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are si, then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

(When A and B are si, knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities

Which of the following statements are true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)
Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

EX: Intersection of 3 events

P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)

Partitions and Total Probability

DEFN: A collection of sets A_1, A_2, ... partitions the sample space Ω if and only if ________

{A_i} is also said to be a partition of Ω.

1. Venn diagram

2. Expressing an arbitrary set using a partition of Ω

Law of Total Probability

If {A_i} partitions Ω, then ________

Example:
EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with probability P(A0) = 2/5.

• A1 is sent with probability P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

P(B0|A0) = 7/8,  P(B1|A0) = 1/8
P(B0|A1) = 1/6,  P(B1|A1) = 5/6

• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)

• We apply the chain rule to write this as

P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)    (9)

q1 P(A0|B1) + (1 − q1)P(A1|B1)    (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.

• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
• General technique:

If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then

P(Ai|Bj) = P(Bj|Ai)P(Ai) / P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak)

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
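A small MATLAB sketch of this calculation (my addition; the matrix layout, with rows indexed by Ai and columns by Bj, is my own convention):

  % A posteriori probabilities for the binary channel example
  PA = [2/5; 3/5];               % P(A0); P(A1)
  PBgA = [7/8 1/8; 1/6 5/6];     % P(Bj|Ai): row i, column j
  joint = PBgA .* [PA PA];       % P(Ai and Bj) by the chain rule
  PB = sum(joint, 1);            % P(B0), P(B1) by total probability
  PAgB = joint ./ [PB; PB]       % P(Ai|Bj): [7/9 1/11; 2/9 10/11]
  % MAP rule: decide A0 when B0 is received, A1 when B1 is received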
• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a constant c, so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.

EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.

Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.

COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is n_1 n_2 ... n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering

Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Thus n_1 = n_2 = ... = n_k = |A| ≡ n

⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement and with ordering

Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A ∀i = 1, 2, ..., k.

Then n_1 = n, n_2 = n − 1, ..., n_k = n − (k − 1)

⇒ the number of distinct ordered k-tuples is n(n − 1)...(n − k + 1) = n!/(n − k)!

3. Permutations of n distinct objects

Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n_1 = n, n_2 = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)...(2)(1) = n!

Useful formula: For large n, n! ≈ √(2π) n^(n + 1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
4. Sampling without replacement and without ordering

Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: how many subsets of size k (called a combination of size k) are there from a set of size n?

First determine the # of ordered subsets: n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

C(n, k) = (# ordered subsets) / (# orderings) = n! / (k!(n − k)!)

DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, ..., k_J is given by the multinomial coefficient

n! / (k_1! k_2! ... k_J!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.

• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72! / ((3!)^24 24!) ≈ 2.1 × 10^61
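These counts overflow naive factorial evaluation, so a MATLAB check (my addition) is safest in the log domain via the built-in gammaln:

  % 72!/(3!)^24 and 72!/((3!)^24 * 24!) via log-gamma to avoid overflow
  logOrdered = gammaln(73) - 24*gammaln(4);   % log(72!) - 24*log(3!)
  logUnordered = logOrdered - gammaln(25);    % divide by 24!
  exp(logOrdered)     % approx 1.3e85
  exp(logUnordered)   % approx 2.1e61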
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the probability of this occurring?

5. Sampling with replacement and without ordering

Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)
EEL 5544 Noise in Linear Systems Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?

Answer: 0.0794.

EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.
2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
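A Monte Carlo check of the "always switch" strategy in MATLAB (my addition); switching wins exactly when the first pick was a goat:

  % Estimate P(win) for the always-switch strategy
  N = 1e5; wins = 0;
  for i = 1:N
      car = ceil(3*rand); pick = ceil(3*rand);
      if pick ~= car   % first pick was a goat, so switching wins
          wins = wins + 1;
      end
  end
  wins / N   % approx 2/3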
SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B_1, B_2, ..., B_N), where B_i ∈ Ω_i.

2. Event class F: a σ-algebra on Ω.

3. P(·) on F such that

P(A) = ________

For si subexperiments,

P(A) = ________

Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).

• The number of ways that k successes can occur in n places is C(n, k).

• Thus, the probability of k successes on n trials is

p_n(k) = C(n, k) p^k (1 − p)^(n−k)    (11)
Examples

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
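A MATLAB check (my addition) computing the complement of at most 2 errors:

  % P(packet fails) = P(3 or more bit errors), n = 100, p = 0.01
  n = 100; p = 0.01;
  pOK = 0;
  for k = 0:2
      pOK = pOK + nchoosek(n, k) * p^k * (1-p)^(n-k);
  end
  pFail = 1 - pOK   % approx 0.0794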
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• The Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• The Gaussian approximation, which applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)    (12)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.

– Let S_n = Σ_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.

• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)

• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
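A MATLAB sketch (my addition) comparing this approximation with the exact binomial sum, for assumed values n = 100, p = 0.2, α = 15, β = 25:

  % Gaussian approximation vs. exact binomial sum, with 0.5 correction
  Q = @(x) 0.5 * erfc(x / sqrt(2));
  n = 100; p = 0.2; q = 1 - p; alpha = 15; beta = 25;
  approx = Q((alpha - n*p - 0.5)/sqrt(n*p*q)) - Q((beta - n*p + 0.5)/sqrt(n*p*q))
  exact = 0;
  for k = alpha:beta   % exact sum via log-gamma for numerical safety
      exact = exact + exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
                          + k*log(p) + (n-k)*log(q));
  end
  exact   % both are approx 0.83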
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k
EX: A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
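A two-line MATLAB check (my addition), treating a successful delivery as a "success" with p = 0.9:

  % Geometric law with success probability p = 1 - 0.1
  p = 0.9;
  pExactly2 = (1-p)^(2-1) * p   % = 0.09
  pMoreThan2 = (1-p)^2          % = 0.01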
THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 per minute is a constant. The maximum possible number of calls/hour is 60n, which grows without bound.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a binomial probability

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k / k!) e^(−α),

which is the Poisson law.

• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.

• Let λ = the # of events per (unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^(−α),  k = 0, 1, ...
         = 0,                  otherwise

EX: Phone calls arriving at a switching office

If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
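A one-line MATLAB answer (my addition): 90 calls/hr is λ = 1.5 calls/min, so α = λt = 15 for t = 10 minutes:

  % Poisson probability of exactly 20 calls when alpha = 15
  alpha = 1.5 * 10; k = 20;
  P = alpha^k * exp(-alpha) / factorial(k)   % approx 0.0418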
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot F_Z(z).

Ans:

F_Z(z) = 0,                        z ≤ 0
       = z²/(2b²),                 0 < z ≤ b
       = [b² − (2b − z)²/2] / b²,  b < z < 2b
       = 1,                        z ≥ 2b

4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)
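A quick MATLAB simulation (my addition, assuming b = 1) to check the CDF shape:

  % Empirical CDF of Z = X + Y for a dart uniform on [0,b]^2
  b = 1; N = 1e5;
  z = b*rand(N, 1) + b*rand(N, 1);
  zs = sort(z);
  plot(zs, (1:N)'/N);   % compare with the piecewise F_Z(z) above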
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is
  F_X(x) = P[X ≤ x] = P[{ω : X(ω) ≤ x}]
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., a ≤ b ⇒ F_X(a) ≤ F_X(b)
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous from the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h) = F_X(b⁺)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above
  F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                   = lim_{ε→0} P[x − ε < X ≤ x]
                   = P[X = x]
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density of the result by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:
  f_X(x) = d F_X(x)/dx
bull Properties
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise continuous function with finite integral
   ∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
   then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
  f_X(x) = 0,           x < a
         = 1/(b − a),   a ≤ x ≤ b
         = 0,           x > b
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, . . . , 6
         = 0,   otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, . . .
         = 0,       otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
  X = 1, s ∈ A
    = 0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
  P(X = x) = p,       x = 1
           = 1 − p,   x = 0
           = 0,       x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
  P[X = k] = (n choose k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n
           = 0, otherwise
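Because a Binomial RV is a sum of independent Bernoulli RVs, it is easy to simulate in MATLAB by thresholding uniform random numbers. A minimal sketch (the parameter values are illustrative) whose histogram can be compared to the pmf plots below:

% Simulate Binomial(n,p) as a sum of n Bernoulli trials, repeated many times
n = 10; p = 0.2; trials = 10000;
bern = rand(n, trials) < p;     % each column = one run of n Bernoulli trials
X = sum(bern, 1);               % one Binomial(n,p) sample per column
hist(X, 0:n)                    % compare the shape to the Binomial(10,0.2) pmf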
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: pmfs of Binomial(10, 0.2) and Binomial(10, 0.5), plotted as P(X = x) vs. x]
• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: CDFs of Binomial(10, 0.2) and Binomial(10, 0.5), plotted as F_X(x) vs. x]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: pmf of Binomial(100, 0.2)]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
  P[X = k] = (1 − p)^(k−1) p, k = 1, 2, . . .
           = 0, otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: pmfs of Geometric(0.1) and Geometric(0.7)]
• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: CDFs of Geometric(0.1) and Geometric(0.7)]
4 Poisson RV
bull Models events that occur randomly in space or time
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time (or space) of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
  P(N = k) = (α^k/k!) e^(−α), k = 0, 1, . . .
           = 0, otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: pmfs of Poisson(0.75) and Poisson(3)]
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: CDFs of Poisson(0.75) and Poisson(3)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: pmf of Poisson(20)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV — covered in the example above.
2 Exponential RV
• Characterized by a single parameter λ.
• Density function:
  f_X(x) = 0,           x < 0
         = λ e^(−λx),   x ≥ 0
• Distribution function:
  F_X(x) = 0,             x < 0
         = 1 − e^(−λx),   x ≥ 0
• The book's definition substitutes λ = 1/μ.
bull Obtainable as limit of geometric RV
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, and 4]
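The earlier MATLAB exercise (v = -log(u)) is an instance of inverse-transform sampling from this density: if U is uniform on (0, 1), then −ln(U)/λ is exponential with parameter λ, since F_X⁻¹(u) = −ln(1 − u)/λ and 1 − U is also uniform. A minimal sketch with an illustrative λ:

% Inverse-transform sampling of an Exponential(lambda) RV
lambda = 4;
u = rand(1, 100000);        % uniform on (0,1)
x = -log(u)/lambda;         % exponential with parameter lambda
hist(x, 100)                % histogram should match the pdf above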
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
• DeMoivre-Laplace Theorem: If n is large (npq ≫ 1), then with q = 1 − p,
  (n choose k) p^k q^(n−k) ≈ [1/√(2πnpq)] exp[−(k − np)²/(2npq)]
• Let σ² = npq and μ = np; then
  P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                       ≈ [1/√(2πσ²)] exp[−(k − μ)²/(2σ²)]
DEFN
A Gaussian random variable X has density function
  f_X(x) = [1/√(2πσ²)] exp[−(x − μ)²/(2σ²)]
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
  – The CDF for the normalized Gaussian RV is defined as
    Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp[−t²/2] dt
  – Engineers more commonly use the complementary distribution function, or Q-function, defined by
    Q(x) = ∫_{x}^{∞} [1/√(2π)] exp[−t²/2] dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
  Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
  F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
  P(a < X ≤ b) = P(X > a) − P(X > b)
               = Q((a − μ)/σ) − Q((b − μ)/σ)
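MATLAB has no built-in Q(x), but it can be expressed through the complementary error function as Q(x) = (1/2) erfc(x/√2). A minimal sketch of the interval computation above (the values of μ, σ, a, and b are illustrative):

% Q-function via erfc, and P(a < X <= b) for X ~ N(mu, sigma^2)
Q = @(x) 0.5*erfc(x/sqrt(2));
mu = 5; sigma = 1;
a = 4; b = 7;
p = Q((a - mu)/sigma) - Q((b - mu)/sigma)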
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question Answers appear at the bottom of the page
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
  f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0.
bull The pdf for a Laplacian RV with c = 2 is shown below
[Figure: Laplacian pdf with c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact chi-square random variables are much more general. For instance, if X_i, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then
  Y = Σ_{i=1}^{n} X_i²   (18)
is a chi-square random variable with n degrees of freedom.
bull The chi-square random variable is usually classified as either central or non-central
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
  f_Y(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
         = 0, y < 0
where Γ(p) is the gamma function, defined as
  Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
  Γ(p) = (p − 1)!, p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π
• Note that for n = 2,
  f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
         = 0, y < 0
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square pdfs with n = 1, 2, 4, and 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be
  F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
         = 0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
  f_R(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0
         = 0, r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1]
• The CDF for a Rayleigh RV is
  F_R(r) = 1 − e^(−r²/(2σ²)), r ≥ 0
         = 0, r < 0
• The complex envelope of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
  f_L(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)], ℓ ≥ 0
         = 0, ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  F_X(x|A) = P[{X ≤ x} ∩ A]/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  f_X(x|A) = d F_X(x|A)/dx
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1 the conditional distribution for your total wait time
2 the conditional distribution for your remaining wait time
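For checking your work, a sketch of the standard argument: with T the total wait, exponential with λ = 1,
  F_T(t | T > 2) = P(2 < T ≤ t)/P(T > 2) = (e^(−2) − e^(−t))/e^(−2) = 1 − e^(−(t−2)), t > 2,
so the remaining wait R = T − 2 has F_R(r) = 1 − e^(−r), r ≥ 0, i.e., it is again exponential with λ = 1. The exponential RV is memoryless.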
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
  F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + · · · + F_X(x|An)P(An)
and
  f_X(x) = Σ_{i=1}^{n} f_X(x|Ai) P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work.
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A)
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
  P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
  f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
What is a random experiment?
Q: What do we mean by random?
The output is unpredictable in some sense.
DEFN
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions.
A random experiment is specified by:
1. an experimental procedure
2. a set of outcomes and measurement restrictions
DEFN
An outcome of a random experiment is a result that cannot be decomposed into other results.
DEFN
The set of all possible outcomes for a random experiment is called the sample space and is denoted by Ω.
EX
Tossing a coin. A coin (heads on one side, tails on the other) is tossed one time, and the side that is face up is recorded.
• Q: What are the outcomes?
Ω = {H, T}
EX: Rolling a 6-sided die. A 6-sided die is rolled, and the number on the top face is observed.
• Q: What are the outcomes?
Ω = {1, 2, 3, 4, 5, 6}
EX: A 6-sided die is rolled, and whether the top face is even or odd is observed.
• Q: What are the outcomes?
Ω = {even, odd}
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability. People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
• Some experiments are said to be "fair":
  – For a fair coin or a fair die toss, the outcomes are equally likely.
  – Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
  – An outcome or set of outcomes with probability 1 (one) is certain to occur.
  – An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
  – If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.
EX Tossing a fair coin
Let p_H = Prob{heads}, p_T = Prob{tails}.
Then p_H + p_T = 1 (the probability that something occurs is 1).
Since p_H = p_T, we have p_H = p_T = 1/2.
EX Rolling a fair 6-sided die
Let p_i = Prob{top face is i}. Then
  Σ_{i=1}^{6} p_i = 1
  ⇒ 6 p_1 = 1
  ⇒ p_1 = 1/6
So p_i = 1/6 for i = 1, 2, 3, 4, 5, 6.
• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.
EX: Rolling a fair 6-sided die.
What is Prob{1 or 2}?
The basic principle of counting: If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.
We can use equally-likely outcomes on repeated trials, but we have to be careful.
EX: Rolling a fair 6-sided die twice.
Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die?
What are some problems with defining probability in this way
1. It requires equally likely outcomes; thus it only applies to a small set of random phenomena.
2. It can only deal with a finite number of outcomes; this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let N_k(n) = the number of times the outcome is k in n trials.
• Then we can tabulate N_k(n) for various values of k and n.
• We can reduce the dependence on n by dividing N_k(n) by n to find out "how often did k occur."
DEFN
The relative frequency of outcome k of a random experiment is
  f_k(n) = N_k(n)/n
• Observation: In our previous experiments, as n gets large, f_k(n) converges to some constant value.
DEFN
An experiment possesses statistical regularity if
  lim_{n→∞} f_k(n) = p_k (a constant) ∀k
DEFN
For experiments with statistical regularity as defined above, p_k is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
  0 ≤ N_k(n) ≤ n ∀k
because N_k(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
  0 ≤ N_k(n)/n = f_k(n) ≤ 1, ∀k = 1, . . . , K   (1)
• If 1, 2, . . . , K are all of the possible outcomes, then
  Σ_{k=1}^{K} N_k(n) = n
Again, dividing by n yields
  Σ_{k=1}^{K} f_k(n) = 1   (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
  N_E(n) = N_2(n) + N_4(n) + N_6(n)
(What have we assumed in developing this equation?)
Then, dividing by n,
  f_E(n) = N_E(n)/n = [N_2(n) + N_4(n) + N_6(n)]/n = f_2(n) + f_4(n) + f_6(n)
• General property: If A and B are two events that cannot occur simultaneously, and C is the event that either A or B occurs, then
  f_C(n) = f_A(n) + f_B(n)   (3)
• In some sense, we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1. be useful for solving real problems
2. agree with our interpretation of probability as relative frequency
3. agree with our intuition (where appropriate)
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
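For checking your work, a sketch using inclusion-exclusion: |S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50, so P(no language class) = (100 − 50)/100 = 1/2. The number of students taking exactly one class is (28 − 12 − 4 + 2) + (26 − 12 − 6 + 2) + (16 − 4 − 6 + 2) = 14 + 10 + 8 = 32, so that probability is 32/100 = 0.32.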
• We want to determine the probabilities of events (combinations of outcomes).
• Each event is a set ⇒ we need operations on sets.
1. A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
  A ⊂ B if x ∈ A ⇒ x ∈ B
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A. (This is useful for proofs.)
2. A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
  A ∪ B = {x | x ∈ A or x ∈ B}
3. A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
  A ∩ B = AB = {x | x ∈ A and x ∈ B}
4. A^c or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
  A^c = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (A^c)^c = A
3. A ⊂ B ⇒ B^c ⊂ A^c
4. A ∪ A^c = Ω
5. (A ∩ B)^c = A^c ∪ B^c (DeMorgan's Law 1)
6. (A ∪ B)^c = A^c ∩ B^c (DeMorgan's Law 2)
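DeMorgan's Laws are easy to spot-check on finite sets in MATLAB using the built-in union, intersect, and setdiff functions; a minimal sketch on an illustrative sample space:

% Spot-check DeMorgan's Law 1 on a small discrete sample space
Omega = 1:6;                  % sample space for a die roll
A = [2 4 6]; B = [1 2 3];
lhs = setdiff(Omega, intersect(A, B));              % (A n B)^c
rhs = union(setdiff(Omega, A), setdiff(Omega, B));  % A^c u B^c
isequal(lhs, rhs)             % returns logical 1 (true)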
bull One of the most important relations for sets is when sets are mutually exclusive
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
This is very important, so I will emphasize it again: mutually exclusive is a set relationship, not a probabilistic one.
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face
2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
• The set of all events is denoted F or A.
EX Flipping a coin
The possible events are
ndash Heads occurs
ndash Tails occurs
ndash Heads or Tails occurs
ndash Neither Heads nor Tails occurs
• Thus the event class can be written as F = {∅, {H}, {T}, Ω}.
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
• EX: Express the events listed in the example above in set notation:
1. {3}
2. {2, 4, 6}
3. ∅
4. Ω
EX The 3 language problem
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, . . . , TTT}
Assuming equally likely outcomes,
  P(a_i) = 1/|Ω3| = 1/8 for any outcome a_i
Let A2 = {exactly 2 heads occur in 3 tosses} = {HHT, HTH, THH}. Then
  P(A2) = |A2|/|Ω3| = 3/8
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o_1, o_2, . . . , o_K} and |E| = K), then
  P(E) = K/|Ω|
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
  P(a_i) = 1/|Ω| = 1/4 for any outcome a_i
Then P(exactly 2 heads) = 1/4, which contradicts the 3/8 found above: the outcomes in this sample space are not actually equally likely.
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
  P([a, b]) = (b − a)/(B − A)
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the probability of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once.
How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.
(This does not mean that the outcome never occurs, only that it occurs very infrequently.)
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate:
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
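For checking your work, a sketch using the corollaries below: P(smokes) = 0.28 + 0.07 − 0.05 = 0.30, so 30 percent smoke. By the complement corollary, 100 − 30 = 70 percent smoke neither. Finally, P(cigars but not cigarettes) = 0.07 − 0.05 = 0.02, i.e., 2 percent.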
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties discrete or continuous
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b is also in I.
• Intervals can be either open, closed, or half-open:
  – A closed interval [a, b] contains the endpoints a and b.
  – An open interval (a, b) does not contain the endpoints a and b.
  – An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
  – What is the probability that the result is even?
  – What is the probability that the result is ≤ 2?
DEFN: Events are subsets of Ω to which we assign probability.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = {even outcome}, O = {odd outcome}. List all events.
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads. Let O = {an odd number of heads occurs}. Express O as a set of outcomes.
2
DEFN
F is the event class, which is a collection of subsets of Ω (a σ-algebra) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ consisting only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore those subsets cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
L2-5
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, . . . ∈ M,
  ∪_{i=1}^{∞} E_i ∈ M   (5)
• Note that by property (c) of fields, the union in (5) can be interchanged with
  ∩_{i=1}^{∞} E_i ∈ M   (6)
bull We require that the event class F be a σ-algebra
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, . . . are mutually exclusive events, then P(∪_i A_i) = Σ_i P(A_i)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, . . . , An are pairwise mutually exclusive, then
  P(∪_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(∪_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k) − Σ_{j<k} P(A_j ∩ A_k) + · · · + (−1)^(n+1) P(A_1 ∩ A_2 ∩ · · · ∩ A_n)
(Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . . Proof is by induction.)
(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if
  P(A ∩ B) = P(A)P(B)
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probabilistic relationship.
What type of relation is mutually exclusive?
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
  – A and B^c
  – A^c and B
  – A^c and B^c
Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
  P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
       = P(A ∩ B) + P(A ∩ B^c)
       = P(A)P(B) + P(A ∩ B^c)
Thus
  P(A ∩ B^c) = P(A) − P(A)P(B)
             = P(A)[1 − P(B)]
             = P(A)P(B^c)
• For N events to be s.i., we require
  P(A_i ∩ A_k) = P(A_i)P(A_k)
  P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m)
  . . .
  P(A_1 ∩ A_2 ∩ · · · ∩ A_N) = P(A_1)P(A_2) · · · P(A_N)
(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A_1, A_2, . . . , A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀i ≠ j.
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
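The answers are in footnote 1 on the previous page; a quick way to sanity-check them is simulation. A minimal MATLAB sketch for event 4, assuming independent children with P(girl) = 1/2:

% Estimate P(at least one girl) among 5 children by simulation
trials = 100000;
kids = rand(5, trials) < 0.5;        % 1 = girl, 0 = boy
pHat = mean(any(kids, 1))            % should be near 1 - (1/2)^5 = 31/32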
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
  – Mutual exclusiveness is a set relationship.
  – Statistical independence is a probabilistic relationship.
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
• Perhaps surprisingly, 3 is true.
Consider the following
Conditional Probability Consider an example
EX: Defective computers in a lab.
A computer lab contains:
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P(E_A) = 2/5, P(E_B) = 3/5, P(E_D) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(E_D|E_A) (meaning the conditional probability of event E_D given that event E_A occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
  – P(E_D|E_B) = 2/3
  – P(E_A|E_D) = 1/3
  – P(E_B|E_D) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B as the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
   P(A|B) = P(A ∩ B)/P(B), and P(A ∩ B) ≥ 0, P(B) > 0
   ⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
   P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
              = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
   Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
   P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
              = P(A|B) + P(C|B)
EX: Check the previous example. Ω = {AD, AN, BD, BD, BN}
  P(E_D|E_A) = 1/2
  P(E_D|E_B) = 2/3
  P(E_A|E_D) = 1/3
  P(E_B|E_D) = 2/3
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement.
An urn contains:
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let W_i = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
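For checking your work: after a white ball is removed, 4 balls remain and 2 are white, so P(W2|W1) = 2/4 = 1/2. The formal computation agrees: P(W1 ∩ W2) = (3/5)(2/4) = 3/10 and P(W2|W1) = (3/10)/(3/5) = 1/2.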
Answers: 2. (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
  P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
  ⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
  ⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
  P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
               = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A_1, A_2, . . . partitions the sample space Ω if and only if
  A_i ∩ A_j = ∅ for all i ≠ j, and ∪_i A_i = Ω
{A_i} is also said to be a partition of Ω.
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {A_i} partitions Ω, then for any event B,
  P(B) = Σ_i P(B|A_i) P(A_i)
Example
EX: Binary Communication System
A transmitter (Tx) sends one of the two symbols A0, A1:
• A0 is sent with probability P(A0) = 2/5
• A1 is sent with probability P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0, B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
  – The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem Determine the probability of error for these two decision rules
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
  P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis H_ij when B_j is received, we can write
  P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
  P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
  P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
  (1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0) and q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
  (1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0) and q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
  (1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
and
  q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either q_i = 0 or q_i = 1.
  ⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
• General technique: If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
  P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak)
where the last step follows from the Law of Total Probability.
The boxed expression is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
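For checking your work, a sketch (here p01 = P(B1|A0) and p10 = P(B0|A1), matching the transition diagram above): P(B0) = (7/8)(2/5) + (1/6)(3/5) = 9/20, so P(A0|B0) = (7/20)/(9/20) = 7/9 and P(A1|B0) = 2/9; P(B1) = 11/20, so P(A0|B1) = (1/20)/(11/20) = 1/11 and P(A1|B1) = 10/11. The MAP rule therefore decides A0 when B0 is received and A1 when B1 is received.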
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible, if x_i is an element of a set with n_i distinct members, is n1 n2 · · · nk.
SPECIAL CASES
1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers: 3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
The result is a k-tuple (x1, x2, . . . , xk), where x_i ∈ A ∀i = 1, 2, . . . , k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling without replacement and with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
The result is a k-tuple (x1, x2, . . . , xk), where x_i ∈ A ∀i = 1, 2, . . . , k.
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1).
3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
The result is an n-tuple. The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
4. Sampling without replacement and without ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n(n − 1) · · · (n − k + 1) = n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:
  C(n, k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient
  n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
  72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
  72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
  – How many total ways can this occur?
  – What is the probability of this occurring?
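For checking your work, a sketch using the multinomial coefficient above: the number of orderings with each face appearing exactly twice is 12!/(2!)^6 = 7,484,400, and each particular sequence of 12 rolls has probability (1/6)^12, so
  P = 12!/[(2!)^6 · 6^12] ≈ 3.4 × 10^−3.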
5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
  C(k + n − 1, k)
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX (by request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
  – Behind one door is a car.
  – Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch.
Answer: 4. 0.0794
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability space for N sequential experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome in Ω is of the form (B1, B2, . . . , BN), where B_i ∈ Ω_i.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that
  P(A) = P(B1, B2, . . . , BN)
For s.i. subexperiments,
  P(A) = P(B1)P(B2) · · · P(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is C(n, k).
• Thus the probability of k successes on n trials is
  p_n(k) = C(n, k) p^k (1 − p)^(n−k)   (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
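A direct numerical evaluation in MATLAB, a minimal sketch using the built-in binomial coefficient nchoosek:

% P(3 or more bit errors) = 1 - P(0, 1, or 2 errors), n = 100, p = 0.01
n = 100; p = 0.01;
p012 = 0;
for k = 0:2
    p012 = p012 + nchoosek(n,k)*p^k*(1-p)^(n-k);
end
pError = 1 - p012        % approximately 0.0794, matching the footnote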
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• the Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.
• Let
  b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
  – Consider a sequence of variables X_i.
  – For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.
  – Let S_n = Σ_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable (this is the Central Limit Theorem, to be discussed later).
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.   (14)
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.   (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By the DeMoivre–Laplace theorem (a form of the Central Limit Theorem), it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].
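As a sketch of how accurate this approximation is (not in the original notes; it assumes only MATLAB's built-in erfc and nchoosek), compare the exact binomial sum with the Q-function form for n = 100, p = 0.5, α = 45, β = 55:

n = 100; p = 0.5; q = 1-p;
alpha = 45; beta = 55;
Pexact = 0;
for k = alpha:beta
    Pexact = Pexact + nchoosek(n,k) * p^k * q^(n-k);
end
Q = @(x) 0.5*erfc(x/sqrt(2));    % Q-function expressed via the built-in erfc
Papprox = Q((alpha-n*p-0.5)/sqrt(n*p*q)) - Q((beta-n*p+0.5)/sqrt(n*p*q));
[Pexact Papprox]                 % both approximately 0.728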
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, …
Also, P(number of trials > k) = (1 − p)^k.
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
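A sketch of the solution (for checking; treat a correct delivery as a success, p = 0.9): exactly 2 transmissions means the first fails and the second succeeds, so P = (0.1)(0.9) = 0.09; more than 2 transmissions means the first two both fail, so P = (0.1)² = 0.01.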
THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so the rate np = 0.5 calls per minute is constant. The maximum possible number of calls/hour grows without bound as n increases.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability C(n,k) p^k (1 − p)^{n−k}.
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}.
• Then
lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, …, and 0 otherwise.
EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
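A MATLAB sketch of the computation (not in the original notes), assuming the Poisson model with α = λt = (1.5 calls/minute)(10 minutes) = 15:

alpha = 15; k = 20;
P = alpha^k * exp(-alpha) / factorial(k)    % approximately 0.0418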
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6

Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
F_Z(z) = 0 for z ≤ 0;
F_Z(z) = z²/(2b²) for 0 < z ≤ b;
F_Z(z) = [b² − (2b − z)²/2]/b² for b < z < 2b;
F_Z(z) = 1 for z ≥ 2b.
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
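A quick Monte Carlo sketch of this example in MATLAB (not in the original notes; it assumes b = 1) compares the empirical CDF with the formula above at one test point:

N = 100000; b = 1;
z = b*rand(1,N) + b*rand(1,N);   % Z = sum of the two dart coordinates
z0 = 0.5;                        % any test point with 0 < z0 <= b
[mean(z <= z0), z0^2/(2*b^2)]    % empirical vs. analytical CDF; both near 0.125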
RANDOM VARIABLES (RVS)

• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R (the real line).
EX

• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, b].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is
F_X(x) = P[X ≤ x].

• F_X(x) specifies a probability measure on the Borel sets of R.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf: omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻), where x⁻ denotes the limit from the left.
• If F_X(x) is not a continuous function of x, then from above,
F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].
EX: Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7

Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of the new random variable by plotting a histogram (note: the histogram should now be of v, not u):
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram (e.g., hist(w, 100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
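For checking your answers, a sketch of the reasoning (not in the original notes): P(V > t) = P(−ln U > t) = P(U < e^{−t}) = e^{−t}, so V is an exponential random variable with λ = 1; and P(W > r) = P(V > r²) = e^{−r²}, which is a Rayleigh random variable with 2σ² = 1.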
PROBABILITY DENSITY FUNCTION

DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).

• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf: omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables

DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN
For a discrete RV, the probability mass function (pmf) is
P_X(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6 for x = 1, 2, …, 6, and 0 otherwise.

EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, …, and 0 otherwise.
IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1 if s ∈ A; X = 0 if s ∉ A.
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.
So
P(X = x) = p for x = 1; 1 − p for x = 0; and 0 for x ≠ 0, 1.
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n,k) p^k (1 − p)^{n−k} for k = 0, 1, …, n, and 0 otherwise.
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, F_X(x) vs. x]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p for k = 1, 2, …, and 0 otherwise.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, …, and 0 otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
f_X(x) = λ e^{−λx} for x ≥ 0, and 0 for x < 0.
• Distribution function:
F_X(x) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0.
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,
C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)].
• Let σ² = npq and µ = np. Then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)].
DEFN
A Gaussian random variable X has density function
f_X(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt.
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt.
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ.
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ).
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ).
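Base MATLAB has no built-in Q-function, but Q(x) can be computed from the built-in erfc. A sketch (not in the original notes; the example values µ = 5, σ = 1, a = 4, b = 7 are mine):

Q = @(x) 0.5*erfc(x/sqrt(2));             % Q(x) = 1 - Phi(x)
mu = 5; sigma = 1; a = 4; b = 7;
P = Q((a-mu)/sigma) - Q((b-mu)/sigma)     % P(4 < X <= 7), approximately 0.8186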
EEL 5544 Lecture 8

Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. The answers appear below.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables

EX: The Motivation problem.
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
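A sketch of the solution (for checking): by symmetry of the Gaussian density about its mean, P[X < µ] = 1/2. For the tails, standardize:
P[|X − µ| > kσ] = P[(X − µ)/σ > k] + P[(X − µ)/σ < −k] = 2Q(k),
which gives the tabulated values above for k = 1, …, 5.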
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable (with one degree of freedom).
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;
Γ(3/2) = (1/2)√π.
• Note that for n = 2,
f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)} for y ≥ 0, and 0 for y < 0,
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
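A Monte Carlo sketch of the n = 2 result in MATLAB (not in the original notes; σ = 1 assumed):

N = 100000; sigma = 1;
y = randn(1,N).^2 + randn(1,N).^2;     % chi-square with 2 degrees of freedom
[mean(y > 3), exp(-3/(2*sigma^2))]     % empirical tail vs. the exponential law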
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
F_Y(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0, and 0 for y < 0.
• The non-central chi-square random variable has more complicated pdfs and CDFs, and its CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, i.e., to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)] for ℓ ≥ 0, and 0 for ℓ < 0.
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = P[{X ≤ x} ∩ A]/P(A).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = dF_X(x|A)/dx.

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
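A sketch of the solution (for checking): let T be the total wait and condition on A = {T > 2}. For t > 2,
F_T(t|A) = P[{T ≤ t} ∩ {T > 2}]/P(T > 2) = (e⁻² − e^{−t})/e⁻² = 1 − e^{−(t−2)},
and F_T(t|A) = 0 for t ≤ 2. The remaining wait R = T − 2 thus has F_R(r|A) = 1 − e^{−r} for r ≥ 0: the same exponential law as the original wait. This is the memoryless property of the exponential RV.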
TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, …, An form a partition of Ω, then
F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + ⋯ + F_X(x|An)P(An)
and
f_X(x) = Σ_{i=1}^{n} f_X(x|Ai)P(Ai).
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx.
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x)/P(A)] f_X(x) = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt.
Examples on board
If the outcome of an experiment is random, how can we apply quantitative techniques to it?
This is the theory of probability.
People have tried to develop probability through a variety of approaches, which are discussed in the book in Section 1.2:
1 Probability as Intuition
2 Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
3 Probability as a Measure of Frequency of Occurrence
4 Probability Based on Axiomatic Theory
Probability as the Ratio of Favorable to Unfavorable Outcomes (Classical Theory)
• Some experiments are said to be "fair":
– For a fair coin or a fair die toss, the outcomes are equally likely.
– Equally likely outcomes are common in many situations.
• Problems involving a finite number of equally likely outcomes can be solved through the mathematics of counting, which is called combinatorial analysis.
• Given an experiment with equally likely outcomes, we can determine the probabilities easily.
• First, we need a little more knowledge about probabilities of an experiment with a finite number of outcomes:
– An outcome or set of outcomes with probability 1 (one) is certain to occur.
– An outcome or set of outcomes with probability 0 (zero) is certain not to occur.
– If you find a probability for a question on one of my tests and your answer is < 0 or > 1, then you are certain to lose a lot of points.
• Using the first two rules, we can find the exact probabilities of individual outcomes for experiments with a finite number of equally likely outcomes.

EX: Tossing a fair coin
Let pH = P{heads}, pT = P{tails}.
Then pH + pT = 1 (the probability that something occurs is 1).
Since pH = pT, we have pH = pT = 1/2.
EX: Rolling a fair 6-sided die
Let pi = P{top face is i}. Then
Σ_{i=1}^{6} pi = 1
⇒ 6p1 = 1
⇒ p1 = 1/6.
So pi = 1/6 for i = 1, 2, 3, 4, 5, 6.

• We can use the probabilities of the outcomes to find probabilities that involve multiple outcomes.

EX: Rolling a fair 6-sided die
What is P{1 or 2}? (Since the two outcomes are mutually exclusive, P{1 or 2} = p1 + p2 = 1/3.)
The basic principle of counting:
If two experiments are to be performed, where experiment 1 has m outcomes and experiment 2 has n outcomes for each outcome of experiment 1, then the combined experiment has mn outcomes.

We can use equally likely outcomes on repeated trials, but we have to be careful.

EX: Rolling a fair 6-sided die twice
Suppose we roll the die two times. What is the probability that we observe a 1 or 2 on either roll of the die? (A worked sketch follows.)
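A sketch of one solution (for checking): by the basic principle of counting, there are 6 × 6 = 36 equally likely ordered pairs. The pairs that avoid {1, 2} on both rolls number 4 × 4 = 16, so
P(1 or 2 on either roll) = 1 − 16/36 = 5/9.
Note the care required: simply adding 1/3 + 1/3 would double-count the pairs in which both rolls land in {1, 2}.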
What are some problems with defining probability in this way?
1. It requires equally likely outcomes; thus it only applies to a small set of random phenomena.
2. It can only deal with a finite number of outcomes; this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let Nk(n) = the number of times the outcome is k in n trials.
• Then we can tabulate Nk(n) for various values of k and n.
• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur?"

DEFN
The relative frequency of outcome k of a random experiment is
fk(n) = Nk(n)/n.
• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.

DEFN
An experiment possesses statistical regularity if
lim_{n→∞} fk(n) = pk (a constant) for all k.

DEFN
For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ Nk(n) ≤ n for all k,
because Nk(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ Nk(n)/n = fk(n) ≤ 1, for all k = 1, …, K.   (1)
• If 1, 2, …, K are all of the possible outcomes, then
Σ_{k=1}^{K} Nk(n) = n.
Again, dividing by n yields
Σ_{k=1}^{K} fk(n) = 1.   (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation? (That the outcomes 2, 4, and 6 cannot occur simultaneously on a single trial.)
Then, dividing by n,
fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n).
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
fC(n) = fA(n) + fB(n).   (3)

• In some sense, we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1. It is not clear when and in what sense the limit exists.
2. It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3. We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1. be useful for solving real problems,
2. agree with our interpretation of probability as relative frequency,
3. agree with our intuition (where appropriate).
EEL 5544 Noise in Linear Systems Lecture 2a

SET OPERATIONS

Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand and be able to apply all the "tools" (operators) and concepts of sets.

Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?

To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
• Want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ need operations on sets.

1. A ⊂ B (read "A is a subset of B")

DEFN
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B.
Note that A = B is included in A ⊂ B:
A = B if and only if A ⊂ B and B ⊂ A.
(useful for proofs)

2. A ∪ B (read "A union B" or "A or B")

DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}.

3. A ∩ B (read "A intersect B" or "A and B")

DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}.

4. Ac or Ā (read "A complement")

DEFN
The complement of a set A in a sample space Ω is defined by
Ac = {x | x ∈ Ω and x ∉ A}.
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (Ac)c = A
3. A ⊂ B ⇒ Bc ⊂ Ac
4. A ∪ Ac = Ω
5. (A ∩ B)c = Ac ∪ Bc (DeMorgan's Law 1)
6. (A ∪ B)c = Ac ∩ Bc (DeMorgan's Law 2)
• One of the most important relations for sets is when sets are mutually exclusive.

DEFN
Two sets A and B are said to be mutually exclusive or disjoint if
A ∩ B = ∅.

This is very important, so I will emphasize it again: mutually exclusive is a set relationship.
SAMPLE SPACES

• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1. EX: Roll a fair 6-sided die and note the number on the top face.
2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3. EX: Toss a coin 3 times and note the sequence of outcomes.
4. EX: Toss a coin 3 times and note the number of heads.
5. EX: Roll a fair die 3 times and note the number of even results.
6. EX: Roll a fair 6-sided die 3 times and note the number of sixes.
7. EX: Simultaneously toss a quarter and a dime and note the outcomes.
8. EX: Simultaneously toss two identical quarters and note the outcomes.
9. EX: Pick a number at random between zero and one, inclusive.
10. EX: Measure the lifetime of a computer chip.
MORE ON EVENTS

• Recall that an event A is a subset of the sample space Ω.
• The set of all events is called the event class, denoted F or A.

EX: Flipping a coin
The possible events are:
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.
EX: Rolling a 6-sided die
There are many possible events, such as:
1. The number 3 occurs.
2. An even number occurs.
3. No number occurs.
4. Any number occurs.
• EX: Express the events listed in the example above in set notation:
1. {3}
2. {2, 4, 6}
3. ∅
4. Ω = {1, 2, 3, 4, 5, 6}
EX: The 3-language problem
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
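A sketch of the solution (for checking): by inclusion–exclusion,
|S ∪ F ∪ G| = 28 + 26 + 16 − 12 − 4 − 6 + 2 = 50,
so P(no language class) = (100 − 50)/100 = 1/2. For exactly one class, count the students in only one set: (28 − 12 − 4 + 2) + (26 − 12 − 6 + 2) + (16 − 4 − 6 + 2) = 14 + 10 + 8 = 32, so the probability is 0.32.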
MORE ON EQUALLY-LIKELY OUTCOMES IN DISCRETE SAMPLE SPACES

EX: Consider the following examples.

A. A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai.
Let A2 = {exactly 2 heads occur in 3 tosses} = {HHT, HTH, THH}. Then
P(A2) = |A2|/|Ω3| = 3/8.
Note that for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, …, oK} and |E| = K), then
P(E) = K/|Ω|.
B. Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai.
Then P(exactly 2 heads) = 1/4, which contradicts the 3/8 found in part A: the outcomes of this experiment are not equally likely, so the assumption is not valid here.
EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES

• We cannot directly apply our notion of equally-likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.

DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = (b − a)/(B − A).

• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the probability of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞.
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.
(This does not mean that the outcome never occurs, only that it occurs very infrequently.)
EEL 5544 Lecture 2

Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.

Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the corollaries to the Axioms of Probability where appropriate:
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
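A sketch of the solution (for checking, with A = {smokes cigarettes}, B = {smokes cigars}): P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.28 + 0.07 − 0.05 = 0.30, so 30% smoke; by the complement corollary, 100% − 30% = 70% smoke neither; and P(B ∩ Ac) = P(B) − P(A ∩ B) = 0.07 − 0.05 = 0.02, so 2% smoke cigars but not cigarettes.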
PROBABILITY SPACES

We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).

• The elements of a probability space for a random experiment are:

1.
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.

DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.

EX: Examples

DEFN
A continuous set is not countable.

DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.

• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be finite, infinite, or partially infinite.

EX: Examples of continuous sets

• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?

DEFN
Events are sets of outcomes to which we assign probability.

Examples based on previous examples of sample spaces:

(a) EX
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}.

(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events: ∅, {E}, {O}, {E, O}.
(c) EX
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = the event that heads occurs on the first toss.
Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}.

(d) EX
Toss a coin 3 times and note the number of heads.
Let O = the event that an odd number of heads occurs.
Express O as a set of outcomes: O = {1, 3}.
2.
DEFN
F is the event class, which is a set of subsets of Ω (the events) to which we assign probability.

• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.

DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M;
(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) if E ∈ M, then Ec ∈ M.
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, … ∈ M,
∪_{i=1}^{∞} Ei ∈ M.   (5)

• Note that by property (c) of fields, the union in (5) can be interchanged with
∩_{i=1}^{∞} Ei ∈ M.   (6)

• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3.
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto the interval [0, 1].

Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F.
II. P(Ω) = 1.
III. If A and B are mutually exclusive (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B); more generally, for an infinite sequence of pairwise mutually exclusive events A1, A2, …, P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai).
Corollaries
• Let A ∈ F and B ∈ F. The following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(Ac) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, …, An are pairwise mutually exclusive, then
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak).
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(∪_{k=1}^{n} Ak) = Σ_{j} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + ⋯ + (−1)^{n+1} P(A1 ∩ A2 ∩ ⋯ ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, …. Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3

Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex.
(b) The 3 eldest are boys and the others are girls.
(c) The 2 oldest are girls.
(d) There is at least one girl.
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.

We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.

What if A and B are independent?
Motivation, continued

Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.

EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE

DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if
P(A ∩ B) = P(A)P(B).

• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.

It is important to note that statistical independence is a probability relationship.
What type of relation is mutually exclusive? (A set relationship.)
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– A and Bc
– Ac and B
– Ac and Bc

Answers to the motivation problems: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4.
Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bc are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ Bc) and (A ∩ B) ∩ (A ∩ Bc) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ Bc)]
     = P(A ∩ B) + P(A ∩ Bc)
     = P(A)P(B) + P(A ∩ Bc).
Thus
P(A ∩ Bc) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bc).
• For N events to be s.i., we require
P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k,
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m,
…,
P(A1 ∩ A2 ∩ ⋯ ∩ AN) = P(A1)P(A2)⋯P(AN).
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence

DEFN
N events A1, A2, …, AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀i ≠ j.
Examples of Applying Statistical Independence

EX: For a couple having 5 children, compute the probabilities of the following events (a worked sketch follows the list):
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
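A sketch of the solutions (for checking, using independence and P(boy) = P(girl) = 1/2):
1. P(all boys) + P(all girls) = 2(1/2)⁵ = 1/16.
2. A single specified sequence: (1/2)⁵ = 1/32.
3. Only the first two births are constrained: (1/2)² = 1/4.
4. P(at least one girl) = 1 − P(all boys) = 1 − (1/2)⁵ = 31/32.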
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS

• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relationship.
– Statistical independence is a probability relationship.
• How does mutual exclusiveness relate to statistical independence?
1. m.e. ⇒ s.i.
2. s.i. ⇒ m.e.
3. Two events cannot be both s.i. and m.e., except in some degenerate cases.
4. None of the above.
• Perhaps surprisingly, 3 is true: if A ∩ B = ∅ and A and B are s.i., then P(A)P(B) = P(A ∩ B) = 0, so at least one of the events must have probability zero.
Consider the following:
Conditional Probability: Consider an example.

EX: Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
Ω = {AD, AN, BD, BD, BN}
Let:
• EA be the event that the selected computer is from manufacturer A;
• EB be the event that the selected computer is from manufacturer B;
• ED be the event that the selected computer is defective.
Find:
P(EA) = 2/5,  P(EB) = 3/5,  P(ED) = 3/5.
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred). Counting equally likely outcomes in
Ω = {AD, AN, BD, BD, BN},
find:
– P(ED|EA) = 1/2
– P(ED|EB) = 2/3
– P(EA|ED) = 1/3
– P(EB|ED) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY

Consider the Venn diagram:
[Venn diagram: events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B becomes the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:

DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0.
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).

Check the axioms:
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0.
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B].
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).
EX: Check the previous example: Ω = {AD, AN, BD, BD, BN}.
P(ED|EA) = P(ED ∩ EA)/P(EA) = (1/5)/(2/5) = 1/2
P(ED|EB) = P(ED ∩ EB)/P(EB) = (2/5)/(3/5) = 2/3
P(EA|ED) = P(EA ∩ ED)/P(ED) = (1/5)/(3/5) = 1/3
P(EB|ED) = P(EB ∩ ED)/P(ED) = (2/5)/(3/5) = 2/3
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4

Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)? (After a white ball is removed, 4 balls remain, 2 of them white, so P(W2|W1) = 2/4 = 1/2.)

Answers to the gambler problem: (a) 1/3 (b) 1/5 (c) 1.
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)

Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
(The answer is (c): conditioning may either increase or decrease the probability of an event.)
Chain rule for expanding intersections

Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B),   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A).   (8)

• Equations (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.

Ex: Intersection of 3 events
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C).
Partitions and Total Probability

DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if
∪_i Ai = Ω and Ai ∩ Aj = ∅ for all i ≠ j.
{Ai} is also said to be a partition of Ω.

1. Venn diagram:
2. Expressing an arbitrary set using a partition of Ω:
B = ∪_i (B ∩ Ai), where the sets B ∩ Ai are pairwise mutually exclusive.

Law of Total Probability
If {Ai} partitions Ω, then
P(B) = Σ_i P(B|Ai)P(Ai).
Example:
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with probability P(A0) = 2/5.
• A1 is sent with probability P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules. (A worked sketch follows.)
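A sketch of the error calculation (for checking): for rule 1, an error occurs exactly when A0 was sent, so P(E) = P(A0) = 2/5 = 0.4. For rule 2, by the law of total probability,
P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/8)(2/5) + (1/6)(3/5) = 1/20 + 1/10 = 3/20 = 0.15.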
EX: Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1).
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0).
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0).
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0).
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
and
q1P(A0|B1) + (1 − q1)P(A1|B1).   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule (Bayes' theorem).

EX
Calculate the a posteriori probabilities and determine the MAP decision rule for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
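A MATLAB sketch of the computation (not in the original notes; the variable names are mine, with p01 = P(B1|A0) and p10 = P(B0|A1)):

PA = [2/5 3/5];                       % a priori probabilities P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];            % row i: [P(B0|Ai) P(B1|Ai)]
PB = PA * PBgA;                       % total probability: [P(B0) P(B1)] = [9/20 11/20]
PAgB = (diag(PA) * PBgA) ./ [PB; PB]  % entry (i,j) = P(Ai|Bj)

This gives P(A0|B0) = 7/9, P(A1|B0) = 2/9, P(A0|B1) = 1/11, P(A1|B1) = 10/11, so the MAP rule decides A0 when B0 is received and A1 when B1 is received.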
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a constant c, so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a

Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 ⋯ nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008.
The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Thus n1 = n2 = ⋯ = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1.
⇒ # of permutations = # of distinct ordered n-tuples = n(n−1)⋯(2)(1) = n!
Useful formula: For large n,
n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}.
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
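For instance, a quick comparison of Stirling's approximation against gamma(n+1) (a sketch; any moderate n shows the same behavior):

% Compare n! (computed via the gamma function) with Stirling's formula.
n = 10;
exact    = gamma(n+1);                        % n! = gamma(n+1)
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);  % Stirling's approximation
fprintf('n! = %g, Stirling = %g, ratio = %.4f\n', exact, stirling, stirling/exact);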
4. Sampling without replacement and without ordering
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n−k)!.
Now, how many different orders are there for any set of k objects? k!
Let C^n_k = the number of combinations of size k from a set of n objects. Then
C^n_k = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{k!(n-k)!} = \binom{n}{k}.
DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}.
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}.
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to order the 24 groups, so the total number of ways to assign people to groups is
\frac{72!}{(3!)^{24}\, 24!} \approx 2.1 \times 10^{61}.
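Counts this large overflow a naive factorial evaluation, so it is safer to work in the log domain; a sketch using MATLAB's gammaln (log-gamma) function:

% Count the group assignments in the log domain to avoid overflow.
% gammaln(n+1) = log(n!)
logOrdered   = gammaln(73) - 24*gammaln(4);   % log of 72!/(3!)^24
logUnordered = logOrdered - gammaln(25);      % divide by 24! for unordered groups
fprintf('ordered: %.2e, unordered: %.2e\n', exp(logOrdered), exp(logUnordered));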
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? \frac{12!}{(2!)^6}
– What is the prob of this occurring? \frac{12!}{(2!)^6 \, 6^{12}} \approx 0.0034
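A one-line check of this count (a sketch; both quantities fit comfortably in double precision):

% P(each face appears exactly twice in 12 rolls of a fair die).
ways = factorial(12) / factorial(2)^6;   % 12!/(2!)^6 favorable orderings
p    = ways / 6^12;                      % out of 6^12 equally likely sequences
fprintf('ways = %d, probability = %.5f\n', ways, p);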
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
\binom{k+n-1}{k}.
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiments can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
⁴ ≈ 0.0794
2. Always switch
3. Flip a fair coin, and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments: Ω = Ω1 × Ω2 × ⋯ × ΩN.
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that P(A) is consistent with the probability measures Pi(·) of the subexperiments.
For s.i. subexperiments,
P(A) = P_1(B_1) P_2(B_2) \cdots P_N(B_N).
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1-p)^{n-k}.
• The number of ways that k successes can occur in n places is \binom{n}{k}.
• Thus, the probability of k successes on n trials is
p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}. (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
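A direct MATLAB evaluation of the packet problem (a sketch; binocdf from the Statistics Toolbox would do the same in one call):

% P(packet fails to decode) = P(3 or more bit errors in 100 bits), p = 0.01.
n = 100; p = 0.01;
k = 0:2;                                  % failure is the complement of k <= 2
pmf = arrayfun(@(j) nchoosek(n,j) * p^j * (1-p)^(n-j), k);
pFail = 1 - sum(pmf);
fprintf('P(decode failure) = %.4f\n', pFail);   % about 0.0794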
• When n is large, the probability in (11) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}. (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let S_n = \sum_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.
• The Gaussian random variable is characterized by a density function
\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] (13)
and distribution function
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt. (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 - \Phi(x) (15)
     = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt. (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\left(\frac{k-np}{\sqrt{npq}}\right). (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to deal with sums of binomial probabilities.
• For fixed integers α and β,
P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right]
                            = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right].
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success) = P(1st m−1 trials are failures ∩ mth trial is success)
Then
p(m) = (1-p)^{m-1} p, \quad m = 1, 2, \ldots
Also, P(\#\text{ trials} > k) = (1-p)^k.
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
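With per-transmission success probability p = 0.9, the geometric law answers both questions directly (a sketch):

% ARQ example: the number of transmissions is geometric with p = 0.9.
p = 0.9;                        % probability a transmission succeeds
pExactly2  = (1-p)^(2-1) * p;   % p(2) = (1-p)p = 0.09
pMoreThan2 = (1-p)^2;           % P(# trials > 2) = (1-p)^2 = 0.01
fprintf('P(exactly 2) = %.2f, P(more than 2) = %.2f\n', pExactly2, pMoreThan2);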
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/minute is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
\binom{n}{k} p^k (1-p)^{n-k}.
• If we let np = α be a constant, then for large n,
\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}.
• Then
\lim_{n \to \infty} \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
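Here α = λt = 90 × (1/6) = 15, so the Poisson pmf gives the answer in one line (a sketch):

% Poisson probability of exactly 20 calls in a 10-minute interval.
lambda = 90;              % average calls per hour
t      = 10/60;           % observation window, in hours
alpha  = lambda * t;      % alpha = lambda*t = 15
k      = 20;
P = alpha^k * exp(-alpha) / factorial(k);
fprintf('P(N = 20) = %.4f\n', P);   % about 0.0418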
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - (2b-z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
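A Monte Carlo sketch that checks the answer to question 3 by comparing the empirical CDF of Z against the piecewise expression (b = 1 is my arbitrary choice):

% Empirical vs. analytic CDF for Z = sum of the dart's coordinates, b = 1.
b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);            % uniform dart in the square
z = linspace(0, 2*b, 200);
Femp = arrayfun(@(t) mean(Z <= t), z);    % empirical CDF
Fan  = zeros(size(z));
idx1 = z <= b;   Fan(idx1) = z(idx1).^2 / (2*b^2);
idx2 = z >  b;   Fan(idx2) = 1 - (2*b - z(idx2)).^2 / (2*b^2);
plot(z, Femp, 'b', z, Fan, 'r--');        % the two curves should coincide
legend('empirical', 'analytic');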
RANDOM VARIABLES (RVs)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is
F_X(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}).
• F_X(x) is a prob measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)
Proof omitted.
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].
EX: Examples
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w=sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
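To make the visual comparison quantitative, the histogram can be normalized into a density estimate and a candidate pdf overlaid on top. In this sketch the overlaid curve, exp(−x), is my candidate guess for the density of v, not something asserted by the notes:

% Normalize a histogram into a density estimate and overlay a candidate pdf.
u = rand(1,1000000);
v = -log(u);
[counts, centers] = hist(v, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(numel(v)*binw), 1);   % histogram scaled to unit area
hold on
x = linspace(0, max(v), 200);
plot(x, exp(-x), 'r', 'LineWidth', 2)      % candidate density to compare against
hold off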
PROBABILITY DENSITY FUNCTION
DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x).
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt
3. \int_{-\infty}^{+\infty} f_X(x)\, dx = 1
4. P(a < X ≤ b) = \int_{a}^{b} f_X(x)\, dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad -\infty < c < +\infty,
then
f_X(x) = \frac{g(x)}{c}
is a valid pdf.
Proof omitted.
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus, a value of f_X(x) can be assigned at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is
p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = \begin{cases} \frac{1}{6}, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success".
• A Bernoulli RV X is defined by
X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(\{s : X(s) = 1\}) = P(\{s : s \in A\}) = P(A) \triangleq p.
So
P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; F_X(x) vs. x]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Plot: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) and Geometric(0.7) CDFs; F_X(x) vs. x]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) and Poisson(3) pmfs; P(X = x) vs. x]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) and Poisson(3) CDFs; F_X(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Plot: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}
• Distribution function:
F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Plots: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,
\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k-np)^2}{2npq}\right].
• Let σ² = npq and µ = np; then
P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k-\mu)^2}{2\sigma^2}\right].
DEFN: A Gaussian random variable X has density function
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x-\mu)^2}{2\sigma^2}\right],
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t-\mu)^2}{2\sigma^2}\right] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Plots: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt.
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt.
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi.
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = 1 - Q\left(\frac{x-\mu}{\sigma}\right).
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a-\mu}{\sigma}\right) - Q\left(\frac{b-\mu}{\sigma}\right).
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:⁵
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem.
⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
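These tail probabilities follow from P[|X − µ| > kσ] = 2Q(k), which is easy to verify numerically using the standard identity Q(x) = erfc(x/√2)/2 (a sketch):

% Check P[|X - mu| > k*sigma] = 2Q(k) for k = 1, ..., 5.
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) in terms of MATLAB's erfc
k = 1:5;
disp(2*Qf(k))   % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07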
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Plot: Laplacian pdf with c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then
Y = \sum_{i=1}^{n} X_i^2 (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)} y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
where Γ(p) is the gamma function, defined as
\Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0,
\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0,
\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \frac{1}{2}\sqrt{\pi}.
• Note that for n = 2,
f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
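A Monte Carlo sketch of the n = 2 case: the sum of the squares of two independent N(0, 1) samples should behave like an exponential RV with mean 2σ² = 2.

% Sum of squares of two independent N(0,1) RVs is exponential, mean 2*sigma^2.
N = 1e6;
Y = randn(1,N).^2 + randn(1,N).^2;
fprintf('sample mean = %.3f (expect 2)\n', mean(Y));
fprintf('P(Y > 3) = %.4f vs exp(-3/2) = %.4f\n', mean(Y > 3), exp(-3/2));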
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Plot: central chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found through repeated integration by parts to be
F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = \sqrt{Y}.
• Then R is a Rayleigh RV with pdf
f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Plot: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
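A quick simulation of that model (a sketch): the magnitude of a complex Gaussian with independent N(0, σ²) real and imaginary parts should match the Rayleigh CDF above.

% Magnitude of a complex Gaussian sample is Rayleigh-distributed.
sigma = 1; N = 1e6;
h = sigma*randn(1,N) + 1j*sigma*randn(1,N);   % complex Gaussian samples
R = abs(h);
r = 1.5;                                      % arbitrary test point
fprintf('P(R <= %.1f): empirical %.4f, Rayleigh CDF %.4f\n', ...
        r, mean(R <= r), 1 - exp(-r^2/(2*sigma^2)));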
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = \frac{d}{dx} F_X(x|A).
• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
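The memoryless property of the exponential suggests the remaining wait is again exponential(1); a Monte Carlo sketch of the conditioning:

% Memorylessness of the exponential: condition on having waited > 2 minutes.
N = 1e6;
W = -log(rand(1,N));        % exponential(lambda = 1) wait times
cond = W(W > 2) - 2;        % remaining wait, given 2 minutes already elapsed
fprintf('mean remaining wait = %.3f (expect 1)\n', mean(cond));
fprintf('P(remaining > 1) = %.4f vs exp(-1) = %.4f\n', mean(cond > 1), exp(-1));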
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, ..., A_n form a partition of Ω, then
F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)
and
f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i).
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.
P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)
= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)
= \frac{f_X(x|A)}{f_X(x)}\, P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = \frac{f_X(x|A)}{f_X(x)} P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx\; P(A)
⇒ P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx.
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the
Continuous Version of Bayes' Theorem:
f_X(x|A) = \frac{P(A|X = x)}{P(A)} f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}.
Examples on board
L1-4
Let pi = Prob top face is iThen
6sumi=1
pi = 1
rArr 6p1 = 1
rArr p1 =1
6
So pi = 16
for i = 1 2 3 4 5 6
bull We can use the probabilities of the outcomes to find probabilities that involve multipleoutcomes
EX Rolling a fair 6-sided die
What is Prob 1 or 2
The basic principle of countingIf two experiments are to be performed where experiment 1 has m outcomes andexperiment 2 has n outcomes for each outcome of experiment 1 then the combinedexperiment has mn outcomes
We can use equally-likely outcomes on repeated trials but we have to be careful
EX Rolling a fair 6-sided dice twice
Suppose we roll the die two times What is the prob that we observe a 1 or 2 on either roll ofthe die
L1-5
What are some problems with defining probability in this way
1 Requires equally likely outcomes ndash thus it only applies to a small set of random phenomena
2 Can only deal with a finite number of outcomesndash this again limits its usefulness
Probability as a Measure of Frequency of Occurrence
bull Consider a random experiment that has K possible outcomes K lt infin
bull Let Nk(n) = the number of times the outcome is k
bull Then we can tabulate Nk(n) for various values of k and n
bull We can reduce the dependence on n by dividing N(k) by n to find out ldquohow often did koccurrdquo
DEFN
The relative frequency of outcome k of a random experiment is
fk(n) =Nk(n)
n
bull Observation In our previous experiments as n gets large fk(n) converges to some constantvalue
DEFN
An experiment possesses statistical regularity if
limnrarrinfin
fk(n) = pk (a constant)forallk
DEFN
For experiments with statistical regularity as defined above pk is called theprobability of outcome k
PROPERTIES OF RELATIVE FREQUENCYbull Note that
0 le Nk(n) le n forallk
because Nk(n) is just the of times outcome k occurs in n trials
Dividing by n yields
0 le Nk(n)
n= fk(n) le 1 forallk = 1 K (1)
L1-6
bull If 1 2 K are all of the possible outcomes thenKsum
k=1
Nk(n) = n
Again dividing by n yieldsKsum
k=1
fk(n) = 1 (2)
Consider the event E that an even number occurs
What can we say about the number of times E is observed in n trialsNE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation
Then dividing by n
fE(n) =NE(n)
n=
N2(n) + N4(n) + N6(n)
n= f2(n) + f4(n) + f6(n)
bull General propertyIf A and B are 2 events that cannot occur simultaneously and C is theevent that either A or B occurs then
fC(n) = fA(n) + fB(n) (3)
bull In some sense we can define the probability of an event as its long-term relative frequency
What are some problems with this definition
1 It is not clear when and in what sense the limit exists2 It is not possible to perform an experiment an infinite number of times so the probabilities
can never be known exactly3 We cannot use this definition if the experiment cannot be repeated
We need a mathematical model of probability that is not based on a particularapplication or interpretationHowever any such model should
1 be useful for solving real problems
2 agree with our interpretation of probability as relative frequency
3 agree with our intuition (where appropriate)
L2a-1
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events which are sets of outcomes Therefore it isnecessary to be able to understand and be able to apply all the ldquotoolsrdquo (operators) and conceptsof sets
Read the following material and Sections 14 and 15 in the textbook Then watch the onlineLecture 2a in preparation for Lecture 2
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2problem 12)
EX
An elementary school is offering 3 language classes one in Spanish one inFrench and one in German These classes are open to any of the 100 studentsin the school There are 28 students in the Spanish class 26 in the French classand 16 in the German class There are 12 students that are in both Spanish andFrench 4 that are in both Spanish and German and 6 that are in both Frenchand German In addition there are 2 students taking all 3 classes
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
To solve the above problem let S F and G denote the events that a student is taking SpanishFrench and German respectively Start by expressing the desired composite event in terms ofunions intersections and complements of S F and G Then evaluate the number of items inthe set The probability of the compound event is the number of items in the set divided by thetotal population (100)
Hint It may help to draw a Venn diagram
bull Want to determine probabilities of events (combinations of outcomes)
bull Each event is a set rArr need operations on sets
1 A sub B (read ldquoA is a subset of Brdquo)
DEFN
The subset operator sub is defined for two sets A and B by
A sub B if x isin A rArr x isin B
L2a-2
Note that A = B is included in A sub B
A = B if and only if A sub B and B sub A(useful for proofs)
2 A cupB (read ldquoA union Brdquo or ldquoA or Brdquo)
DEFN
The union of A and B is a set defined by
A cupB = x |x isin A or x isin B
3 A capB (read ldquoA intersect Brdquo or ldquoA and Brdquo)
DEFN
The intersection of A and B is defined by
A capB = AB = x |x isin A and x isin B
4 Ac or A (read ldquoA complementrdquo)
DEFN
The complement of a set A in a sample space Ω is defined by
Ac = x |x isin Ω and x isin A
bull Use Venn diagrams to convince yourself of
1 (A capB) sub A and (A capB) sub B
2(A)
= A
3 A sub B rArr B sub A
4 A cup A = Ω
5 A capB = A cupB (DeMorganrsquos Law 1)
6 A cupB = A capB (DeMorganrsquos Law 2)
L2a-3
bull One of the most important relations for sets is when sets are mutually exclusive
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if
This is very important so I will emphasize it again mutually exclusive isa
SAMPLE SPACES
bull Given the following experiment descriptions write a mathematical expression for the samplespace Compare and contrast the experiments and sample spaces
1 EX Roll a fair 6-sided die and note the number on the top face
2 EXRoll a fair 6-sided die and determine whether the number on the topface is even or odd
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
bull The set of all events is the denoted F or A
EX Flipping a coin
The possible events are
ndash Heads occurs
ndash Tails occurs
ndash Heads or Tails occurs
ndash Neither Heads nor Tails occurs
bull Thus the event class can be written as
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
bull EX Express the events listed in the example above in set notation
1
2
3
4
EX The 3 language problem
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
L2a-7
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times and the sequence of H and T is notedΩ3 = HHH HHT HTH TTTAssuming equally likely outcomes
P (ai) =1
|Ω3|=
1
8for any outcome ai
Let A2 = exactly 2 heads occurs in 3 tossesThen
Note that for equally likely outcomes in general if an event E consists of K outcomes (ieE = o1 o2 oK and |E| = K) then
L2a-8
B Suppose a coin is tossed 3 times but only the number of heads is notedΩ = 0 1 2 3Assuming equally likely outcomes
P (ai) =1
|Ω|=
1
4for any outcome ai
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
bull Cannot directly apply our notion of equally-likely outcomes to continuous sample spacesbecause any continuous sample space has an infinite number of outcomes
bull Thus if each outcome has some positive probability of occurring p gt 0 the total probabilitywould be infinp = infin
bull Our previous work on equally likely outcomes indicates that the probability of each outcomeshould be 1Ω = 0 but this seems to suggest that the probability of every event is 0
bull In fact assigning 0 probability to the outcomes does NOT necessarily imply that the probabilityof every event is 0
bull Letrsquos use our previous approach of using cardinality to define the probability of an eventConsider the continuous sets which are simplest to measure cardinality on the intervals
DEFN
For a sample space Ω that is an interval Ω = [A B] a probability measure Phas equally likely outcomes on Ω if for any a and b such that A le a le b le B
P ([a b]) =
bull Note that P (c) = |[cc]||Ω| = 0 for any c isin [A B]
L2a-9
bull For a typical continuous sample space the prob of any particular outcome is zero
Interpretation Suppose we choose random numbers in [A B] until we see all the outcomesin that range at least once
How many trials will it take infin
bull Seeing a number a finite number of times in an infinite number of trialsrArr Relative freq = 0
(Does not mean that outcome does not ever occur only that it occurs very infrequently)
L2-1
EEL 5544 Lecture 2Motivation
Read Sections 15 in the textbook
The Axioms of Probability and their Corollaries are important tools that simplify solvingprobability problems Anyone working with probability should learn these by heart and knowwhen to apply them
Try to solve the following problem (a modified version of S Ross A First Course in Probability7th ed Ch 2 prob 11) using the Corollaries to the Axioms of Probability where appropriate
EXA total of 28 percent of American males smoke cigarettes 7 percent smokecigars and 5 percent smoke both cigars and cigarettes
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space denoted by Ω is the of
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties discrete or continuous
DEFN
A discrete set is either or
DEFN
A set is countably infinite if it can be put into one-to-one correspondence withthe integers
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b gt a are in an interval I then if a le x le b x isin I
bull Intervals can be either open closed or half-open
ndash A closed interval [a b] contains the endpoints a and bndash An open interval (a b) does not contain the endpoints a and bndash An interval can be half-open such as (a b] which does not contain a or [a b)
which does not contain b
L2-3
bull Intervals can also be either finite infinite or partially infinite
EX
Example of continuous sets
bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like
ndash What is the probability that the result is evenndash What is the probability that the result is le 2
DEFNEvents are to which we assign
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes
(d) EX
Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes
2
DEFN
F is the event class which is ato which we assign probability
bull If Ω is discrete then F can be taken to be the of Ω (the set of everysubset of Ω)
bull If Ω is continuous then weird things can happen if we consider every subset of Ω Forinstance if Ω itself has measure (probability) 1 we can construct a new set Ωprime thatconsists only of subsets of Ω that has measure (probability) 2
bull The solution is to not assign a measure (ie probability) to some subsets of Ω andtherefore these things cannot be events
DEFN
A collection of subsets of Ω forms a field M under the binaryoperations cup and cap if(a) empty isin M and Ω isinM(b) If EisinM and F isinM then E cup F isinM and E cap F isinM(c) If E isinM then Ec isinM
L2-5
DEFN
A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1
Ei isinM (5)
bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1
Ei isinM (6)
bull We require that the event class F be a σ-algebra
bull For discrete sets the set of all subset of Ω is a σ-algebra
bull For continuous sets there may be more than one possible σ-algebra possible
bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R
3
DEFN
The probability measure denoted by P is a that maps allmembers of onto
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I
II
III
Corollaries
bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F
(a) P (Ac) = 1minus P (A)
L2-6
(b) P (A) le 1
(c) P (empty) = 0
(d) If A1 A2 An are pairwise mutually exclusive then
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Ak)
Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)
Proof and example on board(f)
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Aj)minussumjltk
P (Aj cap Ak) + middot middot middot
+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)
Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction
(g) If A sub B then P (A) le P (B)Proof on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (ie, we decide) that Ai was transmitted, when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized, ie, it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
and
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, ie, either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
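A small MATLAB sketch of this computation (the transition matrix encodes my assumed reading of the diagram; element (i, j) is P(Bj|Ai)):

PA    = [2/5 3/5];              % priors P(A0), P(A1)
PBgA  = [7/8 1/8; 1/6 5/6];     % P(Bj|Ai): rows A0, A1; columns B0, B1
joint = [PA(1)*PBgA(1,:); PA(2)*PBgA(2,:)];   % joint probs P(Ai, Bj)
PB    = sum(joint, 1);          % total probability: [P(B0) P(B1)] = [0.45 0.55]
PAgB  = joint ./ [PB; PB];      % Bayes' rule: P(Ai|Bj)
[m, map] = max(PAgB)            % MAP decisions: index 1 = A0, index 2 = A1

Under these assumed values, P(A0|B0) = 7/9 and P(A1|B1) = 10/11, so the MAP rule coincides with decision rule 2 above.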
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S Ross, A First Course in Probability, 7th ed, Ch 2, prob 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.3
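If you want to check your answers numerically, here is a Monte Carlo sketch in MATLAB (the variable names are mine; compare with the footnote values):

N = 1e6;
rolls = ceil(6*rand(N, 5));          % N rolls of five fair dice
cnt = histc(rolls', 1:6);            % 6-by-N table of face counts per roll
pat = sort(cnt, 1, 'descend');       % sorted count pattern for each roll
pNoTwoAlike = mean(pat(1,:) == 1)                  % ~0.0926
pOnePair    = mean(pat(1,:) == 2 & pat(2,:) == 1)  % ~0.4630
pTwoPair    = mean(pat(1,:) == 2 & pat(2,:) == 2)  % ~0.2315
pThreeAlike = mean(pat(1,:) == 3 & pat(2,:) == 1)  % ~0.1543
pFullHouse  = mean(pat(1,:) == 3 & pat(2,:) == 2)  % ~0.0386
pFourAlike  = mean(pat(1,:) == 4)                  % ~0.0193
pFiveAlike  = mean(pat(1,:) == 5)                  % ~0.0008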
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 · · · nk.
SPECIAL CASES
1 Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3 (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
The result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2 Sampling w/out replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
The result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3 Permutations of n distinct objects. Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
The result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula (Stirling's approximation): For large n,
n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
L5a-3
4 Sampling w/out Replacement & w/out Ordering. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orderings are there for any set of k objects? k!
Let Cnk = the number of combinations of size k from a set of n objects. Then
Cnk = (# ordered subsets)/(# orderings) = n!/[(n − k)! k!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ (with k1 + k2 + · · · + kJ = n) is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61 unordered groups.
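Both counts overflow double precision if computed with factorials directly, so here is a MATLAB sketch using log-gamma values (recall that n! = Γ(n + 1)):

% 72!/(3!)^24 and 72!/((3!)^24 * 24!) computed via gammaln to avoid overflow
nOrdered   = exp(gammaln(73) - 24*gammaln(4))                 % ~1.3e85
nUnordered = exp(gammaln(73) - 24*gammaln(4) - gammaln(25))   % ~2.1e61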
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? 12!/(2!)^6 = 7,484,400
– What is the prob of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034
5 Sampling with Replacement and w/out Ordering. Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
(k + n − 1 choose k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?4
EX By Request: The Game Show (Monty Hall) Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch
4 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1 The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2 Event class F: a σ-algebra on Ω.
3 P(·) on F such that
P(A) = P(B1)P(B2|B1) · · · P(BN|B1 ∩ · · · ∩ B_{N−1})
For si subexperiments,
P(A) = P1(B1)P2(B2) · · · PN(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then, for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is (n choose k).
• Thus the probability of k successes on n trials is
pn(k) = (n choose k) p^k (1 − p)^(n−k)   (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
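A MATLAB sketch of the computation, summing the binomial probabilities of 0, 1, or 2 bit errors:

n = 100;  p = 0.01;  k = 0:2;
Pok = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k));
Pfail = 1 - Pok       % ~0.0794, matching the footnote answer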
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• the Poisson Law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if trial i is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
            = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
bull I will give you a Q-function table to use with each exam
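As a sanity check on the continuity-corrected approximation above, here is a MATLAB sketch comparing it with the exact binomial sum (the values n = 100, p = 0.5, α = 45, β = 55 are my own choices):

n = 100;  p = 0.5;  q = 1 - p;  a = 45;  b = 55;
k = a:b;
Pexact = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* q.^(n-k));
Qf = @(x) 0.5*erfc(x/sqrt(2));        % Q-function in terms of erfc
Papprox = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[Pexact Papprox]                      % both ~0.7287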
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^(m−1) p, m = 1, 2, ...
Also, P(# trials > k) = (1 − p)^k.
EX A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
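A sketch of the answer using the geometric law just derived (the success probability per transmission is p = 0.9, since the packet error probability is 0.1):

p = 0.9;
P_exactly2  = (1-p)^(2-1) * p     % = 0.09
P_morethan2 = (1-p)^2             % = 0.01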
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 calls per minute is a constant. The maximum possible number of calls/hour is now 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:
(n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, ...; 0 otherwise
EX Phone calls arriving at a switching office: If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
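For the switching-office example above, λ = 90 calls/hr and t = 10 min = 1/6 hr, so α = λt = 15; a one-line MATLAB check of the Poisson probability:

alpha = 90/6;  k = 20;
P20 = exp(-alpha) * alpha^k / factorial(k)    % ~0.0418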
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, ie, expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
Ans:
FZ(z) = 0 for z ≤ 0; z²/(2b²) for 0 < z ≤ b; [b² − (2b − z)²/2]/b² for b < z < 2b; 1 for z ≥ 2b
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
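Here is a Monte Carlo sketch in MATLAB that checks the expression for FZ(z) (the values b = 1 and z = 1.5 are my own choices):

b = 1;  N = 1e6;
Z = b*rand(1, N) + b*rand(1, N);         % sum of the two dart coordinates
z = 1.5;                                  % a test point in (b, 2b)
Fhat   = mean(Z <= z)                     % empirical CDF, ~0.875
Fexact = (b^2 - (2*b - z)^2/2)/b^2        % from the answer above: 0.875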
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is FX(x) = P[X ≤ x].
• FX(x) is a prob measure.
• Properties of FX:
1 0 ≤ FX(x) ≤ 1
L6-4
2 FX(−∞) = 0 and FX(∞) = 1
3 FX(x) is monotonically nondecreasing, ie, a ≤ b ⇒ FX(a) ≤ FX(b)
4 P(a < X ≤ b) = FX(b) − FX(a)
5 FX(x) is continuous on the right, ie, FX(b) = lim_{h→0⁺} FX(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z.
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).
• Properties:
1 fX(x) ≥ 0, −∞ < x < ∞
2 FX(x) = ∫_{−∞}^{x} fX(t) dt
3 ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4 P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5 If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0 for x < a; 1/(b − a) for a ≤ x ≤ b; 0 for x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX Roll a fair 6-sided die; X = # on top face.
P(X = x) = 1/6 for x = 1, 2, ..., 6; 0 otherwise
EX Flip a fair coin until heads occurs; X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, ...; 0 otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success".
• A Bernoulli RV X is defined by
X = 1 if s ∈ A; X = 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p for x = 1; 1 − p for x = 0; 0 for x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = (n choose k) p^k (1 − p)^(n−k) for k = 0, 1, ..., n; 0 otherwise
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^(k−1) p for k = 1, 2, ...; 0 otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; FX(x) vs. x]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, ...; 0 otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in the example above.
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0 for x < 0; λ e^(−λx) for x ≥ 0
• Distribution function:
FX(x) = 0 for x < 0; 1 − e^(−λx) for x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)],
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
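MATLAB has no built-in Q function, but it is a one-liner via erfc; a sketch with example numbers of my own choosing (µ = 5, σ = 2, a = 4, b = 9):

Qf = @(x) 0.5*erfc(x/sqrt(2));      % Q(x) = 1 - Phi(x)
mu = 5;  sigma = 2;  a = 4;  b = 9;
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(a < X <= b) ~ 0.669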
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5 (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
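The part (b) answers follow from P[|X − µ| > kσ] = 2Q(k); a one-line MATLAB check:

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)     % ~[0.317  4.55e-2  2.70e-3  6.33e-5  5.7e-7]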
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
is a chi-square random variable with n degrees of freedom.
bull The chi-square random variable is usually classified as either central or non-central
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0; 0 for y < 0,
where Γ(p) is the gamma function, defined as:
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)! for p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = [1/(2σ²)] e^(−y/(2σ²)) for y ≥ 0; 0 for y < 0,
which is an exponential density.
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number n = 2m of degrees of freedom)
FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k for y ≥ 0; 0 for y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0; 0 for r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0; 0 for r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable, with pdf given by
fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)} for ℓ ≥ 0; 0 for ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X : Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P[{X ≤ x} ∩ A]/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = (d/dx) FX(x|A)
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time?
L8-9
2 the conditional distribution for your remaining wait time
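A sketch of the answer to part 1, using the memoryless property of the exponential RV (λ = 1, conditioning on the event {X > 2}):
FX(x|X > 2) = P[{X ≤ x} ∩ {X > 2}]/P(X > 2) = [FX(x) − FX(2)]/[1 − FX(2)] = 1 − e^(−(x−2)) for x > 2 (and 0 for x ≤ 2)
For part 2, the remaining wait R = X − 2 then has FR(r) = 1 − e^(−r), r ≥ 0; ie, it is again exponential with λ = 1.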
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, ..., An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
L1-5
What are some problems with defining probability in this way?
1 It requires equally likely outcomes; thus, it only applies to a small set of random phenomena.
2 It can only deal with a finite number of outcomes; this again limits its usefulness.
Probability as a Measure of Frequency of Occurrence
• Consider a random experiment that has K possible outcomes, K < ∞.
• Let Nk(n) = the number of times the outcome is k in n trials.
• Then we can tabulate Nk(n) for various values of k and n.
• We can reduce the dependence on n by dividing Nk(n) by n to find out "how often did k occur".
DEFN
The relative frequency of outcome k of a random experiment is
fk(n) = Nk(n)/n
• Observation: In our previous experiments, as n gets large, fk(n) converges to some constant value.
DEFN
An experiment possesses statistical regularity if
lim_{n→∞} fk(n) = pk (a constant) ∀k
DEFN
For experiments with statistical regularity as defined above, pk is called the probability of outcome k.
PROPERTIES OF RELATIVE FREQUENCY
• Note that
0 ≤ Nk(n) ≤ n ∀k,
because Nk(n) is just the # of times outcome k occurs in n trials.
Dividing by n yields
0 ≤ Nk(n)/n = fk(n) ≤ 1, ∀k = 1, ..., K   (1)
L1-6
• If 1, 2, ..., K are all of the possible outcomes, then
Σ_{k=1}^{K} Nk(n) = n
Again, dividing by n yields
Σ_{k=1}^{K} fk(n) = 1   (2)
Consider the event E that an even number occurs.
What can we say about the number of times E is observed in n trials?
NE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation?
Then, dividing by n:
fE(n) = NE(n)/n = [N2(n) + N4(n) + N6(n)]/n = f2(n) + f4(n) + f6(n)
• General property: If A and B are 2 events that cannot occur simultaneously, and C is the event that either A or B occurs, then
fC(n) = fA(n) + fB(n)   (3)
• In some sense, we can define the probability of an event as its long-term relative frequency.
What are some problems with this definition?
1 It is not clear when and in what sense the limit exists.
2 It is not possible to perform an experiment an infinite number of times, so the probabilities can never be known exactly.
3 We cannot use this definition if the experiment cannot be repeated.
We need a mathematical model of probability that is not based on a particular application or interpretation. However, any such model should:
1 be useful for solving real problems,
2 agree with our interpretation of probability as relative frequency, and
3 agree with our intuition (where appropriate).
L2a-1
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events, which are sets of outcomes. Therefore it is necessary to understand, and be able to apply, all the "tools" (operators) and concepts of sets.
Read the following material and Sections 1.4 and 1.5 in the textbook. Then watch the online Lecture 2a in preparation for Lecture 2.
Try to solve the following problem (S Ross, A First Course in Probability, 7th ed, Ch 2, problem 12):
EX
An elementary school is offering 3 language classes: one in Spanish, one in French, and one in German. These classes are open to any of the 100 students in the school. There are 28 students in the Spanish class, 26 in the French class, and 16 in the German class. There are 12 students that are in both Spanish and French, 4 that are in both Spanish and German, and 6 that are in both French and German. In addition, there are 2 students taking all 3 classes.
1 If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2 If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
To solve the above problem, let S, F, and G denote the events that a student is taking Spanish, French, and German, respectively. Start by expressing the desired composite event in terms of unions, intersections, and complements of S, F, and G. Then evaluate the number of items in the set. The probability of the compound event is the number of items in the set divided by the total population (100).
Hint: It may help to draw a Venn diagram.
• We want to determine probabilities of events (combinations of outcomes).
• Each event is a set ⇒ we need operations on sets.
1 A ⊂ B (read "A is a subset of B")
DEFN
The subset operator ⊂ is defined for two sets A and B by
A ⊂ B if x ∈ A ⇒ x ∈ B
L2a-2
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).
2 A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
3 A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}
4 Ac or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
Ac = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1 (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2 (Ac)c = A
3 A ⊂ B ⇒ Bc ⊂ Ac
4 A ∪ Ac = Ω
5 (A ∩ B)c = Ac ∪ Bc (DeMorgan's Law 1)
6 (A ∪ B)c = Ac ∩ Bc (DeMorgan's Law 2)
L2a-3
• One of the most important relations for sets is when sets are mutually exclusive.
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
This is very important, so I will emphasize it again: mutually exclusive is a set relationship.
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face.
2 EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
• The set of all events is denoted F or A.
EX Flipping a coin
The possible events are:
– Heads occurs
– Tails occurs
– Heads or Tails occurs
– Neither Heads nor Tails occurs
• Thus the event class can be written as F = {∅, {H}, {T}, {H, T}}.
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
• EX Express the events listed in the example above in set notation:
1 {3}
2 {2, 4, 6}
3 ∅
4 Ω = {1, 2, 3, 4, 5, 6}
EX The 3 language problem
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
L2a-7
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times, and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, ..., TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai
Let A2 = exactly 2 heads occurs in 3 tosses. Then
P(A2) = |A2|/|Ω3| = |{HHT, HTH, THH}|/8 = 3/8
Note that for equally likely outcomes in general, if an event E consists of K outcomes (ie, E = {o1, o2, ..., oK} and |E| = K), then P(E) = K/|Ω|.
L2a-8
B Suppose a coin is tossed 3 times, but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai
Then the probability of exactly 2 heads would be 1/4, which disagrees with the 3/8 computed above: the outcomes in this sample space are not equally likely.
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = (b − a)/(B − A)
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
L2a-9
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.
(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
L2-1
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S Ross, A First Course in Probability, 7th ed, Ch 2, prob 11), using the Corollaries to the Axioms of Probability where appropriate:
EX A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
A set I is an interval if, whenever a and b > a are in I, every x with a ≤ x ≤ b is also in I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
L2-3
bull Intervals can also be either finite infinite or partially infinite
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN
Events are subsets of outcomes to which we assign probability.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome and O = odd outcome. List all events.
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads and T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads. Let O = the event that an odd number of heads occurs. Express O as a set of outcomes.
2
DEFN
F is the event class, which is a σ-algebra of subsets of Ω to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (ie, probability) to some subsets of Ω; therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Ec ∈ M
L2-5
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, ... ∈ M,
∪_{i=1}^{∞} Ei ∈ M   (5)
• Note that by property (c) of fields, (5) can be interchanged with
∩_{i=1}^{∞} Ei ∈ M   (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, ... are pairwise mutually exclusive, then P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. The following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(Ac) = 1 − P(A)
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, ..., An are pairwise mutually exclusive, then
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An)
(Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, ....) Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S Ross, A First Course in Probability, 7th ed, Ch 3, prob 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1 Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross, A First Course in Probability, 7th ed, Ch 3, prob 60). The answers are at the bottom of the page.
EX
2 The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if
P(A ∩ B) = P(A)P(B)
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probability relationship.
What type of relation is mutually exclusive? A set relationship.
Claim: If A and B are si events, then the following pairs of events are also si:
– A and Bc
– Ac and B
– Ac and Bc
1 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32; 2. (a) 2/3 (b) 1/3 (c) 3/4
L3-3
Proof: It is sufficient to consider the first case, ie, show that if A and B are si events, then A and Bc are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ Bc), and (A ∩ B) ∩ (A ∩ Bc) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ Bc)]
     = P(A ∩ B) + P(A ∩ Bc)
     = P(A)P(B) + P(A ∩ Bc)
Thus
P(A ∩ Bc) = P(A) − P(A)P(B)
          = P(A)[1 − P(B)]
          = P(A)P(Bc)
• For N events to be si, we require
P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
⋮
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, ..., AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀i ≠ j.
L3-4
Examples of Applying Statistical Independence
EX For a couple having 5 children, compute the probabilities of the following events (a worked sketch follows the list):
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
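A sketch of the computations, using independence of the children's sexes and P(boy) = P(girl) = 1/2 (compare with the footnote answers):
1 P(all same sex) = P(all boys) + P(all girls) = (1/2)^5 + (1/2)^5 = 1/16
2 P(3 eldest are boys, others are girls) = (1/2)^5 = 1/32
3 P(2 oldest are girls) = (1/2)^2 = 1/4
4 P(at least one girl) = 1 − P(all boys) = 1 − (1/2)^5 = 31/32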
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relationship.
– Statistical independence is a probability relationship.
• How does mutual exclusiveness relate to statistical independence?
1 me ⇒ si
2 si ⇒ me
3 two events cannot be both si and me, except in some degenerate cases
4 none of the above
• Perhaps surprisingly, 3 is true: if A ∩ B = ∅ and A and B are si, then P(A)P(B) = P(A ∩ B) = 0, so at least one of the events must have probability zero.
Consider the following:
L3-6
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer. Then
Ω = {AD, AN, BD1, BD2, BN}
Let
• EA is the event that the selected computer is from manufacturer A;
• EB is the event that the selected computer is from manufacturer B;
• ED is the event that the selected computer is defective.
Find:
P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD1, BD2, BN}
Find:
– P(ED|EA) = 1/2
– P(ED|EB) = 2/3
L3-7
– P(EA|ED) = 1/3
– P(EB|ED) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: overlapping events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B as the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1 P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2 Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B), and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3 If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B] = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B] = P(A|B) + P(C|B)
EX Check the previous example: Ω = {AD, AN, BD1, BD2, BN}
P(ED|EA) = P(ED ∩ EA)/P(EA) = (1/5)/(2/5) = 1/2
P(ED|EB) = (2/5)/(3/5) = 2/3
P(EA|ED) = (1/5)/(3/5) = 1/3
P(EB|ED) = (2/5)/(3/5) = 2/3
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S Ross, A First Course in Probability, 7th ed, Ch 3, prob 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
2
Ex Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
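A sketch of the answer: given W1, one white ball has been removed, so 4 balls remain, of which 2 are white:
P(W2|W1) = 2/4 = 1/2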
2 (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = \frac{P(A \cap B)}{P(B)} ⇒ P(A \cap B) = P(A|B)P(B)   (7)
and P(B|A) = \frac{P(A \cap B)}{P(A)} ⇒ P(A \cap B) = P(B|A)P(A)   (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex Intersection of 3 events
P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\, P(B|C)\, P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with probability P(A0) = 2/5
• A1 is sent with probability P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: crossover probabilities P(B1|A0) = 1/8 and P(B0|A1) = 1/6, so P(B0|A0) = 7/8 and P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules. (A sketch of the computation appears below.)
L4-6
L4-7
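The worked pages are omitted here, but a minimal MATLAB sketch of the computation is below. It assumes the transition probabilities read off the diagram as P(B1|A0) = 1/8 and P(B0|A1) = 1/6 (that mapping of p01 and p10 onto the diagram is my assumption):

% Sketch: error probabilities of the two decision rules (assumed mapping:
% p01 = P(B1|A0) = 1/8, p10 = P(B0|A1) = 1/6)
pA0 = 2/5;  pA1 = 3/5;
p01 = 1/8;  p10 = 1/6;
Pe_rule1 = pA0                    % always decide A1: error iff A0 was sent (0.4)
Pe_rule2 = p01*pA0 + p10*pA1      % decide Ai when Bi is received (0.15)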
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
where the last step follows from the Law of Total Probability.
The expression in the box is
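For reference, the boxed expression being developed here is Bayes' rule, with the denominator expanded by the Law of Total Probability:

P(A_i \mid B_j) = \frac{P(B_j \mid A_i)\, P(A_i)}{\sum_k P(B_j \mid A_k)\, P(A_k)}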
EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Thus n1 = n2 = ⋯ = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is
2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n,
n! \approx \sqrt{2\pi}\, n^{n + 1/2} e^{-n}
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
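As a quick sanity check of this technique and of the Stirling formula above, one might try (a sketch):

n = 10;
factorial(n)                          % 3628800
gamma(n+1)                            % same value via the gamma function
sqrt(2*pi) * n^(n+0.5) * exp(-n)      % Stirling approximation, about 3.599e6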
L5a-3
4. Sampling without Replacement & without Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: “How many subsets of size k (called a combination of size k) are there from a set of size n?”
First determine the # of ordered subsets.
Now, how many different orders are there for any set of k objects?
Let Cnk = the number of combinations of size k from a set of n objects:
Cnk = (# ordered subsets) / (# orderings)
=
DEFN:
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient
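For reference, the standard form of the multinomial coefficient (with k1 + k2 + ⋯ + kJ = n) is:

\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}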
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
\frac{72!}{(3!)^{24}\, 24!} \approx 2.1 \times 10^{61}
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the probability of this occurring?
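A sketch of the answers to these blanks: by the multinomial coefficient there are 12!/(2!)^6 orderings with each number appearing exactly twice, out of 6^12 equally likely outcomes, so

P = \frac{12!/(2!)^6}{6^{12}} = \frac{7\,484\,400}{2\,176\,782\,336} \approx 0.0034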
5. Sampling with Replacement and without Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
\binom{k + n - 1}{k}
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{-2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the “Ask Marilyn” column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch appears after the list):
1. Never switch
4. 0.0794
L5-2
2. Always switch
3. Flip a fair coin and switch if it comes up heads
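A minimal MATLAB simulation sketch of the game (variable names mine) suggests the answers the analysis should give: never switching wins about 1/3 of the time, always switching about 2/3, and the coin-flip strategy averages the two (about 1/2):

% Sketch: simulate the Monty Hall game (host always opens a goat door)
N = 1e5; winStay = 0; winSwitch = 0;
for n = 1:N
    car  = ceil(3*rand);              % door hiding the car
    pick = ceil(3*rand);              % contestant's first pick
    winStay   = winStay   + (pick == car);   % never switch
    winSwitch = winSwitch + (pick ~= car);   % switch wins iff first pick was a goat
end
P_stay   = winStay/N                  % about 1/3
P_switch = winSwitch/N                % about 2/3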
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability Space for N Sequential Experiments
1. The combined sample space: Ω = the ______ of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.
2. Event class F: σ-algebra on Ω
3. P(·) on F such that
P(A) =
For s.i. subexperiments,
P(A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then, for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
pn(k) =    (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{-2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
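A minimal MATLAB sketch of the computation (base MATLAB only):

% Sketch: P(3 or more errors) = 1 - P(0) - P(1) - P(2) for Binomial(100, 0.01)
n = 100; p = 0.01; k = 0:2;
Pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
P_fail = 1 - sum(Pk)                  % about 0.0794, matching the footnote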
L5-4
• When n is large, the probability in (11) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}    (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success (and Xi = 0 otherwise).
– Let S_n = \sum_{i=1}^{n} X_i
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right]    (13)
and distribution function
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt    (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 - \Phi(x)    (15)
     = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt    (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right)    (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right]
= Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) =
Also, P(# trials > k) =
EX:
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
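A sketch of the answers, assuming transmissions fail independently with probability 0.1 (so the number of transmissions is geometric with success probability p = 0.9):

P(\text{exactly 2}) = (0.1)(0.9) = 0.09, \qquad P(\text{more than 2}) = (0.1)^2 = 0.01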
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 calls/minute is a constant. The maximum possible number of calls/hour is then 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
\binom{n}{k} p^k (1-p)^{n-k}
• If we let np = α be a constant, then for large n,
\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}
• Then
\lim_{n \to \infty} \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adopted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
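A sketch of the computation: here α = λt = (90 calls/hr)(1/6 hr) = 15, so in MATLAB:

alpha = 90/6;  k = 20;                     % alpha = lambda*t = 15
P = alpha^k * exp(-alpha) / factorial(k)   % about 0.0418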
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - (2b - z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a ______ from ______ to ______.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN:
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, b]}.
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN:
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the ______ (______), denoted ______, is
• FX(x) is a probability measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf: omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]
L6-5
EX Examples
L6-6
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous) (A simulation sketch appears below.)
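A minimal MATLAB sketch (b = 1 assumed for the plot) that compares the empirical CDF of Z with the formula from item 3:

% Sketch: simulate the dart throw and check F_Z(z) empirically
b = 1; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);            % sum of the two coordinates
plot(sort(Z), (1:N)/N); hold on;          % empirical CDF
z = linspace(0, 2*b, 200);
F = (z.^2/(2*b^2)).*(z <= b) + (1 - (2*b - z).^2/(2*b^2)).*(z > b);
plot(z, F, '--'); hold off;               % formula from item 3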
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The ______ (______) fX(x) of a random variable X is the derivative (which may not exist at some places) of ______
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt
3. \int_{-\infty}^{+\infty} f_X(x)\, dx = 1
L7-3
4. P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx, \quad a \le b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad 0 < c < +\infty,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ______.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}
L7-4
DEFN
A ______ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a “success.”
• A Bernoulli RV X is defined by
X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p
So,
P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}
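A quick MATLAB sketch that checks this pmf against simulated relative frequencies (variable names mine):

% Sketch: empirical check of the Binomial(10, 0.5) pmf
n = 10; p = 0.5; N = 1e5; k = 0:n;
X = sum(rand(n, N) < p);                                  % N binomial samples
pmf = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
emp = histc(X, k) / N;                                    % relative frequencies
[pmf; emp]                                                % rows should agree closely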
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the example above.
2 Exponential RV
• Characterized by a single parameter, λ.
L7-10
• Density function:
f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}
• Distribution function:
F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}
• The book's definition substitutes λ = 1/μ.
bull Obtainable as limit of geometric RV
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial:
• DeMoivre–Laplace Theorem: If n is large (with npq not too small), then with q = 1 − p,
\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}
• Let σ² = npq and μ = np; then
P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}
L7-11
DEFN
A Gaussian random variable X has density function
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for the three (μ, σ²) pairs]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\!\left(\frac{x - \mu}{\sigma}\right)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a - \mu}{\sigma}\right) - Q\!\left(\frac{b - \mu}{\sigma}\right)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
5. (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
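The tail probabilities in the footnote follow from P[|X − μ| > kσ] = 2Q(k), and they can be checked in base MATLAB using the identity Q(x) = erfc(x/√2)/2 (a sketch):

Q = @(x) 0.5*erfc(x/sqrt(2));     % Q-function via the complementary error function
2*Q(1:5)                          % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07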
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then
Y = \sum_{i=1}^{n} X_i^2    (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
where Γ(p) is the gamma function, defined as
\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0
\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0
\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}
• Note that for n = 2,
f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with n = 2m degrees of freedom, n even) can be found through repeated integration by parts to be
F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of 2 independent zero-mean Gaussian RVs with common variance.
• Let R = \sqrt{Y}.
• Then R is a Rayleigh RV with pdf
f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(\ell) = \begin{cases} \frac{1}{\ell \sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) =
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) =
• Then FX(x|A) is a distribution function and fX(x|A) is a density function.
EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time? (A sketch of the answers appears below.)
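A sketch of the answers, using the conditioning event A = {X > 2} and the exponential CDF F_X(x) = 1 − e^{−x}:

F_X(x \mid A) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2,

and 0 for x ≤ 2. The remaining wait time R = X − 2 then has F_R(r) = 1 − e^{−r}, r ≥ 0: the same exponential law, illustrating the memoryless property.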
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)
and
f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\, P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work.
P(A \mid X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)
= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x \mid A) - F_X(x \mid A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)
= \frac{f_X(x \mid A)}{f_X(x)}\, P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A \mid X = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)
⇒ P(A \mid X = x)\, f_X(x) = f_X(x \mid A)\, P(A)
⇒ \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x \mid A)\, dx \; P(A)
⇒ P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\, dt}
Examples on board
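One numerical example (my own, for illustration): let X ~ Uniform(0, 1) be a random coin bias and A the event that a single flip of that coin shows heads, so P(A|X = x) = x. The continuous Law of Total Probability gives P(A) = ∫ x f_X(x) dx = 1/2, which can be checked in MATLAB:

% Sketch: numerical check of the continuous Law of Total Probability
x  = linspace(0, 1, 1001);
fX = ones(size(x));                 % Uniform(0,1) density
P_A = trapz(x, x .* fX)             % about 0.5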
L1-6
bull If 1 2 K are all of the possible outcomes thenKsum
k=1
Nk(n) = n
Again dividing by n yieldsKsum
k=1
fk(n) = 1 (2)
Consider the event E that an even number occurs
What can we say about the number of times E is observed in n trialsNE(n) = N2(n) + N4(n) + N6(n)
What have we assumed in developing this equation
Then dividing by n
fE(n) =NE(n)
n=
N2(n) + N4(n) + N6(n)
n= f2(n) + f4(n) + f6(n)
bull General propertyIf A and B are 2 events that cannot occur simultaneously and C is theevent that either A or B occurs then
fC(n) = fA(n) + fB(n) (3)
bull In some sense we can define the probability of an event as its long-term relative frequency
What are some problems with this definition
1 It is not clear when and in what sense the limit exists2 It is not possible to perform an experiment an infinite number of times so the probabilities
can never be known exactly3 We cannot use this definition if the experiment cannot be repeated
We need a mathematical model of probability that is not based on a particularapplication or interpretationHowever any such model should
1 be useful for solving real problems
2 agree with our interpretation of probability as relative frequency
3 agree with our intuition (where appropriate)
L2a-1
EEL 5544 Noise in Linear Systems Lecture 2a
SET OPERATIONS
Motivation
We are interested in the probability of events which are sets of outcomes Therefore it isnecessary to be able to understand and be able to apply all the ldquotoolsrdquo (operators) and conceptsof sets
Read the following material and Sections 14 and 15 in the textbook Then watch the onlineLecture 2a in preparation for Lecture 2
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2problem 12)
EX
An elementary school is offering 3 language classes one in Spanish one inFrench and one in German These classes are open to any of the 100 studentsin the school There are 28 students in the Spanish class 26 in the French classand 16 in the German class There are 12 students that are in both Spanish andFrench 4 that are in both Spanish and German and 6 that are in both Frenchand German In addition there are 2 students taking all 3 classes
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
To solve the above problem let S F and G denote the events that a student is taking SpanishFrench and German respectively Start by expressing the desired composite event in terms ofunions intersections and complements of S F and G Then evaluate the number of items inthe set The probability of the compound event is the number of items in the set divided by thetotal population (100)
Hint It may help to draw a Venn diagram
bull Want to determine probabilities of events (combinations of outcomes)
bull Each event is a set rArr need operations on sets
1 A sub B (read ldquoA is a subset of Brdquo)
DEFN
The subset operator sub is defined for two sets A and B by
A sub B if x isin A rArr x isin B
L2a-2
Note that A = B is included in A sub B
A = B if and only if A sub B and B sub A(useful for proofs)
2 A cupB (read ldquoA union Brdquo or ldquoA or Brdquo)
DEFN
The union of A and B is a set defined by
A cupB = x |x isin A or x isin B
3 A capB (read ldquoA intersect Brdquo or ldquoA and Brdquo)
DEFN
The intersection of A and B is defined by
A capB = AB = x |x isin A and x isin B
4 Ac or A (read ldquoA complementrdquo)
DEFN
The complement of a set A in a sample space Ω is defined by
Ac = x |x isin Ω and x isin A
bull Use Venn diagrams to convince yourself of
1 (A capB) sub A and (A capB) sub B
2(A)
= A
3 A sub B rArr B sub A
4 A cup A = Ω
5 A capB = A cupB (DeMorganrsquos Law 1)
6 A cupB = A capB (DeMorganrsquos Law 2)
L2a-3
bull One of the most important relations for sets is when sets are mutually exclusive
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if
This is very important so I will emphasize it again mutually exclusive isa
SAMPLE SPACES
bull Given the following experiment descriptions write a mathematical expression for the samplespace Compare and contrast the experiments and sample spaces
1 EX Roll a fair 6-sided die and note the number on the top face
2 EXRoll a fair 6-sided die and determine whether the number on the topface is even or odd
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
bull The set of all events is the denoted F or A
EX Flipping a coin
The possible events are
ndash Heads occurs
ndash Tails occurs
ndash Heads or Tails occurs
ndash Neither Heads nor Tails occurs
bull Thus the event class can be written as
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
bull EX Express the events listed in the example above in set notation
1
2
3
4
EX The 3 language problem
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
L2a-7
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times and the sequence of H and T is notedΩ3 = HHH HHT HTH TTTAssuming equally likely outcomes
P (ai) =1
|Ω3|=
1
8for any outcome ai
Let A2 = exactly 2 heads occurs in 3 tossesThen
Note that for equally likely outcomes in general if an event E consists of K outcomes (ieE = o1 o2 oK and |E| = K) then
L2a-8
B Suppose a coin is tossed 3 times but only the number of heads is notedΩ = 0 1 2 3Assuming equally likely outcomes
P (ai) =1
|Ω|=
1
4for any outcome ai
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
bull Cannot directly apply our notion of equally-likely outcomes to continuous sample spacesbecause any continuous sample space has an infinite number of outcomes
bull Thus if each outcome has some positive probability of occurring p gt 0 the total probabilitywould be infinp = infin
bull Our previous work on equally likely outcomes indicates that the probability of each outcomeshould be 1Ω = 0 but this seems to suggest that the probability of every event is 0
bull In fact assigning 0 probability to the outcomes does NOT necessarily imply that the probabilityof every event is 0
bull Letrsquos use our previous approach of using cardinality to define the probability of an eventConsider the continuous sets which are simplest to measure cardinality on the intervals
DEFN
For a sample space Ω that is an interval Ω = [A B] a probability measure Phas equally likely outcomes on Ω if for any a and b such that A le a le b le B
P ([a b]) =
bull Note that P (c) = |[cc]||Ω| = 0 for any c isin [A B]
L2a-9
bull For a typical continuous sample space the prob of any particular outcome is zero
Interpretation Suppose we choose random numbers in [A B] until we see all the outcomesin that range at least once
How many trials will it take infin
bull Seeing a number a finite number of times in an infinite number of trialsrArr Relative freq = 0
(Does not mean that outcome does not ever occur only that it occurs very infrequently)
L2-1
EEL 5544 Lecture 2Motivation
Read Sections 15 in the textbook
The Axioms of Probability and their Corollaries are important tools that simplify solvingprobability problems Anyone working with probability should learn these by heart and knowwhen to apply them
Try to solve the following problem (a modified version of S Ross A First Course in Probability7th ed Ch 2 prob 11) using the Corollaries to the Axioms of Probability where appropriate
EXA total of 28 percent of American males smoke cigarettes 7 percent smokecigars and 5 percent smoke both cigars and cigarettes
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space denoted by Ω is the of
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties discrete or continuous
DEFN
A discrete set is either or
DEFN
A set is countably infinite if it can be put into one-to-one correspondence withthe integers
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b gt a are in an interval I then if a le x le b x isin I
bull Intervals can be either open closed or half-open
ndash A closed interval [a b] contains the endpoints a and bndash An open interval (a b) does not contain the endpoints a and bndash An interval can be half-open such as (a b] which does not contain a or [a b)
which does not contain b
L2-3
bull Intervals can also be either finite infinite or partially infinite
EX
Example of continuous sets
bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like
ndash What is the probability that the result is evenndash What is the probability that the result is le 2
DEFNEvents are to which we assign
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes
(d) EX
Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes
2
DEFN
F is the event class which is ato which we assign probability
bull If Ω is discrete then F can be taken to be the of Ω (the set of everysubset of Ω)
bull If Ω is continuous then weird things can happen if we consider every subset of Ω Forinstance if Ω itself has measure (probability) 1 we can construct a new set Ωprime thatconsists only of subsets of Ω that has measure (probability) 2
bull The solution is to not assign a measure (ie probability) to some subsets of Ω andtherefore these things cannot be events
DEFN
A collection of subsets of Ω forms a field M under the binaryoperations cup and cap if(a) empty isin M and Ω isinM(b) If EisinM and F isinM then E cup F isinM and E cap F isinM(c) If E isinM then Ec isinM
L2-5
DEFN
A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1
Ei isinM (5)
bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1
Ei isinM (6)
bull We require that the event class F be a σ-algebra
bull For discrete sets the set of all subset of Ω is a σ-algebra
bull For continuous sets there may be more than one possible σ-algebra possible
bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R
3
DEFN
The probability measure denoted by P is a that maps allmembers of onto
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I
II
III
Corollaries
bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F
(a) P (Ac) = 1minus P (A)
L2-6
(b) P (A) le 1
(c) P (empty) = 0
(d) If A1 A2 An are pairwise mutually exclusive then
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Ak)
Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)
Proof and example on board(f)
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Aj)minussumjltk
P (Aj cap Ak) + middot middot middot
+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)
Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction
(g) If A sub B then P (A) le P (B)Proof on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms:
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)
EX: Check the previous example, Ω = {AD, AN, BD, BD, BN}:
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear below.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
(When A and B are si, knowledge of B does not change the prob. of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN: A collection of sets A1, A2, … partitions the sample space Ω if and only if
{Ai} is also said to be a partition of Ω.
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob. P(A0) = 2/5.
• A1 is sent with prob. P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
EX: Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
and
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
where the last step follows from the Law of Total Probability.
The expression in the box is
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
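Before working the example, here is a minimal MATLAB sketch (my own illustration, not part of the original notes) that carries out the computation under the values assumed above, P(A0) = 2/5, P(B1|A0) = 1/8, and P(B0|A1) = 1/6:

PA   = [2/5, 3/5];                    % priors P(A0), P(A1)
PBgA = [7/8, 1/8;                     % row i holds P(B0|Ai), P(B1|Ai)
        1/6, 5/6];
PB    = PA * PBgA;                    % total probability: [P(B0), P(B1)]
joint = PBgA .* repmat(PA.', 1, 2);   % joint probabilities P(Ai, Bj)
PAgB  = joint ./ repmat(PB, 2, 1);    % posteriors P(Ai|Bj) via Bayes
Pe    = 1 - sum(PB .* max(PAgB))      % MAP error probability (3/20 here)

The repmat calls avoid relying on newer implicit-expansion behavior, so the sketch runs on older MATLAB versions as well.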
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear below.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Thus n1 = n2 = ⋯ = nk = |A| ≡ n
⇒ number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n(n−1) = 2, nn = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
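As a quick illustration (a sketch of mine; the value n = 52 is arbitrary), compare the exact factorial with the approximation above:

n = 52;
exact    = gamma(n+1);                        % 52! ~ 8.07e67
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);  % the formula above
relerr   = (exact - stirling)/exact           % ~ 0.0016, about 0.16%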
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets.
Now, how many different orders are there for any set of k objects?
Let Cn,k = the number of combinations of size k from a set of n objects.
Cn,k = (# ordered subsets)/(# orderings)
     =
DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85.
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 · 24!] ≈ 2.1 × 10^61.
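Both counts can be verified directly in MATLAB (a sketch of mine; 72! ≈ 6.1 × 10^103 is still representable in double precision, so no special handling is needed):

ordered   = factorial(72) / factorial(3)^24   % different projects: ~1.3e85
unordered = ordered / factorial(24)           % same project: ~2.1e61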
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob. of this occurring?
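One way to check the two blanks numerically (my own sketch, not the official solution): count the orderings of the multiset 1,1,2,2,…,6,6 and divide by the 6^12 equally likely outcomes.

ways = factorial(12) / factorial(2)^6   % orderings of 1,1,2,2,...,6,6
p    = ways / 6^12                      % ~ 3.44e-3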
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
(k + n − 1 choose k)
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Answer: 0.0794.)
EX: By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
2. Always switch
3. Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments
1. The combined sample space:
Ω = the ____ of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.
2. Event class F: σ-algebra on Ω
3. P(·) on F such that
P(A) =
For si subexperiments:
P(A) =
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
pn(k) =    (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
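A sketch of the packet computation in MATLAB (mine; it sums the binomial pmf over k = 0, 1, 2 errors and takes the complement):

p = 0.01; n = 100;
Pok   = sum(arrayfun(@(k) nchoosek(n,k) * p^k * (1-p)^(n-k), 0:2));
Pfail = 1 - Pok                        % ~ 0.0794
% with the Statistics Toolbox, equivalently: 1 - binocdf(2, n, p)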
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success (and Xi = 0 otherwise).
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = [1/√(2π)] exp[−x²/2]   (13)
and distribution function
Φ(x) = [1/√(2π)] ∫_{−∞}^{x} exp[−t²/2] dt   (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = [1/√(2π)] ∫_{x}^{∞} exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ [1/√(npq)] φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) =
Also, P(# trials > k) =
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
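Under the assumption that each transmission is delivered successfully with probability p = 0.9, independently, a sketch of the computation:

p = 0.9; q = 1 - p;          % success/failure prob. per transmission
P2     = q^(2-1) * p         % exactly 2 transmissions: 0.09
Pmore2 = q^2                 % more than 2 transmissions: 0.01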
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
(n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α), k = 0, 1, …
           0, o.w.
EX: Phone calls arriving at a switching office: If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
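A quick numeric check (my sketch): with λ = 90 calls/hr and t = 10 min, α = λt = 15, and

alpha = 90 * (10/60);                          % expected calls in 10 min
P20 = alpha^20 * exp(-alpha) / factorial(20)   % ~ 0.0418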
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
F_Z(z) = 0,                     z ≤ 0
         z²/(2b²),              0 < z ≤ b
         [b² − (2b − z)²/2]/b², b < z < 2b
         1,                     z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
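The CDF in part 3 is easy to check by Monte Carlo simulation (a sketch of mine, with b = 1 and the test point z = 0.8 assumed):

b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);     % sum of the two dart coordinates
z = 0.8;
[mean(Z <= z), z^2/(2*b^2)]        % empirical vs. formula: both ~ 0.32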
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a ____ from ____ to ____.
EX:
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the __________ (___), denoted ____, is
• F_X(x) is a prob. measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b).
(The value at a jump discontinuity is the value after the jump.)
Pf: omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
EX: Examples
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to show how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION
DEFN: The __________ (___) f_X(x) of a random variable X is the derivative (which may not exist at some places) of
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf: omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
f_X(x) = 0,         x < a
         1/(b − a), a ≤ x ≤ b
         0,         x > b
DEFN: A __________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, …, 6
           0, o.w.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, …
           0, o.w.
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1, s ∈ A
    0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p,     x = 1
           1 − p, x = 0
           0,     x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = (n choose k) p^k (1 − p)^(n−k), k = 0, 1, …, n
           0, o.w.
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: pmfs of Binomial(10, 0.2) and Binomial(10, 0.5); x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: CDFs of Binomial(10, 0.2) and Binomial(10, 0.5); x vs. F_X(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: pmf of Binomial(100, 0.2)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^(k−1) p, k = 1, 2, …
           0, o.w.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: pmfs of Geometric(0.1) and Geometric(0.7); x vs. P(X = x)]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: CDFs of Geometric(0.1) and Geometric(0.7); x vs. F_X(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α), k = 0, 1, …
           0, o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: pmfs of Poisson(0.75) and Poisson(3); x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: CDFs of Poisson(0.75) and Poisson(3); x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: pmf of Poisson(20)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the example above.
2. Exponential RV
• Characterized by a single parameter, λ.
• Density function:
f_X(x) = 0,        x < 0
         λe^(−λx), x ≥ 0
• Distribution function:
F_X(x) = 0,            x < 0
         1 − e^(−λx),  x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (and p is not too close to 0 or 1), then with q = 1 − p,
(n choose k) p^k q^(n−k) ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                     ≈ [1/√(2πσ²)] exp{−(k − µ)²/(2σ²)}
DEFN: A Gaussian random variable X has density function
f_X(x) = [1/√(2πσ²)] exp{−(x − µ)²/(2σ²)}
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − µ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for the three parameter pairs above]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} [1/√(2π)] exp{−t²/2} dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear below.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
         0, y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
         0, y < 0
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found through repeated integration by parts; for an even number of degrees of freedom, n = 2m, it is
F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
         0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0
         0, r < 0
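This construction suggests a direct way to generate Rayleigh samples in MATLAB (a sketch of mine, with σ = 1 assumed):

sigma = 1; N = 1e6;
R = sigma * sqrt(randn(1,N).^2 + randn(1,N).^2);  % root-sum-square of two si Gaussians
hist(R, 100)    % histogram shape matches (r/sigma^2) exp(-r^2/(2 sigma^2))
mean(R)         % ~ sigma*sqrt(pi/2) = 1.2533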
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^(−r²/(2σ²)), r ≥ 0
         0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable, with pdf given by
f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0
         0, ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) =
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) =
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
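A simulation hint (my own sketch, reusing the -log(rand) transformation from the Lecture 7 exercise); it illustrates the memoryless property that parts 1 and 2 uncover:

X     = -log(rand(1,1e6));   % Exp(lambda = 1) wait times
resid = X(X > 2) - 2;        % remaining wait, given two minutes already elapsed
mean(resid)                  % ~ 1, the mean of a fresh Exp(1) wait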
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + ⋯ + F_X(x|An)P(An)
and
f_X(x) = Σ_{i=1}^{n} f_X(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA)
(meaning the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) =
L3-7
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram: [figure: events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram: [figure: B becomes the new sample space, and only the region A ∩ B of A remains]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B) / P(B),  for P(B) > 0.
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1.
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0.
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B].
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B).
EX: Check the previous example, Ω = {AD, AN, BD, BD, BN}:
P(ED|EA) = P(ED ∩ EA)/P(EA) = (1/5)/(2/5) = 1/2
P(ED|EB) = P(ED ∩ EB)/P(EB) = (2/5)/(3/5) = 2/3
P(EA|ED) = P(EA ∩ ED)/P(ED) = (1/5)/(3/5) = 1/3
P(EB|ED) = P(EB ∩ ED)/P(ED) = (2/5)/(3/5) = 2/3
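These conditional probabilities can also be checked by brute-force counting in MATLAB (a sketch; the 0/1 encoding of the five equally likely outcomes is mine):

comp = [1 1; 1 0; 0 1; 0 1; 0 0];   % rows = {AD, AN, BD, BD, BN}; cols = [from A, defective]
A = comp(:,1) == 1;  D = comp(:,2) == 1;
P_D_given_A = sum(A & D)/sum(A)      % = 1/2
P_A_given_D = sum(A & D)/sum(D)      % = 1/3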
L3-9
EX: The genetics (eye color) problem:
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement.
An urn contains
• two black balls, numbered 1 and 2,
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers: (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
bull The chain rule can be easily generalized to more than two events
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if
(i) the sets are pairwise disjoint: Ai ∩ Aj = ∅ for all i ≠ j, and
(ii) ⋃i Ai = Ω.
{Ai} is also said to be a partition of Ω.
L4-4
1. Venn Diagram
2. Expressing an arbitrary set using a partition of Ω: for any event B, B = ⋃i (B ∩ Ai), and the sets B ∩ Ai are pairwise disjoint.
Law of Total Probability
If {Ai} partitions Ω, then
P(B) = Σi P(B|Ai) P(Ai).
Example:
L4-5
EX: Binary Communication System
A transmitter (Tx) sends one of two symbols, A0 or A1:
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules (a numerical check follows).
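As a numerical check under the probabilities given above (a MATLAB sketch; the matrix layout and variable names are mine):

pA   = [2/5 3/5];            % priors P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];   % row i: [P(B0|Ai) P(B1|Ai)]
pe1 = pA(1);                                  % rule 1: always decide A1 -> error iff A0 sent
pe2 = PBgA(1,2)*pA(1) + PBgA(2,1)*pA(2);      % rule 2: error on the cross transitions
fprintf('Rule 1: %.3f, Rule 2: %.3f\n', pe1, pe2);   % 0.400 vs 0.150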
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A1 when A0 is transmitted plus the probability that we decide A0 when A1 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1)
     + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1)
     + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1).
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1).
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Bj|Ai)P(Ai) / P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6 (where pij ≜ P(Bj|Ai); worked numerically below).
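Working the numbers in MATLAB (a sketch; variable names are mine):

pA   = [2/5 3/5];                  % priors P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];         % row i: [P(B0|Ai) P(B1|Ai)]
pB   = pA * PBgA;                  % total probability: [P(B0) P(B1)] = [9/20 11/20]
PAgB = diag(pA)*PBgA ./ [pB; pB]   % P(Ai|Bj): column j holds [P(A0|Bj); P(A1|Bj)]
[~, imax] = max(PAgB);             % MAP rule: decide A0 on B0 (7/9 > 2/9), A1 on B1 (10/11 > 1/11)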
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
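Bayes' rule solves all three parts by updating the posterior after each flip; a MATLAB sketch (my variable names):

prior = [1/2 1/2];                    % [fair, two-headed], chosen at random
pH = [1/2 1];  pT = 1 - pH;           % P(heads|coin), P(tails|coin)
post1 = prior.*pH / (prior*pH');      % after one head:  [1/3 2/3]
post2 = post1.*pH / (post1*pH');      % after two heads: [1/5 4/5]
post3 = post2.*pT / (post2*pT');      % after a tail:    [1 0]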
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is
n1 · n2 ··· nk.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Thus n1 = n2 = ··· = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1)···(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_(n−1) = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1)···(2)(1)
= n!
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!.
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects. Then
C(n,k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!].
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ (with k1 + k2 + ··· + kJ = n) is given by the multinomial coefficient
n!/(k1! k2! ··· kJ!).
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85.
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 · 24!] ≈ 2.1 × 10^61.
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? 12!/(2!)^6 = 7,484,400
– What is the prob of this occurring? [12!/(2!)^6]/6^12 ≈ 3.4 × 10^−3
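A one-line MATLAB check of this computation (a sketch):

ways = factorial(12)/factorial(2)^6;    % multinomial coefficient 12!/(2!)^6 = 7484400
p = ways/6^12                           % ~3.44e-3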
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is C(k + n − 1, k).
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Answer: 0.0794)
EX: By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation check appears after the list):
1. Never switch
L5-2
2. Always switch
3. Flip a fair coin and switch if it comes up heads
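A Monte Carlo comparison of the three strategies (a MATLAB sketch; it uses the fact that, since the host always reveals a goat, switching wins exactly when the initial pick was wrong):

N = 1e5;
car  = randi(3, N, 1);                 % door hiding the car
pick = randi(3, N, 1);                 % contestant's initial pick
winStay   = mean(pick == car);         % never switch  -> ~1/3
winSwitch = mean(pick ~= car);         % always switch -> ~2/3
heads = rand(N,1) < 0.5;               % flip a fair coin; switch on heads
winCoin = mean((heads & (pick ~= car)) | (~heads & (pick == car)));   % ~1/2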
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments:
Ω = Ω1 × Ω2 × ··· × ΩN.
• Then outcomes in Ω are of the form (b1, b2, …, bN), where bi ∈ Ωi.
2. Event class F: σ-algebra on Ω.
3. P(·) on F such that, for product-form events A = B1 × B2 × ··· × BN with Bi an event of subexperiment i,
P(A) = P(B1 × B2 × ··· × BN).
For s.i. subexperiments,
P(A) = P1(B1) P2(B2) ··· PN(BN).
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is C(n,k).
• Thus, the probability of k successes on n trials is
p_n(k) = C(n,k) p^k (1 − p)^(n−k)   (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Worked numerically below.)
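A direct computation of this probability in base MATLAB (a sketch):

n = 100; p = 0.01; k = 0:2;
pk = arrayfun(@(kk) nchoosek(n,kk) * p^kk * (1-p)^(n-kk), k);
pFail = 1 - sum(pk)                    % ~0.0794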
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_(i=1)^n Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_(−∞)^x exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_x^∞ exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem (the DeMoivre–Laplace approximation), it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
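In MATLAB, Q(x) can be evaluated through the built-in erfc via Q(x) = erfc(x/√2)/2. The sketch below compares the continuity-corrected Gaussian approximation with the exact binomial sum for one illustrative case (the numbers n = 100, p = 1/2, α = 45, β = 55 are my own choices):

Qf = @(x) 0.5*erfc(x/sqrt(2));         % Q-function from the standard erfc
n = 100; p = 0.5; q = 1-p; a = 45; b = 55;
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
exact  = sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), a:b));
fprintf('approx %.4f, exact %.4f\n', approx, exact);   % ~0.729 vs ~0.728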
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^(m−1) p,  m = 1, 2, …
Also, P(# trials > k) = P(first k trials are all failures) = (1 − p)^k.
EX:
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so the expected number of calls per minute, np = 0.5, is a constant. The maximum possible number of calls/hour grows with n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
C(n,k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_(n→∞) (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, …, and 0 otherwise.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (See the sketch below.)
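A quick evaluation of this example (a MATLAB sketch): with λ = 1.5 calls/minute and t = 10 minutes, α = 15.

lambda = 90/60;                        % calls per minute
alpha  = lambda*10;                    % alpha = lambda*t = 15
P20 = alpha^20 * exp(-alpha) / factorial(20)   % ~0.042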
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
F_Z(z) = 0 for z ≤ 0;
       = z²/(2b²) for 0 < z ≤ b;
       = [b² − (2b − z)²/2]/b² for b < z < 2b;
       = 1 for z ≥ 2b.
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from the sample space Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (cdf) of X, denoted F_X(x), is
F_X(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}).
• F_X(x) is a prob measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_(h→0⁺) F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
F_X(x) − F_X(x⁻) = lim_(ε→0) P[x − ε < X ≤ x] = P[X = x].
L6-5
EX Examples
L6-6
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z). (A simulation check appears below.)
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
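The answer for F_Z(z) can be checked by simulation (a MATLAB sketch; b = 1 is an arbitrary choice of mine):

b = 1; N = 1e6;
Z = b*rand(N,1) + b*rand(N,1);         % sum of the two dart coordinates
z = linspace(0, 2*b, 200);
Femp = arrayfun(@(zz) mean(Z <= zz), z);
Fthy = (z.^2/(2*b^2)).*(z <= b) + ((b^2 - (2*b - z).^2/2)/b^2).*(z > b);
plot(z, Femp, z, Fthy, '--')           % the two curves should coincide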
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:
f_X(x) = dF_X(x)/dx.
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_(−∞)^x f_X(t) dt
3. ∫_(−∞)^(+∞) f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_a^b f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_(−∞)^(+∞) g(x) dx = c, 0 < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted.
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6 for x = 1, 2, …, 6, and 0 otherwise.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, …, and 0 otherwise.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1 if s ∈ A, and X = 0 if s ∉ A.
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.
So
P(X = x) = p for x = 1; 1 − p for x = 0; and 0 for x ≠ 0, 1.
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n,k) p^k (1 − p)^(n−k) for k = 0, 1, …, n, and 0 otherwise.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^(k−1) p for k = 1, 2, …, and 0 otherwise.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, …, and 0 otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
f_X(x) = λ e^(−λx) for x ≥ 0, and 0 for x < 0.
• Distribution function:
F_X(x) = 1 − e^(−λx) for x ≥ 0, and 0 for x < 0.
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq, μ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}
L7-11
DEFN
A Gaussian random variable X has density function
f_X(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)},
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = ∫_(−∞)^x (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for the three (μ, σ²) pairs]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_(−∞)^x (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_x^∞ (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_0^(π/2) exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2 (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively
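These answers follow from P[|X − μ| > kσ] = 2Q(k), which is easy to check numerically (a MATLAB sketch using Q(x) = erfc(x/√2)/2):

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)     % 0.3173  4.55e-2  2.70e-3  6.33e-5  5.73e-7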
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_(i=1)^n Xi²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;
Γ(3/2) = (1/2)√π.
• Note that for n = 2,
f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be
F_Y(y) = 1 − e^(−y/(2σ²)) Σ_(k=0)^(m−1) (1/k!) (y/(2σ²))^k for y ≥ 0, and 0 for y < 0.
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0.
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0.
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
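This is exactly what the MATLAB exercise in Lecture 7 produced: w = sqrt(-log(u)) is Rayleigh with 2σ² = 1. A sketch of the check (histogram with pdf normalization assumes a recent MATLAB release):

u = rand(1,1e6);
w = sqrt(-log(u));                     % exponential(1), then square root
r = linspace(0, 3, 100);
fR = 2*r.*exp(-r.^2);                  % Rayleigh pdf with sigma^2 = 1/2
histogram(w, 'Normalization', 'pdf'); hold on; plot(r, fR); hold off;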
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)} for ℓ ≥ 0, and 0 for ℓ < 0.
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = dF_X(x|A)/dx.
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
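The second part illustrates the memoryless property of the exponential RV, which is easy to see by simulation (a MATLAB sketch; names are mine):

T = -log(rand(1,1e6));                 % exponential(1) wait times (inverse-CDF method)
R = T(T > 2) - 2;                      % remaining wait, given two minutes already spent
mean(R)                                % ~1: same distribution as the original wait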
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + ··· + F_X(x|An)P(An)
and
f_X(x) = Σ_(i=1)^n f_X(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.
P(A|X = x) = lim_(Δx→0) P(A | x < X ≤ x + Δx)
           = lim_(Δx→0) {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)
           = lim_(Δx→0) {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_(−∞)^∞ P(A|X = x) f_X(x) dx = [∫_(−∞)^∞ f_X(x|A) dx] P(A)
⇒ P(A) = ∫_(−∞)^∞ P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_(−∞)^∞ P(A|X = t) f_X(t) dt
Examples on board
L2a-2
Note that A = B is included in A ⊂ B.
A = B if and only if A ⊂ B and B ⊂ A (useful for proofs).
2. A ∪ B (read "A union B" or "A or B")
DEFN
The union of A and B is a set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
3. A ∩ B (read "A intersect B" or "A and B")
DEFN
The intersection of A and B is defined by
A ∩ B = AB = {x | x ∈ A and x ∈ B}
4. Aᶜ or Ā (read "A complement")
DEFN
The complement of a set A in a sample space Ω is defined by
Aᶜ = {x | x ∈ Ω and x ∉ A}
• Use Venn diagrams to convince yourself of:
1. (A ∩ B) ⊂ A and (A ∩ B) ⊂ B
2. (Aᶜ)ᶜ = A
3. A ⊂ B ⇒ Bᶜ ⊂ Aᶜ
4. A ∪ Aᶜ = Ω
5. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ (DeMorgan's Law 1)
6. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (DeMorgan's Law 2)
L2a-3
• One of the most important relations for sets is when sets are mutually exclusive.
DEFN
Two sets A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
This is very important, so I will emphasize it again: mutually exclusive is a set-theoretic relationship, A ∩ B = ∅.
SAMPLE SPACES
• Given the following experiment descriptions, write a mathematical expression for the sample space. Compare and contrast the experiments and sample spaces.
1 EX Roll a fair 6-sided die and note the number on the top face
2. EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
3 EX Toss a coin 3 times and note the sequence of outcomes
L2a-4
4 EX Toss a coin 3 times and note the number of heads
5 EX Roll a fair die 3 times and note the number of even results
6 EX Roll a fair 6-sided die 3 times and note the number of sixes
7 EX Simultaneously toss a quarter and a dime and note the outcomes
8 EX Simultaneously toss two identical quarters and note the outcomes
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
• Recall that an event A is a subset of the sample space Ω.
• The set of all events is the event class, denoted F or A.
EX: Flipping a coin.
The possible events are:
– Heads occurs: {H}
– Tails occurs: {T}
– Heads or Tails occurs: {H, T} = Ω
– Neither Heads nor Tails occurs: ∅
• Thus, the event class can be written as F = {∅, {H}, {T}, Ω}.
L2a-6
EX: Rolling a 6-sided die.
There are many possible events, such as:
1. The number 3 occurs
2. An even number occurs
3. No number occurs
4. Any number occurs
• EX: Express the events listed in the example above in set notation:
1. {3}
2. {2, 4, 6}
3. ∅
4. Ω = {1, 2, 3, 4, 5, 6}
EX: The 3-language problem.
1. If a student is chosen randomly, what is the probability that he or she is not in any of these classes?
2. If a student is chosen randomly, what is the probability that he or she is taking exactly one language class?
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX: Consider the following examples.
A. A coin is tossed 3 times and the sequence of H and T is noted.
Ω3 = {HHH, HHT, HTH, …, TTT}
Assuming equally likely outcomes,
P(ai) = 1/|Ω3| = 1/8 for any outcome ai.
Let A2 = exactly 2 heads occurs in 3 tosses. Then
A2 = {HHT, HTH, THH} and P(A2) = |A2|/|Ω3| = 3/8.
Note that, for equally likely outcomes in general, if an event E consists of K outcomes (i.e., E = {o1, o2, …, oK} and |E| = K), then
P(E) = K/|Ω|.
L2a-8
B. Suppose a coin is tossed 3 times but only the number of heads is noted.
Ω = {0, 1, 2, 3}
Assuming equally likely outcomes,
P(ai) = 1/|Ω| = 1/4 for any outcome ai.
Then P(2 heads) = 1/4, which contradicts the 3/8 found in Example A: the outcomes of this experiment are not equally likely.
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets which are simplest to measure cardinality on: the intervals.
DEFN
For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
P([a, b]) = (b − a)/(B − A).
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
L2a-9
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.
(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
L2-1
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes. (A worked check appears after the corollaries below.)
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1.
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable.
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
L2-3
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN
Events are subsets of the sample space to which we assign probability.
Examples based on previous examples of sample spaces
(a) EX:
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes: L4 = {1, 2, 3, 4}
(b) EX:
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events: ∅, {E}, {O}, {E, O}
(c) EX:
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes: A1 = {HHH, HHT, HTH, HTT}
(d) EX:
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes: O = {1, 3}
2.
DEFN
F is the event class, which is a set of subsets of Ω (a σ-algebra) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then Eᶜ ∈ M
L2-5
DEFN
A field M forms a σ-algebra, or σ-field, if for any E1, E2, … ∈ M,
⋃_(i=1)^∞ Ei ∈ M   (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_(i=1)^∞ Ei ∈ M   (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class, we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3.
DEFN
The probability measure, denoted by P, is a function that maps all members of F onto the real numbers.
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, … are pairwise mutually exclusive events, then P(⋃_i Ai) = Σ_i P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(Aᶜ) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, …, An are pairwise mutually exclusive, then
P(⋃_(k=1)^n Ak) = Σ_(k=1)^n P(Ak).
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(⋃_(k=1)^n Ak) = Σ_(k=1)^n P(Ak) − Σ_(j<k) P(Aj ∩ Ak) + ··· + (−1)^(n+1) P(A1 ∩ A2 ∩ ··· ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
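As an illustration, corollaries (e) and (a) answer the smoking question from the Motivation above (a MATLAB sketch of the arithmetic):

pCig = 0.28; pCigar = 0.07; pBoth = 0.05;
pSmoke     = pCig + pCigar - pBoth     % (e): P(A or B) = 0.30
pNeither   = 1 - pSmoke                % (a): complement = 0.70
pCigarOnly = pCigar - pBoth            % cigars but not cigarettes = 0.02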
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1.

P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1

2. Given A ∈ A,

P(A|B) = P(A ∩ B)/P(B),

and P(A ∩ B) ≥ 0, P(B) > 0

⇒ P(A|B) ≥ 0

3. If A ∈ A, C ∈ A, and A ∩ C = ∅:

P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B]
           = P[(A ∩ B) ∪ (C ∩ B)] / P[B]

Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so

P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)
EX Check prev. example: Ω = {AD, AN, BD, BD, BN}
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem

(a) What is the probability that Smith possesses a blue-eyed gene?

(b) What is the probability that the Smiths' first child will have blue eyes?

(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).

Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
2
Ex Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?
2 (a) 1/3 (b) 1/5 (c) 1
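A sketch of the Bayes-rule computations behind the gambler answers above (each coin is assumed to be selected with prior probability 1/2):

pFair = 1/2;
pa = (1/2)*pFair / ((1/2)*pFair + 1*(1-pFair))   % (a) P(fair | H) = 1/3
pb = (1/4)*pFair / ((1/4)*pFair + 1*(1-pFair))   % (b) P(fair | HH) = 1/5
pc = 1     % (c) tails is impossible for the two-headed coin, so P(fair) = 1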
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.

• Let's evaluate that using conditional probability.
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)

⇒ P(A ∩ B) = P(A|B)P(B)    (7)

and P(B|A) = P(A ∩ B)/P(A)

⇒ P(A ∩ B) = P(B|A)P(A)    (8)

• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if

{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob. P(A0) = 2/5

• A1 is sent with prob. P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: crossover probabilities P(B1|A0) = 1/8 and P(B0|A1) = 1/6, so P(B0|A0) = 7/8 and P(B1|A1) = 5/6.]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When B0 is received, decide A0; when B1 is received, decide A1.

Problem: Determine the probability of error for these two decision rules.
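For reference, a sketch of the two error probabilities in MATLAB, using the transition probabilities from the diagram above:

pA0 = 2/5; pA1 = 3/5;        % prior probabilities
p01 = 1/8;                   % P(B1|A0)
p10 = 1/6;                   % P(B0|A1)
pE1 = pA0                    % Rule 1 (always decide A1): error iff A0 sent = 0.4
pE2 = p01*pA0 + p10*pA1      % Rule 2: error iff channel crossover = 3/20 = 0.15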
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:

P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.

• Since we only apply hypothesis Hij when Bj is received, we can write

P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1)
     + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
bull We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1)
     + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1)
     + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)   and   q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)

• We can further expand these using the chain rule as

(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)   and   q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX

Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
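A sketch of this calculation in MATLAB (the matrix layout is an assumption: row i+1, column j+1 holds P(Bj|Ai)):

pA = [2/5; 3/5];                    % priors P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];          % channel transition probabilities P(Bj|Ai)
pB = pA' * PBgA;                    % law of total probability: [P(B0) P(B1)]
joint = PBgA .* [pA pA];            % joint probabilities P(Ai and Bj)
PAgB = joint ./ [pB; pB];           % Bayes' rule: a posteriori P(Ai|Bj)
[mx, mapRule] = max(PAgB)           % MAP rule: here, decide A0 on B0 and A1 on B1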
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.3
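A Monte Carlo sketch for checking the poker-dice answers (the pattern is classified from the sorted multiplicities of the five faces):

N = 1e5; tally = zeros(1,7);
for t = 1:N
    h = sort(histc(ceil(6*rand(1,5)), 1:6), 'descend');  % face multiplicities
    if     h(1)==1,            tally(1) = tally(1)+1;    % no two alike
    elseif h(1)==2 && h(2)==1, tally(2) = tally(2)+1;    % one pair
    elseif h(1)==2 && h(2)==2, tally(3) = tally(3)+1;    % two pair
    elseif h(1)==3 && h(2)==1, tally(4) = tally(4)+1;    % three alike
    elseif h(1)==3 && h(2)==2, tally(5) = tally(5)+1;    % full house
    elseif h(1)==4,            tally(6) = tally(6)+1;    % four alike
    else                       tally(7) = tally(7)+1;    % five alike
    end
end
tally/N     % compare to the answers in the footnote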
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3 (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n_(n−1) = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples

=

=

Useful formula: For large n,

n! ≈ √(2π) n^(n+1/2) e^(−n)

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets:

Now, how many different orders are there for any set of k objects?

Let Cnk = the number of combinations of size k from a set of n objects. Then

Cnk = (# ordered subsets)/(# orderings)
    =
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
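Both counts are easy to check in MATLAB using gamma(n+1) for n!:

nOrdered = gamma(73) / gamma(4)^24        % 72!/(3!)^24 ~ 1.3e85
nGroups  = nOrdered / gamma(25)           % divide by 24! ~ 2.1e61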
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring?
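A quick numerical check of this probability, again using gamma(n+1) for n!:

p = (gamma(13)/gamma(3)^6) / 6^12     % 12!/(2!)^6 arrangements out of 6^12 ~ 0.0034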
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
4
EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?
bull Compute the probabilities of winning for the following three strategies
1 Never switch
4 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
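A Monte Carlo sketch of the three strategies (the host is assumed to open a goat door uniformly at random when he has a choice):

N = 1e5; wins = zeros(1,3); doors = 1:3;
for t = 1:N
    car = ceil(3*rand); pick = ceil(3*rand);
    canOpen = doors(doors~=pick & doors~=car);     % goat doors the host may open
    host = canOpen(ceil(numel(canOpen)*rand));
    other = doors(doors~=pick & doors~=host);      % the remaining closed door
    wins(1) = wins(1) + (pick==car);               % never switch
    wins(2) = wins(2) + (other==car);              % always switch
    if rand < 0.5, final = other; else final = pick; end
    wins(3) = wins(3) + (final==car);              % switch on a coin flip
end
wins/N      % ~ [1/3, 2/3, 1/2]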
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the _______ of the sample spaces for the subexperiments

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω

3. P(·) on F such that

P(A) =

For s.i. subexperiments:

P(A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is

• The number of ways that k successes can occur in n places is

• Thus, the probability of k successes on n trials is

pn(k) =    (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX

Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
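A sketch of the direct computation for this example:

n = 100; p = 0.01;
pOK = 0;
for k = 0:2
    pOK = pOK + nchoosek(n,k) * p^k * (1-p)^(n-k);   % 0, 1, or 2 bit errors
end
pFail = 1 - pOK        % ~ 0.0794, matching the footnote answer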
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)    (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) =

Also, P(# trials > k) =

EX

A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
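A sketch of the two ARQ answers using the geometric law:

p = 0.9;                   % per-transmission success prob. (packet error 0.1)
pTwo  = (1-p) * p          % exactly 2 transmissions: 0.09
pMore = (1-p)^2            % more than 2: first two both fail = 0.01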
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:

C(n, k) p^k (1 − p)^(n−k)

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^(n−k) ≈ (α^k / k!) (1 − α/n)^(n−k)

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^(n−k) = (α^k / k!) e^(−α),
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k / k!) e^(−α),  k = 0, 1, . . .
           { 0,                  o.w.

EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
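A sketch of the computation for this example:

alpha = (90/60) * 10;                           % lambda*t = 1.5 calls/min x 10 min = 15
p20 = alpha^20 * exp(-alpha) / factorial(20)    % ~ 0.042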
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z}, for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = { 0,                      z ≤ 0
        { z²/(2b²),               0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,  b < z < 2b
        { 1,                      z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a _______ from _______ to _______.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN

A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN

If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the _______ (_______), denoted _______, is
bull FX(x) is a prob measure
bull Properties of FX
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from above,

FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The _______ (_______) fX(x) of a random variable X is the derivative (which may not exist at some places) of _______.

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞

2. FX(x) = ∫_{−∞}^{x} fX(t) dt

3. ∫_{−∞}^{+∞} fX(x) dx = 1

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a _______.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus a value will be assigned to every point of fX(x) when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN

For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = { 0,          x < a
        { 1/(b − a),  a ≤ x ≤ b
        { 0,          x > b
L7-4
DEFN
A _______ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6,  x = 1, 2, . . . , 6
           { 0,    o.w.

EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x,  x = 1, 2, . . .
           { 0,        o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = { 1,  s ∈ A
    { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1

2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n
           { 0,                          o.w.
L7-6
• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The cdfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, . . .
           { 0,                o.w.

• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
L7-8
• The cdfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; FX(x) vs. x]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = { (α^k / k!) e^(−α),  k = 0, 1, . . .
           { 0,                  o.w.

• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; P(X = x) vs. x]
L7-9
• The cdfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; FX(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = { 0,          x < 0
        { λ e^(−λx),  x ≥ 0

• Distribution function:

FX(x) = { 0,            x < 0
        { 1 − e^(−λx),  x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (npq ≫ 1), then with q = 1 − p,

C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}

• Let σ² = npq, μ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}
L7-11
DEFN

A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
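In MATLAB, Q(x) is easily evaluated through erfc, and the finite-range form above can be checked numerically (a sketch):

Qf = @(x) 0.5*erfc(x/sqrt(2));       % Q(x) via the complementary error function
x = 1.5;
Qalt = quad(@(phi) exp(-x^2./(2*sin(phi).^2))/pi, 1e-9, pi/2);
[Qf(x) Qalt]                          % both ~ 0.0668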
• The CDF for a Gaussian RV with mean μ and variance σ² is

FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question Answers appear at the bottom of the page
EX
Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ], for k = 1, 2, 3, 4, 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5 (a) 1/2 (b) 0.317, 4.55 × 10^(−2), 2.70 × 10^(−3), 6.34 × 10^(−5), and 5.75 × 10^(−7), respectively
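These answers follow from P[X < μ] = 1/2 and P[|X − μ| > kσ] = 2Q(k); a quick check using the erfc form of Q(x):

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)       % ~ [0.317  4.55e-2  2.70e-3  6.33e-5  5.73e-7]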
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (18)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

fY(y) = { [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
        { 0,                                                  y < 0

where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π

• Note that for n = 2,

fY(y) = { [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
        { 0,                       y < 0

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
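The n = 2 case is easy to see by simulation (a sketch, with σ = 1 so the exponential parameter is 1/(2σ²) = 1/2):

N = 1e6; sigma = 1;
y = sum((sigma*randn(2,N)).^2);    % sum of squares of 2 zero-mean Gaussians
hist(y,100)                        % shape matches the exponential (1/2)e^(-y/2)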
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found, through repeated integration by parts, to be

FY(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
        { 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
        { 0,                     r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is

FR(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
        { 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

fL(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
        { 0,                                        ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
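The answer exhibits the memorylessness of the exponential distribution, which can be seen by simulation (a sketch, reusing the -log(rand) trick from the Lecture 7 exercise):

N = 1e6;
t = -log(rand(1,N));          % exponential(lambda = 1) wait times
tRem = t(t > 2) - 2;          % remaining wait, given 2 minutes already waited
mean(tRem)                    % ~ 1: same as the unconditional mean wait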
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, ..., An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)

           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)

           = [fX(x|A)/fX(x)] · P(A)

if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that

P(A|X = x) = [fX(x|A)/fX(x)] · P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] · fX(x)

        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers: (a) 1/3, (b) 1/5, (c) 1.
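These answers can also be verified by a Monte Carlo simulation that conditions by simple relative frequency; a sketch (all names are my own):

N = 1e6;
fair = rand(1,N) < 0.5;                   % which coin was selected
h1 = rand(1,N) < 0.5;  h1(~fair) = true;  % heads on flip 1 (two-headed: always heads)
h2 = rand(1,N) < 0.5;  h2(~fair) = true;  % heads on flip 2
t3 = rand(1,N) < 0.5;  t3(~fair) = false; % tails on flip 3 (two-headed: never tails)
mean(fair(h1))                 % P(fair | H)   -> 1/3
mean(fair(h1 & h2))            % P(fair | HH)  -> 1/5
mean(fair(h1 & h2 & t3))       % P(fair | HHT) -> 1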
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
(When A and B are si, knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, . . . partitions the sample space Ω if and only if
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and
q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and
q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0) and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1) and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / [Σk P(Bj|Ak)P(Ak)],
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
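A sketch of this calculation in MATLAB (variable names and the matrix layout are my own; the transition probabilities are read from the channel diagram above):

pA = [2/5 3/5];             % a priori probs P(A0), P(A1)
T = [5/6 1/6; 1/8 7/8];     % T(i+1,j+1) = P(Bj|Ai)
pB = pA * T;                % [P(B0) P(B1)] by total probability
post = (T .* repmat(pA',1,2)) ./ repmat(pB,2,1);   % post(i+1,j+1) = P(Ai|Bj)
[mx, map] = max(post);      % MAP rule: map(j) = 1 decides A0, 2 decides A1
Pe = sum(pB .* (1 - mx))    % resulting error prob: 17/120, about 0.142

Here the MAP rule works out to: decide A0 when B0 is received and A1 when B1 is received.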
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
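A Monte Carlo sketch for checking the poker-dice answers (the pattern bookkeeping below is my own):

N = 1e5;  counts = zeros(1,7);
for trial = 1:N
    c = sort(histc(ceil(6*rand(1,5)), 1:6), 'descend');  % face counts, largest first
    if     c(1) == 1,           k = 1;    % no two alike
    elseif c(1)==2 && c(2)==1,  k = 2;    % one pair
    elseif c(1)==2 && c(2)==2,  k = 3;    % two pair
    elseif c(1)==3 && c(2)==1,  k = 4;    % three alike
    elseif c(1)==3 && c(2)==2,  k = 5;    % full house
    elseif c(1) == 4,           k = 6;    % four alike
    else,                       k = 7;    % five alike
    end
    counts(k) = counts(k) + 1;
end
counts/N     % compare with the answers (a)-(g) above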
L5a-2
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , nn−1 = 2, nn = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
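For example, the approximation above is already quite close at n = 10 (a quick check; gamma(11) returns exactly 10!):

n = 10;
exact = gamma(n+1)                          % 10! = 3628800
approx = sqrt(2*pi) * n^(n+0.5) * exp(-n)   % about 3598695.6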
L5a-3
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets:
Now, how many different orders are there for any set of k objects?
Let C(n,k) = the number of combinations of size k from a set of n objects. Then
C(n,k) = (# of ordered subsets)/(# of orderings)
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient
n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/((3!)^24 · 24!) ≈ 2.1 × 10^61 unordered groups
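These counts are far too large to form directly in double precision, so in MATLAB it is natural to work with log-factorials via gammaln (ln n! = gammaln(n+1)); a sketch:

logOrdered = gammaln(73) - 24*gammaln(4);     % ln( 72!/(3!)^24 )
logUnordered = logOrdered - gammaln(25);      % ln( 72!/((3!)^24 * 24!) )
d1 = logOrdered/log(10);   10^(d1-floor(d1)), floor(d1)   % mantissa, exponent: 1.3 x 10^85
d2 = logUnordered/log(10); 10^(d2-floor(d2)), floor(d2)   % mantissa, exponent: 2.1 x 10^61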
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
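A one-line numeric check of the answer (assuming the count is the multinomial coefficient 12!/(2!)^6 and each ordered outcome has probability 1/6^12):

P = factorial(12)/(factorial(2)^6) / 6^12    % about 0.0034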
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX: By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
Answer to the packet question above: 0.0794.
L5-2
2. Always switch
3. Flip a fair coin and switch if it comes up heads
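The three strategies are easy to compare by simulation; below is a Monte Carlo sketch (all variable names are my own):

N = 1e5;  wins = zeros(1,3);
for i = 1:N
    doors = 1:3;
    car = ceil(3*rand);  pick = ceil(3*rand);
    open = doors(doors ~= pick & doors ~= car);   % goat doors the host may open
    open = open(ceil(numel(open)*rand));          % host opens one of them
    switchpick = doors(doors ~= pick & doors ~= open);  % the door you'd switch to
    wins(1) = wins(1) + (pick == car);            % strategy 1: never switch
    wins(2) = wins(2) + (switchpick == car);      % strategy 2: always switch
    if rand < 0.5, final = switchpick; else, final = pick; end
    wins(3) = wins(3) + (final == car);           % strategy 3: coin flip
end
wins/N      % approaches [1/3 2/3 1/2]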
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space:
Ω = the ____ of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.
2. Event class: F, a σ-algebra on Ω.
3. P(·) on F such that
P(A) =
For si subexperiments:
P(A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
pn(k) =    (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
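The answer in the footnote on page L5-1 can be reproduced with (11): the packet fails if 3 or more of the n = 100 si bits are in error, so subtract the k = 0, 1, 2 terms from 1. A minimal sketch (the Statistics Toolbox function binocdf would do the same job if available):

n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);   % P(exactly k bit errors)
end
Perr = 1 - Pok     % about 0.0794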
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^(n−k)   (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial results in a success and Xi = 0 otherwise.
– Let Sn = X1 + X2 + · · · + Xn.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
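As a quick sanity check of this approximation, here is a minimal MATLAB sketch (the parameter choices n = 100, p = 0.5, α = 45, β = 55 are my own, and Q is built from the built-in erfc via the standard identity Q(x) = erfc(x/√2)/2):

n = 100; p = 0.5; q = 1-p; a = 45; b = 55;
Q = @(x) 0.5*erfc(x/sqrt(2));          % Q-function via erfc
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k) * p^k * q^(n-k);
end
exact                                  % about 0.7287
approx = Q((a - n*p - 0.5)/sqrt(n*p*q)) - Q((b - n*p + 0.5)/sqrt(n*p*q))
                                       % also about 0.73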
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P({1st m − 1 trials are failures} ∩ {mth trial is success})
L5-6
Then
p(m) =
Also, P(# trials > k) =
EX:
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
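For a quick check with the geometric law (success prob p = 0.9, since the packet error prob is 0.1):

p = 0.9;
P2 = (1-p)^(2-1) * p     % exactly 2 transmissions: 0.09
Pgt2 = (1-p)^2           % more than 2 transmissions: 0.01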
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
C(n,k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adopted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, . . . ; 0 o.w.
EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
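Here λ = 90 calls/hr and t = 1/6 hr, so α = λt = 15, and the Poisson pmf gives the answer directly (a sketch):

alpha = 90 * (1/6);  k = 20;
P = alpha^k / factorial(k) * exp(-alpha)    % about 0.0418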
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = 0 for z ≤ 0;
z²/(2b²) for 0 < z ≤ b;
[b² − (2b − z)²/2]/b² for b < z < 2b;
1 for z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a ____ from ____ to ____.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN:
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ____ (____), denoted ____, is
• FX(x) is a prob measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
= lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The ____ (____) fX(x) of a random variable X is the derivative (which may not exist at some places) of ____.
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ____.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way a value is assigned to fX(x) at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0 for x < a; 1/(b − a) for a ≤ x ≤ b; 0 for x > b
L7-4
DEFN
A ____ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
EX: Roll a fair 6-sided die; X = # on top face.
P(X = x) = 1/6 for x = 1, 2, . . . , 6; 0 o.w.
EX: Flip a fair coin until heads occurs; X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, . . . ; 0 o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1 if s ∈ A; 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p for x = 1; 1 − p for x = 0; 0 for x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n,k) p^k (1 − p)^(n−k) for k = 0, 1, . . . , n; 0 o.w.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Plot: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^(k−1) p for k = 1, 2, . . . ; 0 o.w.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) and Geometric(0.7) CDFs; FX(x) vs. x]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^(−α) for k = 0, 1, . . . ; 0 o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) and Poisson(3) pmfs; P(X = x) vs. x]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) and Poisson(3) CDFs; FX(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Plot: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an earlier example.
2. Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0 for x < 0; λe^(−λx) for x ≥ 0
• Distribution function:
FX(x) = 0 for x < 0; 1 − e^(−λx) for x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Plots: Exponential pdfs and CDFs for λ = 0.25, 1, 4; fX(x) and FX(x) vs. x]
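The exponential is also easy to generate from uniform random variables by inverting the CDF above; this is the same trick as in the MATLAB exercise at the start of this lecture. A sketch for general λ (the value below is my own choice):

lambda = 4;
u = rand(1,100000);            % uniform on (0,1)
x = -log(u)/lambda;            % then P(X <= x) = 1 - exp(-lambda*x)
hist(x,100)                    % histogram matches the lambda = 4 pdf above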
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq and μ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25):
[Plots: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); fX(x) and FX(x) vs. x]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob as
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − μ)/σ) − Q((b − μ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
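These can be reproduced numerically, since P[|X − μ| > kσ] = 2Q(k); a minimal sketch using the identity Q(x) = erfc(x/√2)/2 with the built-in erfc:

Q = @(x) 0.5*erfc(x/sqrt(2));   % Q-function from the built-in erfc
2*Q(1:5)                        % 0.3173, 0.0455, 0.0027, 6.33e-05, 5.73e-07

These match the footnote values to the precision shown there.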
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Plot: Laplacian pdf, c = 2; fX(x) vs. x]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are si Gaussian random variables with identical variances σ², then
Y = X1² + X2² + · · · + Xn²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0; 0 for y < 0,
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)! for p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = [1/(2σ²)] e^(−y/(2σ²)) for y ≥ 0; 0 for y < 0,
which is an exponential density.
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Plot: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; fX(x) vs. x]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be
FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0; 0 for y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0; 0 for r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Plot: Rayleigh pdf (σ² = 1); fR(r) vs. r]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0; 0 for r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the magnitude of the waveform is a Rayleigh random variable.
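Following the construction above, Rayleigh samples can be generated directly from two independent zero-mean Gaussians; a minimal sketch with σ² = 1 (my own choice):

N = 100000; sigma = 1;
r = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);   % R = sqrt(X1^2 + X2^2)
hist(r,100)     % compare with the Rayleigh pdf (sigma^2 = 1) above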
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)} for ℓ ≥ 0; 0 for ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) =
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) =
• Then FX(x|A) is a distribution function and fX(x|A) is a density function.
EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
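For reference, here is a sketch of part 1 using the conditional-distribution definition above (part 2 follows the same pattern and exhibits the memoryless property of the exponential): with A = {X > 2} and FX(x) = 1 − e^(−x) for x ≥ 0,
FX(x|A) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = [FX(x) − FX(2)]/[1 − FX(2)] = 1 − e^(−(x−2)) for x > 2 (and 0 otherwise),
while the remaining wait R = X − 2 has FR(r|A) = 1 − e^(−r) for r ≥ 0, identical to the unconditional wait-time distribution.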
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead,
P(A|X = x) = lim_{∆x→0} P(A|x < X ≤ x + ∆x)
= lim_{∆x→0} [FX(x + ∆x|A) − FX(x|A)] / [FX(x + ∆x) − FX(x)] · P(A)
= lim_{∆x→0} {[FX(x + ∆x|A) − FX(x|A)]/∆x} / {[FX(x + ∆x) − FX(x)]/∆x} · P(A)
= [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = [∫_{−∞}^{∞} fX(x|A) dx] P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends $A_0$ or $A_1$.
• $A_0$ is sent with prob $P(A_0) = 2/5$
• $A_1$ is sent with prob $P(A_1) = 3/5$
• A receiver (Rx) processes the output of the channel into one of two values, $B_0$ or $B_1$.
• The channel is a noisy channel that determines the probabilities of transitions between $A_0, A_1$ and $B_0, B_1$.
– The transitions are conditional probabilities $P(B_j \mid A_i)$ that can be specified on a channel transition diagram.
[Channel transition diagram: $P(B_0 \mid A_0) = 5/6$, $P(B_1 \mid A_0) = 1/6$, $P(B_1 \mid A_1) = 7/8$, $P(B_0 \mid A_1) = 1/8$]
• The receiver must use a decision rule to determine from the output $B_0$ or $B_1$ whether the symbol that was sent was most likely $A_0$ or $A_1$.
• Let's start by considering 2 possible decision rules:
1. Always decide $A_1$.
2. When we receive $B_0$, decide $A_0$; when we receive $B_1$, decide $A_1$.
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let $H_{ij}$ be the hypothesis (i.e., we decide) that $A_i$ was transmitted when $B_j$ is received.
• The error probability of our system is then the probability that we decide $A_0$ when $A_1$ is transmitted plus the probability that we decide $A_1$ when $A_0$ is transmitted:
\[ P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1) \]
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When $B_0$ is received, decide $A_0$ with probability $\eta$ and decide $A_1$ with probability $1 - \eta$.
• Since we only apply hypothesis $H_{ij}$ when $B_j$ is received, we can write
\[ P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0) \]
• We apply the chain rule to write this as
\[ P(E) = P(H_{10} \mid A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11} \mid A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01} \mid A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00} \mid A_1 \cap B_0)P(A_1 \cap B_0) \]
• The receiver can only decide $H_{ij}$ based on $B_j$; it has no direct knowledge of $A_k$. So $P(H_{ij} \mid A_k \cap B_j) = P(H_{ij} \mid B_j)$ for all $i, j, k$.
• Furthermore, let $P(H_{00} \mid B_0) = q_0$ and $P(H_{11} \mid B_1) = q_1$.
• Then
\[ P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0) \]
• The choices of $q_0$ and $q_1$ can be made independently, so $P(E)$ is minimized by finding $q_0$ and $q_1$ that minimize
\[ (1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad\text{and}\quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) \]
• We can further expand these using the chain rule as
\[ (1 - q_0)P(A_0 \mid B_0)P(B_0) + q_0 P(A_1 \mid B_0)P(B_0) \quad\text{and}\quad q_1 P(A_0 \mid B_1)P(B_1) + (1 - q_1)P(A_1 \mid B_1)P(B_1) \]
L4-8
• $P(B_0)$ and $P(B_1)$ are constants that do not depend on our decision rule, so finally we wish to find $q_0$ and $q_1$ to minimize
\[ (1 - q_0)P(A_0 \mid B_0) + q_0 P(A_1 \mid B_0) \tag{9} \]
\[ q_1 P(A_0 \mid B_1) + (1 - q_1)P(A_1 \mid B_1) \tag{10} \]
• Both of these are linear functions of $q_0$ and $q_1$, so the minimizing values must be at the endpoints. I.e., either $q_i = 0$ or $q_i = 1$.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting $q_0 = 1$ if $P(A_0 \mid B_0) > P(A_1 \mid B_0)$, and setting $q_0 = 0$ otherwise.
• Similarly, (10) is minimized by setting $q_1 = 1$ if $P(A_1 \mid B_1) > P(A_0 \mid B_1)$, and setting $q_1 = 0$ otherwise.
• These rules can be summarized as follows:
Given that $B_j$ is received, decide $A_i$ was transmitted if $A_i$ maximizes $P(A_i \mid B_j)$.
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities $P(A_i \mid B_j)$ are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know $P(A_i \mid B_j)$.
• Need to express $P(A_i \mid B_j)$ in terms of $P(A_i)$, $P(B_j \mid A_i)$.
L4-9
• General technique:
If $\{A_i\}$ is a partition of $\Omega$ and we know $P(B_j \mid A_i)$ and $P(A_i)$, then
where the last step follows from the Law of Total Probability.
The expression in the box is
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with $P(A_0) = 2/5$, $p_{01} = 1/8$, and $p_{10} = 1/6$.
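A quick numerical check of this example can be done in MATLAB. The sketch below is an addition to these notes, not part of the original; it assumes the transition probabilities from the channel diagram above ($P(B_0|A_0) = 5/6$, $P(B_1|A_0) = 1/6$, $P(B_1|A_1) = 7/8$, $P(B_0|A_1) = 1/8$).

% A posteriori probabilities and MAP rule for the binary channel (sketch)
PA   = [2/5, 3/5];                 % priors P(A0), P(A1)
PBgA = [5/6, 1/8;                  % row 1: P(B0|A0), P(B0|A1)
        1/6, 7/8];                 % row 2: P(B1|A0), P(B1|A1)
PB   = PBgA * PA';                 % law of total probability: P(Bj)
PAgB = (PBgA .* repmat(PA,2,1)) ./ repmat(PB,1,2);  % Bayes' rule: entry (j,i) is P(A_{i-1}|B_{j-1})
[mx, mapDec] = max(PAgB, [], 2);   % MAP decision for each received Bj

Running this gives $P(A_0 \mid B_0) \approx 0.816$ and $P(A_1 \mid B_1) \approx 0.887$, so for these numbers the MAP rule reduces to deciding $A_0$ on $B_0$ and $A_1$ on $B_1$.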
L4-10
• Often we may not know the a priori probabilities $P(A_0)$ and $P(A_1)$.
• In this case, we typically treat the input symbols as equally likely: $P(A_0) = P(A_1) = 0.5$.
• Under this assumption, $P(A_0 \mid B_0) = cP(B_0 \mid A_0)$ and $P(A_1 \mid B_0) = cP(B_0 \mid A_1)$, so the decision rule becomes:
Given that $B_j$ is received, decide $A_i$ was transmitted if $A_i$ maximizes $P(B_j \mid A_i)$.
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities (a Monte Carlo check follows the list):
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
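One way to sanity-check the answers in the footnote is a quick Monte Carlo simulation. The MATLAB sketch below is an addition to the notes; it classifies each roll by its pattern of repeated faces.

% Monte Carlo check of the poker-dice probabilities (sketch)
nTrials = 1e5;
counts = zeros(1, 7);   % [no two alike, one pair, two pair, three alike, full house, four alike, five alike]
for t = 1:nTrials
    roll = ceil(6 * rand(1, 5));                 % five fair dice
    h = sort(histc(roll, 1:6), 'descend');       % multiplicities of each face, largest first
    if     h(1) == 1,              counts(1) = counts(1) + 1;   % all different
    elseif h(1) == 2 && h(2) == 1, counts(2) = counts(2) + 1;   % one pair
    elseif h(1) == 2 && h(2) == 2, counts(3) = counts(3) + 1;   % two pair
    elseif h(1) == 3 && h(2) == 1, counts(4) = counts(4) + 1;   % three alike
    elseif h(1) == 3 && h(2) == 2, counts(5) = counts(5) + 1;   % full house
    elseif h(1) == 4,              counts(6) = counts(6) + 1;   % four alike
    else                           counts(7) = counts(7) + 1;   % five alike
    end
end
counts / nTrials   % compare with the answers in the footnote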
COMBINATORICS
• Basic principle of counting: The number of distinct ordered $k$-tuples $(x_1, x_2, \ldots, x_k)$ that are possible if $x_i$ is an element of a set with $n_i$ distinct members is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple $(x_1, x_2, \ldots, x_k)$, where $x_i \in A\ \forall i = 1, 2, \ldots, k$.
Thus $n_1 = n_2 = \cdots = n_k = |A| \equiv n$
⇒ number of distinct ordered k-tuples is
2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let $n = |A|$, where $k \le n$.
Result is a k-tuple $(x_1, x_2, \ldots, x_k)$, where $x_i \in A\ \forall i = 1, 2, \ldots, k$.
Then $n_1 = n$, $n_2 = n - 1$, \ldots, $n_k = n - (k - 1)$
⇒ number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then $n_1 = n$, $n_2 = n - 1$, \ldots, $n_{n-1} = 2$, $n_n = 1$
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large $n$, $n! \approx \sqrt{2\pi}\, n^{n + 1/2} e^{-n}$
MATLAB TECHNIQUE: To determine $n!$ in MATLAB, use gamma(n+1)
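As a quick illustration (an addition to the notes), the following sketch compares the exact factorial, computed via the gamma function as suggested above, with Stirling's approximation:

% Stirling's formula vs. the exact factorial (sketch)
n = 1:10;
exact    = gamma(n + 1);                          % n! via the gamma function
stirling = sqrt(2*pi) .* n.^(n + 0.5) .* exp(-n);
[exact; stirling; stirling ./ exact]              % the ratio approaches 1 as n grows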
L5a-3
4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets.
Now, how many different orders are there for any set of k objects?
Let $C^n_k$ = the number of combinations of size k from a set of n objects.
\[ C^n_k = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size $k_1, k_2, \ldots, k_J$ is given by the multinomial coefficient
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
\[ \frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85} \]
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
\[ \frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61} \]
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (See the sketch after these questions.)
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
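As a check on the answer (an addition to the notes): the count of favorable orderings is the multinomial coefficient $12!/(2!)^6$, and each ordered outcome has probability $1/6^{12}$, so in MATLAB:

% Each face exactly twice in 12 rolls of a fair die (sketch)
p = factorial(12) / factorial(2)^6 / 6^12   % about 0.0034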
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is $\dbinom{k + n - 1}{k}$
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1. Never switch
Answer to the packet question: 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
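A Monte Carlo comparison of the first two strategies can be done in MATLAB; this sketch is an addition to the notes.

% Monty Hall: never-switch vs. always-switch (sketch)
N = 1e5;  winStay = 0;  winSwitch = 0;
for t = 1:N
    car = ceil(3*rand);  pick = ceil(3*rand);
    winStay   = winStay   + (pick == car);   % never switch: win iff first pick is the car
    winSwitch = winSwitch + (pick ~= car);   % always switch: host removes a goat, so switching wins iff first pick was a goat
end
[winStay, winSwitch] / N                     % approx 1/3 and 2/3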
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space Ω = the ______ of the sample spaces for the subexperiments.
• Then any $A \in \Omega$ is of the form $(B_1, B_2, \ldots, B_N)$, where $B_i \in \Omega_i$.
2. Event class F: σ-algebra on Ω
3. $P(\cdot)$ on F such that
$P(A) =$
For si subexperiments,
$P(A) =$
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and $n - k$ failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
\[ p_n(k) = \tag{11} \]
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
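A direct binomial computation of this example in MATLAB (an addition to the notes; it sums the probabilities of 0, 1, or 2 errors and subtracts from 1):

% Packet fails to decode when 3 or more of the 100 bits are in error (sketch)
n = 100; p = 0.01;
k = 0:2;
pDecode = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k));
pFail = 1 - pDecode    % approx 0.0794, matching the footnote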
L5-4
• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
\[ b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k} \tag{12} \]
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables $X_i$.
– For each Bernoulli trial i, let $X_i = 1$ if the trial is a success and $X_i = 0$ otherwise.
– Let $S_n = \sum_{i=1}^{n} X_i$.
Then b(k; n, p) is the probability that $S_n = k$.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
\[ \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \tag{13} \]
and distribution function
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \tag{14} \]
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
\[ Q(x) = 1 - \Phi(x) \tag{15} \]
\[ = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \tag{16} \]
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
\[ b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right) \tag{17} \]
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
\[ P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] \]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
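The quality of the Gaussian approximation is easy to check numerically. The MATLAB sketch below is an addition to the notes; it builds Q(x) from the base-MATLAB erfc function and compares the exact binomial sum with the approximation above for assumed values n = 100, p = 0.5, α = 45, β = 55.

% Exact binomial probability vs. Gaussian (Q-function) approximation (sketch)
Q = @(x) 0.5 * erfc(x / sqrt(2));        % Q(x) = 1 - Phi(x)
n = 100; p = 0.5; q = 1 - p;
alpha = 45; beta = 55;
kk = alpha:beta;
exact  = sum(arrayfun(@(k) nchoosek(n, k), kk) .* p.^kk .* q.^(n - kk));
approx = Q((alpha - n*p - 0.5)/sqrt(n*p*q)) - Q((beta - n*p + 0.5)/sqrt(n*p*q));
[exact, approx]                          % both are about 0.729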
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success) = P(1st $m - 1$ trials are failures $\cap$ mth trial is success)
L5-6
Then
$p(m) =$
Also, $P(\#\text{ trials} > k) =$
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
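A quick check of the geometric-law answers in MATLAB (an addition to the notes; it treats each transmission as a Bernoulli trial with success probability 0.9):

% ARQ example under the geometric law (sketch)
p = 0.9;                        % probability a transmission succeeds
pExactly2  = (1 - p)^1 * p      % first fails, second succeeds: 0.09
pMoreThan2 = (1 - p)^2          % first two transmissions fail: 0.01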
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is n.
• As $n \to \infty$, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
\[ \binom{n}{k} p^k (1 - p)^{n-k} \]
• If we let $np = \alpha$ be a constant, then for large n,
\[ \binom{n}{k} p^k (1 - p)^{n-k} \approx \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} \]
• Then
\[ \lim_{n \to \infty} \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha}, \]
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the average # of events / (unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
\[ P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases} \]
EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
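The computation in MATLAB (an addition to the notes): 90 calls/hr over a 10-minute window gives α = λt = 15.

% Poisson probability for the switching-office example (sketch)
lambda = 90;  t = 10/60;  alpha = lambda * t;
k = 20;
pN20 = alpha^k / factorial(k) * exp(-alpha)   % about 0.042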
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: $\Omega_Z = \{z \mid 0 \le z \le 2b\}$)
2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.
3. Find and plot $F_Z(z)$.
Ans:
\[ F_Z(z) = \begin{cases} 0, & z \le 0 \\ \dfrac{z^2}{2b^2}, & 0 < z \le b \\ \dfrac{b^2 - (2b - z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases} \]
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
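An empirical check of this CDF in MATLAB (an addition to the notes; b = 1 is an assumed value):

% Empirical CDF of Z = X + Y for a dart uniform on the b-by-b square (sketch)
b = 1;  N = 1e5;
Z = b*rand(1, N) + b*rand(1, N);     % sum of the two coordinates
z = linspace(0, 2*b, 200);
Fhat = arrayfun(@(zz) mean(Z <= zz), z);
plot(z, Fhat)                        % compare with the piecewise F_Z(z) above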
RANDOM VARIABLES (RVs)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a ______ from ______ to ______.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers $B \in \mathcal{B}$, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) $P[X = -\infty] = 0$ and $P[X = +\infty] = 0$.
• The Borel field on $\mathbb{R}$ contains all sets that can be formed from countable unions, intersections, and complements of sets of the form $\{x \mid x \in (-\infty, x]\}$.
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ______ ( ______ ), denoted ______, is
• $F_X(x)$ is a prob measure.
• Properties of $F_X$:
1. $0 \le F_X(x) \le 1$
L6-4
2. $F_X(-\infty) = 0$ and $F_X(\infty) = 1$
3. $F_X(x)$ is monotonically nondecreasing, i.e., $F_X(a) \le F_X(b)$ if $a \le b$.
4. $P(a < X \le b) = F_X(b) - F_X(a)$
5. $F_X(x)$ is continuous on the right, i.e., $\lim_{h \to 0^+} F_X(b + h) = F_X(b)$.
(The value at a jump discontinuity is the value after the jump.)
Pf omitted.
• If $F_X(x)$ is a continuous function of x, then $F_X(x) = F_X(x^-)$.
• If $F_X(x)$ is not a continuous function of x, then from above,
\[ F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\epsilon \to 0} P[x - \epsilon < X \le x] = P[X = x] \]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event $\{Z \le z\}$ for $-\infty < z < \infty$.
3. Find and plot $F_Z(z)$.
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB:
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The ______ ( ______ ) $f_X(x)$ of a random variable X is the derivative (which may not exist at some places) of ______.
• Properties:
1. $f_X(x) \ge 0$, $-\infty < x < \infty$
2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
3. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
L7-3
4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$, $a \le b$
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
\[ \int_{-\infty}^{+\infty} g(x)\,dx = c, \quad -\infty < c < +\infty, \]
then $f_X(x) = \dfrac{g(x)}{c}$ is a valid pdf.
Pf omitted.
• Note that if $f_X(x)$ exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If $F_X(x)$ is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ______.
• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way, $f_X(x)$ is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
\[ f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b - a}, & a \le x \le b \\ 0, & x > b \end{cases} \]
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
EX Roll a fair 6-sided die. X = # on top face.
\[ P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases} \]
EX Flip a fair coin until heads occurs. X = # of flips.
\[ P(X = x) = \begin{cases} \left(\tfrac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \]
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event $A \in \mathcal{A}$ is considered a "success."
• A Bernoulli RV X is defined by
\[ X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases} \]
• The pmf for a Bernoulli RV X can be found formally as
\[ P(X = 1) = P\big(X(s) = 1\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p \]
So
\[ P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases} \]
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
\[ P[X = k] = \begin{cases} \dbinom{n}{k} p^k (1 - p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. $F_X(x)$]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
\[ P[X = k] = \begin{cases} (1 - p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \]
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. $F_X(x)$]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events / (unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
\[ P(N = k) = \begin{cases} \dfrac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases} \]
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. $F_X(x)$]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV (covered in example)
2. Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
\[ f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases} \]
• Distribution function:
\[ F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases} \]
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. $f_X(x)$ and $F_X(x)$]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large, then with $q = 1 - p$,
\[ \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\} \]
• Let $\sigma^2 = npq$ and $\mu = np$; then
\[ P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\} \]
L7-11
DEFN
A Gaussian random variable X has density function
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\} \]
with parameters (mean) µ and (variance) $\sigma^2 \ge 0$.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
\[ F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt, \]
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25); x vs. $f_X(x)$ and $F_X(x)$]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt \]
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
\[ Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt \]
L7-12
– Note that $Q(x) = 1 - \Phi(x)$.
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
\[ Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi \]
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
\[ F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\!\left(\frac{x - \mu}{\sigma}\right) \]
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
\[ P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a - \mu}{\sigma}\right) - Q\!\left(\frac{b - \mu}{\sigma}\right) \]
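In MATLAB, a Q-function can be built from the base erfc function, since $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The sketch below is an addition to the notes; µ, σ, a, and b are assumed example values.

% Gaussian interval probability via the Q-function (sketch)
Q = @(x) 0.5 * erfc(x / sqrt(2));      % Q(x) for the standardized Gaussian
mu = 5; sigma = 2;
a = 4; b = 9;
pInterval = Q((a - mu)/sigma) - Q((b - mu)/sigma)   % P(a < X <= b)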
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) $P[X < \mu]$
(b) $P[|X - \mu| > k\sigma]$ for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2 (b) 0.317, $4.55 \times 10^{-2}$, $2.70 \times 10^{-3}$, $6.34 \times 10^{-5}$, and $5.75 \times 10^{-7}$, respectively
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
\[ f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty, \]
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. $f_X(x)$]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then $Y = X^2$ is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if $X_i$, $i = 1, 2, \ldots, n$, are si Gaussian random variables with identical variances σ², then
\[ Y = \sum_{i=1}^{n} X_i^2 \tag{18} \]
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
\[ f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2) - 1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases} \]
where Γ(p) is the gamma function, defined as
\[ \Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0 \]
\[ \Gamma(p) = (p - 1)!, \quad p \text{ an integer} > 0 \]
\[ \Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi} \]
• Note that for n = 2,
\[ f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases} \]
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. $f_X(x)$]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
\[ F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases} \]
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let $R = \sqrt{Y}$.
• Then R is a Rayleigh RV with pdf
\[ f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases} \]
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; r vs. $f_R(r)$]
• The CDF for a Rayleigh RV is
\[ F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases} \]
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
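This construction is easy to verify in MATLAB (an addition to the notes): generate Rayleigh samples as the magnitude of two i.i.d. zero-mean Gaussians and compare the histogram to the pdf above.

% Rayleigh samples from two independent Gaussians (sketch)
sigma = 1;  N = 1e5;
R = sqrt((sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2);
hist(R, 100)    % compare the shape with f_R(r) = (r/sigma^2) exp(-r^2/(2 sigma^2))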
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
\[ f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases} \]
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → $\mathbb{R}$ (a RV on Ω):
DEFN
For an event $A \in F$ with $P(A) \ne 0$, the conditional distribution of X given A is
$F_X(x \mid A) =$
DEFN
For an event $A \in F$ with $P(A) \ne 0$, the conditional density of X given A is
$f_X(x \mid A) =$
• Then $F_X(x \mid A)$ is a distribution function, and $f_X(x \mid A)$ is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is (a worked sketch follows these questions)
1. the conditional distribution for your total wait time?
L8-9
2 the conditional distribution for your remaining wait time
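A sketch of the computation (an addition to the notes, not the worked in-class solution): conditioning on $A = \{X > 2\}$ and using $F_X(x) = 1 - e^{-x}$,
\[ F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{F_X(x) - F_X(2)}{1 - F_X(2)} = 1 - e^{-(x - 2)}, \quad x > 2, \]
so the remaining wait time $W = X - 2$ is again exponential with λ = 1; this is the memoryless property of the exponential RV.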
TOTAL PROBABILITY AND BAYES' THEOREM
• If $A_1, A_2, \ldots, A_n$ form a partition of Ω, then
\[ F_X(x) = F_X(x \mid A_1)P(A_1) + F_X(x \mid A_2)P(A_2) + \cdots + F_X(x \mid A_n)P(A_n) \]
and
\[ f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i)P(A_i) \]
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly $P(X = x) = 0$, so the previous definition of conditional prob will not work.
\begin{align*}
P(A \mid X = x) &= \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x) \\
&= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A) \\
&= \lim_{\Delta x \to 0} \frac{\big[F_X(x + \Delta x \mid A) - F_X(x \mid A)\big]/\Delta x}{\big[F_X(x + \Delta x) - F_X(x)\big]/\Delta x}\, P(A) \\
&= \frac{f_X(x \mid A)}{f_X(x)}\, P(A),
\end{align*}
if $f_X(x \mid A)$ and $f_X(x)$ exist.
L8-10
Implication of Point Conditioning
• Note that
\begin{align*}
P(A \mid X = x) &= \frac{f_X(x \mid A)}{f_X(x)}\, P(A) \\
\Rightarrow\quad P(A \mid X = x)\, f_X(x) &= f_X(x \mid A)\, P(A) \\
\Rightarrow\quad \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\,dx &= \int_{-\infty}^{\infty} f_X(x \mid A)\,dx\; P(A) \\
\Rightarrow\quad P(A) &= \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\,dx
\end{align*}
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
\[ f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\,dt} \]
Examples on board.
L2a-5
9 EX Pick a number at random between zero and one inclusive
10 EX Measure the lifetime of a computer chip
MORE ON EVENTS
bull Recall that an event A is a subset of the sample space Ω
bull The set of all events is the denoted F or A
EX Flipping a coin
The possible events are
ndash Heads occurs
ndash Tails occurs
ndash Heads or Tails occurs
ndash Neither Heads nor Tails occurs
bull Thus the event class can be written as
L2a-6
EX Rolling a 6-sided die
There are many possible events such as
1 The number 3 occurs
2 An even number occurs
3 No number occurs
4 Any number occurs
bull EX Express the events listed in the example above in set notation
1
2
3
4
EX The 3 language problem
1 If a student is chosen randomly what is the probability that he or she is not in any of theseclasses
L2a-7
2 If a student is chosen randomly what is the probability that he or she is taking exactly onelanguage class
MORE ON EQUALLY-LIKELY OUTCOMES
IN DISCRETE SAMPLE SPACES
EX Consider the following examples
A A coin is tossed 3 times and the sequence of H and T is notedΩ3 = HHH HHT HTH TTTAssuming equally likely outcomes
P (ai) =1
|Ω3|=
1
8for any outcome ai
Let A2 = exactly 2 heads occurs in 3 tossesThen
Note that for equally likely outcomes in general if an event E consists of K outcomes (ieE = o1 o2 oK and |E| = K) then
L2a-8
B Suppose a coin is tossed 3 times but only the number of heads is notedΩ = 0 1 2 3Assuming equally likely outcomes
P (ai) =1
|Ω|=
1
4for any outcome ai
Then
EXTENSION OF EQUALLY LIKELY OUTCOMES
TO CONTINUOUS SAMPLE SPACES
bull Cannot directly apply our notion of equally-likely outcomes to continuous sample spacesbecause any continuous sample space has an infinite number of outcomes
bull Thus if each outcome has some positive probability of occurring p gt 0 the total probabilitywould be infinp = infin
bull Our previous work on equally likely outcomes indicates that the probability of each outcomeshould be 1Ω = 0 but this seems to suggest that the probability of every event is 0
bull In fact assigning 0 probability to the outcomes does NOT necessarily imply that the probabilityof every event is 0
bull Letrsquos use our previous approach of using cardinality to define the probability of an eventConsider the continuous sets which are simplest to measure cardinality on the intervals
DEFN
For a sample space Ω that is an interval Ω = [A B] a probability measure Phas equally likely outcomes on Ω if for any a and b such that A le a le b le B
P ([a b]) =
bull Note that P (c) = |[cc]||Ω| = 0 for any c isin [A B]
L2a-9
bull For a typical continuous sample space the prob of any particular outcome is zero
Interpretation Suppose we choose random numbers in [A B] until we see all the outcomesin that range at least once
How many trials will it take infin
bull Seeing a number a finite number of times in an infinite number of trialsrArr Relative freq = 0
(Does not mean that outcome does not ever occur only that it occurs very infrequently)
L2-1
EEL 5544 Lecture 2Motivation
Read Sections 15 in the textbook
The Axioms of Probability and their Corollaries are important tools that simplify solvingprobability problems Anyone working with probability should learn these by heart and knowwhen to apply them
Try to solve the following problem (a modified version of S Ross A First Course in Probability7th ed Ch 2 prob 11) using the Corollaries to the Axioms of Probability where appropriate
EXA total of 28 percent of American males smoke cigarettes 7 percent smokecigars and 5 percent smoke both cigars and cigarettes
1 What percentage of males smoke
2 What percentage of males smoke neither cigars nor cigarettes
3 What percentage smoke cigars but not cigarettes
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containingthree elementsWe say that a probability space is a triple (ΩF P )
bull The elements of a probability space for a random experiment are
1
DEFN
The sample space denoted by Ω is the of
The sample space is also known as the universal set or reference set
Sample spaces come in two basic varieties discrete or continuous
DEFN
A discrete set is either or
DEFN
A set is countably infinite if it can be put into one-to-one correspondence withthe integers
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b gt a are in an interval I then if a le x le b x isin I
bull Intervals can be either open closed or half-open
ndash A closed interval [a b] contains the endpoints a and bndash An open interval (a b) does not contain the endpoints a and bndash An interval can be half-open such as (a b] which does not contain a or [a b)
which does not contain b
L2-3
bull Intervals can also be either finite infinite or partially infinite
EX
Example of continuous sets
bull We wish to ask questions about the probabilities of not only the outcomes but alsocombinations of the outcomes For instance if we roll a six-sided die and record thenumber on the top face we may still ask questions like
ndash What is the probability that the result is evenndash What is the probability that the result is le 2
DEFNEvents are to which we assign
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top faceLet L4 = the event that the result is less than or equal to 4Express L as a set of outcomes
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face iseven or oddLet E = even outcome O = odd outcomeList all events
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes
(d) EX
Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes
2
DEFN
F is the event class which is ato which we assign probability
bull If Ω is discrete then F can be taken to be the of Ω (the set of everysubset of Ω)
bull If Ω is continuous then weird things can happen if we consider every subset of Ω Forinstance if Ω itself has measure (probability) 1 we can construct a new set Ωprime thatconsists only of subsets of Ω that has measure (probability) 2
bull The solution is to not assign a measure (ie probability) to some subsets of Ω andtherefore these things cannot be events
DEFN
A collection of subsets of Ω forms a field M under the binaryoperations cup and cap if(a) empty isin M and Ω isinM(b) If EisinM and F isinM then E cup F isinM and E cap F isinM(c) If E isinM then Ec isinM
L2-5
DEFN
A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1
Ei isinM (5)
bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1
Ei isinM (6)
bull We require that the event class F be a σ-algebra
bull For discrete sets the set of all subset of Ω is a σ-algebra
bull For continuous sets there may be more than one possible σ-algebra possible
bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R
3
DEFN
The probability measure denoted by P is a that maps allmembers of onto
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I
II
III
Corollaries
bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F
(a) P (Ac) = 1minus P (A)
L2-6
(b) P (A) le 1
(c) P (empty) = 0
(d) If A1 A2 An are pairwise mutually exclusive then
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Ak)
Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)
Proof and example on board(f)
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Aj)minussumjltk
P (Aj cap Ak) + middot middot middot
+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)
Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction
(g) If A sub B then P (A) le P (B)Proof on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6. (A numerical check follows.)
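A quick numerical check (a minimal MATLAB sketch; the matrix layout and variable names are my own, not from the notes):

PA   = [2/5 3/5];              % a priori probs P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];     % transition probs P(Bj|Ai): row i <-> Ai, column j <-> Bj
PAB  = diag(PA)*PBgA;          % joint probs P(Ai and Bj)
PB   = sum(PAB,1);             % P(Bj) by the Law of Total Probability
PAgB = PAB./[PB; PB];          % a posteriori probs P(Ai|Bj) by Bayes' theorem
[~,dec] = max(PAgB);           % MAP rule: for each Bj, decide the Ai maximizing P(Ai|Bj)
Pe = 1 - sum(PB.*max(PAgB));   % resulting probability of error

This gives P(A0|B0) = 7/9 and P(A0|B1) = 1/11, so the MAP receiver decides A0 when B0 is received and A1 when B1 is received, with P(E) = 0.15.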
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk.
SPECIAL CASES
1. Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
³(a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k.
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3. Permutations of n distinct objects. Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n_(n−1) = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1) · · · (2)(1)
= n!
Useful formula (Stirling): For large n,
n! ≈ √(2π) n^(n + 1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
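As a quick sketch of both techniques (the example value n = 52 is my own choice; gammaln is the standard MATLAB routine for log-factorials when n! overflows a double):

n = 52;
exact    = gamma(n+1);                     % n! computed directly
stirling = sqrt(2*pi)*n^(n+0.5)*exp(-n);   % Stirling approximation above
relerr   = (exact - stirling)/exact        % about 1/(12n), i.e., ~0.16% for n = 52
logfact  = gammaln(n+1);                   % log(n!), usable even when n! overflows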
L5a-3
4. Sampling w/out replacement & w/out ordering. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects. Then
C(n, k) = (# of ordered subsets)/(# of orderings) = n!/(k! (n − k)!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring? (A MATLAB check follows.)
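A minimal MATLAB check of this computation (the count is the multinomial coefficient with k1 = · · · = k6 = 2):

ways = factorial(12)/factorial(2)^6;   % 12!/(2!)^6 arrangements with each face twice
prob = ways/6^12                       % ≈ 0.0034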
5. Sampling with replacement and w/out ordering. Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k) = (k + n − 1)!/(k! (n − 1)!)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch
⁴0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.
2 Event class F σ-algebra on Ω
3. P(·) on F such that
P(A) =
For si subexperiments,
P(A) = P1(B1) P2(B2) · · · PN(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is C(n, k).
• Thus the probability of k successes on n trials is
pn(k) = C(n, k) p^k (1 − p)^(n−k)   (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^(−2). The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
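For the packet example, a short MATLAB sketch using (11) (base MATLAB only, no toolbox functions):

n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k)*p^k*(1-p)^(n-k);   % P(0, 1, or 2 bit errors)
end
Pe = 1 - Pok   % ≈ 0.0794, matching the answer given earlier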
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:
• the Poisson law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a suitably normalized sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By the de Moivre–Laplace theorem (a local form of the Central Limit Theorem), it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^(m−1) p, m = 1, 2, . . .
Also, P(# trials > k) = (1 − p)^k
EX: A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
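A quick check with the geometric law, taking a "success" to be a correctly received packet (so the success probability here is 0.9):

q = 0.1; p = 1 - q;      % packet error prob and per-transmission success prob
P2   = (1-p)^(2-1)*p     % exactly 2 transmissions required: 0.09
Pgt2 = (1-p)^2           % more than 2 transmissions required: 0.01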
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
C(n, k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)
• Then
lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, . . .
           { 0,                o.w.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
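A MATLAB sketch of the computation (here α = λt = 90 · (10/60) = 15):

lambda = 90; t = 10/60;
alpha = lambda*t;                          % expected # of calls in 10 minutes
P20 = alpha^20*exp(-alpha)/factorial(20)   % ≈ 0.0418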
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z)
Ans:
FZ(z) = { 0,                       z ≤ 0
        { z²/(2b²),                0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,   b < z < 2b
        { 1,                       z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
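A Monte Carlo sanity check of the CDF above (a sketch; b = 1 is my arbitrary choice):

b = 1; N = 1e6;
z = b*rand(1,N) + b*rand(1,N);     % Z = sum of the two dart coordinates
[mean(z <= 0.5*b), 0.5^2/2]        % empirical vs. z^2/(2b^2) at z = 0.5b: both ≈ 0.125
[mean(z <= 1.5*b), 1 - 0.5^2/2]    % empirical vs. formula at z = 1.5b: both ≈ 0.875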
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from the sample space Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P(X ≤ x)
• FX(x) defines a prob measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x]
              = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
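(A check on the transformations, my derivation rather than part of the original notes: if U is uniform on (0, 1), then P(V ≤ v) = P(−ln U ≤ v) = P(U ≥ e^(−v)) = 1 − e^(−v) for v ≥ 0, so V is exponential with λ = 1. Likewise, FW(w) = P(√V ≤ w) = 1 − exp(−w²) for w ≥ 0, a Rayleigh CDF with 2σ² = 1. This inverse-transform idea is the standard way of generating random variables from uniform ones.)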
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = { 0,           x < a
        { 1/(b − a),   a ≤ x ≤ b
        { 0,           x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = { 1/6,  x = 1, 2, . . . , 6
           { 0,    o.w.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = { (1/2)^x,  x = 1, 2, . . .
           { 0,        o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = { 1,  s ∈ A
    { 0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = { p,       x = 1
           { 1 − p,   x = 0
           { 0,       x ≠ 0, 1
2 Binomial RV
• A binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = { C(n, k) p^k (1 − p)^(n−k),   k = 0, 1, . . . , n
           { 0,                           o.w.
L7-6
• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x vs. P(X = x)]
• The CDFs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Plots: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x vs. FX(x)]
L7-7
• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Plot: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3 Geometric RV
• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = { (1 − p)^(k−1) p,   k = 1, 2, . . .
           { 0,                 o.w.
• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) pmf and Geometric(0.7) pmf; x vs. P(X = x)]
L7-8
• The CDFs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Plots: Geometric(0.1) CDF and Geometric(0.7) CDF; x vs. FX(x)]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k/k!) e^(−α),  k = 0, 1, . . .
           { 0,                o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x)]
L7-9
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Plots: Poisson(0.75) CDF and Poisson(3) CDF; x vs. FX(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Plot: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an earlier example.
2 Exponential RV
• Characterized by a single parameter, λ.
L7-10
• Density function:
fX(x) = { 0,            x < 0
        { λ e^(−λx),    x ≥ 0
• Distribution function:
FX(x) = { 0,              x < 0
        { 1 − e^(−λx),    x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Plots: exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x)]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• De Moivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and μ = np; then
P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp[−(k − μ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the (suitably normalized) sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Plots: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x vs. fX(x) and FX(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working exams.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
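In MATLAB, the Q-function is easily built from the base function erfc via Q(x) = (1/2) erfc(x/√2); as a sketch, for the (μ = 10, σ² = 25) example above:

Q = @(x) 0.5*erfc(x/sqrt(2));            % Q-function from base MATLAB's erfc
mu = 10; sigma = 5;                      % note: sigma, not sigma^2
P = Q((5-mu)/sigma) - Q((15-mu)/sigma)   % P(5 < X <= 15) = Q(-1) - Q(1) ≈ 0.6827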
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:⁵
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
⁵(a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Plot: Laplacian pdf with c = 2; x vs. fX(x)]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = { y^(n/2 − 1) e^(−y/(2σ²)) / (σ^n 2^(n/2) Γ(n/2)),   y ≥ 0
        { 0,                                                  y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = { (1/(2σ²)) e^(−y/(2σ²)),   y ≥ 0
        { 0,                         y < 0,
which is an exponential density.
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential. (A quick simulation check appears below.)
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
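A minimal MATLAB simulation sketch of the n = 2 case (parameter choices are mine):

N = 1e5; sigma = 1;
y = (sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2;  % sum of squares of two zero-mean Gaussians
mean(y)       % ≈ 2*sigma^2, the mean of the exponential density above
hist(y,100)   % histogram shows the exponential shape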
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Plot: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. fX(x)]
• The CDF for a central chi-square RV can be found through repeated integration by parts; for n = 2m degrees of freedom (n even),
FY(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,   y ≥ 0
        { 0,                                                     y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = { (r/σ²) e^(−r²/(2σ²)),   r ≥ 0
        { 0,                       r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Plot: Rayleigh pdf with σ² = 1; r vs. fR(r)]
• The CDF for a Rayleigh RV is
FR(r) = { 1 − e^(−r²/(2σ²)),   r ≥ 0
        { 0,                    r < 0
• The received waveform of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = { (1/(ℓ√(2πσ²))) exp[−(ln ℓ − μ)²/(2σ²)],   ℓ ≥ 0
        { 0,                                          ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P({X ≤ x} ∩ A)/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = (d/dx) FX(x|A)
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time
L8-9
2. the conditional distribution for your remaining wait time? (A sketch of the calculation follows.)
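(A sketch of the calculation, my working from the definition above with A = {X > 2}: FX(x|A) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = (e⁻² − e^(−x))/e⁻² = 1 − e^(−(x−2)) for x > 2, and 0 otherwise. In terms of the remaining wait R = X − 2, FR(r|A) = 1 − e^(−r) for r ≥ 0, identical to the original distribution: the exponential RV is memoryless.)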
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai) P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead,
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = P(A) ∫_{−∞}^{∞} fX(x|A) dx
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
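A Monte Carlo sketch in MATLAB for checking your answers: it classifies each roll by its sorted pattern of face counts (a rough check, not part of the assigned problem):
N = 1e5;                                  % number of simulated rolls
counts = zeros(1,7);                      % tallies for cases (a)-(g)
for i = 1:N
    roll = randi(6, 1, 5);                % five dice
    c = sort(hist(roll, 1:6), 'descend'); % face-count pattern, e.g. [3 2 0 0 0 0]
    if     c(1) == 1,               k = 1; % (a) no two alike
    elseif c(1) == 2 && c(2) == 1,  k = 2; % (b) one pair
    elseif c(1) == 2 && c(2) == 2,  k = 3; % (c) two pair
    elseif c(1) == 3 && c(2) == 1,  k = 4; % (d) three alike
    elseif c(1) == 3,               k = 5; % (e) full house
    elseif c(1) == 4,               k = 6; % (f) four alike
    else,                           k = 7; % (g) five alike
    end
    counts(k) = counts(k) + 1;
end
disp(counts/N)                            % compare with the footnote answers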
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
³(a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Thus n1 = n2 = ... = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, nn = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1)
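For example, a quick comparison of the exact factorial with Stirling's approximation (values shown are approximate):
n = 10;
exact = gamma(n+1)                          % = factorial(10) = 3628800
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n) % ~3.599e6, about 0.8% low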
L5a-3
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets.
Now, how many different orders are there for any set of k objects?
Let C(n, k) = the number of combinations of size k from a set of n objects.
C(n, k) = # ordered subsets / # orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72! / (3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72! / ((3!)^24 · 24!) ≈ 2.1 × 10^61 unordered groupings
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1 Never switch
⁴0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
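A Monte Carlo sketch in MATLAB for the first two strategies (the coin-flip strategy simply averages them):
N = 1e5;
car  = randi(3, 1, N);            % door hiding the car
pick = randi(3, 1, N);            % contestant's initial pick
stayWins   = mean(pick == car)    % never switch: win iff first pick is right (~1/3)
switchWins = mean(pick ~= car)    % always switch: the host removes a goat door,
                                  % so you win iff the first pick was wrong (~2/3)
% Strategy 3 (fair coin): (1/3 + 2/3)/2 = 1/2.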
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space:
Ω = the ___ of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2. Event class F: a σ-algebra on Ω
3. P(·) on F such that
P(A) =
For s.i. subexperiments:
P(A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus, the probability of k successes on n trials is
pn(k) =  (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
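A direct computation sketch in MATLAB (the complement of at most 2 bit errors):
n = 100; p = 0.01; k = 0:2;
C = arrayfun(@(j) nchoosek(n,j), k);        % binomial coefficients
Pok = sum(C .* p.^k .* (1-p).^(n-k));       % P(0, 1, or 2 bit errors)
Pfail = 1 - Pok                             % ~0.0794, matching the footnote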
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)  (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]  (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt  (14)
L5-5
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
Q(x) = 1 − Φ(x)  (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt  (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))  (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
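In MATLAB the Q-function is easily built from the standard built-in erfc; a sketch that also checks the approximation (17) and the continuity-corrected interval formula for n = 100, p = 0.5 (the test values here are illustrative):
Q = @(x) 0.5*erfc(x/sqrt(2));          % Q(x) = 1 - Phi(x), using the standard erfc
n = 100; p = 0.5; q = 1-p; k = 55;
exact  = nchoosek(n,k) * p^k * q^(n-k)                 % b(55; 100, 0.5) ~ 0.0485
approx = exp(-(k-n*p)^2/(2*n*p*q)) / sqrt(2*pi*n*p*q)  % from (17): ~0.0484
Pint = Q((45-n*p-0.5)/sqrt(n*p*q)) - Q((55-n*p+0.5)/sqrt(n*p*q))
                                       % P[45 <= Sn <= 55] ~ 0.729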
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) =
Also, P(# trials > k) =
EX
A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
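A quick check sketch, treating each transmission as a Bernoulli trial with success probability p = 0.9 (the packet getting through):
p = 0.9;
PexactlyTwo  = (1-p)^(2-1) * p   % geometric law: (0.1)(0.9) = 0.09
PmoreThanTwo = (1-p)^2           % the first two both fail: 0.01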
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth-of-a-minute intervals. For there to be an average of 30 calls/hour, we need p = 1/(2n), so the per-minute mean np = 0.5 is constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
(n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k / k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k / k!) e^(−α), k = 0, 1, ...
         = 0, o.w.
EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
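A sketch of the computation: here α = λt = (90 calls/hr)(1/6 hr) = 15, so in MATLAB:
alpha = 90 * (10/60);                       % expected calls in 10 minutes = 15
k = 20;
P = alpha^k * exp(-alpha) / gamma(k+1)      % alpha^k e^(-alpha) / k! ~ 0.0418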
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = 0, z ≤ 0
      = z²/(2b²), 0 < z ≤ b
      = [b² − (2b − z)²/2]/b², b < z < 2b
      = 1, z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous) A simulation sketch for checking FZ follows.
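A quick MATLAB sketch: simulate Z = X + Y with X and Y independent and uniform on [0, b], then plot the empirical CDF against the answer above (b = 2 here is an arbitrary choice):
b = 2; N = 1e5;
Z = b*rand(1,N) + b*rand(1,N);     % dart coordinates are uniform on [0,b]
plot(sort(Z), (1:N)/N)             % empirical CDF, no toolboxes needed
xlabel('z'); ylabel('F_Z(z)')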
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a ___ from ___ to ___.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps back to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, a].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ___ ( ___ ), denoted ___, is
• FX(x) is a prob measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., a ≤ b ⇒ FX(a) ≤ FX(b)
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous from the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from the above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
              = lim_{ε→0} P[x − ε < X ≤ x]
              = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram (hist(w, 100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The ___ ( ___ ) fX(x) of a random variable X is the derivative (which may not exist at some places) of
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then
fX(x) = g(x)/c
is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ___.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0, x < a
      = 1/(b − a), a ≤ x ≤ b
      = 0, x > b
L7-4
DEFN
A ___ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, ..., 6
         = 0, o.w.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, ...
         = 0, o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1, s ∈ A
  = 0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p, x = 1
         = 1 − p, x = 0
         = 0, x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = (n choose k) p^k (1 − p)^(n−k), k = 0, 1, ..., n
         = 0, o.w.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: pmfs of Binomial(10, 0.2) and Binomial(10, 0.5); x vs. P(X = x).]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: CDFs of Binomial(10, 0.2) and Binomial(10, 0.5); x vs. FX(x).]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: pmf of Binomial(100, 0.2); x vs. P(X = x).]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^(k−1) p, k = 1, 2, ...
         = 0, o.w.
• The pmfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: pmfs of Geometric(0.1) and Geometric(0.7); x vs. P(X = x).]
L7-8
• The cdfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: CDFs of Geometric(0.1) and Geometric(0.7); x vs. FX(x).]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k / k!) e^(−α), k = 0, 1, ...
         = 0, o.w.
• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: pmfs of Poisson(0.75) and Poisson(3); x vs. P(X = x).]
L7-9
• The cdfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: CDFs of Poisson(0.75) and Poisson(3); x vs. FX(x).]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: pmf of Poisson(20); x vs. P(X = x).]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0, x < 0
      = λe^(−λx), x ≥ 0
• Distribution function:
FX(x) = 0, x < 0
      = 1 − e^(−λx), x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x).]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k)
                                     ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for the three (µ, σ²) pairs; x vs. fX(x) and FX(x).]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:⁵
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
⁵(a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
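These follow from symmetry and P[|X − µ| > kσ] = 2Q(k); a one-line MATLAB check (a sketch):
Q = @(x) 0.5*erfc(x/sqrt(2));
tails = 2*Q(1:5)     % 0.317, 4.55e-2, 2.70e-3, 6.34e-5, 5.7e-7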
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. fX(x).]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²  (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = y^(n/2 − 1) e^(−y/(2σ²)) / (σⁿ 2^(n/2) Γ(n/2)), y ≥ 0
      = 0, y < 0,
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = (1/(2σ²)) e^(−y/(2σ²)), y ≥ 0
      = 0, y < 0,
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. fX(x).]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be
FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
      = 0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0
      = 0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r vs. fR(r).]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^(−r²/(2σ²)), r ≥ 0
      = 0, r < 0
• The (complex baseband) amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the magnitude of the waveform is a Rayleigh random variable.
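A MATLAB sketch illustrating this: the magnitude of a zero-mean complex Gaussian sample matches the Rayleigh pdf (histogram normalization is done by hand so no toolbox is needed):
sigma = 1; N = 1e5;
R = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));   % magnitude of complex Gaussian
[h, r] = hist(R, 50);
bar(r, h/(N*(r(2)-r(1)))); hold on                 % histogram scaled to a pdf
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)), 'r')  % Rayleigh pdf overlay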
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = (1/(ℓ√(2πσ²))) exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0
      = 0, ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) =
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
(A sketch of the computation follows.)
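A sketch, using FX(x) = 1 − e^(−x) and the memoryless property of the exponential:
1. FX(x | X > 2) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = (FX(x) − FX(2)) / (1 − FX(2)) = 1 − e^(−(x−2)) for x > 2 (and 0 for x ≤ 2).
2. The remaining wait R = X − 2 then has FR(r) = 1 − e^(−r), r ≥ 0, i.e., it is again exponential with λ = 1: the exponential RV is memoryless.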
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, ..., An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ... + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A) / fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
  \binom{k+n-1}{k}
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiments can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors:
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows this list):
1. Never switch
2. Always switch
3. Flip a fair coin and switch if it comes up heads
⁴ 0.0794
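Before working the probabilities analytically, it can be convincing to simulate the game. The following MATLAB sketch estimates the winning probabilities for the never-switch and always-switch strategies (the coin-flip strategy simply averages the two, giving 1/2); the variable names are mine.
N = 100000; wins_stay = 0; wins_switch = 0;
for trial = 1:N
    car  = ceil(3*rand);                        % door hiding the car
    pick = ceil(3*rand);                        % contestant's initial pick
    doors = 1:3;
    % host opens a goat door that is neither the pick nor the car
    choices = doors(doors ~= pick & doors ~= car);
    host = choices(ceil(numel(choices)*rand));
    % switching takes the one remaining unopened door
    newpick = doors(doors ~= pick & doors ~= host);
    wins_stay   = wins_stay   + (pick == car);
    wins_switch = wins_switch + (newpick == car);
end
[wins_stay wins_switch]/N                       % approaches [1/3 2/3]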
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability space for N sequential experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome A ∈ Ω is of the form (B_1, B_2, \ldots, B_N), where B_i ∈ Ω_i.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that
  P(A) = ________
For s.i. subexperiments,
  P(A) = P(B_1) P(B_2) \cdots P(B_N)
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1−p)^{n−k}.
• The number of ways that k successes can occur in n places is \binom{n}{k}.
• Thus, the probability of k successes on n trials is
  p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}    (11)
Examples:
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
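A sketch of the computation in MATLAB, using the complement of the event {0, 1, or 2 bit errors}; this reproduces the answer in the footnote:
n = 100; p = 0.01;
P_ok = 0;
for k = 0:2
    P_ok = P_ok + nchoosek(n,k) * p^k * (1-p)^(n-k);   % P(k bit errors)
end
P_fail = 1 - P_ok                                      % about 0.0794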
• When n is large, the probability in (11) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
  b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}    (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables X_i.
– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.
– Let S_n = \sum_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
  \varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right]    (13)
and distribution function
  \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt    (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
  Q(x) = 1 - \Phi(x)    (15)
       = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt    (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
  b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \varphi\!\left(\frac{k-np}{\sqrt{npq}}\right)    (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
  P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right]
                               = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success) = P(1st m−1 trials are failures ∩ mth trial is success)
Then
  p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots
Also, P(\text{number of trials} > k) = (1-p)^k.
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 calls per minute is a constant. Then the maximum possible number of calls per minute is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
  \binom{n}{k} p^k (1-p)^{n-k}
• If we let np = α be a constant, then for large n,
  \binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}
• Then
  \lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha}
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– the number of misprints on a group of pages in a book
– the number of people in a community that live to be 100 years old
– the number of wrong telephone numbers that are dialed in a day
– the number of α-particles discharged in a fixed period of time from some radioactive material
– the number of earthquakes per year
– the number of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average number of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the number of events in time (or space) t.
• Then
  P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
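A sketch of the computation: in a 10-minute interval, α = λt = (90/60)(10) = 15, so P(N = 20) = 15²⁰ e⁻¹⁵ / 20!. In MATLAB:
alpha = 90/60 * 10;                          % lambda*t = 15 expected calls
k = 20;
P = exp(-alpha) * alpha^k / factorial(k)     % about 0.042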
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
  F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - \frac{(2b-z)^2}{2}}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}
4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)
RANDOM VARIABLES (RVs)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.
EX: Example on board.
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is
  F_X(x) = P[X \le x]
• F_X(x) is a probability measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h) = F_X(b⁺).
(The value at a jump discontinuity is the value after the jump.)
Proof omitted.
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
  F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\epsilon \to 0} P[x - \epsilon < X \le x] = P[X = x]
EX: Examples on board.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, or mixed) of Z. (Ans: continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
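To check a histogram against a candidate density, it helps to normalize the histogram so that it estimates the pdf, then overlay the density you suspect. A sketch, overlaying the exponential pdf with λ = 1 (which is what the inverse-CDF transform v = −ln u produces):
u = rand(1,1000000);
v = -log(u);
[counts, centers] = hist(v, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(numel(v)*binw), 1);   % histogram scaled to a density
hold on;
x = linspace(0, max(v), 200);
plot(x, exp(-x), 'r');                     % exponential(1) pdf
hold off;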
PROBABILITY DENSITY FUNCTION
DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):
  f_X(x) = \frac{d}{dx} F_X(x)
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt
3. \int_{-\infty}^{+\infty} f_X(x)\, dx = 1
4. P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx, \quad a \le b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
  \int_{-\infty}^{+\infty} g(x)\, dx = c, \quad -\infty < c < +\infty
then f_X(x) = g(x)/c is a valid pdf.
Proof omitted.
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, a value of f_X(x) is assigned to every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
  f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is
  p_X(x) = P[X = x]
EX: Roll a fair 6-sided die; X = number on the top face.
  P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}
EX: Flip a fair coin until heads occurs; X = number of flips.
  P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
IMPORTANT RANDOM VARIABLES
Discrete RVs:
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
  X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p
So
  P(X = x) = \begin{cases} p, & x = 1 \\ 1-p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = number of successes. Then the pmf of X is given by
  P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The CDFs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
  P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
• The CDFs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average number of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the number of events in time (or space) t.
• Then
  P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}
• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
• The CDFs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
  f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}
• Distribution function:
  F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
  \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k-np)^2}{2npq}\right\}
• Let σ² = npq and µ = np; then
  P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k-\mu)^2}{2\sigma^2}\right\}
DEFN: A Gaussian random variable X has density function
  f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x-\mu)^2}{2\sigma^2}\right\}
with parameters (mean) µ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
  F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t-\mu)^2}{2\sigma^2}\right\} dt
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
  \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
  Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
  Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi, \quad x \ge 0
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
  F_X(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = 1 - Q\!\left(\frac{x-\mu}{\sigma}\right)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
  P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a-\mu}{\sigma}\right) - Q\!\left(\frac{b-\mu}{\sigma}\right)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5⁵
Examples with Gaussian Random Variables
EX: The Motivation problem.
⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
  f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]
5. Chi-square RV
• Let X be a Gaussian random variable. Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, \ldots, n, are s.i. Gaussian random variables with identical variances σ², then
  Y = \sum_{i=1}^{n} X_i^2    (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
  f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
where Γ(p) is the gamma function, defined as
  \Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\, dt, \quad p > 0
  \Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0
  \Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}
• Note that for n = 2,
  f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]
• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be
  F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = \sqrt{Y}. Then R is a Rayleigh RV with pdf
  f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is
  F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
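A quick simulation check of this fact: form the magnitude of a complex Gaussian with independent N(0, σ²) real and imaginary parts and compare the normalized histogram to the Rayleigh pdf above (a sketch with σ² = 1):
sigma = 1; N = 100000;
x = sigma*randn(1,N); y = sigma*randn(1,N);   % independent Gaussian parts
r = sqrt(x.^2 + y.^2);                        % magnitude: Rayleigh
[counts, centers] = hist(r, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(N*binw), 1);             % histogram scaled to a density
hold on;
rr = linspace(0, max(r), 200);
plot(rr, (rr/sigma^2).*exp(-rr.^2/(2*sigma^2)), 'r');   % Rayleigh pdf
hold off;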
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
  f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln\ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  f_X(x|A) = \frac{d}{dx} F_X(x|A)
• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
(A worked sketch follows.)
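A sketch of the solution for part 1, using the definition above with A = {X > 2} and F_X(x) = 1 − e^{−x} (the memoryless property of the exponential RV falls out of the computation):
  F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{F_X(x) - F_X(2)}{1 - F_X(2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2
(and 0 for x ≤ 2). For part 2, the remaining wait R = X − 2 then has F_R(r | X > 2) = 1 − e^{−r} for r ≥ 0: the same exponential distribution as the original wait time.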
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, \ldots, A_n form a partition of Ω, then
  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)
and
  f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work.
  P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
             = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)
             = \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A)
             = \frac{f_X(x|A)}{f_X(x)}\, P(A)
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
  P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)
  ⇒ P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)
  ⇒ \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \cdot P(A)
  ⇒ P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
  f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}
Examples on board.
B. Suppose a coin is tossed 3 times, but only the number of heads is noted: Ω = {0, 1, 2, 3}. Assuming equally likely outcomes,
  P(a_i) = \frac{1}{|\Omega|} = \frac{1}{4} \text{ for any outcome } a_i
Then ________
EXTENSION OF EQUALLY LIKELY OUTCOMES TO CONTINUOUS SAMPLE SPACES
• We cannot directly apply our notion of equally likely outcomes to continuous sample spaces, because any continuous sample space has an infinite number of outcomes.
• Thus, if each outcome has some positive probability of occurring, p > 0, the total probability would be ∞ · p = ∞.
• Our previous work on equally likely outcomes indicates that the probability of each outcome should be 1/|Ω| = 0, but this seems to suggest that the probability of every event is 0.
• In fact, assigning 0 probability to the outcomes does NOT necessarily imply that the probability of every event is 0.
• Let's use our previous approach of using cardinality to define the probability of an event. Consider the continuous sets that are simplest to measure cardinality on: the intervals.
DEFN: For a sample space Ω that is an interval Ω = [A, B], a probability measure P has equally likely outcomes on Ω if, for any a and b such that A ≤ a ≤ b ≤ B,
  P([a, b]) = \frac{b - a}{B - A}
• Note that P({c}) = |[c, c]|/|Ω| = 0 for any c ∈ [A, B].
• For a typical continuous sample space, the probability of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞.
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative frequency = 0.
(This does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the corollaries to the Axioms of Probability where appropriate.
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1. DEFN: The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN: A discrete set is either finite or countably infinite.
DEFN: A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX: Examples on board.
DEFN: A continuous set is not countable.
DEFN: If a and b > a are in an interval I, then any x with a ≤ x ≤ b satisfies x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX: Examples of continuous sets.
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN: Events are sets of outcomes to which we assign probability.
Examples based on previous examples of sample spaces:
(a) EX: Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX: Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
(c) EX: Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = the event that heads occurs on the first toss. Express A1 as a set of outcomes.
(d) EX: Toss a coin 3 times and note the number of heads. Let O = the event that an odd number of heads occurs. Express O as a set of outcomes.
2.
DEFN: F is the event class, a σ-algebra of subsets of Ω to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore, these things cannot be events.
DEFN: A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
DEFN: A field M forms a σ-algebra, or σ-field, if for any E_1, E_2, \ldots \in M,
  \bigcup_{i=1}^{\infty} E_i \in M    (5)
• Note that by property (c) of fields, (5) can be interchanged with
  \bigcap_{i=1}^{\infty} E_i \in M    (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class, we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3. DEFN: The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F.
II. P(Ω) = 1.
III. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B) (and, more generally, countable additivity holds for sequences of pairwise mutually exclusive events).
Corollaries
• Let A ∈ F and B ∈ F. The following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A_1, A_2, \ldots, A_n are pairwise mutually exclusive, then
  P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f)
  P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{j=1}^{n} P(A_j) - \sum_{j<k} P(A_j \cap A_k) + \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)
(Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on.) Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX 1: Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:¹
(a) All children are of the same sex.
(b) The 3 eldest are boys and the others are girls.
(c) The 2 oldest are girls.
(d) There is at least one girl.
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
  P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0
What if A and B are independent?
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX 2: The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.²
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN: Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if
  P(A \cap B) = P(A)P(B)
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probabilistic relation. What type of relation is mutually exclusive? A set relation.
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– A and B^c
– A^c and B
– A^c and B^c
¹ (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   ² (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and B^c are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
  P(A) = P[(A \cap B) \cup (A \cap B^c)] = P(A \cap B) + P(A \cap B^c) = P(A)P(B) + P(A \cap B^c)
Thus
  P(A \cap B^c) = P(A) - P(A)P(B) = P(A)[1 - P(B)] = P(A)P(B^c)
• For N events to be s.i., we require
  P(A_i \cap A_k) = P(A_i)P(A_k)
  P(A_i \cap A_k \cap A_m) = P(A_i)P(A_k)P(A_m)
  \vdots
  P(A_1 \cap A_2 \cap \cdots \cap A_N) = P(A_1)P(A_2) \cdots P(A_N)
(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN: N events A_1, A_2, \ldots, A_N ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) for all i ≠ j.
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relation.
– Statistical independence is a probabilistic relation.
• How does mutual exclusiveness relate to statistical independence?
1. m.e. ⇒ s.i.?
2. s.i. ⇒ m.e.?
3. Two events cannot be both s.i. and m.e., except in some degenerate cases.
4. None of the above.
• Perhaps surprisingly, (3) is true. Consider the following:
Conditional Probability: Consider an example.
EX: Defective computers in a lab.
A computer lab contains:
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
  Ω = {AD, AN, BD1, BD2, BN}
Let
• E_A be the event that the selected computer is from manufacturer A;
• E_B be the event that the selected computer is from manufacturer B;
• E_D be the event that the selected computer is defective.
Find:
  P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• EX: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred).
  Ω = {AD, AN, BD1, BD2, BN}
Find:
– P(E_D|E_B) = 2/3
– P(E_A|E_D) = 1/3
– P(E_B|E_D) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Figure: Venn diagram of events A and B in sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Figure: Venn diagram restricted to B, showing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
  P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space, (S, A, P(·|B)).
Check the axioms:
1.
  P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1
2. Given A ∈ A,
  P(A|B) = \frac{P(A \cap B)}{P(B)}
and P(A ∩ B) ≥ 0, P(B) > 0, so P(A|B) ≥ 0.
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
  P(A \cup C|B) = \frac{P[(A \cup C) \cap B]}{P[B]} = \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]}
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
  P(A \cup C|B) = \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]} = P(A|B) + P(C|B)
EX: Check the previous example: Ω = {AD, AN, BD1, BD2, BN}
  P(E_D|E_A) = 1/2
  P(E_D|E_B) = 2/3
  P(E_A|E_D) = 1/3
  P(E_B|E_D) = 2/3
EX: The genetics (eye color) problem:
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX: (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?²
EX: Drawing two consecutive balls without replacement.
An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
  Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let W_i = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
² (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
  P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A)
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain Rule for Expanding Intersections
Note that
  P(A|B) = \frac{P(A \cap B)}{P(B)} ⇒ P(A \cap B) = P(A|B)P(B)    (7)
and
  P(B|A) = \frac{P(A \cap B)}{P(A)} ⇒ P(A \cap B) = P(B|A)P(A)    (8)
• Equations (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
EX: Intersection of 3 events:
  P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\, P(B|C)\, P(C)
Partitions and Total Probability
DEFN: A collection of sets A_1, A_2, \ldots partitions the sample space Ω if and only if the sets are pairwise mutually exclusive and their union is Ω. {A_i} is also said to be a partition of Ω.
1. Venn diagram.
2. Expressing an arbitrary set using a partition of Ω:
  B = \bigcup_i (B \cap A_i)
Law of Total Probability: If {A_i} partitions Ω, then
  P(B) = \sum_i P(B|A_i)\, P(A_i)
Example:
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with probability P(A0) = 2/5.
• A1 is sent with probability P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules.
EX: Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize error probability.
• Let H_{ij} be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
  P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis H_{ij} when B_j is received, we can write
  P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)
• We apply the chain rule to write this as
  P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)
• The receiver can only decide H_{ij} based on B_j; it has no direct knowledge of A_k. So P(H_{ij}|A_k ∩ B_j) = P(H_{ij}|B_j) for all i, j, k.
• Furthermore, let P(H_{00}|B_0) = q_0 and P(H_{11}|B_1) = q_1.
• Then
  P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)
• The choices of q_0 and q_1 can be made independently, so P(E) is minimized by finding q_0 and q_1 that minimize
  (1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad \text{and} \quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1)
• We can further expand these using the chain rule as
  (1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad \text{and} \quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1)
• P(B_0) and P(B_1) are constants that do not depend on our decision rule, so finally we wish to find q_0 and q_1 to minimize
  (1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0)    (9)
and
  q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1)    (10)
• Both of these are linear functions of q_0 and q_1, so the minimizing values must be at the endpoints, i.e., either q_i = 0 or q_i = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q_0 = 1 if P(A_0|B_0) > P(A_1|B_0), and setting q_0 = 0 otherwise.
• Similarly, (10) is minimized by setting q_1 = 1 if P(A_1|B_1) > P(A_0|B_1), and setting q_1 = 0 otherwise.
• These rules can be summarized as follows:
Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(A_i|B_j).
• We need to express P(A_i|B_j) in terms of P(A_i) and P(B_j|A_i).
• General technique: If {A_i} is a partition of Ω and we know P(B_j|A_i) and P(A_i), then
  P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)}
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A_0) = 2/5, p_{01} = 1/8, and p_{10} = 1/6. (A MATLAB sketch of this computation follows.)
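A sketch of the computation in MATLAB; the matrix names are mine, and the channel-matrix rows are conditioned on A_i per the transition diagram above:
PA   = [2/5 3/5];                  % a priori probabilities P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];         % P(B_j | A_i), rows indexed by A_i
joint = diag(PA) * PBgA;           % P(A_i and B_j)
PB    = sum(joint, 1);             % total probability: P(B0), P(B1)
PAgB  = joint ./ ([1;1]*PB)        % Bayes' rule: columns give P(A_i | B_j)
% Column maxima give the MAP rule: decide A0 on B0 (7/9 vs 2/9) and
% A1 on B1 (10/11 vs 1/11); the resulting error probability is
% P(A1,B0) + P(A0,B1) = 1/10 + 1/20 = 0.15.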
• Often we may not know the a priori probabilities P(A_0) and P(A_1).
• In this case, we typically treat the input symbols as equally likely: P(A_0) = P(A_1) = 0.5.
• Under this assumption, P(A_0|B_0) = cP(B_0|A_0) and P(A_1|B_0) = cP(B_0|A_1) for a constant c, so the decision rule becomes:
Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins.
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
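The three parts are sequential Bayes updates; a MATLAB sketch reproducing the footnote answers 1/3, 1/5, and 1:
prior = [0.5 0.5];                        % [fair, two-headed]
pH = [0.5 1.0];                           % P(heads | coin)
post1 = prior.*pH / (prior*pH')           % after one head: [1/3 2/3]
post2 = post1.*pH / (post1*pH')           % after two heads: [1/5 4/5]
pT = 1 - pH;                              % P(tails | coin)
post3 = post2.*pT / (post2*pT')           % after a tail: [1 0]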
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]    (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)    (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (16)
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p,  m = 1, 2, …
Also, P(# trials > k) = (1 − p)^k
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
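A hedged numeric check (assuming each transmission independently succeeds with probability p = 0.9, so the number of transmissions is geometric):

p = 0.9;                        % success probability per transmission
P_exactly_2 = (1-p)^(2-1) * p   % (1-p)^{m-1} p with m = 2 -> 0.09
P_more_than_2 = (1-p)^2         % P(# trials > k) with k = 2 -> 0.01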
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability C(n, k) p^k (1 − p)^{n−k}.
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^{n−k} ≈ (α^k / k!) (1 − α/n)^{n−k}
• Then
lim_{n→∞} (α^k / k!) (1 − α/n)^{n−k} = (α^k / k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, …
           { 0,                  o.w.
EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
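A minimal MATLAB evaluation sketch: here λ = 90 calls/hr and t = 10 min = 1/6 hr, so α = λt = 15.

alpha = 90 * (10/60);                       % alpha = lambda * t = 15
k = 20;
P = exp(-alpha) * alpha^k / factorial(k)    % approximately 0.0418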
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = { 0,                       z ≤ 0
        { z²/(2b²),                0 < z ≤ b
        { [b² − (2b − z)²/2]/b²,   b < z < 2b
        { 1,                       z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
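A simulation sketch (with b = 1, my own choice) that estimates FZ empirically and compares it against the piecewise answer above:

b = 1; N = 1000000;
Z = b*rand(1,N) + b*rand(1,N);              % sum of the two dart coordinates
z = linspace(-0.5, 2.5, 500);
Femp = arrayfun(@(zz) mean(Z <= zz), z);    % empirical CDF
F = zeros(size(z));
i1 = z > 0 & z <= b;  F(i1) = z(i1).^2 / (2*b^2);
i2 = z > b & z < 2*b; F(i2) = (b^2 - (2*b - z(i2)).^2 / 2) / b^2;
F(z >= 2*b) = 1;
max(abs(Femp - F))                          % small -> the formula checks out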
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})
• FX(x) is a prob measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
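For checking your work, here is a hedged sketch that overlays the normalized histogram of v = -log(u) with the density it should match (this reveals one of the answers, so try the exercise first):

u = rand(1,1000000);
v = -log(u);
[counts, centers] = hist(v, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(1000000*binw)); hold on   % histogram scaled to a density
plot(centers, exp(-centers), 'r')               % exponential (lambda = 1) pdf
hold off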
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = { 0,          x < a
        { 1/(b − a),  a ≤ x ≤ b
        { 0,          x > b
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX Roll a fair 6-sided die. X = # on top face.
P(X = x) = { 1/6,  x = 1, 2, …, 6
           { 0,    o.w.
EX Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = { (1/2)^x,  x = 1, 2, …
           { 0,        o.w.
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = { 1,  s ∈ A
    { 0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = { p,      x = 1
           { 1 − p,  x = 0
           { 0,      x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = { C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, …, n
           { 0,                          o.w.
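A short MATLAB sketch that evaluates this pmf directly from the formula (the parameters n = 10, p = 0.2 match the first figure below):

n = 10; p = 0.2;
k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n,kk) * p^kk * (1-p)^(n-kk), k);
stem(k, pmf)
sum(pmf)     % sanity check: the pmf sums to 1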
• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, …
           { 0,                o.w.
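A quick numeric sketch relating this pmf to the tail formula from the geometric-law discussion above (p = 0.1 is my own choice; the series is truncated at 200 terms, which is more than enough here):

p = 0.1; k = 3;
m = 1:200;
pmf = (1-p).^(m-1) * p;
tail_from_pmf = sum(pmf(m > k))   % numerically matches...
(1-p)^k                            % ...the closed form P(# trials > k)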
• The pmfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
• The cdfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, …
           { 0,                  o.w.
• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
• The cdfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV — covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
fX(x) = { 0,         x < 0
        { λe^{−λx},  x ≥ 0
• Distribution function:
FX(x) = { 0,            x < 0
        { 1 − e^{−λx},  x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
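This is exactly the transformation from the MATLAB exercise above; a minimal inverse-CDF sketch (λ = 4 is my own choice):

lambda = 4;
u = rand(1,1000000);
x = -log(u)/lambda;     % inverse-CDF method: F^{-1}(u) = -log(1-u)/lambda,
                        % and u, 1-u are identically distributed on (0,1)
mean(x)                 % close to the exponential mean 1/lambda = 0.25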
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX
Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
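These tabulated values are easy to verify numerically, since P[|X − µ| > kσ] = 2Q(k); a one-line MATLAB check:

Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)    % compare with the answers above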
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|},  −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²    (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = { [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
        { 0,                                                 y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = { (1/(2σ²)) e^{−y/(2σ²)},  y ≥ 0
        { 0,                       y < 0
which is an exponential density.
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number n = 2m of degrees of freedom)
FY(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
        { 0,                                                  y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = { (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
        { 0,                     r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) = { 1 − e^{−r²/(2σ²)},  r ≥ 0
        { 0,                  r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
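An empirical sketch of that last connection (σ² = 1 and the sample size are my own choices): the magnitude of a zero-mean complex Gaussian sample, with independent real and imaginary parts of variance σ², should follow the Rayleigh CDF above.

sigma = 1; N = 1000000;
R = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);   % |complex Gaussian|
r = 1;
[mean(R <= r), 1 - exp(-r^2/(2*sigma^2))]   % empirical vs. Rayleigh CDF at r = 1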
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)},  ℓ ≥ 0
        { 0,                                       ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P[{X ≤ x} ∩ A] / P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = (d/dx) FX(x|A)
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
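A simulation sketch of part 2 (it hints at the memoryless answer, so work the problem first): condition exponential(1) samples on exceeding 2 minutes and look at the remaining wait.

x = -log(rand(1,1000000));       % exponential (lambda = 1) waiting times
rem_wait = x(x > 2) - 2;         % remaining wait, given 2 minutes already waited
mean(rem_wait)                   % close to 1, the unconditional exponential mean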
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ⋯ + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work.
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A)/fX(x)] P(A)
if fX(x|A) and fX(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
• For a typical continuous sample space, the prob of any particular outcome is zero.
Interpretation: Suppose we choose random numbers in [A, B] until we see all the outcomes in that range at least once. How many trials will it take? ∞
• Seeing a number a finite number of times in an infinite number of trials ⇒ relative freq = 0.
(Does not mean that the outcome does not ever occur, only that it occurs very infrequently.)
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11), using the Corollaries to the Axioms of Probability where appropriate.
EX
A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1.
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the random experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable.
DEFN
If a and b > a are in an interval I, then for any x with a ≤ x ≤ b, x ∈ I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
• Intervals can also be either finite, infinite, or partially infinite.
EX
Examples of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN
Events are subsets of the sample space to which we assign probability.
Examples based on previous examples of sample spaces:
(a) EX
Roll a fair 6-sided die and note the number on the top face.
Let L4 = the event that the result is less than or equal to 4.
Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd.
Let E = even outcome, O = odd outcome.
List all events.
(c) EX
Toss a coin 3 times and note the sequence of outcomes.
Let H = heads, T = tails.
Let A1 = event that heads occurs on first toss.
Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads.
Let O = odd number of heads occurs.
Express O as a set of outcomes.
2.
DEFN
F is the event class, which is a collection of subsets of Ω (events) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if:
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
DEFN
A field M forms a σ-algebra, or σ-field, if for any E1, E2, … ∈ M,
⋃_{i=1}^{∞} Ei ∈ M    (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_{i=1}^{∞} Ei ∈ M    (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3.
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, … are pairwise mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j), then P(⋃_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, …, An are pairwise mutually exclusive, then
P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f)
P(⋃_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{j<k} P(Aj ∩ Ak) + ⋯ + (−1)^{n+1} P(A1 ∩ A2 ∩ ⋯ ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, … Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B),  for P(B) > 0.
What if A and B are independent?
Motivation — continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a property of the probability measure, not of the events themselves.
What type of relation is mutual exclusiveness?
Claim: If A and B are si events, then the following pairs of events are also si:
– A and B^c
– A^c and B
– A^c and B^c
Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)
Thus
P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)
• For N events to be si, we require
P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
⋮
P(A1 ∩ A2 ∩ ⋯ ∩ AN) = P(A1)P(A2)⋯P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, …, AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j.
Examples of Applying Statistical Independence
EX
For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex
2. The 3 eldest are boys and the others are girls
3. The 2 oldest are girls
4. There is at least one girl
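A quick numeric check of these four answers (each child independently a boy or girl with probability 1/2):

p_same_sex = 2 * (1/2)^5          % all boys or all girls: 1/16
p_3boys_2girls = (1/2)^5          % one specific sequence: 1/32
p_2oldest_girls = (1/2)^2         % only the two oldest are constrained: 1/4
p_at_least_1girl = 1 - (1/2)^5    % complement of all boys: 31/32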
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a property of the events (sets) themselves.
– Statistical independence is a property of the probability measure.
• How does mutual exclusiveness relate to statistical independence?
1. me ⇒ si
2. si ⇒ me
3. two events cannot be both si and me, except in some degenerate cases
4. none of the above
• Perhaps surprisingly, 3 is true.
Consider the following:
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
Ω = {AD, AN, BD, BD, BN}
Let
• E_A be the event that the selected computer is from manufacturer A
• E_B be the event that the selected computer is from manufacturer B
• E_D be the event that the selected computer is defective
Find:
P(E_A) = 2/5,  P(E_B) = 3/5,  P(E_D) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(E_D|E_A)
(means the conditional probability of event E_D given that event E_A occurred). Here P(E_D|E_A) = 1/2.
Ω = {AD, AN, BD, BD, BN}
Find:
– P(E_D|E_B) = 2/3
– P(E_A|E_D) = 1/3
– P(E_B|E_D) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B as the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B),  for P(B) > 0.
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)
EX Check prev example: Ω = {AD, AN, BD, BD, BN}
P(E_D|E_A) = 1/2
P(E_D|E_B) = 2/3
P(E_A|E_D) = 1/3
P(E_B|E_D) = 2/3
EX The genetics (eye color) problem:
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers to the gambler problem: (a) 1/3 (b) 1/5 (c) 1
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
(When A and B are si, knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)    (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)    (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C)P(B|C)P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if the sets are pairwise mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and ⋃i Ai = Ω.
{Ai} is also said to be a partition of Ω.
1. Venn Diagram
2. Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then for any event B,
P(B) = Σi P(B|Ai)P(Ai)
Example:
EX Binary Communication System
A transmitter (Tx) sends A0, A1:
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0, B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
EX Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)  and    (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1).        (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Bj|Ai)P(Ai)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
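A minimal MATLAB sketch of this computation (the matrix layout is my own: rows index Ai, columns index Bj, and p01, p10 are read as P(B1|A0) and P(B0|A1) from the transition diagram):

PA = [2/5; 3/5];               % a priori probs of A0, A1
PBgA = [7/8 1/8; 1/6 5/6];     % P(Bj|Ai): rows A0, A1; columns B0, B1
PB = PA' * PBgA;               % total probability: P(B0), P(B1)
PAgB = (PBgA .* (PA * [1 1])) ./ ([1;1] * PB)   % Bayes: P(Ai|Bj)
% MAP rule: for each column (each Bj), decide the Ai with the larger entry.
% Here P(A0|B0) = 7/9 and P(A1|B1) = 10/11, so B0 -> A0 and B1 -> A1.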
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
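A hedged brute-force simulation sketch for checking those answers (the category order matches (a)–(g); histc and the trial count are my own choices):

N = 100000; counts = zeros(1,7);
for t = 1:N
    roll = ceil(6*rand(1,5));                 % five fair dice
    m = sort(histc(roll, 1:6), 'descend');    % multiplicities, largest first
    if m(1) == 1,                  c = 1;     % no two alike
    elseif m(1) == 2 && m(2) == 1, c = 2;     % one pair
    elseif m(1) == 2 && m(2) == 2, c = 3;     % two pair
    elseif m(1) == 3 && m(2) == 1, c = 4;     % three alike
    elseif m(1) == 3 && m(2) == 2, c = 5;     % full house
    elseif m(1) == 4,              c = 6;     % four alike
    else                           c = 7;     % five alike
    end
    counts(c) = counts(c) + 1;
end
counts/N    % compare with the answers above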
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 ⋯ nk.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Thus n1 = n2 = ⋯ = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
The result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
The result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, nn = 1
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1)⋯(2)(1)
= n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
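A one-off MATLAB check of Stirling's formula (n = 20 is an arbitrary choice):

n = 20;
exact = gamma(n+1);                            % n! via the gamma function
stirling = sqrt(2*pi) * n^(n + 0.5) * exp(-n);
[exact stirling stirling/exact]                % ratio is about 0.9958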
4. Sampling w/out Replacement & w/out Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!.
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:
C(n, k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient
n!/(k1! k2! ⋯ kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)²⁴ ≈ 1.3 × 10⁸⁵
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)²⁴ (24!)] ≈ 2.1 × 10⁶¹
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? 12!/(2!)⁶
– What is the prob of this occurring? [12!/(2!)⁶]/6¹² ≈ 0.0034
5. Sampling with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = 0                        z ≤ 0
        z²/(2b²)                 0 < z ≤ b
        [b² − (2b − z)²/2]/b²    b < z < 2b
        1                        z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
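The CDF above can be checked by simulation; a minimal MATLAB sketch (b = 1 and the test point z = 1.5 are arbitrary choices):
% Empirical vs. analytical F_Z for the dart example
b = 1; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);      % sum of the two coordinates
z = 1.5;                            % any test point in (b, 2b)
empirical = mean(Z <= z)            % fraction of darts with Z <= z
exact = 1 - (2*b - z)^2/(2*b^2)     % CDF formula above for b < z < 2b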
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB = {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is FX(x) = P[X ≤ x].
• FX(x) is a probability measure.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
               = lim_{ε→0} P[x − ε < X ≤ x]
               = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density of the new variable by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
Again compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0             x < a
        1/(b − a)     a ≤ x ≤ b
        0             x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6   x = 1, 2, . . . , 6
           0     otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x   x = 1, 2, . . .
           0         otherwise
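The flip-until-heads pmf can be verified by simulation; a MATLAB sketch (displaying the first 5 values of x is an arbitrary choice):
% Simulate flipping a fair coin until heads; compare with (1/2)^x
N = 1e5; X = zeros(1,N);
for i = 1:N
    flips = 1;
    while rand >= 0.5             % tails with probability 1/2
        flips = flips + 1;
    end
    X(i) = flips;
end
for x = 1:5
    fprintf('x=%d: simulated %.4f, exact %.4f\n', x, mean(X == x), 0.5^x);
end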
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1   s ∈ A
    0   s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p       x = 1
           1 − p   x = 0
           0       x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = (n choose k) p^k (1 − p)^{n−k}   k = 0, 1, . . . , n
           0                                otherwise
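This pmf is easy to evaluate numerically; a MATLAB sketch using only base functions (binopdf in the Statistics Toolbox computes the same thing), with n = 10 and p = 0.2 chosen to match the figures below:
% Binomial(10, 0.2) pmf from the formula above
n = 10; p = 0.2; k = 0:n;
pmf = zeros(size(k));
for j = 1:numel(k)
    pmf(j) = nchoosek(n, k(j)) * p^k(j) * (1-p)^(n-k(j));
end
disp([k' pmf'])                   % the column of pmf values sums to 1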
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x-axis x, y-axis P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x-axis x, y-axis FX(x)]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x-axis x, y-axis P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p   k = 1, 2, . . .
           0                 otherwise
• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x-axis x, y-axis P(X = x)]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x-axis x, y-axis FX(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α}   k = 0, 1, . . .
           0                 otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x-axis x, y-axis P(X = x)]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x-axis x, y-axis FX(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x-axis x, y-axis P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example above.
2. Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0           x < 0
        λe^{−λx}    x ≥ 0
• Distribution function:
FX(x) = 0              x < 0
        1 − e^{−λx}    x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
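This is also the distribution produced by the v = -log(u) transformation in the MATLAB exercise at the start of this lecture (there with λ = 1); a sketch of the same inverse-transform idea for general λ (λ = 4 is an arbitrary choice):
% Generate exponential RVs by inverting F_X(x) = 1 - exp(-lambda*x)
lambda = 4; N = 1e6;
u = rand(1,N);
x = -log(u)/lambda;               % exponential with parameter lambda
mean(x)                           % should be near 1/lambda = 0.25
mean(x <= 0.5)                    % compare with 1 - exp(-lambda*0.5)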
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4; x-axis x, y-axes fX(x) and FX(x)]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large (and p is not too close to 0 or 1), then with q = 1 − p,
(n choose k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq, μ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k}
                                     ≈ (1/√(2πσ²)) exp[−(k − μ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x-axis x, y-axes fX(x) and FX(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ,  x ≥ 0.
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − μ)/σ) − Q((b − μ)/σ)
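Base MATLAB has no Q-function (qfunc ships with the Communications Toolbox), but Q(x) = (1/2)erfc(x/√2), so interval probabilities are easy to evaluate; a sketch with μ = 5, σ = 1, a = 4, b = 6 as arbitrary numbers:
% P(a < X <= b) for a Gaussian RV via the Q-function
Q = @(x) 0.5*erfc(x/sqrt(2));     % Q(x) = 1 - Phi(x)
mu = 5; sigma = 1; a = 4; b = 6;
P = Q((a-mu)/sigma) - Q((b-mu)/sigma)   % approximately 0.6827 here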
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2; x-axis x, y-axis fX(x)]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²     (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}   y ≥ 0
        0                                                   y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = [1/(2σ²)] e^{−y/(2σ²)}   y ≥ 0
        0                        y < 0
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
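A simulation sketch of (18) (σ² = 1 and n = 2 are arbitrary choices; n = 2 should match the exponential case noted above):
% Sum of squares of n si zero-mean, unit-variance Gaussians
n = 2; N = 1e6;
Y = sum(randn(n,N).^2, 1);        % chi-square with n degrees of freedom
mean(Y)                           % near n*sigma^2 = 2
mean(Y <= 1)                      % compare with 1 - exp(-1/2), about 0.3935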
L8-6
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x-axis x, y-axis fX(x)]
• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be
FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k   y ≥ 0
        0                                                    y < 0
where m = n/2 (n even).
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^{−r²/(2σ²)}   r ≥ 0
        0                      r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; x-axis r, y-axis fR(r)]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^{−r²/(2σ²)}   r ≥ 0
        0                   r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
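A sketch of that last point (unit variance per component is an arbitrary choice): the magnitude of a complex Gaussian with independent real and imaginary parts is Rayleigh:
% |complex Gaussian| is Rayleigh
N = 1e6;
g = randn(1,N) + 1j*randn(1,N);   % independent real and imaginary parts
r = abs(g);
mean(r <= 1)                      % compare with F_R(1) = 1 - exp(-1/2), about 0.3935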
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)]   ℓ ≥ 0
        0                                         ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X : Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P[X ≤ x | A] = P[{X ≤ x} ∩ A]/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = dFX(x|A)/dx
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
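Not the worked solution, but a MATLAB simulation sketch that can be used to check your answers (the test point x = 3 is arbitrary):
% Conditional wait time for an exponential(1) RV given X > 2
N = 1e6;
X = -log(rand(1,N));              % exponential with lambda = 1
cond = X(X > 2);                  % keep only waits longer than 2 minutes
x = 3;
mean(cond <= x)                   % empirical P(X <= x | X > 2)
1 - exp(-(x-2))                   % value suggested by memorylessness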
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work.
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A)/fX(x)] P(A)
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⟹ P(A|X = x) fX(x) = fX(x|A) P(A)
⟹ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⟹ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
        = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
L2-1
EEL 5544 Lecture 2
Motivation
Read Section 1.5 in the textbook.
The Axioms of Probability and their Corollaries are important tools that simplify solving probability problems. Anyone working with probability should learn these by heart and know when to apply them.
Try to solve the following problem (a modified version of S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 11) using the Corollaries to the Axioms of Probability where appropriate.
EX: A total of 28 percent of American males smoke cigarettes, 7 percent smoke cigars, and 5 percent smoke both cigars and cigarettes.
1. What percentage of males smoke?
2. What percentage of males smoke neither cigars nor cigarettes?
3. What percentage smoke cigars but not cigarettes?
L2-2
PROBABILITY SPACES
We define a probability space as a mathematical construction containing three elements. We say that a probability space is a triple (Ω, F, P).
• The elements of a probability space for a random experiment are:
1.
DEFN
The sample space, denoted by Ω, is the set of all possible outcomes of the experiment.
The sample space is also known as the universal set or reference set.
Sample spaces come in two basic varieties: discrete or continuous.
DEFN
A discrete set is either finite or countably infinite.
DEFN
A set is countably infinite if it can be put into one-to-one correspondence with the integers.
EX
Examples
DEFN
A continuous set is not countable
DEFN
If a and b > a are in an interval I, then any x with a ≤ x ≤ b is in I.
• Intervals can be either open, closed, or half-open:
– A closed interval [a, b] contains the endpoints a and b.
– An open interval (a, b) does not contain the endpoints a and b.
– An interval can be half-open, such as (a, b], which does not contain a, or [a, b), which does not contain b.
L2-3
• Intervals can also be either finite, infinite, or partially infinite.
EX
Example of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like:
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN
Events are subsets of the sample space to which we assign probability.
Examples based on previous examples of sample spaces
(a) EX
Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX
Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = event that heads occurs on first toss. Express A1 as a set of outcomes.
(d) EX
Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes.
2.
DEFN
F is the event class, which is a collection of subsets of Ω (the events) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore, these things cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M;
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) If E ∈ M, then E^c ∈ M.
L2-5
DEFN
A field M forms a σ-algebra or σ-field if, for any E1, E2, . . . ∈ M,
∪_{i=1}^{∞} Ei ∈ M     (5)
• Note that by property (c) of fields, (5) can be interchanged with
∩_{i=1}^{∞} Ei ∈ M     (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3.
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every event A ∈ F
II. P(Ω) = 1
III. If A1, A2, . . . are pairwise mutually exclusive events, then P(∪_{i} Ai) = Σ_{i} P(Ai)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, . . . , An are pairwise mutually exclusive, then
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(∪_{k=1}^{n} Ak) = Σ_{j=1}^{n} P(Aj) − Σ_{j<k} P(Aj ∩ Ak) + · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, . . . Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B).
Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation.
Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex.
(b) The 3 eldest are boys and the others are girls.
(c) The 2 oldest are girls.
(d) There is at least one girl.
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B),  for P(B) > 0.
What if A and B are independent?
L3-2
Motivation, continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX: 2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probabilistic relationship: it depends on the measure P.
What type of relation is mutually exclusive?
Claim: If A and B are si events, then the following pairs of events are also si:
– A and B^c
– A^c and B
– A^c and B^c
Answers: 1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4
L3-3
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
     = P(A ∩ B) + P(A ∩ B^c)
     = P(A)P(B) + P(A ∩ B^c)
Thus
P(A ∩ B^c) = P(A) − P(A)P(B)
           = P(A)[1 − P(B)]
           = P(A)P(B^c)
• For N events to be si, we require
P(Ai ∩ Ak) = P(Ai)P(Ak)
P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am)
. . .
P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1, A2, . . . , AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) for all i ≠ j.
L3-4
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
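These can be checked against the footnote answers by simulation; a MATLAB sketch (coding 1 = boy, 0 = girl, each with probability 1/2, row 1 taken as the eldest):
% Simulate 5 independent births, N times
N = 1e6; kids = rand(5,N) < 0.5;
s = sum(kids,1);
mean(s == 5 | s == 0)                            % (1) near 1/16
mean(all(kids(1:3,:),1) & ~any(kids(4:5,:),1))   % (2) near 1/32
mean(~any(kids(1:2,:),1))                        % (3) near 1/4
mean(s < 5)                                      % (4) near 31/32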
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set relationship (the events share no outcomes).
– Statistical independence is a probabilistic relationship (it depends on P).
• How does mutual exclusiveness relate to statistical independence?
1. me ⟹ si
2. si ⟹ me
3. two events cannot be both si and me, except in some degenerate cases
4. none of the above
• Perhaps surprisingly, 3 is true.
Consider the following:
L3-6
Conditional Probability: Consider an example.
EX: Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer.
Ω = {AD, AN, BD, BD, BN}
Let
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) = 2/5,  P(EB) = 3/5,  P(ED) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob. that it is defective?
We denote this prob. as P(ED|EA)
(the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) = 2/3
L3-7
– P(EA|ED) = 1/3
– P(EB|ED) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: overlapping events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B as the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B),  for P(B) > 0.
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⟹ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
           = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⟹ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
           = P(A|B) + P(C|B)
EX: Check prev. example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) = 1/2
P(ED|EB) = 2/3
P(EA|ED) = 1/3
P(EB|ED) = 2/3
L3-9
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers: 2. (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
(When A and B are si, knowledge of B does not change the prob. of A, and vice versa.)
Relating Conditional and Unconditional Probs.
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⟹ P(A ∩ B) = P(A|B)P(B)     (7)
and P(B|A) = P(A ∩ B)/P(A)
⟹ P(A ∩ B) = P(B|A)P(A)     (8)
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
             = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, . . . partitions the sample space Ω if and only if the sets are pairwise disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ∪_i Ai = Ω.
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then for any event B,
P(B) = Σ_i P(B|Ai) P(Ai).
Example:
L4-5
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with prob. P(A0) = 2/5
• A1 is sent with prob. P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 5/6, P(B1|A0) = 1/6, P(B1|A1) = 7/8, P(B0|A1) = 1/8]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX: Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)  and     (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)          (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⟹ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
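A sketch of the computation this example calls for, with the transition probabilities read off the channel diagram above (Bayes plus total probability):
% A posteriori probabilities P(Ai|Bj) for the binary channel example
PA = [2/5 3/5];                   % a priori P(A0), P(A1)
PBgivenA = [5/6 1/6; 1/8 7/8];    % rows A0, A1; columns B0, B1
joint = diag(PA)*PBgivenA;        % P(Ai and Bj)
PB = sum(joint,1);                % total probability: P(B0), P(B1)
post = joint ./ [PB; PB]          % Bayes: column j gives P(A0|Bj), P(A1|Bj)
% MAP rule: for each received Bj, decide the Ai with the larger entry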
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk.
SPECIAL CASES
1. Sampling with replacement and with ordering.
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers: 3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A for all i = 1, 2, . . . , k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⟹ the number of distinct ordered k-tuples is n^k.
2. Sampling without replacement & with ordering.
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A for all i = 1, 2, . . . , k.
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⟹ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3. Permutations of n distinct objects.
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n_{n−1} = 2, n_n = 1
⟹ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's formula).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4. Sampling without replacement & without ordering.
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects:
C(n,k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient
n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur? 12!/(2!)^6
– What is the prob. of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034
5. Sampling with replacement and without ordering.
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
(k + n − 1 choose k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX: By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch.
Answer: 4. 0.0794
L5-2
2. Always switch.
3. Flip a fair coin, and switch if it comes up heads.
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments
1. The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome in Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.
2. Event class: F, a σ-algebra on Ω.
3. P(·) on F. For si subexperiments,
P(A) = P1(B1)P2(B2) · · · PN(BN).
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is (n choose k).
• Thus, the probability of k successes on n trials is
p_n(k) = (n choose k) p^k (1 − p)^{n−k}     (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
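The footnote answer (0.0794) comes from summing the complement of the bad event with the formula in (11); a MATLAB sketch:
% P(packet does not decode) = P(3 or more bit errors), n = 100, p = 0.01
n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k)*p^k*(1-p)^(n-k);
end
Pfail = 1 - Pok                   % approximately 0.0794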
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^{n−k}     (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]     (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt     (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)     (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt     (16)
• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))     (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^{m−1} p,  m = 1, 2, . . .
Also, P(# trials > k) = (1 − p)^k.
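These two formulas answer the ARQ exercise from page L5-6 (packet error probability 0.1, so success probability p = 0.9 per transmission); a one-line MATLAB check:
% ARQ example via the geometric law
p = 0.9;
PexactlyTwo = (1-p)^(2-1)*p       % 0.09
PmoreThanTwo = (1-p)^2            % 0.01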
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

  f_X(x) = { 0,           x < a
           { 1/(b − a),   a ≤ x ≤ b
           { 0,           x > b
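Since rand draws uniform samples on (0, 1), a uniform RV on an arbitrary interval [a, b] can be generated by scaling and shifting; a small sketch (the interval here is an arbitrary choice):

    % Uniform samples on [a, b] from rand.
    a = 2; b = 5;                     % example interval
    x = a + (b - a)*rand(1,100000);   % uniform on [a, b]
    hist(x, 100)                      % histogram should be flat between a and b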
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = number on the top face.

  P(X = x) = { 1/6,  x = 1, 2, ..., 6
             { 0,    otherwise
EX: Flip a fair coin until heads occurs. X = number of flips.

  P(X = x) = { (1/2)^x,  x = 1, 2, ...
             { 0,        otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success".

• A Bernoulli RV X is defined by

  X = { 1,  s ∈ A
      { 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

  P(X = 1) = P({s : X(s) = 1}) = P({s : s ∈ A}) = P(A) ≜ p

  So

  P(X = x) = { p,      x = 1
             { 1 − p,  x = 0
             { 0,      x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = number of successes. Then the pmf of X is given by
  P[X = k] = { (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
             { 0,                               otherwise
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x = 0, 1, ..., 10 on the horizontal axis, P(X = x) on the vertical axis.]
• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x on the horizontal axis, F_X(x) on the vertical axis.]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x on the horizontal axis, P(X = x) on the vertical axis.]
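As a sketch, the Binomial pmf can be evaluated and plotted in MATLAB using the built-in nchoosek; changing n and p reproduces the shapes in the figures above, including the bell shape for large n.

    % Evaluate and plot the Binomial(n, p) pmf.
    n = 10; p = 0.2;
    k = 0:n;
    pmf = zeros(size(k));
    for i = 1:length(k)
        pmf(i) = nchoosek(n, k(i)) * p^k(i) * (1-p)^(n-k(i));
    end
    stem(k, pmf); xlabel('x'); ylabel('P(X = x)');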
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

  P[X = k] = { (1 − p)^(k−1) p,  k = 1, 2, ...
             { 0,               otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x on the horizontal axis, P(X = x) on the vertical axis.]
L7-8
• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x on the horizontal axis, F_X(x) on the vertical axis.]
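The Geometric pmf can also be checked by simulating its definition directly, counting Bernoulli trials until the first success (a rough sketch with an arbitrarily chosen p):

    % Empirical pmf of a Geometric(p) RV.
    p = 0.3; N = 100000;
    X = zeros(1,N);
    for i = 1:N
        k = 1;
        while rand() > p          % failure with probability 1 - p
            k = k + 1;
        end
        X(i) = k;                 % number of trials to first success
    end
    k = 1:15;
    empirical = hist(X(X <= 15), k)/N;   % relative frequencies for k = 1..15
    stem(k, empirical); hold on;
    plot(k, (1-p).^(k-1)*p, 'r.');       % theoretical pmf
    hold off; xlabel('k'); ylabel('P(X = k)');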
4 Poisson RV
• Models events that occur randomly in space or time.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

  P(N = k) = { (α^k / k!) e^(−α),  k = 0, 1, ...
             { 0,                  otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x on the horizontal axis, P(N = x) on the vertical axis.]
L7-9
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x on the horizontal axis, F_N(x) on the vertical axis.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x on the horizontal axis, P(X = x) on the vertical axis.]
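As a quick numerical illustration of the α = λt recipe (the numbers are an arbitrary example in the spirit of the call-arrival discussion in Lecture 5):

    % Poisson pmf for events at rate lambda over an interval of length t.
    lambda = 0.5;                % e.g., 30 calls/hour = 0.5 calls/minute
    t = 10;                      % observe for 10 minutes
    alpha = lambda*t;            % alpha = lambda*t = 5
    k = 0:20;
    pmf = alpha.^k ./ factorial(k) * exp(-alpha);
    stem(k, pmf); xlabel('k'); ylabel('P(N = k)');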
IMPORTANT CONTINUOUS RVS
1 Uniform RV (covered in an earlier example)
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:

  f_X(x) = { 0,           x < 0
           { λ e^(−λx),   x ≥ 0

• Distribution function:

  F_X(x) = { 0,             x < 0
           { 1 − e^(−λx),   x ≥ 0
• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, and 4; x on the horizontal axis, f_X(x) and F_X(x) on the vertical axes.]
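This distribution function also explains the v = -log(u) trick in the exercise at the start of this lecture: inverting F_X turns uniform samples into exponential ones. A sketch of this inverse-CDF idea:

    % Inverse-CDF sampling: if U is uniform on (0,1), then X = -log(1-U)/lambda
    % satisfies P(X <= x) = 1 - exp(-lambda*x), i.e., X is exponential(lambda).
    lambda = 4;
    u = rand(1,100000);
    x = -log(1-u)/lambda;    % since 1-U is also uniform, -log(u)/lambda works too
    hist(x, 100)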
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large and npq ≫ 1, then with q = 1 − p,

  (n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp(−(k − np)²/(2npq))

• Let σ² = npq and μ = np. Then

  P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp(−(k − μ)²/(2σ²))
L7-11
DEFN
A Gaussian random variable X has density function

  f_X(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞,

with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by

  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp(−(t − μ)²/(2σ²)) dt,

  which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 15, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs above; x on the horizontal axis, f_X(x) and F_X(x) on the vertical axes.]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

  – The CDF for the normalized Gaussian RV is defined as

    Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt
  – Engineers more commonly use the complementary distribution function, or Q-function, defined by

    Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt
L7-12
  – Note that Q(x) = 1 − Φ(x).

  – I will be supplying you with a Q-function table and a list of approximations to Q(x).

  – The Q-function can also be defined as

    Q(x) = (1/π) ∫_{0}^{π/2} exp(−x²/(2 sin²φ)) dφ

    (This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is

  F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)

  Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

  P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ)
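Base MATLAB has no Q-function by name, but Q can be written in terms of the built-in complementary error function as Q(x) = (1/2) erfc(x/√2). A sketch using this identity for the interval probability above (the numbers are arbitrary):

    % Q-function via MATLAB's complementary error function.
    Q = @(x) 0.5*erfc(x/sqrt(2));
    % Example: P(a < X <= b) for a Gaussian with mean mu and std. dev. sigma.
    mu = 5; sigma = 2; a = 4; b = 9;
    p = Q((a - mu)/sigma) - Q((b - mu)/sigma)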
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.
Try to answer the following question; answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
[Pages L8-2 through L8-4 are reserved for examples worked in class.]
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

  f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,

  where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x from −3 to 3 on the horizontal axis, f_X(x) on the vertical axis.]
5 Chi-square RV
• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are si Gaussian random variables with identical variances σ², then

  Y = Σ_{i=1}^{n} X_i²    (18)

  is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by

  f_Y(y) = { y^(n/2 − 1) e^(−y/(2σ²)) / (σ^n 2^(n/2) Γ(n/2)),  y ≥ 0
           { 0,                                                y < 0

  where Γ(p) is the gamma function, defined as

  Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0
  Γ(p) = (p − 1)! for p a positive integer
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π
• Note that for n = 2,

  f_Y(y) = { (1/(2σ²)) e^(−y/(2σ²)),  y ≥ 0
           { 0,                       y < 0

  which is an exponential density.

• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square pdfs with n = 1, 2, 4, and 8 degrees of freedom and σ² = 1; x on the horizontal axis, f_Y(x) on the vertical axis.]
• For an even number of degrees of freedom n = 2m, the CDF for a central chi-square RV can be found through repeated integration by parts to be

  F_Y(y) = { 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
           { 0,                                                 y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
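A simulation sketch of (18), built from unit-variance Gaussians via randn; for n = 2 the scaled histogram should match the exponential density noted above:

    % Central chi-square samples with n degrees of freedom (sigma^2 = 1).
    n = 2; N = 100000;
    Y = sum(randn(n, N).^2, 1);          % each column: sum of n squared Gaussians
    [counts, centers] = hist(Y, 100);
    binw = centers(2) - centers(1);
    plot(centers, counts/(sum(counts)*binw), '.'); hold on;
    y = 0:0.05:max(Y);
    plot(y, 0.5*exp(-y/2), 'r-');        % exponential density for n = 2, sigma^2 = 1
    hold off; xlabel('y'); ylabel('f_Y(y)');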
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two si zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

  f_R(r) = { (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
           { 0,                     r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r from 0 to 5 on the horizontal axis, f_R(r) on the vertical axis.]
• The CDF for a Rayleigh RV is

  F_R(r) = { 1 − e^(−r²/(2σ²)),  r ≥ 0
           { 0,                  r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts; the magnitude of the waveform is then a Rayleigh random variable.
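This chain (two squared Gaussians, then a square root) is exactly the w = sqrt(v) transformation from the Lecture 7 MATLAB exercise. A sketch drawing Rayleigh samples directly from the complex Gaussian model (σ² = 1 per component):

    % Rayleigh samples as the magnitude of a complex Gaussian.
    N = 100000;
    R = abs(randn(1,N) + 1j*randn(1,N));   % sqrt(X^2 + Y^2)
    [counts, centers] = hist(R, 100);
    binw = centers(2) - centers(1);
    plot(centers, counts/(sum(counts)*binw), '.'); hold on;
    r = 0:0.01:5;
    plot(r, r.*exp(-r.^2/2), 'r-');        % Rayleigh pdf with sigma^2 = 1
    hold off; xlabel('r'); ylabel('f_R(r)');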
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = { (1/(ℓ√(2πσ²))) exp(−(ln ℓ − μ)²/(2σ²)),  ℓ ≥ 0
           { 0,                                        ℓ < 0
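Generating lognormal samples follows immediately from the definition X = ln L (a sketch; μ and σ are arbitrary choices):

    % Lognormal samples: exponentiate a Gaussian.
    mu = 0; sigma = 1;
    L = exp(mu + sigma*randn(1,100000));
    hist(L, 200)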
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

  F_X(x|A) = P({X ≤ x} ∩ A) / P(A)

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

  f_X(x|A) = (d/dx) F_X(x|A)
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
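Part 2 leads to the memoryless property of the exponential RV: given that you have already waited two minutes, the remaining wait has the same exponential distribution as the original wait. A simulation sketch of that claim (λ = 1):

    % Distribution of the remaining wait time, given the wait exceeds 2 minutes.
    lambda = 1; N = 1000000;
    W = -log(rand(1,N))/lambda;        % exponential(lambda) wait times
    Wr = W(W > 2) - 2;                 % remaining wait, conditioned on W > 2
    t = 0:0.1:6;
    emp = zeros(size(t));
    for k = 1:length(t)
        emp(k) = mean(Wr > t(k));      % estimate of P(remaining > t | W > 2)
    end
    plot(t, emp, '.', t, exp(-lambda*t), '-');   % compare with P(W > t)
    xlabel('t'); legend('conditional', 'exp(-t)');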
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, ..., A_n form a partition of Ω, then

  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ··· + F_X(x|A_n)P(A_n)

  and

  f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

             = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

             = lim_{Δx→0} { [F_X(x + Δx|A) − F_X(x|A)]/Δx } / { [F_X(x + Δx) − F_X(x)]/Δx } · P(A)

             = [f_X(x|A)/f_X(x)] P(A)

  if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that

  P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)

  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)

  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

  (Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' Theorem:

  f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)

           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
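As one numerical illustration of these continuous versions (a hypothetical setup: the bias X of a coin is uniform on [0, 1], and A is the event that a single flip shows heads, so P(A|X = x) = x):

    % Continuous law of total probability: P(A) = integral P(A|X=x) f_X(x) dx.
    x = 0:0.001:1;
    fX = ones(size(x));        % uniform pdf on [0, 1]
    PAgx = x;                  % P(A | X = x) = x
    PA = trapz(x, PAgx.*fX)    % numerical integral; should be about 1/2
    % Continuous Bayes: the density of the bias given that heads occurred.
    fXgA = PAgx.*fX / PA;      % f_X(x|A) = 2x on [0, 1]
    plot(x, fXgA); xlabel('x'); ylabel('f_X(x|A)');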
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L2-3
• Intervals can also be either finite, infinite, or partially infinite.
EX Examples of continuous sets
• We wish to ask questions about the probabilities of not only the outcomes, but also combinations of the outcomes. For instance, if we roll a six-sided die and record the number on the top face, we may still ask questions like
– What is the probability that the result is even?
– What is the probability that the result is ≤ 2?
DEFN Events are sets of outcomes (subsets of Ω) to which we assign probability.
Examples based on previous examples of sample spaces:
(a) EX Roll a fair 6-sided die and note the number on the top face. Let L4 = the event that the result is less than or equal to 4. Express L4 as a set of outcomes.
(b) EX Roll a fair 6-sided die and determine whether the number on the top face is even or odd. Let E = even outcome, O = odd outcome. List all events.
L2-4
(c) EX Toss a coin 3 times and note the sequence of outcomes. Let H = heads, T = tails. Let A1 = event that heads occurs on first toss. Express A1 as a set of outcomes.
(d) EX Toss a coin 3 times and note the number of heads. Let O = odd number of heads occurs. Express O as a set of outcomes.
2 DEFN F is the event class, which is a collection of subsets of Ω to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′ that consists only of subsets of Ω that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω, and therefore these things cannot be events.
DEFN A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M
(b) If E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M
(c) If E ∈ M, then E^c ∈ M
L2-5
DEFN A field M forms a σ-algebra or σ-field if for any E1, E2, … ∈ M,
⋃_{i=1}^∞ E_i ∈ M   (5)
• Note that by property (c) of fields, (5) can be interchanged with
⋂_{i=1}^∞ E_i ∈ M   (6)
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field. On the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3 DEFN The probability measure, denoted by P, is a set function that maps all members of F onto the interval [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for every A ∈ F
II. P(Ω) = 1
III. If A1, A2, … ∈ F are pairwise mutually exclusive, then P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) P(A^c) = 1 − P(A)
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If A1, A2, …, An are pairwise mutually exclusive, then
P(⋃_{k=1}^n A_k) = Σ_{k=1}^n P(A_k)
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f) P(⋃_{k=1}^n A_k) = Σ_{k=1}^n P(A_k) − Σ_{j<k} P(A_j ∩ A_k) + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX 1 Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
What if A and B are independent?
L3-2
Motivation (continued)
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX 2 The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN Two events A ∈ A and B ∈ A are statistically independent (abbreviated si) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a probabilistic relation: it depends on the probability measure P, not just on the sets themselves.
What type of relation is mutual exclusiveness? (A set-theoretic relation.)
Claim: If A and B are si events, then the following pairs of events are also si:
– A and B^c
– A^c and B
– A^c and B^c
1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32   2. (a) 2/3 (b) 1/3 (c) 3/4
L3-3
Proof: It is sufficient to consider the first case, i.e., show that if A and B are si events, then A and B^c are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ B^c) and (A ∩ B) ∩ (A ∩ B^c) = ∅. Thus
P(A) = P[(A ∩ B) ∪ (A ∩ B^c)]
= P(A ∩ B) + P(A ∩ B^c)
= P(A)P(B) + P(A ∩ B^c)
Thus
P(A ∩ B^c) = P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B^c)
• For N events to be si, require
P(A_i ∩ A_k) = P(A_i)P(A_k) for all i ≠ k
P(A_i ∩ A_k ∩ A_m) = P(A_i)P(A_k)P(A_m) for all distinct i, k, m
⋮
P(A_1 ∩ A_2 ∩ ··· ∩ A_N) = P(A_1)P(A_2) ··· P(A_N)
(It is not sufficient to have P(A_i ∩ A_k) = P(A_i)P(A_k) ∀ i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN N events A1, A2, …, AN ∈ A are pairwise statistically independent if and only if P(A_i ∩ A_j) = P(A_i)P(A_j) ∀ i ≠ j.
L3-4
Examples of Applying Statistical Independence
EX For a couple having 5 children, compute the probabilities of the following events (a MATLAB check appears after this list):
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
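Since the sample space here is just the 32 equally likely boy/girl sequences, the footnoted answers can be checked by brute-force enumeration in MATLAB. A minimal sketch (the variable names are mine):
s = dec2bin(0:31) - '0';                % 32-by-5 matrix of outcomes; 1 = boy, 0 = girl
pSameSex   = mean(all(s==1,2) | all(s==0,2))               % 1/16
p3EldestB  = mean(all(s(:,1:3)==1,2) & all(s(:,4:5)==0,2)) % 1/32
p2OldestG  = mean(all(s(:,1:2)==0,2))                      % 1/4
pAtLeast1G = mean(any(s==0,2))                             % 31/32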
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set-theoretic relation (A ∩ B = ∅).
– Statistical independence is a probabilistic relation (P(A ∩ B) = P(A)P(B)).
• How does mutual exclusiveness relate to statistical independence?
1 me ⇒ si
2 si ⇒ me
3 two events cannot be both si and me, except in some degenerate cases
4 none of the above
• Perhaps surprisingly, 3 is true.
Consider the following:
L3-6
Conditional Probability: Consider an example.
EX Defective computers in a lab
A computer lab contains
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
Ω = {AD, AN, BD, BD, BN}
Let
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) = 2/5,  P(EB) = 3/5,  P(ED) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA) (the conditional probability of event ED given that event EA occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(ED|EB) = 2/3
L3-7
– P(EA|ED) = 1/3
– P(EB|ED) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: overlapping events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B becomes the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space (S, A, P(·|B)) on the original sample space.
Check the axioms:
1 P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2 Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3 If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX Check prev example, Ω = {AD, AN, BD, BD, BN} (a simulation check appears below):
P(ED|EA) = 1/2
P(ED|EB) = 2/3
P(EA|ED) = 1/3
P(EB|ED) = 2/3
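These values are easy to confirm with a quick Monte Carlo run; a sketch (the encoding of the five computers is my own):
isA = [1 1 0 0 0];  isD = [1 0 1 1 0];   % computers: AD, AN, BD, BD, BN
pick = randi(5, 1, 1e6);                 % user picks a computer at random
A = isA(pick)==1;  D = isD(pick)==1;
mean(D & A)/mean(A)                      % P(ED|EA) ≈ 1/2
mean(A & D)/mean(D)                      % P(EA|ED) ≈ 1/3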
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
2. (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are si, knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN A collection of sets A1, A2, … partitions the sample space Ω if and only if the sets are pairwise disjoint (A_i ∩ A_j = ∅ for i ≠ j) and ⋃_i A_i = Ω. {Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then for any event B,
P(B) = Σ_i P(B|A_i) P(A_i)
Example:
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1 Always decide A1
2 When receive B0, decide A0; when receive B1, decide A1
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak)
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule. (A MATLAB sketch of the MAP and ML rules for this example follows.)
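As a check on the EX above, the a posteriori probabilities and both decision rules can be computed directly. A sketch assuming the transition values from the diagram (P(B1|A0) = 1/8, P(B0|A1) = 1/6; variable names are mine):
pA   = [2/5 3/5];                  % a priori probs of A0, A1
pBgA = [7/8 1/8; 1/6 5/6];         % P(Bj|Ai); rows A0, A1; cols B0, B1
pAB  = pA' .* pBgA;                % joint probs P(Ai, Bj)
pB   = sum(pAB, 1);                % P(Bj) by total probability
pAgB = pAB ./ pB                   % a posteriori probs P(Ai|Bj)
[~, mapRule] = max(pAgB, [], 1)    % MAP: pick the Ai maximizing P(Ai|Bj)
[~, mlRule]  = max(pBgA, [], 1)    % ML: same, but ignoring the a priori probs
For these numbers, both rules decide A0 on B0 and A1 on B1.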
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ··· nk.
SPECIAL CASES
1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Thus n1 = n2 = ··· = nk = |A| ≡ n
⇒ number of distinct ordered k-tuples is n^k.
2 Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ number of distinct ordered k-tuples is n(n − 1) ··· (n − k + 1) = n!/(n − k)!.
3 Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) ··· (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1) (or factorial(n)).
L5a-3
4 Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently, "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects:
C(n,k) = (# ordered subsets)/(# orderings) = n!/[k!(n − k)!]
DEFN The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient n!/(k1! k2! ··· kJ!).
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (A MATLAB check appears after this section.)
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
5 Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is C(k + n − 1, k).
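Both the footnoted poker-dice answer (a) and the twice-each example can be checked numerically; a minimal sketch:
pTwiceEach  = factorial(12)/(factorial(2)^6 * 6^12)   % exact multinomial count / 6^12, ≈ 0.0034
s = sort(randi(6, 1e6, 5), 2);                        % a million simulated poker-dice rolls
pNoTwoAlike = mean(all(diff(s, 1, 2) ~= 0, 2))        % ≈ 0.0926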
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine:
• Suppose you're on a game show and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch
4. 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1 The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any (product-form) event A is of the form (B1, B2, …, BN), where Bi is an event in Ωi.
2 Event class F: a σ-algebra on Ω
3 P(·) on F such that
P(A) = P(B1, B2, …, BN)
For si subexperiments,
P(A) = P1(B1) P2(B2) ··· PN(BN)
Bernoulli Trials
DEFN A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n,k).
• Thus, the probability of k successes on n trials is
pn(k) = C(n,k) p^k (1 − p)^{n−k}   (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (See the MATLAB check below.)
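A numerical check of the footnoted answer under this binomial model (sketch only):
n = 100; p = 0.01; k = 0:2;
C = arrayfun(@(j) nchoosek(n, j), k);        % binomial coefficients
pFail = 1 - sum(C .* p.^k .* (1-p).^(n-k))   % P[3 or more errors] ≈ 0.0794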
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if a success occurs and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^n Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^x exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
= (1/√(2π)) ∫_x^∞ exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, …
Also, P(# trials > k) = (1 − p)^k.
EX A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (See the sketch below.)
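Here a "success" is a correctly received packet, so p = 0.9, and the formulas above give:
p = 0.9;                        % per-transmission success probability
pExactly2  = (1-p)^(2-1) * p    % P[exactly 2 transmissions] = 0.09
pMoreThan2 = (1-p)^2            % P[more than 2 transmissions] = 0.01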
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so the expected number of calls per minute, np = 0.5, is a constant. The maximum possible number of calls/hour grows with n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability C(n,k) p^k (1 − p)^{n−k}.
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α}
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adopted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, …; 0 ow
EX Phone calls arriving at a switching office: If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval? (See the sketch below.)
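Here 90 calls/hr means λ = 1.5 calls/min, so α = λt = 15 for a 10-minute window; a one-line check:
alpha = 15; k = 20;
P20 = alpha^k * exp(-alpha) / factorial(k)   % P(N = 20) ≈ 0.042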
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
Ans:
FZ(z) = 0, z ≤ 0
      = z²/(2b²), 0 < z ≤ b
      = [b² − (2b − z)²/2]/b², b < z < 2b
      = 1, z ≥ 2b
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
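The answer to part 3 is easy to sanity-check by simulation; a sketch with b = 1 (my choice):
b = 1; N = 1e6;
Z = b*rand(N,1) + b*rand(N,1);          % sum of the two dart coordinates
z = linspace(0, 2*b, 201);
Femp = arrayfun(@(t) mean(Z <= t), z);  % empirical CDF
Fthy = (z.^2/(2*b^2)).*(z <= b) + (1 - (2*b-z).^2/(2*b^2)).*(z > b);
max(abs(Femp - Fthy))                   % small; consistent with the answer above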
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of intervals of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P(X ≤ x)
• FX(x) is a prob measure.
• Properties of FX:
1 0 ≤ FX(x) ≤ 1
L6-4
2 FX(−∞) = 0 and FX(∞) = 1
3 FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b
4 P(a < X ≤ b) = FX(b) − FX(a)
5 FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0+} FX(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf omitted.
• If FX(x) is a continuous function of x, then FX(x) = FX(x−).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x−) = P[x− < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
L6-5
EX Examples
L6-6
EX A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z.
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB:
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION
DEFN The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1 fX(x) ≥ 0, −∞ < x < ∞
2 FX(x) = ∫_{−∞}^x fX(t) dt
3 ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4 P(a < X ≤ b) = ∫_a^b fX(x) dx, a ≤ b
5 If g(x) is a nonnegative, piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞
then fX(x) = g(x)/c is a valid pdf.
Pf omitted.
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x−) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); thus fX(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0, x < a
      = 1/(b − a), a ≤ x ≤ b
      = 0, x > b
L7-4
DEFN A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX Roll a fair 6-sided die; X = # on top face.
P(X = x) = 1/6, x = 1, 2, …, 6; 0 ow
EX Flip a fair coin until heads occurs; X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, …; 0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1, s ∈ A
  = 0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p, x = 1
         = 1 − p, x = 0
         = 0, x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n,k) p^k (1 − p)^{n−k}, k = 0, 1, …, n; 0 ow
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p, k = 1, 2, …; 0 ow
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, …; 0 ow
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in an example above
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) = 0, x < 0
      = λe^{−λx}, x ≥ 0
• Distribution function:
FX(x) = 0, x < 0
      = 1 − e^{−λx}, x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
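This is exactly the transformation from the MATLAB exercise at the start of this lecture: if U is uniform on (0, 1), then −log(U)/λ has the CDF above (the inverse-CDF method). A quick sketch with an arbitrary λ:
lambda = 4; u = rand(1, 1e6);
x = -log(u)/lambda;             % exponential(lambda) samples
mean(x)                         % ≈ 1/lambda = 0.25
mean(x <= 0.5)                  % ≈ 1 - exp(-lambda*0.5) ≈ 0.865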
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
C(n,k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp[−(k − np)²/(2npq)]
• Let σ² = npq, µ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ [1/√(2πσ²)] exp[−(k − µ)²/(2σ²)]
L7-11
DEFN A Gaussian random variable X has density function
fX(x) = [1/√(2πσ²)] exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^x [1/√(2πσ²)] exp[−(t − µ)²/(2σ²)] dt
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^x [1/√(2π)] exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_x^∞ [1/√(2π)] exp[−t²/2] dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_0^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
5. (a) 1/2 (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively
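By symmetry, P[|X − µ| > kσ] = 2Q(k), so the footnoted answers follow in one line (Q written via erfc as before):
Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)        % ≈ 0.317, 4.55e-2, 2.70e-3, 6.3e-5, 5.7e-7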
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^n X_i²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
      = 0, y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
      = 0, y < 0
which is an exponential density.
• Thus, the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be
FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k, y ≥ 0
      = 0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of 2 independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
      = 0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^{−r²/(2σ²)}, r ≥ 0
      = 0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable. (A simulation sketch follows.)
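A minimal simulation of that construction (magnitude of two independent zero-mean Gaussians; σ = 1 is my choice):
sigma = 1; N = 1e6;
R = sqrt((sigma*randn(N,1)).^2 + (sigma*randn(N,1)).^2);
mean(R <= 1)     % ≈ F_R(1) = 1 - exp(-1/(2*sigma^2)) ≈ 0.3935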
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)], ℓ ≥ 0
      = 0, ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P(X ≤ x|A) = P({X ≤ x} ∩ A)/P(A)
DEFN For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = d FX(x|A)/dx
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1 the conditional distribution for your total wait time?
L8-9
2 the conditional distribution for your remaining wait time?
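Part 2 is the memoryless property of the exponential; a quick simulation check with λ = 1:
W = -log(rand(1e6, 1));          % exponential(1) waiting times
mean(W > 3)/mean(W > 2)          % P(remaining wait > 1 | waited 2 minutes) ...
exp(-1)                          % ... should match P(W > 1) = e^(-1)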
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ··· + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^n fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:
P(A|X = x) = lim_{∆x→0} P(A|x < X ≤ x + ∆x)
= lim_{∆x→0} [FX(x + ∆x|A) − FX(x|A)] / [FX(x + ∆x) − FX(x)] · P(A)
= lim_{∆x→0} {[FX(x + ∆x|A) − FX(x|A)]/∆x} / {[FX(x + ∆x) − FX(x)]/∆x} · P(A)
= [fX(x|A)/fX(x)] P(A)
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^∞ P(A|X = x) fX(x) dx = ∫_{−∞}^∞ fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^∞ P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^∞ P(A|X = t) fX(t) dt
Examples on board
L2-4
(c) EX
Toss a coin 3 times and note the sequence of outcomesLet H=heads T=tailsLet A1= event that heads occurs on first tossExpress A1 as a set of outcomes
(d) EX
Toss a coin 3 times and note the number of headsLet O = odd number of heads occursExpress O as a set of outcomes
2
DEFN
F is the event class: a collection of subsets of Ω (the events) to which we assign probability.
• If Ω is discrete, then F can be taken to be the power set of Ω (the set of every subset of Ω).
• If Ω is continuous, then weird things can happen if we consider every subset of Ω. For instance, if Ω itself has measure (probability) 1, we can construct a new set Ω′, consisting only of subsets of Ω, that has measure (probability) 2.
• The solution is to not assign a measure (i.e., probability) to some subsets of Ω; therefore those subsets cannot be events.
DEFN
A collection of subsets of Ω forms a field M under the binary operations ∪ and ∩ if
(a) ∅ ∈ M and Ω ∈ M;
(b) if E ∈ M and F ∈ M, then E ∪ F ∈ M and E ∩ F ∈ M;
(c) if E ∈ M, then $E^c$ ∈ M.
L2-5
DEFN
A field M forms a σ-algebra, or σ-field, if for any $E_1, E_2, \ldots \in M$,
$$\bigcup_{i=1}^{\infty} E_i \in M \qquad (5)$$
• Note that by property (c) of fields, the union in (5) can be interchanged with an intersection:
$$\bigcap_{i=1}^{\infty} E_i \in M \qquad (6)$$
• We require that the event class F be a σ-algebra.
• For discrete sets, the set of all subsets of Ω is a σ-algebra.
• For continuous sets, there may be more than one possible σ-algebra.
• In this class we only concern ourselves with the Borel field: on the real line (R), the Borel field consists of all unions and intersections of intervals of R.
3
DEFN
The probability measure, denoted by P, is a set function that maps all members of F onto the interval [0, 1].
Axioms of Probability
• We specify a minimal set of rules that P must obey:
I. P(A) ≥ 0 for all A ∈ F.
II. P(Ω) = 1.
III. If $A_1, A_2, \ldots$ are mutually exclusive events, then $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
Corollaries
• Let A ∈ F and B ∈ F. Then the following properties of P can be derived from the axioms and the mathematical structure of F:
(a) $P(A^c) = 1 - P(A)$
L2-6
(b) P(A) ≤ 1
(c) P(∅) = 0
(d) If $A_1, A_2, \ldots, A_n$ are pairwise mutually exclusive, then
$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k)$$
Proof is by induction.
(e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof and example on board.
(f)
$$P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{j=1}^{n} P(A_j) - \sum_{j<k} P(A_j \cap A_k) + \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$
Add all single events, subtract off all intersections of pairs of events, add in all intersections of 3 events, and so on. Proof is by induction.
(g) If A ⊂ B, then P(A) ≤ P(B). Proof on board.
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence. What do you think of when you hear the word "independence"?
In probability, events or experiments can be independent. Consider two events A and B. Then one implication of A and B being independent is that knowledge of whether A occurs does not influence the probability that B occurs, and vice versa.
However, the definition of independent events is not explicitly based on this interpretation. Read Section 1.6 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 73). The answers are shown at the bottom of the next page so that you can check your work.
EX
1. Suppose that each child born to a couple is equally likely to be either a boy or a girl, independent of the sex distribution of the other children in the family. For a couple having 5 children, compute the probabilities of the following events:
(a) All children are of the same sex.
(b) The 3 eldest are boys and the others are girls.
(c) The 2 oldest are girls.
(d) There is at least one girl.
If events are not independent, then knowledge that one event occurred can influence the probability that another event occurred. In fact, this is the key to many communication and signal processing problems: we wish to detect or estimate a signal based on the observation of a noisy version of the signal. The signal to be estimated can often be treated as random itself, and the observation is another random quantity.
We define the conditional probability of an event A given that event B occurred as
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0$$
What if A and B are independent?
L3-2
Motivation, continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if
$$P(A \cap B) = P(A)P(B)$$
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a property of the probability measure P, not of the events themselves.
What type of relation is mutually exclusive? (A set-theoretic one: A ∩ B = ∅.)
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
– $A$ and $\bar{B}$
– $\bar{A}$ and $B$
– $\bar{A}$ and $\bar{B}$
Answers: 1. (a) 1/16, (b) 1/32, (c) 1/4, (d) 31/32.  2. (a) 2/3, (b) 1/3, (c) 3/4.
L3-3
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are s.i. events, then $A$ and $\bar{B}$ are also s.i. events. The other cases follow directly.
Note that $A = (A \cap B) \cup (A \cap \bar{B})$ and $(A \cap B) \cap (A \cap \bar{B}) = \emptyset$. Thus
$$P(A) = P[(A \cap B) \cup (A \cap \bar{B})] = P(A \cap B) + P(A \cap \bar{B}) = P(A)P(B) + P(A \cap \bar{B})$$
Thus
$$P(A \cap \bar{B}) = P(A) - P(A)P(B) = P(A)[1 - P(B)] = P(A)P(\bar{B})$$
• For N events to be s.i., we require
$$P(A_i \cap A_k) = P(A_i)P(A_k) \quad \forall i \ne k,$$
$$P(A_i \cap A_k \cap A_m) = P(A_i)P(A_k)P(A_m) \quad \forall \text{ distinct } i, k, m,$$
$$\vdots$$
$$P(A_1 \cap A_2 \cap \cdots \cap A_N) = P(A_1)P(A_2)\cdots P(A_N).$$
(It is not sufficient to have $P(A_i \cap A_k) = P(A_i)P(A_k)\ \forall i \ne k$.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events $A_1, A_2, \ldots, A_N \in \mathcal{A}$ are pairwise statistically independent if and only if $P(A_i \cap A_j) = P(A_i)P(A_j)\ \forall i \ne j$.
L3-4
Examples of Applying Statistical Independence
EX
For a couple having 5 children, compute the probabilities of the following events (a MATLAB check appears after this list):
1. All children are of the same sex.
2. The 3 eldest are boys and the others are girls.
3. The 2 oldest are girls.
4. There is at least one girl.
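A brute-force check of the answers by enumerating all 2^5 equally likely sex sequences (an added sketch; the coding 1 = boy, 0 = girl, oldest child first, is an arbitrary choice):

    seqs = dec2bin(0:31) - '0';        % 32-by-5 matrix of all 0/1 patterns
    pSameSex   = mean(all(seqs == 1, 2) | all(seqs == 0, 2))               % 1/16
    pEldest3B  = mean(all(seqs(:,1:3) == 1, 2) & all(seqs(:,4:5) == 0, 2)) % 1/32
    p2OldestG  = mean(all(seqs(:,1:2) == 0, 2))                            % 1/4
    pAtLeast1G = mean(any(seqs == 0, 2))                                   % 31/32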
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
– Mutual exclusiveness is a set-theoretic relation between the events themselves (A ∩ B = ∅).
– Statistical independence is a relation determined by the probability measure P.
• How does mutual exclusiveness relate to statistical independence?
1. m.e. ⇒ s.i.
2. s.i. ⇒ m.e.
3. two events cannot be both s.i. and m.e., except in some degenerate cases
4. none of the above
• Perhaps surprisingly, (3) is true.
Consider the following
L3-6
Conditional Probability: Consider an example.
EX: Defective computers in a lab
A computer lab contains
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer. Then
Ω = {AD, AN, BD, BD, BN}
Let
• E_A be the event that the selected computer is from manufacturer A,
• E_B be the event that the selected computer is from manufacturer B,
• E_D be the event that the selected computer is defective.
Find:
P(E_A) =     P(E_B) =     P(E_D) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the probability that it is defective?
We denote this probability as P(E_D|E_A) (meaning the conditional probability of event E_D given that event E_A occurred).
Ω = {AD, AN, BD, BD, BN}
Find:
– P(E_D|E_B) =
L3-7
– P(E_A|E_D) =
– P(E_B|E_D) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram: [Figure: events A and B overlapping inside sample space S.]
If we condition on B having occurred, then we can form the new Venn diagram: [Figure: B becomes the new sample space, and only the piece A ∩ B of A remains.]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) > 0$$
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1.
$$P(S|B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$
2. Given A ∈ A,
$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$
and P(A ∩ B) ≥ 0, P(B) ≥ 0 ⇒ P(A|B) ≥ 0.
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
$$P(A \cup C|B) = \frac{P[(A \cup C) \cap B]}{P[B]} = \frac{P[(A \cap B) \cup (C \cap B)]}{P[B]}$$
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
$$P(A \cup C|B) = \frac{P[A \cap B]}{P[B]} + \frac{P[C \cap B]}{P[B]} = P(A|B) + P(C|B)$$
EX: Check the previous example. Ω = {AD, AN, BD, BD, BN}
P(E_D|E_A) =
P(E_D|E_B) =
P(E_A|E_D) =
P(E_B|E_D) =
L3-9
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let W_i = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
Answers: (a) 1/3, (b) 1/5, (c) 1.
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$
(When A and B are s.i., knowledge of B does not change the probability of A, and vice versa.)
Relating Conditional and Unconditional Probabilities
Which of the following statements is true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \;\Rightarrow\; P(A \cap B) = P(A|B)P(B) \qquad (7)$$
and
$$P(B|A) = \frac{P(A \cap B)}{P(A)} \;\Rightarrow\; P(A \cap B) = P(B|A)P(A) \qquad (8)$$
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex: Intersection of 3 events:
$$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\, P(B|C)\, P(C)$$
Partitions and Total Probability
DEFN
A collection of sets $A_1, A_2, \ldots$ partitions the sample space Ω if and only if
$$A_i \cap A_j = \emptyset \ \ \forall i \ne j \quad \text{and} \quad \bigcup_i A_i = \Omega.$$
{A_i} is also said to be a partition of Ω.
L4-4
1. Venn Diagram
2. Expressing an arbitrary set using a partition of Ω: for any event B, $B = \bigcup_i (B \cap A_i)$, a union of mutually exclusive pieces.
Law of Total Probability
If {A_i} partitions Ω, then
$$P(B) = \sum_i P(B|A_i)\, P(A_i).$$
Example:
L4-5
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1:
• A0 is sent with prob P(A0) = 2/5.
• A1 is sent with prob P(A1) = 3/5.
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6.]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When we receive B0, decide A0; when we receive B1, decide A1.
Problem: Determine the probability of error for these two decision rules (a numerical check follows).
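A quick numerical check (an added sketch using the transition probabilities read off the diagram above):

    pA = [2/5, 3/5];                    % P(A0), P(A1)
    pB_given_A = [7/8, 1/8;             % row i gives [P(B0|Ai), P(B1|Ai)]
                  1/6, 5/6];
    pe_rule1 = pA(1)                    % always decide A1: error iff A0 sent, = 0.40
    pe_rule2 = pA(1)*pB_given_A(1,2) + pA(2)*pB_given_A(2,1)   % = 3/20 = 0.15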
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)$$
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)$$
• We apply the chain rule to write this as
$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)$$
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)$$
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad \text{and} \quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1)$$
• We can further expand these using the chain rule as
$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad \text{and} \quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1)$$
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \qquad (9)$$
$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1) \qquad (10)$$
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• We need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
$$P(A_i|B_j) = \frac{P(A_i \cap B_j)}{P(B_j)} = \boxed{\frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)}}$$
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6 (a numerical sketch follows).
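A numerical version of this exercise (an added sketch; here p01 = P(B1|A0) and p10 = P(B0|A1), matching the transition diagram above):

    pA = [2/5, 3/5];  p01 = 1/8;  p10 = 1/6;
    pB_given_A = [1-p01, p01; p10, 1-p10];
    joint = diag(pA) * pB_given_A;           % joint(i,j) = P(Ai)P(Bj|Ai) = P(Ai ∩ Bj)
    pB = sum(joint, 1);                      % Law of Total Probability: P(B0), P(B1)
    pA_given_B = joint ./ repmat(pB, 2, 1)   % Bayes: entry (i,j) = P(Ai|Bj)
    [mx, mapDecision] = max(pA_given_B)      % expect [1 2]: decide A0 on B0, A1 on B1

The resulting a posteriori probabilities are P(A0|B0) = 7/9, P(A1|B0) = 2/9, P(A0|B1) = 1/11, and P(A1|B1) = 10/11, so the MAP rule here coincides with the second decision rule considered above.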
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) for a common constant c, so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX
Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples $(x_1, x_2, \ldots, x_k)$ that are possible, if $x_i$ is an element of a set with $n_i$ distinct members, is $n_1 n_2 \cdots n_k$.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Answers to the poker dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
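These answers can be reproduced by brute-force counting over all 6^5 = 7776 equally likely ordered rolls (an added MATLAB sketch; it classifies each roll by its sorted pattern of face counts):

    [d1,d2,d3,d4,d5] = ndgrid(1:6);
    rolls = [d1(:) d2(:) d3(:) d4(:) d5(:)];     % 7776-by-5, all ordered rolls
    counts = histc(rolls, 1:6, 2);               % per-roll count of each face
    pat = sort(counts, 2, 'descend');            % e.g. [2 2 1 0 0 0] = two pair
    pNoTwoAlike = mean(pat(:,1) == 1)                    % 0.0926
    pOnePair    = mean(pat(:,1) == 2 & pat(:,2) == 1)    % 0.4630
    pTwoPair    = mean(pat(:,1) == 2 & pat(:,2) == 2)    % 0.2315
    pThreeAlike = mean(pat(:,1) == 3 & pat(:,2) == 1)    % 0.1543
    pFullHouse  = mean(pat(:,1) == 3 & pat(:,2) == 2)    % 0.0386
    pFourAlike  = mean(pat(:,1) == 4)                    % 0.0193
    pFiveAlike  = mean(pat(:,1) == 5)                    % 0.0008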
L5a-2
The result is a k-tuple $(x_1, x_2, \ldots, x_k)$, where $x_i \in A\ \forall i = 1, 2, \ldots, k$.
Thus $n_1 = n_2 = \cdots = n_k = |A| \equiv n$
⇒ the number of distinct ordered k-tuples is $n^k$.
2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
The result is a k-tuple $(x_1, x_2, \ldots, x_k)$, where $x_i \in A\ \forall i = 1, 2, \ldots, k$.
Then $n_1 = n,\ n_2 = n - 1,\ \ldots,\ n_k = n - (k - 1)$
⇒ the number of distinct ordered k-tuples is $n(n-1)\cdots(n-k+1) = n!/(n-k)!$.
3. Permutations of n distinct objects
Problem: Find the number of permutations of a set of n objects. (The number of permutations is the number of possible orderings.)
The result is an n-tuple.
The number of orderings is the number of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then $n_1 = n,\ n_2 = n - 1,\ \ldots,\ n_{n-1} = 2,\ n_n = 1$
⇒ the number of permutations = the number of distinct ordered n-tuples = $n(n-1)\cdots(2)(1) = n!$
Useful formula: For large n, $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ (Stirling's formula).
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
L5a-3
4. Sampling without Replacement & without Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the number of ordered subsets: $n!/(n-k)!$.
Now, how many different orders are there for any set of k objects? $k!$.
Let $C^n_k$ = the number of combinations of size k from a set of n objects:
$$C^n_k = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}$$
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size $k_1, k_2, \ldots, k_J$ is given by the multinomial coefficient
$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}$$
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}$$
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}$$
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the probability of this occurring?
5. Sampling with Replacement and without Ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
$$\binom{k + n - 1}{k}$$
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1. Never switch.
Answer to the packet question: ≈ 0.0794.
L5-2
2. Always switch.
3. Flip a fair coin, and switch if it comes up heads.
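A Monte Carlo check of the three strategies (an added sketch; the expected win probabilities are 1/3, 2/3, and 1/2):

    N = 1e5; wins = zeros(1,3);
    for trial = 1:N
        car = randi(3); pick = randi(3); doors = 1:3;
        openable = doors(doors ~= pick & doors ~= car);  % goat doors host may open
        host = openable(randi(numel(openable)));
        other = doors(doors ~= pick & doors ~= host);    % the remaining closed door
        wins(1) = wins(1) + (pick == car);               % never switch
        wins(2) = wins(2) + (other == car);              % always switch
        if rand < 0.5, final = other; else final = pick; end
        wins(3) = wins(3) + (final == car);              % coin-flip strategy
    end
    wins/N        % approximately [1/3, 2/3, 1/2]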
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form $(B_1, B_2, \ldots, B_N)$, where $B_i \in \Omega_i$.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that P(A) is consistent with the probabilities assigned to the $B_i$ by the individual subexperiments.
For s.i. subexperiments,
$$P(A) = P_1(B_1)\, P_2(B_2) \cdots P_N(B_N)$$
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is $p^k (1-p)^{n-k}$.
• The number of ways that k successes can occur in n places is $\binom{n}{k}$.
• Thus the probability of k successes on n trials is
$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (11)$$
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
L5-4
• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (12)$$
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables $X_i$.
– For each Bernoulli trial i, let $X_i = 1$ if the trial is a success and $X_i = 0$ otherwise.
– Let $S_n = \sum_{i=1}^{n} X_i$.
Then b(k; n, p) is the probability that $S_n = k$.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (13)$$
and distribution function
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \qquad (14)$$
L5-5
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
$$Q(x) = 1 - \Phi(x) \qquad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \qquad (16)$$
• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right), \qquad (17)$$
where q = 1 − p.
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$
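The 100-bit packet example can be checked against these formulas (an added MATLAB sketch; Q(x) is written via the built-in erfc):

    n = 100; p = 0.01; q = 1 - p; k = 0:2;
    pExact = 1 - sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* q.^(n-k))
    % pExact is approximately 0.0794, the answer quoted for this problem
    Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q(x) = (1/2) erfc(x/sqrt(2))
    pApprox = Qf((3 - n*p - 0.5)/sqrt(n*p*q)) - Qf((100 - n*p + 0.5)/sqrt(n*p*q))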
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
$$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$$
Also,
$$P(\#\text{ trials} > k) = (1-p)^k$$
EX
A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
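A worked version (added for reference): a transmission succeeds with probability p = 0.9, so P(exactly 2 transmissions) = (0.1)(0.9) = 0.09 and P(more than 2 transmissions) = (0.1)² = 0.01.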
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 per minute is constant. The maximum possible number of calls/hour grows with n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
$$\binom{n}{k} p^k (1-p)^{n-k}$$
• If we let np = α be a constant, then for large n,
$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$
• Then
$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adopted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
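A worked version (added): λ = 90 calls/hr = 1.5 calls/min, so for t = 10 minutes, α = 15 and P(N = 20) = 15²⁰e⁻¹⁵/20!. In MATLAB, exp(-15)*15^20/factorial(20) gives approximately 0.042.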
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ 1 - \frac{(2b - z)^2}{2b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, a]}.
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted $F_X(x)$, is
$$F_X(x) = P(X \le x)$$
• $F_X(x)$ is a probability measure.
• Properties of $F_X$:
1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., $F_X(b) = \lim_{h \to 0^+} F_X(b + h)$.
(The value at a jump discontinuity is the value after the jump.)
Proof omitted.
• If F_X(x) is a continuous function of x, then $F_X(x) = F_X(x^-)$.
• If F_X(x) is not a continuous function of x, then from above,
$$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\epsilon \to 0} P[x - \epsilon < X \le x] = P[X = x]$$
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
    u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
    hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
    v = -log(u);
Check the density of the new random variable by plotting a histogram:
    hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
    w = sqrt(v);
Again, compare the histogram (hist(w,100)) to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) $f_X(x)$ of a random variable X is the derivative (which may not exist at some places) of $F_X(x)$.
• Properties:
1. $f_X(x) \ge 0$, −∞ < x < ∞
2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
3. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
L7-3
4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$, for a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$$\int_{-\infty}^{+\infty} g(x)\,dx = c, \quad -\infty < c < +\infty,$$
then $f_X(x) = g(x)/c$ is a valid pdf.
Proof omitted.
• Note that if $f_X(x)$ exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.
• Recall that this does not mean that the value x never occurs, but that its occurrence is extremely unlikely.
DEFN
If $F_X(x)$ is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way $f_X(x)$ is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
$$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$$
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is $p_X(x) = P(X = x)$.
EX: Roll a fair 6-sided die. X = the number on the top face.
$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}$$
EX: Flip a fair coin until heads occurs. X = the number of flips.
$$P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$
• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P\big(\{s : X(s) = 1\}\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p$$
So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$$
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = the number of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; horizontal axis x = 0, …, 10, vertical axis P(X = x).]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; horizontal axis x, vertical axis F_X(x).]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; horizontal axis x = 0, …, 100, vertical axis P(X = x).]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = the number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; horizontal axis x, vertical axis P(X = x).]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; horizontal axis x, vertical axis F_X(x).]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs.]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; horizontal axis x = 0, …, 100.]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$
• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4.]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k - np)^2}{2npq}\right]$$
• Let σ² = npq and μ = np; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k - \mu)^2}{2\sigma^2}\right]$$
L7-11
DEFN
A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x - \mu)^2}{2\sigma^2}\right]$$
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t - \mu)^2}{2\sigma^2}\right] dt,$$
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for the three (μ, σ²) pairs.]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$$
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$$
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right)$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right)$$
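Since the course leans heavily on Q(x), a MATLAB helper is convenient (an added sketch; it uses the identity Q(x) = (1/2) erfc(x/√2) with the built-in erfc, and the parameters below are arbitrary examples):

    Qf = @(x) 0.5*erfc(x/sqrt(2));        % Q-function via erfc
    mu = 5; sigma = 1; a = 4; b = 7;
    pInterval = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(4 < X <= 7)
    twoSidedTails = 2*Qf(1:5)   % P(|X - mu| > k*sigma) for k = 1..5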
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. The answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
Examples with Gaussian Random Variables
EX: The Motivation problem.
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; peak value 1 at x = 0.]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact chi-square random variables are much more general. For instance, if $X_i$, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2}\, \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0;$$
Γ(p) = (p − 1)! for p a positive integer; Γ(1/2) = √π; Γ(3/2) = √π / 2.
• Note that for n = 2,
$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1.]
• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found, through repeated integration by parts, to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L2-5
DEFN
A field M forms a σ-algebra or σ-field if for any E1 E2 isinMinfin⋃i=1
Ei isinM (5)
bull Note that by property (c) of fields () can be interchanged withinfin⋂i=1
Ei isinM (6)
bull We require that the event class F be a σ-algebra
bull For discrete sets the set of all subset of Ω is a σ-algebra
bull For continuous sets there may be more than one possible σ-algebra possible
bull In this class we only concern ourselves with the Borel field On the real line (R) theBorel field consists of all unions and intersections of intervals of R
3
DEFN
The probability measure denoted by P is a that maps allmembers of onto
Axioms of Probability
bull We specify a minimal set of rules that P must obey
I
II
III
Corollaries
bull Let A isin F and B isin F The the following properties of P can be derived from theaxioms and the mathematical structure of F
(a) P (Ac) = 1minus P (A)
L2-6
(b) P (A) le 1
(c) P (empty) = 0
(d) If A1 A2 An are pairwise mutually exclusive then
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Ak)
Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)
Proof and example on board(f)
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Aj)minussumjltk
P (Aj cap Ak) + middot middot middot
+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)
Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction
(g) If A sub B then P (A) le P (B)Proof on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B)).
Check the axioms:
1.
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B)
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅:
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX Check prev example: Ω = {AD, AN, BD, BD, BN}
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?²
Ex Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = event that white ball chosen on draw i.
What is P(W2|W1)?
² (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are si, then
(When A and B are si, knowledge of B does not change the prob of A (and vice versa).)
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B) (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A) (8)
• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events:
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C)P(B|C)P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, … partitions the sample space Ω if and only if
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine from the output, B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0) and (9)
q1P(A0|B1) + (1 − q1)P(A1|B1) (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
bull These rules can be summarized as follows
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• This decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
where the last step follows from the Law of Total Probability.
The expression in the box is
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
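As a numerical companion to this exercise, here is a sketch in MATLAB of the Bayes computation. The variable names are my own, and the transition probabilities are read from the channel diagram above (p01 = P(B1|A0), p10 = P(B0|A1)):

pA = [2/5 3/5];              % a priori probs [P(A0) P(A1)]
pB_A = [7/8 1/8; 1/6 5/6];   % row i = [P(B0|A_{i-1}) P(B1|A_{i-1})]
pB = pA * pB_A;              % Law of Total Probability: [P(B0) P(B1)] = [9/20 11/20]
joint = diag(pA) * pB_A;     % joint probabilities P(Ai and Bj)
post = joint ./ [pB; pB];    % Bayes: P(Ai|Bj) = P(Bj|Ai)P(Ai)/P(Bj)
% post = [7/9 1/11; 2/9 10/11]
[mx, mapRule] = max(post)    % mapRule(j) = index of the MAP decision for Bj

The posteriors work out to P(A0|B0) = 7/9 and P(A1|B1) = 10/11, so the MAP rule decides A0 when B0 is received and A1 when B1 is received.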
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
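Here is a sketch of the same problem as a sequential Bayesian update in MATLAB (names are my own); the results match the footnote answers 1/3, 1/5, and 1:

prior = [1/2 1/2];               % [P(fair) P(two-headed)]
pH = [1/2 1];                    % P(heads | coin) on a single flip
post1 = prior.*pH / (prior*pH')  % after one head:  [1/3 2/3]
post2 = post1.*pH / (post1*pH')  % after two heads: [1/5 4/5]
pT = [1/2 0];                    % a tail is impossible for the two-headed coin
post3 = post2.*pT / (post2*pT')  % after a tail:    [1 0]

Each update just multiplies the current probabilities by the likelihood of the new observation and renormalizes, which is exactly the MAP machinery of the previous lecture.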
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, …, xk) that are possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
³ (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Thus n1 = n2 = ⋯ = nk = |A| ≡ n
⇒ number of distinct ordered k-tuples is
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A ∀ i = 1, 2, …, k.
Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)
⇒ number of distinct ordered k-tuples is
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples
=
=
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine # of ordered subsets:
Now, how many different orders are there for any set of k objects?
Let Cn,k = the number of combinations of size k from a set of n objects.
Cn,k = (# ordered subsets)/(# orderings)
=
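The counting formulas above are easy to sanity-check in MATLAB; this sketch (with example sizes of my own choosing) exercises each case:

n = 10; k = 3;                                 % example sizes
nOrderedWithRepl = n^k                         % with replacement, ordered: 1000
nOrderedWoRepl = factorial(n)/factorial(n-k)   % w/out replacement, ordered: 720
nPerms = factorial(n)                          % permutations: n! = 3628800
nCombos = nchoosek(n, k)                       % w/out replacement, unordered: 120
% Stirling's approximation is already within about 1% at n = 10:
stirling = sqrt(2*pi) * n^(n + 1/2) * exp(-n)  % ~ 3.5986e6 vs. 3628800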
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to order the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (A worked sketch follows below.)
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
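One way to evaluate this, sketched in MATLAB: the number of favorable sequences is the multinomial coefficient 12!/(2!)^6 (partition the 12 ordered rolls into six subsets of size 2), and all 6^12 roll sequences are equally likely:

ways = factorial(12) / factorial(2)^6;   % multinomial coefficient = 7484400
p = ways / 6^12                          % ~ 0.0034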
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects without regard to order?
• The total number of arrangements is C(k + n − 1, k).
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX By Request: The Game Show (Monty Hall) Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
• Suppose you're on a game show, and you're given the choice of three doors:
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1. Never switch
⁴ 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
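A short Monte Carlo sketch in MATLAB (my own setup; randi picks doors uniformly) that estimates all three strategies:

N = 1e5;
car  = randi(3, 1, N);         % door hiding the car
pick = randi(3, 1, N);         % contestant's initial choice
pStay   = mean(pick == car)    % never switch: ~1/3
pSwitch = mean(pick ~= car)    % always switch: wins whenever the first pick was
                               % wrong, since the host removes a goat door; ~2/3
pCoin   = (pStay + pSwitch)/2  % coin flip between strategies: ~1/2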
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1. The combined sample space: Ω = the ________ of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.
2. Event class: F, a σ-algebra on Ω.
3. P(·) on F such that
P(A) =
For si subexperiments:
P(A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
bull Let p denote the probability of success
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is
• The number of ways that k successes can occur in n places is
• Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
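The footnoted answer can be reproduced with the binomial probabilities of (11); a MATLAB sketch:

n = 100; p = 1e-2; k = 0:2;
binProbs = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
pFail = 1 - sum(binProbs)   % P(3 or more errors) ~ 0.0794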
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n we can approximate the binomial probabilities using either
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^(n−k) (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
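As a sketch of how the correction terms are used, the following MATLAB fragment compares the exact binomial sum with the Q-function approximation for an example of my own choosing (n = 100, p = 0.5); the built-in erfc gives Q(x) = (1/2)erfc(x/√2):

n = 100; p = 0.5; q = 1 - p;
a = 45; b = 55;                 % evaluate P(45 <= Sn <= 55)
kk = a:b;
exact = sum(arrayfun(@(k) nchoosek(n,k), kk) .* p.^kk .* q.^(n-kk));
Qf = @(x) 0.5*erfc(x/sqrt(2));  % Q-function via the built-in erfc
approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                  % both ~ 0.7287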
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) =
Also, P(# trials > k) =
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
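A two-line MATLAB check of the geometric law for this example (success probability p = 0.9, since the packet error probability is 0.1):

p = 0.9;
pTwo = (1-p)^(2-1) * p   % exactly 2 transmissions: (1-p)^(m-1) p = 0.09
pMore = (1-p)^2          % more than 2: the first two transmissions fail = 0.01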
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour grows with n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability C(n,k) p^k (1 − p)^(n−k).
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
(α^k/k!) e^(−α), k = 0, 1, …
0, o.w.
EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
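A sketch of the computation in MATLAB (α = λt with λ = 1.5 calls/minute and t = 10 minutes):

lambda = 90/60;                          % calls per minute
t = 10; alpha = lambda*t;                % alpha = lambda*t = 15
k = 20;
pk = alpha^k/factorial(k)*exp(-alpha)    % P(N = 20) ~ 0.0418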
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) =
0, z ≤ 0
z²/(2b²), 0 < z ≤ b
[b² − (2b − z)²/2]/b², b < z < 2b
1, z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a ________ from ________ to ________.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the ________ (____), denoted ________, is
bull FX(x) is a prob measure
bull Properties of FX
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) whenever a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x]
= lim_{ε→0} P[x − ε < X ≤ x]
= P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The ________ (____) fX(x) of a random variable X is the derivative (which may not exist at some places) of ________.
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a ________.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is defined at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) =
0, x < a
1/(b − a), a ≤ x ≤ b
0, x > b
L7-4
DEFN
A ________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is
EX Roll a fair 6-sided die. X = # on top face.
P(X = x) =
1/6, x = 1, 2, …, 6
0, o.w.
EX Flip a fair coin until heads occurs. X = # of flips.
P(X = x) =
(1/2)^x, x = 1, 2, …
0, o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X =
1, s ∈ A
0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) =
p, x = 1
1 − p, x = 0
0, x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] =
C(n,k) p^k (1 − p)^(n−k), k = 0, 1, …, n
0, o.w.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] =
(1 − p)^(k−1) p, k = 1, 2, …
0, o.w.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) vs. x]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; FX(x) vs. x]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
(α^k/k!) e^(−α), k = 0, 1, …
0, o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) vs. x]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF; FX(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV (covered in example above)
2. Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) =
0, x < 0
λe^(−λx), x ≥ 0
• Distribution function:
FX(x) =
0, x < 0
1 − e^(−λx), x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; fX(x) and FX(x) vs. x]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,
C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq, μ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k)
≈ (1/√(2πσ²)) exp{−(k − μ)²/(2σ²)}
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − μ)/σ) − Q((b − μ)/σ)
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5⁵
Examples with Gaussian Random Variables
EX The Motivation problem
⁵ (a) 1/2 (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf; fX(x) vs. x]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are si Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi² (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) =
[1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)), y ≥ 0
0, y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) =
[1/(2σ²)] e^(−y/(2σ²)), y ≥ 0
0, y < 0
which is an exponential density.
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
FY(y) =
1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) =
(r/σ²) e^(−r²/(2σ²)), r ≥ 0
0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf (σ² = 1); fR(r) vs. r]
• The CDF for a Rayleigh RV is
FR(r) =
1 − e^(−r²/(2σ²)), r ≥ 0
0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
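This construction is easy to demonstrate in MATLAB: squaring and summing two independent zero-mean Gaussians and taking the square root produces Rayleigh samples. A sketch, with σ = 1 (my choice):

N = 1e5; sigma = 1;
R = sqrt((sigma*randn(1,N)).^2 + (sigma*randn(1,N)).^2);
hist(R, 50)   % histogram should match the Rayleigh pdf shape above
mean(R)       % theoretical mean is sigma*sqrt(pi/2) ~ 1.2533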
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) =
[1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)}, ℓ ≥ 0
0, ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) =
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
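Here is a sketch of the computation, using the exponential CDF FX(x) = 1 − e^(−x) (λ = 1) and the conditioning event A = {X > 2}, so P(A) = e^(−2). For x > 2,
FX(x|X > 2) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = [FX(x) − FX(2)]/[1 − FX(2)] = [e^(−2) − e^(−x)]/e^(−2) = 1 − e^(−(x−2)),
and FX(x|X > 2) = 0 for x ≤ 2. For part 2, the remaining wait W = X − 2 then has FW(w) = 1 − e^(−w), w ≥ 0, which is the same exponential distribution as the original wait: the exponential RV is memoryless.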
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + ⋯ + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:
P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)
= lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)
= [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
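The continuous Law of Total Probability can also be sanity-checked numerically. This MATLAB sketch uses a made-up example, X ~ uniform(0,1) with P(A|X = x) = x, for which the integral evaluates to 1/2:

% Numerical check of P(A) = integral of P(A|X=x) fX(x) dx
x = linspace(0, 1, 1001);
fX = ones(size(x));      % uniform density on [0,1]
pA = trapz(x, x.*fX)     % ~ 0.5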
L2-6
(b) P (A) le 1
(c) P (empty) = 0
(d) If A1 A2 An are pairwise mutually exclusive then
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Ak)
Proof is by induction(e) P (A cupB) = P (A) + P (B)minus P (A capB)
Proof and example on board(f)
P
(n⋃
k=1
Ak
)=
nsumk=1
P (Aj)minussumjltk
P (Aj cap Ak) + middot middot middot
+(minus1)(n+1)P (A1 cap A2 cap middot middot middot cap An)
Add all single events subtract off all intersections of pairs of events add in allintersections of 3 events Proof is by induction
(g) If A sub B then P (A) le P (B)Proof on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is a k-tuple (x1, x2, …, xk), where xi ∈ A for all i = 1, 2, …, k.

Thus n1 = n2 = · · · = nk = |A| ≡ n

⇒ number of distinct ordered k-tuples is n^k.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, …, xk), where xi ∈ A for all i = 1, 2, …, k.

Then n1 = n, n2 = n − 1, …, nk = n − (k − 1)

⇒ number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n − 1, …, n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!

Useful formula (Stirling): For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
L5a-3
4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n(n − 1) · · · (n − k + 1) = n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!.

Let C(n, k) = the number of combinations of size k from a set of n objects. Then

C(n, k) = (# ordered subsets)/(# orderings) = n! / (k!(n − k)!).

DEFN:
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, …, kJ is given by the multinomial coefficient

n! / (k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72! / (3!)^24 ≈ 1.3 × 10^85.

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to order the 24 groups, so the total number of ways to assign people to groups is

72! / ((3!)^24 · 24!) ≈ 2.1 × 10^61.
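These factorials overflow double precision if computed directly, so a sketch using gammaln (log-factorials) is one way to check the two counts:

% Group-assignment counts via log-factorials (avoids overflow)
logfact = @(n) gammaln(n + 1);                 % log(n!)
logOrdered   = logfact(72) - 24*logfact(3);    % log of 72!/(3!)^24
logUnordered = logOrdered - logfact(24);       % divide by 24!
fprintf('72!/(3!)^24        ~ 10^%.1f\n', logOrdered/log(10));
fprintf('72!/((3!)^24 24!)  ~ 10^%.1f\n', logUnordered/log(10));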
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring? (See the sketch below.)
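A hedged MATLAB check of the count and probability (the count is the multinomial coefficient 12!/(2!)^6, and all 6^12 roll sequences are equally likely):

% P(each face appears exactly twice in 12 rolls of a fair die)
ways = factorial(12) / 2^6;        % 12!/(2!)^6 = 7,484,400 arrangements
p = ways / 6^12;                   % ≈ 0.0034
fprintf('ways = %d, p = %.4f\n', ways, p);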
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is C(k + n − 1, k).
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.

EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:

1. Never switch
⁴ 0.0794
L5-2
2. Always switch

3. Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS

DEFN:
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments:

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, …, BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F, chosen consistently with the subexperiments. For s.i. subexperiments,

P(A) = P(B1)P(B2) · · · P(BN).
Bernoulli Trials

DEFN:
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

L5-3

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n, k).

• Thus the probability of k successes on n trials is

pn(k) = C(n, k) p^k (1 − p)^{n−k}.   (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
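A direct MATLAB evaluation of this binomial tail (sketch; binopdf from the Statistics Toolbox could replace the explicit formula):

% P(packet failure) = P(3 or more errors among 100 independent bits)
n = 100;  p = 1e-2;  k = 0:2;
pOK = sum(arrayfun(@(j) nchoosek(n,j), k) .* p.^k .* (1-p).^(n-k));
fprintf('P(decode failure) = %.4f\n', 1 - pOK);     % ≈ 0.0794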
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms get very small.
For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let

b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}.   (12)

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success, and Xi = 0 otherwise.

– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]   (13)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt.   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt.   (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)).   (17)

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)].
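A sketch comparing the exact binomial sum with this continuity-corrected approximation (Qfun here is a hypothetical helper built from MATLAB's erfc):

% Exact vs. Gaussian approximation for P(alpha <= Sn <= beta)
n = 100;  p = 0.5;  q = 1 - p;  a = 45;  b = 55;
kk = a:b;
exact = sum(arrayfun(@(k) nchoosek(n,k), kk) .* p.^kk .* q.^(n-kk));
Qfun = @(x) 0.5*erfc(x/sqrt(2));               % Q(x) in terms of erfc
approx = Qfun((a - n*p - 0.5)/sqrt(n*p*q)) - Qfun((b - n*p + 0.5)/sqrt(n*p*q));
fprintf('exact = %.4f, approx = %.4f\n', exact, approx);   % both ≈ 0.729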
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

L5-6

Then

p(m) = (1 − p)^{m−1} p,  m = 1, 2, …

Also, P(# trials > k) = (1 − p)^k.
EX:
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
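Under the geometric law with success probability p = 0.9 (a packet goes through when it is not in error), a quick sketch:

% Geometric law: p(m) = (1-p)^(m-1) * p,  P(#trials > k) = (1-p)^k
p = 0.9;                           % per-transmission success probability
pExactly2 = (1-p)^(2-1) * p;       % = 0.09
pMoreThan2 = (1-p)^2;              % = 0.01
fprintf('P(exactly 2) = %.2f, P(more than 2) = %.2f\n', pExactly2, pMoreThan2);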
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

L5-7

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:

C(n, k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}.

• Then

lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α},  k = 0, 1, …;  0 otherwise.
EX: Phone calls arriving at a switching office.
If a switching center receives 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
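With λ = 90 calls/hr and t = 10/60 hr, α = λt = 15, so a one-line sketch suffices:

% Poisson: P(N = 20) with alpha = lambda*t = 90*(10/60) = 15
alpha = 15;  k = 20;
pk = alpha^k / factorial(k) * exp(-alpha);
fprintf('P(N = 20) = %.4f\n', pk);      % ≈ 0.0418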
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:

FZ(z) = 0 for z ≤ 0;  z²/(2b²) for 0 < z ≤ b;  [b² − (2b − z)²/2]/b² for b < z < 2b;  1 for z ≥ 2b.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
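A Monte Carlo sketch that checks this CDF empirically (b = 1 assumed for illustration):

% Empirical check of the dart-throw CDF F_Z(z) for b = 1
b = 1;  N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);          % sum of two uniform coordinates
z = 1.5;                                % test point in (b, 2b)
Fhat = mean(Z <= z);                    % empirical CDF at z
Ftheory = (b^2 - (2*b - z)^2/2)/b^2;    % = 0.875 for z = 1.5, b = 1
fprintf('empirical %.4f vs theory %.4f\n', Fhat, Ftheory);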
RANDOM VARIABLES (RVS)
• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN:
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
DEFN:
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (cdf), denoted FX(x), is

FX(x) = P(X ≤ x).
• FX(x) defines a prob. measure on the Borel sets of R.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1.
L6-4
2. FX(−∞) = 0 and FX(∞) = 1.

3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a).

5. FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).

(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from the above,

FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].
L6-5
EX Examples
L6-6
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

L7-2

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN:
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:

fX(x) = dFX(x)/dx.

• Properties:

1. fX(x) ≥ 0, −∞ < x < ∞.

2. FX(x) = ∫_{−∞}^{x} fX(t) dt.

3. ∫_{−∞}^{+∞} fX(x) dx = 1.

L7-3

4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b.

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,

then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN:
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) is assigned a value at every point whenever X is a continuous random variable.
Uniform Continuous Random Variables
DEFN:
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

fX(x) = 1/(b − a) for a ≤ x ≤ b;  0 otherwise.
L7-4
DEFN:
A discrete random variable has probability concentrated at a countable number of values.
It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN:
For a discrete RV, the probability mass function (pmf) is

pX(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6, x = 1, 2, …, 6;  0 otherwise.

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x, x = 1, 2, …;  0 otherwise.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ A is considered a "success."

• A Bernoulli RV X is defined by

X = 1 if s ∈ A;  X = 0 if s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.

So

P(X = x) = p for x = 1;  1 − p for x = 0;  0 for x ≠ 0, 1.
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, …, n;  0 otherwise.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs, P(X = x) vs. x]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs, FX(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf, P(X = x) vs. x]
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p, k = 1, 2, …;  0 otherwise.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs, P(X = x) vs. x]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs, FX(x) vs. x]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k/k!) e^{−α},  k = 0, 1, …;  0 otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs, P(X = x) vs. x]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs, FX(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf, P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV — covered in an example above.

2. Exponential RV

• Characterized by a single parameter λ.

L7-10

• Density function:

fX(x) = λe^{−λx} for x ≥ 0;  0 for x < 0.

• Distribution function:

FX(x) = 1 − e^{−λx} for x ≥ 0;  0 for x < 0.

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:

• DeMoivre–Laplace Theorem: If n is large, then with q = 1 − p,

C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)].

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)].
L7-11
DEFN:
A Gaussian random variable X has density function

fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)],

with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by

FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt.

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt.
L7-12
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is

FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ).
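MATLAB has no built-in Q-function, but one follows from the built-in erfc via the standard relation Q(x) = erfc(x/√2)/2; a sketch with illustrative parameters:

% Q-function from erfc, and an interval probability for X ~ N(mu, sigma^2)
Qfun = @(x) 0.5*erfc(x/sqrt(2));
mu = 5;  sigma = 2;  a = 4;  b = 9;
pInterval = Qfun((a - mu)/sigma) - Qfun((b - mu)/sigma);
fprintf('P(%g < X <= %g) = %.4f\n', a, b, pInterval);   % ≈ 0.6687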
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5⁵
Examples with Gaussian Random Variables
EX: The Motivation problem
⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
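These follow from symmetry and from P[|X − µ| > kσ] = 2Q(k), so a one-line MATLAB check (with Q built from erfc) is:

% P(|X - mu| > k*sigma) = 2*Q(k), independent of mu and sigma
Qfun = @(x) 0.5*erfc(x/sqrt(2));
k = 1:5;
disp(2*Qfun(k));   % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07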
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

fX(x) = (c/2) exp(−c|x|), −∞ < x < ∞,

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2, fX(x) vs. x]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
L8-5
• But, in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, …, n, are s.i. Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²   (18)

is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If, in (18), all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by

fY(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)} for y ≥ 0;  0 for y < 0,

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0;
Γ(p) = (p − 1)!, p an integer > 0;
Γ(1/2) = √π;
Γ(3/2) = (1/2)√π.
• Note that for n = 2,

fY(y) = [1/(2σ²)] e^{−y/(2σ²)} for y ≥ 0;  0 for y < 0,

which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be

FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0;  0 for y < 0.
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

fR(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0;  0 for r < 0.
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1, fR(r) vs. r]
• The CDF for a Rayleigh RV is

FR(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0;  0 for r < 0.

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the magnitude of the waveform is a Rayleigh random variable.
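A simulation sketch of this fact (two unit-variance Gaussian quadrature components are assumed; the magnitude is compared against the Rayleigh CDF):

% Magnitude of a complex Gaussian is Rayleigh: R = |X + jY|, X,Y ~ N(0, sigma^2)
sigma = 1;  N = 1e6;
R = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));
r = 1.0;                                  % test point
fprintf('empirical F_R(%g) = %.4f, theory = %.4f\n', ...
        r, mean(R <= r), 1 - exp(-r^2/(2*sigma^2)));   % theory ≈ 0.3935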
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)] for ℓ ≥ 0;  0 for ℓ < 0.
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

FX(x|A) = P[{X ≤ x} ∩ A] / P(A).

DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

fX(x|A) = dFX(x|A)/dx.

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

L8-9

2. the conditional distribution for your remaining wait time?
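A sketch of part 1, conditioning on the event A = {X > 2} and written in LaTeX (this is the memoryless property of the exponential):

F_X(x \mid X > 2) = \frac{P[\{X \le x\} \cap \{X > 2\}]}{P[X > 2]}
                  = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2
(and 0 for x \le 2).

So the remaining wait time X − 2 is again exponential with λ = 1.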
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, …, An form a partition of Ω, then

FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)

and

fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai).
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A|x < X ≤ x + Δx)

= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] } P(A)

= lim_{Δx→0} { [FX(x + Δx|A) − FX(x|A)]/Δx } / { [FX(x + Δx) − FX(x)]/Δx } · P(A)

= [fX(x|A)/fX(x)] P(A),

if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that

P(A|X = x) = [fX(x|A)/fX(x)] P(A)

⇒ P(A|X = x) fX(x) = fX(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx.

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt.
Examples on board
L3-1
EEL 5544 Noise in Linear Systems Lecture 3
Motivation
An important concept in probability is independence What do you think of when you hear theword ldquoindependencerdquo
In probability events or experiments can be independent Consider two events A and B Thenone implication of A and B being independent is that knowledge of whether A occurs does notinfluence the probability that B occurs and vice versa
However the definition of independent events is not explicitly based on this interpretationRead Section 16 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 73) The answers are shown at the bottom of the next page so that you can check your work
EX
1 Suppose that each child born to a couple is equally likely to be either a boyor a girl independent of the sex distribution of the other children in the familyFor a couple having 5 children compute the probabilities of the followingevents
(a) All children are of the same sex
(b) The 3 eldest are boys and the others are girls
(c) The 2 oldest are girls
(d) There is at least one girl
If events are not independent then knowledge that one event occurred can influence theprobability that another event occurred In fact this is the key to many communication andsignal processing problems we wish to detect or estimate a signal based on the observation of anoisy version of the signal The signal to be estimated can often be treated as random itself andthe observation is another random quantity
We define the conditional probability of an event A given that event B occurred as
P (A|B) =P (A capB)
P (B) for P (B) gt 0
What if A and B are independent
L3-2
Motivation-continued
Try to answer the following problem (S Ross A First Course in Probability 7th ed Ch 3prob 60) The answers are at the bottom of the page
EX
2 The color of a personrsquos eyes is determined by a single pair of genesIf they are both blue-eyed genes then the person will have blue eyes ifeither is a brown-eyed gene then the person will have brown eyes (Thebrown-eyed gene is said to be dominant over the blue-eyed one) A newbornchild independently receives one eye-color gene from each of its parentsand the gene it receives from a parent is equally likely to be either of thetwo eye-color genes that the parent carries Suppose that Smith and both ofhis parent have brown eyes but Smithrsquos sister has blue eyes Furthermoresuppose that Smithrsquos wife has blue eyes
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A isin A and B isin A are abbreviated if and only if
bull Later we will show that if events are statistically independent then knowledge that one eventoccurs does not affect the probability that the other event occurs
bull Events that arise from completely separate random phenomena are statistically independent
It is important to note that statistical independence is a
What type of relation is mutually exclusive
Claim If A and B are si events then the following pairs of events are also si
ndash A and B
ndash A and B
ndash A and B
11 (a) 116 (b) 132 (c) 14 (d) 3132 2 (a) 23 (b) 13 (c) 34
L3-3
Proof It is sufficient to consider the first case ie show that if A and B are si events thenA and B are also si eventsThe other cases follow directly
Note that A = (A capB) cup (A capB) and A = (A capB) cap (A capB) = emptyThus
P (A) = P [(A capB) cup (A capB)]
= P (A capB) + P (A capB)
= P (A)P (B) + P (A capB)
Thus
P (A capB) = P (A)minus P (A)P (B)
= P (A)[1minus P (B)]
= P (A)P (B)
bull For N events to be si require
P (Ai cap Ak) = P (Ai)P (Ak)
P (Ai cap Ak cap Am) = P (Ai)P (Ak)P (Am)
P (A1 cap A1 cap middot middot middot cap AN) = P (A1)P (A2) middot middot middotP (AN)
(It is not sufficient to have P (Ai cap Ak) = P (Ai)P (Ak)foralli 6= k)
Weaker Condition Pairwise StatisticalIndependence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans:
FZ(z) =
  0,                       z ≤ 0
  z²/(2b²),                0 < z ≤ b
  [b² − (2b − z)²/2]/b²,   b < z < 2b
  1,                       z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
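A Monte Carlo sketch in MATLAB that checks this CDF (b = 1 and the test point z = 1.5 are my choices, not from the notes):
b = 1; N = 1000000;
z = sum(b*rand(2,N), 1);       % coordinates uniform in the square; Z = X + Y
mean(z <= 1.5)                 % compare with [b^2 - (2b-1.5)^2/2]/b^2 = 0.875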
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function (mapping) from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x]}.
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is
  F_X(x) = P(X ≤ x) = P({ω : X(ω) ≤ x})
bull FX(x) is a prob measure
bull Properties of FX
1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
  F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density of the new random variables by plotting a histogram (note that we now plot v, not u):
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):
  f_X(x) = dF_X(x)/dx
bull Properties
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_a^b f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
   ∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
   then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); with that convention, f_X(x) is defined at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
  f_X(x) = 1/(b − a), a ≤ x ≤ b; 0 otherwise
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is p_X(x) ≜ P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, . . . , 6; 0 otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, . . . ; 0 otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success".
• A Bernoulli RV X is defined by
  X = 1 if s ∈ A; X = 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p
So
  P(X = x) = p, x = 1; 1 − p, x = 0; 0, x ≠ 0, 1
2 Binomial RV
• A binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n; 0 otherwise
L7-6
• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The CDFs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; F_X(x) vs. x]
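Plots like these are easy to regenerate. A minimal MATLAB sketch (my own, not from the notes; it avoids toolbox functions such as binopdf):
n = 10; p = 0.5; k = 0:n;                          % try p = 0.2 as well
pmf = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
subplot(1,2,1); stem(k, pmf); xlabel('x'); ylabel('P(X=x)');
subplot(1,2,2); stairs(k, cumsum(pmf)); xlabel('x'); ylabel('F_X(x)');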
L7-7
• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3 Geometric RV
• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
  P[X = k] = (1 − p)^(k−1) p, k = 1, 2, . . . ; 0 otherwise
• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
L7-8
• The CDFs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; F_X(x) vs. x]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
  P(N = k) = (α^k / k!) e^(−α), k = 0, 1, . . . ; 0 otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; P(X = x) vs. x]
L7-9
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; F_X(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2 Exponential RV
bull Characterized by single parameter λ
L7-10
• Density function:
  f_X(x) = λ e^(−λx), x ≥ 0; 0, x < 0
• Distribution function:
  F_X(x) = 1 − e^(−λx), x ≥ 0; 0, x < 0
• The book's definition substitutes λ = 1/µ.
bull Obtainable as limit of geometric RV
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
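The inverse-CDF transformation from the Lecture 7 MATLAB exercise generates exponential samples for any λ; a sketch (the parameter values are my choices):
lambda = 4;                    % compare against the lambda = 4 curves above
u = rand(1,1000000);
x = -log(u)/lambda;            % inverse of the exponential CDF applied to uniforms
hist(x,100);                   % histogram should match the lambda = 4 pdf shape
mean(x)                        % should be close to 1/lambda = 0.25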
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large and p is not too close to 0 or 1, then with q = 1 − p,
  C(n, k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and µ = np. Then
  P(k successes on n Bernoulli trials) = C(n, k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function
  f_X(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
 – The CDF for the normalized Gaussian RV is defined as
   Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt
 – Engineers more commonly use the complementary distribution function, or Q-function, defined by
   Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt
L7-12
 – Note that Q(x) = 1 − Φ(x).
 – I will be supplying you with a Q-function table and a list of approximations to Q(x).
 – The Q-function can also be defined as
   Q(x) = (1/π) ∫_0^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
  F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
  P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
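In MATLAB, Q(x) is available through the standard erfc function (qfunc needs the Communications Toolbox). A sketch with assumed values µ = 5, σ = 1:
Q = @(x) 0.5*erfc(x/sqrt(2));    % Q-function via erfc
mu = 5; sigma = 1;
P = Q((6-mu)/sigma) - Q((7-mu)/sigma)   % P(6 < X <= 7) = Q(1) - Q(2), about 0.136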
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question Answers appear at the bottom of the page
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5. (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
  f_X(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
• But, in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then
  Y = Σ_{i=1}^{n} Xᵢ²   (18)
is a chi-square random variable with n degrees of freedom.
bull The chi-square random variable is usually classified as either central or non-central
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
  f_Y(y) = y^(n/2 − 1) e^(−y/(2σ²)) / (σⁿ 2^(n/2) Γ(n/2)), y ≥ 0; 0, y < 0,
where Γ(p) is the gamma function, defined as:
  Γ(p) = ∫_0^∞ t^(p−1) e^(−t) dt, p > 0
  Γ(p) = (p − 1)!, p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π
• Note that for n = 2,
  f_Y(y) = (1/(2σ²)) e^(−y/(2σ²)), y ≥ 0; 0, y < 0,
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for an even number of degrees of freedom, n = 2m)
  F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0; 0, y < 0
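A quick Monte Carlo check of the density formula in MATLAB (a sketch; n = 4 and σ² = 1 are assumed values):
n = 4; sigma = 1; N = 100000;
Y = sum((sigma*randn(n,N)).^2, 1);        % sum of squares of n s.i. Gaussians
[c, centers] = hist(Y, 100);
binw = centers(2) - centers(1);
bar(centers, c/(N*binw)); hold on;        % normalized histogram
y = linspace(0.01, max(Y), 200);
f = y.^(n/2-1).*exp(-y/(2*sigma^2))./(sigma^n*2^(n/2)*gamma(n/2));
plot(y, f, 'r'); hold off;                % central chi-square pdf, n d.o.f.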
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
  f_R(r) = (r/σ²) e^(−r²/(2σ²)), r ≥ 0; 0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
  F_R(r) = 1 − e^(−r²/(2σ²)), r ≥ 0; 0, r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the magnitude (envelope) of the waveform is a Rayleigh random variable.
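A sketch of that connection in MATLAB (σ² = 1 assumed):
sigma = 1; N = 1000000;
R = abs(sigma*randn(1,N) + 1j*sigma*randn(1,N));   % magnitude of a complex Gaussian
mean(R <= 1)                   % compare with F_R(1) = 1 - exp(-1/2), about 0.393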
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
  f_L(ℓ) = (1/(ℓ√(2πσ²))) exp[−(ln ℓ − µ)²/(2σ²)], ℓ ≥ 0; 0, ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X : Ω → R (a RV on Ω):
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  F_X(x|A) = P({X ≤ x} ∩ A)/P(A)
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  f_X(x|A) = dF_X(x|A)/dx
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
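A sketch of the solutions (using the conditional-distribution definition above with A = {X > 2}): for the total wait time X,
  F_X(x | X > 2) = P({X ≤ x} ∩ {X > 2})/P(X > 2) = (e⁻² − e⁻ˣ)/e⁻² = 1 − e^(−(x−2)), x > 2,
and for the remaining wait time R = X − 2,
  F_R(r | X > 2) = 1 − e^(−r), r ≥ 0,
which is again exponential with λ = 1: the memoryless property of the exponential RV.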
TOTAL PROBABILITY AND BAYES' THEOREM
• If A₁, A₂, . . . , Aₙ form a partition of Ω, then
  F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + · · · + F_X(x|Aₙ)P(Aₙ)
and
  f_X(x) = Σ_{i=1}^{n} f_X(x|Aᵢ)P(Aᵢ)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:
  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
             = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)
             = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
             = [f_X(x|A)/f_X(x)] P(A)
if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that
  P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:
  f_X(x|A) = [P(A|X = x)/P(A)] f_X(x) = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
L3-2
Motivation-continued
Try to answer the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 60). The answers are at the bottom of the page.
EX
2. The color of a person's eyes is determined by a single pair of genes. If they are both blue-eyed genes, then the person will have blue eyes; if either is a brown-eyed gene, then the person will have brown eyes. (The brown-eyed gene is said to be dominant over the blue-eyed one.) A newborn child independently receives one eye-color gene from each of its parents, and the gene it receives from a parent is equally likely to be either of the two eye-color genes that the parent carries. Suppose that Smith and both of his parents have brown eyes, but Smith's sister has blue eyes. Furthermore, suppose that Smith's wife has blue eyes.
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their nextchild will also have brown eyes
1
STATISTICAL INDEPENDENCE
DEFN
Two events A ∈ A and B ∈ A are statistically independent (abbreviated s.i.) if and only if P(A ∩ B) = P(A)P(B).
• Later we will show that if events are statistically independent, then knowledge that one event occurs does not affect the probability that the other event occurs.
• Events that arise from completely separate random phenomena are statistically independent.
It is important to note that statistical independence is a property of the probability measure P, not of the events themselves.
What type of relation is mutual exclusiveness? (A set relation between the events.)
Claim: If A and B are s.i. events, then the following pairs of events are also s.i.:
 – A and Bᶜ
 – Aᶜ and B
 – Aᶜ and Bᶜ
1. (a) 1/16 (b) 1/32 (c) 1/4 (d) 31/32;  2. (a) 2/3 (b) 1/3 (c) 3/4
L3-3
Proof: It is sufficient to consider the first case, i.e., show that if A and B are s.i. events, then A and Bᶜ are also s.i. events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus
  P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)] = P(A ∩ B) + P(A ∩ Bᶜ) = P(A)P(B) + P(A ∩ Bᶜ)
Thus
  P(A ∩ Bᶜ) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(Bᶜ)
• For N events to be s.i., we require
  P(Aᵢ ∩ Aₖ) = P(Aᵢ)P(Aₖ) for all i ≠ k,
  P(Aᵢ ∩ Aₖ ∩ Aₘ) = P(Aᵢ)P(Aₖ)P(Aₘ) for all distinct i, k, m,
  . . .
  P(A₁ ∩ A₂ ∩ · · · ∩ A_N) = P(A₁)P(A₂) · · · P(A_N)
(It is not sufficient to have P(Aᵢ ∩ Aₖ) = P(Aᵢ)P(Aₖ) ∀ i ≠ k.)
Weaker Condition: Pairwise Statistical Independence
DEFN
N events A1 A2 AN isin A are pairwise statistically independent if andonly if P (Ai cap Aj) = P (Ai)P (Aj)foralli 6= j
L3-4
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
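Worked answers (a sketch, assuming the children's sexes are s.i. with P(boy) = P(girl) = 1/2; they match the footnote answers on page L3-2):
1. P(all same sex) = P(all boys) + P(all girls) = 2(1/2)⁵ = 1/16
2. P(3 eldest boys, others girls) = (1/2)⁵ = 1/32
3. P(2 oldest are girls) = (1/2)² = 1/4
4. P(at least one girl) = 1 − P(no girls) = 1 − (1/2)⁵ = 31/32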
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness.
• Remember:
 – Mutual exclusiveness is a set relation between the events.
 – Statistical independence is a property of the probability measure.
• How does mutual exclusiveness relate to statistical independence?
1. me ⇒ si
2. si ⇒ me
3. two events cannot be both si and me, except in some degenerate cases
4. none of the above
• Perhaps surprisingly, 3 is true. (If A and B are both me and si, then 0 = P(A ∩ B) = P(A)P(B), so P(A) = 0 or P(B) = 0.)
Consider the following
L3-6
Conditional Probability Consider an example
EX: Defective computers in a lab
A computer lab contains:
• two computers from manufacturer A, one of which is defective;
• three computers from manufacturer B, two of which are defective.
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
Ω = {AD, AN, BD₁, BD₂, BN}
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P(E_A) = 2/5   P(E_B) = 3/5   P(E_D) = 3/5
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(E_D|E_A) (the conditional probability of event E_D given that event E_A occurred). Here P(E_D|E_A) = 1/2.
Ω = {AD, AN, BD₁, BD₂, BN}
Find:
 – P(E_D|E_B) = 2/3
L3-7
 – P(E_A|E_D) = 1/3
 – P(E_B|E_D) = 2/3
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram
[Venn diagram: events A and B within sample space S]
If we condition on B having occurred then we can form the new Venn diagram
[Venn diagram: sample space reduced to B, with the event A ∩ B shaded]
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
  P(A|B) = P(A ∩ B)/P(B), for P(B) > 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space (S, A, P(·|B)) on the original sample space.
Check the axioms
1. P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2. Given A ∈ A,
  P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0 ⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅,
  P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B] = P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
  P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B] = P(A|B) + P(C|B)
EX: Check prev example. Ω = {AD, AN, BD₁, BD₂, BN}
P(E_D|E_A) = P(E_D ∩ E_A)/P(E_A) = (1/5)/(2/5) = 1/2
P(E_D|E_B) = (2/5)/(3/5) = 2/3
P(E_A|E_D) = (1/5)/(3/5) = 1/3
P(E_B|E_D) = (2/5)/(3/5) = 2/3
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
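A sketch of the solution: since Smith's sister has blue eyes, both parents must carry one blue-eyed gene, i.e., both are brown/blue hybrids.
(a) A child of two hybrids is blue/blue, hybrid, or brown/brown with probs 1/4, 1/2, 1/4. Given Smith has brown eyes, P(hybrid | brown) = (1/2)/(3/4) = 2/3.
(b) The wife always passes a blue-eyed gene, so the child has blue eyes iff Smith passes a blue-eyed gene: P = (2/3)(1/2) = 1/3.
(c) P(2nd brown | 1st brown) = P(both brown)/P(1st brown) = [1/3 · 1 + 2/3 · (1/2)²]/(2/3) = (1/2)/(2/3) = 3/4.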
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
2
Ex: Drawing two consecutive balls without replacement
An urn contains:
• two black balls, numbered 1 and 2;
• three white balls, numbered 3, 4, and 5.
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2. (a) 1/3 (b) 1/5 (c) 1
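A sketch of the solution: P(W2|W1) = P(W1 ∩ W2)/P(W1) = [(3/5)(2/4)]/(3/5) = 1/2; after a white ball is removed, 2 of the remaining 4 balls are white.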
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
  P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
  ⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
  ⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C) = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX: Binary Communication System
A transmitter (Tx) sends one of two symbols, A0 or A1:
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
 – The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1.
2. When B0 is received, decide A0; when B1 is received, decide A1.
Problem Determine the probability of error for these two decision rules
L4-6
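A worked sketch using the transition probabilities above:
• Rule 1 (always decide A1): an error occurs exactly when A0 is sent, so P(E) = P(A0) = 2/5 = 0.4.
• Rule 2: P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/8)(2/5) + (1/6)(3/5) = 1/20 + 1/10 = 3/20 = 0.15.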
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis H_ij when B_j is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either q_i = 0 or q_i = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
bull These rules can be summarized as follows
Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(A_i|B_j).
• We need to express P(A_i|B_j) in terms of P(A_i) and P(B_j|A_i).
L4-9
• General technique:
If {A_i} is a partition of Ω and we know P(B_j|A_i) and P(A_i), then
  P(A_i|B_j) = P(B_j|A_i)P(A_i) / P(B_j) = P(B_j|A_i)P(A_i) / Σ_k P(B_j|A_k)P(A_k),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX:
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
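A sketch of the computation: P(B0) = (7/8)(2/5) + (1/6)(3/5) = 9/20 and P(B1) = (1/8)(2/5) + (5/6)(3/5) = 11/20. Then
  P(A0|B0) = (7/20)/(9/20) = 7/9 > P(A1|B0) = 2/9  ⇒ decide A0 when B0 is received;
  P(A1|B1) = (1/2)/(11/20) = 10/11 > P(A0|B1) = 1/11  ⇒ decide A1 when B1 is received.
So the MAP rule here coincides with decision rule 2 from the previous example, with P(E) = 3/20.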
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
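A worked sketch with Bayes' theorem, letting F = {fair coin selected}, P(F) = 1/2:
(a) P(F|H) = (1/2)(1/2) / [(1/2)(1/2) + (1/2)(1)] = 1/3
(b) P(F|HH) = (1/2)(1/4) / [(1/2)(1/4) + (1/2)(1)] = 1/5
(c) The two-headed coin cannot show tails, so P(F|HHT) = 1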
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, . . . , x_k) that are possible, if xᵢ is an element of a set with nᵢ distinct members, is
  n₁ n₂ · · · n_k
SPECIAL CASES
1. Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x₁, x₂, . . . , x_k), where xᵢ ∈ A ∀ i = 1, 2, . . . , k.
Thus n₁ = n₂ = · · · = n_k = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x₁, x₂, . . . , x_k), where xᵢ ∈ A ∀ i = 1, 2, . . . , k.
Then n₁ = n, n₂ = n − 1, . . . , n_k = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3. Permutations of n distinct objects. Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n₁ = n, n₂ = n − 1, . . . , n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · 2 · 1 = n!
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4. Sample w/out replacement & w/out order. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:
  C(n, k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, . . . , k_J is given by the multinomial coefficient
  n!/(k₁! k₂! · · · k_J!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
  72!/(3!)²⁴ ≈ 1.3 × 10⁸⁵
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
  72!/((3!)²⁴ · 24!) ≈ 2.1 × 10⁶¹
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die? (A numerical sketch follows the sub-questions below.)
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
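A sketch of the computation: the multinomial coefficient 12!/(2!)⁶ counts the distinguishable orderings, and each ordering has probability (1/6)¹². In MATLAB:
P = factorial(12)/2^6/6^12     % about 0.0034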
5. Sample with replacement and w/out ordering. Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
  C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question The answer appears at the bottom of the page
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX: By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors:
 – Behind one door is a car.
 – Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1 Never switch
4. 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
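A Monte Carlo sketch of the three strategies in MATLAB (my own; it assumes the car is placed uniformly at random):
N = 100000;
car = randi(3,1,N); pick = randi(3,1,N);
mean(pick == car)                          % never switch: about 1/3
mean(pick ~= car)                          % always switch: wins iff first pick was a goat, about 2/3
coin = rand(1,N) < 0.5;                    % switch only on heads
mean(coin.*(pick ~= car) + (~coin).*(pick == car))   % about 1/2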
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments:
  Ω = Ω₁ × Ω₂ × · · · × Ω_N
• Then any A ∈ Ω is of the form (B₁, B₂, . . . , B_N), where Bᵢ ∈ Ωᵢ.
2. Event class F: σ-algebra on Ω.
3. P(·) on F such that
  P(A) = P(B₁, B₂, . . . , B_N)
For s.i. subexperiments,
  P(A) = P(B₁)P(B₂) · · · P(B_N)
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then, for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is C(n, k).
• Thus, the probability of k successes on n trials is
  p_n(k) = C(n, k) p^k (1 − p)^(n−k)   (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
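A numerical check in MATLAB (a sketch, no toolboxes needed):
n = 100; p = 0.01; k = 0:2;
Pk = arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k);
Perr = 1 - sum(Pk)             % about 0.0794, matching the footnote answer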
L5-4
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
  b(k; n, p) = C(n, k) p^k (1 − p)^(n−k)   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
 – Consider a sequence of variables Xᵢ.
 – For each successful Bernoulli trial i, let Xᵢ = 1 (and Xᵢ = 0 otherwise).
 – Let S_n = Σ_{i=1}^{n} Xᵢ.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by the density function
  φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
  Q(x) = 1 − Φ(x)   (15)
       = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
• By applying the Central Limit Theorem, it can be shown that for large n,
  b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to deal with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
  – Note that Q(x) = 1 − Φ(x)
  – I will be supplying you with a Q-function table and a list of approximations to Q(x)
  – The Q-function can also be defined as
    Q(x) = (1/π) ∫_{0}^{π/2} exp[−x² / (2 sin²φ)] dφ
    (This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
  FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
  Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
  P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Sections 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem
5. (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables
• The density function for a Laplacian random variable is
  fX(x) = (c/2) exp(−c|x|), −∞ < x < ∞,
where c > 0
• The pdf for a Laplacian RV with c = 2 is shown below
[Plot: Laplacian pdf with c = 2; x-axis x, y-axis fX(x)]
5. Chi-square RV
• Let X be a Gaussian random variable
• Then Y = X² is a chi-square random variable
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n are si Gaussian random variables with identical variances σ², then
  Y = Σ_{i=1}^{n} Xi²     (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable
• The density of a central chi-square random variable is given by
  fY(y) = [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,
where Γ(p) is the gamma function, defined as
  Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt, p > 0
  Γ(p) = (p − 1)!, p an integer > 0
  Γ(1/2) = √π
  Γ(3/2) = (1/2)√π
• Note that for n = 2,
  fY(y) = (1/(2σ²)) e^(−y/(2σ²)) for y ≥ 0, and 0 for y < 0,
which is an exponential density
• Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance is exponential
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below
[Plot: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x-axis x, y-axis fX(x)]
• The CDF for a central chi-square RV (for an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
  FY(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0, and 0 for y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance
• Let R = √Y
• Then R is a Rayleigh RV with pdf
  fR(r) = (r/σ²) e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below
[Plot: Rayleigh pdf (σ² = 1); x-axis r, y-axis fR(r)]
• The CDF for a Rayleigh RV is
  FR(r) = 1 − e^(−r²/(2σ²)) for r ≥ 0, and 0 for r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
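A short MATLAB sketch of this construction (the sample size is arbitrary): the magnitude of two independent zero-mean, unit-variance Gaussians should reproduce the Rayleigh shape above.

% Rayleigh samples as the square root of a 2-degree-of-freedom chi-square (sigma^2 = 1)
n = 1000000;
r = sqrt(randn(1,n).^2 + randn(1,n).^2);
hist(r,100)        % compare with f_R(r) = r*exp(-r^2/2), r >= 0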
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels
• Then L is a lognormal random variable with pdf given by
  fL(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)² / (2σ²)] for ℓ ≥ 0, and 0 for ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω)
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A) / P(A)
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  fX(x|A) = d/dx FX(x|A)
• Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, then what is
1. the conditional distribution for your total wait time
L8-9
2. the conditional distribution for your remaining wait time
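A sketch of the computation, with A = {X > 2} and FX(x) = 1 − e^(−x) for λ = 1:
  FX(x|A) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = (e^(−2) − e^(−x)) / e^(−2) = 1 − e^(−(x−2)) for x > 2 (and 0 otherwise),
so the remaining wait W = X − 2 has FW(w) = 1 − e^(−w), w ≥ 0: it is again exponential with λ = 1, which is the memoryless property of the exponential RV.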
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
  FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
  fX(x) = Σ_{i=1}^{n} fX(x|Ai) P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
  = lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
  = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
  = [fX(x|A) / fX(x)] P(A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
• Note that
  P(A|X = x) = [fX(x|A) / fX(x)] P(A)
  ⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
  fX(x|A) = [P(A|X = x) / P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
L3-3
Proof: It is sufficient to consider the first case, i.e., to show that if A and B are si events, then A and Bᶜ are also si events. The other cases follow directly.
Note that A = (A ∩ B) ∪ (A ∩ Bᶜ) and (A ∩ B) ∩ (A ∩ Bᶜ) = ∅. Thus
  P(A) = P[(A ∩ B) ∪ (A ∩ Bᶜ)]
       = P(A ∩ B) + P(A ∩ Bᶜ)
       = P(A)P(B) + P(A ∩ Bᶜ)
Thus
  P(A ∩ Bᶜ) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(Bᶜ)
• For N events to be si, we require
  P(Ai ∩ Ak) = P(Ai)P(Ak) for all i ≠ k
  P(Ai ∩ Ak ∩ Am) = P(Ai)P(Ak)P(Am) for all distinct i, k, m
  . . .
  P(A1 ∩ A2 ∩ · · · ∩ AN) = P(A1)P(A2) · · · P(AN)
(It is not sufficient to have P(Ai ∩ Ak) = P(Ai)P(Ak) ∀ i ≠ k)
Weaker Condition: Pairwise Statistical Independence
DEFN: N events A1, A2, . . . , AN ∈ A are pairwise statistically independent if and only if P(Ai ∩ Aj) = P(Ai)P(Aj) ∀ i ≠ j
L3-4
Examples of Applying Statistical Independence
EX: For a couple having 5 children, compute the probabilities of the following events:
1. All children are of the same sex
2. The 3 eldest are boys and the others are girls
3. The 2 oldest are girls
4. There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
• One common mistake of students learning probability is to confuse statistical independence with mutual exclusiveness
• Remember:
  – Mutual exclusiveness is a set-theoretic property (it involves only the events themselves)
  – Statistical independence is a property of the probability measure
• How does mutual exclusiveness relate to statistical independence?
1. me ⇒ si
2. si ⇒ me
3. two events cannot be both si and me, except in some degenerate cases
4. none of the above
• Perhaps surprisingly, 3 is true
Consider the following
L3-6
Conditional Probability: Consider an example
EX: Defective computers in a lab
A computer lab contains
• two computers from manufacturer A, one of which is defective
• three computers from manufacturer B, two of which are defective
A user sits down at a computer at random. Let the properties of the computer he sits down at be denoted by a two-letter code, where the first letter is the manufacturer and the second letter is D for a defective computer and N for a non-defective computer:
Ω = {AD, AN, BD, BD, BN}
Let
• EA be the event that the selected computer is from manufacturer A
• EB be the event that the selected computer is from manufacturer B
• ED be the event that the selected computer is defective
Find:
P(EA) =     P(EB) =     P(ED) =
• Now suppose that I select a computer and tell you its manufacturer. Does that influence the probability that the computer is defective?
• Ex: Suppose I tell you the computer is from manufacturer A. Then what is the prob that it is defective?
We denote this prob as P(ED|EA) (meaning the conditional probability of event ED given that event EA occurred)
Ω = {AD, AN, BD, BD, BN}
Find:
  – P(ED|EB) =
L3-7
  – P(EA|ED) =
  – P(EB|ED) =
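For reference, the values implied by the setup above (all five computers equally likely): P(EA) = 2/5, P(EB) = 3/5, P(ED) = 3/5, and P(ED|EA) = 1/2, P(ED|EB) = 2/3, P(EA|ED) = 1/3, P(EB|ED) = 2/3.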
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram
[Venn diagram: events A and B inside sample space S]
If we condition on B having occurred then we can form the new Venn diagram
[Venn diagram: conditioned on B, the new sample space is B, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN: For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
  P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space on the original sample space: (S, A, P(·|B))
Check the axioms
1. P(S|B) = P(S ∩ B) / P(B) = P(B) / P(B) = 1
2. Given A ∈ A,
   P(A|B) = P(A ∩ B) / P(B),
   and P(A ∩ B) ≥ 0, P(B) > 0
   ⇒ P(A|B) ≥ 0
3. If A ∈ A, C ∈ A, and A ∩ C = ∅:
   P(A ∪ C|B) = P[(A ∪ C) ∩ B] / P[B] = P[(A ∩ B) ∪ (C ∩ B)] / P[B]
   Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
   P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B] = P(A|B) + P(C|B)
EX: Check the prev example, Ω = {AD, AN, BD, BD, BN}:
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
L3-9
EX: The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex: Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?
Let Wi = the event that a white ball is chosen on draw i.
What is P(W2|W1)?
2. (a) 1/3 (b) 1/5 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring
• Let's evaluate that using conditional probability
• If P(B) > 0 and A and B are si, then
  P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are si, knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
• The chain rule can be easily generalized to more than two events
Ex: Intersection of 3 events:
  P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C) = P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN: A collection of sets A1, A2, . . . partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and their union is Ω.
{Ai} is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability:
If {Ai} partitions Ω, then for any event B,
  P(B) = Σ_i P(B|Ai) P(Ai)
Example
L4-5
EX: Binary Communication System
A transmitter (Tx) sends A0 or A1
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}
  – The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1
• Let's start by considering 2 possible decision rules:
  1. Always decide A1
  2. When we receive B0, decide A0; when we receive B1, decide A1
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
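A sketch of the computation, using the transition probabilities above: Rule 1 never decides A0, so it errs exactly when A0 is sent, giving P(E) = P(A0) = 2/5 = 0.4. Rule 2 errs only when the channel flips the symbol: P(E) = P(B1|A0)P(A0) + P(B0|A1)P(A1) = (1/8)(2/5) + (1/6)(3/5) = 3/20 = 0.15.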
EX: Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize the error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
  P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized, i.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η
• Since we only apply hypothesis Hij when Bj is received, we can write
  P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
  P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1
• Then
  P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize
  (1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0)  and  q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
  (1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0)  and  q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
  (1 − q0)P(A0|B0) + q0P(A1|B0)     (9)
and
  q1P(A0|B1) + (1 − q1)P(A1|B1)     (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1
  ⇒ Our decision rule is not randomized
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise
• These rules can be summarized as follows:
  Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj)
• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule
• Problem: we don't know P(Ai|Bj)
• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai)
L4-9
• General technique:
If {Ai} is a partition of Ω, and we know P(Bj|Ai) and P(Ai), then
  P(Ai|Bj) = P(Bj|Ai)P(Ai) / P(Bj) = P(Bj|Ai)P(Ai) / Σ_k P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
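A minimal MATLAB sketch of this calculation (the matrix layout is an assumption of this sketch; the elementwise operations rely on implicit expansion, MATLAB R2016b or later):

% Row i = A_(i-1), column j = B_(j-1), so PBgA(i,j) = P(Bj|Ai)
PA   = [2/5 3/5];
PBgA = [7/8 1/8; 1/6 5/6];
PB   = PA * PBgA;              % total probability: [P(B0) P(B1)]
PAgB = (PBgA .* PA') ./ PB;    % Bayes: entry (i,j) is P(Ai|Bj)
[~, dec] = max(PAgB)           % MAP decision (row index) for each received Bj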
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1)
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
  Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai)
• This is known as the maximum likelihood (ML) decision rule
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk
SPECIAL CASES
1. Sampling with replacement and with ordering. Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
3. (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k
2. Sampling w/out replacement & with ordering. Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A ∀ i = 1, 2, . . . , k
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3. Permutations of n distinct objects. Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n_(n−1) = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1)
L5a-3
4. Sample w/out Replacement & w/out Order. Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects:
  C(n,k) = (# ordered subsets) / (# orderings) = n! / (k!(n − k)!)
DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient
  n! / (k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
  72! / (3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
  72! / ((3!)^24 · 24!) ≈ 2.1 × 10^61 unordered groups
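These counts overflow double precision, so one way to sanity-check them in MATLAB is through log-gamma (gammaln is in base MATLAB; the base-10 logarithms give the exponents quoted above):

% log10 of 72!/(3!)^24 and of 72!/((3!)^24 * 24!)
log10_ordered   = (gammaln(73) - 24*gammaln(4)) / log(10)                  % about 85.1
log10_unordered = (gammaln(73) - 24*gammaln(4) - gammaln(25)) / log(10)    % about 61.3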
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
  – How many total ways can this occur?
  – What is the prob of this occurring?
5. Sample with Replacement and w/out Ordering. Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
  (k + n − 1 choose k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors
  – Behind one door is a car
  – Behind the other doors are goats
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
4. ≈ 0.0794
L5-2
2. Always switch
3. Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments
Prob Space for N Sequential Experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments
   • Then any A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi
2. Event class F: a σ-algebra on Ω
3. P(·) on F such that
   P(A) =
   For si subexperiments:
   P(A) = P(B1)P(B2) · · · P(BN)
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure
L5-3
• Let p denote the probability of success
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k)
• The number of ways that k successes can occur in n places is (n choose k)
• Thus the probability of k successes on n trials is
  pn(k) = (n choose k) p^k (1 − p)^(n−k)     (11)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
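A direct MATLAB check of this probability (a sketch using only base functions):

% P(packet fails) = P(3 or more bit errors) = 1 - P(0, 1, or 2 errors)
n = 100; p = 0.01;
P2 = 0;
for k = 0:2
    P2 = P2 + nchoosek(n,k) * p^k * (1-p)^(n-k);
end
p_fail = 1 - P2          % approximately 0.0794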
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
  b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)     (12)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials
• An equivalent formulation is the following:
  – Consider a sequence of variables Xi
  – For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise
  – Let Sn = Σ_{i=1}^{n} Xi
Then b(k; n, p) is the probability that Sn = k
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable
• The Gaussian random variable is characterized by a density function
  φ(x) = (1/√(2π)) exp[−x²/2]     (13)
and distribution function
  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt     (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
  Q(x) = 1 − Φ(x)     (15)
       = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt     (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x)
• By applying the Central Limit Theorem, it can be shown that for large n,
  b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))     (17)
• Note that (17) uses the density function φ, not the distribution function Φ
• This formula is most useful when it is extended to dealing with sums of binomial probabilities
• For fixed integers α and β,
  P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
                = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
  p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
  p(m) = (1 − p)^(m−1) p, m = 1, 2, . . .
Also, P(# trials > k) = (1 − p)^k
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
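A sketch of the computation: a transmission succeeds with p = 0.9, so P(m transmissions required) = (0.1)^(m−1)(0.9). Thus P(exactly 2) = (0.1)(0.9) = 0.09, and P(more than 2) = P(first two transmissions fail) = (0.1)² = 0.01.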
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120
• Attempt n: Let's assume that calls come in during intervals of 1/n minute. For there to be an average of 30 calls/hour, p = 1/(2n), so the expected number of calls per minute, np = 0.5, is constant. The maximum possible number of calls/hour is then 60n
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero
• For finite k, the probability of k arrivals is a binomial probability:
  (n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
  (n choose k) p^k (1 − p)^(n−k) ≈ (1/k!) α^k (1 − α/n)^(n−k)
• Then
  lim_{n→∞} (1/k!) α^k (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law
• The Poisson law gives the probability for events that occur randomly in space or time
• Examples are:
  – calls coming in to a switching center
  – packets arriving at a queue in a network
  – processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
  – # of misprints on a group of pages in a book
  – # of people in a community that live to be 100 years old
  – # of wrong telephone numbers that are dialed in a day
  – # of α-particles discharged in a fixed period of time from some radioactive material
  – # of earthquakes per year
  – # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the number of events per unit of space or time
• Consider observing some period of time or space of length t, and let α = λt
• Let N = the number of events in time (or space) t
• Then
  P(N = k) = (α^k / k!) e^(−α) for k = 0, 1, . . . , and 0 otherwise
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
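A one-line MATLAB check (a sketch): the rate is λ = 90 calls/hr = 1.5 calls/min, so α = λt = 15 for a t = 10 minute interval.

alpha = 15; k = 20;
p = alpha^k * exp(-alpha) / factorial(k)   % about 0.0418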
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞
3. Find and plot FZ(z)
Ans:
  FZ(z) = 0 for z ≤ 0; z²/(2b²) for 0 < z ≤ b; [b² − (2b − z)²/2] / b² for b < z < 2b; 1 for z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a mapping from Ω to the real line R
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F
• We will only assign probabilities to Borel sets of the real line
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x]
• Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN: If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is
  FX(x) = P(X ≤ x)
• FX(x) is a prob measure
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
L6-4
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h↓0} FX(b + h)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻)
• If FX(x) is not a continuous function of x, then from the above,
  FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
L6-5
EX Examples
L6-6
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞
3. Find and plot FZ(z)
4. Specify the type (discrete, continuous, mixed) of Z (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN: The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function FX(x)
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative piecewise continuous function with finite integral
   ∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then fX(x) = g(x)/c is a valid pdf
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely
DEFN: If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x), and thus fX(x) is assigned a value at every point when X is a continuous random variable
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
  fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
L7-4
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x)
EX: Roll a fair 6-sided die. X = # on top face.
  P(X = x) = 1/6 for x = 1, 2, . . . , 6, and 0 ow
EX: Flip a fair coin until heads occurs. X = # of flips.
  P(X = x) = (1/2)^x for x = 1, 2, . . . , and 0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success"
• A Bernoulli RV X is defined by
  X = 1 if s ∈ A, and X = 0 if s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
  P(X = x) = p for x = 1; 1 − p for x = 0; 0 for x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs
• Let X = # of successes. Then the pmf of X is given by
  P[X = k] = (n choose k) p^k (1 − p)^(n−k) for k = 0, 1, . . . , n, and 0 ow
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below
[Plots: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x-axis x, y-axis P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below
[Plots: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x-axis x, y-axis FX(x)]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Plot: Binomial(100, 0.2) pmf; x-axis x, y-axis P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success
• X = number of trials required. Then the pmf of X is given by
  P[X = k] = (1 − p)^(k−1) p for k = 1, 2, . . . , and 0 ow
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below
[Plots: Geometric(0.1) pmf and Geometric(0.7) pmf; x-axis x, y-axis P(X = x)]
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L3-4
Examples of Applying Statistical Independence
EXFor a couple having 5 children compute the probabilities of the followingevents
1 All children are of the same sex
2 The 3 eldest are boys and the others are girls
3 The 2 oldest are girls
4 There is at least one girl
L3-5
RELATION BETWEEN STATISTICALLY INDEPENDENT AND
MUTUALLY EXCLUSIVE EVENTS
bull One common mistake of students learning probability is to confuse statistical independencewith mutual exclusiveness
bull Remember
ndash Mutual exclusiveness is a
ndash Statistical independence is a
bull How does mutual exclusiveness relate to statistical independence
1 me rArr si
2 si rArr me
3 two events cannot be both si and me except in some degenerate cases
4 none of the above
bull Perhaps surprisingly is true
Consider the following
L3-6
Conditional Probability Consider an example
EX EX Defective computers in a lab
A computer lab contains
bull two computer from manufacturer A one of which is defectivebull three computers from manufacturer B two of which are defective
A user sits down at a computer at random Let the properties of the computer he sits down atbe denoted by a two letter code where the first letter is the manufacturer and the second letter is Dfor a defective computer and N for a non-defective computer
Ω = AD AN BD BD BN
Let
bull EA be the event that the selected computer is from manufacturer A
bull EB be the event that the selected computer is from manufacturer B
bull ED be the event that the selected computer is defective
Find
P (EA) = P (EB) = P (ED) =
bull Now suppose that I select a computer and tell you its manufacturer Does that influence theprobability that the computer is defective
bull Ex Suppose I tell you the computer is from manufacturer A Then what is the prob that itis defective
We denote this prob as P (ED|EA)(means the conditional probability of event ED given that event EA occured)
Ω = AD AN BD BD BN
Find
ndash P (ED|EB) =
L3-7
ndash P (EA|ED) =
ndash P (EB|ED) =
We need a systematic way of determining probabilities given additional information about theexperiment outcome
CONDITIONAL PROBABILTY
Consider the Venn diagram
A
B
S
If we condition on B having occurred then we can form the new Venn diagram
B
A B
This diagram suggests that if A capB = empty then if B occurs A could not have occurred
Similarly if B sub A then if B occurs the diagram suggests that A must have occurred
A definition of conditional probability that agrees with these and other observations is
DEFN
For A isin A B isin A the conditional probability of event A given that event Boccurred is
P (A|B) =P (A capB)
P (B) for P (B) gt 0
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.
[Channel transition diagram: A0 → B0 with prob 7/8, A0 → B1 with prob 1/8; A1 → B1 with prob 5/6, A1 → B0 with prob 1/6]
• The receiver must use a decision rule to determine from the output (B0 or B1) whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1. Always decide A1
2. When we receive B0, decide A0; when we receive B1, decide A1
Problem: Determine the probability of error for these two decision rules.
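A quick numerical check (an added sketch; the worked pages that followed here are blank in the notes): both error probabilities follow from the law of total probability, P(E) = P(decide A1 ∩ A0 sent) + P(decide A0 ∩ A1 sent).

% Source and channel parameters from the example above
PA   = [2/5 3/5];           % P(A0), P(A1)
PBgA = [7/8 1/8; ...        % P(Bj|Ai): row i = A0, A1; column j = B0, B1
        1/6 5/6];
% Rule 1: always decide A1 -> error exactly when A0 was sent
Perr1 = PA(1)                                  % = 0.40
% Rule 2: decide A0 on B0, A1 on B1 -> error when the channel flips the symbol
Perr2 = PBgA(1,2)*PA(1) + PBgA(2,1)*PA(2)      % = (1/8)(2/5) + (1/6)(3/5) = 0.15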
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability.
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that, at this point, our decision rule may be randomized. I.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0P(A1 ∩ B0) and q1P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0P(A1|B0)P(B0) and q1P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0P(A1|B0)   (9)
and
q1P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints, i.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0) and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1) and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Ai ∩ Bj)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem (Bayes' rule).
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6 (where pij denotes P(Bj|Ai)).
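A worked version of this calculation (added sketch, using the transition probabilities above):

PA   = [2/5 3/5];                     % a priori probabilities P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];            % P(Bj|Ai)
PB   = PA * PBgA;                     % total probability: P(B0) = 9/20, P(B1) = 11/20
PAgB = (diag(PA) * PBgA) ./ [PB; PB]  % Bayes: P(Ai|Bj); each column sums to 1
% PAgB = [7/9 1/11; 2/9 10/11], so the MAP rule is: decide A0 on B0, decide A1 on B1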
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1) (with c = 0.5/P(B0)), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear below.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
Answers: (a) 0.0926 (b) 0.4630 (c) 0.2315 (d) 0.1543 (e) 0.0386 (f) 0.0193 (g) 0.0008
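These answers are easy to check by brute-force simulation; the sketch below (an addition to the notes) classifies each roll by its sorted pattern of face counts.

N = 1e6;  counts = zeros(1,7);            % tallies for outcomes (a)-(g)
for t = 1:N
    r = randi(6, 1, 5);                   % roll five fair dice
    c = sort(histc(r, 1:6), 'descend');   % face counts, largest first
    if     c(1) == 1,              k = 1; % no two alike
    elseif c(1) == 2 && c(2) == 1, k = 2; % one pair
    elseif c(1) == 2 && c(2) == 2, k = 3; % two pair
    elseif c(1) == 3 && c(2) == 1, k = 4; % three alike
    elseif c(1) == 3 && c(2) == 2, k = 5; % full house
    elseif c(1) == 4,              k = 6; % four alike
    else                           k = 7; % five alike
    end
    counts(k) = counts(k) + 1;
end
counts/N                                  % compare with the answers above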
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, . . . , xk) that are possible if xi is an element of a set with ni distinct members is n1·n2···nk.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A for all i = 1, 2, . . . , k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, . . . , xk), where xi ∈ A for all i = 1, 2, . . . , k.
Then n1 = n, n2 = n − 1, . . . , nk = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, . . . , n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula (Stirling's approximation): For large n,
n! ≈ √(2π) n^{n+1/2} e^{−n}
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
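For example (an added check), comparing the exact value with Stirling's approximation at n = 10:

n = 10;
exact  = gamma(n+1)                          % = 3628800
approx = sqrt(2*pi) * n^(n+0.5) * exp(-n)    % about 3.5987e6, roughly 0.8% low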
4. Sampling w/out Replacement & w/out Ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects.
C(n,k) = (# of ordered subsets)/(# of orderings) = n!/(k!(n − k)!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, . . . , kJ is given by the multinomial coefficient
n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/[(3!)^24 (24!)] ≈ 2.1 × 10^61
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
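A worked answer (added): the favorable outcomes are the distinct orderings of the multiset above, counted by the multinomial coefficient, so
P = [12!/(2!)^6] / 6^12 = 7,484,400 / 2,176,782,336 ≈ 0.0034.
In MATLAB: factorial(12)/2^6/6^12.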
5. Sampling with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears below.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Answer: ≈ 0.0794)
EX By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome A ∈ Ω is of the form (B1, B2, . . . , BN), where Bi ∈ Ωi.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that
P(A) = P(B1, B2, . . . , BN)
For s.i. subexperiments,
P(A) = P(B1)P(B2) · · · P(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n, k).
• Thus, the probability of k successes on n trials is
pn(k) = C(n, k) p^k (1 − p)^{n−k}   (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads? (Ans: (1/2)³ = 1/8)
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
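A worked computation (added): the number of bit errors is Binomial(100, 0.01), so
P(packet not decodable) = 1 − Σ_{k=0}^{2} C(100, k)(0.01)^k(0.99)^{100−k} ≈ 0.0794.

n = 100;  p = 0.01;  k = 0:2;
Pok  = sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k));
Perr = 1 - Pok                      % ≈ 0.0794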
• When n is large, the probability in (11) can be difficult to compute, as C(n, k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• the Poisson Law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n, k) p^k (1 − p)^{n−k}   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by the density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
where q = 1 − p.
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
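In MATLAB, Q(x) is available through the base erfc function as Q(x) = erfc(x/√2)/2. As an added check of the continuity-corrected approximation above (this sketch is an addition to the notes), consider P[45 ≤ Sn ≤ 55] for n = 100 fair-coin flips:

Q = @(x) 0.5*erfc(x/sqrt(2));      % Q-function from base-MATLAB erfc
n = 100;  p = 0.5;  q = 1 - p;  a = 45;  b = 55;
Papprox = Q((a - n*p - 0.5)/sqrt(n*p*q)) - Q((b - n*p + 0.5)/sqrt(n*p*q))  % ≈ 0.7287
Pexact  = sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), a:b))               % ≈ 0.7287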
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p,  m = 1, 2, . . .
Also, P(# trials > k) = (1 − p)^k
EX
A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
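A worked answer (added): each transmission fails independently with probability 0.1, so the number of transmissions is geometric with success probability p = 0.9:
P(exactly 2 transmissions) = (0.1)(0.9) = 0.09;
P(more than 2 transmissions) = P(first two both fail) = (0.1)² = 0.01.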
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
C(n, k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
C(n, k) p^k (1 − p)^{n−k} ≈ (α^k/k!)(1 − α/n)^{n−k}
• Then
lim_{n→∞} (α^k/k!)(1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per (unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, . . .;  0 otherwise.
EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
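A worked answer (added): λ = 90 calls/hr = 1.5 calls/min and t = 10 min, so α = λt = 15 and
P(N = 20) = (15^20/20!) e^{−15} ≈ 0.042.
In MATLAB: 15^20*exp(-15)/factorial(20).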
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) = 0 for z ≤ 0;
      = z²/(2b²) for 0 < z ≤ b;
      = [b² − (2b − z)²/2]/b² for b < z < 2b;
      = 1 for z ≥ 2b.
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
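A quick empirical check in MATLAB (added sketch; b = 1 assumed):

b = 1;  N = 1e5;
x = b*rand(1,N);  y = b*rand(1,N);     % uniform throws in the square
z = sort(x + y);                       % samples of Z
plot(z, (1:N)/N);  hold on             % empirical CDF
t = linspace(0, 2*b, 200);
F = (t.^2/(2*b^2)).*(t <= b) + (1 - (2*b - t).^2/(2*b^2)).*(t > b);
plot(t, F, '--')                       % analytic CDF from the answer above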
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
EX (worked on the board)
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is
FX(x) = P[X ≤ x] = P({ξ ∈ Ω : X(ξ) ≤ x})
• FX(x) induces a prob measure on the real line.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous on the right, i.e., FX(b) = lim_{h→0⁺} FX(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted.
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
EX Examples (worked on the board)
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
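(For checking your work; this answer note is an addition to the notes: V = −ln U has the exponential density with λ = 1, and W = √V has the Rayleigh density with 2σ² = 1.)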
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted.
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way, fX(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) = 0 for x < a;  1/(b − a) for a ≤ x ≤ b;  0 for x > b.
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6 for x = 1, 2, . . . , 6;  0 otherwise.
EX Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, . . .;  0 otherwise.
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1 if s ∈ A;  0 if s ∉ A.
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p for x = 1;  1 − p for x = 0;  0 for x ≠ 0, 1.
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, . . . , n;  0 otherwise.
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p for k = 1, 2, . . .;  0 otherwise.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per (unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, . . .;  0 otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. FX(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
fX(x) = 0 for x < 0;  λe^{−λx} for x ≥ 0.
• Distribution function:
FX(x) = 0 for x < 0;  1 − e^{−λx} for x ≥ 0.
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. fX(x) and FX(x)]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large and p is small, then with q = 1 − p,
C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for these three (µ, σ²) pairs; x vs. fX(x) and FX(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration, which is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob as
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. The answers appear below.
EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
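These values can be reproduced with the Q-function, since P[X < µ] = 1/2 by symmetry and P[|X − µ| > kσ] = 2Q(k) (added sketch):

Q = @(x) 0.5*erfc(x/sqrt(2));   % base-MATLAB Q-function
2*Q(1:5)                        % 0.3173  0.0455  0.0027  6.33e-05  5.73e-07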
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2; x vs. fX(x)]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)} for y ≥ 0;  0 for y < 0,
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)! for p a positive integer
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) = [1/(2σ²)] e^{−y/(2σ²)} for y ≥ 0;  0 for y < 0,
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
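A quick simulation of this fact (added sketch; σ² = 1 assumed):

x = randn(1,1e6);  y = randn(1,1e6);   % s.i. zero-mean, unit-variance Gaussians
s = x.^2 + y.^2;                       % exponential with mean 2*sigma^2 = 2
mean(s)                                % ≈ 2
hist(s, 100)                           % histogram shows the exponential shape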
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with n = 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square density functions with n degrees of freedom, σ² = 1, for n = 1, 2, 4, 8; x vs. fX(x)]
• The CDF for a central chi-square RV can be found, through repeated integration by parts, to be (for even n = 2m)
FY(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k for y ≥ 0;  0 for y < 0.
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0;  0 for r < 0.
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; r vs. fR(r)]
• The CDF for a Rayleigh RV is
FR(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0;  0 for r < 0.
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)} for ℓ > 0;  0 for ℓ ≤ 0.
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P[{X ≤ x} ∩ A]/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = (d/dx) FX(x|A)
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
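A worked sketch (added), conditioning on A = {X > 2} with FX(x) = 1 − e^{−x}:
1. FX(x | X > 2) = P(2 < X ≤ x)/P(X > 2) = (e^{−2} − e^{−x})/e^{−2} = 1 − e^{−(x−2)} for x > 2 (and 0 otherwise).
2. The remaining wait R = X − 2 then has FR(r | X > 2) = 1 − e^{−r} for r > 0, i.e., again exponential with λ = 1: the exponential RV is memoryless.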
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work. Instead, take a limit:
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)
           = lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
           = [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = P(A) ∫_{−∞}^{∞} fX(x|A) dx
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board.
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so that the expected number of calls per minute, np = 0.5, is constant. The maximum possible number of calls/hour is then 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
  \binom{n}{k} p^k (1-p)^{n-k}
• If we let np = α be a constant, then for large n,
  \binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}
• Then
  \lim_{n \to \infty} \frac{1}{k!}\,\alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
  – calls coming in to a switching center
  – packets arriving at a queue in a network
  – processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
  – # of misprints on a group of pages in a book
  – # of people in a community that live to be 100 years old
  – # of wrong telephone numbers that are dialed in a day
  – # of α-particles discharged in a fixed period of time from some radioactive material
  – # of earthquakes per year
  – # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
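A minimal MATLAB sketch (my own) illustrating the limit above: binomial(n, α/n) probabilities approach the Poisson(α) pmf as n grows.

% Binomial(n, alpha/n) pmf versus the limiting Poisson(alpha) pmf
alpha = 0.5; k = 0:5;
poiss = alpha.^k ./ factorial(k) * exp(-alpha);
for n = [10 100 1000]
    p = alpha / n;
    binom = arrayfun(@(kk) nchoosek(n, kk) * p^kk * (1 - p)^(n - kk), k);
    fprintf('n = %4d: max |binomial - Poisson| = %.2e\n', n, max(abs(binom - poiss)))
end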
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
  P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha} & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
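A worked sketch (my own computation): λ = 90 calls/hr = 1.5 calls/min, so for t = 10 minutes, α = λt = 15 and
  P(N = 20) = \frac{15^{20}}{20!} e^{-15} \approx 0.042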
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
  F_Z(z) = \begin{cases} 0 & z \le 0 \\ \frac{z^2}{2b^2} & 0 < z \le b \\ \frac{b^2 - (2b-z)^2/2}{b^2} & b < z < 2b \\ 1 & z \ge 2b \end{cases}
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
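A minimal MATLAB sketch (my own) that checks this answer by simulation, taking b = 1:

% Empirical CDF of Z = X + Y for a dart uniform on the unit square,
% compared against the analytic F_Z(z) above.
b = 1; N = 1e5;
Z = b*rand(1, N) + b*rand(1, N);             % sum of the two coordinates
z = linspace(0, 2*b, 200);
Fhat = arrayfun(@(zz) mean(Z <= zz), z);     % empirical CDF
F = (z <= b).*z.^2/(2*b^2) + (z > b).*(1 - (2*b - z).^2/(2*b^2));
plot(z, Fhat, z, F, '--'), xlabel('z'), ylabel('F_Z(z)')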
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a ________ from ________ to ________.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the ________ ( ________ ), denoted ________, is ________.
• F_X(x) defines a probability measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from the above,
  F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
EX Examples
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram (note that we plot v here, not u):
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN: The ________ ( ________ ) f_X(x) of a random variable X is the derivative (which may not exist at some places) of ________.
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt
3. \int_{-\infty}^{+\infty} f_X(x)\,dx = 1
4. P(a < X ≤ b) = \int_{a}^{b} f_X(x)\,dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
  \int_{-\infty}^{+\infty} g(x)\,dx = c, 0 < c < +\infty,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ________.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
  f_X(x) = \begin{cases} 0 & x < a \\ \frac{1}{b-a} & a \le x \le b \\ 0 & x > b \end{cases}
DEFN: A ________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is ________.
EX: Roll a fair 6-sided die. X = # on top face.
  P(X = x) = \begin{cases} 1/6 & x = 1, 2, \ldots, 6 \\ 0 & \text{otherwise} \end{cases}
EX: Flip a fair coin until heads occurs. X = # of flips.
  P(X = x) = \begin{cases} (1/2)^x & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
  X = \begin{cases} 1 & s \in A \\ 0 & s \notin A \end{cases}
• The pmf for a Bernoulli RV X can be found formally as
  P(X = 1) = P(X(s) = 1) = P({s | s \in A}) = P(A) \triangleq p
So
  P(X = x) = \begin{cases} p & x = 1 \\ 1 - p & x = 0 \\ 0 & x \ne 0, 1 \end{cases}
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
  P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k} & k = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf (left) and Binomial(10, 0.5) pmf (right)]
• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF (left) and Binomial(10, 0.5) CDF (right)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
  P[X = k] = \begin{cases} (1-p)^{k-1} p & k = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) pmf (left) and Geometric(0.7) pmf (right)]
• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF (left) and Geometric(0.7) CDF (right)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
  P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha} & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf (left) and Poisson(3) pmf (right)]
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF (left) and Poisson(3) CDF (right)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
  f_X(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0 \end{cases}
• Distribution function:
  F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases}
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
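This density also explains the first transformation in the MATLAB exercise above (my own note): if U is uniform on (0, 1], then V = −ln U satisfies P(V > v) = P(U < e^{−v}) = e^{−v} for v ≥ 0, so V is an exponential RV with λ = 1. That is why v = -log(u) produced an exponential-looking histogram.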
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs (left) and CDFs (right) for λ = 0.25, 1, 4]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre–Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
  \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k - np)^2}{2npq}\right]
• Let σ² = npq and μ = np; then
  P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k - \mu)^2}{2\sigma^2}\right]
DEFN: A Gaussian random variable X has density function
  f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x - \mu)^2}{2\sigma^2}\right]
with parameters (mean) μ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
  F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t - \mu)^2}{2\sigma^2}\right] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs (left) and CDFs (right) for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
  – The CDF for the normalized Gaussian RV is defined as
    \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt
  – Engineers more commonly use the complementary distribution function, or Q-function, defined by
    Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt
  – Note that Q(x) = 1 − Φ(x).
  – I will be supplying you with a Q-function table and a list of approximations to Q(x).
  – The Q-function can also be defined as
    Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi
  (This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
  F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\!\left(\frac{x - \mu}{\sigma}\right)
Note that the denominator above is σ, not σ². Many students use the wrong value when working exams.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
  P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a - \mu}{\sigma}\right) - Q\!\left(\frac{b - \mu}{\sigma}\right)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem.
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
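These values are easy to check in MATLAB (my own sketch): by symmetry P[X < μ] = 1/2, and P[|X − μ| > kσ] = 2Q(k) for any μ and σ²:

% Reproduce the tabulated answers with an erfc-based Q-function
Q = @(x) 0.5 * erfc(x / sqrt(2));   % Q(x) = 1 - Phi(x)
k = 1:5;
disp(2 * Q(k))   % approx. 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7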
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
  f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then
  Y = \sum_{i=1}^{n} X_i^2    (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
  f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)} y^{(n/2)-1} e^{-y/(2\sigma^2)} & y \ge 0 \\ 0 & y < 0 \end{cases}
where Γ(p) is the gamma function, defined as
  \Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\,dt, \quad p > 0
  \Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0
  \Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}
• Note that for n = 2,
  f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)} & y \ge 0 \\ 0 & y < 0 \end{cases}
which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be
  F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k & y \ge 0 \\ 0 & y < 0 \end{cases}
• The non-central chi-square random variable has more complicated pdfs and CDFs, and its CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = \sqrt{Y}.
• Then R is a Rayleigh RV with pdf
  f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)} & r \ge 0 \\ 0 & r < 0 \end{cases}
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
  F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)} & r \ge 0 \\ 0 & r < 0 \end{cases}
• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude (magnitude) of the waveform is a Rayleigh random variable.
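A quick MATLAB sketch (my own), in the spirit of the earlier histogram exercise:

% Magnitude of a zero-mean complex Gaussian sample is Rayleigh
N = 1e5; sigma = 1;
g = sigma*randn(1, N) + 1j*sigma*randn(1, N);   % complex Gaussian samples
r = abs(g);                                     % amplitude
hist(r, 100)                                    % compare to the Rayleigh pdf above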
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
  f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right] & \ell \ge 0 \\ 0 & \ell < 0 \end{cases}
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
  F_X(x|A) = ________
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
  f_X(x|A) = ________
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
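A worked sketch of part 1 (my own, applying the conditional-distribution definition above with A = {X > 2}):
  F_X(x | X > 2) = \frac{P(2 < X \le x)}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2,
so the remaining wait time X − 2 is again exponential with λ = 1 (the memoryless property of the exponential RV).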
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, …, A_n form a partition of Ω, then
  F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)
and
  f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
  P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
             = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\,P(A)
             = \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x \mid A) - F_X(x \mid A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\,P(A)
             = \frac{f_X(x \mid A)}{f_X(x)}\,P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
  P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\,P(A)
  ⇒ P(A|X = x)\,f_X(x) = f_X(x|A)\,P(A)
  ⇒ \int_{-\infty}^{\infty} P(A|X = x)\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x|A)\,dx \; P(A)
  ⇒ P(A) = \int_{-\infty}^{\infty} P(A|X = x)\,f_X(x)\,dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
  f_X(x|A) = \frac{P(A|X = x)}{P(A)}\,f_X(x) = \frac{P(A|X = x)\,f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\,f_X(t)\,dt}
Examples on board.
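One such example (my own sketch, not necessarily the one worked in class): let X be uniform on [0, 1], interpreted as the unknown bias of a coin, and let A be the event that a single flip of that coin shows heads, so that P(A|X = x) = x. Then by the continuous Law of Total Probability,
  P(A) = \int_0^1 x \cdot 1 \, dx = \frac{1}{2},
and by the continuous version of Bayes' theorem,
  f_X(x|A) = \frac{x \cdot 1}{1/2} = 2x, \quad 0 \le x \le 1.
Observing a head shifts the density toward larger values of x.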
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L3-7
– P(EA|ED) =
– P(EB|ED) =
We need a systematic way of determining probabilities given additional information about the experiment outcome.
CONDITIONAL PROBABILITY
Consider the Venn diagram:
[Venn diagram: overlapping events A and B inside sample space S]
If we condition on B having occurred, then we can form the new Venn diagram:
[Venn diagram: B becomes the new sample space, containing A ∩ B]
This diagram suggests that if A ∩ B = ∅, then if B occurs, A could not have occurred.
Similarly, if B ⊂ A, then if B occurs, the diagram suggests that A must have occurred.
A definition of conditional probability that agrees with these and other observations is:
DEFN
For A ∈ A, B ∈ A, the conditional probability of event A given that event B occurred is
P(A|B) = P(A ∩ B)/P(B),  for P(B) > 0
L3-8
Claim: If P(B) > 0, the conditional probability P(·|B) forms a valid probability space (S, A, P(·|B)) on the original sample space.
Check the axioms:
1
P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
2 Given A ∈ A,
P(A|B) = P(A ∩ B)/P(B),
and P(A ∩ B) ≥ 0, P(B) > 0
⇒ P(A|B) ≥ 0
3 If A ∈ A, C ∈ A, and A ∩ C = ∅,
P(A ∪ C|B) = P[(A ∪ C) ∩ B]/P[B]
= P[(A ∩ B) ∪ (C ∩ B)]/P[B]
Note that A ∩ C = ∅ ⇒ (A ∩ B) ∩ (C ∩ B) = ∅, so
P(A ∪ C|B) = P[A ∩ B]/P[B] + P[C ∩ B]/P[B]
= P(A|B) + P(C|B)
EX Check the previous example, Ω = {AD, AN, BD, BN}:
P(ED|EA) =
P(ED|EB) =
P(EA|ED) =
P(EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene?
(b) What is the probability that the Smiths' first child will have blue eyes?
(c) If their first child has brown eyes, what is the probability that their next child will also have brown eyes?
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.
Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?²
Ex Drawing two consecutive balls without replacement
An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5
Ω = {B1, B2, W3, W4, W5}
Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw given that you got a white ball on the first draw?
Let Wi = event that a white ball is chosen on draw i.
What is P(W2|W1)?
² (a) 1/3; (b) 1/5; (c) 1
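Before working it analytically, a quick Monte Carlo check is easy; the following is a minimal MATLAB sketch (the variable names and trial count are illustrative choices, not from the notes):
% Monte Carlo estimate of P(W2|W1) for the urn {B1, B2, W3, W4, W5}
N = 1e5;                                  % number of simulated experiments
firstWhite  = false(1, N);
secondWhite = false(1, N);
for trial = 1:N
    draw = randperm(5, 2);                % two draws without replacement
    firstWhite(trial)  = draw(1) >= 3;    % balls 3, 4, 5 are white
    secondWhite(trial) = draw(2) >= 3;
end
pEst = sum(firstWhite & secondWhite) / sum(firstWhite)   % should be near 1/2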
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
• In the previous lecture I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of another event occurring.
• Let's evaluate that using conditional probability.
• If P(B) > 0 and A and B are s.i., then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)
(When A and B are s.i., knowledge of B does not change the prob of A, and vice versa.)
Relating Conditional and Unconditional Probs
Which of the following statements are true?
(a) P(A|B) ≥ P(A)
(b) P(A|B) ≤ P(A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P(A|B) = P(A ∩ B)/P(B)
⇒ P(A ∩ B) = P(A|B)P(B)   (7)
and P(B|A) = P(A ∩ B)/P(A)
⇒ P(A ∩ B) = P(B|A)P(A)   (8)
• Eqns (7) and (8) are chain rules for expanding the probability of the intersection of two events.
• The chain rule can be easily generalized to more than two events.
Ex Intersection of 3 events
P(A ∩ B ∩ C) = [P(A ∩ B ∩ C)/P(B ∩ C)] · [P(B ∩ C)/P(C)] · P(C)
= P(A|B ∩ C) P(B|C) P(C)
Partitions and Total Probability
DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if ⋃i Ai = Ω and Ai ∩ Aj = ∅ for all i ≠ j.
{Ai} is also said to be a partition of Ω.
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then P(B) = Σi P(B|Ai)P(Ai) for any event B.
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 or A1.
• A0 is sent with prob P(A0) = 2/5
• A1 is sent with prob P(A1) = 3/5
• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.
• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.
– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:
[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine from the output B0 or B1 whether the symbol that was sent was most likely A0 or A1.
• Let's start by considering 2 possible decision rules:
1 Always decide A1
2 When receive B0, decide A0; when receive B1, decide A1
Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis Hij when Bj is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)  and  q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)  and  q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
and
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0) and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1) and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(Ai|Bj).
• Need to express P(Ai|Bj) in terms of P(Ai) and P(Bj|Ai).
L4-9
• General technique:
If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
P(Ai|Bj) = P(Bj|Ai)P(Ai)/P(Bj) = P(Bj|Ai)P(Ai) / Σk P(Bj|Ak)P(Ak),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' rule.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
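A minimal MATLAB sketch of the computation (it assumes, per the transition diagram above, that p01 = P(B1|A0) and p10 = P(B0|A1), and it relies on implicit expansion, so a reasonably recent MATLAB is assumed):
% A posteriori probabilities P(Ai|Bj) for the binary channel example
pA   = [2/5, 3/5];                 % a priori probs P(A0), P(A1)
pBgA = [7/8, 1/8;                  % row i+1 holds P(B0|Ai), P(B1|Ai)
        1/6, 5/6];
pB   = pA * pBgA;                  % law of total probability: P(B0), P(B1)
pAgB = (pBgA .* pA.') ./ pB;       % Bayes' rule: entry (i+1, j+1) = P(Ai|Bj)
% pAgB is approximately [0.778 0.091; 0.222 0.909],
% so the MAP rule is: receive B0 -> decide A0; receive B1 -> decide A1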
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).
• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 · · · nk.
SPECIAL CASES
1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
³ (a) 0.0926; (b) 0.4630; (c) 0.2315; (d) 0.1543; (e) 0.0386; (f) 0.0193; (g) 0.0008
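Answer (a) can be checked by counting: 6·5·4·3·2 of the 6^5 equally likely ordered rolls have all five dice different. A brute-force MATLAB sketch (the simulation parameters are illustrative):
% P[no two alike] in poker dice: exact count and simulation
pExact = prod(6:-1:2) / 6^5;              % 720/7776, about 0.0926
N = 1e5;
rolls = randi(6, N, 5);                   % N simulated rolls of five dice
allDiff = false(N, 1);
for r = 1:N
    allDiff(r) = numel(unique(rolls(r, :))) == 5;
end
pSim = mean(allDiff)                      % should be near pExact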
L5a-2
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.
Thus n1 = n2 = · · · = nk = |A| ≡ n
⇒ number of distinct ordered k-tuples is n^k
2 Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀ i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1)
⇒ number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1) = n!/(n − k)!
3 Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_(n−1) = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1) · · · (2)(1) = n!
Useful formula: For large n, n! ≈ √(2π) n^(n+1/2) e^(−n)
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1) (or factorial(n)).
L5a-3
4 Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects:
C(n,k) = (# ordered subsets)/(# orderings) = n!/((n − k)! k!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/((3!)^24 (24)!) ≈ 2.1 × 10^61
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the prob of this occurring?
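By the multinomial coefficient there are 12!/(2!)^6 such orderings, each occurring with probability (1/6)^12; a two-line MATLAB check of that reasoning:
% P[each face appears exactly twice in 12 rolls of a fair die]
nWays = factorial(12) / 2^6;      % 12!/(2!)^6 arrangements
p = nWays / 6^12                  % about 0.0034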
5 Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
(k + n − 1 choose k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX By Request: The Game Show / Monty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
• Suppose you're on a game show, and you're given the choice of three doors:
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch
⁴ 0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob Space for N Sequential Experiments
1 The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2 Event class F: σ-algebra on Ω
3 P(·) on F such that
P(A) =
For s.i. subexperiments:
P(A) = P(B1)P(B2) · · · P(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^(n−k).
• The number of ways that k successes can occur in n places is (n choose k).
• Thus the probability of k successes on n trials is
pn(k) = (n choose k) p^k (1 − p)^(n−k)   (11)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
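Using (11), the answer is 1 − Σ_{k=0}^{2} C(100, k)(0.01)^k(0.99)^(100−k); a short MATLAB evaluation of that sum:
% P[3 or more bit errors in 100 bits], bit error probability p = 0.01
n = 100;  p = 0.01;
pk = arrayfun(@(k) nchoosek(n, k) * p^k * (1 - p)^(n - k), 0:2);
pNoDecode = 1 - sum(pk)           % about 0.0794, matching the footnote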
L5-4
• When n is large, the probability in (11) can be difficult to compute, as (n choose k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = (n choose k) p^k (1 − p)^(n−k)   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = (1 − p)^(m−1) p,  m = 1, 2, ...
Also, P(# trials > k) = (1 − p)^k
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
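With success probability 0.9 per transmission, the geometric law answers both questions directly; a tiny MATLAB check:
% ARQ with packet error probability 0.1 (geometric law)
pErr = 0.1;
pExactly2 = pErr^(2 - 1) * (1 - pErr)   % P[exactly 2 transmissions] = 0.09
pMoreThan2 = pErr^2                     % P[more than 2 transmissions] = 0.01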
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
(n choose k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
(n choose k) p^k (1 − p)^(n−k) ≈ (α^k/k!)(1 − α/n)^(n−k)
• Then
lim_{n→∞} (α^k/k!)(1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
  (α^k/k!) e^(−α),  k = 0, 1, ...
  0,                o.w.
EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
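Here λ = 90 calls/hr and t = 1/6 hr, so α = λt = 15; the Poisson probability can be evaluated directly in MATLAB (no toolbox functions needed):
% P[N = 20] for a Poisson RV with alpha = 90*(1/6) = 15
alpha = 15;
p20 = exp(-alpha) * alpha^20 / factorial(20)   % about 0.042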
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
Ans:
FZ(z) =
  0,                      z ≤ 0
  z²/(2b²),               0 < z ≤ b
  [b² − (2b − z)²/2]/b²,  b < z < 2b
  1,                      z ≥ 2b
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
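A short MATLAB experiment (with b = 1 chosen purely for illustration) confirms the shape of FZ(z) empirically:
% Empirical vs. theoretical CDF of Z = X + Y, dart uniform on [0,b]^2
b = 1;  N = 1e6;
z = b*rand(1, N) + b*rand(1, N);              % sum of the two coordinates
zs = 0:0.01:2*b;
Femp = arrayfun(@(t) mean(z <= t), zs);       % empirical CDF
Fthy = (zs.^2/(2*b^2)).*(zs <= b) + (1 - (2*b - zs).^2/(2*b^2)).*(zs > b);
plot(zs, Femp, zs, Fthy, '--')                % the two curves should overlay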
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set EB ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted FX(x), is
FX(x) = P(X ≤ x)
• FX(x) is a prob measure.
• Properties of FX:
1 0 ≤ FX(x) ≤ 1
L6-4
2 FX(−∞) = 0 and FX(∞) = 1
3 FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4 P(a < X ≤ b) = FX(b) − FX(a)
5 FX(x) is continuous on the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1 Describe the sample space of Z.
2 Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z).
4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1 fX(x) ≥ 0, −∞ < x < ∞
2 FX(x) = ∫_{−∞}^{x} fX(t) dt
3 ∫_{−∞}^{+∞} fX(x) dx = 1
L7-3
4 P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5 If g(x) is a nonnegative piecewise continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If FX(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
fX(x) =
  0,           x < a
  1/(b − a),   a ≤ x ≤ b
  0,           x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is pX(x) = P[X = x].
EX Roll a fair 6-sided die; X = # on top face
P(X = x) =
  1/6,  x = 1, 2, ..., 6
  0,    o.w.
EX Flip a fair coin until heads occurs; X = # of flips
P(X = x) =
  (1/2)^x,  x = 1, 2, ...
  0,        o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X =
  1,  s ∈ A
  0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) =
  p,      x = 1
  1 − p,  x = 0
  0,      x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] =
  (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
  0,                               o.w.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below:
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below:
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. FX(x)]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] =
  (1 − p)^(k−1) p,  k = 1, 2, ...
  0,                o.w.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below:
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below:
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. FX(x)]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
  (α^k/k!) e^(−α),  k = 0, 1, ...
  0,                o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below:
[Figure: Poisson(0.75) and Poisson(3) pmfs]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below:
[Figure: Poisson(0.75) and Poisson(3) CDFs]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in example
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
fX(x) =
  0,           x < 0
  λ e^(−λx),   x ≥ 0
• Distribution function:
FX(x) =
  0,             x < 0
  1 − e^(−λx),   x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below:
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = (n choose k) p^k q^(n−k) ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25):
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
FX(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob of some interval using the Q-function, it is easiest to rewrite the prob:
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:⁵
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX The Motivation problem
⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively
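These follow from symmetry (P[X < µ] = 1/2) and from P[|X − µ| > kσ] = 2Q(k); a one-line MATLAB confirmation using the erfc-based Q above:
% P[|X - mu| > k*sigma] = 2*Q(k) for k = 1, ..., 5
Q = @(x) 0.5 * erfc(x / sqrt(2));
twoQ = 2 * Q(1:5)    % approx. 0.317, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7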
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below:
[Figure: Laplacian pdf with c = 2]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi²   (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) =
  [1/(σⁿ 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
  0,                                                  y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) =
  [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
  0,                        y < 0
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below:
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (with an even number n = 2m of degrees of freedom) can be found through repeated integration by parts to be
FY(y) =
  1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
  0,                                                   y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) =
  (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
  0,                      r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below:
[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is
FR(r) =
  1 − e^(−r²/(2σ²)),  r ≥ 0
  0,                   r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
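The Gaussian-to-chi-square-to-Rayleigh chain above is easy to see numerically; a minimal MATLAB sketch (σ = 1 chosen for illustration):
% Rayleigh samples built from two independent zero-mean Gaussians (sigma = 1)
N = 1e6;
R = sqrt(randn(1, N).^2 + randn(1, N).^2);   % chi-square with 2 dof, then sqrt
hist(R, 100)                                 % shape matches f_R(r) = r*exp(-r^2/2)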
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) =
  [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)],  ℓ ≥ 0
  0,                                        ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P({X ≤ x} ∩ A)/P(A)
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = (d/dx) FX(x|A)
• Then FX(x|A) is a distribution function and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time?
L8-9
2 the conditional distribution for your remaining wait time?
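For intuition about part 2, the exponential RV is memoryless: given X > 2, the remaining wait is again exponential(1). A quick simulation sketch (an illustration, not the worked solution):
% Remaining wait given that you have already waited two minutes
N = 1e6;  lambda = 1;
X = -log(rand(1, N)) / lambda;     % exponential(1) waiting times
remWait = X(X > 2) - 2;            % remaining wait, conditioned on X > 2
hist(remWait, 100)                 % histogram matches the original exponential pdf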
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, ..., An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
= lim_{Δx→0} [FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)] · P(A)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
= [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the
Continuous Version of Bayes' Theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
L3-8
Claim If P (B) gt 0 the conditional probability P (|B) forms a valid probability space onthe original sample space(SA P (middot|B))
Check the axioms
1
P (S|B) =P (S capB)
P (B)=
P (B)
P (B)= 1
2 Given A isin A
P (A|B) =P (A capB)
P (B)
and P (A capB) ge 0 P (B) ge 0
rArr P (A|B) ge 0
3 If A sub A C sub A and A cap C = empty
P (A cup C|B) =P [(A cup C) capB]
P [B]
=P [(A capB) cup (C capB)]
P [B]
Note that A cap C = empty rArr (A capB) cap (C capB) = empty so
P (A cup C|B) =P [A capB]
P [B]+
P [C capB]
P [B]
= P (A|B) + P (C|B)
EX Check prev example Ω =
AD AN BD BD BN
P (ED|EA) =
P (ED|EB) =
P (EA|ED) =
P (EB|ED) =
L3-9
EX The genetics (eye color) problem
(a) What is the probability that Smith possesses a blue-eyed gene
(b) What is the probability that the Smithsrsquo first child will have blue eyes
(c) If their first child has brown eyes what is the probability that their next child will also havebrown eyes
L4-1
EEL 5544 Noise in Linear Systems Lecture 4
Motivation
Conditional probability is one of the main tools that we use to makesolving probability problems easier Conditional probability is sopowerful because it allows us to decompose a problem into moremanageable pieces It also allows us to look at a problem fromdifferent points of view (see the examples in Section 13)
Read Section 17 in the textbook
Try to use the tools of conditional probability to answer the following question (S Ross A FirstCourse in Probability 7th ed Ch 3 prob 37) The answers appear at the bottom of the page
EX
(a) A gambler has in his pocket a fair coin and a two-headed coin Heselects one of the coins at random when he flips it it shows headsWhat is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it showsheads Now what is the probability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Nowwhat is the probability that it is the fair coin
2
Ex Drawing two consecutive balls without replacement
An urn containsbull two black balls numbered 1 and 2bull three white balls numbered 3 4 and 5
Ω = B1 B2 W3 W4 W5
Two balls are drawn from the urn without replacement What is the probability of getting awhite ball on the second draw given that you got a white ball on the first draw
Let Wi= event that white ball chosen on draw i
What is P (W2|W1)
2 (a) 13 (b) 15 (c) 1
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient n!/(k1! k2! · · · kJ!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)^24 ≈ 1.3 × 10^85

Order matters because each group gets a different project.
• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/((3!)^24 (24!)) ≈ 2.1 × 10^61
L5a-4
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6
– What is the prob. of this occurring? [12!/(2!)^6] / 6^12 ≈ 0.0034
5. Sample with Replacement and without Ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

C(k + n − 1, k)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiments can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
4
EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors:
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:

1. Never switch.
⁴ 0.0794
L5-2
2. Always switch.

3. Flip a fair coin, and switch if it comes up heads.
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments

1. The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments, Ω = Ω1 × Ω2 × · · · × ΩN

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class: F, a σ-algebra on Ω.

3. P(·) on F such that

P(A) = P(B1)P(B2|B1) · · · P(BN|B1, ..., BN−1)

For s.i. (statistically independent) subexperiments:

P(A) = P(B1)P(B2) · · · P(BN)
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n,k).
• Thus the probability of k successes on n trials is

p_n(k) = C(n,k) p^k (1 − p)^{n−k}    (1.1)
Examples
EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
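A short MATLAB sketch of the computation via (1.1), using only the built-in nchoosek (illustrative, not part of the notes):

% P(packet fails) = 1 - P(0, 1, or 2 bit errors in n = 100 trials)
n = 100; p = 1e-2;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n,k) * p^k * (1-p)^(n-k);  % binomial pmf (1.1)
end
Pfail = 1 - Pok   % approximately 0.0794, matching the footnote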
L5-4
• When n is large, the probability in (1.1) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let

b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}    (1.2)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.

Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function

φ(x) = (1/√(2π)) exp[−x²/2]    (1.3)

and distribution function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt    (1.4)
L5-5
• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (1.5)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt    (1.6)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))    (1.7)

where q = 1 − p.
• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,

P[α ≤ Sn ≤ β] ≈ Φ((β − np + 0.5)/√(npq)) − Φ((α − np − 0.5)/√(npq))
             = Q((α − np − 0.5)/√(npq)) − Q((β − np + 0.5)/√(npq))
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
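In MATLAB, Q(x) can be built from the standard erfc via Q(x) = erfc(x/√2)/2. The sketch below applies the approximation above to the 100-bit packet example (illustrative only; note that np = 1 is small here, so the Gaussian approximation is crude and the Poisson law would fit better):

% Gaussian approximation with continuity correction for P[0 <= Sn <= 2]
n = 100; p = 0.01; q = 1 - p;
Qf = @(x) 0.5*erfc(x/sqrt(2));   % Q(x) in terms of the standard erfc
a = 0; b = 2;
Pok = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
Pfail = 1 - Pok                  % compare with the exact value 0.0794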
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then

p(m) = (1 − p)^{m−1} p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
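A quick worked check (assuming independent transmissions, each succeeding with probability p = 0.9): P(exactly 2 transmissions) = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = P(first 2 trials are failures) = (0.1)² = 0.01.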
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 1/2 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}

• Then

lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then

P(N = k) = { (α^k/k!) e^{−α}, k = 0, 1, ...
           { 0, otherwise
EX Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
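Here λ = 90/60 = 1.5 calls/minute and t = 10 minutes, so α = λt = 15. A short MATLAB check (illustrative):

% Poisson probability of exactly k = 20 calls when alpha = 15
alpha = 15; k = 20;
P = alpha^k * exp(-alpha) / factorial(k)   % about 0.0418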
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z : 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).
Ans:

F_Z(z) = { 0, z ≤ 0
         { z²/(2b²), 0 < z ≤ b
         { [b² − (2b − z)²/2]/b², b < z < 2b
         { 1, z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
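The answer for F_Z(z) can also be checked empirically; the MATLAB sketch below assumes b = 1 (an arbitrary choice for illustration):

% Monte Carlo check of F_Z(z) for the dart example, with b = 1
b = 1; N = 1e5;
x = b*rand(1,N); y = b*rand(1,N);   % dart uniform over the square
Z = x + y;
z = linspace(-0.5, 2.5, 500);
Femp = zeros(size(z));
for i = 1:length(z)
    Femp(i) = mean(Z <= z(i));      % empirical CDF at z(i)
end
plot(z, Femp)                       % compare with the piecewise F_Z(z) above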
RANDOM VARIABLES (RVS)
• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x′ : x′ ∈ (−∞, x]}.

• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the (cumulative) distribution function (CDF) of X, denoted F_X(x), is

F_X(x) = P[X ≤ x]
• F_X(x) is a prob. measure.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from the above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                 = lim_{ε→0} P[x − ε < X ≤ x]
                 = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density of the new random variables by plotting a histogram (note that we plot v now, not u):
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
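One way to make the comparison sharper is to normalize the histogram to unit area and overlay a candidate density. Since v = −log(u) is exponential with λ = 1, the following sketch (a construction for illustration, not from the notes) overlays that density:

% Normalized histogram of v against the exponential(1) density
u = rand(1,1000000);
v = -log(u);
[counts, centers] = hist(v, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(sum(counts)*binw))    % histogram scaled to unit area
hold on
x = linspace(0, max(v), 200);
plot(x, exp(-x), 'r', 'LineWidth', 2)      % f(x) = e^(-x), x >= 0
hold off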
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):

f_X(x) = dF_X(x)/dx
• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) can be assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = { 0, x < a
         { 1/(b − a), a ≤ x ≤ b
         { 0, x > b
L7-4
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is

P_X(x) = P[X = x]
EX Roll a fair 6-sided die. X = # on top face.

P(X = x) = { 1/6, x = 1, 2, ..., 6
           { 0, otherwise
EX Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = { (1/2)^x, x = 1, 2, ...
           { 0, otherwise
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by

X = { 1, s ∈ A
    { 0, s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = { p, x = 1
           { 1 − p, x = 0
           { 0, x ≠ 0, 1
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = { C(n,k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n
           { 0, otherwise
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; P(X = x) vs. x]
• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; F_X(x) vs. x]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = { (1 − p)^{k−1} p, k = 1, 2, ...
           { 0, otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) pmfs; P(X = x) vs. x]
L7-8
• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) and Geometric(0.7) CDFs; F_X(x) vs. x]
4. Poisson RV

• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then

P(N = k) = { (α^k/k!) e^{−α}, k = 0, 1, ...
           { 0, otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) pmfs; P(N = k) vs. k]
L7-9
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) and Poisson(3) CDFs; F_X(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV — covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.
L7-10
• Density function:

f_X(x) = { 0, x < 0
         { λ e^{−λx}, x ≥ 0

• Distribution function:

F_X(x) = { 0, x < 0
         { 1 − e^{−λx}, x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, and 4]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (with npq also large), then with q = 1 − p,

C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]

• Let σ² = npq and µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
L7-11
DEFN
A Gaussian random variable X has density function

f_X(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]

with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
L7-12
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
⁵ (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
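The footnote values follow from P[|X − µ| > kσ] = 2Q(k), which in MATLAB is one line (illustrative):

% P(|X - mu| > k*sigma) = 2*Q(k) = erfc(k/sqrt(2)) for k = 1..5
k = 1:5;
P = erfc(k/sqrt(2))   % compare with footnote 5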
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; f_X(x) vs. x]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xi²    (1.8)

is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by

f_Y(y) = { [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
         { 0, y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, for p a positive integer
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,

f_Y(y) = { [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
         { 0, y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV (for n = 2m degrees of freedom, m an integer) can be found through repeated integration by parts to be

F_Y(y) = { 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
         { 0, y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = { (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
         { 0, r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; f_R(r) vs. r]
• The CDF for a Rayleigh RV is

F_R(r) = { 1 − e^{−r²/(2σ²)}, r ≥ 0
         { 0, r < 0
• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
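This construction is easy to verify numerically; the sketch below assumes σ = 1 and is illustrative only:

% Rayleigh samples built from two independent zero-mean Gaussians
N = 1e5; sigma = 1;
X1 = sigma*randn(1,N); X2 = sigma*randn(1,N);
R = sqrt(X1.^2 + X2.^2);                 % R = sqrt(Y), Y chi-square with n = 2
[counts, centers] = hist(R, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(sum(counts)*binw))  % normalized histogram
hold on
r = linspace(0, max(R), 200);
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)), 'r')   % Rayleigh pdf
hold off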
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)], ℓ ≥ 0
         { 0, ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[X ≤ x | A] = P({X ≤ x} ∩ A)/P(A)

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)
• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2. the conditional distribution for your remaining wait time?
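A sketch of the solution under the definition above, with A = {X > 2} (so P(A) = e⁻²): for x > 2,

F_X(x | X > 2) = P({X ≤ x} ∩ {X > 2}) / P(X > 2) = [F_X(x) − F_X(2)] / e⁻² = 1 − e^{−(x−2)},

and F_X(x | X > 2) = 0 for x ≤ 2. The remaining wait time X − 2 is therefore again exponential with λ = 1: the exponential RV is memoryless.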
TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then

F_X(x) = F_X(x|A1)P(A1) + F_X(x|A2)P(A2) + · · · + F_X(x|An)P(An)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [F_X(x + Δx | A) − F_X(x | A)] / [F_X(x + Δx) − F_X(x)] · P(A)

           = lim_{Δx→0} {[F_X(x + Δx | A) − F_X(x | A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

           = [f_X(x|A) / f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
• The cdfs for two Geometric RVs, with p = 0.1 or p = 0.7, are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; F_X(x) versus x]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha} & k = 0, 1, \ldots \\ 0 & \text{o.w.} \end{cases}$$
• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) versus x]
• The cdfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; F_X(x) versus x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) versus x for 0 ≤ x ≤ 100]
IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example.

2. Exponential RV

• Characterized by a single parameter λ.

• Density function:
$$f_X(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0 \end{cases}$$

• Distribution function:
$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases}$$
• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; f_X(x) and F_X(x) versus x]
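The Lecture 7 MATLAB exercise (v = -log(u)) generates exactly this kind of RV. The sketch below (my own, with illustrative names) makes the inverse-transform idea explicit for a general λ and checks the CDF at one point:

lambda = 4;
u = rand(1, 1e6);                  % uniform on (0,1)
v = -log(u) / lambda;              % inverse of F(x) = 1 - exp(-lambda*x)
x0 = 0.5;
disp([mean(v <= x0), 1 - exp(-lambda*x0)])   % empirical vs analytic F_X(x0)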
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (npq ≫ 1), then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{\frac{-(k - np)^2}{2npq}\right\}$$

• Let σ² = npq and μ = np. Then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(k - \mu)^2}{2\sigma^2}\right\}$$
DEFN
A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(x - \mu)^2}{2\sigma^2}\right\}$$
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(t - \mu)^2}{2\sigma^2}\right\} dt,$$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); f_X(x) and F_X(x) versus x]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{-t^2}{2}\right\} dt$$
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined, for x ≥ 0, as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{\frac{-x^2}{2\sin^2\phi}\right\} d\phi.$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right)$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working problems on tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right)$$
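Base MATLAB has no Q-function (qfunc requires a toolbox), but Q is easily built from erfc. The sketch below (my own; μ, σ, a, b are assumed example values) evaluates the interval probability above and also reproduces P[|X − μ| > kσ] = 2Q(k):

Q = @(x) 0.5 * erfc(x ./ sqrt(2));   % Q(x) = 1 - Phi(x)
mu = 5; sigma = 2; a = 4; b = 7;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma);   % P(4 < X <= 7)
disp(P)                              % approximately 0.5328
disp(2 * Q(1:5))                     % 0.317, 4.55e-2, 2.70e-3, 6.34e-5, 5.73e-7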
EEL 5544 Lecture 8

Motivation

Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where c > 0.
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are s.i. Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \tag{18}$$
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)} & y \ge 0 \\ 0 & y < 0 \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0;$$
$$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0;$$
$$\Gamma(1/2) = \sqrt{\pi}; \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}.$$
• Note that for n = 2,
$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)} & y \ge 0 \\ 0 & y < 0 \end{cases}$$
which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; f_X(x) versus x]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k & y \ge 0 \\ 0 & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)} & r \ge 0 \\ 0 & r < 0 \end{cases}$$
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf with σ² = 1; f_R(r) versus r for 0 ≤ r ≤ 5]
• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)} & r \ge 0 \\ 0 & r < 0 \end{cases}$$

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
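A quick MATLAB sketch (my own illustration) of the chain above: squaring and summing two independent zero-mean Gaussians and taking the square root gives a Rayleigh RV, which can be checked against the analytic CDF:

sigma = 1; N = 1e6;
R = sigma * sqrt(randn(1,N).^2 + randn(1,N).^2);   % Rayleigh from two s.i. N(0, sigma^2) RVs
r0 = 1.5;
disp([mean(R <= r0), 1 - exp(-r0^2/(2*sigma^2))])  % empirical vs analytic F_R(r0)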
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right\} & \ell \ge 0 \\ 0 & \ell < 0 \end{cases}$$
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x|A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}$$

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x|A) = \frac{d}{dx} F_X(x|A)$$

• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
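A numerical check of part 2 (the well-known memoryless property, P[X > 2 + t | X > 2] = P[X > t]); this sketch is my own and the names are illustrative:

lambda = 1; N = 1e6;
X = -log(rand(1, N)) / lambda;         % exponential wait times
t = 1.5;
lhs = mean(X > 2 + t) / mean(X > 2);   % P(X > 2 + t | X > 2)
rhs = exp(-lambda * t);                % P(X > t)
disp([lhs rhs])                        % nearly equal: memorylessness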
TOTAL PROBABILITY AND BAYES' THEOREM

• If A1, A2, ..., An form a partition of Ω, then
$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\,P(A_i)$$
Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work.
$$\begin{aligned} P(A|X = x) &= \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x) \\ &= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A) \\ &= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x}\, P(A) \\ &= \frac{f_X(x|A)}{f_X(x)}\, P(A) \end{aligned}$$
if fX(x|A) and fX(x) exist.
Implication of Point Conditioning

• Note that
$$\begin{aligned} P(A|X = x) &= \frac{f_X(x|A)}{f_X(x)}\, P(A) \\ \Rightarrow\ P(A|X = x)\, f_X(x) &= f_X(x|A)\, P(A) \\ \Rightarrow\ \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\,dx &= \int_{-\infty}^{\infty} f_X(x|A)\,dx \cdot P(A) \\ \Rightarrow\ P(A) &= \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\,dx \end{aligned}$$
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\,dt}$$

Examples on board.
EEL 5544 Noise in Linear Systems, Lecture 4

Motivation

Conditional probability is one of the main tools that we use to make solving probability problems easier. Conditional probability is so powerful because it allows us to decompose a problem into more manageable pieces. It also allows us to look at a problem from different points of view (see the examples in Section 1.3).
Read Section 1.7 in the textbook.

Try to use the tools of conditional probability to answer the following question (S. Ross, A First Course in Probability, 7th ed., Ch. 3, prob. 37). The answers appear at the bottom of the page.

EX
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
Ex Drawing two consecutive balls without replacement

An urn contains
• two black balls, numbered 1 and 2
• three white balls, numbered 3, 4, and 5

Ω = {B1, B2, W3, W4, W5}

Two balls are drawn from the urn without replacement. What is the probability of getting a white ball on the second draw, given that you got a white ball on the first draw?

Let Wi = event that a white ball is chosen on draw i.

What is P(W2|W1)?

Answers to the gambler problem: (a) 1/3, (b) 1/5, (c) 1.
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY

• In the previous lecture, I claimed that if two events are statistically independent, then knowledge of whether one event occurred would not affect the probability of the other event occurring.

• Let's evaluate that using conditional probability.

• If P(B) > 0 and A and B are s.i., then
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$
(When A and B are s.i., knowledge of B does not change the prob. of A, and vice versa.)
Relating Conditional and Unconditional Probs

Which of the following statements is true?

(a) P(A|B) ≥ P(A)

(b) P(A|B) ≤ P(A)

(c) Not necessarily (a) or (b)
Chain rule for expanding intersections

Note that
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \ \Rightarrow\ P(A \cap B) = P(A|B)\,P(B) \tag{7}$$
and
$$P(B|A) = \frac{P(A \cap B)}{P(A)} \ \Rightarrow\ P(A \cap B) = P(B|A)\,P(A) \tag{8}$$
• Eqns. (7) and (8) are chain rules for expanding the probability of the intersection of two events.

• The chain rule can be easily generalized to more than two events.

Ex Intersection of 3 events:
$$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A|B \cap C)\,P(B|C)\,P(C)$$
Partitions and Total Probability

DEFN
A collection of sets A1, A2, ... partitions the sample space Ω if and only if the sets are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j) and exhaustive (∪i Ai = Ω).

{Ai} is also said to be a partition of Ω.
1. Venn Diagram

2. Expressing an arbitrary set using a partition of Ω: B = ∪i (B ∩ Ai), a union of disjoint pieces.

Law of Total Probability

If {Ai} partitions Ω, then
$$P(B) = \sum_i P(B|A_i)\,P(A_i).$$

Example.
EX Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob. P(A0) = 2/5.

• A1 is sent with prob. P(A1) = 3/5.

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram.

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B1|A1) = 5/6, P(B0|A1) = 1/6]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1. Always decide A1.

2. When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
EX Optimal Detection for Binary Communication System

Design an optimal receiver to minimize error probability.

• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)$$

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: "When B0 is received, decide A0 with probability η and decide A1 with probability 1 − η."

• Since we only apply hypothesis Hij when Bj is received, we can write
$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)$$

• We apply the chain rule to write this as
$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)$$
• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then
$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)$$

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad\text{and}\quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1)$$

• We can further expand these using the chain rule as
$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad\text{and}\quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1)$$
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \tag{9}$$
and
$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1) \tag{10}$$

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then
$$P(A_i|B_j) = \frac{P(B_j|A_i)\,P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)\,P(A_i)}{\sum_k P(B_j|A_k)\,P(A_k)},$$
where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.

EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
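A sketch of the computation in MATLAB (my own illustration; here p01 = P(B1|A0) and p10 = P(B0|A1), matching the transition diagram above):

PA = [2/5 3/5];                       % a priori: P(A0), P(A1)
PBgA = [7/8 1/8; 1/6 5/6];            % P(Bj|Ai): row i = Ai, col j = Bj
PB = PA * PBgA;                       % law of total probability: [P(B0) P(B1)]
PAgB = diag(PA) * PBgA ./ repmat(PB, 2, 1);   % Bayes: P(Ai|Bj)
disp(PAgB)   % col B0: [7/9; 2/9] -> decide A0; col B1: [1/11; 10/11] -> decide A1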
• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems, Lecture 5a

Motivation

In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.
COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible, if xi is an element of a set with ni distinct members, is n1 n2 ⋯ nk.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Thus n1 = n2 = ⋯ = nk = |A| ≡ n.

⇒ The number of distinct ordered k-tuples is n^k.

2. Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.

Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1).

⇒ The number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement, until no objects are left in the set.

Then n1 = n, n2 = n − 1, ..., n(n−1) = 2, nn = 1.

⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)⋯(2)(1) = n!

Useful formula: For large n,
$$n! \approx \sqrt{2\pi}\; n^{n + \frac{1}{2}}\, e^{-n}$$

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
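For instance, a quick comparison (my own sketch) of the exact factorial, the gamma function, and Stirling's approximation:

n = 10;
exact = factorial(n);                            % 3628800
viaGamma = gamma(n + 1);                         % same value via the gamma function
stirling = sqrt(2*pi) * n^(n + 0.5) * exp(-n);   % approximately 3598696
disp([exact, viaGamma, stirling])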
4. Sample w/out Replacement & w/out Order
Question: How many ways are there to pick k objects from a set of n objects, without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n!/(n − k)!.

Now, how many different orders are there for any set of k objects? k!

Let Cnk = the number of combinations of size k from a set of n objects.
$$C_{n,k} = \frac{\#\text{ ordered subsets}}{\#\text{ orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}$$
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}$$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are
$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}$$
arrangements. Order matters because each group gets a different project.

• EX Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}$$
• EX What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur?

– What is the prob. of this occurring? (See the worked computation below.)
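A worked answer consistent with the multinomial coefficient above (my own computation): the number of favorable ordered outcomes is $12!/(2!)^6$, and each of the $6^{12}$ ordered outcomes is equally likely, so
$$P = \frac{12!/(2!)^6}{6^{12}} = \frac{7484400}{2176782336} \approx 0.0034.$$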
5. Sample with Replacement and w/out Ordering
Question: How many ways are there to choose k objects from a set of n objects, without regard to order?

• The total number of arrangements is
$$\binom{k + n - 1}{k}$$
EEL 5544 Noise in Linear Systems, Lecture 5

Motivation

In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies:

1. Never switch.

2. Always switch.

3. Flip a fair coin and switch if it comes up heads.

Answer to the packet-decoding question: 0.0794.
SEQUENTIAL EXPERIMENTS

DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob. Space for N Sequential Experiments

1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments, Ω = Ω1 × Ω2 × ⋯ × ΩN.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that P(A) is assigned consistently with the subexperiment probabilities.

For s.i. subexperiments,
$$P(A) = P_1(B_1)\,P_2(B_2)\cdots P_N(B_N).$$
Bernoulli Trials

DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then, for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is $\binom{n}{k}$.

• Thus, the probability of k successes on n trials is
$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k} \tag{11}$$
Examples

EX Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A MATLAB check follows below.)
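A sketch of the computation in MATLAB using (11) (my own illustration, base MATLAB only):

n = 100; p = 0.01;
k = 0:2;
pk = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
P_fail = 1 - sum(pk);    % P(3 or more bit errors)
disp(P_fail)             % approximately 0.0794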
• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let
$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \tag{12}$$
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that Sn = k.

• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \tag{13}$$
and distribution function
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \tag{14}$$
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
$$Q(x) = 1 - \Phi(x) \tag{15}$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \tag{16}$$

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• It can be shown that for large n,
$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\; \phi\!\left(\frac{k - np}{\sqrt{npq}}\right) \tag{17}$$

• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$

• Note that the book defines an error function erf(x). I will not use this function, because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam. (A MATLAB check of the approximation appears below.)
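As an illustrative check (my own sketch, with assumed example values n = 100, p = 0.2, α = 18, β = 25), compare the exact binomial probability with the continuity-corrected approximation above, using Q built from erfc:

Q = @(x) 0.5 * erfc(x / sqrt(2));
n = 100; p = 0.2; q = 1 - p;
alpha = 18; beta = 25;
k = alpha:beta;
exact = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* q.^(n-k));
approx = Q((alpha - n*p - 0.5)/sqrt(n*p*q)) - Q((beta - n*p + 0.5)/sqrt(n*p*q));
disp([exact approx])    % close agreement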
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

Then
$$p(m) = (1 - p)^{m-1}\, p, \quad m = 1, 2, \ldots$$

Also,
$$P(\#\text{ trials} > k) = (1 - p)^k.$$
EX
A communication system uses Automatic Repeat Request (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (A worked answer follows.)
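A worked answer under the geometric law (my own computation): here a "success" is a correctly received packet, so p = 0.9 and the failure probability is 0.1. Then
$$P(N = 2) = (0.1)(0.9) = 0.09, \qquad P(N > 2) = (0.1)^2 = 0.01.$$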
THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 calls per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:
$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,
$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$

• Then
$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha} & k = 0, 1, \ldots \\ 0 & \text{o.w.} \end{cases}$$
EX Phone calls arriving at a switching office:
If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval? (A worked check follows.)

• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
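A worked sketch of the switching-office example (my own computation): λ = 90 calls/hr = 1.5 calls/min, so α = λt = 15 for t = 10 minutes.

lambda = 90/60;                 % calls per minute
t = 10; alpha = lambda * t;     % alpha = 15
k = 20;
P = alpha^k * exp(-alpha) / factorial(k);
disp(P)                         % approximately 0.0418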
EEL 5544 Lecture 6

Motivation

Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: ΩZ = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

Ans:
$$F_Z(z) = \begin{cases} 0 & z \le 0 \\ \frac{z^2}{2b^2} & 0 < z \le b \\ \frac{b^2 - \frac{(2b - z)^2}{2}}{b^2} & b < z < 2b \\ 1 & z \ge 2b \end{cases}$$

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
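A simulation sketch (my own illustration) that checks the answer to part 3 at a single point, for an assumed b:

b = 2; N = 1e6;
Z = b*rand(1,N) + b*rand(1,N);       % sum of the dart's two coordinates
z0 = 1.5;                            % a point in (0, b]
disp([mean(Z <= z0), z0^2/(2*b^2)])  % empirical vs analytic F_Z(z0)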
RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.

EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event; and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form {x | x ∈ (−∞, x₀]}.

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
$$F_X(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}).$$
• FX(x) is a prob. measure.

• Properties of FX:

1. 0 ≤ FX(x) ≤ 1

2. FX(−∞) = 0 and FX(∞) = 1

3. FX(x) is monotonically nondecreasing; i.e., FX(a) ≤ FX(b) if a ≤ b.

4. P(a < X ≤ b) = FX(b) − FX(a)

5. FX(x) is continuous on the right; i.e., $F_X(b) = \lim_{h \to 0^+} F_X(b + h)$.
(The value at a jump discontinuity is the value after the jump.)

Pf. omitted

• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).

• If FX(x) is not a continuous function of x, then from the above,
$$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x].$$
EX Examples

EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot FZ(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7

Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L4-2
STATISTICAL INDEPENDENCE AND CONDITIONAL PROBABILITY
bull In the previous lecture I claimed that if two events are statistically independent then knowledgeof whether one event occurred would not affect the probability of another event occurring
bull Letrsquos evaluate that using conditional probability
bull If P (B) gt 0 and A and B are si then
(When A and B are si knowledge of B does not change the prob of A (and vice versa))
Relating Conditional and Unconditional Probs
Which of the following statements are true
(a) P (A|B) ge P (A)
(b) P (A|B) le P (A)
(c) Not necessarily (a) or (b)
L4-3
Chain rule for expanding intersections
Note that P (A|B) =P (A capB)
P (B)
rArr P (A capB) = P (A|B)P (B) (7)
and P (B|A) =P (A capB)
P (A)
rArr P (A capB) = P (B|A)P (A) (8)
bull Eqns () and () are chain rules for expanding the probability of the intersection of twoevents
bull The chain rule can be easily generalized to more than two events
Ex Intersection of 3 events
P (A capB cap C) =P (A capB cap C)
P (B cap C)middot P (B cap C)
P (C)middot P (C)
= P (A|B cap C)P (B|C)P (C)
Partitions and Total Probability
DEFN
A collection of sets A1 A2 partitions the sample space Ω if and only if
Ai is also said to be a partition of Ω
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If Ai partitions Ω then
Example
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A ∀i = 1, 2, ..., k.
Then n1 = n, n2 = n − 1, ..., nk = n − (k − 1).
⇒ the number of distinct ordered k-tuples is n(n − 1) · · · (n − k + 1).
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n1 = n, n2 = n − 1, ..., n_{n−1} = 2, n_n = 1.
⇒ # of permutations = # of distinct ordered n-tuples
= n(n − 1) · · · (2)(1)
= n!
Useful formula: For large n,
n! ≈ √(2π) n^{n+1/2} e^{−n}
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
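As a quick sanity check (an illustrative sketch, not from the notes), MATLAB can compare Stirling's approximation with the exact value, and verify poker-dice answer (a) by direct counting:

n = 12;
exact    = gamma(n+1);                     % n! via the gamma function
stirling = sqrt(2*pi) * n^(n+0.5) * exp(-n);
relErr   = abs(exact - stirling)/exact     % under 1% already at n = 12
pNoTwoAlike = (6*5*4*3*2) / 6^5            % = 0.0926, matching answer (a)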
4. Sampling without replacement & without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n(n − 1) · · · (n − k + 1) = n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n,k) = the number of combinations of size k from a set of n objects.
C(n,k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN:
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient
n!/(k1! k2! · · · kJ!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/((3!)^24 · 24!) ≈ 2.1 × 10^61 unordered groupings
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6
– How many total ways can this occur?
– What is the probability of this occurring?
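A worked answer (my sketch): by the multinomial coefficient, the number of orderings in which each face appears exactly twice is 12!/(2!)^6 = 7,484,400. Since all 6^12 = 2,176,782,336 ordered outcomes are equally likely,
P = [12!/(2!)^6] / 6^12 ≈ 0.0034.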
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is
C(k + n − 1, k)
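For instance (an illustrative case, not from the notes): choosing k = 2 scoops from n = 3 ice-cream flavors with replacement and without ordering gives C(2 + 3 − 1, 2) = C(4, 2) = 6 selections: {1,1}, {1,2}, {1,3}, {2,2}, {2,3}, {3,3}.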
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears in the footnote below.
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?4

4: 0.0794
EX: By Request: The Game Show/Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):
1. Never switch
2. Always switch
3. Flip a fair coin and switch if it comes up heads
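A quick Monte Carlo sketch in MATLAB (an illustrative addition with hypothetical variable names; the notes work these probabilities analytically). Because the host always reveals a goat, switching wins exactly when the initial pick was wrong:

N    = 1e5;
car  = randi(3, 1, N);           % door hiding the car
pick = randi(3, 1, N);           % contestant's initial pick
winStay   = mean(pick == car);   % never switch
winSwitch = mean(pick ~= car);   % always switch: wins iff first pick was wrong
coin      = rand(1, N) < 0.5;    % flip a fair coin, switch on heads
winCoin   = mean(coin.*(pick ~= car) + (~coin).*(pick == car));
[winStay winSwitch winCoin]      % approximately [1/3 2/3 1/2]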
SEQUENTIAL EXPERIMENTS
DEFN:
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability Space for N Sequential Experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.
2. Event class F: σ-algebra on Ω.
3. P(·) on F, consistent with the subexperiment probability measures. For statistically independent (s.i.) subexperiments,
P(A) = P1(B1) P2(B2) · · · PN(BN)
Bernoulli Trials
DEFN:
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n,k).
• Thus the probability of k successes on n trials is
pn(k) = C(n,k) p^k (1 − p)^{n−k} (11)
Examples:
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^−2. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
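A short check of this computation in MATLAB (my illustration; nchoosek is the standard binomial-coefficient function). P(does not decode) is one minus the probability of 0, 1, or 2 bit errors:

n = 100; p = 1e-2; k = 0:2;
pOK   = sum(arrayfun(@(kk) nchoosek(n,kk), k) .* p.^k .* (1-p).^(n-k));
pFail = 1 - pOK                  % approximately 0.0794 (footnote 4)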
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^{n−k} (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables Xi.
– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.
– Let Sn = Σ_{i=1}^{n} Xi.
Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt. (14)
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (15)
= (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt. (16)
• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,
b(k; n, p) ≈ [1/√(npq)] φ((k − np)/√(npq)) (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ Sn ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
= Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
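As an illustrative sketch (my construction, using MATLAB's standard erfc function), the Q-function and the continuity-corrected approximation above can be evaluated directly; here it is applied to the 100-bit packet example with α = 0, β = 2:

Q = @(x) 0.5*erfc(x/sqrt(2));    % Q-function built from erfc
n = 100; p = 0.01; q = 1 - p;
a = 0; b = 2;                    % P[0 <= Sn <= 2] for the packet example
pApprox = Q((a - n*p - 0.5)/sqrt(n*p*q)) - Q((b - n*p + 0.5)/sqrt(n*p*q))
% exact binomial sum is about 0.9206; the match is poor here because
% np = 1 is small -- this regime belongs to the Poisson law, not the Gaussian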
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success)
= P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, . . .
Also, P(# trials > k) = (1 − p)^k.
EX:
A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
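A worked answer (my sketch, under the usual assumption that transmissions fail independently): a transmission succeeds with probability p = 0.9, so the number of transmissions needed is geometric. P(exactly 2 transmissions) = (1 − p)p = (0.1)(0.9) = 0.09, and P(more than 2 transmissions) = (1 − p)² = 0.01.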
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so that np = 0.5 calls/minute is constant. The maximum possible number of calls/hour is then 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:
C(n,k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
  (α^k/k!) e^{−α},  k = 0, 1, . . .
  0,                otherwise
EX: Phone calls arriving at a switching office
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
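A quick numerical evaluation (an illustrative MATLAB line, my addition): λ = 90 calls/hr over t = 10 min = 1/6 hr gives α = λt = 15, so

alpha = 90*(10/60);
P20 = alpha^20 * exp(-alpha) / factorial(20)   % approximately 0.0418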
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
Ans:
FZ(z) =
  0,                      z ≤ 0
  z²/(2b²),               0 < z ≤ b
  [b² − (2b − z)²/2]/b²,  b < z < 2b
  1,                      z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a mapping from Ω to R.
EX:
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN:
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN:
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted FX(x), is
FX(x) = P({ω : X(ω) ≤ x}) = P(X ≤ x)
• FX(x) induces a probability measure on the Borel sets of R.
• Properties of FX:
1. 0 ≤ FX(x) ≤ 1
2. FX(−∞) = 0 and FX(∞) = 1
3. FX(x) is monotonically nondecreasing, i.e., FX(a) ≤ FX(b) if a ≤ b.
4. P(a < X ≤ b) = FX(b) − FX(a)
5. FX(x) is continuous from the right, i.e., lim_{h→0⁺} FX(b + h) = FX(b).
(The value at a jump discontinuity is the value after the jump.)
Pf omitted
• If FX(x) is a continuous function of x, then FX(x) = FX(x⁻).
• If FX(x) is not a continuous function of x, then from above,
FX(x) − FX(x⁻) = P[x⁻ < X ≤ x] = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
EX: Examples
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot FZ(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN:
The probability density function (pdf) fX(x) of a random variable X is the derivative (which may not exist at some places) of FX(x).
• Properties:
1. fX(x) ≥ 0, −∞ < x < ∞
2. FX(x) = ∫_{−∞}^{x} fX(t) dt
3. ∫_{−∞}^{+∞} fX(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} fX(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then fX(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if fX(x) exists at x, then FX(x) is continuous at x, and thus P[X = x] = FX(x) − FX(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN:
If FX(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where FX(x) is continuous but F′X(x) is discontinuous, any positive number can be assigned to fX(x); in this way fX(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN:
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
fX(x) =
  0,          x < a
  1/(b − a),  a ≤ x ≤ b
  0,          x > b
DEFN:
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN:
For a discrete RV, the probability mass function (pmf) is pX(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) =
  1/6,  x = 1, 2, . . . , 6
  0,    otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) =
  (1/2)^x,  x = 1, 2, . . .
  0,        otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs:
1. Bernoulli RV
• An event A ∈ F is considered a "success."
• A Bernoulli RV X is defined by
X =
  1,  s ∈ A
  0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) =
  p,      x = 1
  1 − p,  x = 0
  0,      x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] =
  C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, . . . , n
  0,                         otherwise
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: pmfs of the Binomial(10, 0.2) and Binomial(10, 0.5) RVs; horizontal axis x, vertical axis P(X = x)]
• The CDFs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: CDFs of the Binomial(10, 0.2) and Binomial(10, 0.5) RVs; horizontal axis x, vertical axis FX(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: pmf of the Binomial(100, 0.2) RV; horizontal axis x, vertical axis P(X = x)]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] =
  (1 − p)^{k−1} p,  k = 1, 2, . . .
  0,                otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: pmfs of the Geometric(0.1) and Geometric(0.7) RVs; horizontal axis x, vertical axis P(X = x)]
• The CDFs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: CDFs of the Geometric(0.1) and Geometric(0.7) RVs; horizontal axis x, vertical axis FX(x)]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) =
  (α^k/k!) e^{−α},  k = 0, 1, . . .
  0,                otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: pmfs of the Poisson(0.75) and Poisson(3) RVs; horizontal axis x, vertical axis P(X = x)]
• The CDFs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: CDFs of the Poisson(0.75) and Poisson(3) RVs; horizontal axis x, vertical axis FX(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: pmf of the Poisson(20) RV; horizontal axis x, vertical axis P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
fX(x) =
  0,         x < 0
  λe^{−λx},  x ≥ 0
• Distribution function:
FX(x) =
  0,            x < 0
  1 − e^{−λx},  x ≥ 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4; horizontal axis x, vertical axes fX(x) and FX(x)]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
C(n,k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}
• Let σ² = npq and μ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ [1/√(2πσ²)] exp{−(k − μ)²/(2σ²)}
DEFN:
A Gaussian random variable X has density function
fX(x) = [1/√(2πσ²)] exp{−(x − μ)²/(2σ²)}
with parameters (mean) μ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − μ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); horizontal axis x, vertical axes fX(x) and FX(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} [1/√(2π)] exp{−t²/2} dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
FX(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
P(a < X ≤ b) = P(X > a) − P(X > b)
= Q((a − μ)/σ) − Q((b − μ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear in the footnote below.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:5
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem
5: (a) 1/2; (b) 0.317, 4.55 × 10^−2, 2.70 × 10^−3, 6.34 × 10^−5, and 5.75 × 10^−7, respectively.
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
fX(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; horizontal axis x, vertical axis fX(x)]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, . . . , n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
Y = Σ_{i=1}^{n} Xi² (18)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (18) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
fY(y) =
  [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
  0,                                                  y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
fY(y) =
  [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
  0,                       y < 0
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; horizontal axis x, vertical axis fX(x)]
• The CDF for a central chi-square RV (with an even number of degrees of freedom, n = 2m) can be found through repeated integration by parts to be
FY(y) =
  1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,  y ≥ 0
  0,                                                  y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
fR(r) =
  (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
  0,                     r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; horizontal axis r, vertical axis fR(r)]
• The CDF for a Rayleigh RV is
FR(r) =
  1 − e^{−r²/(2σ²)},  r ≥ 0
  0,                  r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
fL(ℓ) =
  [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
  0,                                       ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
FX(x|A) = P(X ≤ x | A) = P({X ≤ x} ∩ A)/P(A)
DEFN:
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
fX(x|A) = d FX(x|A)/dx
• Then FX(x|A) is a distribution function, and fX(x|A) is a density function.
EX:
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution of your total wait time?
2. the conditional distribution of your remaining wait time?
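A worked sketch (my addition), conditioning on A = {X > 2} with P(A) = e^{−2}:
1. Total wait time: FX(x | X > 2) = P(2 < X ≤ x)/P(X > 2) = (e^{−2} − e^{−x})/e^{−2} = 1 − e^{−(x−2)} for x > 2 (and 0 for x ≤ 2).
2. Remaining wait time R = X − 2: FR(r | X > 2) = 1 − e^{−r}, r ≥ 0, i.e., exponential with λ = 1 again; the exponential RV is memoryless.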
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
FX(x) = FX(x|A1)P(A1) + FX(x|A2)P(A2) + · · · + FX(x|An)P(An)
and
fX(x) = Σ_{i=1}^{n} fX(x|Ai)P(Ai)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)] / [FX(x + Δx) − FX(x)]} P(A)
= lim_{Δx→0} {[FX(x + Δx|A) − FX(x|A)]/Δx} / {[FX(x + Δx) − FX(x)]/Δx} · P(A)
= [fX(x|A)/fX(x)] P(A),
if fX(x|A) and fX(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [fX(x|A)/fX(x)] P(A)
⇒ P(A|X = x) fX(x) = fX(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) fX(x) dx = ∫_{−∞}^{∞} fX(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:
fX(x|A) = [P(A|X = x)/P(A)] fX(x)
= P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = t) fX(t) dt
Examples on board
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
[Figure: pmf of Poisson(20); x-axis x, y-axis P(X = x).]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in an earlier example

2 Exponential RV

• Characterized by a single parameter λ
L7-10
• Density function:

$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:

$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$
• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.

• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below:
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4; x-axis x, y-axes f_X(x) and F_X(x).]
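The inverse-CDF method gives a direct way to generate exponential variates: if U is uniform on (0, 1), then X = -ln(U)/λ satisfies P(X > x) = e^{-λx}. A short sketch of mine (base MATLAB, in the spirit of the rand/-log(u) exercise at the start of this lecture):

% Exponential(lambda) samples by inverting the CDF: X = -log(U)/lambda
lambda = 4;
u = rand(1, 1e6);
x = -log(u) / lambda;
hist(x, 100)   % shape matches the lambda = 4 pdf above
mean(x)        % approx 1/lambda = 0.25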
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 - p,

$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k-np)^2}{2npq}\right]$$

• Let σ² = npq and µ = np; then

$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k-\mu)^2}{2\sigma^2}\right]$$
L7-11
DEFN
A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x-\mu)^2}{2\sigma^2}\right]$$

with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t-\mu)^2}{2\sigma^2}\right] dt,$$

which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (µ = 15, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below:
[Figure: Gaussian pdfs and cdfs for (µ = 15, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x-axis x, y-axes f_X(x) and F_X(x).]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt$$
L7-12
– Note that Q(x) = 1 - Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi$$

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is

$$F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = 1 - Q\left(\frac{x-\mu}{\sigma}\right)$$

Note that the denominator above is σ, not σ². Many students use the wrong value when working problems on tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:

$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a-\mu}{\sigma}\right) - Q\left(\frac{b-\mu}{\sigma}\right)$$
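Since Q(x) has no closed form, numerical work needs a computable expression. One standard identity is Q(x) = (1/2) erfc(x/√2), with erfc the built-in MATLAB complementary error function; the sketch below (mine, with illustrative parameter values) applies it to an interval probability:

% Q-function via the built-in complementary error function
qfun = @(x) 0.5 * erfc(x / sqrt(2));
% Illustrative values: X Gaussian with mu = 10, sigma^2 = 25
mu = 10; sigma = 5;            % note: use sigma, not sigma^2
a = 8; b = 14;
P = qfun((a - mu)/sigma) - qfun((b - mu)/sigma);   % P(a < X <= b)
disp(P)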
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X - µ| > kσ] for k = 1, 2, 3, 4, 5⁵
Examples with Gaussian Random Variables
EX The Motivation problem
⁵(a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
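These answers follow from symmetry: P[X < µ] = 1/2 and P[|X - µ| > kσ] = 2Q(k), independent of µ and σ². A one-line check of the footnote values (my sketch, reusing the erfc-based Q-function from the earlier sketch):

% P[|X - mu| > k*sigma] = 2*Q(k) for any Gaussian RV
qfun = @(x) 0.5 * erfc(x / sqrt(2));
disp(2 * qfun(1:5))   % approx 0.3173  0.0455  0.0027  6.33e-05  5.73e-07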
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is

$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below:
[Figure: Laplacian pdf with c = 2; x-axis x, y-axis f_X(x).]
5 Chi-square RV
• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if Xi, i = 1, 2, ..., n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$

is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by

$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

where Γ(p) is the gamma function, defined as

$$\Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0$$

$$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0$$

$$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$$
• Note that for n = 2,

$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below:
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x-axis x, y-axis f_X(x).]
• The CDF for a central chi-square RV can be found through repeated integration by parts to be (for an even number of degrees of freedom, n = 2m)

$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

$$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below:
[Figure: Rayleigh pdf with σ² = 1; x-axis r, y-axis f_R(r).]
• The CDF for a Rayleigh RV is

$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
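The construction above suggests a direct way to simulate Rayleigh samples; this sketch (mine, base MATLAB) takes the root-sum-of-squares of two independent zero-mean Gaussians and can be compared against the plotted density with hist:

% Rayleigh samples from two independent zero-mean Gaussian RVs
sigma = 1;
N = 1e6;
x = sigma * randn(1, N);   % real part
y = sigma * randn(1, N);   % imaginary part
r = sqrt(x.^2 + y.^2);     % Rayleigh with parameter sigma
hist(r, 100)               % compare shape to f_R(r) above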
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(\ln\ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω):

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

$$F_X(x|A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}$$

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

$$f_X(x|A) = \frac{d}{dx} F_X(x|A)$$

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
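For intuition (my sketch, not part of the notes): with A = {X > 2}, part 1 is F_X(x|A) = [F_X(x) - F_X(2)]/[1 - F_X(2)] for x > 2 (and 0 otherwise), and the memoryless property of the exponential says the remaining wait in part 2 is again exponential(λ). A Monte Carlo check in MATLAB:

% Memorylessness: given X > 2, the remaining wait X - 2 is again exponential(lambda)
lambda = 1;
x = -log(rand(1, 1e6)) / lambda;   % exponential samples via inverse CDF
w = x(x > 2) - 2;                  % remaining wait, conditioned on X > 2
hist(w, 100)                       % same shape as the unconditional density
mean(w)                            % approx 1/lambda = 1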
TOTAL PROBABILITY AND BAYESrsquo THEOREM
• If A1, A2, ..., An form a partition of Ω, then

$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$

and

$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)\, P(A_i)$$
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob will not work:

$$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$

$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$

$$= \lim_{\Delta x \to 0} \frac{\left[F_X(x+\Delta x|A) - F_X(x|A)\right]/\Delta x}{\left[F_X(x+\Delta x) - F_X(x)\right]/\Delta x}\, P(A)$$

$$= \frac{f_X(x|A)}{f_X(x)}\, P(A),$$

if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that

$$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)$$

$$\Rightarrow P(A|X = x)\, f_X(x) = f_X(x|A)\, P(A)$$

$$\Rightarrow \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \cdot P(A)$$

$$\Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X = x)\, f_X(x)\, dx$$

(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

$$f_X(x|A) = \frac{P(A|X = x)}{P(A)}\, f_X(x) = \frac{P(A|X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t)\, f_X(t)\, dt}$$
Examples on board
L4-4
1 Venn Diagram
2 Expressing an arbitrary set using a partition of Ω
Law of Total Probability
If {Ai} partitions Ω, then

$$P(B) = \sum_{i} P(B|A_i)\, P(A_i)$$
Example
L4-5
EX: Binary Communication System

A transmitter (Tx) sends A0 or A1:

• A0 is sent with prob P(A0) = 2/5

• A1 is sent with prob P(A1) = 3/5

• A receiver (Rx) processes the output of the channel into one of two values, B0 or B1.

• The channel is a noisy channel that determines the probabilities of transitions between {A0, A1} and {B0, B1}.

– The transitions are conditional probabilities P(Bj|Ai) that can be specified on a channel transition diagram:

[Channel transition diagram: P(B0|A0) = 7/8, P(B1|A0) = 1/8, P(B0|A1) = 1/6, P(B1|A1) = 5/6.]
• The receiver must use a decision rule to determine, from the output B0 or B1, whether the symbol that was sent was most likely A0 or A1.

• Let's start by considering 2 possible decision rules:

1 Always decide A1.

2 When we receive B0, decide A0; when we receive B1, decide A1.

Problem: Determine the probability of error for these two decision rules.
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
• Let Hij be the hypothesis (i.e., we decide) that Ai was transmitted when Bj is received.

• The error probability of our system is then the probability that we decide A0 when A1 is transmitted plus the probability that we decide A1 when A0 is transmitted:

$$P(E) = P(H_{10} \cap A_0) + P(H_{11} \cap A_0) + P(H_{01} \cap A_1) + P(H_{00} \cap A_1)$$

• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: When B0 is received, decide A0 with probability η and decide A1 with probability 1 - η.

• Since we only apply hypothesis Hij when Bj is received, we can write

$$P(E) = P(H_{10} \cap A_0 \cap B_0) + P(H_{11} \cap A_0 \cap B_1) + P(H_{01} \cap A_1 \cap B_1) + P(H_{00} \cap A_1 \cap B_0)$$

• We apply the chain rule to write this as

$$P(E) = P(H_{10}|A_0 \cap B_0)P(A_0 \cap B_0) + P(H_{11}|A_0 \cap B_1)P(A_0 \cap B_1) + P(H_{01}|A_1 \cap B_1)P(A_1 \cap B_1) + P(H_{00}|A_1 \cap B_0)P(A_1 \cap B_0)$$

• The receiver can only decide Hij based on Bj; it has no direct knowledge of Ak. So P(Hij|Ak ∩ Bj) = P(Hij|Bj) for all i, j, k.

• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.

• Then

$$P(E) = (1 - q_0)P(A_0 \cap B_0) + q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1) + q_0 P(A_1 \cap B_0)$$

• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding q0 and q1 that minimize

$$(1 - q_0)P(A_0 \cap B_0) + q_0 P(A_1 \cap B_0) \quad \text{and} \quad q_1 P(A_0 \cap B_1) + (1 - q_1)P(A_1 \cap B_1)$$

• We can further expand these using the chain rule as

$$(1 - q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0) \quad \text{and} \quad q_1 P(A_0|B_1)P(B_1) + (1 - q_1)P(A_1|B_1)P(B_1)$$
L4-8
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize

$$(1 - q_0)P(A_0|B_0) + q_0 P(A_1|B_0) \qquad (9)$$

$$q_1 P(A_0|B_1) + (1 - q_1)P(A_1|B_1) \qquad (10)$$

• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either qi = 0 or qi = 1.

⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.

• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:

Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Ai|Bj).

• In words: Choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(Ai|Bj) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(Ai|Bj).

• Need to express P(Ai|Bj) in terms of P(Ai), P(Bj|Ai).
L4-9
• General technique:

If {Ai} is a partition of Ω and we know P(Bj|Ai) and P(Ai), then

$$P(A_i|B_j) = \frac{P(B_j|A_i)\, P(A_i)}{P(B_j)} = \boxed{\frac{P(B_j|A_i)\, P(A_i)}{\sum_k P(B_j|A_k)\, P(A_k)}}$$

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.
EX
Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
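A sketch (mine) of this computation in MATLAB, taking p01 = P(B1|A0) and p10 = P(B0|A1) as above; it prints P(Ai|Bj), and the MAP rule picks the larger entry in each column:

% A posteriori probabilities for the binary channel via Bayes' rule
PA = [2/5, 3/5];                     % P(A0), P(A1)
PBgA = [7/8, 1/8; 1/6, 5/6];         % PBgA(i+1, j+1) = P(Bj | Ai)
PB = PA * PBgA;                      % [P(B0), P(B1)] by total probability
PAgB = zeros(2, 2);
for i = 1:2
    for j = 1:2
        PAgB(i, j) = PBgA(i, j) * PA(i) / PB(j);   % Bayes' rule
    end
end
disp(PAgB)   % P(A0|B0) = 7/9, P(A1|B1) = 10/11: decide A0 on B0, A1 on B1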
L4-10
• Often we may not know the a priori probabilities P(A0) and P(A1).

• In this case, we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.

• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that Bj is received, decide Ai was transmitted if Ai maximizes P(Bj|Ai).

• This is known as the maximum likelihood (ML) decision rule.
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
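A sketch (mine) of the sequential Bayes updates; with F = fair, P(heads|F) = 1/2 and P(heads|two-headed) = 1:

% Sequential Bayes updates for the two-coin problem
prior = [1/2, 1/2];                    % [P(fair), P(two-headed)]
pH = [1/2, 1];                         % P(heads | coin)
post1 = prior .* pH / (prior * pH');   % after one head: [1/3, 2/3]
post2 = post1 .* pH / (post1 * pH');   % after two heads: [1/5, 4/5]
disp([post1(1), post2(1)])             % P(fair) = 1/3, then 1/5
% (c) a tail is impossible for the two-headed coin, so P(fair) becomes 1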
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.

Read Section 1.8 in the textbook.

Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.³
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x1, x2, ..., xk) that are possible if xi is an element of a set with ni distinct members is n1 n2 ⋯ nk.
SPECIAL CASES
1 Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice the object is returned to the set). The ordered series of results is noted.
³(a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008
L5a-2
Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A for all i = 1, 2, ..., k.

Thus n1 = n2 = ⋯ = nk = |A| ≡ n.

⇒ the number of distinct ordered k-tuples is n^k.
2 Sampling w/out replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x1, x2, ..., xk), where xi ∈ A for all i = 1, 2, ..., k.

Then n1 = n, n2 = n - 1, ..., nk = n - (k - 1).

⇒ the number of distinct ordered k-tuples is n(n - 1)⋯(n - k + 1).
3 Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n1 = n, n2 = n - 1, ..., n_{n-1} = 2, n_n = 1.

⇒ # of permutations = # of distinct ordered n-tuples = n(n - 1)⋯(2)(1) = n!

Useful formula: For large n,

$$n! \approx \sqrt{2\pi}\, n^{n + 1/2}\, e^{-n}$$
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
L5a-3
4 Sampling w/out replacement & w/out ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First determine the # of ordered subsets: n(n - 1)⋯(n - k + 1) = n!/(n - k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n, k) = the number of combinations of size k from a set of n objects:

$$C(n, k) = \frac{\#\,\text{ordered subsets}}{\#\,\text{orderings}} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}$$

DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k1, k2, ..., kJ is given by the multinomial coefficient

$$\binom{n}{k_1, k_2, \ldots, k_J} = \frac{n!}{k_1!\, k_2! \cdots k_J!}$$

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

$$\frac{72!}{(3!)^{24}} \approx 1.3 \times 10^{85}$$

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

$$\frac{72!}{(3!)^{24}\,(24!)} \approx 2.1 \times 10^{61}$$
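Numbers like 72!/(3!)^24 overflow double precision if computed directly, so a sketch (mine) evaluates them on a log scale with MATLAB's built-in gammaln (log-gamma) function:

% Count group assignments on a log scale to avoid overflow
logN1 = gammaln(73) - 24*gammaln(4);   % ln( 72! / (3!)^24 )
logN2 = logN1 - gammaln(25);           % also divide by 24!
% convert to base-10 exponents: approx 85.1 and 61.3
fprintf('10^%.1f  10^%.1f\n', logN1/log(10), logN2/log(10))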
L5a-4
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6

– How many total ways can this occur? 12!/(2!)^6 = 7,484,400

– What is the prob of this occurring? [12!/(2!)^6]/6^12 ≈ 0.0034
5 Sampling with replacement and w/out ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?

• The total number of arrangements is

$$\binom{k + n - 1}{k}$$
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.

Read Sections 1.9–1.12 in the textbook.

Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?⁴
EX (By Request): The Game Show / Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1 Never switch
⁴0.0794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.

Prob Space for N Sequential Experiments

1 The combined sample space:
Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B1, B2, ..., BN), where Bi ∈ Ωi.

2 Event class F: a σ-algebra on Ω.

3 P(·) on F such that

$$P(A) = P(B_1)\, P(B_2|B_1) \cdots P(B_N|B_1, \ldots, B_{N-1})$$

For s.i. (statistically independent) subexperiments,

$$P(A) = P_1(B_1)\, P_2(B_2) \cdots P_N(B_N)$$
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the prob of a specific arrangement of k successes (and n - k failures) is p^k (1 - p)^{n-k}.

• The number of ways that k successes can occur in n places is $\binom{n}{k}$.

• Thus the probability of k successes on n trials is

$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (11)$$
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX:
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
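A sketch (mine) of the computation using (11): P(failure) = 1 - the sum of p100(k) for k = 0, 1, 2, which reproduces the footnoted answer 0.0794:

% P(3 or more bit errors in n = 100 bits with p = 0.01)
n = 100; p = 0.01;
Pok = 0;
for k = 0:2
    Pok = Pok + nchoosek(n, k) * p^k * (1-p)^(n-k);   % eq. (11)
end
disp(1 - Pok)   % approx 0.0794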
L5-4
• When n is large, the probability in (11) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• the Poisson Law, which applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• the Gaussian approximation, which applies for large n and k. See Section 1.11 and below.
• Let

$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k} \qquad (12)$$

• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xi.

– For each Bernoulli trial i, let Xi = 1 if the trial is a success and Xi = 0 otherwise.

– Let $S_n = \sum_{i=1}^{n} X_i$.

Then b(k; n, p) is the probability that Sn = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \qquad (13)$$

and distribution function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt \qquad (14)$$
L5-5
• Engineers often prefer an alternative to (14), which is called the complementary distribution function, given by

$$Q(x) = 1 - \Phi(x) \qquad (15)$$

$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt \qquad (16)$$

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,

$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\left(\frac{k - np}{\sqrt{npq}}\right) \qquad (17)$$
• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,

$$P[\alpha \le S_n \le \beta] \approx \Phi\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$
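A sketch (mine) comparing the exact binomial sum with the continuity-corrected Q-function form above, for illustrative values n = 100, p = 0.2, and 15 ≤ Sn ≤ 25:

% Exact P(alpha <= Sn <= beta) vs. Gaussian approximation
n = 100; p = 0.2; q = 1 - p;
alpha = 15; beta = 25;
exact = 0;
for k = alpha:beta
    exact = exact + nchoosek(n, k) * p^k * q^(n-k);
end
qfun = @(x) 0.5 * erfc(x / sqrt(2));
approx = qfun((alpha - n*p - 0.5)/sqrt(n*p*q)) ...
       - qfun((beta - n*p + 0.5)/sqrt(n*p*q));
disp([exact, approx])   % the two values should agree closely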
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob that m trials are required?

p(m) = P(m trials to first success) = P(1st m - 1 trials are failures ∩ mth trial is success)
L5-6
Then

$$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$$

Also, P(# trials > k) = (1 - p)^k.
EX
A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, p = 1/(2n), so np = 0.5 is a constant. The maximum possible number of calls/hour is then 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability

$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,

$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$

• Then

$$\lim_{n \to \infty} \frac{1}{k!}\, \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adopted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob that it receives 20 calls in a 10-minute interval?
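A sketch (mine): λ = 90 calls/hr = 1.5 calls/min and t = 10 min give α = 15, so the Poisson pmf can be evaluated directly in base MATLAB:

% P(N = 20) with alpha = lambda * t = (90/60) * 10 = 15
alpha = (90/60) * 10;
k = 20;
disp(alpha^k / factorial(k) * exp(-alpha))   % approx 0.0418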
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory and form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX:
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2 Find the region in the square corresponding to the event {Z ≤ z} for -∞ < z < ∞.

3 Find and plot F_Z(z). (A Monte Carlo check appears after this list.)

Ans:

$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - (2b-z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
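A Monte Carlo sanity check (mine) of F_Z for b = 1, comparing the empirical CDF with the formula at one point in each nontrivial region:

% Empirical check of F_Z(z) for the dart example, b = 1
b = 1; N = 1e6;
Z = b*rand(1, N) + b*rand(1, N);    % sum of the two coordinates
z = 0.5;    % region 0 < z <= b
disp([mean(Z <= z), z^2/(2*b^2)])                   % both approx 0.125
z = 1.5;    % region b < z < 2b
disp([mean(Z <= z), (b^2 - (2*b - z)^2/2)/b^2])     % both approx 0.875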
RANDOM VARIABLES (RVS)
• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a mapping (function) from Ω to the real line ℝ.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ ℬ, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = -∞] = 0 and P[X = +∞] = 0.
• The Borel field on ℝ contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (-∞, x].

• Thus it is convenient to define a function that assigns probabilities to sets of this form.

DEFN
If (Ω, F, P) is a prob space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

$$F_X(x) = P(X \le x)$$
• F_X(x) is a prob measure.

• Properties of F_X:

1 0 ≤ F_X(x) ≤ 1
L6-4
2 F_X(-∞) = 0 and F_X(∞) = 1

3 F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b

4 P(a < X ≤ b) = F_X(b) - F_X(a)

5 F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b)

(The value at a jump discontinuity is the value after the jump.)

Pf omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

$$F_X(x) - F_X(x^-) = P[x^- < X \le x] = \lim_{\varepsilon \to 0} P[x - \varepsilon < X \le x] = P[X = x]$$
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1 Describe the sample space of Z.

2 Find the region in the square corresponding to the event {Z ≤ z} for -∞ < z < ∞.

3 Find and plot F_Z(z).

4 Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v=-log(u);

Check the density of the transformed variables by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w=sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, to estimate their densities through the histogram function, and to transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).

• Properties:

1 f_X(x) ≥ 0, -∞ < x < ∞

2 $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

3 $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

L7-3

4 $P(a < X \le b) = \int_{a}^{b} f_X(x)\, dx$, a ≤ b

5 If g(x) is a nonnegative piecewise continuous function with finite integral

$$\int_{-\infty}^{+\infty} g(x)\, dx = c, \quad 0 < c < +\infty,$$

then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) - F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x), and thus f_X(x) will be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L4-5
EX Binary Communication System
A transmitter (Tx) sends A0 A1
bull A0 is sent with prob P (A0) = 25
bull A1 is sent with prob P (A1) = 35
bull A receiver (Rx) processes the output of the channel into one of two values B0 B1
bull The channel is a noisy channel that determines the probabilities of transitions between A0 A1
and B0 B1
ndash The transitions are conditional probabilities P (Bj|Ai) that can be specified on a channeltransition diagram
56A
B
B
0
1
0A
1
78
16
18
bull The receiver must use a decision rule to determine from the output B0 or B1 whether the thesymbol that was sent was most likely A0 or A1
bull Letrsquos start by considering 2 possible decision rules
1 Always decide A1
2 When receive B0 decide A0 when receive B1 decide A1
Problem Determine the probability of error for these two decision rules
L4-6
L4-7
EX Optimal Detection for Binary Communication System
Design an optimal receiver to minimize error probability
bull Let Hij be the hypothesis (ie we decide) that Ai was transmitted when Bj is received
bull The error probability of our system is then the probability that we decide A0 when A1 istransmitted plus the probability that we decided A1 when A0 is transmitted
P (E) = P (H10 cap A0) + P (H11 cap A0) + P (H01 cap A1) + P (H00 cap A1)
bull Note that at this point our decision rule may be randomized Ie it may result in rules ofthe form When B0 is received decide A0 with probability η and decide A1 with probability1minus η
bull Since we only apply hypothesis Hij when Bj is received we can write
P (E) = P (H10 cap A0 capB0) + P (H11 cap A0 capB1)
+P (H01 cap A1 capB1) + P (H00 cap A1 capB0)
bull We apply the chain rule to write this as
P (E) = P (H10|A0 capB0)P (A0 capB0) + P (H11|A0 capB1)P (A0 capB1)
+P (H01|A1 capB1)P (capA1 capB1) + P (H00|A1 capB0)P (A1 capB0)
bull The receiver can only decide Hij based on Bj it has no direct knowledge of Ak SoP (Hij|Ak capBj) = P (Hij|Bj) for all i j k
bull Furthermore let P (H00|B0) = q0 and P (H11|B1) = q1
bull Then
P (E) = (1minus q0)P (A0 capB0) + q1P (A0 capB1)
(1minus q1)P (A1 capB1) + q0P (A1 capB0)
bull The choices of q0 and q1 can be made independently so P (E) is minimized by finding q0
and q1 that minimize
(1minus q0)P (A0 capB0) + q0P (A1 capB0) andq1P (A0 capB1) + (1minus q1)P (A1 capB1)
bull We can further expand these using the chain rule as
(1minus q0)P (A0|B0)P (B0) + q0P (A1|B0)P (B0) andq1P (A0|B1)P (B1) + (1minus q1)P (A1|B1)P (B1)
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
• When n is large, the probability in (1.1) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.

• Let

b(k; n, p) = \binom{n}{k} p^k (1 − p)^{n−k}    (1.2)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables X_i.

– For each Bernoulli trial i, let X_i = 1 if the trial is a success, and X_i = 0 otherwise.

– Let S_n = \sum_{i=1}^{n} X_i.

Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian, or Normal, random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = \frac{1}{\sqrt{2π}} \exp\left[ \frac{−x^2}{2} \right]    (1.3)

and distribution function

Φ(x) = \frac{1}{\sqrt{2π}} \int_{−∞}^{x} \exp\left[ \frac{−t^2}{2} \right] dt    (1.4)
• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (1.5)
     = \frac{1}{\sqrt{2π}} \int_{x}^{∞} \exp\left[ \frac{−t^2}{2} \right] dt    (1.6)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).

• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ \frac{1}{\sqrt{npq}} \, φ\left( \frac{k − np}{\sqrt{npq}} \right)    (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.

• For fixed integers α and β,

P[α ≤ S_n ≤ β] ≈ Φ\left[ \frac{β − np + 0.5}{\sqrt{npq}} \right] − Φ\left[ \frac{α − np − 0.5}{\sqrt{npq}} \right]
             = Q\left[ \frac{α − np − 0.5}{\sqrt{npq}} \right] − Q\left[ \frac{β − np + 0.5}{\sqrt{npq}} \right]
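A quick numerical check of the continuity-corrected approximation (my own sketch; the Q-function is built from erfc, so no toolbox is needed). For n = 100 and p = 0.5, it compares the exact P[40 ≤ S_n ≤ 60] with the Q-function form:

n = 100; p = 0.5; q = 1 - p;
k = 40:60;
pExact = sum(arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* q.^(n-k));
Qf = @(x) 0.5*erfc(x/sqrt(2));           % Q(x) = 1 - Phi(x)
mu = n*p; s = sqrt(n*p*q);
pApprox = Qf((40 - mu - 0.5)/s) - Qf((60 - mu + 0.5)/s);
[pExact, pApprox]                         % both approx 0.965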
• Note that the book defines an error function, erf(x). I will not use this function because the definition in the book is different than the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^{m−1} p,  m = 1, 2, …

Also, P(# trials > k) = (1 − p)^k.
EX: A communication system uses Automatic Repeat reQuest (ARQ) for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (With success probability p = 0.9: p(2) = (0.1)(0.9) = 0.09, and P(# transmissions > 2) = (0.1)^2 = 0.01.)
THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so the expected number of calls per minute, np = 0.5, is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability:

\binom{n}{k} p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

\binom{n}{k} p^k (1 − p)^{n−k} ≈ \frac{α^k}{k!} \left( 1 − \frac{α}{n} \right)^{n−k}

• Then

\lim_{n→∞} \frac{α^k}{k!} \left( 1 − \frac{α}{n} \right)^{n−k} = \frac{α^k}{k!} e^{−α},

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
  \frac{α^k}{k!} e^{−α},  k = 0, 1, …
  0,                      otherwise

EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
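Here λ = 90 calls/hr = 1.5 calls/min, so over t = 10 minutes, α = λt = 15, and P(N = 20) = 15^{20} e^{−15}/20!. A one-line MATLAB check (my own sketch; poisspdf(20, 15) gives the same result if the Statistics Toolbox is installed):

alpha = (90/60) * 10;                              % lambda * t = 15
p20 = alpha^20 * exp(-alpha) / factorial(20)       % approx 0.0418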
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).
Ans:
F_Z(z) =
  0,                               z ≤ 0
  \frac{z^2}{2b^2},                0 < z ≤ b
  \frac{b^2 − (2b − z)^2/2}{b^2},  b < z < 2b
  1,                               z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
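An empirical check of F_Z (my own sketch, with b = 1): simulate uniform throws and compare the fraction of darts with Z ≤ z against the formula above.

b = 1; N = 1e6;
Z = b*rand(1, N) + b*rand(1, N);        % sum of the two coordinates
z = 0.5;                                % a test point in (0, b]
[mean(Z <= z), z^2/(2*b^2)]             % empirical vs. theoretical, both approx 0.125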
RANDOM VARIABLES (RVS)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line, R.
EX: (worked on board)
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus, it is convenient to define a function that assigns probabilities to sets of this form.

DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted F_X(x), is

F_X(x) = P[X ≤ x] = P[{ω : X(ω) ≤ x}]

• F_X(x) defines a prob. measure on the Borel sets of R.

• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = \lim_{h→0^+} F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)

Pf: omitted.

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x] = \lim_{ε→0} P[x − ε < X ≤ x] = P[X = x]
EX: Examples (worked on board)
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);

Check that the density is uniform by plotting a histogram:
hist(u, 100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v, 100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created? (An exponential RV with λ = 1.)

Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created? (A Rayleigh RV, with 2σ² = 1.)

This exercise demonstrates the use of MATLAB to generate random variables, the estimation of their densities through the histogram function, and how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = \int_{−∞}^{x} f_X(t) dt

3. \int_{−∞}^{+∞} f_X(x) dx = 1

4. P(a < X ≤ b) = \int_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
\int_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted.

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by

f_X(x) =
  0,                x < a
  \frac{1}{b − a},  a ≤ x ≤ b
  0,                x > b
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.

PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is P_X(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on the top face.

P(X = x) =
  1/6,  x = 1, 2, …, 6
  0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) =
  (1/2)^x,  x = 1, 2, …
  0,        otherwise
IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by

X =
  1,  s ∈ A
  0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) =
  p,      x = 1
  1 − p,  x = 0
  0,      x ≠ 0, 1
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] =
  \binom{n}{k} p^k (1 − p)^{n−k},  k = 0, 1, …, n
  0,                               otherwise
• The pmfs for two Binomial RVs, with n = 10 and p = 0.2 or p = 0.5, are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] =
  (1 − p)^{k−1} p,  k = 1, 2, …
  0,                otherwise
• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x)]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) =
  \frac{α^k}{k!} e^{−α},  k = 0, 1, …
  0,                      otherwise
• The pmfs for two Poisson RVs, with α = 0.75 or α = 3, are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter, λ.

• Density function:

f_X(x) =
  0,          x < 0
  λ e^{−λx},  x ≥ 0

• Distribution function:

F_X(x) =
  0,            x < 0
  1 − e^{−λx},  x ≥ 0

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre–Laplace Theorem: If n is large (so that npq ≫ 1), then with q = 1 − p,

\binom{n}{k} p^k q^{n−k} ≈ \frac{1}{\sqrt{2πnpq}} \exp\left[ \frac{−(k − np)^2}{2npq} \right]

• Let σ² = npq and μ = np; then

P(k successes on n Bernoulli trials) = \binom{n}{k} p^k q^{n−k} ≈ \frac{1}{\sqrt{2πσ²}} \exp\left[ \frac{−(k − μ)^2}{2σ²} \right]
DEFN: A Gaussian random variable X has density function

f_X(x) = \frac{1}{\sqrt{2πσ²}} \exp\left[ \frac{−(x − μ)^2}{2σ²} \right]

with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = \int_{−∞}^{x} \frac{1}{\sqrt{2πσ²}} \exp\left[ \frac{−(t − μ)^2}{2σ²} \right] dt,
which cannot be evaluated in closed form.

• The pdf and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for the three (μ, σ²) pairs; x vs. f_X(x) and F_X(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = \int_{−∞}^{x} \frac{1}{\sqrt{2π}} \exp\left[ \frac{−t^2}{2} \right] dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = \int_{x}^{∞} \frac{1}{\sqrt{2π}} \exp\left[ \frac{−t^2}{2} \right] dt

– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = \frac{1}{π} \int_{0}^{π/2} \exp\left[ \frac{−x^2}{2 \sin^2 φ} \right] dφ,  x ≥ 0

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
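Base MATLAB has no built-in Q(x), but it is a one-liner via erfc, and the finite-range form above can be checked numerically (my own sketch):

Qf = @(x) 0.5*erfc(x/sqrt(2));       % Q(x) = 1 - Phi(x), via the standard erfc
x = 1.5;
craig = integral(@(phi) (1/pi)*exp(-x^2 ./ (2*sin(phi).^2)), 0, pi/2);
[Qf(x), craig]                        % both approx 0.0668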
• The CDF for a Gaussian RV with mean μ and variance σ² is

F_X(x) = Φ\left( \frac{x − μ}{σ} \right) = 1 − Q\left( \frac{x − μ}{σ} \right)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests!

• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:

P(a < X ≤ b) = P(X > a) − P(X > b) = Q\left( \frac{a − μ}{σ} \right) − Q\left( \frac{b − μ}{σ} \right)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Section 2.6 through p. 85 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.

EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5

(Answers: (a) 1/2; (b) 0.317, 4.55 × 10^{−2}, 2.70 × 10^{−3}, 6.34 × 10^{−5}, and 5.75 × 10^{−7}, respectively.)

Examples with Gaussian Random Variables

EX: The Motivation problem.
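A sketch of the computation (mine, using the Q-function identities above): by symmetry, P[X < μ] = 1/2, and P[|X − μ| > kσ] = P[X < μ − kσ] + P[X > μ + kσ] = 2Q(k). In MATLAB:

k = 1:5;
Qf = @(x) 0.5*erfc(x/sqrt(2));
p = 2*Qf(k)        % 0.3173  4.55e-2  2.70e-3  6.33e-5  5.73e-7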
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES

4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = \frac{c}{2} \exp\{−c|x|\},  −∞ < x < ∞,

where c > 0.

• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2; x vs. f_X(x)]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.

• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are statistically independent Gaussian random variables with identical variances σ², then

Y = \sum_{i=1}^{n} X_i²    (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) =
  \frac{1}{σ^n 2^{n/2} Γ(n/2)} y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
  0,                                                      y < 0,

where Γ(p) is the gamma function, defined as
Γ(p) = \int_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = \sqrt{π},  Γ(3/2) = \frac{1}{2}\sqrt{π}

• Note that for n = 2,

f_Y(y) =
  \frac{1}{2σ²} e^{−y/(2σ²)},  y ≥ 0
  0,                           y < 0,

which is an exponential density.

• Thus, the sum of the squares of two statistically independent zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]

• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be

F_Y(y) =
  1 − e^{−y/(2σ²)} \sum_{k=0}^{m−1} \frac{1}{k!} \left( \frac{y}{2σ²} \right)^k,  y ≥ 0
  0,                                                                              y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = \sqrt{Y}.

• Then R is a Rayleigh RV with pdf

f_R(r) =
  \frac{r}{σ²} e^{−r²/(2σ²)},  r ≥ 0
  0,                           r < 0

• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; r vs. f_R(r)]

• The CDF for a Rayleigh RV is

F_R(r) =
  1 − e^{−r²/(2σ²)},  r ≥ 0
  0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
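This chain (two independent zero-mean Gaussians, to a chi-square RV with 2 degrees of freedom, to its square root) is easy to reproduce numerically. A sketch of mine, with σ² = 1, comparing a normalized histogram of the samples to the pdf above:

N = 1e6; sigma = 1;
R = sqrt((sigma*randn(1, N)).^2 + (sigma*randn(1, N)).^2);   % Rayleigh samples
[counts, centers] = hist(R, 100);
binw = centers(2) - centers(1);
bar(centers, counts/(N*binw));                % histogram scaled to a density
hold on;
r = linspace(0, 5, 200);
plot(r, (r/sigma^2).*exp(-r.^2/(2*sigma^2)), 'r', 'LineWidth', 2);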
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable, with pdf given by

f_L(ℓ) =
  \frac{1}{ℓ\sqrt{2πσ²}} \exp\left[ \frac{−(\ln ℓ − μ)^2}{2σ²} \right],  ℓ ≥ 0
  0,                                                                     ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[X ≤ x | A] = \frac{P[\{X ≤ x\} ∩ A]}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.

EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
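A worked sketch (mine), applying the definition above with A = {X > 2}:

F_X(x \mid X > 2) = \frac{P[\{X \le x\} \cap \{X > 2\}]}{P[X > 2]}
= \begin{cases} 0, & x \le 2 \\ \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, & x > 2 \end{cases}

so the remaining wait time W = X − 2 has F_W(w) = 1 − e^{−w} for w ≥ 0: again exponential with λ = 1. This is the memoryless property of the exponential RV.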
TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, …, A_n form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ⋯ + F_X(x|A_n)P(A_n)

and

f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)
Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead,

P(A|X = x) = \lim_{Δx→0} P(A | x < X ≤ x + Δx)
= \lim_{Δx→0} \frac{F_X(x + Δx|A) − F_X(x|A)}{F_X(x + Δx) − F_X(x)} P(A)
= \lim_{Δx→0} \frac{[F_X(x + Δx|A) − F_X(x|A)]/Δx}{[F_X(x + Δx) − F_X(x)]/Δx} P(A)
= \frac{f_X(x|A)}{f_X(x)} P(A),

if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning

• Note that

P(A|X = x) = \frac{f_X(x|A)}{f_X(x)} P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ \int_{−∞}^{∞} P(A|X = x) f_X(x) dx = \int_{−∞}^{∞} f_X(x|A) dx \; P(A)
⇒ P(A) = \int_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

f_X(x|A) = \frac{P(A|X = x)}{P(A)} f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{−∞}^{∞} P(A|X = t) f_X(t) dt}

Examples on board.
EX: Optimal Detection for a Binary Communication System

Design an optimal receiver to minimize error probability.

• Let H_{ij} be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.

• The error probability of our system is then the probability that we decide A_0 when A_1 is transmitted, plus the probability that we decide A_1 when A_0 is transmitted:

P(E) = P(H_{10} ∩ A_0) + P(H_{11} ∩ A_0) + P(H_{01} ∩ A_1) + P(H_{00} ∩ A_1)

• Note that at this point, our decision rule may be randomized. I.e., it may result in rules of the form: "When B_0 is received, decide A_0 with probability η and decide A_1 with probability 1 − η."

• Since we only apply hypothesis H_{ij} when B_j is received, we can write

P(E) = P(H_{10} ∩ A_0 ∩ B_0) + P(H_{11} ∩ A_0 ∩ B_1) + P(H_{01} ∩ A_1 ∩ B_1) + P(H_{00} ∩ A_1 ∩ B_0)

• We apply the chain rule to write this as

P(E) = P(H_{10}|A_0 ∩ B_0)P(A_0 ∩ B_0) + P(H_{11}|A_0 ∩ B_1)P(A_0 ∩ B_1) + P(H_{01}|A_1 ∩ B_1)P(A_1 ∩ B_1) + P(H_{00}|A_1 ∩ B_0)P(A_1 ∩ B_0)

• The receiver can only decide H_{ij} based on B_j; it has no direct knowledge of A_k. So P(H_{ij}|A_k ∩ B_j) = P(H_{ij}|B_j) for all i, j, k.

• Furthermore, let P(H_{00}|B_0) = q_0 and P(H_{11}|B_1) = q_1.

• Then

P(E) = (1 − q_0)P(A_0 ∩ B_0) + q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1) + q_0 P(A_1 ∩ B_0)

• The choices of q_0 and q_1 can be made independently, so P(E) is minimized by finding the q_0 and q_1 that minimize

(1 − q_0)P(A_0 ∩ B_0) + q_0 P(A_1 ∩ B_0)  and  q_1 P(A_0 ∩ B_1) + (1 − q_1)P(A_1 ∩ B_1)

• We can further expand these using the chain rule as

(1 − q_0)P(A_0|B_0)P(B_0) + q_0 P(A_1|B_0)P(B_0)  and  q_1 P(A_0|B_1)P(B_1) + (1 − q_1)P(A_1|B_1)P(B_1)
• P(B_0) and P(B_1) are constants that do not depend on our decision rule, so finally we wish to find q_0 and q_1 to minimize

(1 − q_0)P(A_0|B_0) + q_0 P(A_1|B_0)    (9)
q_1 P(A_0|B_1) + (1 − q_1)P(A_1|B_1)    (10)

• Both of these are linear functions of q_0 and q_1, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.
⇒ Our decision rule is not randomized.

• By inspection, (9) is minimized by setting q_0 = 1 if P(A_0|B_0) > P(A_1|B_0), and setting q_0 = 0 otherwise.

• Similarly, (10) is minimized by setting q_1 = 1 if P(A_1|B_1) > P(A_0|B_1), and setting q_1 = 0 otherwise.

• These rules can be summarized as follows:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).

• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.

• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.

• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.

• Problem: we don't know P(A_i|B_j).

• Need to express P(A_i|B_j) in terms of P(A_i) and P(B_j|A_i).
• General technique:

If {A_i} is a partition of Ω, and we know P(B_j|A_i) and P(A_i), then

P(A_i|B_j) = \frac{P(A_i ∩ B_j)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)},

where the last step follows from the Law of Total Probability.

The expression in the box is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rule for the binary communication system with P(A_0) = 2/5, p_{01} = 1/8, and p_{10} = 1/6.
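A numerical sketch of this example (mine; I am assuming the usual crossover convention p_{01} = P(B_1|A_0) and p_{10} = P(B_0|A_1)), using MATLAB's implicit expansion:

pA   = [2/5, 3/5];                 % priors P(A0), P(A1)
PBgA = [7/8, 1/6;                  % entry (j,i): P(B_{j-1} | A_{i-1})
        1/8, 5/6];
pB   = PBgA * pA';                 % total probability: P(B0); P(B1)
post = (PBgA .* pA) ./ pB          % Bayes: post(j,i) = P(A_{i-1} | B_{j-1})
% post = [7/9 2/9; 1/11 10/11] -> MAP rule: decide A0 on B0, A1 on B1.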
• Often we may not know the a priori probabilities P(A_0) and P(A_1).

• In this case, we typically treat the input symbols as equally likely: P(A_0) = P(A_1) = 0.5.

• Under this assumption, P(A_0|B_0) = cP(B_0|A_0) and P(A_1|B_0) = cP(B_0|A_1), so the decision rule becomes:

Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).

• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
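A worked sketch using Bayes' rule (mine; let F = {fair coin chosen}, and condition on the observed flips):

P(F \mid H) = \frac{P(H \mid F)P(F)}{P(H \mid F)P(F) + P(H \mid \bar{F})P(\bar{F})}
            = \frac{(1/2)(1/2)}{(1/2)(1/2) + (1)(1/2)} = \frac{1}{3}

P(F \mid HH) = \frac{(1/4)(1/2)}{(1/4)(1/2) + (1)(1/2)} = \frac{1}{5}

P(F \mid HHT) = 1, since the two-headed coin can never show tails.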
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1, f_R(r) vs. r.]
• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^{−r²/(2σ²)} for r ≥ 0, and 0 for r < 0.
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the magnitude of the waveform is a Rayleigh random variable.
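A MATLAB sketch of that model (assumed code, not from the notes): take the magnitude of a complex Gaussian whose real and imaginary parts are s.i. N(0, σ²):
sigma = 1;
g = sigma*randn(1,1000000) + 1j*sigma*randn(1,1000000);  % complex Gaussian samples
r = abs(g);                                               % magnitude is Rayleigh
hist(r,100)                 % matches the Rayleigh pdf above with sigma^2 = 1
mean(r.^2)                  % approximately 2*sigma^2: r.^2 is exponential (chi-square, n = 2)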
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(ℓ) = (1/(ℓ√(2πσ²))) exp(−(ln ℓ − μ)²/(2σ²)) for ℓ ≥ 0, and 0 for ℓ < 0.
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = P({X ≤ x} ∩ A)/P(A).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = d/dx F_X(x|A).
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
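A simulation hint (my sketch): the exponential RV is memoryless, which is the key to part 2; conditioning on a two-minute wait leaves the remaining wait distributed exactly as the original Exp(1):
t = -log(rand(1,1000000));   % Exp(1) wait times via the inverse-CDF method
w = t(t > 2) - 2;            % remaining wait, given two minutes already elapsed
hist(w,100)                  % same Exp(1) shape: memorylessness
mean(w)                      % approximately 1 minute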
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, ..., A_n form a partition of Ω, then
F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ... + F_X(x|A_n)P(A_n)
and
f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board.
EX: Optimal Detection for a Binary Communication System
Design an optimal receiver to minimize error probability.
• Let H_ij be the hypothesis (i.e., we decide) that A_i was transmitted when B_j is received.
• The error probability of our system is then the probability that we decide A0 when A1 is transmitted, plus the probability that we decide A1 when A0 is transmitted:
P(E) = P(H10 ∩ A0) + P(H11 ∩ A0) + P(H01 ∩ A1) + P(H00 ∩ A1)
• Note that at this point our decision rule may be randomized. I.e., it may result in rules of the form: when B0 is received, decide A0 with probability η and decide A1 with probability 1 − η.
• Since we only apply hypothesis H_ij when B_j is received, we can write
P(E) = P(H10 ∩ A0 ∩ B0) + P(H11 ∩ A0 ∩ B1) + P(H01 ∩ A1 ∩ B1) + P(H00 ∩ A1 ∩ B0)
• We apply the chain rule to write this as
P(E) = P(H10|A0 ∩ B0)P(A0 ∩ B0) + P(H11|A0 ∩ B1)P(A0 ∩ B1) + P(H01|A1 ∩ B1)P(A1 ∩ B1) + P(H00|A1 ∩ B0)P(A1 ∩ B0)
• The receiver can only decide H_ij based on B_j; it has no direct knowledge of A_k. So P(H_ij|A_k ∩ B_j) = P(H_ij|B_j) for all i, j, k.
• Furthermore, let P(H00|B0) = q0 and P(H11|B1) = q1.
• Then
P(E) = (1 − q0)P(A0 ∩ B0) + q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1) + q0 P(A1 ∩ B0)
• The choices of q0 and q1 can be made independently, so P(E) is minimized by finding the q0 and q1 that minimize
(1 − q0)P(A0 ∩ B0) + q0 P(A1 ∩ B0)   and   q1 P(A0 ∩ B1) + (1 − q1)P(A1 ∩ B1)
• We can further expand these using the chain rule as
(1 − q0)P(A0|B0)P(B0) + q0 P(A1|B0)P(B0)   and   q1 P(A0|B1)P(B1) + (1 − q1)P(A1|B1)P(B1)
• P(B0) and P(B1) are constants that do not depend on our decision rule, so finally we wish to find q0 and q1 to minimize
(1 − q0)P(A0|B0) + q0 P(A1|B0)   (9)
q1 P(A0|B1) + (1 − q1)P(A1|B1)   (10)
• Both of these are linear functions of q0 and q1, so the minimizing values must be at the endpoints. I.e., either q_i = 0 or q_i = 1.
⇒ Our decision rule is not randomized.
• By inspection, (9) is minimized by setting q0 = 1 if P(A0|B0) > P(A1|B0), and setting q0 = 0 otherwise.
• Similarly, (10) is minimized by setting q1 = 1 if P(A1|B1) > P(A0|B1), and setting q1 = 0 otherwise.
• These rules can be summarized as follows:
Given that B_j is received, decide A_i was transmitted if A_i maximizes P(A_i|B_j).
• In words: choose the signal that was most probably transmitted, given what you received and the a priori probabilities of what was transmitted.
• The probabilities P(A_i|B_j) are called a posteriori probabilities, meaning that they are the probabilities after the output of the system is measured.
• The decision rule we have just derived is the maximum a posteriori (MAP) decision rule.
• Problem: we don't know P(A_i|B_j).
• Need to express P(A_i|B_j) in terms of P(A_i), P(B_j|A_i).
• General technique:
If {A_i} is a partition of Ω and we know P(B_j|A_i) and P(A_i), then
P(A_i|B_j) = P(A_i ∩ B_j)/P(B_j) = P(B_j|A_i)P(A_i) / Σ_k P(B_j|A_k)P(A_k),
where the last step follows from the Law of Total Probability.
The expression in the box is Bayes' theorem.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A0) = 2/5, p01 = 1/8, and p10 = 1/6.
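A hedged worked check in MATLAB, assuming the crossover convention p01 = P(B1|A0) and p10 = P(B0|A1) (the definition of pij is not restated here, so this convention is an assumption):
PA0 = 2/5; PA1 = 3/5;
p01 = 1/8; p10 = 1/6;              % assumed crossover probabilities
PB0 = (1-p01)*PA0 + p10*PA1;       % law of total probability: 9/20
PB1 = p01*PA0 + (1-p10)*PA1;       % 11/20
PA0_B0 = (1-p01)*PA0/PB0           % 7/9: MAP decides A0 when B0 is received
PA1_B1 = (1-p10)*PA1/PB1           % 10/11: MAP decides A1 when B1 is received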
• Often we may not know the a priori probabilities P(A0) and P(A1).
• In this case we typically treat the input symbols as equally likely: P(A0) = P(A1) = 0.5.
• Under this assumption, P(A0|B0) = cP(B0|A0) and P(A1|B0) = cP(B0|A1), so the decision rule becomes:
Given that B_j is received, decide A_i was transmitted if A_i maximizes P(B_j|A_i).
• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
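One way to check your answers with Bayes' theorem (my sketch; F = fair coin, T = two-headed coin, each chosen with probability 1/2):
PF = 1/2; PT = 1/2;
PF_H  = (1/2*PF) / (1/2*PF + 1*PT)     % (a) posterior after one head: 1/3
PF_HH = (1/4*PF) / (1/4*PF + 1*PT)     % (b) posterior after two heads: 1/5
% (c) the two-headed coin can never show tails, so P(fair | H,H,T) = 1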
EEL 5544 Noise in Linear Systems, Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers are given after the problem so that you can check your work.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number.
(Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.)
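Before working these by counting, a Monte Carlo sketch (mine, not from the notes) can sanity-check a couple of the answers:
N = 1000000;
rolls = ceil(6*rand(N,5));          % N hands of five fair dice
counts = histc(rolls, 1:6, 2);      % tally of each face within a hand
mean(all(counts <= 1, 2))           % P[no two alike], approximately 0.0926
mean(any(counts == 5, 2))           % P[five alike], approximately 0.0008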
COMBINATORICS
• Basic principle of counting: the number of distinct ordered k-tuples (x_1, x_2, ..., x_k) that are possible, if x_i is an element of a set with n_i distinct members, is n_1 n_2 ⋯ n_k.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.
Thus n_1 = n_2 = ... = n_k = |A| ≡ n.
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling without replacement and with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x_1, x_2, ..., x_k), where x_i ∈ A for all i = 1, 2, ..., k.
Then n_1 = n, n_2 = n − 1, ..., n_k = n − (k − 1).
⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1) = n!/(n − k)!.
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from the set without replacement until no objects are left in the set.
Then n_1 = n, n_2 = n − 1, ..., n_{n−1} = 2, n_n = 1.
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)⋯(2)(1) = n!
Useful formula: for large n, n! ≈ √(2π) n^{n+1/2} e^{−n} (Stirling's formula).
MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
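For example, a quick comparison of the exact factorial with Stirling's formula:
n = 10;
gamma(n+1)                        % n! = 3628800
sqrt(2*pi)*n^(n+0.5)*exp(-n)      % Stirling: approximately 3598696, about 0.8% low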
4. Sampling without replacement and without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n!/(n − k)!.
Now, how many different orders are there for any set of k objects? k!.
Let C(n,k) = the number of combinations of size k from a set of n objects. Then
C(n,k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, ..., k_J is given by the multinomial coefficient
n!/(k_1! k_2! ⋯ k_J!)
• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are
72!/(3!)^24 ≈ 1.3 × 10^85.
Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is
72!/((3!)^24 · 24!) ≈ 2.1 × 10^61 unordered groupings.
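These counts can be evaluated in MATLAB with gammaln (log-gamma, base MATLAB), a sketch that also scales to cases where gamma itself would overflow double precision:
logc = gammaln(73) - 24*gammaln(4);   % ln( 72!/(3!)^24 )
exp(logc)                             % approximately 1.3e85 ordered groupings
exp(logc - gammaln(25))               % approximately 2.1e61 unordered groupings (divide by 24!)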
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6
– How many total ways can this occur?
– What is the probability of this occurring?
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects, with replacement and without regard to order?
• The total number of arrangements is C(k + n − 1, k).
EEL 5544 Noise in Linear Systems, Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer is given in parentheses so that you can check your work.
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (Answer: 0.0794.)
EX (By Request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors.
– Behind one door is a car.
– Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
2. Always switch
3. Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability Space for N Sequential Experiments
1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome in Ω is of the form (B_1, B_2, ..., B_N), where B_i ∈ Ω_i.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that P(A) is consistent with the probability laws of the subexperiments.
For s.i. (statistically independent) subexperiments,
P(A) = P_1(B_1) P_2(B_2) ⋯ P_N(B_N) for A = (B_1, B_2, ..., B_N).
Bernoulli Trials
DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k (1 − p)^{n−k}.
• The number of ways that k successes can occur in n places is C(n,k).
• Thus the probability of k successes on n trials is
p_n(k) = C(n,k) p^k (1 − p)^{n−k}   (11)
Examples:
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
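A base-MATLAB computation sketch for this example (no toolboxes assumed): decoding fails when 3 or more of the 100 bits are in error, so subtract the first three terms of (11) from 1:
n = 100; p = 0.01;
Pk = @(k) nchoosek(n,k)*p^k*(1-p)^(n-k);   % binomial pmf, eq. (11)
1 - (Pk(0) + Pk(1) + Pk(2))                % P[3 or more errors], approximately 0.0794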
• When n is large, the probability in (11) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• the Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• the Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}   (12)
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables X_i.
– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.
– Let S_n = Σ_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Central Limit Theorem (to be discussed later) indicates that a sum of random variables converges (in distribution) to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by the density function
φ(x) = (1/√(2π)) exp[−x²/2]   (13)
and distribution function
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp[−t²/2] dt   (14)
• Engineers often prefer an alternative to (14), called the complementary distribution function, given by
Q(x) = 1 − Φ(x)   (15)
     = (1/√(2π)) ∫_{x}^{∞} exp[−t²/2] dt   (16)
• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq))   (17)
• Note that (17) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
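A sketch comparing the exact binomial sum with the continuity-corrected Q-function approximation above (the parameter values here are mine, chosen for illustration):
n = 100; p = 0.2; q = 1-p; a = 15; b = 25;
Qf = @(x) 0.5*erfc(x/sqrt(2));                 % Q-function via base MATLAB's erfc
exact = 0;
for k = a:b
    exact = exact + nchoosek(n,k)*p^k*q^(n-k); % exact sum of binomial probabilities
end
exact                                          % approximately 0.83
Qf((a-n*p-0.5)/sqrt(n*p*q)) - Qf((b-n*p+0.5)/sqrt(n*p*q))   % close to the exact value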
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
p(m) = (1 − p)^{m−1} p, m = 1, 2, ...
Also, P(# trials > k) = (1 − p)^k.
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
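A worked sketch of the geometric law for this example, taking "success" to mean a correctly received packet (this reading of the problem is my assumption):
p = 0.1;       % packet error probability, so success probability is 1 - p = 0.9
p*(1-p)        % P[exactly 2 transmissions]: first fails, second succeeds = 0.09
p^2            % P[more than 2 transmissions]: first two both fail = 0.01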
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 1/2 per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability
C(n,k) p^k (1 − p)^{n−k}
• If we let np = α be a constant, then for large n,
C(n,k) p^k (1 − p)^{n−k} ≈ (α^k/k!) (1 − α/n)^{n−k}
• Then
lim_{n→∞} (α^k/k!) (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
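A computation sketch for this example (base MATLAB; gammaln avoids computing 20! directly): 90 calls/hr is λ = 1.5 calls/min, so α = λt = 15 for t = 10 minutes:
lambda = 90/60;                            % calls per minute
alpha = lambda*10;                         % alpha = 15
k = 20;
exp(k*log(alpha) - gammaln(k+1) - alpha)   % P(N = 20), approximately 0.0418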
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
(Ans:
F_Z(z) = 0 for z ≤ 0;
F_Z(z) = z²/(2b²) for 0 < z ≤ b;
F_Z(z) = (b² − (2b − z)²/2)/b² for b < z < 2b;
F_Z(z) = 1 for z ≥ 2b.)
4. Specify the type (discrete, continuous, mixed) of Z. (Ans: continuous)
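A simulation sketch (mine, not from the notes) to check the answer for F_Z(z) at two points, with b = 1:
b = 1;
z = b*rand(1,1000000) + b*rand(1,1000000);  % sum of the two dart coordinates
mean(z <= 0.5*b)       % approximately z^2/(2*b^2) at z = b/2, i.e. 0.125
mean(z <= 1.5*b)       % approximately (b^2 - (2*b - z)^2/2)/b^2 at z = 3b/2, i.e. 0.875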
RANDOM VARIABLES (RVS)
• What is a random variable?
• A random variable defined on a probability space (Ω, F, P) is a function from Ω to R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) for every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is
F_X(x) = P(X ≤ x).
• F_X(x) is a probability measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing; i.e., F_X(a) ≤ F_X(b) if a ≤ b.
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right; i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).
(The value at a jump discontinuity is the value after the jump.)
Pf: omitted.
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from the above,
F_X(x) − F_X(x⁻) = lim_{ε→0⁺} P[x − ε < X ≤ x] = P[X = x].
EX: Examples
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram (note that it is v, not u, that we want to look at now):
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
hist(w,100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables and to estimate their densities through the histogram function, and it shows how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION
DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function F_X(x).
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
∫_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf: omitted.
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus a value of f_X(x) is assigned at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
f_X(x) = 0 for x < a; f_X(x) = 1/(b − a) for a ≤ x ≤ b; f_X(x) = 0 for x > b.
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is
p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6 for x = 1, 2, ..., 6, and 0 otherwise.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x for x = 1, 2, ..., and 0 otherwise.
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ A is considered a "success".
• A Bernoulli RV X is defined by
X = 1 if s ∈ A, and X = 0 if s ∉ A.
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p for x = 1; 1 − p for x = 0; 0 for x ≠ 0, 1.
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = C(n,k) p^k (1 − p)^{n−k} for k = 0, 1, ..., n, and 0 otherwise.
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf, P(X = x) vs. x.]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF, F_X(x) vs. x.]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf.]
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
P(N = k) = (α^k/k!) e^{−α} for k = 0, 1, ..., and 0 otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf, P(N = k) vs. k.]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf.]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
f_X(x) = 0 for x < 0, and f_X(x) = λ e^{−λx} for x ≥ 0.
• Distribution function:
F_X(x) = 0 for x < 0, and F_X(x) = 1 − e^{−λx} for x ≥ 0.
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4.]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: if n is large (and npq is not too small), then with q = 1 − p,
C(n,k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]
• Let σ² = npq and μ = np; then
P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k} ≈ (1/√(2πσ²)) exp[−(k − μ)²/(2σ²)]
DEFN: A Gaussian random variable X has density function
f_X(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]
with parameters (mean) μ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25).]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration, which is often easier to work with.)
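A numerical check sketch of this finite-range form against the erfc-based Q (the integral function assumes a reasonably recent MATLAB; quad plays the same role in older releases):
x = 2;
0.5*erfc(x/sqrt(2))                                       % Q(2), approximately 0.02275
integral(@(phi) exp(-x^2./(2*sin(phi).^2))/pi, 0, pi/2)   % the finite-range form, same value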
• The CDF for a Gaussian RV with mean μ and variance σ² is
F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L4-8
bull P (B0) and P (B1) are constants that do not depend on our decision rule so finally we wishto find q0 and q1 to minimize
(1minus q0)P (A0|B0) + q0P (A1|B0) and (9)q1P (A0|B1) + (1minus q1)P (A1|B1) (10)
bull Both of these are linear functions of qo and q1 so the minimizing values must be at theendpointsIe either qi = 0 or qi = 1
rArr Our decision rule is not randomized
bull By inspection () is minimized by setting q0 = 1 if P (A0|B0) gt P (A1|B0) and settingq0 = 0 otherwise
bull Similarly () is minimized by setting q1 = 1 if P (A1|B1) gt P (A0|B1 and setting q = 0otherwise
bull These rules can be summarized as follows
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Ai|Bj)
bull In words Choose the signal that was most probably transmitted given what you receivedand the a priori probabilities of what was transmitted
bull The probabilities P (Ai|Bj) are called a posteriori probabilities meaning that they are theprobabilities after the output of the system is measured
bull This decision rule we have just derived is the maximum a posteriori (MAP) decision rule
bull Problem we donrsquot know P (Ai|Bj)
bull Need to express P (Ai|Bj) in terms of P (Ai) P (Bj|Ai)
L4-9
bull General technique
If Ai is a partition of Ω and we know P (Bj|Ai) and P (Ai) then
where the last step follows from the Law of Total Probability
The expression in the box is
EX
Calculate the a posteriori probabilies and determine the MAP decision rulesfor the binary communication system with P (A0) = 25 p01 = 18 andp10 = 16
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).

EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);

Check that the density is uniform by plotting a histogram:
hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);

Check the density by plotting a histogram:
hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?

This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
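To make the comparison concrete, here is a minimal sketch (base MATLAB only; normalizing the hist counts into a density estimate is the key step) that overlays one candidate density, the exponential pdf with λ = 1, on the histogram of v. Treat the candidate as a conjecture to test, not a given:

    u = rand(1,1000000);            % uniform samples on (0,1)
    v = -log(u);                    % transformed samples
    [cnt, ctr] = hist(v, 100);      % bin counts and bin centers
    binw = ctr(2) - ctr(1);         % bin width
    bar(ctr, cnt/(numel(v)*binw));  % normalize counts so the bars estimate the pdf
    hold on;
    x = linspace(0, max(v), 500);
    plot(x, exp(-x), 'r');          % candidate: exponential(1) pdf e^{-x}, x >= 0
    hold off;

If the bars hug the curve, the conjecture is supported; the same normalization trick works for w with any other candidate density.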
PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x).
• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative piecewise continuous function with finite integral

∫_{−∞}^{+∞} g(x) dx = c, 0 < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

f_X(x) = 0,           x < a
       = 1/(b − a),   a ≤ x ≤ b
       = 0,           x > b
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is P_X(x) = P(X = x).

EX: Roll a fair 6-sided die. X = # on top face.

P(X = x) = 1/6,  x = 1, 2, ..., 6
         = 0,    otherwise

EX: Flip a fair coin until heads occurs. X = # of flips.

P(X = x) = (1/2)^x,  x = 1, 2, ...
         = 0,        otherwise
IMPORTANT RANDOM VARIABLES

Discrete RVs

1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by

X = 1,  s ∈ A
  = 0,  s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p

So

P(X = x) = p,      x = 1
         = 1 − p,  x = 0
         = 0,      x ≠ 0, 1
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by

P[X = k] = C(n,k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
         = 0,                         otherwise
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x vs. P(X = x).]
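The plots are easy to regenerate; here is a minimal base-MATLAB sketch (binopdf from the Statistics Toolbox would do the same job in one line):

    n = 10; k = 0:n;
    for p = [0.2 0.5]
        % binomial pmf evaluated at k = 0,...,n
        pmf = arrayfun(@(kk) nchoosek(n,kk) * p^kk * (1-p)^(n-kk), k);
        figure; stem(k, pmf);
        xlabel('x'); ylabel('P(X = x)');
        title(sprintf('Binomial(%d, %.1f) pmf', n, p));
    end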
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x vs. F_X(x).]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x).]
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

P[X = k] = (1 − p)^{k−1} p,  k = 1, 2, ...
         = 0,                otherwise
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x vs. P(X = x).]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x vs. F_X(x).]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α},  k = 0, 1, ...
         = 0,                  otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x).]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x vs. F_X(x).]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x).]
IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in the example above.

2. Exponential RV

• Characterized by a single parameter λ.
• Density function:

f_X(x) = 0,         x < 0
       = λe^{−λx},  x ≥ 0

• Distribution function:

F_X(x) = 0,             x < 0
       = 1 − e^{−λx},   x ≥ 0

• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x).]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (p fixed, q = 1 − p), then

C(n,k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp[−(k − np)²/(2npq)]

• Let σ² = npq, µ = np; then

P(k successes on n Bernoulli trials) = C(n,k) p^k q^{n−k}
                                     ≈ [1/√(2πσ²)] exp[−(k − µ)²/(2σ²)]
DEFN: A Gaussian random variable X has density function

f_X(x) = [1/√(2πσ²)] exp[−(x − µ)²/(2σ²)]

with parameters (mean) µ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25); x vs. f_X(x) and F_X(x).]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as

Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp(−t²/2) dt

– Engineers more commonly use the complementary distribution function, or Q-function, defined by

Q(x) = ∫_{x}^{∞} [1/√(2π)] exp(−t²/2) dt
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as

Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is

F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:

P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following questions. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
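These answers are quick to check numerically. A minimal sketch using the base-MATLAB erfc (the identity Q(x) = ½ erfc(x/√2) is standard, and P[|X − µ| > kσ] = 2Q(k) for any µ and σ):

    Q = @(x) 0.5*erfc(x/sqrt(2));   % Q-function via the complementary error function
    2*Q(1:5)                        % 0.3173, 0.0455, 0.0027, 6.33e-5, 5.73e-7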
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is

f_X(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. f_X(x).]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if Xᵢ, i = 1, 2, ..., n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

Y = Σ_{i=1}^{n} Xᵢ²    (1.8)

is a chi-square random variable with n degrees of freedom.

• The chi-square random variable is usually classified as either central or non-central.

• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.

• The density of a central chi-square random variable is given by

f_Y(y) = [1/(σⁿ 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)},  y ≥ 0
       = 0,                                                 y < 0

where Γ(p) is the gamma function, defined as

Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
Γ(p) = (p − 1)!,  p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,

f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)},  y ≥ 0
       = 0,                       y < 0

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
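A quick simulation makes this concrete. A minimal sketch (base MATLAB, assuming σ = 1, so the n = 2 chi-square should match the exponential density with mean 2σ² = 2):

    y = randn(1,100000).^2 + randn(1,100000).^2;  % chi-square, 2 degrees of freedom
    mean(y)                                       % should be close to 2
    [cnt, ctr] = hist(y, 100);
    binw = ctr(2) - ctr(1);
    bar(ctr, cnt/(numel(y)*binw)); hold on;       % histogram normalized to a pdf
    t = linspace(0, max(y), 500);
    plot(t, 0.5*exp(-t/2), 'r'); hold off;        % exponential pdf with 2*sigma^2 = 2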
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x).]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be

F_Y(y) = 1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
       = 0,                                                  y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = (r/σ²) e^{−r²/(2σ²)},  r ≥ 0
       = 0,                     r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf with σ² = 1; r vs. f_R(r).]
• The CDF for a Rayleigh RV is

F_R(r) = 1 − e^{−r²/(2σ²)},  r ≥ 0
       = 0,                  r < 0

• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the magnitude of the waveform is a Rayleigh random variable.
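This is easy to see numerically. A minimal sketch (base MATLAB): take the magnitude of a complex Gaussian with independent N(0, σ²) real and imaginary parts and compare its histogram to the Rayleigh pdf above:

    sigma = 1;
    r = abs(sigma*randn(1,100000) + 1j*sigma*randn(1,100000));
    [cnt, ctr] = hist(r, 100);
    binw = ctr(2) - ctr(1);
    bar(ctr, cnt/(numel(r)*binw)); hold on;        % histogram as a pdf estimate
    t = linspace(0, max(r), 500);
    plot(t, (t/sigma^2).*exp(-t.^2/(2*sigma^2)), 'r'); hold off;  % Rayleigh pdf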
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − µ)²/(2σ²)],  ℓ ≥ 0
       = 0,                                        ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = P[{X ≤ x} ∩ A] / P(A)

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = (d/dx) F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
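For part 2, the memoryless property of the exponential RV predicts that the remaining wait has the same (unconditional) exponential distribution. A minimal simulation sketch (base MATLAB) to check this:

    t = -log(rand(1,1000000));               % exponential wait times, lambda = 1
    remw = t(t > 2) - 2;                     % remaining wait, given T > 2
    x = 0:0.1:5;
    condCDF   = arrayfun(@(xx) mean(remw <= xx), x);  % empirical conditional CDF
    uncondCDF = 1 - exp(-x);                 % unconditional exponential CDF
    max(abs(condCDF - uncondCDF))            % should be near 0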
TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ... + F_X(x|Aₙ)P(Aₙ)

and

f_X(x) = Σ_{i=1}^{n} f_X(x|Aᵢ)P(Aᵢ)
Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:

P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

           = [f_X(x|A) / f_X(x)] · P(A)

if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning

• Note that

P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)
         = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
• General technique:

If {Aᵢ} is a partition of Ω and we know P(Bⱼ|Aᵢ) and P(Aᵢ), then

P(Aᵢ|Bⱼ) = P(Bⱼ|Aᵢ)P(Aᵢ) / Σ_k P(Bⱼ|A_k)P(A_k),

where the last step follows from the Law of Total Probability. The expression in the box is Bayes' rule.
EX: Calculate the a posteriori probabilities and determine the MAP decision rules for the binary communication system with P(A₀) = 2/5, p₀₁ = 1/8, and p₁₀ = 1/6.
• Often we may not know the a priori probabilities P(A₀) and P(A₁).

• In this case we typically treat the input symbols as equally likely: P(A₀) = P(A₁) = 0.5.

• Under this assumption, P(A₀|B₀) = cP(B₀|A₀) and P(A₁|B₀) = cP(B₀|A₁), so the decision rule becomes:

Given that Bⱼ is received, decide Aᵢ was transmitted if Aᵢ maximizes P(Bⱼ|Aᵢ).

• This is known as the maximum likelihood (ML) decision rule.
EX: The gambler with the two coins.

(a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random; when he flips it, it shows heads. What is the probability that it is the fair coin?

(b) Suppose that he flips the same coin a second time and, again, it shows heads. Now what is the probability that it is the fair coin?

(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers appear at the bottom of the page.

EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:

(a) P[no two alike]

(b) P[one pair]

(c) P[two pair]

(d) P[three alike]

(e) P[full house]

(f) P[four alike]

(g) P[five alike]

Note that a full house is three of one number plus a pair of another number.
COMBINATORICS

• Basic principle of counting: The number of distinct ordered k-tuples (x₁, x₂, ..., x_k) that are possible if xᵢ is an element of a set with nᵢ distinct members is n₁n₂···n_k.

SPECIAL CASES

1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.

Answers to the poker dice problem: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
Result is a k-tuple (x₁, x₂, ..., x_k), where xᵢ ∈ A for all i = 1, 2, ..., k.

Thus n₁ = n₂ = ... = n_k = |A| ≡ n

⇒ number of distinct ordered k-tuples is nᵏ.

2. Sampling without replacement & with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.

Result is a k-tuple (x₁, x₂, ..., x_k), where xᵢ ∈ A for all i = 1, 2, ..., k.

Then n₁ = n, n₂ = n − 1, ..., n_k = n − (k − 1)

⇒ number of distinct ordered k-tuples is n(n − 1)···(n − k + 1).
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)

Result is an n-tuple.

The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.

Then n₁ = n, n₂ = n − 1, ..., n_{n−1} = 2, n_n = 1

⇒ # of permutations = # of distinct ordered n-tuples
                    = n(n − 1)···(2)(1)
                    = n!

Useful formula: For large n, n! ≈ √(2π) n^{n+1/2} e^{−n}

MATLAB TECHNIQUE: To determine n! in MATLAB, use gamma(n+1).
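For example, a quick sanity check of all three (the factorial function is also built in):

    n = 10;
    factorial(n)                        % 3628800, exact
    gamma(n+1)                          % 3628800, via the gamma function
    sqrt(2*pi) * n^(n+0.5) * exp(-n)    % about 3.599e6, Stirling's approximation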
4. Sampling without Replacement & without Order
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?

Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"

First, determine the # of ordered subsets: n(n − 1)···(n − k + 1) = n!/(n − k)!

Now, how many different orders are there for any set of k objects? k!

Let C(n,k) = the number of combinations of size k from a set of n objects:

C(n,k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN: The number of ways in which a set of size n can be partitioned into J subsets of size k₁, k₂, ..., k_J is given by the multinomial coefficient n!/(k₁! k₂! ··· k_J!).

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?

Using the multinomial coefficient, there are

72!/(3!)²⁴ ≈ 1.3×10⁸⁵

Order matters because each group gets a different project.

• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.

There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

72!/[(3!)²⁴ 24!] ≈ 2.1×10⁶¹ unordered groups.
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?

One possible way: 1 1 2 2 3 3 4 4 5 5 6 6.

– How many total ways can this occur? 12!/(2!)⁶ = 7,484,400

– What is the probability of this occurring? [12!/(2!)⁶]/6¹² ≈ 0.0034
5. Sampling with Replacement and without Ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?

• The total number of arrangements is C(k + n − 1, k).
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
EX: By Request: The Game Show/Monty Hall Problem

Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:

• Suppose you're on a game show, and you're given the choice of three doors.

– Behind one door is a car.

– Behind the other doors are goats.

• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.

• The host then offers you the option to switch doors. Does it matter if you switch?

• Compute the probabilities of winning for the following three strategies (a simulation sketch follows the list):

1. Never switch

(Answer to the packet question above: 0.0794)
2. Always switch

3. Flip a fair coin, and switch if it comes up heads
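A minimal Monte Carlo sketch (base MATLAB; ceil(3*rand) draws a door uniformly from {1, 2, 3}). Never-switch wins exactly when the first pick is the car; always-switch wins exactly when the first pick is not the car, because the host's opened door always hides a goat:

    N = 100000; stay = 0; sw = 0;
    for i = 1:N
        car  = ceil(3*rand);           % door hiding the car
        pick = ceil(3*rand);           % contestant's first pick
        stay = stay + (pick == car);   % never switch: win iff first pick was right
        sw   = sw + (pick ~= car);     % always switch: win iff first pick was wrong
    end
    [stay sw]/N                        % approximately [1/3 2/3]

The coin-flip strategy averages the two, winning with probability (1/3 + 2/3)/2 = 1/2.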
SEQUENTIAL EXPERIMENTS

DEFN: A sequential experiment (or combined experiment) consists of a series of subexperiments.

Probability Space for N Sequential Experiments

1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.

• Then any A ∈ Ω is of the form (B₁, B₂, ..., B_N), where Bᵢ ∈ Ωᵢ.

2. Event class F: σ-algebra on Ω.

3. P(·) on F such that the probability of any event determined by a single subexperiment agrees with that subexperiment's probability measure, e.g., P(B₁ × Ω₂ × ··· × Ω_N) = P₁(B₁).

For s.i. (statistically independent) subexperiments,

P(A) = P₁(B₁)P₂(B₂)···P_N(B_N)
Bernoulli Trials

DEFN: A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.

• Let p denote the probability of success.

• Then for n independent Bernoulli trials, the probability of a specific arrangement of k successes (and n − k failures) is p^k(1 − p)^{n−k}.

• The number of ways that k successes can occur in n places is C(n,k).

• Thus, the probability of k successes on n trials is

p_n(k) = C(n,k) p^k (1 − p)^{n−k}    (1.1)

Examples:

EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?

EX: Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10⁻². The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A numerical check follows.)
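A minimal sketch of the computation in base MATLAB (sum the binomial pmf over 0, 1, and 2 errors, then take the complement):

    n = 100; p = 0.01;
    P0to2 = 0;
    for k = 0:2
        P0to2 = P0to2 + nchoosek(n,k) * p^k * (1-p)^(n-k);
    end
    1 - P0to2     % approximately 0.0794, matching the footnote answer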
• When n is large, the probability in (1.1) can be difficult to compute, as C(n,k) gets very large at the same time that one or both of the other terms gets very small.

For large n, we can approximate the binomial probabilities using either:

• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.

• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let

b(k; n, p) = C(n,k) p^k (1 − p)^{n−k}    (1.2)

• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.

• An equivalent formulation is the following:

– Consider a sequence of variables Xᵢ.

– For each Bernoulli trial i, let Xᵢ = 1 if the trial is a success and Xᵢ = 0 otherwise.

– Let Sₙ = Σ_{i=1}^{n} Xᵢ.

Then b(k; n, p) is the probability that Sₙ = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.

• The Gaussian random variable is characterized by a density function

φ(x) = [1/√(2π)] exp[−x²/2]    (1.3)

and distribution function

Φ(x) = [1/√(2π)] ∫_{−∞}^{x} exp[−t²/2] dt    (1.4)
• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by

Q(x) = 1 − Φ(x)    (1.5)
     = [1/√(2π)] ∫_{x}^{∞} exp[−t²/2] dt    (1.6)

• In this class, I strongly prefer the use of Q(x) and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,

b(k; n, p) ≈ [1/√(npq)] φ((k − np)/√(npq))    (1.7)

• Note that (1.7) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,

P[α ≤ Sₙ ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
             = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
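As a numerical illustration, here is a minimal sketch comparing the exact binomial sum with the Q-function approximation (the parameter values n = 100, p = 0.2, α = 15, β = 25 are illustrative, not from the notes):

    n = 100; p = 0.2; q = 1-p; a = 15; b = 25;
    exact = 0;
    for k = a:b
        exact = exact + nchoosek(n,k) * p^k * q^(n-k);
    end
    Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q-function via the built-in erfc
    approx = Qf((a - n*p - 0.5)/sqrt(n*p*q)) - Qf((b - n*p + 0.5)/sqrt(n*p*q));
    [exact approx]                    % the two values should agree closely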
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)

Then

p(m) = (1 − p)^{m−1} p,  m = 1, 2, ...

Also, P(# trials > k) = (1 − p)^k.
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
THE POISSON LAW

Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.

• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so the expected number of calls per minute, np = 0.5, is a constant. Then the maximum possible number of calls/hour is 60n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:

C(n,k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

C(n,k) p^k (1 − p)^{n−k} ≈ (α^k / k!) (1 − α/n)^{n−k}

• Then

lim_{n→∞} (α^k / k!) (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events/(unit of space or time).

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then

P(N = k) = (α^k / k!) e^{−α},  k = 0, 1, ...
         = 0,                  otherwise
EX: Phone calls arriving at a switching office.
If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
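A minimal worked sketch (base MATLAB): 90 calls/hr is λ = 1.5 calls/minute, so α = λt = 1.5 × 10 = 15 for a 10-minute interval, and

    alpha = 15;
    alpha^20 * exp(-alpha) / factorial(20)   % P(N = 20), approximately 0.0418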
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L4-10
bull Often we may not know the a posteriori probabilities P (A0) and P (A1)
bull In this case we typically treat the input symbols as equally likely P (A0) = P (A1) = 05
bull Under this assumption P (A0|B0) = cP (B0|A0) and P (A1|B0) = cP (A1|B0) so thedecision rule becomes
Given that Bj is received decide Ai was transmitted if Ai maximizes P (Bj|Ai)
bull This is known as the maximum likelihood (ML) decision rule
EX The gambler with the two coins
(a) A gambler has in his pocket a fair coin and a two-headed coin He selects one of the coins atrandom when he flips it it shows heads What is the probability that it is the fair coin
(b) Suppose that he flips the same coin a second time and again it shows heads Now what is theprobability that it is the fair coin
(c) Suppose that he flips the same coin a third time and it shows tails Now what is theprobability that it is the fair coin
L5a-1
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class I mentioned that many probability problems can besolved by counting the number of favorable outcomes and the number of total outcomes Themathematics of counting is called combinatorics and it can be surprisingly tricky
Read Section 18 in the textbook
Try to solve the following problem (S Ross A First Course in Probability 7th ed Ch 2 prob16) The answers appear at the bottom of the page
EXPoker dice is played by simultaneously rolling five dice Find the followingprobabilities
(a) P [no two alike]
(b) P [one pair]
(c) P [two pair]
(d) P [three alike]
(e) P [full house]
(f) P [four alike]
(g) P [five alike]
Note that a full house is three of one number plus a pair of another number3
COMBINATORICS
bull Basic principle of counting The number of distinct ordered k-tuples (x1 x2 xk) thatare possible if xi is an element of a set with ni distinct members is
SPECIAL CASES
1 Sampling with repacement and with orderingProblem Choose k objects from a set A with replacement (after each choice theobject is returned to the set) The ordered series of results is noted
3(a) 00926 (b) 04630 (c) 02315 (d) 01543 (e) 00386 (f) 00193 (g) 00008
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 per minute is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a binomial probability:

    C(n, k) p^k (1 − p)^{n−k}

• If we let np = α be a constant, then for large n,

    C(n, k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}

• Then

    lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},

which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
  – calls coming in to a switching center
  – packets arriving at a queue in a network
  – processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
  – the # of misprints on a group of pages in a book
  – the # of people in a community that live to be 100 years old
  – the # of wrong telephone numbers that are dialed in a day
  – the # of α-particles discharged in a fixed period of time from some radioactive material
  – the # of earthquakes per year
  – the # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then

    P(N = k) =
        α^k e^{−α}/k!,   k = 0, 1, …
        0,               otherwise
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
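A sketch of the computation in MATLAB (assuming the calls follow the Poisson law, so α = λt = 90 · (10/60) = 15):

lambda = 90; t = 10/60;      % rate in calls/hour, interval in hours
alpha = lambda * t;          % alpha = 15
P20 = alpha^20 / factorial(20) * exp(-alpha)   % approximately 0.042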
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.

EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:

    F_Z(z) =
        0,                        z ≤ 0
        z²/(2b²),                 0 < z ≤ b
        [b² − (2b − z)²/2]/b²,    b < z < 2b
        1,                        z ≥ 2b

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
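A Monte Carlo sketch in MATLAB confirms the form of F_Z(z); here b = 1 and the evaluation point z = 1.2 are arbitrary choices for illustration:

b = 1; N = 1e6;
z = b*rand(1, N) + b*rand(1, N);          % sum of the two dart coordinates
Fhat = mean(z <= 1.2)                     % empirical CDF at z = 1.2
Fexact = (b^2 - (2*b - 1.2)^2/2) / b^2    % = 0.68 for b = 1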
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, a].
• Thus, it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is

    F_X(x) = P[X ≤ x].

• F_X(x) defines a probability measure on the Borel sets of R.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h)
(The value at a jump discontinuity is the value after the jump.)
Pf: omitted.
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,

    F_X(x) − F_X(x⁻) = P[x⁻ < X ≤ x]
                     = lim_{ε→0} P[x − ε < X ≤ x]
                     = P[X = x].
EX Examples
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or their density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x).
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

    ∫_{−∞}^{+∞} g(x) dx = c,  0 < c < +∞,

then f_X(x) = g(x)/c is a valid pdf.
Pf: omitted.
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN
If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by

    f_X(x) =
        0,           x < a
        1/(b − a),   a ≤ x ≤ b
        0,           x > b
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is p_X(x) = P(X = x).
EX: Roll a fair 6-sided die. X = # on top face.

    P(X = x) =
        1/6,   x = 1, 2, …, 6
        0,     otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.

    P(X = x) =
        (1/2)^x,   x = 1, 2, …
        0,         otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ F is considered a "success."
• A Bernoulli RV X is defined by

    X =
        1,   s ∈ A
        0,   s ∉ A

• The pmf for a Bernoulli RV X can be found formally as

    P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.

So

    P(X = x) =
        p,       x = 1
        1 − p,   x = 0
        0,       x ≠ 0, 1
2. Binomial RV
• A binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus, a binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by

    P[X = k] =
        C(n, k) p^k (1 − p)^{n−k},   k = 0, 1, …, n
        0,                           otherwise
• The pmfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: the Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x).]
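Plots like these can be regenerated with a few lines of MATLAB (a sketch using base MATLAB; binopdf from the Statistics Toolbox would shorten it):

n = 10; p = 0.2; k = 0:n;
pmf = zeros(size(k));
for i = 1:length(k)
    kk = k(i);
    pmf(i) = nchoosek(n, kk) * p^kk * (1-p)^(n-kk);   % binomial pmf at kk
end
stem(k, pmf); xlabel('x'); ylabel('P(X=x)');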
• The cdfs for two binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: the Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x).]
• When n is large, the binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: the Binomial(100, 0.2) pmf; x vs. P(X = x).]
3. Geometric RV
• A geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by

    P[X = k] =
        (1 − p)^{k−1} p,   k = 1, 2, …
        0,                 otherwise
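One standard way to generate geometric samples is inverse-transform sampling; a sketch in MATLAB:

p = 0.1;
u = rand(1, 1e6);
x = ceil(log(u) ./ log(1 - p));   % geometric samples on {1, 2, ...}
mean(x)                           % approximately 1/p = 10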
• The pmfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: the Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x).]
• The cdfs for two geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: the Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x).]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events per unit of space or time.
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then

    P(N = k) =
        α^k e^{−α}/k!,   k = 0, 1, …
        0,               otherwise
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: the Poisson(0.75) and Poisson(3) pmfs.]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: the Poisson(0.75) and Poisson(3) CDFs.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: the Poisson(20) pmf.]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in an example above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:

    f_X(x) =
        0,           x < 0
        λ e^{−λx},   x ≥ 0

• Distribution function:

    F_X(x) =
        0,              x < 0
        1 − e^{−λx},    x ≥ 0

• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
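One standard property worth recording here (it reappears in the conditional-distribution example of Lecture 8): the exponential RV is memoryless. For s, t ≥ 0,

    P(X > t + s | X > t) = P(X > t + s)/P(X > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs} = P(X > s).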
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4.]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,

    C(n, k) p^k q^{n−k} ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]

• Let σ² = npq and µ = np; then

    P(k successes on n Bernoulli trials) = C(n, k) p^k q^{n−k}
                                         ≈ (1/√(2πσ²)) exp[−(k − µ)²/(2σ²)]
DEFN
A Gaussian random variable X has density function

    f_X(x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)],  −∞ < x < ∞,

with parameters (mean) µ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (the Central Limit Theorem).
• The CDF of a Gaussian RV is given by

    F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − µ)²/(2σ²)] dt,

which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for the three (µ, σ²) pairs.]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
  – The CDF for the normalized Gaussian RV is defined as

    Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt

  – Engineers more commonly use the complementary distribution function, or Q-function, defined by

    Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt
  – Note that Q(x) = 1 − Φ(x).
  – I will be supplying you with a Q-function table and a list of approximations to Q(x).
  – The Q-function can also be defined as

    Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ,  x ≥ 0.

(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is

    F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ).

Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

    P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − µ)/σ) − Q((b − µ)/σ).
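MATLAB has no built-in function named Q, but Q(x) can be evaluated through the built-in erfc; a sketch (the numbers µ = 5, σ = 1, a = 4, b = 6 are illustrative only):

Qf = @(x) 0.5 * erfc(x / sqrt(2));   % Q(x) = erfc(x/sqrt(2))/2
mu = 5; sigma = 1; a = 4; b = 6;
P = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % = 1 - 2Q(1), approximately 0.6827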
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Sections 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem.
Answers: (a) 1/2; (b) 0.317, 4.55 × 10⁻², 2.70 × 10⁻³, 6.34 × 10⁻⁵, and 5.75 × 10⁻⁷, respectively.
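The part (b) answers follow from P[|X − µ| > kσ] = 2Q(k); a one-line MATLAB check (using Q(x) = erfc(x/√2)/2):

k = 1:5;
twoQ = erfc(k / sqrt(2))   % 0.3173, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7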
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is

    f_X(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,

where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: the Laplacian pdf with c = 2.]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then

    Y = Σ_{i=1}^{n} X_i²    (1.8)

is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by

    f_Y(y) =
        (1/(σ^n 2^{n/2} Γ(n/2))) y^{(n/2)−1} e^{−y/(2σ²)},   y ≥ 0
        0,                                                   y < 0

where Γ(p) is the gamma function, defined as

    Γ(p) = ∫_{0}^{∞} t^{p−1} e^{−t} dt,  p > 0
    Γ(p) = (p − 1)!,  p an integer > 0
    Γ(1/2) = √π
    Γ(3/2) = (1/2)√π

• Note that for n = 2,

    f_Y(y) =
        (1/(2σ²)) e^{−y/(2σ²)},   y ≥ 0
        0,                        y < 0

which is an exponential density.
• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
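A simulation sketch makes the construction concrete; here n = 4 degrees of freedom and σ = 1 are illustrative choices:

n = 4; N = 1e6;
y = sum(randn(n, N).^2, 1);   % sum of squares of n independent N(0,1) RVs
mean(y)                       % approximately n*sigma^2 = 4
hist(y, 100)                  % compare with the chi-square density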
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1.]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be

    F_Y(y) =
        1 − e^{−y/(2σ²)} Σ_{k=0}^{m−1} (1/k!) (y/(2σ²))^k,   y ≥ 0
        0,                                                   y < 0

• The non-central chi-square random variable has more complicated pdfs and CDFs, and its CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf

    f_R(r) =
        (r/σ²) e^{−r²/(2σ²)},   r ≥ 0
        0,                      r < 0
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: the Rayleigh pdf with σ² = 1.]
• The CDF for a Rayleigh RV is

    F_R(r) =
        1 − e^{−r²/(2σ²)},   r ≥ 0
        0,                   r < 0

• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
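A sketch of this construction in MATLAB (unit-variance real and imaginary parts, so σ² = 1):

N = 1e6;
g = randn(1, N) + 1i * randn(1, N);   % complex Gaussian samples
r = abs(g);                           % Rayleigh with sigma^2 = 1
mean(r)                               % approximately sigma*sqrt(pi/2) = 1.2533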
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by

    f_L(ℓ) =
        (1/(ℓ√(2πσ²))) exp[−(ln ℓ − µ)²/(2σ²)],   ℓ ≥ 0
        0,                                        ℓ < 0
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

    F_X(x|A) = P[X ≤ x | A] = P[{X ≤ x} ∩ A]/P(A).

DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

    f_X(x|A) = (d/dx) F_X(x|A).

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
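For a self-check, using the memoryless property noted after the exponential RV in Lecture 7: for the total wait time, given X > 2,

    F_X(x | X > 2) = (F_X(x) − F_X(2))/(1 − F_X(2)) = 1 − e^{−(x−2)},  x > 2,

and for the remaining wait W = X − 2, P(W > w | X > 2) = e^{−(2+w)}/e^{−2} = e^{−w}, so the remaining wait time is again exponential with λ = 1.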
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, …, A_n form a partition of Ω, then

    F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + ⋯ + F_X(x|A_n)P(A_n)

and

    f_X(x) = Σ_{i=1}^{n} f_X(x|A_i)P(A_i).
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

    P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
               = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)
               = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
               = [f_X(x|A)/f_X(x)] P(A),

if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that

    P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
    ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
    ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)
    ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx

(the continuous version of the Law of Total Probability).
• Once we have the Law of Total Probability, it is easy to derive the continuous version of Bayes' theorem:

    f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
             = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt

Examples on board.
EEL 5544 Noise in Linear Systems Lecture 5a
Motivation
In the Motivation section for the first class, I mentioned that many probability problems can be solved by counting the number of favorable outcomes and the number of total outcomes. The mathematics of counting is called combinatorics, and it can be surprisingly tricky.
Read Section 1.8 in the textbook.
Try to solve the following problem (S. Ross, A First Course in Probability, 7th ed., Ch. 2, prob. 16). The answers are given below.
EX: Poker dice is played by simultaneously rolling five dice. Find the following probabilities:
(a) P[no two alike]
(b) P[one pair]
(c) P[two pair]
(d) P[three alike]
(e) P[full house]
(f) P[four alike]
(g) P[five alike]
Note that a full house is three of one number plus a pair of another number.
Answers: (a) 0.0926, (b) 0.4630, (c) 0.2315, (d) 0.1543, (e) 0.0386, (f) 0.0193, (g) 0.0008.
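These can be spot-checked with a Monte Carlo sketch in MATLAB (only part (a) is shown; the other patterns are counted similarly):

N = 1e5;
rolls = randi(6, N, 5);           % N games of poker dice
hits = 0;
for i = 1:N
    if numel(unique(rolls(i, :))) == 5   % all five dice show different numbers
        hits = hits + 1;
    end
end
P_no_two_alike = hits / N         % approximately 0.0926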
COMBINATORICS
• Basic principle of counting: The number of distinct ordered k-tuples (x_1, x_2, …, x_k) that are possible if x_i is an element of a set with n_i distinct members is n_1 n_2 ⋯ n_k.
SPECIAL CASES
1. Sampling with replacement and with ordering
Problem: Choose k objects from a set A with replacement (after each choice, the object is returned to the set). The ordered series of results is noted.
Result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A for all i = 1, 2, …, k.
Thus n_1 = n_2 = ⋯ = n_k = |A| ≡ n
⇒ the number of distinct ordered k-tuples is n^k.
2. Sampling without replacement and with ordering
Problem: Choose k objects from a set A without replacement (after each object is selected, it is removed from the set). Let n = |A|, where k ≤ n.
Result is a k-tuple (x_1, x_2, …, x_k), where x_i ∈ A for all i = 1, 2, …, k.
Then n_1 = n, n_2 = n − 1, …, n_k = n − (k − 1)
⇒ the number of distinct ordered k-tuples is n(n − 1)⋯(n − k + 1).
3. Permutations of n distinct objects
Problem: Find the # of permutations of a set of n objects. (The # of permutations is the # of possible orderings.)
Result is an n-tuple.
The # of orderings is the # of ways we can take the objects one at a time from a set without replacement until no objects are left in the set.
Then n_1 = n, n_2 = n − 1, …, n_{n−1} = 2, n_n = 1
⇒ # of permutations = # of distinct ordered n-tuples = n(n − 1)⋯(2)(1) = n!
Useful formula: for large n, n! ≈ √(2π) n^{n+1/2} e^{−n}.
MATLAB TECHNIQUE: To compute n! in MATLAB, use gamma(n+1).
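A quick comparison of the exact factorial with the approximation (n = 10 chosen for illustration):

n = 10;
exact = factorial(n)                          % 3628800
approx = sqrt(2*pi) * n^(n + 0.5) * exp(-n)   % approximately 3.599e6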
4. Sampling without replacement and without ordering
Question: How many ways are there to pick k objects from a set of n objects without replacement and without regard to order?
Equivalently: "How many subsets of size k (called a combination of size k) are there from a set of size n?"
First determine the # of ordered subsets: n(n − 1)⋯(n − k + 1) = n!/(n − k)!
Now, how many different orders are there for any set of k objects? k!
Let C(n, k) = the number of combinations of size k from a set of n objects:

    C(n, k) = (# ordered subsets)/(# orderings) = n!/(k!(n − k)!)
DEFN
The number of ways in which a set of size n can be partitioned into J subsets of size k_1, k_2, …, k_J is given by the multinomial coefficient

    n!/(k_1! k_2! ⋯ k_J!)

• The multinomial coefficient assumes that the subsets are ordered. If the subsets are not ordered, then divide by the number of permutations of the subsets.
• EX: Suppose there are 72 students in the class, and I randomly divide them into groups of 3 and assign each group a different project. How many different arrangements of groups are there?
Using the multinomial coefficient, there are

    72!/(3!)^{24} ≈ 1.3 × 10^{85}

Order matters because each group gets a different project.
• EX: Now suppose there are 72 students that work in groups of 3, where each group works on the same project.
There are 24! ways to arrange the 24 groups, so the total number of ways to assign people to groups is

    72!/((3!)^{24} · 24!) ≈ 2.1 × 10^{61}
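Numbers this large overflow double precision, so a MATLAB sketch works with log-factorials via the built-in gammaln (both counts match the values above):

log10_ordered   = (gammaln(73) - 24*gammaln(4)) / log(10)                 % approx 85.1
log10_unordered = (gammaln(73) - 24*gammaln(4) - gammaln(25)) / log(10)   % approx 61.3
% i.e., about 1.3e85 ordered and 2.1e61 unordered arrangements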
• EX: What is the probability that each number appears exactly twice in 12 rolls of a 6-sided die?
One possible way: 1 1 2 2 3 3 4 4 5 5 6 6.
  – How many total ways can this occur? 12!/(2!)^6
  – What is the probability of this occurring? [12!/(2!)^6]/6^{12} ≈ 0.0034
5. Sampling with replacement and without ordering
Question: How many ways are there to choose k objects from a set of n objects with replacement and without regard to order?
• The total number of arrangements is

    C(k + n − 1, k)
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems, an experiment is repeated and a probability is evaluated on the combined outcome. The repeated experiment can be such that the individual experiments are independent of each other, or they can influence each other. In either case, these are called repeated trials.
Read Sections 1.9–1.12 in the textbook.
Try to answer the following question. The answer appears at the bottom of the page.
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{−2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
Answer: ≈ 0.0794.
EX (by request): The Game Show / Monty Hall Problem
Slightly paraphrased from the "Ask Marilyn" column of Parade magazine:
• Suppose you're on a game show, and you're given the choice of three doors:
  – Behind one door is a car.
  – Behind the other doors are goats.
• You pick a door, and the host, who knows what's behind the doors, opens another door, which he knows has a goat.
• The host then offers you the option to switch doors. Does it matter if you switch?
• Compute the probabilities of winning for the following three strategies:
1. Never switch
2. Always switch
3. Flip a fair coin and switch if it comes up heads
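A Monte Carlo sketch of the first two strategies in MATLAB (the host's revealed door never affects these two, since "always switch" wins exactly when the first pick was wrong):

N = 1e5;
car  = randi(3, 1, N);            % door hiding the car
pick = randi(3, 1, N);            % contestant's first choice
P_never  = mean(pick == car)      % approximately 1/3
P_always = mean(pick ~= car)      % approximately 2/3
% the coin-flip strategy averages the two: approximately 1/2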
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Probability space for N sequential experiments:
1. The combined sample space Ω = the Cartesian product of the sample spaces for the subexperiments.
• Then any outcome in Ω is of the form (B_1, B_2, …, B_N), where B_i ∈ Ω_i.
2. Event class F: a σ-algebra on Ω.
3. P(·) on F such that
P(A) is consistent with the probability laws of the subexperiments.
For s.i. (statistically independent) subexperiments:
P(A) = P(A_1)P(A_2)⋯P(A_N) for A = A_1 × A_2 × ⋯ × A_N.
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5a-2
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Thus n1 = n2 = = nk = |A| equiv n
rArr number of distinct ordered k-tuples is
2 Sampling wout replacement amp with orderingProblem Choose k objects from a set A without replacement (after each object isselected it is removed from the set) Let n = |A| where k le n
Result is k-tuple (x1 x2 xk)where xi isin Aforalli = 1 2 k
Then n1 = n n2 = nminus 1 nk = nminus (k minus 1)
rArr number of distinct ordered k-tuples is
3 Permutations of n distinct objectsProblem Find the of permutations of a set of n objects (The of permutations isthe of possible orderings)
Result is n-tuple
The of orderings is the of ways we can take the objects one at a time from a setwithout replacement until no objects are left in the set
Then n1 = n n2 = nminus 1 nnminus1 = 2 nn = 1
rArr of permutations = of distinct ordered n-tuples
=
=
Useful formula For large nn asymp
radic2πnn+ 1
2 eminusn
MATLAB TECHNIQUE To determine n in MATLAB use gamma(n+1)
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series of subexperiments.
Prob. Space for N Sequential Experiments:
1. The combined sample space: Ω = the Cartesian product of the sample spaces for the subexperiments, $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_N$.
• Then any $A \in \Omega$ is of the form $(B_1, B_2, \ldots, B_N)$, where $B_i \in \Omega_i$.
2. Event class F: a σ-algebra on Ω.
3. $P(\cdot)$ on F such that
$$P(A) = P(B_1)\,P(B_2 \mid B_1)\cdots P(B_N \mid B_1 \cap \cdots \cap B_{N-1}).$$
For statistically independent (s.i.) subexperiments,
$$P(A) = P_1(B_1)\,P_2(B_2)\cdots P_N(B_N).$$
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once, with outcomes of success or failure.
• Let p denote the probability of success.
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is $p^k(1-p)^{n-k}$.
• The number of ways that k successes can occur in n places is $\binom{n}{k}$.
• Thus the probability of k successes on n trials is
$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}. \quad (1.1)$$
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads? ($p_3(3) = (1/2)^3 = 1/8$)
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability $10^{-2}$. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly? (A worked sketch follows.)
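A minimal MATLAB sketch of the computation (not in the original notes), using the binomial probability in (1.1):

% P(packet does not decode) = P(3 or more errors) = 1 - P(0, 1, or 2 errors)
n = 100; p = 1e-2;
pDecode = 0;
for k = 0:2
    pDecode = pDecode + nchoosek(n, k) * p^k * (1-p)^(n-k);
end
pFail = 1 - pDecode                      % approx 0.0794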
• When n is large, the probability in (1.1) can be difficult to compute, as $\binom{n}{k}$ gets very large at the same time that one or both of the other terms gets very small.
For large n, we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
$$b(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}. \quad (1.2)$$
• Recall that b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables $X_i$.
– For each Bernoulli trial i, let $X_i = 1$ if the trial is a success and $X_i = 0$ otherwise.
– Let $S_n = \sum_{i=1}^{n} X_i$.
Then b(k; n, p) is the probability that $S_n = k$.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-x^2}{2}\right] \quad (1.3)$$
and distribution function
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[\frac{-t^2}{2}\right] dt. \quad (1.4)$$
• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by
$$Q(x) = 1 - \Phi(x) \quad (1.5)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[\frac{-t^2}{2}\right] dt. \quad (1.6)$$
• In this class I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right). \quad (1.7)$$
• Note that (1.7) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right].$$
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
bull I will give you a Q-function table to use with each exam
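As a sanity check, the approximation can be compared against the exact binomial sum in MATLAB (a sketch; the values of n, p, α, and β below are chosen only for illustration):

% Exact vs. Gaussian approximation for P[alpha <= Sn <= beta]
Q = @(x) 0.5 * erfc(x / sqrt(2));        % Q-function via erfc
n = 100; p = 0.5; q = 1 - p;
alpha = 40; beta = 60;
exact = 0;
for k = alpha:beta
    exact = exact + nchoosek(n, k) * p^k * q^(n-k);
end
approx = Q((alpha - n*p - 0.5)/sqrt(n*p*q)) - Q((beta - n*p + 0.5)/sqrt(n*p*q));
[exact approx]                           % approx [0.9648 0.9643]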
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?
p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)
Then
$$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$$
Also, $P(\#\text{ trials} > k) = (1-p)^{k}$.
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions? (With success probability p = 0.9: $p(2) = (0.1)(0.9) = 0.09$ and $P(\#\text{ trans.} > 2) = (0.1)^2 = 0.01$; a check appears below.)
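A tiny MATLAB check of those two numbers (a sketch, not from the original notes):

% ARQ: packet error prob 0.1, so success prob p = 0.9
p = 0.9;
pExactly2  = (1-p)^1 * p                 % 0.09
pMoreThan2 = (1-p)^2                     % 0.01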
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of (1/n)th of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 (per minute) is a constant. Then the maximum possible number of calls/hour is 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability
$$\binom{n}{k} p^k (1-p)^{n-k}.$$
• If we let np = α be a constant, then for large n,
$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{\alpha^k}{k!} \left(1 - \frac{\alpha}{n}\right)^{n-k}.$$
• Then
$$\lim_{n\to\infty} \frac{\alpha^k}{k!} \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!}\, e^{-\alpha},$$
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval? (A sketch follows.)
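A short MATLAB sketch of this calculation (not part of the original notes):

% 90 calls/hr = 1.5 calls/min; over t = 10 min, alpha = lambda*t = 15
lambda = 90/60; t = 10;
alpha = lambda * t;
k = 20;
P20 = alpha^k * exp(-alpha) / factorial(k)   % approx 0.0418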
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: $\Omega_Z = \{z \mid 0 \le z \le 2b\}$)
2. Find the region in the square corresponding to the event $\{Z \le z\}$, for $-\infty < z < \infty$.
3. Find and plot $F_Z(z)$.
Ans:
$$F_Z(z) = \begin{cases} 0, & z \le 0 \\ \frac{z^2}{2b^2}, & 0 < z \le b \\ \frac{b^2 - (2b-z)^2/2}{b^2}, & b < z < 2b \\ 1, & z \ge 2b \end{cases}$$
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a mapping from Ω to the real line ℝ.
EX: (worked in class)
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN
A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers $B \in \mathcal{B}$, the set $E_B \triangleq \{\xi \in \Omega : X(\xi) \in B\}$ is an event, and
(ii) $P[X = -\infty] = 0$ and $P[X = +\infty] = 0$.
• The Borel field on ℝ contains all sets that can be formed from countable unions, intersections, and complements of sets of the form $\{x' \mid x' \in (-\infty, x]\}$.
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN
If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF), denoted $F_X(x)$, is
$$F_X(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}).$$
• $F_X(x)$ induces a probability measure on the real line.
• Properties of $F_X$:
1. $0 \le F_X(x) \le 1$
2. $F_X(-\infty) = 0$ and $F_X(\infty) = 1$
3. $F_X(x)$ is monotonically nondecreasing, i.e., $F_X(a) \le F_X(b)$ if $a \le b$.
4. $P(a < X \le b) = F_X(b) - F_X(a)$
5. $F_X(x)$ is continuous on the right, i.e., $F_X(b) = \lim_{h \downarrow 0} F_X(b+h)$.
(The value at a jump discontinuity is the value after the jump.)
Pf. omitted.
• If $F_X(x)$ is a continuous function of x, then $F_X(x) = F_X(x^-)$.
• If $F_X(x)$ is not a continuous function of x, then from above,
$$F_X(x) - F_X(x^-) = \lim_{\varepsilon \downarrow 0} P[x - \varepsilon < X \le x] = P[X = x].$$
EX: Examples (worked in class)
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1, 1000000);
Check that the density is uniform by plotting a histogram:
hist(u, 100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density of v by plotting a histogram:
hist(v, 100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
w = sqrt(v);
hist(w, 100)
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION
DEFN
The probability density function (pdf) $f_X(x)$ of a random variable X is the derivative (which may not exist at some places) of the distribution function:
$$f_X(x) = \frac{d}{dx} F_X(x).$$
• Properties:
1. $f_X(x) \ge 0$, $-\infty < x < \infty$
2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
3. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$, $a \le b$
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$$\int_{-\infty}^{+\infty} g(x)\,dx = c, \quad 0 < c < +\infty,$$
then $f_X(x) = g(x)/c$ is a valid pdf.
Pf omitted
• Note that if $f_X(x)$ exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN
If $F_X(x)$ is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way $f_X(x)$ is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
$$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b. \end{cases}$$
DEFN
A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is $p_X(x) = P(X = x)$.
EX: Roll a fair 6-sided die. X = # on top face.
$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$
EX: Flip a fair coin until heads occurs. X = # of flips.
$$P(X = x) = \begin{cases} \left(\frac{1}{2}\right)^x, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV
• An event A ∈ F is considered a "success."
• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A. \end{cases}$$
• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P\big(\{s : X(s) = 1\}\big) = P\big(\{s \mid s \in A\}\big) = P(A) \triangleq p.$$
So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1. \end{cases}$$
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x).]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. $F_X(x)$.]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x).]
3. Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$
• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x).]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. $F_X(x)$.]
4. Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the average # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x).]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. $F_X(x)$.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x).]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in the examples above.
2. Exponential RV
• Characterized by a single parameter λ.
• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$
• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4.]
3. Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial:
• DeMoivre-Laplace Theorem: If n is large, then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[\frac{-(k-np)^2}{2npq}\right].$$
• Let σ² = npq and μ = np; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(k-\mu)^2}{2\sigma^2}\right].$$
DEFN
A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(x-\mu)^2}{2\sigma^2}\right],$$
with parameters (mean) μ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(t-\mu)^2}{2\sigma^2}\right] dt,$$
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25).]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt.$$
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-t^2}{2}\right] dt.$$
– Note that $Q(x) = 1 - \Phi(x)$.
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left[\frac{-x^2}{2\sin^2\phi}\right] d\phi, \quad x \ge 0.$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
$$F_X(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = 1 - Q\!\left(\frac{x-\mu}{\sigma}\right).$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob. as
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\!\left(\frac{a-\mu}{\sigma}\right) - Q\!\left(\frac{b-\mu}{\sigma}\right).$$
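MATLAB has no built-in Q(x), but it is easily built from erfc; a short sketch (the values of μ, σ, a, and b are illustrative assumptions, not from the notes):

% Q-function and an interval probability for X ~ Gaussian(mu, sigma^2)
Q = @(x) 0.5 * erfc(x / sqrt(2));            % Q(x) = 1 - Phi(x)
mu = 5; sigma = 2; a = 4; b = 9;             % example parameters
Pab = Q((a - mu)/sigma) - Q((b - mu)/sigma)  % P(4 < X <= 9), approx 0.6687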
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. The answers are given below the question so you can check your work.
EX
Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) $P[X < \mu]$
(b) $P[|X - \mu| > k\sigma]$ for k = 1, 2, 3, 4, 5
(Answers: (a) 1/2; (b) 0.317, $4.55\times10^{-2}$, $2.70\times10^{-3}$, $6.34\times10^{-5}$, and $5.75\times10^{-7}$, respectively.)
Examples with Gaussian Random Variables
EX: The Motivation problem (worked in class; a numerical sketch follows).
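A MATLAB sketch of the motivation problem (an addition, not from the notes). By symmetry of the Gaussian density about μ, $P[X < \mu] = 1/2$; and $P[|X - \mu| > k\sigma] = 2Q(k)$, independent of μ and σ²:

% Verify the tabulated answers: P[|X - mu| > k*sigma] = 2*Q(k)
Q = @(x) 0.5 * erfc(x / sqrt(2));
k = 1:5;
twoQ = 2 * Q(k)   % approx 0.317, 4.55e-2, 2.70e-3, 6.33e-5, 5.73e-7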
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2; x vs. $f_X(x)$.]
5. Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if $X_i$, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \quad (1.8)$$
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2}\, \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t}\, dt, \quad p > 0;$$
$$\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0; \qquad \Gamma(1/2) = \sqrt{\pi}; \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}.$$
• Note that for n = 2,
$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$$
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential. (A quick simulation check follows.)
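A quick simulation sketch of this fact (not in the original notes):

% Sum of squares of two independent zero-mean Gaussians (variance sigma^2)
% should be exponential with mean 2*sigma^2
sigma = 1;
y = (sigma*randn(1, 1e6)).^2 + (sigma*randn(1, 1e6)).^2;
mean(y)        % approx 2*sigma^2 = 2
hist(y, 100)   % shape matches the exponential density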
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. $f_X(x)$.]
• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found, through repeated integration by parts, to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0. \end{cases}$$
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, i.e., to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let $R = \sqrt{Y}$.
• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$$
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf (σ² = 1); r vs. $f_R(r)$.]
• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$$
• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left[\frac{-(\ln \ell - \mu)^2}{2\sigma^2}\right], & \ell \ge 0 \\ 0, & \ell < 0. \end{cases}$$
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω).
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x \mid A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.$$
DEFN
For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x \mid A) = \frac{d}{dx} F_X(x \mid A).$$
• Then $F_X(x \mid A)$ is a distribution function and $f_X(x \mid A)$ is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time? (For x ≥ 2: $F_X(x \mid X > 2) = 1 - e^{-(x-2)}$; 0 for x < 2.)
2. the conditional distribution for your remaining wait time? (By the memoryless property, the remaining wait R = X − 2 is again exponential with λ = 1: $F_R(r) = 1 - e^{-r}$, r ≥ 0. A simulation check follows.)
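A simulation sketch of the memoryless property behind part 2 (an addition; it reuses the exponential sampler from the Lecture 7 MATLAB exercise):

% Exponential(1) wait times: the remaining wait after 2 minutes is again Exponential(1)
x = -log(rand(1, 1e6));          % exponential samples with lambda = 1
rem = x(x > 2) - 2;              % remaining wait, given you already waited 2 min
[mean(rem > 1)  exp(-1)]         % P(rem > 1 | waited 2) approx P(X > 1) = 0.3679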
TOTAL PROBABILITY AND BAYES' THEOREM
• If $A_1, A_2, \ldots, A_n$ form a partition of Ω, then
$$F_X(x) = F_X(x \mid A_1)P(A_1) + F_X(x \mid A_2)P(A_2) + \cdots + F_X(x \mid A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i)\, P(A_i).$$
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work. Instead:
$$P(A \mid X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{\big[F_X(x + \Delta x \mid A) - F_X(x \mid A)\big]/\Delta x}{\big[F_X(x + \Delta x) - F_X(x)\big]/\Delta x}\, P(A)$$
$$= \frac{f_X(x \mid A)}{f_X(x)}\, P(A),$$
if $f_X(x \mid A)$ and $f_X(x)$ exist.
Implication of Point Conditioning
• Note that
$$P(A \mid X = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)$$
$$\Rightarrow \quad P(A \mid X = x)\, f_X(x) = f_X(x \mid A)\, P(A)$$
$$\Rightarrow \quad \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x \mid A)\, dx \; P(A)$$
$$\Rightarrow \quad P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx.$$
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\, dt}.$$
Examples on board
L5a-3
4 Sample wout Replacement amp wout OrderQuestion How many ways are there to pick k objects from a set of n objects withoutreplacement and without regard to order
Equivalently ldquoHow many subsets of size k (called a combination of size k) are therefrom a set of size nrdquo
First determine of ordered subsets
Now how many different orders are there for any set of k objects
Let Cnk = the number of combinations of size k from a set of n objects
Cnk =
ordered subsets orderings
=
DEFN
The number of ways in which a set of size n can be partitioned into J subsetsof size k1 k2 kJ is given by the multinomial coefficient
bull The multinomial coefficient assumes that the subsets are ordered If the subsets are notordered then divide by the number of permutations of subsets
bull EX Suppose there are 72 students in the class and I randomly divide them into groups of3 and assign each group a different project How many different arrangements of groups arethere
Using the multinomial coefficient there are
72
(3)24 asymp 13times 1085
Order matters because each group gets a different project
bull EX Now suppose there are 72 students that work in groups of 3 where each group workson the same project
There are 24 ways to arrange the 24 groups so the total number of ways to assign people togroups is
72
(3)24 (24)asymp 21times 1061 ordered groups
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5a-4
bull EX What is the probability that each number appears exactly twice in 12 rolls of6-sided die
One possible way 1 1 2 2 3 3 4 4 5 5 6 6
ndash How many total ways can this occur
ndash What is the prob of this occurring
5 Sample with Replacement and wout OrderingQuestion How many ways are there to choose k objects from a set of n objects withoutregard to order
bull The total number of arrangements is(k + nminus 1
k
)
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
• Then for n independent Bernoulli trials, the prob. of a specific arrangement of k successes (and n − k failures) is ________
• The number of ways that k successes can occur in n places is ________
• Thus the probability of k successes on n trials is
p_n(k) = ________ (1.1)
Examples
EX: Toss a fair coin 3 times. What is the probability of exactly 3 heads?
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over a channel with bit error probability 10^{-2}. The packet will not decode properly if 3 or more bit errors occur. What is the probability that the packet does not decode correctly?
L5-4
• When n is large, the probability in (1.1) can be difficult to compute, as \binom{n}{k} gets very large at the same time that one or both of the other terms get very small.
For large n we can approximate the binomial probabilities using either:
• Poisson Law: applies when n becomes large while np remains constant. See Section 1.10. We will generally not use this as an approximation, but rather to study phenomena that are equally likely to occur at any time or any place.
• Gaussian approximation: applies for large n and k. See Section 1.11 and below.
• Let
b(k; n, p) = \binom{n}{k} p^k (1 − p)^{n−k} (1.2)
• Recall b(k; n, p) gives the probabilities for the number of successes that we see on n Bernoulli trials.
• An equivalent formulation is the following:
– Consider a sequence of variables X_i.
– For each Bernoulli trial i, let X_i = 1 if the trial is a success and X_i = 0 otherwise.
– Let S_n = \sum_{i=1}^{n} X_i.
Then b(k; n, p) is the probability that S_n = k.
• Under some very general conditions, the Laws of Large Numbers (to be discussed later) indicate that a sum of random variables converges to a certain type of random variable known as the Gaussian or Normal random variable.
• The Gaussian random variable is characterized by a density function
φ(x) = (1/√(2π)) exp[−x²/2] (1.3)
and distribution function
Φ(x) = (1/√(2π)) \int_{−∞}^{x} exp[−t²/2] dt (1.4)
L5-5
• Engineers often prefer an alternative to (1.4), which is called the complementary distribution function, given by
Q(x) = 1 − Φ(x) (1.5)
     = (1/√(2π)) \int_{x}^{∞} exp[−t²/2] dt (1.6)
• In this class I strongly prefer the use of Q(x), and will use that in preference to Φ(x).
• By applying the Law of Large Numbers, it can be shown that for large n,
b(k; n, p) ≈ (1/√(npq)) φ((k − np)/√(npq)) (1.7)
• Note that (1.7) uses the density function φ, not the distribution function Φ.
• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
P[α ≤ S_n ≤ β] ≈ Φ[(β − np + 0.5)/√(npq)] − Φ[(α − np − 0.5)/√(npq)]
              = Q[(α − np − 0.5)/√(npq)] − Q[(β − np + 0.5)/√(npq)]
• Note that the book defines an error function erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.
• I will give you a Q-function table to use with each exam.
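As an illustration (a sketch, not part of the original notes), the motivating packet example with n = 100 and p = 10^{-2} can be computed exactly and with the corrected Gaussian approximation; Q(x) is written via erfc, which is base MATLAB:

% Sketch: P[Sn >= 3] for n = 100 Bernoulli trials with p = 1e-2.
n = 100; p = 1e-2; q = 1 - p;
pexact = 1 - sum(arrayfun(@(k) nchoosek(n,k)*p^k*q^(n-k), 0:2))  % ≈ 0.0794
Qf = @(x) 0.5*erfc(x/sqrt(2));              % Q(x) = 1 - Phi(x)
papprox = Qf((3 - n*p - 0.5)/sqrt(n*p*q))   % Gaussian approx., rough since p is small
% For small p the Poisson law with alpha = np = 1 is closer:
1 - exp(-1)*(1 + 1 + 1/2)                   % ≈ 0.0803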
GEOMETRIC PROBABILITY LAW
Suppose we repeat independent Bernoulli trials until the first success. What is the prob. that m trials are required?
p(m) = P(m trials to first success)
     = P(1st m − 1 trials are failures ∩ mth trial is success)
L5-6
Then
p(m) = ________
Also, P(# trials > k) = ________
EX
A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
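A quick check of the ARQ example (a sketch, not part of the original notes):

% Sketch: geometric law with per-transmission success prob 0.9.
pe = 0.1;
pExactly2  = pe*(1 - pe)    % first fails, second succeeds: 0.09
pMoreThan2 = pe^2           % first two both fail: 0.01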
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.
• How should we proceed? What should we assume?
• Let's start with a binomial model.
• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.
• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.
• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.
• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability \binom{n}{k} p^k (1 − p)^{n−k}.
• If we let np = α be a constant, then for large n,
\binom{n}{k} p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}
• Then
lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k/k!) e^{−α},
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
– calls coming in to a switching center
– packets arriving at a queue in a network
– processes being submitted to a scheduler
The following examples are adapted from A First Course in Probability by Sheldon Ross:
– # of misprints on a group of pages in a book
– # of people in a community that live to be 100 years old
– # of wrong telephone numbers that are dialed in a day
– # of α-particles discharged in a fixed period of time from some radioactive material
– # of earthquakes per year
– # of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt
• Let N = the # of events in time (or space) t
• Then
P(N = k) = (α^k/k!) e^{−α},  k = 0, 1, . . .
           0,                o.w.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the prob. that it receives 20 calls in a 10-minute interval?
• The Gaussian approximation can also be applied to Poisson probabilities – see pages 46 and 47 in the book for details.
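For the switching-office example, α = λt = 90 · (10/60) = 15, and the answer follows directly from the Poisson pmf (a sketch, not part of the original notes):

% Sketch: Poisson probability of k = 20 calls when alpha = lambda*t = 15.
alpha = 90*(10/60); k = 20;
pk = alpha^k/factorial(k)*exp(-alpha)   % ≈ 0.0418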
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX
A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3 Find and plot FZ(z)
Ans:
F_Z(z) = 0,                      z ≤ 0
         z²/(2b²),               0 < z ≤ b
         [b² − (2b − z)²/2]/b²,  b < z < 2b
         1,                      z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a ________ from ________ to ________.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability must map to an event in the event class F.
bull We will only assign probabilities to Borel sets of the real line
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the ________ ( ________ ), denoted ________, is ________
• F_X(x) is a prob. measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., F_X(b^+) = lim_{h↓0} F_X(b + h) = F_X(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x^−).
• If F_X(x) is not a continuous function of x, then from above,
F_X(x) − F_X(x^−) = P[x^− < X ≤ x]
                  = lim_{ε→0} P[x − ε < X ≤ x]
                  = P[X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
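The dart example is also easy to check by simulation. Below is a minimal MATLAB sketch (not part of the original notes; b = 1 assumed) comparing the empirical CDF of Z = X + Y against the piecewise formula above:

% Sketch: empirical vs. analytical F_Z(z) for the dart example, b = 1.
b = 1; N = 1e5;
z  = b*rand(1,N) + b*rand(1,N);          % Z = sum of the two coordinates
zz = linspace(0, 2*b, 201);
Femp = arrayfun(@(t) mean(z <= t), zz);  % empirical CDF
Fthy = (zz <= b).*zz.^2/(2*b^2) + (zz > b).*(1 - (2*b - zz).^2/(2*b^2));
plot(zz, Femp, zz, Fthy, '--')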
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX Try the following in MATLAB
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
v = -log(u);
Check the density by plotting a histogram:
hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
L7-2
Now let's do one more transformation:
w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and how random variables can be transformed through functions.
PROBABILITY DENSITY FUNCTION
DEFN: The ________ ( ________ ) f_X(x) of a random variable X is the derivative (which may not exist at some places) of ________
• Properties:
1. f_X(x) ≥ 0, −∞ < x < ∞
2. F_X(x) = \int_{−∞}^{x} f_X(t) dt
3. \int_{−∞}^{+∞} f_X(x) dx = 1
L7-3
4. P(a < X ≤ b) = \int_{a}^{b} f_X(x) dx, a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
\int_{−∞}^{+∞} g(x) dx = c, −∞ < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x^−) = 0.
• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a ________
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); thus f_X(x) can be assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
f_X(x) = 0,         x < a
         1/(b − a), a ≤ x ≤ b
         0,         x > b
L7-4
DEFN
A ________ has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV, the probability mass function (pmf) is ________
EX: Roll a fair 6-sided die. X = # on top face.
P(X = x) = 1/6, x = 1, 2, . . . , 6
           0,   o.w.
EX: Flip a fair coin until heads occurs. X = # of flips.
P(X = x) = (1/2)^x, x = 1, 2, . . .
           0,       o.w.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
X = 1, s ∈ A
    0, s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p
So
P(X = x) = p,     x = 1
           1 − p, x = 0
           0,     x ≠ 0, 1
2. Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
P[X = k] = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, . . . , n
           0,                              o.w.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) pmfs; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) and Binomial(10, 0.5) CDFs; x vs. F_X(x)]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• X = number of trials required. Then the pmf of X is given by
P[X = k] = (1 − p)^{k−1} p, k = 1, 2, . . .
           0,               o.w.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) pmfs; x vs. P(X = x)]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) and Geometric(0.7) CDFs; x vs. F_X(x)]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time)
• Consider observing some period of time or space of length t, and let α = λt
• Let N = the # of events in time (or space) t
• Then
P(N = k) = (α^k/k!) e^{−α}, k = 0, 1, . . .
           0,               o.w.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) pmfs; x vs. P(X = x)]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) and Poisson(3) CDFs; x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example above.
2 Exponential RV
• Characterized by a single parameter λ.
L7-10
• Density function:
f_X(x) = 0,        x < 0
         λe^{−λx}, x ≥ 0
• Distribution function:
F_X(x) = 0,           x < 0
         1 − e^{−λx}, x ≥ 0
• The book's definition substitutes λ = 1/µ.
• Obtainable as a limit of the geometric RV.
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: exponential pdfs and CDFs for λ = 0.25, 1, 4]
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is small, then with q = 1 − p,
\binom{n}{k} p^k q^{n−k} ≈ (1/√(2πnpq)) exp{−(k − np)²/(2npq)}
• Let σ² = npq and µ = np; then
P(k successes on n Bernoulli trials) = \binom{n}{k} p^k q^{n−k} ≈ (1/√(2πσ²)) exp{−(k − µ)²/(2σ²)}
L7-11
DEFN: A Gaussian random variable X has density function
f_X(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
F_X(x) = P(X ≤ x) = \int_{−∞}^{x} (1/√(2πσ²)) exp{−(t − µ)²/(2σ²)} dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25):
[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), (µ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
Φ(x) = \int_{−∞}^{x} (1/√(2π)) exp{−t²/2} dt
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
Q(x) = \int_{x}^{∞} (1/√(2π)) exp{−t²/2} dt
L7-12
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
Q(x) = (1/π) \int_{0}^{π/2} exp{−x²/(2 sin²φ)} dφ
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
F_X(x) = Φ((x − µ)/σ) = 1 − Q((x − µ)/σ)
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the prob. of some interval using the Q-function, it is easiest to rewrite the prob.:
P(a < X ≤ b) = P(X > a) − P(X > b)
             = Q((a − µ)/σ) − Q((b − µ)/σ)
L8-1
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:
(a) P[X < µ]
(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5 (answers in footnote 5)
Examples with Gaussian Random Variables
EX The Motivation problem
Footnote 5: (a) 1/2; (b) 0.317, 4.55 × 10^{−2}, 2.70 × 10^{−3}, 6.34 × 10^{−5}, and 5.75 × 10^{−7}, respectively.
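The footnote values follow from P[|X − µ| > kσ] = 2Q(k), independent of µ and σ, which a one-liner verifies (a sketch, not part of the original notes):

% Sketch: tail probabilities 2Q(k) for k = 1..5.
Qf = @(x) 0.5*erfc(x/sqrt(2));
2*Qf(1:5)   % 0.3173  4.55e-2  2.70e-3  6.33e-5  5.73e-7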
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
f_X(x) = (c/2) exp{−c|x|}, −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf, c = 2; x vs. f_X(x)]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
L8-5
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, . . . , n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then
Y = \sum_{i=1}^{n} X_i² (1.8)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (1.8) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
f_Y(y) = [1/(σ^n 2^{n/2} Γ(n/2))] y^{(n/2)−1} e^{−y/(2σ²)}, y ≥ 0
         0,                                                 y < 0
where Γ(p) is the gamma function, defined as
Γ(p) = \int_{0}^{∞} t^{p−1} e^{−t} dt, p > 0
Γ(p) = (p − 1)!, p an integer > 0
Γ(1/2) = √π
Γ(3/2) = (1/2)√π
• Note that for n = 2,
f_Y(y) = [1/(2σ²)] e^{−y/(2σ²)}, y ≥ 0
         0,                      y < 0
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: central chi-square pdfs with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV can be found through repeated integration by parts to be (for even n = 2m)
F_Y(y) = 1 − e^{−y/(2σ²)} \sum_{k=0}^{m−1} (1/k!) (y/(2σ²))^k, y ≥ 0
         0,                                                    y < 0
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
f_R(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
         0,                    r < 0
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is
F_R(r) = 1 − e^{−r²/(2σ²)}, r ≥ 0
         0,                 r < 0
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
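The Gaussian-to-Rayleigh connection is easy to see numerically. Below is a minimal MATLAB sketch (not part of the original notes; σ = 1 assumed) comparing a histogram-based density estimate to the Rayleigh pdf:

% Sketch: R = sqrt(X1^2 + X2^2) with X1, X2 iid N(0,1) is Rayleigh.
N = 1e5; sigma = 1;
r = sigma*sqrt(randn(1,N).^2 + randn(1,N).^2);
[cnt, ctr] = hist(r, 50);
bw = ctr(2) - ctr(1);                     % bin width for normalization
plot(ctr, cnt/(N*bw), 'o', ctr, (ctr/sigma^2).*exp(-ctr.^2/(2*sigma^2)))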
7. Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
f_L(ℓ) = [1/(ℓ√(2πσ²))] exp{−(ln ℓ − µ)²/(2σ²)}, ℓ ≥ 0
         0,                                      ℓ < 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
F_X(x|A) = ________
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
f_X(x|A) = ________
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX
Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
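The answer to part 2 turns on the memorylessness of the exponential distribution, which a Monte Carlo sketch (not part of the original notes; t = 1.5 chosen only for illustration) confirms:

% Sketch: for T ~ exponential(1), P(T > 2 + t | T > 2) = P(T > t).
N = 1e6; t = 1.5;
T = -log(rand(1,N));                  % exponential(1) samples
mean(T > 2 + t)/mean(T > 2)           % conditional remaining-wait tail
exp(-t)                               % unconditional tail P(T > t)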
TOTAL PROBABILITY AND BAYES' THEOREM
• If A1, A2, . . . , An form a partition of Ω, then
F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + · · · + F_X(x|A_n)P(A_n)
and
f_X(x) = \sum_{i=1}^{n} f_X(x|A_i) P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional prob. will not work:
P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
           = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)
           = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
           = [f_X(x|A) / f_X(x)] P(A)
if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning
• Note that
P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)
⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
⇒ \int_{−∞}^{∞} P(A|X = x) f_X(x) dx = \int_{−∞}^{∞} f_X(x|A) dx · P(A)
⇒ P(A) = \int_{−∞}^{∞} P(A|X = x) f_X(x) dx
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)
         = P(A|X = x) f_X(x) / \int_{−∞}^{∞} P(A|X = t) f_X(t) dt
Examples on board
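A Monte Carlo illustration of the continuous Law of Total Probability (a sketch, not part of the original notes): take X ~ N(0,1) and the event A = {Y < X} with Y ~ N(0,1) independent, so that P(A|X = x) = Φ(x) and both estimates below should agree (and equal 1/2 by symmetry):

% Sketch: P(A) = integral of P(A|X=x) f_X(x) dx, estimated two ways.
N = 1e6;
x = randn(1,N); y = randn(1,N);
mean(y < x)                            % direct estimate of P(A)
mean(0.5*erfc(-x/sqrt(2)))             % E[P(A|X)] = E[Phi(X)]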
L5-1
EEL 5544 Noise in Linear Systems Lecture 5
Motivation
In many probability problems an experiment is repeated and a probability is evaluated on thecombined outcome The repeated experiment can be such that the individual experiments areindependent of each other or they can influence each other In either case these are calledrepeated trials
Read Sections 19ndash112 in the textbook
Try to answer the following question The answer appears at the bottom of the page
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
4
EX By Request The Game ShowMonty Hall Problem
Slightly paraphrased from the Ask Marilyn column of Parade magazine
bull Suppose yoursquore on a game show and yoursquore given the choice of three doors
ndash Behind one door is a car
ndash Behind the other doors are goats
bull You pick a door and the host who knows whatrsquos behind the doors opens another doorwhich he knows has a goat
bull The host then offers you the option to switch doors Does it matter if you switch
bull Compute the probabilities of winning for the following three strategies
1 Never switch
400794
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5-2
2 Always switch
3 Flip a fair coin and switch if it comes up heads
SEQUENTIAL EXPERIMENTS
DEFN
A sequential experiment (or combined experiment) consists of a series ofsubexperiments
Prob Space for N Sequential Experiments
1 The combined sample spaceΩ = theof the sample spaces for the subexperiments
bull Then any A isin Ω is of the form (B1 B2 BN) where Bi isin Ωi
2 Event class F σ-algebra on Ω
3 P (middot) on F such that
P (A) =
For si subexperiments
P (A) =
Bernoulli Trials
DEFN
A Bernoulli trial is an experiment that is performed once with outcomes ofsuccess or failure
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.
• Attempt n: Let's assume that calls come in during intervals of 1/n-th of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 calls per minute is a constant. The maximum possible number of calls/hour is then 60n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.
• For finite k, the probability of k arrivals is a Binomial probability:
    C(n,k) p^k (1 − p)^(n−k)
• If we let np = α be a constant, then for large n,
    C(n,k) p^k (1 − p)^(n−k) ≈ (α^k/k!) (1 − α/n)^(n−k)
• Then
    lim_{n→∞} (α^k/k!) (1 − α/n)^(n−k) = (α^k/k!) e^(−α),
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.
• Examples are:
  – calls coming in to a switching center
  – packets arriving at a queue in a network
  – processes being submitted to a scheduler
  The following examples are adapted from A First Course in Probability by Sheldon Ross:
  – # of misprints on a group of pages in a book
  – # of people in a community that live to be 100 years old
  – # of wrong telephone numbers that are dialed in a day
  – # of α-particles discharged in a fixed period of time from some radioactive material
  – # of earthquakes per year
  – # of computer crashes in a lab in a week
  The Poisson RV can be used to model all of these phenomena.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
    P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, …
             = 0,  otherwise.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
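A worked sketch of this example using the Poisson law above (my addition): in a 10-minute interval, α = λt = (90 calls/hr)(1/6 hr) = 15, so P(N = 20) = (15^20/20!) e^(−15) ≈ 0.042. In MATLAB:

    alpha = 90 * (10/60);                          % average # of calls in 10 minutes
    p20 = alpha^20 / factorial(20) * exp(-alpha)   % P(N = 20), approximately 0.0418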
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as numerical quantities. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables can be built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.
Read Sections 2.1–2.3 in the textbook.
Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
Ans:
    F_Z(z) = 0,  z ≤ 0
           = z²/(2b²),  0 < z ≤ b
           = [b² − (2b − z)²/2]/b²,  b < z < 2b
           = 1,  z ≥ 2b
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVS)
• What is a random variable?
• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
EX
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.
• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.
• We will only assign probabilities to Borel sets of the real line.
DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.
• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].
• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a prob. space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is
    F_X(x) = P[X ≤ x] = P({ω : X(ω) ≤ x}),  −∞ < x < ∞.
• F_X(x) is a prob. measure.
• Properties of F_X:
1. 0 ≤ F_X(x) ≤ 1
2. F_X(−∞) = 0 and F_X(∞) = 1
3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b
4. P(a < X ≤ b) = F_X(b) − F_X(a)
5. F_X(x) is continuous on the right, i.e., lim_{h→0⁺} F_X(b + h) = F_X(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).
• If F_X(x) is not a continuous function of x, then from above,
    F_X(x) − F_X(x⁻) = lim_{ε→0} P[x − ε < X ≤ x] = P[X = x].
EX: Examples.
EX: A dart is thrown into a square with corners at (0,0), (0,b), (b,0), and (b,b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.
1. Describe the sample space of Z.
2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.
3. Find and plot F_Z(z).
4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or density functions (when the density exists).
Discrete random variables can be characterized through their distribution and probability mass functions.
The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.
Read Sections 2.3–2.5 in the textbook.
MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.
Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):
    u = rand(1,1000000);
Check that the density is uniform by plotting a histogram:
    hist(u,100)
Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):
    v = -log(u);
Check the density by plotting a histogram:
    hist(v,100)
Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
Now let's do one more transformation:
    w = sqrt(v);
Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
This exercise demonstrates the use of MATLAB to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
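A minimal sketch of how one might go beyond eyeballing the histogram shape: normalize the histogram to unit area and overlay a candidate density. This is my addition to the exercise; the candidate density e^(−x) is the one the histogram of v should match if v is exponential with λ = 1.

    [counts, centers] = hist(v, 100);        % bin counts and bin centers
    binw = centers(2) - centers(1);          % bin width
    bar(centers, counts/(numel(v)*binw));    % histogram scaled to unit area
    hold on
    plot(centers, exp(-centers), 'r');       % candidate pdf: exponential, lambda = 1
    hold off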
PROBABILITY DENSITY FUNCTION
DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of F_X(x):
    f_X(x) = dF_X(x)/dx.
• Properties:
1. f_X(x) ≥ 0,  −∞ < x < ∞
2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt
3. ∫_{−∞}^{+∞} f_X(x) dx = 1
4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx,  a ≤ b
5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
    ∫_{−∞}^{+∞} g(x) dx = c,  0 < c < +∞,
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.
• Recall that this does not mean that x never occurs, but that its occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.
• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way, f_X(x) can be assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length have equal probabilities. The density function is given by
    f_X(x) = 1/(b − a),  a ≤ x ≤ b
           = 0,  otherwise.
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is
    P_X(x) = P[X = x].
EX: Roll a fair 6-sided die. X = # on top face.
    P(X = x) = 1/6,  x = 1, 2, …, 6
             = 0,  otherwise
EX: Flip a fair coin until heads occurs. X = # of flips.
    P(X = x) = (1/2)^x,  x = 1, 2, …
             = 0,  otherwise
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
• An event A ∈ A is considered a "success."
• A Bernoulli RV X is defined by
    X = 1,  s ∈ A
      = 0,  s ∉ A
• The pmf for a Bernoulli RV X can be found formally as
    P(X = 1) = P(X(s) = 1) = P({s | s ∈ A}) = P(A) ≜ p.
So
    P(X = x) = p,  x = 1
             = 1 − p,  x = 0
             = 0,  x ≠ 0, 1
2 Binomial RV
• A Binomial RV represents the number of successes on n independent Bernoulli trials.
• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.
• Let X = # of successes. Then the pmf of X is given by
    P[X = k] = C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, …, n
             = 0,  otherwise
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x vs. P(X = x)]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.
[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x vs. F_X(x)]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.
[Figure: Binomial(100, 0.2) pmf; x vs. P(X = x)]
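The pmf values in these figures can be computed directly from the formula above. A minimal MATLAB sketch (my addition; n and p match the first figure):

    n = 10; p = 0.2;
    k = 0:n;
    pmf = zeros(size(k));
    for i = 1:length(k)
        pmf(i) = nchoosek(n, k(i)) * p^k(i) * (1-p)^(n-k(i));
    end
    stem(k, pmf);                    % plot of the discrete pmf
    xlabel('x'); ylabel('P(X = x)');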
3 Geometric RV
• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.
• Let X = number of trials required. Then the pmf of X is given by
    P[X = k] = (1 − p)^(k−1) p,  k = 1, 2, …
             = 0,  otherwise
• The pmfs for two Geometric RVs, p = 0.1 or p = 0.7, are shown below.
[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x vs. P(X = x)]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.
[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; x vs. F_X(x)]
4 Poisson RV
• Models events that occur randomly in space or time.
• Let λ = the # of events/(unit of space or time).
• Consider observing some period of time or space of length t, and let α = λt.
• Let N = the # of events in time (or space) t.
• Then
    P(N = k) = (α^k/k!) e^(−α),  k = 0, 1, …
             = 0,  otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x vs. P(X = x)]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.
[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x vs. F_X(x)]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.
[Figure: Poisson(20) pmf; x vs. P(X = x)]
IMPORTANT CONTINUOUS RVS
1 Uniform RV: covered in an example above.
2 Exponential RV
• Characterized by a single parameter λ.
• Density function:
    f_X(x) = λ e^(−λx),  x ≥ 0
           = 0,  x < 0
• Distribution function:
    F_X(x) = 1 − e^(−λx),  x ≥ 0
           = 0,  x < 0
• The book's definition substitutes λ = 1/μ.
• Obtainable as a limit of the geometric RV. (A sampling sketch appears after the figure below.)
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.
[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; x vs. f_X(x) and F_X(x)]
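As promised above, a minimal MATLAB sketch of sampling an exponential RV by the inverse-CDF method; this is my addition, and it is exactly what the v = -log(u) exercise earlier in this lecture does for λ = 1: if U is uniform on (0,1), then X = −ln(U)/λ has CDF 1 − e^(−λx).

    lambda = 4;                  % illustrative rate parameter
    u = rand(1, 100000);         % uniform samples on (0,1)
    x = -log(u) / lambda;        % inverse-CDF transform: exponential samples
    mean(x)                      % should be close to 1/lambda = 0.25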
3 Gaussian RV
• For large n, the binomial and Poisson RVs have a bell shape.
• Consider the binomial.
• DeMoivre-Laplace Theorem: If n is large and p is not too close to 0 or 1, then with q = 1 − p,
    C(n,k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)].
• Let σ² = npq and μ = np; then
    P(k successes on n Bernoulli trials) = C(n,k) p^k q^(n−k)
                                         ≈ (1/√(2πσ²)) exp[−(k − μ)²/(2σ²)].
DEFN: A Gaussian random variable X has density function
    f_X(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)],  −∞ < x < ∞,
with parameters (mean) μ and (variance) σ² > 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
    F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(t − μ)²/(2σ²)] dt,
which cannot be evaluated in closed form.
• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and CDFs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
  – The CDF for the normalized Gaussian RV is defined as
      Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp[−t²/2] dt.
  – Engineers more commonly use the complementary distribution function, or Q-function, defined by
      Q(x) = ∫_{x}^{∞} (1/√(2π)) exp[−t²/2] dt.
  – Note that Q(x) = 1 − Φ(x).
  – I will be supplying you with a Q-function table and a list of approximations to Q(x).
  – The Q-function can also be defined as
      Q(x) = (1/π) ∫_{0}^{π/2} exp[−x²/(2 sin²φ)] dφ.
    (This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is
    F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).
  Note that the denominator above is σ, not σ²; many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
    P(a < X ≤ b) = P(X > a) − P(X > b)
                 = Q((a − μ)/σ) − Q((b − μ)/σ).
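Base MATLAB has no built-in Q(x) (qfunc requires the Communications Toolbox), but Q is easily written in terms of erfc. A minimal sketch of the interval computation above (my addition; the parameter values are illustrative):

    Qf = @(x) 0.5*erfc(x/sqrt(2));    % Q-function via the complementary error function
    mu = 5; sigma = 1;                % illustrative mean and standard deviation
    a = 4; b = 6;
    p = Qf((a - mu)/sigma) - Qf((b - mu)/sigma)   % P(a < X <= b); here about 0.683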
EEL 5544 Lecture 8
Motivation
Gaussian random variables are among the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following questions. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:
(a) P[X < μ]
(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem.
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.
• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
    f_X(x) = (c/2) exp(−c|x|),  −∞ < x < ∞,
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.
[Figure: Laplacian pdf with c = 2; x vs. f_X(x)]
5 Chi-square RV
• Let X be a Gaussian random variable.
• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, …, n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
    Y = Σ_{i=1}^{n} X_i²    (1.8)
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If in (1.8) all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
    f_Y(y) = [1/(σ^n 2^(n/2) Γ(n/2))] y^((n/2)−1) e^(−y/(2σ²)),  y ≥ 0
           = 0,  y < 0,
where Γ(p) is the gamma function, defined as
    Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0
    Γ(p) = (p − 1)!,  p an integer > 0
    Γ(1/2) = √π
    Γ(3/2) = (1/2)√π
• Note that for n = 2,
    f_Y(y) = [1/(2σ²)] e^(−y/(2σ²)),  y ≥ 0
           = 0,  y < 0,
which is an exponential density.
• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential. (A simulation check appears after this item.)
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.
[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]
• The CDF for a central chi-square RV with an even number n = 2m of degrees of freedom can be found through repeated integration by parts to be
    F_Y(y) = 1 − e^(−y/(2σ²)) Σ_{k=0}^{m−1} (1/k!) [y/(2σ²)]^k,  y ≥ 0
           = 0,  y < 0.
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
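As noted above, the n = 2 central chi-square RV is exponential. A quick MATLAB simulation check (my addition), using unit-variance Gaussians so the mean of Y should be 2σ² = 2:

    N = 100000;
    y = randn(1,N).^2 + randn(1,N).^2;    % sum of squares of two s.i. N(0,1) samples
    mean(y)                               % should be close to 2
    [counts, centers] = hist(y, 100);
    binw = centers(2) - centers(1);
    bar(centers, counts/(N*binw));        % histogram normalized to unit area
    hold on
    plot(centers, 0.5*exp(-centers/2), 'r');   % exponential density (1/(2σ²)) e^(−y/(2σ²))
    hold off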
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is in turn the sum of the squares of two independent zero-mean Gaussian RVs with common variance.
• Let R = √Y.
• Then R is a Rayleigh RV with pdf
    f_R(r) = (r/σ²) e^(−r²/(2σ²)),  r ≥ 0
           = 0,  r < 0.
• The pdf for a Rayleigh RV with σ² = 1 is shown below.
[Figure: Rayleigh pdf, σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is
    F_R(r) = 1 − e^(−r²/(2σ²)),  r ≥ 0
           = 0,  r < 0.
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
    f_L(ℓ) = [1/(ℓ√(2πσ²))] exp[−(ln ℓ − μ)²/(2σ²)],  ℓ ≥ 0
           = 0,  ℓ < 0.
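From the definition above, lognormal samples can be generated by exponentiating Gaussian samples. A one-line MATLAB sketch (my addition; μ and σ are illustrative):

    mu = 0; sigma = 1;                       % parameters of X = ln L
    L = exp(mu + sigma*randn(1,100000));     % lognormal samples, since ln L ~ N(mu, sigma²)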
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
    F_X(x|A) = P[X ≤ x | A] = P({X ≤ x} ∩ A)/P(A).
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
    f_X(x|A) = dF_X(x|A)/dx.
• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is
1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
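A worked sketch (my addition) using the definition above with A = {X > 2}: for the total wait time,
    F_X(x | X > 2) = P(2 < X ≤ x)/P(X > 2) = (e^(−2) − e^(−x))/e^(−2) = 1 − e^(−(x−2)),  x > 2,
and 0 otherwise. For the remaining wait time R = X − 2 (given X > 2), this gives F_R(r) = 1 − e^(−r), r ≥ 0, the same Exponential(1) distribution as the original wait time, illustrating the memoryless property of the exponential RV.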
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, …, A_n form a partition of Ω, then
    F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + … + F_X(x|A_n)P(A_n)
and
    f_X(x) = Σ_{i=1}^{n} f_X(x|A_i) P(A_i).
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,
    P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)
               = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)]} P(A)
               = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)
               = [f_X(x|A)/f_X(x)] P(A),
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
    P(A|X = x) = [f_X(x|A)/f_X(x)] P(A)
  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)
  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = [∫_{−∞}^{∞} f_X(x|A) dx] P(A)
  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx.
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
    f_X(x|A) = [P(A|X = x)/P(A)] f_X(x)
             = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt.
Examples on board
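One illustrative example of these continuous versions (my addition, not one of the board examples): let the bias X of a coin be uniform on [0,1], and let A be the event that a single flip shows heads, so P(A|X = x) = x. The continuous law of total probability gives P(A) = ∫₀¹ x dx = 1/2, and the continuous Bayes' theorem gives the density of the bias given that heads occurred: f_X(x|A) = x·1/(1/2) = 2x for 0 ≤ x ≤ 1.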
L5-3
bull Let p denote the probability of success
bull Then for n independent Bernoulli trials the prob of a specific arrangement of k successes(and nminus k failures) is
bull The number of ways that k success can occur in n places is
bull Thus the probability of k successes on n trials is
pn(k) = (11)
Examples
EX Toss a fair coin 3 times What is the probability of exactly 3 heads
EX
Suppose a data packet is encoded into a 100-bit packet that is sent over achannel with bit error probability 10minus2 The packet will not decode properlyif 3 or more bit errors occur What is the probability that the packet does notdecode correctly
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5-4
bull When n is large the probability in () can be difficult to compute as(
nk
)gets very large at
the same time that one or both of the other terms gets very small
For large n we can approximate the binomial probabilities using either
bull Poisson Law applies when n becomes large while np remains constant SeeSection 110 We will generally not use this as an approximation but rather tostudy phenomena that are equally likely to occur at any time or any place
bull Gaussian approximation applies for large n and k See Section 111 and below
bull Let
b(k n p) =
(n
k
)pk(1minus p)nminusk (12)
bull Recall b(k n p) gives the probabilities for the number of successes that we see on nBernoulli trials
bull An equivalent formulation is the following
ndash Consider sequence of variables Xi
ndash For each Bernoulli trial i let Xi = 1
ndash Let Sn =sumn
i=1 Xi
Then b(k n p) is the probability that Sn = k
bull Under some very general conditions the Laws of Large Numbers (to be discussed later)indicate that a sum of random variables converges to a certain type of random variableknow as the Gaussian or Normal random variable
bull The Gaussian random variable is characterized by a density function
φ(x) =1radic2π
exp
[minusx2
2
](13)
and distribution function
Φ(x) =1radic2π
int x
minusinfinexp
[minust2
2
]dt (14)
L5-5
bull Engineers often prefer an alternative to () which is called the complementary distributionfunction given by
Q(x) = 1minus Φ(x) (15)
=1radic2π
int infin
x
exp
[minust2
2
]dt (16)
bull In this class I strongly prefer the use of Q(x) and will use that in preference to Φ(x)
bull By applying the Law of Large Numbers it can be shown that for large n
b(k n p) asymp 1radic
npqφ
(k minus npradic
npq
)(17)
bull Note that () uses the density function φ not the distribution function Φ
bull This formula is most useful when it is extended to dealing with sums of binomialprobabilities
bull For fixed integers α and β
P [α le Sn le β] asymp Φ
[β minus np + 05
radicnpq
]minus Φ
[αminus npminus 05
radicnpq
]Q
[αminus npminus 05
radicnpq
]minusQ
[β minus np + 05
radicnpq
]
bull Note that the book defines an error functions erf(x) I will not use this function because thedefinition in the book is different than the standard definition which just causes confusion
bull I will give you a Q-function table to use with each exam
GEOMETRIC PROBABILITY LAWSuppose we repeat independent Bernoulli trials until the first successWhat is the prob that m trials are required
p(m) = P (m trials to first success)= P (1st mminus 1 trials are failures cap
mth trial is success)
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$

3. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$

4. $P(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$$\int_{-\infty}^{+\infty} g(x)\,dx = c, \qquad 0 < c < +\infty,$$
then f_X(x) = g(x)/c is a valid pdf.
Pf omitted
• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
$$f_X(x) = \begin{cases} 0, & x < a \\ \frac{1}{b-a}, & a \le x \le b \\ 0, & x > b \end{cases}$$
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is P_X(x) = P[X = x].
EX: Roll a fair 6-sided die. X = # on top face.
$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise} \end{cases}$$
EX: Flip a fair coin until heads occurs. X = # of flips.
$$P(X = x) = \begin{cases} (1/2)^x, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ F is considered a "success."

• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P(X(s) = 1) = P(\{s \mid s \in A\}) = P(A) \triangleq p$$

So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1 \end{cases}$$
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus, a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; P(X = x) vs. x]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; F_X(x) vs. x]
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; P(X = x) vs. x]
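Plots like these can be regenerated directly from the pmf formula; a minimal sketch in base MATLAB (the loop and plot styling are mine):

n = 10; p = 0.2; k = 0:n;
pmf = zeros(size(k));
for i = 1:length(k)
    % P[X = k] = C(n,k) p^k (1-p)^(n-k)
    pmf(i) = nchoosek(n, k(i)) * p^(k(i)) * (1-p)^(n - k(i));
end
stem(k, pmf); xlabel('x'); ylabel('P(X=x)'); title('Binomial(10, 0.2) pmf');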
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• Let X = number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; P(X = x) vs. x]
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF; F_X(x) vs. x]
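Geometric samples can be generated from uniform ones by inverting the CDF, the same trick as in the MATLAB exercise earlier in this lecture; a sketch (the ceil identity is a standard one, not from the notes):

p = 0.7;
u = rand(1,100000);
x = ceil(log(u) ./ log(1-p));   % geometric: P[X = k] = (1-p)^(k-1) * p
mean(x == 1)                    % should be near p = 0.7
mean(x == 2)                    % should be near (1-p)*p = 0.21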
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; P(X = x) vs. x]
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; F_X(x) vs. x]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; P(X = x) vs. x]
IMPORTANT CONTINUOUS RVS
1. Uniform RV: covered in example.
2. Exponential RV

• Characterized by a single parameter, λ.
• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$$

• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$
• The book's definition substitutes λ = 1/µ.

• Obtainable as a limit of the geometric RV.
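The exponential is also what the transformation v = -log(u) in the MATLAB exercise at the start of this lecture produces; a one-line check (with λ = 1, assuming U is Uniform(0,1)):
$$P[-\ln U \le x] = P[U \ge e^{-x}] = 1 - e^{-x}, \quad x \ge 0,$$
which is exactly the distribution function above with λ = 1. Using −(1/λ) ln U gives a general λ.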
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4; f_X(x) and F_X(x) vs. x]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial:
• DeMoivre-Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{-\frac{(k - np)^2}{2npq}\right\}$$

• Let σ² = npq and µ = np; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(k - \mu)^2}{2\sigma^2}\right\}$$
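A quick numeric check of this approximation for the Binomial(100, 0.2) pmf plotted earlier, at its peak k = np = 20 (the numbers in the comments are computed here, not taken from the notes):

n = 100; p = 0.2; q = 1 - p; k = 20;
% exact pmf via log-gamma to avoid overflow in nchoosek for large n
exact  = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + k*log(p) + (n-k)*log(q))  % about 0.099
approx = exp(-(k - n*p)^2 / (2*n*p*q)) / sqrt(2*pi*n*p*q)                             % about 0.0997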
DEFN: A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty,$$
with parameters (mean) µ and (variance) σ² ≥ 0.
• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(t - \mu)^2}{2\sigma^2}\right\} dt,$$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.

[Figure: Gaussian pdfs and cdfs for the three (µ, σ²) pairs; f_X(x) and F_X(x) vs. x]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{t^2}{2}\right\} dt$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{t^2}{2}\right\} dt$$
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).

– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{-\frac{x^2}{2\sin^2\phi}\right\} d\phi$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean µ and variance σ² is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right)$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right)$$
EEL 5544 Lecture 8
Motivation
Gaussian random variables are one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.

In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.

Read Sections 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.

EX: Let X be a Gaussian random variable with mean µ and variance σ². Find the following probabilities:

(a) P[X < µ]

(b) P[|X − µ| > kσ] for k = 1, 2, 3, 4, 5

Examples with Gaussian Random Variables

EX: The Motivation problem.

Answers: (a) 1/2; (b) 0.317, $4.55 \times 10^{-2}$, $2.70 \times 10^{-3}$, $6.34 \times 10^{-5}$, and $5.75 \times 10^{-7}$, respectively.
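A sketch of how these answers follow from the Q-function identities of Lecture 7 (the algebra here is mine):
$$P[X < \mu] = \frac{1}{2} \quad \text{by symmetry}, \qquad P[|X - \mu| > k\sigma] = 2\,Q(k),$$
since the event consists of two tails of equal probability Q(k). For example, k = 1 gives 2Q(1) ≈ 0.317 and k = 3 gives 2Q(3) ≈ $2.70 \times 10^{-3}$, matching the tabulated answers.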
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4. Laplacian RV

• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.

• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf with c = 2; f_X(x) vs. x]
5. Chi-square RV

• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
• But in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are s.i. (statistically independent) Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.

• If in (18) all of the Gaussian random variables have mean µ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\, dt, \quad p > 0; \qquad \Gamma(p) = (p-1)! \text{ for } p \text{ an integer} > 0; \qquad \Gamma(1/2) = \sqrt{\pi}; \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}$$
• Note that for n = 2,
$$f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
which is an exponential density.

• Thus, the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
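A quick MATLAB check of this fact is sketched below (randn draws zero-mean, unit-variance Gaussians, so σ² = 1 here and the resulting exponential has mean 2σ² = 2; the variable names are mine):

x1 = randn(1,100000);  x2 = randn(1,100000);   % s.i. zero-mean Gaussians, sigma^2 = 1
y  = x1.^2 + x2.^2;     % central chi-square with n = 2 degrees of freedom
mean(y)                 % should be near 2*sigma^2 = 2, the exponential mean
hist(y,100)             % histogram decays like an exponential density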
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; f_X(x) vs. x]
• For an even number of degrees of freedom, n = 2m, the CDF for a central chi-square RV can be found through repeated integration by parts to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is in turn equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf (σ² = 1); f_R(r) vs. r]
• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus, the amplitude of the waveform is a Rayleigh random variable.
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \frac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}$$
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x|A) = P[X \le x \mid A] = \frac{P(\{X \le x\} \cap A)}{P(A)}$$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x|A) = \frac{d}{dx} F_X(x|A)$$

• Then F_X(x|A) is a distribution function, and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
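A sketch of part 1, conditioning on the event A = {X > 2} (the algebra below is mine, using the exponential CDF from Lecture 7):
$$F_X(x \mid X > 2) = \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)} = \frac{e^{-2} - e^{-x}}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2,$$
and 0 for x ≤ 2. For part 2, the remaining wait W = X − 2 given {X > 2} then has F_W(w) = 1 − e^{−w} for w ≥ 0: the exponential is memoryless, so the remaining wait has the same Exp(1) distribution as the original wait.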
TOTAL PROBABILITY AND BAYES' THEOREM

• If A_1, A_2, ..., A_n form a partition of Ω, then
$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i)$$
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, define
$$P(A|X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x) = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x|A) - F_X(x|A)}{F_X(x + \Delta x) - F_X(x)} P(A)$$
$$= \lim_{\Delta x \to 0} \frac{[F_X(x + \Delta x|A) - F_X(x|A)]/\Delta x}{[F_X(x + \Delta x) - F_X(x)]/\Delta x} P(A) = \frac{f_X(x|A)}{f_X(x)} P(A),$$
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning

• Note that
$$P(A|X = x) = \frac{f_X(x|A)}{f_X(x)} P(A) \;\Rightarrow\; P(A|X = x) f_X(x) = f_X(x|A) P(A)$$
$$\Rightarrow\; \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x|A)\, dx \; P(A) \;\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx$$

(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x|A) = \frac{P(A|X = x)}{P(A)} f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}$$
Examples on board
• Engineers often prefer an alternative to Φ(x), which is called the complementary distribution function, given by
$$Q(x) = 1 - \Phi(x) \qquad (15)$$
$$= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left[-\frac{t^2}{2}\right] dt \qquad (16)$$

• In this class, I strongly prefer the use of Q(x) and will use it in preference to Φ(x).
• By applying the Central Limit Theorem, it can be shown that for large n,
$$b(k; n, p) \approx \frac{1}{\sqrt{npq}}\, \phi\!\left(\frac{k - np}{\sqrt{npq}}\right) \qquad (17)$$
• Note that (17) uses the density function φ, not the distribution function Φ.

• This formula is most useful when it is extended to dealing with sums of binomial probabilities.
• For fixed integers α and β,
$$P[\alpha \le S_n \le \beta] \approx \Phi\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right] - \Phi\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] = Q\!\left[\frac{\alpha - np - 0.5}{\sqrt{npq}}\right] - Q\!\left[\frac{\beta - np + 0.5}{\sqrt{npq}}\right]$$
• Note that the book defines an error function, erf(x). I will not use this function because the definition in the book is different from the standard definition, which just causes confusion.

• I will give you a Q-function table to use with each exam.
GEOMETRIC PROBABILITY LAW

Suppose we repeat independent Bernoulli trials until the first success. What is the probability that m trials are required?

p(m) = P(m trials to first success) = P(1st m − 1 trials are failures ∩ mth trial is success)

Then
$$p(m) = (1-p)^{m-1}\, p, \quad m = 1, 2, \ldots$$

Also, P(# trials > k) = (1 − p)^k.
EX: A communication system uses Automatic Repeat Request for reliable data delivery over a wireless channel. If the probability of packet error is 0.1, what is the probability that exactly 2 transmissions are required? More than 2 transmissions?
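A sketch of the computation using the geometric law above, taking "success" to mean correct delivery, so p = 1 − 0.1 = 0.9:
$$P(\text{exactly 2 transmissions}) = (1-p)\,p = (0.1)(0.9) = 0.09$$
$$P(\text{more than 2 transmissions}) = P(\#\text{ trials} > 2) = (1-p)^2 = (0.1)^2 = 0.01$$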
THE POISSON LAW
Example: A mobile base station receives an average of 30 calls per hour during a certain time of day. Construct a probabilistic model for this.

• How should we proceed? What should we assume?

• Let's start with a binomial model.

• Attempt 1: If we assume that only one call can come in during any 1-minute period, then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.5.

• Then an average of 30 calls come in per hour, with a maximum of 60 calls/hour.

• However, the limit of only one call per 1-minute period seems artificial, as does the limit of a maximum of 60 calls/hour.

• Attempt 2: Let's assume that only one call can come in during any 30-second period. Then we can model the call arrivals by Bernoulli experiments with probability of success p = 0.25.
• Again, we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, then p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/hour is n.
• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability,
$$\binom{n}{k} p^k (1-p)^{n-k}$$

• If we let np = α be a constant, then for large n,
$$\binom{n}{k} p^k (1-p)^{n-k} \approx \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k}$$

• Then
$$\lim_{n \to \infty} \frac{1}{k!} \alpha^k \left(1 - \frac{\alpha}{n}\right)^{n-k} = \frac{\alpha^k}{k!} e^{-\alpha},$$
which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

– calls coming in to a switching center

– packets arriving at a queue in a network

– processes being submitted to a scheduler

The following examples are adapted from A First Course in Probability by Sheldon Ross:

– # of misprints on a group of pages in a book

– # of people in a community that live to be 100 years old

– # of wrong telephone numbers that are dialed in a day

– # of α-particles discharged in a fixed period of time from some radioactive material

– # of earthquakes per year

– # of computer crashes in a lab in a week

The Poisson RV can be used to model all of these phenomena.
• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \frac{\alpha^k}{k!} e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{otherwise} \end{cases}$$
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
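A sketch of the computation with the pmf above: λ = 90 calls/hr and t = 10 min = 1/6 hr, so α = λt = 15 and
$$P(N = 20) = \frac{15^{20}}{20!} e^{-15} \approx 0.042.$$
(The numerical value is computed here, not given in the notes.)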
• The Gaussian approximation can also be applied to Poisson probabilities; see pages 46 and 47 in the book for details.
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5-6
Then
p(m) =
Also P ( trials gt k) =
EX
A communication system uses Automatic Repeat Request for reliable datadelivery over a wireless channel If the probability of packet error is 01what is the probability that exactly 2 transmissions are required more than2 transmissions
THE POISSON LAW
Example A mobile base station receives an average of 30 calls per hour during a certain time ofdayConstruct a probabilistic model for this
bull How should we proceed What should we assume
bull Letrsquos start with a binomial model
bull Attempt 1 If we assume that only one call can come in during any 1-minute period then wecan model the call arrivals by Bernoulli experiments with probability of success p = 05
bull Then an average of 30 calls come in per hour with a maximum of 60 callshour
bull However the limit on only one call per 1-minute period seems artificial as does the limit ofa maximum of 60 callshour
bull Attempt 2 Letrsquos assume that only one call can come in during any 30 second period Thenwe can model the call arrivals by Bernoulli experiments with probability of success p = 025
L5-7
bull Again we have an average of 30 callshour However now calls can come in at 30-secondintervals There are now 120 experiments so the maximum possible number of callshour is120
bull Attempt n Letrsquos assume that calls come in during 1nth of a minute For there to be anaverage of 30 callshour then p = 1(2n) so np = 05 is a constant Then the maximumpossible number of callshour is n
bull As n rarrinfin the number of subintervals goes to infinity at the same rate as the probability ofan arrival in any particular interval goes to zero
bull For finite k the probability of k arrivals is a Binomial probability(n
k
)pk(1minus p)nminusk
bull If we let np = α be a constant then for large n(n
k
)pk(1minus p)nminusk asymp 1
kαk(1minus α
n
)nminusk
bull Then
limnrarrinfin
1
kαk(1minus α
n
)nminusk
=αk
keminusα
which is the Poisson law
bull The Poisson law gives the probability for events that occur randomly in space or time
bull Examples are
ndash calls coming in to a switching center
ndash packets arriving at a queue in a network
ndash processes being submitted to a schedulerThe following examples are adopted from A First Course in Probability by Sheldon Ross
ndash of misprints on a group of pages in a book
ndash of people in a community that live to be 100 years old
ndash of wrong telephone numbers that are dialed in a day
ndash of α-particles discharged in a fixed period of time from some radioactive material
ndash of earthquakes per year
ndash of computer crashes in a lab in a week
The Poisson RV can be used to model all of these phenomena
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

  f_L(ℓ) = { [1/(ℓ√(2πσ²))] exp{−(ln ℓ − μ)²/(2σ²)},  ℓ ≥ 0
           { 0,  ℓ < 0.
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
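For checking your work, a sketch of the answers, writing W for the total wait time (so F_W(w) = 1 − e^{−w}) and conditioning on A = {W > 2}:

1. F_W(w | W > 2) = P[{W ≤ w} ∩ {W > 2}] / P[W > 2] = (e⁻² − e^{−w}) / e⁻² = 1 − e^{−(w−2)} for w > 2 (and 0 for w ≤ 2).

2. The remaining wait time R = W − 2 then satisfies P[R ≤ r | W > 2] = 1 − e^{−r}, r ≥ 0: exponential with λ = 1 again, illustrating the memoryless property of the exponential RV.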
TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, ..., Aₙ form a partition of Ω, then

  F_X(x) = F_X(x|A₁)P(A₁) + F_X(x|A₂)P(A₂) + ... + F_X(x|Aₙ)P(Aₙ)

  and

  f_X(x) = Σ_{i=1}^{n} f_X(x|Aᵢ)P(Aᵢ).
Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead,

  P(A|X = x) = lim_{Δx→0} P(A | x < X ≤ x + Δx)

             = lim_{Δx→0} [F_X(x + Δx|A) − F_X(x|A)] / [F_X(x + Δx) − F_X(x)] · P(A)

             = lim_{Δx→0} {[F_X(x + Δx|A) − F_X(x|A)]/Δx} / {[F_X(x + Δx) − F_X(x)]/Δx} · P(A)

             = [f_X(x|A) / f_X(x)] P(A),

  if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning

• Note that

  P(A|X = x) = [f_X(x|A) / f_X(x)] P(A)

  ⇒ P(A|X = x) f_X(x) = f_X(x|A) P(A)

  ⇒ ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx = ∫_{−∞}^{∞} f_X(x|A) dx · P(A)

  ⇒ P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx.

  (Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

  Continuous Version of Bayes' Theorem:

  f_X(x|A) = [P(A|X = x) / P(A)] f_X(x)

           = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt.
Examples on board
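A small MATLAB sketch of these two results (an illustrative setup, not from the notes): let X ~ uniform(0,1) be the bias of a coin and A = {the coin shows heads}, so P(A|X = x) = x; then P(A) = ∫₀¹ x dx = 1/2 and f_X(x|A) = 2x.

N = 100000;
x = rand(1,N);           % random coin bias, uniform on (0,1)
heads = rand(1,N) < x;   % flip each coin once
mean(heads)              % approx. 1/2 = P(A), the total-probability integral
hist(x(heads),50)        % approx. triangular shape, matching f_X(x|A) = 2x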
L5-7
• Again we have an average of 30 calls/hour. However, now calls can come in at 30-second intervals. There are now 120 experiments, so the maximum possible number of calls/hour is 120.

• Attempt n: Let's assume that calls come in during 1/nth of a minute. For there to be an average of 30 calls/hour, we need p = 1/(2n), so np = 0.5 is a constant. Then the maximum possible number of calls/minute is n.

• As n → ∞, the number of subintervals goes to infinity at the same rate as the probability of an arrival in any particular interval goes to zero.

• For finite k, the probability of k arrivals is a Binomial probability,

  (n choose k) p^k (1 − p)^{n−k}.

• If we let np = α be a constant, then for large n,

  (n choose k) p^k (1 − p)^{n−k} ≈ (1/k!) α^k (1 − α/n)^{n−k}.

• Then

  lim_{n→∞} (1/k!) α^k (1 − α/n)^{n−k} = (α^k / k!) e^{−α},

  which is the Poisson law.
• The Poisson law gives the probability for events that occur randomly in space or time.

• Examples are:

  – calls coming in to a switching center
  – packets arriving at a queue in a network
  – processes being submitted to a scheduler

  The following examples are adapted from A First Course in Probability by Sheldon Ross:

  – number of misprints on a group of pages in a book
  – number of people in a community that live to be 100 years old
  – number of wrong telephone numbers that are dialed in a day
  – number of α-particles discharged in a fixed period of time from some radioactive material
  – number of earthquakes per year
  – number of computer crashes in a lab in a week

• The Poisson RV can be used to model all of these phenomena.
L5-8
• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

  P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, ...
             { 0,  otherwise.
EX: Phone calls arriving at a switching office. If a switching center takes 90 calls/hr, what is the probability that it receives 20 calls in a 10-minute interval?
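A sketch of the computation in MATLAB (base functions only):

% alpha = lambda*t = (90 calls/hr)*(1/6 hr) = 15
alpha = 90*(10/60);
k = 20;
P = alpha^k*exp(-alpha)/factorial(k)   % approx. 0.0418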
• The Gaussian approximation can also be applied to Poisson probabilities (see pages 46 and 47 in the book for details).
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested in can be quantified, i.e., expressed as a numerical quantity. Some examples are signals or numbers of occurrences in some time period. Random variables can be used to represent random numerical quantities. Random variables are built upon axiomatic probability theory, and they form the basis for random processes, which are random phenomena that can vary over space or time.

Read Sections 2.1–2.3 in the textbook.

Try to answer the following question. The answers are shown so that you can check your work.
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z. (Ans: Ω_Z = {z | 0 ≤ z ≤ 2b})

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

   Ans:

   F_Z(z) = { 0,  z ≤ 0
            { z²/(2b²),  0 < z ≤ b
            { [b² − (2b − z)²/2] / b²,  b < z < 2b
            { 1,  z ≥ 2b.

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
RANDOM VARIABLES (RVs)

• What is a random variable?

• We define a random variable on a probability space (Ω, F, P) as a function from Ω to the real line R.
L6-2
EX
L6-3
• Recall that we cannot define probabilities for all possible subsets of a continuous sample space Ω.

• Thus, in defining a random variable X as a function on Ω, we must ensure that any region of X for which we wish to assign probability maps to an event in the event class F.

• We will only assign probabilities to Borel sets of the real line.

DEFN: A real random variable X(ω) defined on a probability space (Ω, F, P) is a real-valued function on Ω that satisfies the following:
(i) For every Borel set of real numbers B ∈ B, the set E_B ≜ {ξ ∈ Ω : X(ξ) ∈ B} is an event, and
(ii) P[X = −∞] = 0 and P[X = +∞] = 0.

• The Borel field on R contains all sets that can be formed from countable unions, intersections, and complements of sets of the form (−∞, x].

• Thus it is convenient to define a function that assigns probabilities to sets of this form.
DEFN: If (Ω, F, P) is a probability space with X(ω) a real RV on Ω, the cumulative distribution function (CDF) of X, denoted F_X(x), is

  F_X(x) = P[X ≤ x] = P({ω : X(ω) ≤ x}).

• F_X(x) defines a probability measure on the Borel sets of R.
• Properties of F_X:

1. 0 ≤ F_X(x) ≤ 1
L6-4
2. F_X(−∞) = 0 and F_X(∞) = 1

3. F_X(x) is monotonically nondecreasing, i.e., F_X(a) ≤ F_X(b) if a ≤ b.

4. P(a < X ≤ b) = F_X(b) − F_X(a)

5. F_X(x) is continuous on the right, i.e., F_X(b) = lim_{h→0⁺} F_X(b + h).

(The value at a jump discontinuity is the value after the jump.)

Pf: omitted

• If F_X(x) is a continuous function of x, then F_X(x) = F_X(x⁻).

• If F_X(x) is not a continuous function of x, then from above,

  F_X(x) − F_X(x⁻) = lim_{ε→0⁺} P[x − ε < X ≤ x] = P[X = x].
L6-5
EX Examples
L6-6
EX: A dart is thrown into a square with corners at (0, 0), (0, b), (b, 0), and (b, b). Assume that the dart is equally likely to fall anywhere in the square. Let the random variable Z be given by the sum of the two coordinates of the point where the dart lands.

1. Describe the sample space of Z.

2. Find the region in the square corresponding to the event {Z ≤ z} for −∞ < z < ∞.

3. Find and plot F_Z(z).

4. Specify the type (discrete, continuous, mixed) of Z. (continuous)
L7-1
EEL 5544 Lecture 7
Motivation

Continuous random variables can be characterized through either their distribution or density functions (when the density exists).

Discrete random variables can be characterized through their distribution and probability mass functions.

The greatest benefits from working with the density function come when we start to work with expected values of random variables, functions of random variables, and multiple random variables.

Read Sections 2.3–2.5 in the textbook.

MATLAB makes an ideal platform for experimenting with random variables (although its random number generator is deficient for some types of simulation).
EX: Try the following in MATLAB.

Make one million uniformly distributed random variables (don't forget the semicolon at the end, or you'll be waiting a long time as it displays all one million values):

u=rand(1,1000000);

Check that the density is uniform by plotting a histogram:

hist(u,100)

Transform those random variables into another type of random variable through a function (again, don't forget the semicolon):

v=-log(u);

Check the density by plotting a histogram:

hist(v,100)

Compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
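(Ans: P[V ≤ v] = P[−ln U ≤ v] = P[U ≥ e^{−v}] = 1 − e^{−v} for v ≥ 0, so v is an exponential random variable with λ = 1; this is the inverse-CDF method for generating random variables.)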
L7-2
Now let's do one more transformation:

w=sqrt(v);
hist(w,100)

Again, compare the histogram to the densities of common random variables shown in the book and the notes. What type of random variable was created?
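(Ans: P[W ≤ w] = P[V ≤ w²] = 1 − e^{−w²} for w ≥ 0, which is the Rayleigh CDF with 2σ² = 1; i.e., W is a Rayleigh random variable with σ² = 1/2.)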
This exercise demonstrates how MATLAB can be used to generate random variables, estimate their densities through the histogram function, and transform random variables through functions.
PROBABILITY DENSITY FUNCTION

DEFN: The probability density function (pdf) f_X(x) of a random variable X is the derivative (which may not exist at some places) of the distribution function:

  f_X(x) = d F_X(x) / dx.

• Properties:

1. f_X(x) ≥ 0, −∞ < x < ∞

2. F_X(x) = ∫_{−∞}^{x} f_X(t) dt

3. ∫_{−∞}^{+∞} f_X(x) dx = 1
L7-3
4. P(a < X ≤ b) = ∫_a^b f_X(x) dx, a ≤ b

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral

   ∫_{−∞}^{+∞} g(x) dx = c,  0 < c < +∞,

   then f_X(x) = g(x)/c is a valid pdf.

Pf: omitted

• Note that if f_X(x) exists at x, then F_X(x) is continuous at x, and thus P[X = x] = F_X(x) − F_X(x⁻) = 0.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.

DEFN: If F_X(x) is continuous for every x and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where F_X(x) is continuous but F′_X(x) is discontinuous, any positive number can be assigned to f_X(x); in this way f_X(x) is assigned a value at every point x when X is a continuous random variable.
Uniform Continuous Random Variables

DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by

  f_X(x) = { 0,  x < a
           { 1/(b − a),  a ≤ x ≤ b
           { 0,  x > b.
L7-4
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION

DEFN: For a discrete RV, the probability mass function (pmf) is

  p_X(x) = P(X = x).

EX: Roll a fair 6-sided die. X = number on top face.

  P(X = x) = { 1/6,  x = 1, 2, ..., 6
             { 0,  otherwise.

EX: Flip a fair coin until heads occurs. X = number of flips.

  P(X = x) = { (1/2)^x,  x = 1, 2, ...
             { 0,  otherwise.
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ F is considered a "success".

• A Bernoulli RV X is defined by

  X = { 1,  s ∈ A
      { 0,  s ∉ A.

• The pmf for a Bernoulli RV X can be found formally as

  P(X = 1) = P({s : X(s) = 1}) = P({s | s ∈ A}) = P(A) ≜ p.

So

  P(X = x) = { p,  x = 1
             { 1 − p,  x = 0
             { 0,  x ≠ 0, 1.
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = number of successes. Then the pmf of X is given by

  P[X = k] = { (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n
             { 0,  otherwise.
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf]

• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf]
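A short MATLAB sketch for reproducing such pmf plots with base functions (the parameters are illustrative):

n = 10; p = 0.2; k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n,kk), k).*p.^k.*(1-p).^(n-k);
stem(k,pmf)   % Binomial(10, 0.2) pmf, as in the first figure above
sum(pmf)      % sanity check: equals 1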
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by

  P[X = k] = { (1 − p)^{k−1} p,  k = 1, 2, ...
             { 0,  otherwise.
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF and Geometric(0.7) CDF]
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average number of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the number of events in time (or space) t.

• Then

  P(N = k) = { (α^k / k!) e^{−α},  k = 0, 1, ...
             { 0,  otherwise.
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf]
IMPORTANT CONTINUOUS RVS

1. Uniform RV: covered in example above.

2. Exponential RV

• Characterized by a single parameter λ.
L7-10
• Density function:

  f_X(x) = { 0,  x < 0
           { λe^{−λx},  x ≥ 0.

• Distribution function:

  F_X(x) = { 0,  x < 0
           { 1 − e^{−λx},  x ≥ 0.

• The book's definition substitutes λ = 1/μ.

• Obtainable as a limit of geometric RVs: with a Bernoulli trial every 1/n units of time and success probability p = λ/n, the waiting time X satisfies P[X > x] = (1 − λ/n)^{⌊nx⌋} → e^{−λx} as n → ∞.
• The pdf and CDF for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs and CDFs for λ = 0.25, 1, 4]
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (more precisely, if npq is large), then with q = 1 − p,

  (n choose k) p^k q^{n−k} ≈ [1/√(2πnpq)] exp{−(k − np)²/(2npq)}.

• Let σ² = npq and μ = np; then

  P(k successes on n Bernoulli trials) = (n choose k) p^k q^{n−k} ≈ [1/√(2πσ²)] exp{−(k − μ)²/(2σ²)}.
L7-11
DEFN: A Gaussian random variable X has density function

  f_X(x) = [1/√(2πσ²)] exp{−(x − μ)²/(2σ²)},

  with parameters (mean) μ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).

• The CDF of a Gaussian RV is given by

  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} [1/√(2πσ²)] exp{−(t − μ)²/(2σ²)} dt,

  which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), and (μ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.

  – The CDF for the normalized Gaussian RV is defined as

    Φ(x) = ∫_{−∞}^{x} [1/√(2π)] exp{−t²/2} dt.

  – Engineers more commonly use the complementary distribution function, or Q-function, defined by

    Q(x) = ∫_{x}^{∞} [1/√(2π)] exp{−t²/2} dt.
L7-12
  – Note that Q(x) = 1 − Φ(x).

  – I will be supplying you with a Q-function table and a list of approximations to Q(x).

  – The Q-function can also be defined as

    Q(x) = (1/π) ∫₀^{π/2} exp{−x²/(2 sin²φ)} dφ,  x ≥ 0.

    (This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
• The CDF for a Gaussian RV with mean μ and variance σ² is

  F_X(x) = Φ((x − μ)/σ) = 1 − Q((x − μ)/σ).

  Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.

• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as

  P(a < X ≤ b) = P(X > a) − P(X > b) = Q((a − μ)/σ) − Q((b − μ)/σ).
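A numerical sketch in base MATLAB (no toolboxes), using the identity Q(x) = (1/2)erfc(x/√2); the parameter values are illustrative:

Q = @(x) 0.5*erfc(x/sqrt(2));           % Q-function via the complementary error function
mu = 5; sigma = 1;
P = Q((4-mu)/sigma) - Q((6-mu)/sigma)   % P(4 < X <= 6) = 1 - 2Q(1), approx. 0.683
2*Q(1:5)                                % two-sided tails P[|X - mu| > k*sigma] from the Lecture 8 example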
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L5-8
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
EX Phone calls arriving at a switching officeIf a switching center takes 90 callshr what is the prob that it receives 20 calls in a 10 minuteinterval
bull The Gaussian approximation can also be applied to Poisson probabilities ndash see pages 46 and47 in the book for details
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L6-1
EEL 5544 Lecture 6
Motivation
Many phenomena that engineers are interested can be quantified ie expressed as a numericalquantity Some examples are signals or numbers of occurrences in some time period Randomvariables can be used to represent random numerical quantities Random variables can be builtupon axiomatic probability theory and form the basis for random processes which are randomphenomena that can vary over space or time
Read Sections 21ndash23 in the textbook
Try to answer the following question The answers are shown so that you can check your work
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z (Ans ΩZ = z|0 le z le 2b)
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
Ans
FZ(z) =
0 z le 0z2
2b2 0 lt z le b
b2minus (2bminusz)2
2
b2 b lt z lt 2b
1 z ge 2b
4 Specify the type (discrete continuous mixed) of Y (continuous)
RANDOM VARIABLES (RVS)
bull What is a random variable
bull We define a random variable is defined on a probability space (ΩF P ) as afrom to
L6-2
EX
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
• The density of a central chi-square random variable is given by

f_Y(y) = \begin{cases} \frac{1}{\sigma^n 2^{n/2} \Gamma(n/2)} y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

where Γ(p) is the gamma function, defined as

\Gamma(p) = \int_{0}^{\infty} t^{p-1} e^{-t} \, dt, \quad p > 0

\Gamma(p) = (p-1)!, \quad p \text{ an integer} > 0

\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \tfrac{1}{2}\sqrt{\pi}
• Note that for n = 2,

f_Y(y) = \begin{cases} \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0 \end{cases}

which is an exponential density.

• Thus the sum of the squares of two s.i. zero-mean Gaussian RVs with equal variance is exponential.

• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1]
• The CDF for a central chi-square RV with an even number of degrees of freedom n = 2m can be found through repeated integration by parts to be

F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^k, & y \ge 0 \\ 0, & y < 0 \end{cases}

(A Monte Carlo check of this closed form appears after this list.)
• The non-central chi-square random variable has more complicated pdfs and CDFs, and its CDF cannot be written in closed form.
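The central chi-square facts above are easy to verify by simulation; the sketch below (with σ² = 1 and n = 2m = 4 as arbitrary choices) checks both the n = 2 exponential connection and the closed-form CDF.

% Monte Carlo checks on central chi-square RVs
sigma2 = 1; N = 1e6;
% (1) n = 2: sum of two squared zero-mean Gaussians is exponential
Y2 = sigma2 * sum(randn(2, N).^2, 1);
mean(Y2)                          % ~ 2*sigma2, the mean of the exponential density above
% (2) n = 2m = 4: closed-form CDF from repeated integration by parts
m = 2; Y4 = sigma2 * sum(randn(2*m, N).^2, 1);
y = 3; k = 0:m-1;
emp = mean(Y4 <= y)               % empirical CDF at y
cf  = 1 - exp(-y/(2*sigma2)) * sum((y/(2*sigma2)).^k ./ factorial(k))
% emp and cf should both be approximately 0.4422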
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf

f_R(r) = \begin{cases} \frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1]
• The CDF for a Rayleigh RV is

F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0 \end{cases}
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the magnitude of the waveform is a Rayleigh random variable.
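This connection is easy to see numerically: take the magnitude of a complex Gaussian with independent N(0, σ²) real and imaginary parts and compare its histogram to the Rayleigh pdf. (A sketch with σ² = 1; the scaling turns the histogram into a density estimate.)

sigma2 = 1; N = 1e6;
g = sqrt(sigma2) * (randn(1, N) + 1j*randn(1, N));      % complex Gaussian samples
r = abs(g);                                             % Rayleigh samples
[counts, centers] = hist(r, 100);
bar(centers, counts / (N * (centers(2) - centers(1))))  % histogram scaled to a density
hold on
rr = linspace(0, 5, 200);
plot(rr, (rr/sigma2) .* exp(-rr.^2/(2*sigma2)))         % Rayleigh pdf overlay
hold off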
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by

f_L(\ell) = \begin{cases} \frac{1}{\ell \sqrt{2\pi\sigma^2}} \exp\left( \frac{-(\ln \ell - \mu)^2}{2\sigma^2} \right), & \ell \ge 0 \\ 0, & \ell < 0 \end{cases}
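Lognormal samples follow directly from the definition L = e^X with X Gaussian; a minimal sketch (with arbitrary µ and σ) is below.

mu = 0; sigma = 0.5; N = 1e6;   % arbitrary example parameters
X = mu + sigma * randn(1, N);   % Gaussian in the log domain (e.g., loss in dB)
L = exp(X);                     % lognormal samples
hist(L, 100)                    % heavy right tail, support on [0, inf)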
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES

• Given (Ω, F, P) and X: Ω → R (a RV on Ω).

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is

F_X(x|A) = \frac{P(\{X \le x\} \cap A)}{P(A)}

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is

f_X(x|A) = \frac{d}{dx} F_X(x|A)

• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
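Both conditional distributions are easy to estimate by simulation, and the comparison illustrates the memoryless property of the exponential; the sketch below uses λ = 1 as in the example (the sample count is arbitrary).

lambda = 1; N = 1e6;
T = -log(rand(1, N)) / lambda;   % exponential wait times (inverse-transform sampling)
cond = T(T > 2);                 % keep only trials where you waited more than 2 min
Ftotal  = mean(cond <= 3)        % P(T <= 3 | T > 2) = 1 - exp(-1), about 0.632
Fremain = mean(cond - 2 <= 1)    % remaining wait: same value, 1 - exp(-1)
% The remaining wait given T > 2 is again exponential(1): memorylessness.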
TOTAL PROBABILITY AND BAYES' THEOREM

• If A₁, A₂, …, Aₙ form a partition of Ω, then

F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)

and

f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i)
Point Conditioning

• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work. Instead, condition on the event {x < X ≤ x + ∆x}, apply Bayes' rule, and let ∆x → 0:

P(A|X=x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)

= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)} \, P(A)

= \lim_{\Delta x \to 0} \frac{\left[ F_X(x + \Delta x \mid A) - F_X(x \mid A) \right] / \Delta x}{\left[ F_X(x + \Delta x) - F_X(x) \right] / \Delta x} \, P(A)

= \frac{f_X(x|A)}{f_X(x)} \, P(A)

if f_X(x|A) and f_X(x) exist.
L8-10
Implication of Point Conditioning

• Note that

P(A|X=x) = \frac{f_X(x|A)}{f_X(x)} \, P(A)

\Rightarrow P(A|X=x) f_X(x) = f_X(x|A) P(A)

\Rightarrow \int_{-\infty}^{\infty} P(A|X=x) f_X(x) \, dx = \int_{-\infty}^{\infty} f_X(x|A) \, dx \; P(A)

\Rightarrow P(A) = \int_{-\infty}^{\infty} P(A|X=x) f_X(x) \, dx

since a conditional density integrates to 1. (Continuous Version of the Law of Total Probability)

• Once we have the Law of Total Probability, it is easy to derive the

Continuous Version of Bayes' Theorem:

f_X(x|A) = \frac{P(A|X=x)}{P(A)} f_X(x) = \frac{P(A|X=x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X=t) f_X(t) \, dt}
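The continuous law of total probability is also easy to sanity-check by simulation; the sketch below uses an arbitrary toy model (X uniform on [0, 1] and P(A|X = x) = x², so that P(A) = ∫₀¹ x² dx = 1/3).

N = 1e6;
x = rand(1, N);           % X ~ Uniform(0,1), so f_X(x) = 1 on [0,1]
A = rand(1, N) < x.^2;    % realize event A with conditional probability x^2
PA_sim = mean(A)          % Monte Carlo estimate, approximately 1/3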
Examples on board
L6-3
bull Recall that we cannot define probabilities for all possible subsets of a continuous samplespace Ω
bull Thus in defining a random variable X as a function on Ω we must ensure that any region ofX for which we wish to assign probability must map to an event in the event class F
bull We will only assign probabilities to Borel sets of the real line
DEFN
A real random variable X(ω) defined on a probability space (ΩF P ) is areal-valued function on Ω that satisfies the following(i) For every Borel set of real numbers B isin B the set EB
∆= ξ isin Ω X(ξ) isin
B is an event and(ii) P [X = minusinfin] = 0 and P [X = +infin] = 0
bull The Borel field onR contains all sets that can be formed from countable unions intersectionsand complements of sets of the form x|x isin (minusinfin x]
bull Thus it is convenient to define a function that assigns probabilities to sets of this form
DEFN
If (ΩF P ) is a prob space with X(ω) a real RV on Ω the
( ) denoted is
bull FX(x) is a prob measure
bull Properties of FX
1 0 le FX(x) le 1
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L6-4
2 FX(minusinfin) = 0 and FX(infin) = 1
3 FX(x) is monotonically nondecreasingie FX(a) le FX(b) iff a le b
4 P (a lt X le b) = FX(b)minus FX(a)
5 FX(x) is continuous on the rightie FX(b) = limhrarr0 FX(b + h) = FX(b)
(The value at a jump discontinuity is the value after the jump)
Pf omitted
bull If FX(x) is continuous function of x thenF (x) = F (xminus)
bull If FX(x) is not a continuous function of x then from above
FX(x)minus FX(xminus) = P [xminus lt X le x]
= limεrarr0
P [xminus ε lt X le x]
= P [X = x]
L6-5
EX Examples
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L6-6
EX
A dart is thrown into a square with corners at (0 0) (0 b) (b 0) and (b b)Assume that the dart is equally likely to fall anywhere in the square Let therandom variable Z be given by the sum of the two coordinates of the pointwhere the dart lands
1 Describe the sample space of Z
2 Find the region in the square corresponding to the event Z le z for minusinfin lt z lt infin
3 Find and plot FZ(z)
4 Specify the type (discrete continuous mixed) of Y (continuous)
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-1
EEL 5544 Lecture 7
Motivation
Continuous random variables can be characterized through either their distribution or densityfunctions (when the density exists)
Discrete random variables can be characterized through their distribution and probability massfunctions
The greatest benefits from working with the density function come when we start to workwith expected values of random variables functions of random variables and multiple randomvariables
Read Sections 23ndash25 in the textbook
MATLAB makes an ideal platform for experimenting with random variables (although itsrandom number generator is deficient for some types of simulation)
EX Try the following in MATLAB
Make one million uniformly distributed random variables (donrsquot forget the semicolon at the endor yoursquoll be waiting a long time as it displays all one million values)u=rand(11000000)
Check that the density is uniform by plotting a histogramhist(u100)
Transform those random variables into another type of random variable through a function(again donrsquot forget the semicolon)v=-log(u)
Check the density by plotting a histogramhist(u100)
Compare the histogram to the densities of common random variables shown in the book and thenotes What type of random variable was created
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-2
Now letrsquos do one more transformationw=sqrt(v)
Again compare the histogram to the densities of common random variables shown in the bookand the notes What type of random variable was created
This exercise demonstrates the use of MATLAB to generate random variables estimate theirdensities through the histogram function and how random variables can be transformed throughfunctions
PROBABILITY DENSITY FUNCTION
DEFN
The ( ) fX(x) of arandom variable X is the derivative (which may not exist at some places) of
bull Properties
1 fX(x) ge 0minusinfin lt x lt infin
2 FX(x) =int x
minusinfin fX(t)dt
3int +infinminusinfin fX(x)dx = 1
L7-3
4 P (a lt X le b) =int b
afX(x)dx a le b
5 If g(x) is a nonnegative piecewise continuous function with finite integralint +infin
minusinfing(x)dx = cminusinfin lt c lt +infin
then fX(x) = g(x)c
is a valid pdf
Pf omitted
bull Note that if f(x) exists at x then FX(x) is continuous at x and thus P [X = x] = F (x) minusF (xminus) = 0
bull Recall that this does not mean that x never occurs but that the occurrence is extremelyunlikely
DEFN
If FX(x) is continuous for every x and its derivative existseverywhere except at a countable set of points then X is a
bull Where FX(x) is continuous but F primeX(x) is discontinuous any positive number can be assigned
to fX(x) and thus fX(x) will be assigned to every point of fX(x) when X is a continuousrandom variable
Uniform Continuous Random Variables
DEFN
For a uniform RV on an interval [a b] any two subintervals of [a b] that havethe same length will have equal probabilities The density function is given by
fX(x) =
0 x lt a
1bminusa
a le x le b
0 x gt b
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
• The pdfs for chi-square RVs with 1, 2, 4, or 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x from 0 to 14, $f_X(x)$ from 0 to 1.]
• For an even number of degrees of freedom n = 2m, the CDF for a central chi-square RV can be found through repeated integration by parts to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^{k}, & y \ge 0 \\[4pt] 0, & y < 0. \end{cases}$$
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
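A quick Monte Carlo cross-check of (18) and the even-n CDF above, written by me for these notes and assuming n = 4 and σ = 1 (so m = 2):

```matlab
% Simulate Y = sum of squares of n = 4 s.i. zero-mean Gaussians (sigma = 1)
% and compare the empirical CDF with the closed-form central chi-square CDF.
N = 1e5;  n = 4;  sigma = 1;  m = n/2;
Y = sum((sigma * randn(n, N)).^2, 1);
y = 0:0.5:15;
k = (0:m-1)';                                % column, for implicit expansion
Fclosed = 1 - exp(-y/(2*sigma^2)) .* ...
          sum((y/(2*sigma^2)).^k ./ factorial(k), 1);
Femp = mean(Y(:) <= y, 1);                   % empirical CDF at each y
disp(max(abs(Femp - Fclosed)));              % small (Monte Carlo error ~1e-3)
```

(The elementwise comparisons rely on implicit expansion, MATLAB R2016b or later.)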
6. Rayleigh RV

• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let $R = \sqrt{Y}$.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\[4pt] 0, & r < 0. \end{cases}$$
L8-7
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; r from 0 to 5, $f_R(r)$ from 0 to 0.7.]
• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$$
• The amplitude of a radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude of the waveform is a Rayleigh random variable.
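That construction is easy to verify numerically; a sketch of mine, assuming σ = 1:

```matlab
% Rayleigh amplitude from a zero-mean complex Gaussian (sigma = 1).
N = 1e5;  sigma = 1;
Z = sigma * (randn(N,1) + 1j*randn(N,1));  % independent N(0,sigma^2) parts
R = abs(Z);                                % amplitude of the complex sample
r = 0:0.25:5;
Femp = mean(R <= r, 1);                    % empirical CDF
F = 1 - exp(-r.^2 / (2*sigma^2));          % Rayleigh CDF from above
disp(max(abs(Femp - F)));                  % small Monte Carlo error
```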
7. Lognormal RV

• Consider a random variable L such that X = ln L is a Gaussian RV.

• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.

• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{(\ln \ell - \mu)^2}{2\sigma^2} \right\}, & \ell \ge 0 \\[4pt] 0, & \ell < 0. \end{cases}$$
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → ℝ (a RV on Ω):

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.$$

DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x \mid A) = \frac{d}{dx} F_X(x \mid A).$$

• Then $F_X(x \mid A)$ is a distribution function, and $f_X(x \mid A)$ is a density function.
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
L8-9
2. the conditional distribution for your remaining wait time?
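These are worked on the board in the original notes; a sketch of how part 1 goes with the definitions above (part 2 follows from the same algebra and exposes the memoryless property):

```latex
% Condition on A = {X > 2} for X ~ Exponential(lambda = 1):
\begin{align*}
F_X(x \mid X > 2)
  &= \frac{P(\{X \le x\} \cap \{X > 2\})}{P(X > 2)}
   = \frac{F_X(x) - F_X(2)}{1 - F_X(2)}
   = \frac{e^{-2} - e^{-x}}{e^{-2}} \\
  &= 1 - e^{-(x-2)}, \qquad x \ge 2
     \quad (\text{and } F_X(x \mid X > 2) = 0 \text{ for } x < 2).
\end{align*}
% Remaining wait T = X - 2 given X > 2:
% P(T <= t | X > 2) = 1 - e^{-t}, i.e., Exponential(1) again.
```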
TOTAL PROBABILITY AND BAYES' THEOREM
• If $A_1, A_2, \ldots, A_n$ form a partition of Ω, then
$$F_X(x) = F_X(x \mid A_1)P(A_1) + F_X(x \mid A_2)P(A_2) + \cdots + F_X(x \mid A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i)\, P(A_i).$$
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.

• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:
$$\begin{aligned}
P(A \mid X = x) &= \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x) \\
&= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A) \\
&= \lim_{\Delta x \to 0} \frac{\bigl[F_X(x + \Delta x \mid A) - F_X(x \mid A)\bigr]/\Delta x}{\bigl[F_X(x + \Delta x) - F_X(x)\bigr]/\Delta x}\, P(A) \\
&= \frac{f_X(x \mid A)}{f_X(x)}\, P(A),
\end{aligned}$$
if $f_X(x \mid A)$ and $f_X(x)$ exist.
L8-10
Implication of Point Conditioning
• Note that
$$P(A \mid X = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)$$
$$\Rightarrow\; P(A \mid X = x)\, f_X(x) = f_X(x \mid A)\, P(A)$$
$$\Rightarrow\; \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x \mid A)\, dx \; P(A)$$
$$\Rightarrow\; P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx.$$
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\, dt}.$$
Examples on board
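One concrete example of these two results (my illustration, not from the notes): let a coin's bias X be Uniform(0,1) and let A = {heads on a single flip}, so P(A | X = x) = x. Total probability gives $P(A) = \int_0^1 x\, dx = 1/2$, and Bayes' theorem gives $f_X(x \mid A) = 2x$. A Monte Carlo check in base MATLAB:

```matlab
% Continuous total probability and Bayes' theorem for a random coin bias.
N = 1e6;
X = rand(N, 1);                % bias X ~ Uniform(0,1)
A = rand(N, 1) < X;            % one flip: heads with probability X
disp(mean(A));                 % ~0.5, matching P(A) = 1/2
histogram(X(A), 'Normalization', 'pdf');  hold on;   % empirical f(x|A)
x = 0:0.01:1;  plot(x, 2*x, 'LineWidth', 2);         % predicted 2x
hold off;
```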
L7-3
4. $P(a < X \le b) = \int_a^b f_X(x)\, dx$, $a \le b$.

5. If g(x) is a nonnegative, piecewise-continuous function with finite integral
$$\int_{-\infty}^{+\infty} g(x)\, dx = c, \qquad 0 < c < +\infty,$$
then $f_X(x) = g(x)/c$ is a valid pdf.
Pf omitted
• Note that if f(x) exists at x, then $F_X(x)$ is continuous at x, and thus $P[X = x] = F_X(x) - F_X(x^-) = 0$.

• Recall that this does not mean that x never occurs, but that the occurrence is extremely unlikely.
DEFN: If $F_X(x)$ is continuous for every x, and its derivative exists everywhere except at a countable set of points, then X is a continuous random variable.

• Where $F_X(x)$ is continuous but $F'_X(x)$ is discontinuous, any positive number can be assigned to $f_X(x)$; in this way a value of $f_X(x)$ can be assigned at every point when X is a continuous random variable.
Uniform Continuous Random Variables
DEFN: For a uniform RV on an interval [a, b], any two subintervals of [a, b] that have the same length will have equal probabilities. The density function is given by
$$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x > b. \end{cases}$$
L7-4
DEFN: A discrete random variable has probability concentrated at a countable number of values. It has a staircase type of distribution function.
PROBABILITY MASS FUNCTION
DEFN: For a discrete RV, the probability mass function (pmf) is $P_X(x) = P(X = x)$.
EX: Roll a fair 6-sided die. X = # on top face.
$$P(X = x) = \begin{cases} 1/6, & x = 1, 2, \ldots, 6 \\ 0, & \text{o.w.} \end{cases}$$

EX: Flip a fair coin until heads occurs. X = # of flips.
$$P(X = x) = \begin{cases} \left(\tfrac{1}{2}\right)^{x}, & x = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1. Bernoulli RV

• An event A ∈ 𝒜 is considered a "success."

• A Bernoulli RV X is defined by
$$X = \begin{cases} 1, & s \in A \\ 0, & s \notin A. \end{cases}$$

• The pmf for a Bernoulli RV X can be found formally as
$$P(X = 1) = P\bigl(\{s : X(s) = 1\}\bigr) = P\bigl(\{s : s \in A\}\bigr) = P(A) \triangleq p.$$
So
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & x \ne 0, 1. \end{cases}$$
2. Binomial RV

• A Binomial RV represents the number of successes on n independent Bernoulli trials.

• Thus a Binomial RV can also be defined as the sum of n independent Bernoulli RVs.

• Let X = # of successes. Then the pmf of X is given by
$$P[X = k] = \begin{cases} \dbinom{n}{k} p^{k} (1-p)^{n-k}, & k = 0, 1, \ldots, n \\ 0, & \text{o.w.} \end{cases}$$
(A short numerical check appears at the end of this subsection.)
L7-6
• The pmfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) pmf and Binomial(10, 0.5) pmf; x = 0, 1, …, 10, vertical axis P(X = x).]
• The cdfs for two Binomial RVs with n = 10 and p = 0.2 or p = 0.5 are shown below.

[Figure: Binomial(10, 0.2) CDF and Binomial(10, 0.5) CDF; x = 0, 1, …, 10, vertical axis $F_X(x)$ from 0 to 1.]
L7-7
• When n is large, the Binomial pmf has a bell shape. For example, the pmf below is for n = 100 and p = 0.2.

[Figure: Binomial(100, 0.2) pmf; x from 0 to 100, P(X = x) up to about 0.1.]
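The check promised above, using only base MATLAB (nchoosek is exact for these small arguments):

```matlab
% Binomial(n, p) pmf from the formula; verify it sums to 1.
n = 10;  p = 0.2;
k = 0:n;
pmf = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
disp(sum(pmf));                          % 1.0000
stem(k, pmf);  xlabel('x');  ylabel('P(X = x)');
```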
3. Geometric RV

• A Geometric RV occurs when independent Bernoulli trials are conducted until the first success.

• X = number of trials required. Then the pmf of X is given by
$$P[X = k] = \begin{cases} (1-p)^{k-1}\, p, & k = 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$
(A simulation sketch appears at the end of this subsection.)
• The pmfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) pmf and Geometric(0.7) pmf; x = 1, 2, …, 11, vertical axis P(X = x).]
L7-8
• The cdfs for two Geometric RVs with p = 0.1 or p = 0.7 are shown below.

[Figure: Geometric(0.1) CDF (x from 0 to 45) and Geometric(0.7) CDF (x = 1, …, 11); vertical axis $F_X(x)$.]
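The promised sketch (mine, not the notes'): estimate the Geometric pmf by running Bernoulli trials until the first success and compare against $(1-p)^{k-1}p$.

```matlab
% Simulate Geometric(p): count Bernoulli(p) trials until the first success.
p = 0.7;  N = 1e5;
X = zeros(N, 1);
for i = 1:N
    k = 1;
    while rand() >= p        % trial fails with probability 1-p
        k = k + 1;
    end
    X(i) = k;
end
k = 1:8;
emp = mean(X == k, 1);                 % empirical pmf (implicit expansion)
disp([emp; (1-p).^(k-1) * p]);         % rows agree to Monte Carlo error
```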
4. Poisson RV

• Models events that occur randomly in space or time.

• Let λ = the average # of events per unit of space or time.

• Consider observing some period of time or space of length t, and let α = λt.

• Let N = the # of events in time (or space) t.

• Then
$$P(N = k) = \begin{cases} \dfrac{\alpha^{k}}{k!}\, e^{-\alpha}, & k = 0, 1, \ldots \\ 0, & \text{o.w.} \end{cases}$$
(A numerical sketch connecting this pmf to the Binomial appears at the end of this subsection.)
• The pmfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) pmf and Poisson(3) pmf; x = 0, 1, …, 10, vertical axis P(X = x).]
L7-9
• The cdfs for two Poisson RVs with α = 0.75 or α = 3 are shown below.

[Figure: Poisson(0.75) CDF and Poisson(3) CDF; x = 0, 1, …, 10, vertical axis $F_X(x)$ from 0 to 1.]
• For large α, the Poisson pmf has a bell shape. For example, the pmf below is for a Poisson RV with α = 20.

[Figure: Poisson(20) pmf; x from 0 to 100, P(X = x) up to about 0.09.]
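The promised connection: Poisson(α) arises as the limit of Binomial(n, α/n) as n grows. A small check of mine in base MATLAB, with α = 3 and n = 100 chosen for illustration:

```matlab
% Poisson(alpha) pmf vs. Binomial(n, alpha/n) pmf for moderately large n.
alpha = 3;  n = 100;  p = alpha/n;
k = 0:10;
poiss = alpha.^k ./ factorial(k) * exp(-alpha);
binom = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
disp(max(abs(poiss - binom)));     % small: the two pmfs nearly coincide
```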
IMPORTANT CONTINUOUS RVS
1. Uniform RV (covered in example)

2. Exponential RV

• Characterized by a single parameter λ.
L7-10
• Density function:
$$f_X(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0. \end{cases}$$

• Distribution function:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0. \end{cases}$$

• The book's definition substitutes λ = 1/µ.

• Obtainable as the limit of a geometric RV (see the sketch after this list).
• The pdfs and CDFs for exponential random variables with λ equal to 0.25, 1, or 4 are shown below.

[Figure: Exponential pdfs ($f_X(x)$ up to 4) and CDFs for λ = 0.25, 1, 4; x from 0 to 10.]
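The geometric-limit sketch referenced above (my construction, not from the notes): if a Bernoulli trial occurs every Δ seconds with success probability p = λΔ, the scaled waiting time of a Geometric RV converges to Exponential(λ) as Δ → 0.

```matlab
% Exponential(lambda) as the limit of a scaled Geometric RV.
lambda = 1;  dt = 1e-3;  p = lambda * dt;    % success probability per slot
N = 1e5;
K = ceil(log(rand(N,1)) / log(1 - p));       % Geometric(p) via inverse CDF
T = K * dt;                                  % waiting time in seconds
t = 0:0.5:5;
disp([mean(T(:) <= t, 1); 1 - exp(-lambda*t)]);  % empirical vs. exact CDF
```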
3. Gaussian RV

• For large n, the binomial and Poisson RVs have a bell shape.

• Consider the binomial.

• DeMoivre-Laplace Theorem: If n is large (so that npq is large), then with q = 1 − p,
$$\binom{n}{k} p^{k} q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} \exp\left\{ -\frac{(k - np)^2}{2npq} \right\}.$$
• Let σ² = npq and µ = np; then
$$P(k \text{ successes on } n \text{ Bernoulli trials}) = \binom{n}{k} p^{k} q^{n-k} \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(k - \mu)^2}{2\sigma^2} \right\}.$$
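A numerical look at the quality of this approximation (my sketch, with n = 100 and p = 0.2, so µ = 20 and σ² = 16); gammaln keeps the binomial coefficient accurate where nchoosek would lose precision at this size:

```matlab
% DeMoivre-Laplace: Binomial(n,p) pmf vs. Gaussian density at the integers.
n = 100;  p = 0.2;  q = 1 - p;
mu = n*p;  s2 = n*p*q;
k = 0:n;
logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
         + k*log(p) + (n-k)*log(q);          % log binomial pmf
binom = exp(logpmf);
gauss = exp(-(k - mu).^2/(2*s2)) / sqrt(2*pi*s2);
disp(max(abs(binom - gauss)));               % small, even near the peak
stem(k, binom);  hold on;  plot(k, gauss);  hold off;
```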
L7-11
DEFN: A Gaussian random variable X has density function
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$
with parameters (mean) µ and (variance) σ² ≥ 0.

• In fact, the sum of a large number of almost any type of independent random variables converges (in distribution) to a Gaussian random variable (Central Limit Theorem).
• The CDF of a Gaussian RV is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(t - \mu)^2}{2\sigma^2} \right\} dt,$$
which cannot be evaluated in closed form.

• The pdfs and CDFs for Gaussian random variables with (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25) are shown below.
[Figure: Gaussian pdfs and cdfs for (µ = 1.5, σ² = 0.25), (µ = 5, σ² = 1), and (µ = 10, σ² = 25); x from 0 to 20.]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with µ = 0 and σ² = 1.

– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{t^2}{2} \right\} dt.$$

– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{t^2}{2} \right\} dt.$$
L7-12
– Note that Q(x) = 1 − Φ(x).

– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{ -\frac{x^2}{2\sin^2\phi} \right\} d\phi.$$
(This is a fairly recent result that is in very few textbooks. This form has a finite range of integration that is often easier to work with.)
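A numerical check of this finite-range form against the erfc-based evaluation Q(x) = ½ erfc(x/√2), added by me; integral is base MATLAB's adaptive quadrature:

```matlab
% Verify the finite-range form of Q(x) against 0.5*erfc(x/sqrt(2)).
x = 0:0.5:4;
Qfinite = zeros(size(x));
for i = 1:numel(x)
    f = @(phi) exp(-x(i)^2 ./ (2*sin(phi).^2)) / pi;
    Qfinite(i) = integral(f, 0, pi/2);
end
Qerfc = 0.5 * erfc(x / sqrt(2));
disp(max(abs(Qfinite - Qerfc)));   % tiny, limited by quadrature tolerance
```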
• The CDF for a Gaussian RV with mean µ and variance σ² is
$$F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) = 1 - Q\left( \frac{x - \mu}{\sigma} \right).$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working tests.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability:
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left( \frac{a - \mu}{\sigma} \right) - Q\left( \frac{b - \mu}{\sigma} \right).$$
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-4
DEFN
A has probability concentrated ata countable number of valuesIt has a staircase type of distribution function
PROBABILITY MASS FUNCTION
DEFN
For a discrete RV the probability mass function (pmf) is
EX Roll a fair 6-sided dieX= on top face
P (X = x) =
16 x = 1 2 6
0 ow
EX Flip a fair coin until heads occursX = of flips
P (X = x) =
(12
)x x = 1 2
0 ow
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-5
IMPORTANT RANDOM VARIABLES
Discrete RVs
1 Bernoulli RV
bull An event A isin A is considered a ldquosuccessrdquo
bull A Bernoulli RV X is defined by
X =
1 s isin A
0 s isin A
bull The pmf for a Bernoulli RV X can be found formally as
P (X = 1) = P
(X(s) = 1
)= P
(s|s isin A
)= P (A) p
So
P (X = x) =
p x = 1
1minus p x = 0
0 x 6= 0 1
2 Binomial RV
bull A Binomial RV represents the number of sucesses on n independent Bernoulli trials
bull Thus a Binomial RV can also be defined as the sum of n independent Bernoullis RVs
bull Let X= of successesThen the pmf of X is given by
P [X = k] =
(nk
)pk(1minus p)nminusk k = 0 1 n
0 ow
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-6
bull The pmfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035Binomial PMF (10 02)
P(X
=x)
x0 1 2 3 4 5 6 7 8 9 10
0
005
01
015
02
025
x
P(X
=x)
Binomial (1005) pmf
bull The cdfs for two Binomial RVs with n = 10 and p = 02 or p = 05 are shown below
0 1 2 3 4 5 6 7 8 9 1001
02
03
04
05
06
07
08
09
1
11Binomial (1002) CDF
x
F X(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Binomial (1005) CDF
x
F X(x
)
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-7
bull When n is large the Binomial pmf has a bell shape For example the pmf below is forn = 100 and p = 02
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009
01Binomial(10002)
x
P(X
=x)
3 Geometric RV
bull A Geometric RV occurs when independent Bernoulli trials are conducted until the firstsuccess
bull X = number of trials requiredThen the pmf or X is given by
P [X = k] =
(1minus p)kminus1p k = 1 2 n
0 ow
bull The pmfs for two Geometric RVs p = 01 or p = 07 are shown below
1 2 3 4 5 6 7 8 9 10 110
001
002
003
004
005
006
007
008
009
01Geometric(01) pmf
x
P(X
=x)
1 2 3 4 5 6 7 8 9 10 110
01
02
03
04
05
06
07Geometric(07) pmf
x
P(X
=x)
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
[Figure: Gaussian pdfs and cdfs for (μ = 1.5, σ² = 0.25), (μ = 5, σ² = 1), (μ = 10, σ² = 25); x vs. f_X(x) and F_X(x)]
• Instead, we tabulate distribution functions for a normalized Gaussian variable with μ = 0 and σ² = 1.
– The CDF for the normalized Gaussian RV is defined as
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{t^2}{2}\right\} dt.$$
– Engineers more commonly use the complementary distribution function, or Q-function, defined by
$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{t^2}{2}\right\} dt.$$
– Note that Q(x) = 1 − Φ(x).
– I will be supplying you with a Q-function table and a list of approximations to Q(x).
– The Q-function can also be defined as
$$Q(x) = \frac{1}{\pi} \int_{0}^{\pi/2} \exp\left\{-\frac{x^2}{2\sin^2\phi}\right\} d\phi.$$
(This is a fairly recent result that appears in very few textbooks. This form has a finite range of integration, which is often easier to work with.)
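Both forms are easy to compare numerically. A sketch using MATLAB's built-in erfc (the identity Q(x) = ½ erfc(x/√2) follows directly from the definition):

```matlab
% Check the finite-range (Craig) form of Q against Q(x) = 0.5*erfc(x/sqrt(2)).
x = 1.5;
Q_erfc  = 0.5 * erfc(x / sqrt(2));
Q_craig = integral(@(phi) exp(-x^2 ./ (2*sin(phi).^2)) / pi, 0, pi/2);
fprintf('Q(%.1f): erfc form = %.6e, Craig form = %.6e\n', x, Q_erfc, Q_craig);
```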
• The CDF for a Gaussian RV with mean μ and variance σ² is
$$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = 1 - Q\left(\frac{x - \mu}{\sigma}\right).$$
Note that the denominator above is σ, not σ². Many students use the wrong value when working exam problems.
• To find the probability of some interval using the Q-function, it is easiest to rewrite the probability as
$$P(a < X \le b) = P(X > a) - P(X > b) = Q\left(\frac{a - \mu}{\sigma}\right) - Q\left(\frac{b - \mu}{\sigma}\right).$$
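As a concrete sketch of this recipe in MATLAB (the values μ = 5, σ = 1, a = 4, b = 6 are arbitrary):

```matlab
% P(a < X <= b) for X ~ N(mu, sigma^2), computed via the Q-function.
Q = @(x) 0.5 * erfc(x / sqrt(2));    % Q-function from the complementary error function
mu = 5; sigma = 1; a = 4; b = 6;
P = Q((a - mu)/sigma) - Q((b - mu)/sigma);
fprintf('P(%g < X <= %g) = %.4f\n', a, b, P);   % about 0.6827 for +/- one sigma
```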
EEL 5544 Lecture 8
Motivation
The Gaussian random variable is one of the most important random variables that engineers encounter. Many accumulated phenomena, such as thermal noise, can be modeled as Gaussian noise.
In many problems in which noise is encountered, the observed random variable consists of signal plus noise, where the signal is also a random variable that must be estimated. In this and many other scenarios, the tools of conditional probability make evaluating probabilities more tractable.
Read Section 2.6 through p. 85 in the textbook.
Try to answer the following question. Answers appear at the bottom of the page.
EX: Let X be a Gaussian random variable with mean μ and variance σ². Find the following probabilities:

(a) P[X < μ]

(b) P[|X − μ| > kσ] for k = 1, 2, 3, 4, 5
Examples with Gaussian Random Variables
EX: The Motivation problem
Answers: (a) 1/2; (b) 0.317, 4.55×10⁻², 2.70×10⁻³, 6.34×10⁻⁵, and 5.75×10⁻⁷, respectively.
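Part (a) follows from symmetry, and part (b) uses P[|X − μ| > kσ] = 2Q(k). A one-line MATLAB check (sketch):

```matlab
% P(|X - mu| > k*sigma) = 2*Q(k) = erfc(k/sqrt(2)) for any Gaussian RV.
k = 1:5;
disp(erfc(k / sqrt(2)));   % approx. 0.317, 4.55e-2, 2.70e-3, 6.3e-5, 5.7e-7
```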
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
• The Laplacian random variable is often used to model the difference between correlated data sources.

• For example, the differences between adjacent samples of speech, images, or video are often modeled using Laplacian random variables.
• The density function for a Laplacian random variable is
$$f_X(x) = \frac{c}{2} \exp\{-c|x|\}, \quad -\infty < x < \infty,$$
where c > 0.
• The pdf for a Laplacian RV with c = 2 is shown below.

[Figure: Laplacian pdf, c = 2; x vs. f_X(x)]
5 Chi-square RV
• Let X be a Gaussian random variable.

• Then Y = X² is a chi-square random variable.
• But, in fact, chi-square random variables are much more general. For instance, if X_i, i = 1, 2, ..., n, are statistically independent (s.i.) Gaussian random variables with identical variances σ², then
$$Y = \sum_{i=1}^{n} X_i^2 \qquad (18)$$
is a chi-square random variable with n degrees of freedom.
• The chi-square random variable is usually classified as either central or non-central.
• If, in (18), all of the Gaussian random variables have mean μ = 0, then Y is a central chi-square random variable.
• The density of a central chi-square random variable is given by
$$f_Y(y) = \begin{cases} \dfrac{1}{\sigma^n 2^{n/2} \Gamma(n/2)}\, y^{(n/2)-1} e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$$
where Γ(p) is the gamma function, defined as
$$\Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \quad p > 0;$$
Γ(p) = (p − 1)! for p a positive integer; Γ(1/2) = √π; Γ(3/2) = (1/2)√π.
• Note that for n = 2,
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-y/(2\sigma^2)}, & y \ge 0 \\ 0, & y < 0, \end{cases}$$
which is an exponential density.

• Thus the sum of the squares of two statistically independent, zero-mean Gaussian RVs with equal variance is exponential.
• Note also that although the Gaussian random variables that make up the central chi-square RV have zero mean, the chi-square RV has mean > 0.
• The pdfs for chi-square RVs with 1, 2, 4, and 8 degrees of freedom and σ² = 1 are shown below.

[Figure: chi-square density functions with n = 1, 2, 4, 8 degrees of freedom, σ² = 1; x vs. f_X(x)]
• The CDF for a central chi-square RV with an even number of degrees of freedom, n = 2m, can be found through repeated integration by parts to be
$$F_Y(y) = \begin{cases} 1 - e^{-y/(2\sigma^2)} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{y}{2\sigma^2}\right)^k, & y \ge 0 \\ 0, & y < 0. \end{cases}$$
• The non-central chi-square random variable has more complicated pdfs and CDFs, and the CDF cannot be written in closed form.
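The n = 2 connection to the exponential density noted above is easy to confirm by simulation; a sketch (σ² = 1 assumed):

```matlab
% Sum of squares of two independent zero-mean Gaussians (variance sigma^2)
% should match an exponential distribution with parameter 1/(2*sigma^2).
sigma = 1; N = 1e5;
Y = sum((sigma * randn(2, N)).^2, 1);   % central chi-square, 2 degrees of freedom
y = 2;                                  % compare CDFs at a test point
fprintf('F_Y(2): empirical = %.4f, exact = %.4f\n', ...
        mean(Y <= y), 1 - exp(-y / (2*sigma^2)));
```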
6 Rayleigh RV
• Consider an exponential RV Y, which is equivalent to a central chi-square RV with 2 degrees of freedom, which in turn is equivalent to the sum of the squares of two independent zero-mean Gaussian RVs with common variance.

• Let R = √Y.

• Then R is a Rayleigh RV with pdf
$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$$
• The pdf for a Rayleigh RV with σ² = 1 is shown below.

[Figure: Rayleigh pdf, σ² = 1; r vs. f_R(r)]
• The CDF for a Rayleigh RV is
$$F_R(r) = \begin{cases} 1 - e^{-r^2/(2\sigma^2)}, & r \ge 0 \\ 0, & r < 0. \end{cases}$$
• A radio signal in a dense multipath environment is often modeled as a complex Gaussian random variable with independent real and imaginary parts. Thus the amplitude (magnitude) of the waveform is a Rayleigh random variable.
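A Monte Carlo sketch of this connection (σ² = 1 assumed): generate the magnitude of a complex Gaussian and check it against the Rayleigh CDF above.

```matlab
% Magnitude of a complex Gaussian with independent zero-mean real and
% imaginary parts (each with variance sigma^2) is Rayleigh distributed.
sigma = 1; N = 1e5;
R = abs(sigma*randn(1, N) + 1j*sigma*randn(1, N));   % Rayleigh samples
r = 1.5;
fprintf('F_R(1.5): empirical = %.4f, exact = %.4f\n', ...
        mean(R <= r), 1 - exp(-r^2 / (2*sigma^2)));
```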
7 Lognormal RV
• Consider a random variable L such that X = ln L is a Gaussian RV.
• For instance, in communications we find that shadowing of communication signals by buildings and foliage has a Gaussian distribution when the signal loss is expressed in decibels.
• Then L is a lognormal random variable with pdf given by
$$f_L(\ell) = \begin{cases} \dfrac{1}{\ell\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\ln \ell - \mu)^2}{2\sigma^2}\right\}, & \ell \ge 0 \\ 0, & \ell < 0. \end{cases}$$
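Lognormal samples follow directly from the definition X = ln L; a brief sketch (μ = 0 and σ = 1 in the dB domain are assumed values):

```matlab
% If X = ln(L) is Gaussian(mu, sigma^2), then L = exp(X) is lognormal.
mu = 0; sigma = 1; N = 1e5;
L = exp(mu + sigma*randn(1, N));            % lognormal samples
l = 2;                                      % P(L <= l) = Phi((ln l - mu)/sigma)
Phi = @(x) 0.5 * erfc(-x / sqrt(2));        % normalized Gaussian CDF via erfc
fprintf('F_L(2): empirical = %.4f, exact = %.4f\n', ...
        mean(L <= l), Phi((log(l) - mu)/sigma));
```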
CONDITIONAL DISTRIBUTIONS AND DENSITIES
• Given (Ω, F, P) and X: Ω → R (a RV on Ω):
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional distribution of X given A is
$$F_X(x|A) = P(X \le x \mid A) = \frac{P(\{X \le x\} \cap A)}{P(A)}.$$
DEFN: For an event A ∈ F with P(A) ≠ 0, the conditional density of X given A is
$$f_X(x|A) = \frac{d}{dx} F_X(x|A).$$
• Then F_X(x|A) is a distribution function and f_X(x|A) is a density function.
EX: Suppose that the waiting time, in minutes, at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?
2. the conditional distribution for your remaining wait time?
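Part 2 illustrates the memoryless property of the exponential RV; a quick Monte Carlo sketch (λ = 1, conditioning on a wait longer than 2 minutes):

```matlab
% Memorylessness: given X > 2, the remaining wait X - 2 is again
% exponential with the same parameter lambda.
lambda = 1; N = 1e6;
X = -log(rand(1, N)) / lambda;       % exponential wait times
remwait = X(X > 2) - 2;              % remaining wait, conditioned on X > 2
t = 1;                               % compare P(remwait <= t) with 1 - exp(-lambda*t)
fprintf('conditional = %.4f, unconditional = %.4f\n', ...
        mean(remwait <= t), 1 - exp(-lambda*t));
```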
TOTAL PROBABILITY AND BAYES' THEOREM
• If A₁, A₂, …, Aₙ form a partition of Ω, then
$$F_X(x) = F_X(x|A_1)P(A_1) + F_X(x|A_2)P(A_2) + \cdots + F_X(x|A_n)P(A_n)$$
and
$$f_X(x) = \sum_{i=1}^{n} f_X(x|A_i)P(A_i).$$
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:
$$P(A|X=x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)$$
$$= \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x \mid A) - F_X(x \mid A)}{F_X(x + \Delta x) - F_X(x)}\, P(A)$$
$$= \lim_{\Delta x \to 0} \frac{\left[F_X(x + \Delta x \mid A) - F_X(x \mid A)\right]/\Delta x}{\left[F_X(x + \Delta x) - F_X(x)\right]/\Delta x}\, P(A) = \frac{f_X(x|A)}{f_X(x)}\, P(A),$$
if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that
$$P(A|X=x) = \frac{f_X(x|A)}{f_X(x)}\, P(A)$$
$$\Rightarrow\ P(A|X=x) f_X(x) = f_X(x|A) P(A)$$
$$\Rightarrow\ \int_{-\infty}^{\infty} P(A|X=x) f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x|A)\,dx\; P(A)$$
$$\Rightarrow\ P(A) = \int_{-\infty}^{\infty} P(A|X=x) f_X(x)\,dx.$$
(Continuous Version of the Law of Total Probability)
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:
$$f_X(x|A) = \frac{P(A|X=x)}{P(A)}\, f_X(x) = \frac{P(A|X=x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X=t) f_X(t)\,dt}.$$
Examples on board
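One such example, worked numerically as a sketch (the model P(A|X = x) = 1/(1 + e⁻ˣ) and X ~ N(0, 1) are assumptions for illustration only):

```matlab
% Continuous total probability and Bayes' theorem, evaluated numerically.
fX   = @(x) exp(-x.^2/2) / sqrt(2*pi);     % standard Gaussian pdf (assumed)
PAgX = @(x) 1 ./ (1 + exp(-x));            % assumed model for P(A | X = x)
PA   = integral(@(x) PAgX(x) .* fX(x), -Inf, Inf);   % law of total probability
fprintf('P(A) = %.4f\n', PA);              % equals 0.5 here, by symmetry
x = 1;                                     % posterior density at x = 1 via Bayes
fprintf('f_X(1|A) = %.4f\n', PAgX(x) * fX(x) / PA);
```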
L7-8
bull The cdfs for two Geometric RVs with p = 01 or p = 07 are shown below
0 5 10 15 20 25 30 35 40 4501
02
03
04
05
06
07
08
09
1Geometric (01) pmf
x
FX(x
)
1 2 3 4 5 6 7 8 9 10 11065
07
075
08
085
09
095
1Geometric(07) CDF
x
FX(x
)
4 Poisson RV
bull Models events that occur randomly in space or time
bull Let λ= the of events(unit of space or time)
bull Consider observing some period of time or space of length t and let α = λt
bull Let N= the events in time (or space) t
bull Then
P (N = k) =
αk
keminusα k = 0 1
0 ow
bull The pmfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025
03
035
04
045
05Poisson (075) pmf
x
P(Y
=y)
0 1 2 3 4 5 6 7 8 9 100
005
01
015
02
025Poisson(3) pmf
x
FX(x
)
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-9
bull The cdfs for two Poission RVs with α = 075 or α = 3 are shown below
0 1 2 3 4 5 6 7 8 9 1004
05
06
07
08
09
1Poission(075) CDF
x
FX(x
)
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Poisson(3) CDF
x
FX(x
)
bull For large α the Poisson pmf has a bell shape For example the pmf below if for aPoisson RV with α = 20
0 10 20 30 40 50 60 70 80 90 1000
001
002
003
004
005
006
007
008
009Poisson(20) pmf
x
P(X
=x)
IMPORTANT CONTINUOUS RVS
1 Uniform RVcovered in example
2 Exponential RV
bull Characterized by single parameter λ
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-10
bull Density function
fX(x) =
0 x lt 0
λeminusλx x ge 0
bull Distribution function
FX(x) =
0 x lt 0
1minus eminusλx x ge 0
bull The bookrsquos definition substitutes λ = 1micro
bull Obtainable as limit of geometric RV
bull The pdf and CDF for exponential random variables with λ equal to 025 1 or 4 areshown below
0 1 2 3 4 5 6 7 8 9 100
05
1
15
2
25
3
35
4Exponential pdfs
x
f X(x
)
λ=025λ=1
λ=4
0 1 2 3 4 5 6 7 8 9 100
01
02
03
04
05
06
07
08
09
1Exponential CDFs
x
FX(x
)
α=025
α=1
α=4
3 Gaussian RV
bull For large n the binomial and Poisson RVs have a bell shape
bull Consider the binomial
bull DeMoivre-Laplace Theorem If n is large and p is small then if q = 1minus p(n
k
)pkqnminusk asymp 1radic
2πnpqexp
minus(k minus np)2
2npq
bull Let σ2 = npq micro = np then
P ( k successes on n Bernoulli trials)
=
(n
k
)pkqnminusk
asymp 1radic2πσ2
exp
minus(k minus micro)2
2σ2
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-11
DEFN
A Gaussian random variable X has density function
fX(x) =1radic
2πσ2exp
minus(xminus micro)2
2σ2
with parameters (mean) micro and (variance) σ2 ge 0
bull In fact the sum of a large number of almost any type of independent random variablesconverges (in distribution) to a Gaussian random variable (Central Limit Theorem)
bull The CDF of a Gaussian RV is given by
FX(x) = P (X le x)
=
int x
minusinfin
1radic2πσ2
exp
minus(tminus micro)2
2σ2
dt
which cannot be evaluated in closed formbull The pdf and CDFs for Gaussian random variables with (micro = 15 σ2 = 025) (micro =
5 σ2 = 1) and (micro = 10 σ2 = 25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08Gaussian pdfs
x
f X(x
)
(micro=15σ2=025)
(micro=5σ2=1)
(micro=10σ2=25)
0 2 4 6 8 10 12 14 16 18 200
01
02
03
04
05
06
07
08
09
1Gaussian cdfs
x
FX(x
)
(micro=15σ=025)
(micro=5σ2=1)
(micro=10σ2=25)
bull Instead we tabulate distribution functions for a normalized Gaussian variable withmicro = 0 and σ2 = 1
ndash The CDF for the normalized Gaussian RV is defined as
Φ(x) =
int x
minusinfin
1radic2π
exp
minust2
2
dt
ndash Engineers more commonly use the complementary distribution function or Q-functiondefined by
Q(x) =
int infin
x
1radic2π
exp
minust2
2
dt
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L7-12
ndash Note that Q(x) = 1minus Φ(x)
ndash I will be supplying you with a Q-function table and a list of approximations toQ(x)
ndash The Q-function can also be defined as
Q(x) =1
π
int π2
0
exp
minus x2
sin2 φ
dφ
(This is a fairly recent result that is in very few textbooks This form has a finiterange of integration that is often easier to work with)
bull The CDF for a Gaussian RV with mean micro and variance σ2 is
FX(x) = Φ
(xminus micro
σ
)= 1minusQ
(xminus micro
σ
)
Note that the denominator above is σ not σ2 Many students use the wrong valuewhen working tests
bull To find the prob of some interval using the Q-function it is easiest to rewrite the prob
P (a lt X le b) = P (X gt a)minus P (X gt b)
= Q
(aminus micro
σ
)minusQ
(bminus micro
σ
)
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX
Suppose that the waiting time in minutes at Wendyrsquos in the Reitz Union isan exponential random variable with parameter λ = 1 Given that you havealready been waiting for two minutes then what is
1 the conditional distribution for your total wait time
L8-9
2 the conditional distribution for your remaining wait time
TOTAL PROBABILITY AND BAYESrsquo THEOREM
bull If A1 A2 An form a partition of Ω then
FX(x) = FX(x|A1)P (A1) + FX(x|A2)P (A2)
+ + FX(x|An)P (An)
and
fX(x) =nsum
i=1
fX(x|Ai)P (Ai)
Point Conditioning
bull Suppose we want to evaluate the probability of an event given that X = x where X is acontinuous random variable
bull Clearly P (X =x) = 0 so the previous definition of conditional prob will not work
P (A|X = x) = lim∆xrarr0
P (A|x lt X le x + ∆x)
= lim∆xrarr0
FX(x + ∆x|A)minus FX(x|A)
FX(x + ∆x)minus FX(x)P (A)
= lim∆xrarr0
FX(x+∆x|A)minusFX(x|A)∆x
FX(x+∆x)minusFX(x)∆x
P (A)
=fX(x|A)
fX(x)P (A)
if fX(x|A) and fX(x) exist
L8-10
Implication of Point Conditioning
bull Note that
P (A|X = x) =fX(x|A)
fX(x)P (A)
rArr P (A|X = x)fX(x) = fX(x|A)P (A)
rArrint infin
minusinfinP (A|X = x)fX(x)dx =
int infin
minusinfinfX(x|A)dxP (A)
rArr P (A) =
int infin
minusinfinP (A|X = x)fX(x)dx
(Continuous Version of Law of Total Probability)
bull Once we have the Law of Total Probability it is easy to derive the
Continuous Version of Bayersquos Theorem
fX(x|A) =P (A|X = x)
P (A)fX(x)
=P (A|X = x)fX(x)intinfin
minusinfin P (A|X = t)fX(t)dt
Examples on board
L8-1
EEL 5544 Lectures 8
Motivation
Gaussian random variables are one of the most important random variables that engineersencounter Many accumulated phenomena such as thermal noise can be modeled as Gaussiannoise
In many problems in which noise is encountered the observed random variable consists of signalplus noise where the signal is also a random variable that must be estimated In this and manyother scenarios the tools of conditional probability make evaluating probabilities more tractable
Read Sections 26 through p 85 in the textbook
Try to answer the following question Answers appear at the bottom of the page
EXLet X be a Gaussian random variable with mean micro and variance σ2 Find thefollowing probabilities
(a) P [X lt micro]
(b) P [|X minus micro| gt kσ] for k = 1 2 3 4 5
5
Examples with Gaussian Random Variables
EX The Motivation problem
5(a) 12 (b) 0317 455times 10minus2 270times 10minus3 634times 10minus5 and 575times 10minus7 respectively
L8-2
L8-3
L8-4
OTHER IMPORTANT CONTINUOUS RANDOM VARIABLES
4 Laplacian RV
bull The Laplacian random variable is often used to model the difference betweencorrelated data sources
bull For example the differences between adjacent samples of speech images or videoare often modeled using Laplacian random variables
bull The density function for a Laplacian random variable is
fX(x) =c
2exp minusc|x| minusinfin lt x lt infin
where c gt 0
bull The pdf for a Laplacian RV with c = 2 is shown below
minus3 minus2 minus1 0 1 2 30
01
02
03
04
05
06
07
08
09
1Laplacian pdf
x
f X(x
)
5 Chi-square RV
bull Let X be a Gaussian random variable
bull Then Y = X2 is a chi-square random variable
L8-5
bull But in fact chi-square random variables are much more general For instance if Xi i =1 2 n are si Gaussian random variables with identical variances σ2 then
Y =nsum
i=1
X2i (18)
is a chi-square random variable with n degrees of freedom
bull The chi-square random variable is usually classified as either central or non-central
bull If in () all of the Gaussian random variables have mean micro = 0 then Y is a centralchi-square random variable
bull The density of a central chi-square random variable is given by
fY (y) =
1
σn2n2Γ(n2)y(n2)minus1eminusy(2σ2) y ge 0
0 y lt 0
where Γ(p) is the gamma function defined asΓ(p) =
intinfin0
tpminus1eminustdt p gt 0
Γ(p) = (pminus 1) p an integer gt 0
Γ(12) =radic
π
Γ(32) = 12
radicπ
bull Note that for n = 2
fY (y) =
1
2σ2 eminusy(2σ2) y ge 0
0 y lt 0
which is an exponential density
bull Thus the sum of the squares of two si zero-mean Gaussian RVs with equal variance isexponential
bull Note also that although the Gaussian random variables that make up the central chi-squareRV have zero mean the chi-square RV has mean gt 0
L8-6
bull The pdf for chi-square RVs with 12 4 or 8 degrees of freedom and σ2 = 1 are shownbelow
0 2 4 6 8 10 12 140
01
02
03
04
05
06
07
08
09
1chiminussquare density functions with n degrees of freedom σ2=1
x
f X(x
)n=1
n=2n=4
n=8
bull The CDF for a central chi-square RV can be found through repeated integration by parts tobe
FY (y) =
1minus eminusy(2σ2)
summminus1k=0
1k
(y
2σ2
)k y ge 0
0 y lt 0
bull The non-central chi-square random variable has more complicated pdfs and CDFs and theCDF cannot be written in closed form
6 Rayleigh RV
bull Consider an exponential RV Y which is equivalent to a central chi-square RV with2 degrees of freedom which is equivalent to the sum of the squares of independentzero-mean Gaussian RVs with common variance
bull Let R =radic
Y
bull Then R is a Rayleigh RV with pdf
fR(r) =
rσ2 e
minusr2(2σ2) r ge 0
0 r lt 0
L8-7
bull The pdf for Rayleigh RV with σ2 = 1 is shown below
0 05 1 15 2 25 3 35 4 45 50
01
02
03
04
05
06
07Rayleigh PDF (σ2=1)
r
f R(r
)
bull The CDF for a Rayleigh RV is
FR(r) =
1minus eminusr2(2σ2) r ge 0
0 r lt 0
bull The amplitude of a radio signal in a dense multipath environment is often modeled as acomplex Gaussian random variable with independent real and imaginary parts Thus theamplitude of the waveform is a Rayleigh random variable
6 Lognormal RV
bull Consider a random variable L such that X = ln L is a Gaussian RV
bull For instance in communications we find that shadowing of communication signalsby buildings and foliage has a Gaussian distribution when the signal loss is expressedin decibels
bull Then L is a lognormal random variable with pdf given by
fL(`) =
1
`radic
2πσ2exp
minus (ln `minusmicro)2
2σ2
` ge 0
0 ` lt 0
L8-8
CONDITIONAL DISTRIBUTIONS AND DENSITIES
bull Given (ΩF P ) and X Ω rarr R ( a RV on Ω)
DEFN
For an event A isin F with P (A) 6= 0 the conditional distribution of X given Ais
fX(x|A) =
DEFN
For an event A isin F with P (A) 6= 0 the conditional density of X given A is
fX(x|A) =
bull Then FX(x|A) is a distribution function and fX(x|A) is a density function
EX: Suppose that the waiting time in minutes at Wendy's in the Reitz Union is an exponential random variable with parameter λ = 1. Given that you have already been waiting for two minutes, what is

1. the conditional distribution for your total wait time?

2. the conditional distribution for your remaining wait time?
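A worked sketch of the answer (added here; it follows directly from the definition above with A = {X > 2}):

    F_X(x \mid X > 2) = \frac{F_X(x) - F_X(2)}{1 - F_X(2)} = \frac{(1 - e^{-x}) - (1 - e^{-2})}{e^{-2}} = 1 - e^{-(x-2)}, \quad x > 2

so the total wait time, conditioned on the event, is a shifted exponential. The remaining wait time W = X − 2 then has F_W(w) = 1 − e^{−w} for w ≥ 0: by the memoryless property of the exponential RV, the remaining wait is again exponential with λ = 1.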
TOTAL PROBABILITY AND BAYES' THEOREM
• If A_1, A_2, …, A_n form a partition of Ω, then

    F_X(x) = F_X(x \mid A_1)P(A_1) + F_X(x \mid A_2)P(A_2) + \cdots + F_X(x \mid A_n)P(A_n)

and

    f_X(x) = \sum_{i=1}^{n} f_X(x \mid A_i)\,P(A_i)
Point Conditioning
• Suppose we want to evaluate the probability of an event given that X = x, where X is a continuous random variable.
• Clearly P(X = x) = 0, so the previous definition of conditional probability will not work:
    P(A \mid X = x) = \lim_{\Delta x \to 0} P(A \mid x < X \le x + \Delta x)
                    = \lim_{\Delta x \to 0} \frac{F_X(x+\Delta x \mid A) - F_X(x \mid A)}{F_X(x+\Delta x) - F_X(x)}\, P(A)
                    = \lim_{\Delta x \to 0} \frac{[F_X(x+\Delta x \mid A) - F_X(x \mid A)]/\Delta x}{[F_X(x+\Delta x) - F_X(x)]/\Delta x}\, P(A)
                    = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)

if f_X(x|A) and f_X(x) exist.
Implication of Point Conditioning
• Note that

    P(A \mid X = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A)
    \Rightarrow P(A \mid X = x)\, f_X(x) = f_X(x \mid A)\, P(A)
    \Rightarrow \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx = \left(\int_{-\infty}^{\infty} f_X(x \mid A)\, dx\right) P(A)
    \Rightarrow P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\, f_X(x)\, dx

since ∫ f_X(x|A) dx = 1 (Continuous Version of the Law of Total Probability).
• Once we have the Law of Total Probability, it is easy to derive the Continuous Version of Bayes' Theorem:

    f_X(x \mid A) = \frac{P(A \mid X = x)}{P(A)}\, f_X(x) = \frac{P(A \mid X = x)\, f_X(x)}{\int_{-\infty}^{\infty} P(A \mid X = t)\, f_X(t)\, dt}
Examples on board
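One concrete worked sketch in the same spirit (an added example, not one of the board examples): let a coin's unknown bias X be uniform on (0, 1), so f_X(x) = 1 there, and let A be the event that a single flip shows heads, so P(A \mid X = x) = x. Then

    P(A) = \int_0^1 x \cdot 1\, dx = \frac{1}{2}, \qquad f_X(x \mid A) = \frac{P(A \mid X = x)\, f_X(x)}{P(A)} = 2x, \quad 0 < x < 1,

so observing a head tilts the density of the bias toward larger values, exactly as the continuous Bayes' theorem prescribes.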