Notes and figures are based on or taken from materials in the course textbook: Charles Boncelet,
Probability, Statistics, and Random Signals, Oxford University Press, February 2016.
B.J. Bazuin, Spring 2022 1 of 34 ECE 3800
Charles Boncelet, "Probability, Statistics, and Random Signals," Oxford University Press, 2016. ISBN: 978-0-19-020051-0
Chapter 5: MULTIPLE DISCRETE RANDOM VARIABLES
Sections
5.1 Multiple Random Variables and PMFs
5.2 Independence
5.3 Moments and Expected Values
  5.3.1 Expected Values for Two Random Variables
  5.3.2 Moments for Two Random Variables
5.4 Example: Two Discrete Random Variables
  5.4.1 Marginal PMFs and Expected Values
  5.4.2 Independence
  5.4.3 Joint CDF
  5.4.4 Transformations With One Output
  5.4.5 Transformations With Several Outputs
  5.4.6 Discussion
5.5 Sums of Independent Random Variables
5.6 Sample Probabilities, Mean, and Variance
5.7 Histograms
5.8 Entropy and Data Compression
  5.8.1 Entropy and Information Theory
  5.8.2 Variable Length Coding
  5.8.3 Encoding Binary Sequences
  5.8.4 Maximum Entropy
Summary
Problems
Multiple Discrete Random Variables
The joint probability mass function:

p_XY(k,l) = Pr{X = x_k ∩ Y = y_l}

Properties of the joint pmf:

1. p_XY(k,l) = Pr{X = x_k ∩ Y = y_l} ≥ 0 (all probabilities are non-negative)
2. The summation of the pmf over all k,l is equal to 1:

Σ_k Σ_l p_XY(k,l) = 1.0

The marginal pmfs of the individual random variables can be computed by summing out the other variable:

p_Y(l) = Σ_k p_XY(k,l)
p_X(k) = Σ_l p_XY(k,l)
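The marginal sums above can be checked numerically. A minimal sketch in Python (NumPy assumed available), using the joint pmf table from Example 5.1 below:

```python
import numpy as np

# Joint pmf from Example 5.1: rows are y = 0, 1; columns are x = 0, 1, 2, 3
p_xy = np.array([[0.0, 0.1, 0.4, 0.2],
                 [0.1, 0.0, 0.1, 0.1]])

# Property 2: the pmf sums to 1 over all (k, l)
assert np.isclose(p_xy.sum(), 1.0)

p_y = p_xy.sum(axis=1)  # marginal of Y: sum across x for each y
p_x = p_xy.sum(axis=0)  # marginal of X: sum across y for each x

assert np.allclose(p_y, [0.7, 0.3])
assert np.allclose(p_x, [0.1, 0.1, 0.5, 0.3])
```

The axis argument selects which variable is summed out: axis=1 collapses the x direction, leaving the Y marginal, and vice versa.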
Example 5.1 Let p_XY(k,l) = Pr{X = x_k ∩ Y = y_l} be given by the table

        x=0   x=1   x=2   x=3
y=1     0.1   0.0   0.1   0.1
y=0     0.0   0.1   0.4   0.2

The table explicitly shows each of the (2D) values p_XY(k,l) = Pr{X = x_k ∩ Y = y_l}.

The marginal pmf of Y sums across each row:

p_Y(l) = Σ_k p_XY(k,l)
p_Y(0) = Σ_k p_XY(k,0) = p_XY(0,0) + p_XY(1,0) + p_XY(2,0) + p_XY(3,0)
p_Y(1) = Σ_k p_XY(k,1) = p_XY(0,1) + p_XY(1,1) + p_XY(2,1) + p_XY(3,1)
p_Y(0) = 0.0 + 0.1 + 0.4 + 0.2 = 0.7
p_Y(1) = 0.1 + 0.0 + 0.1 + 0.1 = 0.3

The marginal pmf of X sums down each column:

p_X(0) = Σ_l p_XY(0,l) = p_XY(0,0) + p_XY(0,1)
p_X(1) = Σ_l p_XY(1,l) = p_XY(1,0) + p_XY(1,1)
p_X(2) = Σ_l p_XY(2,l) = p_XY(2,0) + p_XY(2,1)
p_X(3) = Σ_l p_XY(3,l) = p_XY(3,0) + p_XY(3,1)

p_X(0) = 0.0 + 0.1 = 0.1
p_X(1) = 0.1 + 0.0 = 0.1
p_X(2) = 0.4 + 0.1 = 0.5
p_X(3) = 0.2 + 0.1 = 0.3

Notice that the sum of the marginal pmf in X or Y is 1.0!

p_Y(0) + p_Y(1) = 1.0
p_X(0) + p_X(1) + p_X(2) + p_X(3) = 1.0
The joint Cumulative Distribution Function exists:

F_XY(u,v) = Pr{X ≤ u ∩ Y ≤ v}

Properties:

0 ≤ F_XY(u,v) ≤ 1, for −∞ < u < ∞ and −∞ < v < ∞
F_X(u) = Pr{X ≤ u} = Pr{X ≤ u ∩ Y ≤ ∞} = F_XY(u,∞)
F_Y(v) = Pr{Y ≤ v} = Pr{X ≤ ∞ ∩ Y ≤ v} = F_XY(∞,v)

Calculating the probability of an "area":

Pr{a < X ≤ b ∩ c < Y ≤ d} = F_XY(b,d) − F_XY(b,c) − F_XY(a,d) + F_XY(a,c)
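The rectangle formula can be tried numerically. A sketch, assuming the Example 5.1 table, that builds the joint CDF by cumulative summation and evaluates the four-corner expression:

```python
import numpy as np

# Joint pmf from Example 5.1 (rows y = 0, 1; columns x = 0..3)
p_xy = np.array([[0.0, 0.1, 0.4, 0.2],
                 [0.1, 0.0, 0.1, 0.1]])

# Joint CDF on the grid: F[l, k] = Pr{X <= k, Y <= l}
F = p_xy.cumsum(axis=0).cumsum(axis=1)

def cdf(u, v):
    """F_XY(u, v) for integer-valued X in 0..3 and Y in 0..1."""
    if u < 0 or v < 0:
        return 0.0
    return F[min(int(v), 1), min(int(u), 3)]

# Rectangle formula: Pr{0 < X <= 2 and -1 < Y <= 0}, i.e. X in {1,2}, Y = 0
prob = cdf(2, 0) - cdf(2, -1) - cdf(0, 0) + cdf(0, -1)
assert np.isclose(prob, 0.5)  # direct count from the table: 0.1 + 0.4
```

Note that the corners (a, c) must sit strictly below the included region, which is why a = 0 and c = −1 are used to capture X ∈ {1,2} and Y = 0.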
A 2D diagram may help
Conditional Probability

Let A = {X = x_k} and B = {Y = y_l}

Pr{X = x_k | Y = y_l} = Pr{A|B} = Pr{A,B} / Pr{B}

Pr{X = x_k | Y = y_l} = Pr{X = x_k ∩ Y = y_l} / Pr{Y = y_l}

From the previous example:

        x=0   x=1   x=2   x=3
y=1     0.1   0.0   0.1   0.1
y=0     0.0   0.1   0.4   0.2

Pr{Y = 0 | X = 0} = Pr{X = 0 ∩ Y = 0} / Pr{X = 0} = 0.0/0.1 = 0.0
Pr{Y = 1 | X = 0} = Pr{X = 0 ∩ Y = 1} / Pr{X = 0} = 0.1/0.1 = 1.0
Pr{Y = 1 | X = 2} = Pr{X = 2 ∩ Y = 1} / Pr{X = 2} = 0.1/0.5 = 0.2
Pr{X = 2 | Y = 1} = Pr{X = 2 ∩ Y = 1} / Pr{Y = 1} = 0.1/0.3 = 0.333⋯
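The conditional probabilities above are just joint entries divided by marginals. A minimal check in Python (NumPy assumed), using the same table:

```python
import numpy as np

p_xy = np.array([[0.0, 0.1, 0.4, 0.2],   # y = 0 row
                 [0.1, 0.0, 0.1, 0.1]])  # y = 1 row

p_x = p_xy.sum(axis=0)  # marginal of X
p_y = p_xy.sum(axis=1)  # marginal of Y

# Pr{Y = 1 | X = 2} = p_XY(2,1) / p_X(2) = 0.1 / 0.5
pr_y1_given_x2 = p_xy[1, 2] / p_x[2]
assert np.isclose(pr_y1_given_x2, 0.2)

# Pr{X = 2 | Y = 1} = p_XY(2,1) / p_Y(1) = 0.1 / 0.3
pr_x2_given_y1 = p_xy[1, 2] / p_y[1]
assert np.isclose(pr_x2_given_y1, 1 / 3)
```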
If we consider three random variables: A = {X = x_k}, B = {Y = y_l}, and C = {Z = z_m}

We can consider the following (functions of 3 random variables):

Pr{X = x_k | Y = y_l ∩ Z = z_m} = Pr{A|BC} = Pr{ABC} / Pr{BC} = p_XYZ(k,l,m) / p_YZ(l,m)

or

Pr{X = x_k ∩ Y = y_l | Z = z_m} = Pr{AB|C} = Pr{ABC} / Pr{C} = p_XYZ(k,l,m) / p_Z(m)
Independence

If X and Y are independent:

p_XY(k,l) = Pr{X = x_k ∩ Y = y_l} = Pr{X = x_k} ∙ Pr{Y = y_l} = p_X(k) ∙ p_Y(l)

and

F_XY(u,v) = Pr{X ≤ u ∩ Y ≤ v} = Pr{X ≤ u} ∙ Pr{Y ≤ v} = F_X(u) ∙ F_Y(v)

For three or more random variables to be (mutually) independent:
1) Each pair must be independent, and
2) The joint pmf of all three must factor into the product of the marginal pmfs for all outcomes.

Useful terminology and concept: Independent and Identically Distributed (IID)

For this case, the R.V. are independent and all have the same pmf!
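Independence is easy to test on a finite table: the joint pmf must equal the outer product of the marginals everywhere. A sketch using the Example 5.1 table:

```python
import numpy as np

p_xy = np.array([[0.0, 0.1, 0.4, 0.2],
                 [0.1, 0.0, 0.1, 0.1]])

p_y = p_xy.sum(axis=1)
p_x = p_xy.sum(axis=0)

# Independence requires p_XY(k,l) == p_X(k) * p_Y(l) for every (k,l)
product = np.outer(p_y, p_x)
independent = bool(np.allclose(p_xy, product))
print(independent)  # False: e.g. p_XY(0,0) = 0.0 but p_X(0)*p_Y(0) = 0.07
```

A single mismatched entry is enough to rule out independence, which is exactly what the zero entries in this table do.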
Moments and Expected Values

E[g(X,Y)] = Σ_k Σ_l g(x_k, y_l) ∙ p_XY(k,l)

Property: Additive

E[g1(X,Y) + g2(X,Y)] = Σ_k Σ_l [g1(x_k,y_l) + g2(x_k,y_l)] ∙ p_XY(k,l)
E[g1(X,Y) + g2(X,Y)] = Σ_k Σ_l g1(x_k,y_l) ∙ p_XY(k,l) + Σ_k Σ_l g2(x_k,y_l) ∙ p_XY(k,l)
E[g1(X,Y) + g2(X,Y)] = E[g1(X,Y)] + E[g2(X,Y)]

Correlated Random Variables

r_XY = E[X ∙ Y]

Multiplicative when X and Y are independent:

E[X ∙ Y] = Σ_k Σ_l x_k ∙ y_l ∙ p_XY(k,l) = Σ_k Σ_l x_k ∙ y_l ∙ p_X(k) ∙ p_Y(l)
E[X ∙ Y] = Σ_l y_l ∙ p_Y(l) ∙ Σ_k x_k ∙ p_X(k) = Σ_l y_l ∙ p_Y(l) ∙ E[X]
E[X ∙ Y] = E[X] ∙ Σ_l y_l ∙ p_Y(l) = E[X] ∙ E[Y]

And in general for independent R.V.:

E[g1(X) ∙ g2(Y)] = E[g1(X)] ∙ E[g2(Y)]

Note that if X and Y are not independent, they may be said to be correlated; correlation coefficients will be defined later.

r_XY = E[X ∙ Y] ≠ E[X] ∙ E[Y], in general
Covariance and Correlation Coefficients

This gives rise to another term: Covariance.

σ_XY = Cov(X,Y) = E[(X − μ_X) ∙ (Y − μ_Y)]

also

Cov(X,Y) = E[X ∙ Y − μ_X ∙ Y − X ∙ μ_Y + μ_X ∙ μ_Y]
Cov(X,Y) = E[X ∙ Y] − E[μ_X ∙ Y] − E[X ∙ μ_Y] + E[μ_X ∙ μ_Y]
Cov(X,Y) = E[X ∙ Y] − μ_X ∙ μ_Y − μ_X ∙ μ_Y + μ_X ∙ μ_Y
Cov(X,Y) = E[X ∙ Y] − μ_X ∙ μ_Y

or

σ_XY = Cov(X,Y) = r_XY − μ_X ∙ μ_Y

A general derivation, true for all combinations of two random variables.

Note that if X and Y are independent:

r_XY = μ_X ∙ μ_Y

and

σ_XY = Cov(X,Y) = 0

As more of Dr. Bazuin's notation that you may see …

r_XX = E[X²]
Cov(X,X) = K_XX = σ_X²
The Correlation Coefficient

Letting Z = X + Y

The mean value:

μ_Z = E[Z] = E[X + Y] = E[X] + E[Y] = μ_X + μ_Y

The variance:

σ_Z² = E[(Z − μ_Z)²] = E[((X − μ_X) + (Y − μ_Y))²]
σ_Z² = E[(X − μ_X)²] + 2 ∙ E[(X − μ_X) ∙ (Y − μ_Y)] + E[(Y − μ_Y)²]
σ_Z² = σ_X² + 2 ∙ Cov(X,Y) + σ_Y²
σ_Z² = σ_X² + 2 ∙ σ_XY + σ_Y²

Thinking in terms of a product function, we define:

σ_Z² = σ_X² + 2 ∙ ρ ∙ σ_X ∙ σ_Y + σ_Y²

With the correlation coefficient defined as:

ρ = σ_XY / (σ_X ∙ σ_Y)

or

ρ = (E[X ∙ Y] − μ_X ∙ μ_Y) / (σ_X ∙ σ_Y)

As a result of the "normalized scaling" we expect:

−1 ≤ ρ = σ_XY / (σ_X ∙ σ_Y) ≤ 1
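These definitions can be exercised end-to-end on the Example 5.1 table. A sketch (NumPy assumed; the intermediate values in the comments follow from the table):

```python
import numpy as np

p_xy = np.array([[0.0, 0.1, 0.4, 0.2],   # y = 0 row
                 [0.1, 0.0, 0.1, 0.1]])  # y = 1 row
x = np.array([0, 1, 2, 3])
y = np.array([0, 1])

p_x = p_xy.sum(axis=0)
p_y = p_xy.sum(axis=1)

mu_x = (x * p_x).sum()                      # E[X] = 2.0
mu_y = (y * p_y).sum()                      # E[Y] = 0.3
r_xy = (np.outer(y, x) * p_xy).sum()        # E[X*Y] = 0.5

cov = r_xy - mu_x * mu_y                    # Cov(X,Y) = r_XY - mu_X*mu_Y = -0.1
var_x = (x**2 * p_x).sum() - mu_x**2        # 4.8 - 4.0 = 0.8
var_y = (y**2 * p_y).sum() - mu_y**2        # 0.3 - 0.09 = 0.21
rho = cov / np.sqrt(var_x * var_y)          # about -0.244, inside [-1, 1]

assert np.isclose(cov, -0.1)
assert -1.0 <= rho <= 1.0
```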
A special note

Independent random variables are uncorrelated.

E[X ∙ Y] = r_XY = E[X] ∙ E[Y] = μ_X ∙ μ_Y

ρ = (E[X ∙ Y] − μ_X ∙ μ_Y) / (σ_X ∙ σ_Y)
ρ = (μ_X ∙ μ_Y − μ_X ∙ μ_Y) / (σ_X ∙ σ_Y) = 0

However, uncorrelated random variables are not necessarily independent!
Example 4.3-5 from Stark and Woods: Given P_XY(x_i, y_j)

              x_1 = −1   x_2 = 0   x_3 = 1
y_1 = 0          0         1/3        0
y_2 = 1         1/3         0        1/3

The marginal pmfs:

P_X(x_i) = Σ_j P_XY(x_i, y_j)
P_X(x_1) = P_X(x_2) = P_X(x_3) = 1/3

P_Y(y_j) = Σ_i P_XY(x_i, y_j)
P_Y(y_1) = 1/3, P_Y(y_2) = 2/3

Note, not independent: P_XY(x_i, y_j) ≠ P_X(x_i) ∙ P_Y(y_j)
For example, P_XY(−1, 0) = 0 ≠ (1/3) ∙ (1/3)

The expected values:

E[X] = Σ_i x_i ∙ P_X(x_i) = (−1) ∙ (1/3) + 0 ∙ (1/3) + 1 ∙ (1/3) = 0
E[Y] = Σ_j y_j ∙ P_Y(y_j) = 0 ∙ (1/3) + 1 ∙ (2/3) = 2/3

E[X ∙ Y] = Σ_i Σ_j x_i ∙ y_j ∙ P_XY(x_i, y_j) = (−1)(1)(1/3) + (0)(0)(1/3) + (1)(1)(1/3) = 0

Therefore COV(X,Y) = E[X ∙ Y] − E[X] ∙ E[Y] = 0 − 0 ∙ (2/3) = 0

The covariance and correlation coefficient are zero, but the R.V. are not independent!
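The uncorrelated-but-dependent conclusion can be verified directly. A sketch reproducing the Stark and Woods table in NumPy:

```python
import numpy as np

# Stark & Woods Example 4.3-5: X in {-1, 0, 1}, Y in {0, 1}
x = np.array([-1, 0, 1])
y = np.array([0, 1])
p_xy = np.array([[0.0, 1/3, 0.0],    # y = 0 row
                 [1/3, 0.0, 1/3]])   # y = 1 row

p_x = p_xy.sum(axis=0)
p_y = p_xy.sum(axis=1)

mu_x = (x * p_x).sum()                # E[X] = 0
mu_y = (y * p_y).sum()                # E[Y] = 2/3
e_xy = (np.outer(y, x) * p_xy).sum()  # E[X*Y] = 0

cov = e_xy - mu_x * mu_y
print(np.isclose(cov, 0.0))                    # True: uncorrelated
print(np.allclose(p_xy, np.outer(p_y, p_x)))   # False: not independent
```

This is the classic construction where Y is a deterministic function of X (here Y = X²) yet the covariance still vanishes.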
Example 5.4

Two-dimensional probability example.

[Figure: scatter of the 12 equally likely points (X,Y), with x ∈ {0,1,2,3,4}, y ∈ {0,1,2}, and x + y ≤ 4.]

12 equally likely points in X and Y:

p_XY(k,l) = Pr{X = x_k ∩ Y = y_l} = 1/12 at each point

Determining marginal probabilities:

p_Y(l) = Σ_k p_XY(k,l)
p_Y(l) = 5/12, 4/12, 3/12 for l = 0, 1, 2

E[Y] = Σ_l y_l ∙ p_Y(l)
E[Y] = 0 ∙ 5/12 + 1 ∙ 4/12 + 2 ∙ 3/12 = (4 + 6)/12 = 10/12

E[Y²] = Σ_l y_l² ∙ p_Y(l)
E[Y²] = 0 ∙ 5/12 + 1 ∙ 4/12 + 4 ∙ 3/12 = (4 + 12)/12 = 16/12

σ_Y² = E[Y²] − E[Y]²
σ_Y² = 16/12 − (10/12)² = 192/144 − 100/144 = 92/144 = 23/36
p_X(k) = Σ_l p_XY(k,l)
p_X(k) = 3/12, 3/12, 3/12, 2/12, 1/12 for k = 0, 1, 2, 3, 4

E[X] = Σ_k x_k ∙ p_X(k)
E[X] = 0 ∙ 3/12 + 1 ∙ 3/12 + 2 ∙ 3/12 + 3 ∙ 2/12 + 4 ∙ 1/12 = (0 + 3 + 6 + 6 + 4)/12 = 19/12

E[X²] = Σ_k x_k² ∙ p_X(k)
E[X²] = 0 ∙ 3/12 + 1 ∙ 3/12 + 4 ∙ 3/12 + 9 ∙ 2/12 + 16 ∙ 1/12 = (0 + 3 + 12 + 18 + 16)/12 = 49/12

σ_X² = E[X²] − E[X]²
σ_X² = 49/12 − (19/12)² = 588/144 − 361/144 = 227/144

Computing a CDF

Determine the bounds of interest:

CDF(2.5, 1.3) = Σ_{x_k ≤ 2.5} Σ_{y_l ≤ 1.3} p_XY(k,l) = 6/12
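The Example 5.4 results can be checked by enumerating the points. A sketch, assuming the point layout x ∈ {0..4}, y ∈ {0..2} with x + y ≤ 4 (which reproduces the marginals above):

```python
import numpy as np

# 12 equally likely points: (x, y) with x in 0..4, y in 0..2, x + y <= 4
pts = [(x, y) for x in range(5) for y in range(3) if x + y <= 4]
assert len(pts) == 12

xs = np.array([p[0] for p in pts], dtype=float)
ys = np.array([p[1] for p in pts], dtype=float)

mu_y = ys.mean()                      # E[Y] = 10/12
var_y = (ys**2).mean() - mu_y**2      # 23/36
mu_x = xs.mean()                      # E[X] = 19/12
var_x = (xs**2).mean() - mu_x**2      # 227/144

# CDF(2.5, 1.3): count points with x <= 2.5 and y <= 1.3
cdf = sum(1 for (x, y) in pts if x <= 2.5 and y <= 1.3) / 12
assert np.isclose(cdf, 6 / 12)
```

Because the points are equally likely, every expectation reduces to a plain average over the point list.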
HW Problem 5.5: Continue the example in Section 5.4 and consider the joint transformation.

[Figure: the same 12 equally likely points in (X,Y), with x ∈ {0,1,2,3,4} and y ∈ {0,1,2}.]

Letting U = min(X,Y) and W = max(X,Y):

a.) What are the level curves (draw picture)?
b.) What are the individual PMFs of U and W?
c.) What is the joint PMF of U and W?
5.5 Sums of Independent Random Variables

Letting Z = X + Y

The mean value:

μ_Z = E[Z] = E[X + Y] = E[X] + E[Y] = μ_X + μ_Y

The variance:

σ_Z² = E[(Z − μ_Z)²] = E[((X − μ_X) + (Y − μ_Y))²]
σ_Z² = E[(X − μ_X)²] + 2 ∙ E[(X − μ_X) ∙ (Y − μ_Y)] + E[(Y − μ_Y)²]
σ_Z² = σ_X² + 2 ∙ Cov(X,Y) + σ_Y²

For independent X and Y, Cov(X,Y) = 0:

σ_Z² = σ_X² + 2 ∙ 0 + σ_Y²
σ_Z² = σ_X² + σ_Y²

Letting S = X₁ + X₂ + ⋯ + X_n

The mean value is the sum of the means for all R.V.!

μ_S = E[S] = E[X₁ + X₂ + ⋯ + X_n] = E[X₁] + E[X₂] + ⋯ + E[X_n] = μ₁ + μ₂ + ⋯ + μ_n

The variance is the sum of the variances for independent R.V.!

Letting S = X₁ + X₂ + ⋯ + X_n = Z + X_n, where Z = X₁ + ⋯ + X_{n−1}:

σ_S² = σ_Z² + σ_{X_n}²
σ_S² = σ₁² + σ₂² + ⋯ + σ_n²

Note that in this case it is easier to form the variance than the 2nd moment!
Joint pmf of Independent Random Variables – Convolution!

Letting Z = X + Y

The pmf of Z can be defined as:

p_Z(n) = Pr{X + Y = n}

From total probability:

p_Z(n) = Σ_l Pr{X + Y = n | Y = l} ∙ Pr{Y = l}

But this is equivalent to:

p_Z(n) = Σ_l Pr{X = n − l | Y = l} ∙ Pr{Y = l}

With independence, a joint probability is the product of probabilities; therefore,

p_Z(n) = Σ_l Pr{X = n − l} ∙ Pr{Y = l}

Resulting in:

p_Z(n) = Σ_l p_X(n − l) ∙ p_Y(l)

This is the discrete convolution of the two independent R.V. pmf functions!

p_Z = p_X ∗ p_Y

https://en.wikipedia.org/wiki/Convolution – Math that simplifies our computations!

For multiple independent R.V. sums, keep convolving!

p_S = p_X1 ∗ p_X2 ∗ p_X3 ∗ ⋯

Revisit flipping coins … as a discrete pmf convolution.

Revisit two fair die … as a discrete pmf convolution.
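The two-dice case is a one-liner with a numerical convolution. A sketch in Python (NumPy's convolve plays the role of MATLAB's conv used below):

```python
import numpy as np

# pmf of one fair die: faces 1..6, each with probability 1/6
die = np.ones(6) / 6

# pmf of the sum of two independent dice: discrete convolution
two_dice = np.convolve(die, die)   # supported on sums 2..12, length 11

# The familiar triangular pmf: counts out of 36 are 1,2,3,4,5,6,5,4,3,2,1
assert np.allclose(two_dice * 36, [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])
assert np.isclose(two_dice.sum(), 1.0)
```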
Discrete Convolutions

The sum of two fair die:

pmf_X(k) = pmf_Y(k) = 1/6, for k = 1, 2, 3, 4, 5, 6

Z = X + Y

A discrete convolution is in the form:

p_Z(z) = Σ_k p_X(z − k) ∙ p_Y(k), for 2 ≤ z ≤ 12, with 1 ≤ k ≤ 6 and 1 ≤ z − k ≤ 6

Matlab convolution:

    pmf1 = [1 1 1 1 1 1]
    pmf1 =
         1     1     1     1     1     1
    >> conv(pmf1, pmf1)
    ans =
         1     2     3     4     5     6     5     4     3     2     1

or

    pmf1 = [1 1 1 1 1 1]/6
    pmf1 =
        0.1667    0.1667    0.1667    0.1667    0.1667    0.1667
    >> conv(pmf1, pmf1)
    ans =
        0.0278    0.0556    0.0833    0.1111    0.1389    0.1667    0.1389    0.1111    0.0833    0.0556    0.0278

Matlab can "multiply" polynomials when correctly constructed using the conv function!
Textbook Example: Comment 5.5
The manual means of convolution shown (p. 115) is very handy for “long-hand” computations.
Matlab
>> X=[1 2 3 -4];
>> Y = [3 4 5];
>> conv(X,Y)
ans = 3 10 22 10 -1 -20
Binomial expansion, repeated convolution by [1 1] …
b2= [1 1];
>> b3 = conv(b2,b2)
b3 = 1 2 1
>> b4 = conv(b3,b2)
b4 = 1 3 3 1
>> b5 = conv(b4,b2)
b5 = 1 4 6 4 1
Textbook Moment Generating Function of two ind. R.V.

From Laplace … convolution in the time domain is multiplication in the Laplace domain.

Z = X + Y

p_Z(z) = Σ_k p_X(z − k) ∙ p_Y(k)

M_Z(s) = E[exp(s ∙ Z)]
M_Z(s) = E[exp(s ∙ (X + Y))]
M_Z(s) = E[exp(s ∙ X) ∙ exp(s ∙ Y)]
M_Z(s) = E[exp(s ∙ X)] ∙ E[exp(s ∙ Y)] (by independence)
M_Z(s) = M_X(s) ∙ M_Y(s)

For the sum of independent R.V., the MGF is the product of the MGFs!

General comment on Laplace Transforms:
A convolution in one domain is multiplication in the other.
Convolve in time/sample – multiply in Laplace.
Multiply in time/sample – convolve in Laplace.
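The product property can be checked numerically against the convolution result for the two dice. A sketch (the value s = 0.3 is an arbitrary choice; any real s works here since the pmfs have finite support):

```python
import numpy as np

s = 0.3  # arbitrary evaluation point for the MGFs

faces = np.arange(1, 7)
die = np.ones(6) / 6

# MGF of one die evaluated at s: M_X(s) = sum p(k) exp(s*k)
mgf_x = (die * np.exp(s * faces)).sum()

# MGF of the sum of two independent dice, from the convolved pmf
two_dice = np.convolve(die, die)   # supported on sums 2..12
sums = np.arange(2, 13)
mgf_z = (two_dice * np.exp(s * sums)).sum()

print(np.isclose(mgf_z, mgf_x * mgf_x))  # True: M_Z(s) = M_X(s) * M_Y(s)
```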
5.6 Sample Probabilities, Mean, and Variance (The beginning of the relationship between statistics and probability!)
Statistics Definition: The science of assembling, classifying, tabulating, and analyzing data or facts:

Descriptive statistics – collecting, grouping, and presenting data in a way that can be easily understood or assimilated.

Inductive statistics or statistical inference – using data to draw conclusions about, or estimate parameters of, the environment from which the data came.
Theoretical Areas:
Sampling Theory – selecting samples from a collection of data that is too large to be examined completely.
Estimation Theory – concerned with making estimates or predictions based on the data that are available.
Hypothesis Testing – attempts to decide which of two or more hypotheses about the data are true.
Curve fitting and regression – attempt to find mathematical expressions that best represent the data. (Shown in Chap. 4)
Analysis of Variance – attempt to assess the significance of variations in the data and the relation of these variances to the physical situations from which the data arose. (Modern term ANOVA)
We will focus on parameter estimation of the mean and variance to begin!
Sampling Theory – The Sample Mean

How many samples are required to find a representative sample set that provides confidence in the results?

Defect testing, opinion polls, infection rates, etc.

Definitions

Population: the collection of data being studied. N is the size of the population.

Sample: a random sample is the part of the population selected. All members of the population must be equally likely to be selected! n is the size of the sample.

Sample Mean: the average of the numerical values that make up the sample.

Population: N
Sample set: S ∈ {x₁, x₂, x₃, ⋯, x_n}

Sample Mean: x̄ = (1/n) ∙ Σ_{i=1}^{n} x_i

To generalize, describe the statistical properties of arbitrary random samples rather than those of any particular sample.

Sample Mean: X̄ = (1/n) ∙ Σ_{i=1}^{n} X_i,

where the X_i are random variables with a pdf.

Notice that for a pdf the true mean, μ_X, can be computed, while for a sample data set the sample mean estimate, X̄ (also written μ̂_X), is computed.
As may be noted, the sample mean is a combination of random variables and, therefore, can also be considered a random variable. As a result, the hoped-for result can be derived as:

E[X̄] = μ_X̄ = E[(1/n) ∙ Σ_i X_i] = (1/n) ∙ Σ_i E[X_i] = (1/n) ∙ Σ_i μ_X = (n/n) ∙ μ_X = μ_X

If and when this is true, the estimate is said to be an unbiased estimate.

Though the sample mean may be unbiased, the sample mean may still not provide a good estimate.

What is the "variance" of the computation of the sample mean?
Variance of the sample mean – (the mean itself, not the value of X)

You would expect the sample mean to have some variance about the "probabilistic" or actual mean; therefore, it is also desirable to know something about the fluctuations around the mean. As a result, computation of the variance of the sample mean is desired.

For N >> n or N → infinity (or even a known pdf), using the collected samples … based on the prior definition of variance, a statistical estimate of the 2nd moment and the square of the mean:

Var(X̄) = E[X̄²] − E[X̄]² = E[((1/n) ∙ Σ_i X_i)²] − μ_X²

Var(X̄) = E[(1/n²) ∙ Σ_i Σ_j X_i ∙ X_j] − μ_X²

Var(X̄) = (1/n²) ∙ Σ_i Σ_j E[X_i ∙ X_j] − μ_X²

For X_i independent (measurements should be independent of each other):

E[X_i ∙ X_j] = E[X_i] ∙ E[X_j] = μ_X², for i ≠ j
E[X_i ∙ X_j] = E[X_i²] = σ_X² + μ_X², for i = j
As a result, we can split the double summation into the n terms where i = j and the n(n−1) terms where i ≠ j:

Var(X̄) = (1/n²) ∙ [Σ_i E[X_i²] + Σ_i Σ_{j≠i} E[X_i ∙ X_j]] − μ_X²

Var(X̄) = (1/n²) ∙ [n ∙ (σ_X² + μ_X²) + n ∙ (n−1) ∙ μ_X²] − μ_X²

Var(X̄) = σ_X²/n + μ_X²/n + ((n−1)/n) ∙ μ_X² − μ_X²

Var(X̄) = σ_X²/n

where σ_X² is the true variance (probabilistic) of the random variable X.

Therefore, as n approaches infinity, this variance in the sample mean estimate goes to zero!

It is referred to as a "consistent" estimate. Thus a larger sample size leads to a better estimate of the population mean.

Note: this variance is developed based on "sampling with replacement".
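The σ²/n behavior shows up directly in simulation. A sketch (the normal distribution, seed, and trial counts are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 9.0      # true variance of X
n = 100           # sample size
trials = 20000    # number of independent sample means to generate

# Each row is one sample of size n; each row mean is one draw of X-bar
samples = rng.normal(loc=10.0, scale=np.sqrt(sigma2), size=(trials, n))
sample_means = samples.mean(axis=1)

# Empirical variance of the sample mean vs. the theoretical sigma^2 / n
print(sample_means.var(), sigma2 / n)  # both close to 0.09
```

Doubling n halves the variance of the sample mean, which is the consistency property stated above.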
Example: How many samples of an infinitely long time waveform would be required to insure the mean is within 1% of the true (probabilistic) mean value? For this relationship, we would require that

Var(X̄) ≤ (0.01 ⋅ μ_X)²

Infinite set, therefore assume that you use the "with replacement" equation:

Var(X̄) = σ_X²/n

Assume that the true mean is 10 and that the true variance is 9, so that the mean +/− a standard deviation would be 10 ± 3. Then,

Var(X̄) = 9/n ≤ (0.01 ∙ 10)² = (0.1)² = 0.01

n ≥ 9/0.01 = 900

A very large sample set size to "estimate" the mean within the 1% desired bound!
Sampling Theory – The Sample Variance

When dealing with probability, the mean and variance provide valuable information about the "DC" and "AC" operating conditions: the mean describes about what value is expected, and the variance (in terms of power or squared value) describes the fluctuations about the operating point.

Therefore, we are also interested in the sample variance as compared to the true data variance.

The sample variance of the population (stdevp) is defined as:

S² = (1/n) ∙ Σ_{i=1}^{n} (X_i − X̄)²

and continuing until (shown in the coming pages)

E[S²] = ((n−1)/n) ∙ σ²

where σ² is the true probabilistic variance of the random variable.

Note: the sample variance is not equal to the true variance; it is a biased estimate!

To create an unbiased estimator, scale by the biasing factor to compute (stdev):

S̃² = (n/(n−1)) ∙ S² = (1/(n−1)) ∙ Σ_{i=1}^{n} (X_i − X̄)²

E[S̃²] = (n/(n−1)) ∙ E[S²] = (n/(n−1)) ∙ ((n−1)/n) ∙ σ² = σ²

This is equation 5.12 in the textbook!
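The (n−1)/n bias is easy to see by simulation. A sketch (distribution, seed, and sizes are illustrative; ddof is NumPy's divisor flag, analogous to MATLAB's std flag discussed next):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2 = 4.0   # true variance
n = 5          # deliberately small sample size to make the bias visible
trials = 200000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

s2_biased = samples.var(axis=1, ddof=0)    # divide by n      (stdevp-style)
s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1  (stdev-style)

print(s2_biased.mean())    # close to (n-1)/n * sigma2 = 3.2
print(s2_unbiased.mean())  # close to sigma2 = 4.0
```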
Additional notes: MATLAB and MS Excel

Simulation and statistical software packages allow for either biased or unbiased computations.

In MS Excel there are two distinct functions, stdev and stdevp.

stdev uses (n−1) – http://office.microsoft.com/en-us/excel-help/stdev-function-HP010335660.aspx
stdevp uses (n) – https://support.office.com/en-US/article/STDEVP-function-1F7C1C88-1BEC-4422-8242-E9F7DC8BB195

In MATLAB, there is an additional flag associated with the std function:

std(X) = sqrt( (1/(n−1)) ∙ Σ_{j=1}^{n} (X_j − X̄)² ), flag implied as 0 (normalize by n−1)
std(X,1) = sqrt( (1/n) ∙ Σ_{j=1}^{n} (X_j − X̄)² ), flag specified as 1 (normalize by n)

>> help std
std Standard deviation.
For vectors, Y = std(X) returns the standard deviation. For matrices, Y is a row vector containing the standard deviation of each column. For N-D arrays, std operates along the first non-singleton dimension of X.
std normalizes Y by (N-1), where N is the sample size. This is the sqrt of an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples.
Y = std(X,1) normalizes by N and produces the square root of the second moment of the sample about its mean.
std(X,0) is the same as std(X).

By default, the tools you use compute the unbiased variance and the standard deviation derived from it! Did you know this before?!
Sampling Theory – The Sample Variance – Proof

The sample variance of the population is defined as:

S² = (1/n) ∙ Σ_{i=1}^{n} (X_i − X̄)², with X̄ = (1/n) ∙ Σ_{j=1}^{n} X_j

Expanding the square and collecting terms:

S² = (1/n) ∙ Σ_i X_i² − 2 ∙ X̄ ∙ (1/n) ∙ Σ_i X_i + X̄²
S² = (1/n) ∙ Σ_i X_i² − 2 ∙ X̄² + X̄²
S² = (1/n) ∙ Σ_i X_i² − X̄²

Determining the expected value:

E[S²] = (1/n) ∙ Σ_i E[X_i²] − E[X̄²]

The first term is the 2nd moment of X:

(1/n) ∙ Σ_i E[X_i²] = E[X²] = σ² + μ²

For the second term, expand X̄² as a double summation. For independent samples, E[X_j ∙ X_k] = μ² when j ≠ k (the n∙(n−1) cross terms) and E[X_j²] = σ² + μ² when j = k (the n diagonal terms):

E[X̄²] = (1/n²) ∙ Σ_j Σ_k E[X_j ∙ X_k]
E[X̄²] = (1/n²) ∙ [n ∙ (σ² + μ²) + n ∙ (n−1) ∙ μ²]
E[X̄²] = σ²/n + μ²

Combining the two terms:

E[S²] = (σ² + μ²) − (σ²/n + μ²) = (1 − 1/n) ∙ σ²

Therefore,

E[S²] = ((n−1)/n) ∙ σ²

To create an unbiased estimator, scale by an (un-)biasing factor to compute:

E[S̃²] = (n/(n−1)) ∙ E[S²] = σ²
Statistical Mean and Variance Summary

For taking samples and estimating the mean and variance …

Mean estimate:
  X̄ = μ̂_X = (1/n) ∙ Σ_{i=1}^{n} X_i
  An unbiased estimate: E[X̄] = E[X] = μ_X
  Variance of the estimate: Var(X̄) = σ_X²/n

Variance estimate (biased):
  S² = (1/n) ∙ Σ_{i=1}^{n} (X_i − X̄)²
  A biased estimate: E[S²] = ((n−1)/n) ∙ σ_X²

Variance estimate (unbiased):
  S̃² = (n/(n−1)) ∙ S² = (1/(n−1)) ∙ Σ_{i=1}^{n} (X_i − X̄)²
  An unbiased estimate: E[S̃²] = E[S̃²] = σ_X²
  Variance of the estimate (approximately, for large n): Var(S̃²) ≈ (μ₄ − σ_X⁴)/n, where μ₄ = E[(X − μ_X)⁴]
5.7 Histograms

Histogramming can be used to estimate the values of a pmf! However, a significant number of trials may have to be run before the correct pmf can be observed.
Remember the MATLAB simulation of the marble selection in homework #1?!
Sec1_Marble1.m
Sec1_Marble2.m
Sec1_Marble3.m
See Uniform_hist.m
See Binomial_hist.m
Concepts to validate probability … ground truth, traffic studies, trend analysis
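A minimal sketch of the histogram-to-pmf idea (the loaded 4-outcome "die" and trial count are illustrative choices, not from the course scripts):

```python
import numpy as np

rng = np.random.default_rng(2)

# True pmf of a loaded "die" on outcomes 0..3
pmf = np.array([0.1, 0.2, 0.3, 0.4])

trials = 100000
samples = rng.choice(4, size=trials, p=pmf)

# Histogram of outcome counts, normalized to estimate the pmf
counts = np.bincount(samples, minlength=4)
pmf_hat = counts / trials
print(pmf_hat)  # approaches [0.1 0.2 0.3 0.4] as trials grows
```

With few trials the estimate is noisy; the per-bin error shrinks on the order of 1/sqrt(trials), which is why "a significant number of trials" is needed.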
5.8 Entropy and Data Compression
See https://en.wikipedia.org/wiki/Information_theory
https://en.wikipedia.org/wiki/Entropy_(information_theory)
The basis for information theory, and of particular benefit to data compression, is the concept of entropy.

When evaluating information, a measure of the information content (randomness) involves the probability of occurrence of the various "letters" in the alphabet and the number of bits actually needed to represent the alphabet.

For the English alphabet, there are m = 26 letters. For normal language, each letter has a probability of occurrence.

The measure of the entropy of each potential symbol is:

H(X) = E[−log p(x)] = −Σ_k p(k) ∙ log p(k)

Typically the log base 2 is used, and the entropy is then measured in bits.

If we assume 26 equally likely letters in an alphabet …

H(X) = E[−log₂ p(x)] = −Σ_{k=1}^{26} (1/26) ∙ log₂(1/26)
H(X) = (26/26) ∙ log₂(26) ≈ 4.7004 bits
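The 4.7004-bit figure is just log₂(26). A minimal entropy function in Python:

```python
import numpy as np

# Entropy in bits: H(X) = -sum p(k) * log2 p(k)
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]               # the 0 * log2(0) terms are taken as 0
    return -(p * np.log2(p)).sum()

# 26 equally likely letters
uniform = np.ones(26) / 26
print(round(entropy(uniform), 4))  # 4.7004
```

Any non-uniform letter distribution gives a strictly smaller entropy, which is why real English needs fewer bits per letter than 4.7.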
But in reality, the letters have nowhere near equal probability! Shannon estimated the entropy of printed English at roughly 2.62 bits per letter when word statistics are taken into account:

https://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf
Overall, this is a specific application and discussion related to encoding that is quite involved and very important …. but somewhat unique to an area of interest. Therefore, read it at your leisure ….
Shannon’s Papers on “A mathematical theory of communication”
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6773024&isnumber=6773023
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6773067&isnumber=6773065
Homework Problem 5.5:
Continue the example in Section 5.4 and consider the joint transformation, U = min(X ,Y) (e.g., min(3,2) = 2), and W = max(X ,Y ). For each transformation,
a) What are the level curves (draw pictures)?
b) What are the individual PMFs of U and W?
c) What is the joint PMF of U and W?
Below are the level curves and PMFs for W = max(X ,Y ) and U = min(X ,Y ):
Homework Problem 5.30:
Prove the Cauchy-Schwarz inequality:

(Σ_i x_i ∙ y_i)² ≤ (Σ_i x_i²) ∙ (Σ_i y_i²)

where the x's and y's are arbitrary numbers.

Hint: Start with the following inequality (why is this true?):

0 ≤ Σ_i (x_i − a ∙ y_i)²

Find the value of a that minimizes the right hand side above, substitute that value into the same inequality, and rearrange the terms into the Cauchy-Schwarz inequality at the top.
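Following the hint, one way the derivation can go (a sketch; it assumes Σ_i y_i² > 0, since the inequality is immediate when every y_i = 0):

```latex
0 \le \sum_i (x_i - a y_i)^2
    = \sum_i x_i^2 - 2a \sum_i x_i y_i + a^2 \sum_i y_i^2
% Minimizing over a (set the derivative in a to zero):
a^* = \frac{\sum_i x_i y_i}{\sum_i y_i^2}
% Substituting a^* back in:
0 \le \sum_i x_i^2 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i y_i^2}
% Rearranging:
\left(\sum_i x_i y_i\right)^2 \le \left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right)
```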
(Σ_i x_i ∙ y_i)² ≤ (Σ_i x_i²) ∙ (Σ_i y_i²)

or

0 ≤ (Σ_i x_i²) ∙ (Σ_i y_i²) − (Σ_i x_i ∙ y_i)²

You may have heard the phrase, "The square of the sum of the products is less than or equal to the product of the sums of the squares!"