Lecture 5: Asymptotic Equipartition Property
Law of large numbers for products of random variables; AEP and its consequences
Dr. Yao Xie, ECE587, Information Theory, Duke University
Stock market
Initial investment $Y_0$, daily return ratio $r_i$; on day $t$, your money is
$$Y_t = Y_0 r_1 \cdots r_t.$$
Now if the return ratios $r_i$ are i.i.d., with
$$r_i = \begin{cases} 4, & \text{w.p. } 1/2 \\ 0, & \text{w.p. } 1/2, \end{cases}$$
you might think the expected return ratio is $E r_i = 2$, and then
$$E Y_t = E(Y_0 r_1 \cdots r_t) = Y_0 (E r_i)^t = Y_0 2^t?$$
Is "optimized" really optimal?

With $Y_0 = 1$, the actual return $Y_t$ goes like
$$1,\ 4,\ 16,\ 0,\ 0,\ 0, \ldots$$
So optimizing the expected return is not optimal?

Fundamental reason: products do not behave the same way as sums.
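A quick Monte Carlo sketch of this paradox (parameter values from the slide; the trial count and seed are arbitrary choices for illustration): the average over many paths tracks $E Y_t = 2^t$, yet almost every individual path is zero.

```python
import random

random.seed(0)

# Stock example from the slide: Y_0 = 1 and i.i.d. daily returns
# r_i = 4 w.p. 1/2, r_i = 0 w.p. 1/2.
def final_wealth(t: int) -> float:
    y = 1.0
    for _ in range(t):
        y *= 4.0 if random.random() < 0.5 else 0.0
    return y

t, trials = 10, 100_000
paths = [final_wealth(t) for _ in range(trials)]

# The average over paths estimates E[Y_t] = 2^t = 1024 ...
avg = sum(paths) / trials
# ... yet almost every path is wiped out: P{Y_t = 0} = 1 - 2^{-t} ~ 0.999.
frac_zero = sum(1 for y in paths if y == 0.0) / trials
print(avg, frac_zero)
```

The expected value is driven entirely by the vanishingly rare all-wins paths.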
(Weak) law of large numbers

Theorem. For independent, identically distributed (i.i.d.) random variables $X_i$,
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to EX, \quad \text{in probability}.$$
Convergence in probability: $X_n \to X$ in probability if for every $\epsilon > 0$,
$$P\{|X_n - X| > \epsilon\} \to 0.$$
Proof by Markov inequality.

So this means
$$P\{|\bar{X}_n - EX| \le \epsilon\} \to 1, \quad n \to \infty.$$
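A minimal simulation of the weak LLN (the Uniform(0,1) distribution is my choice for illustration, not from the slides): sample means concentrate around $EX = 0.5$ as $n$ grows.

```python
import random

random.seed(1)

# Sample mean of n i.i.d. Uniform(0,1) draws; EX = 0.5
def sample_mean(n: int) -> float:
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```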
Other types of convergence

In mean square: if, as $n \to \infty$,
$$E(X_n - X)^2 \to 0.$$
With probability 1 (almost surely): if
$$P\left\{\lim_{n\to\infty} X_n = X\right\} = 1.$$
In distribution: if
$$\lim_{n\to\infty} F_n = F,$$
where $F_n$ and $F$ are the cumulative distribution functions of $X_n$ and $X$.
Product of random variables

How does this behave?
$$\sqrt[n]{\prod_{i=1}^{n} X_i}$$
Geometric mean $\sqrt[n]{\prod_{i=1}^{n} X_i}$ vs. arithmetic mean $\frac{1}{n}\sum_{i=1}^{n} X_i$.

Examples:
- Volume $V$ of a random box with dimensions $X_i$: $V = X_1 \cdots X_n$
- Stock return: $Y_t = Y_0 r_1 \cdots r_t$
- Joint distribution of i.i.d. RVs: $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$
Law of large numbers for products of random variables

We can write
$$X_i = e^{\log X_i}.$$
Hence
$$\sqrt[n]{\prod_{i=1}^{n} X_i} = e^{\frac{1}{n}\sum_{i=1}^{n} \log X_i}.$$
So from the LLN,
$$\sqrt[n]{\prod_{i=1}^{n} X_i} \to e^{E(\log X)} \le e^{\log EX} = EX,$$
where the inequality is Jensen's: $E \log X \le \log EX$.
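A numerical check of this limit (the uniform distribution on $\{1,2,3\}$ is my choice for illustration): here $EX = 2$ while $e^{E\log X} = (1\cdot 2\cdot 3)^{1/3} = 6^{1/3} \approx 1.817$, and the $n$-th root of the product settles on the smaller value.

```python
import math
import random

random.seed(2)

# X uniform on {1, 2, 3}: EX = 2, e^{E log X} = 6^{1/3} ~ 1.817
n = 200_000
xs = [random.choice([1, 2, 3]) for _ in range(n)]

# n-th root of the product, computed stably through logs
geometric = math.exp(sum(math.log(x) for x in xs) / n)
arithmetic = sum(xs) / n
print(geometric, arithmetic)
```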
Stock example:
$$E \log r_i = \frac{1}{2}\log 4 + \frac{1}{2}\log 0 = -\infty.$$
By the LLN for products,
$$\sqrt[t]{Y_t/Y_0} \to e^{E \log r_i} = 0, \quad t \to \infty,$$
so $Y_t \to 0$: although $EY_t = Y_0 2^t$ grows, almost every sample path is eventually wiped out by a single zero return.

Example:
$$X = \begin{cases} a, & \text{w.p. } 1/2 \\ b, & \text{w.p. } 1/2. \end{cases}$$
Then $EX = (a+b)/2$, while $e^{E \log X} = \sqrt{ab} \le (a+b)/2$: the geometric mean never exceeds the arithmetic mean.
Asymptotic equipartition property (AEP)

The LLN states that
$$\frac{1}{n}\sum_{i=1}^{n} X_i \to EX.$$
The AEP states that
$$\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X),$$
so most sequences satisfy
$$p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}.$$
Analyze using the LLN for products of random variables.
The AEP lies at the heart of information theory:
- proof of lossless source coding
- proof of channel capacity
- and more...
AEP

Theorem. If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X), \quad \text{in probability}.$$
Proof:
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_{i=1}^{n} \log p(X_i) \to -E \log p(X) = H(X).$$
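A quick numerical check of the theorem (a Bernoulli(0.8) source and $n = 10{,}000$ are my choices for illustration): $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ for one long random sequence lands close to $H(X) \approx 0.722$ bits.

```python
import math
import random

random.seed(3)

# Entropy of a Bernoulli(p) source, in bits
p = 0.8
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # ~ 0.722

# Draw one long i.i.d. sequence; its probability depends only on the
# number of ones k, so -(1/n) log2 p(X_1,...,X_n) has a closed form.
n = 10_000
k = sum(1 for _ in range(n) if random.random() < p)
rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
print(rate, H)
```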
There are several consequences.
Typical set

The typical set $A_\epsilon^{(n)}$ contains all sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$
Not all sequences are created equal

Coin tossing example: $X \in \{0, 1\}$, $p(1) = 0.8$.
$$p(1, 0, 1, 1, 0, 1) = p^{\sum x_i}(1-p)^{6-\sum x_i} = p^4 (1-p)^2 = 0.0164$$
$$p(0, 0, 0, 0, 0, 0) = p^{\sum x_i}(1-p)^{6-\sum x_i} = (1-p)^6 = 0.000064$$
In this example, if $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$,
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, \ldots, x_n) \le H(X) + \epsilon.$$
This means a binary sequence is in the typical set if its frequency of heads $k/n$ is approximately $p$.
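The two probabilities above can be recomputed in a few lines (sequence values from the slide):

```python
# Coin tossing example from the slide: p(1) = 0.8, n = 6
p = 0.8
seq_a = (1, 0, 1, 1, 0, 1)   # four heads, two tails
seq_b = (0, 0, 0, 0, 0, 0)   # all tails

def seq_prob(seq):
    # p^k (1-p)^(n-k) where k is the number of ones
    k = sum(seq)
    return p ** k * (1 - p) ** (len(seq) - k)

print(seq_prob(seq_a))   # p^4 (1-p)^2 = 0.016384
print(seq_prob(seq_b))   # (1-p)^6 = 0.000064
```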
$p = 0.6$, $n = 25$, $k$ = number of "1"s

 k   (n k)     (n k) p^k (1-p)^{n-k}   -(1/n) log p(x^n)
 0   1         0.000000                1.321928
 1   25        0.000000                1.298530
 2   300       0.000000                1.275131
 3   2300      0.000001                1.251733
 4   12650     0.000007                1.228334
 5   53130     0.000054                1.204936
 6   177100    0.000227                1.181537
 7   480700    0.001205                1.158139
 8   1081575   0.003121                1.134740
 9   2042975   0.013169                1.111342
10   3268760   0.021222                1.087943
11   4457400   0.077801                1.064545
12   5200300   0.075967                1.041146
13   5200300   0.267718                1.017748
14   4457400   0.146507                0.994349
15   3268760   0.575383                0.970951
16   2042975   0.151086                0.947552
17   1081575   0.846448                0.924154
18   480700    0.079986                0.900755
19   177100    0.970638                0.877357
20   53130     0.019891                0.853958
21   12650     0.997633                0.830560
22   2300      0.001937                0.807161
23   300       0.999950                0.783763
24   25        0.000047                0.760364
25   1         0.000003                0.736966
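The last column of the table can be recomputed, and the typical values of $k$ read off; note that $-\frac{1}{n}\log_2 p(x^n)$ hits $H(0.6) \approx 0.970951$ exactly at $k = 15 = np$ (the choice $\epsilon = 0.1$ below is mine, for illustration).

```python
import math

# Table parameters from the slide
p, n = 0.6, 25
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # ~ 0.970951

def rate(k: int) -> float:
    # -(1/n) log2 p(x^n) for a length-n sequence with k ones
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

eps = 0.1
typical_k = [k for k in range(n + 1) if abs(rate(k) - H) <= eps]
print(typical_k)   # with eps = 0.1: k = 11 through 19
```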
Consequences of AEP

Theorem.
1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then for $n$ sufficiently large:
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
2. $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$.
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
4. $|A_\epsilon^{(n)}| \ge (1-\epsilon) 2^{n(H(X)-\epsilon)}$.
Property 1

If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
Proof from the definition: $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$ if
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)};$$
take $-\frac{1}{n}\log_2(\cdot)$ of each term.

The number of bits used to describe sequences in the typical set is approximately $nH(X)$.
Property 2

$P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large.

Proof: From the AEP, because
$$-\frac{1}{n}\log p(X_1, \ldots, X_n) \to H(X)$$
in probability, for a given $\epsilon > 0$, when $n$ is sufficiently large,
$$P\Big\{\underbrace{\Big|-\frac{1}{n}\log p(X_1, \ldots, X_n) - H(X)\Big| \le \epsilon}_{(X_1,\ldots,X_n)\,\in\,A_\epsilon^{(n)}}\Big\} > 1 - \epsilon.$$

- High probability: sequences in the typical set are "most typical".
- These sequences almost all have the same probability: "equipartition".
Property 3 and 4: size of the typical set

$$(1-\epsilon) 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$$
Proof:
$$1 = \sum_{(x_1,\ldots,x_n)} p(x_1, \ldots, x_n) \ge \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \ge \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)+\epsilon)},$$
hence $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
On the other hand, $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large, so
$$1 - \epsilon < P\{A_\epsilon^{(n)}\} \le \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)},$$
hence $|A_\epsilon^{(n)}| \ge (1-\epsilon) 2^{n(H(X)-\epsilon)}$.
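For a source small enough to enumerate, the upper bound in Property 3 can be checked exhaustively (the Bernoulli(0.6) source with $n = 10$ and $\epsilon = 0.1$ are my choices for illustration; note that at this small $n$ the typical set need not yet capture probability $1 - \epsilon$):

```python
import math
from itertools import product

# Small Bernoulli source, enumerable by brute force
p, n, eps = 0.6, 10, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

size = 0      # |A_eps^(n)|
prob = 0.0    # P{A_eps^(n)}
for xs in product((0, 1), repeat=n):
    k = sum(xs)
    p_seq = p ** k * (1 - p) ** (n - k)
    if abs(-math.log2(p_seq) / n - H) <= eps:
        size += 1
        prob += p_seq
print(size, 2 ** (n * (H + eps)), prob)
```

Here the typical set consists of the sequences with 5, 6, or 7 ones, and its size (582) sits well below the bound $2^{n(H+\epsilon)} \approx 1674$.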