Lecture 5: Asymptotic Equipartition Property
Law of large numbers for products of random variables; AEP and its consequences
Dr. Yao Xie, ECE587, Information Theory, Duke University
Stock market
Initial investment $Y_0$, daily return ratio $r_i$; on day $t$, your money is
$$Y_t = Y_0 r_1 \cdots r_t.$$
Now if the return ratios $r_i$ are i.i.d., with
$$r_i = \begin{cases} 4, & \text{w.p. } 1/2 \\ 0, & \text{w.p. } 1/2, \end{cases}$$
you might think the expected return ratio is $E r_i = 2$, and then
$$E Y_t = E(Y_0 r_1 \cdots r_t) = Y_0 (E r_i)^t = Y_0 2^t?$$
Is "optimized" really optimal?

With $Y_0 = 1$, the actual return $Y_t$ goes like
$$1,\ 4,\ 16,\ 0,\ 0,\ 0, \ldots$$
So optimizing the expected return is not optimal?

Fundamental reason: products do not behave the same way as sums.
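A quick Monte Carlo sketch of this paradox (parameter values from the slide; the trial count and seed are arbitrary choices for illustration): the average over many paths tracks $E Y_t = 2^t$, yet almost every individual path is zero.

```python
import random

random.seed(0)

# Stock example from the slide: Y_0 = 1 and i.i.d. daily returns
# r_i = 4 w.p. 1/2, r_i = 0 w.p. 1/2.
def final_wealth(t: int) -> float:
    y = 1.0
    for _ in range(t):
        y *= 4.0 if random.random() < 0.5 else 0.0
    return y

t, trials = 10, 100_000
paths = [final_wealth(t) for _ in range(trials)]

# The average over paths estimates E[Y_t] = 2^t = 1024 ...
avg = sum(paths) / trials
# ... yet almost every path is wiped out: P{Y_t = 0} = 1 - 2^{-t} ~ 0.999.
frac_zero = sum(1 for y in paths if y == 0.0) / trials
print(avg, frac_zero)
```

The expected value is driven entirely by the vanishingly rare all-wins paths.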
(Weak) law of large numbers

Theorem. For independent, identically distributed (i.i.d.) random variables $X_i$,
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to EX, \quad \text{in probability}.$$
Convergence in probability: $X_n \to X$ in probability if for every $\epsilon > 0$,
$$P\{|X_n - X| > \epsilon\} \to 0.$$
Proof by Markov inequality.

So this means
$$P\{|\bar{X}_n - EX| \le \epsilon\} \to 1, \quad n \to \infty.$$
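A minimal simulation of the weak LLN (the Uniform(0,1) distribution is my choice for illustration, not from the slides): sample means concentrate around $EX = 0.5$ as $n$ grows.

```python
import random

random.seed(1)

# Sample mean of n i.i.d. Uniform(0,1) draws; EX = 0.5
def sample_mean(n: int) -> float:
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```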
Other types of convergence

In mean square: if, as $n \to \infty$,
$$E(X_n - X)^2 \to 0.$$
With probability 1 (almost surely): if
$$P\left\{\lim_{n\to\infty} X_n = X\right\} = 1.$$
In distribution: if
$$\lim_{n\to\infty} F_n = F,$$
where $F_n$ and $F$ are the cumulative distribution functions of $X_n$ and $X$.
Product of random variables

How does this behave?
$$\sqrt[n]{\prod_{i=1}^{n} X_i}$$
Geometric mean $\sqrt[n]{\prod_{i=1}^{n} X_i}$ vs. arithmetic mean $\frac{1}{n}\sum_{i=1}^{n} X_i$.

Examples:
- Volume $V$ of a random box with dimensions $X_i$: $V = X_1 \cdots X_n$
- Stock return: $Y_t = Y_0 r_1 \cdots r_t$
- Joint distribution of i.i.d. RVs: $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$
Law of large numbers for products of random variables

We can write
$$X_i = e^{\log X_i}.$$
Hence
$$\sqrt[n]{\prod_{i=1}^{n} X_i} = e^{\frac{1}{n}\sum_{i=1}^{n} \log X_i}.$$
So from the LLN,
$$\sqrt[n]{\prod_{i=1}^{n} X_i} \to e^{E(\log X)} \le e^{\log EX} = EX,$$
where the inequality is Jensen's: $E \log X \le \log EX$.
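A numerical check of this limit (the uniform distribution on $\{1,2,3\}$ is my choice for illustration): here $EX = 2$ while $e^{E\log X} = (1\cdot 2\cdot 3)^{1/3} = 6^{1/3} \approx 1.817$, and the $n$-th root of the product settles on the smaller value.

```python
import math
import random

random.seed(2)

# X uniform on {1, 2, 3}: EX = 2, e^{E log X} = 6^{1/3} ~ 1.817
n = 200_000
xs = [random.choice([1, 2, 3]) for _ in range(n)]

# n-th root of the product, computed stably through logs
geometric = math.exp(sum(math.log(x) for x in xs) / n)
arithmetic = sum(xs) / n
print(geometric, arithmetic)
```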
Stock example:
$$E \log r_i = \frac{1}{2}\log 4 + \frac{1}{2}\log 0 = -\infty.$$
By the LLN for products,
$$\sqrt[t]{Y_t/Y_0} \to e^{E \log r_i} = 0, \quad t \to \infty,$$
so $Y_t \to 0$: although $EY_t = Y_0 2^t$ grows, almost every sample path is eventually wiped out by a single zero return.

Example:
$$X = \begin{cases} a, & \text{w.p. } 1/2 \\ b, & \text{w.p. } 1/2. \end{cases}$$
Then $EX = (a+b)/2$, while $e^{E \log X} = \sqrt{ab} \le (a+b)/2$: the geometric mean never exceeds the arithmetic mean.
Asymptotic equipartition property (AEP)

The LLN states that
$$\frac{1}{n}\sum_{i=1}^{n} X_i \to EX.$$
The AEP states that
$$\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X),$$
so most sequences satisfy
$$p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}.$$
Analyze using the LLN for products of random variables.
The AEP lies at the heart of information theory:
- proof of lossless source coding
- proof of channel capacity
- and more...
AEP

Theorem. If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X), \quad \text{in probability}.$$
Proof:
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_{i=1}^{n} \log p(X_i) \to -E \log p(X) = H(X).$$
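A quick numerical check of the theorem (a Bernoulli(0.8) source and $n = 10{,}000$ are my choices for illustration): $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ for one long random sequence lands close to $H(X) \approx 0.722$ bits.

```python
import math
import random

random.seed(3)

# Entropy of a Bernoulli(p) source, in bits
p = 0.8
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # ~ 0.722

# Draw one long i.i.d. sequence; its probability depends only on the
# number of ones k, so -(1/n) log2 p(X_1,...,X_n) has a closed form.
n = 10_000
k = sum(1 for _ in range(n) if random.random() < p)
rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
print(rate, H)
```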
There are several consequences.
Typical set

The typical set $A_\epsilon^{(n)}$ contains all sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$
Not all sequences are created equal

Coin tossing example: $X \in \{0, 1\}$, $p(1) = 0.8$.
$$p(1, 0, 1, 1, 0, 1) = p^{\sum x_i}(1-p)^{6-\sum x_i} = p^4 (1-p)^2 = 0.0164$$
$$p(0, 0, 0, 0, 0, 0) = p^{\sum x_i}(1-p)^{6-\sum x_i} = (1-p)^6 = 0.000064$$
In this example, if $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$,
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, \ldots, x_n) \le H(X) + \epsilon.$$
This means a binary sequence is in the typical set if its frequency of heads $k/n$ is approximately $p$.
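The two probabilities above can be recomputed in a few lines (sequence values from the slide):

```python
# Coin tossing example from the slide: p(1) = 0.8, n = 6
p = 0.8
seq_a = (1, 0, 1, 1, 0, 1)   # four heads, two tails
seq_b = (0, 0, 0, 0, 0, 0)   # all tails

def seq_prob(seq):
    # p^k (1-p)^(n-k) where k is the number of ones
    k = sum(seq)
    return p ** k * (1 - p) ** (len(seq) - k)

print(seq_prob(seq_a))   # p^4 (1-p)^2 = 0.016384
print(seq_prob(seq_b))   # (1-p)^6 = 0.000064
```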
$p = 0.6$, $n = 25$, $k$ = number of "1"s

 k   (n k)     (n k) p^k (1-p)^{n-k}   -(1/n) log p(x^n)
 0   1         0.000000                1.321928
 1   25        0.000000                1.298530
 2   300       0.000000                1.275131
 3   2300      0.000001                1.251733
 4   12650     0.000007                1.228334
 5   53130     0.000054                1.204936
 6   177100    0.000227                1.181537
 7   480700    0.001205                1.158139
 8   1081575   0.003121                1.134740
 9   2042975   0.013169                1.111342
10   3268760   0.021222                1.087943
11   4457400   0.077801                1.064545
12   5200300   0.075967                1.041146
13   5200300   0.267718                1.017748
14   4457400   0.146507                0.994349
15   3268760   0.575383                0.970951
16   2042975   0.151086                0.947552
17   1081575   0.846448                0.924154
18   480700    0.079986                0.900755
19   177100    0.970638                0.877357
20   53130     0.019891                0.853958
21   12650     0.997633                0.830560
22   2300      0.001937                0.807161
23   300       0.999950                0.783763
24   25        0.000047                0.760364
25   1         0.000003                0.736966
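The last column of the table can be recomputed, and the typical values of $k$ read off; note that $-\frac{1}{n}\log_2 p(x^n)$ hits $H(0.6) \approx 0.970951$ exactly at $k = 15 = np$ (the choice $\epsilon = 0.1$ below is mine, for illustration).

```python
import math

# Table parameters from the slide
p, n = 0.6, 25
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # ~ 0.970951

def rate(k: int) -> float:
    # -(1/n) log2 p(x^n) for a length-n sequence with k ones
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

eps = 0.1
typical_k = [k for k in range(n + 1) if abs(rate(k) - H) <= eps]
print(typical_k)   # with eps = 0.1: k = 11 through 19
```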
Consequences of AEP

Theorem.
1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then for $n$ sufficiently large:
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
2. $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$.
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
4. $|A_\epsilon^{(n)}| \ge (1-\epsilon) 2^{n(H(X)-\epsilon)}$.
Property 1

If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
Proof from the definition: $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$ if
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)};$$
take $-\frac{1}{n}\log_2(\cdot)$ of each term.

The number of bits used to describe sequences in the typical set is approximately $nH(X)$.
Property 2

$P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large.

Proof: From the AEP, because
$$-\frac{1}{n}\log p(X_1, \ldots, X_n) \to H(X)$$
in probability, for a given $\epsilon > 0$, when $n$ is sufficiently large,
$$P\Big\{\underbrace{\Big|-\frac{1}{n}\log p(X_1, \ldots, X_n) - H(X)\Big| \le \epsilon}_{(X_1,\ldots,X_n)\,\in\,A_\epsilon^{(n)}}\Big\} > 1 - \epsilon.$$

- High probability: sequences in the typical set are "most typical".
- These sequences almost all have the same probability: "equipartition".
Property 3 and 4: size of the typical set

$$(1-\epsilon) 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$$
Proof:
$$1 = \sum_{(x_1,\ldots,x_n)} p(x_1, \ldots, x_n) \ge \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \ge \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)+\epsilon)},$$
hence $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
On the other hand, $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large, so
$$1 - \epsilon < P\{A_\epsilon^{(n)}\} \le \sum_{(x_1,\ldots,x_n)\in A_\epsilon^{(n)}} 2^{-n(H(X)-\epsilon)} = |A_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)},$$
hence $|A_\epsilon^{(n)}| \ge (1-\epsilon) 2^{n(H(X)-\epsilon)}$.
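For a source small enough to enumerate, the upper bound in Property 3 can be checked exhaustively (the Bernoulli(0.6) source with $n = 10$ and $\epsilon = 0.1$ are my choices for illustration; note that at this small $n$ the typical set need not yet capture probability $1 - \epsilon$):

```python
import math
from itertools import product

# Small Bernoulli source, enumerable by brute force
p, n, eps = 0.6, 10, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

size = 0      # |A_eps^(n)|
prob = 0.0    # P{A_eps^(n)}
for xs in product((0, 1), repeat=n):
    k = sum(xs)
    p_seq = p ** k * (1 - p) ** (n - k)
    if abs(-math.log2(p_seq) / n - H) <= eps:
        size += 1
        prob += p_seq
print(size, 2 ** (n * (H + eps)), prob)
```

Here the typical set consists of the sequences with 5, 6, or 7 ones, and its size (582) sits well below the bound $2^{n(H+\epsilon)} \approx 1674$.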