Lecture 5


  • Lecture 5: Asymptotic Equipartition Property

    Law of large numbers for products of random variables; AEP and its consequences

    Dr. Yao Xie, ECE587, Information Theory, Duke University

  • Stock market

    Initial investment $Y_0$, daily return ratio $r_i$; on day $t$ your money is

    $Y_t = Y_0 r_1 \cdots r_t.$

    Now suppose the return ratios $r_i$ are i.i.d., with

    $r_i = 4$ w.p. $1/2$, and $r_i = 0$ w.p. $1/2$.

    So you think the expected return ratio is $E r_i = 2$,

    and then $E Y_t = E(Y_0 r_1 \cdots r_t) = Y_0 (E r_i)^t = Y_0 2^t$?

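The gap between the expected return and a typical outcome is easy to see numerically. A minimal simulation, not part of the lecture (the trial count, seed, and function name are my own choices):

```python
import random

random.seed(0)

def simulate_final_wealth(t, trials=10_000):
    """Simulate Y_t = Y0 * r_1 * ... * r_t with Y0 = 1 and r_i = 4 or 0, each w.p. 1/2."""
    outcomes = []
    for _ in range(trials):
        y = 1.0
        for _ in range(t):
            y *= random.choice([4.0, 0.0])
        outcomes.append(y)
    return outcomes

t = 20
outcomes = simulate_final_wealth(t)
ruined = sum(1 for y in outcomes if y == 0.0) / len(outcomes)
print("E[Y_t] = 2^t =", 2**t)                   # the "expected" answer: over a million
print("fraction of runs ending at 0:", ruined)  # in practice: essentially every run is ruined
```

A single nonzero path requires all $t$ returns to equal 4, which happens with probability $2^{-t}$, so the huge expectation is driven by a vanishingly rare event.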

  • Is "optimized" really optimal?

    With $Y_0 = 1$, the actual return $Y_t$ goes like

    $1, 4, 16, 0, 0, 0, \ldots$

    Optimizing the expected return is not optimal?

    Fundamental reason: products do not behave the same way as sums.


  • (Weak) Law of large numbers

    Theorem. For independent, identically distributed (i.i.d.) random variables $X_i$,

    $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \to EX$, in probability.

    Convergence in probability: $X_n \to X$ in probability if for every $\epsilon > 0$,

    $P\{|X_n - X| > \epsilon\} \to 0.$

    Proof by Markov's inequality.

    So this means

    $P\{|\bar{X}_n - EX| \le \epsilon\} \to 1$, as $n \to \infty$.

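The theorem can be checked empirically; a small sketch (the die-roll distribution and sample sizes are my own choices, not from the slides):

```python
import random

random.seed(1)

# i.i.d. fair-die rolls: X_i uniform on {1,...,6}, so EX = 3.5
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))  # the sample mean settles near 3.5 as n grows
```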

  • Other types of convergence

    In mean square: if, as $n \to \infty$,

    $E(X_n - X)^2 \to 0$

    With probability 1 (almost surely): if, as $n \to \infty$,

    $P\{\lim_{n\to\infty} X_n = X\} = 1$

    In distribution: if

    $\lim_{n} F_n = F,$

    where $F_n$ and $F$ are the cumulative distribution functions of $X_n$ and $X$.


  • Product of random variables

    How does this behave?

    $\sqrt[n]{\prod_{i=1}^n X_i}$

    Geometric mean $\sqrt[n]{\prod_{i=1}^n X_i} \le$ arithmetic mean $\frac{1}{n}\sum_{i=1}^n X_i$

    Examples:
    - Volume $V$ of a random box with dimensions $X_i$: $V = X_1 \cdots X_n$
    - Stock return $Y_t = Y_0 r_1 \cdots r_t$
    - Joint distribution of i.i.d. RVs: $p(x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i)$


  • Law of large numbers for products of random variables

    We can write $X_i = e^{\log X_i}$.

    Hence

    $\sqrt[n]{\prod_{i=1}^n X_i} = e^{\frac{1}{n}\sum_{i=1}^n \log X_i}$

    So from the LLN,

    $\sqrt[n]{\prod_{i=1}^n X_i} \to e^{E(\log X)} \le e^{\log EX} = EX.$

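The limit $e^{E\log X}$ can be seen numerically. A sketch, assuming a toy distribution of my own choosing ($X$ uniform on $\{1, 2, 3\}$):

```python
import math
import random

random.seed(2)

# X uniform on {1, 2, 3}: EX = 2, but E[log X] = (log 1 + log 2 + log 3) / 3
vals = [1.0, 2.0, 3.0]
geo_limit = math.exp(sum(math.log(v) for v in vals) / 3)  # e^{E log X} = 6^(1/3) ~ 1.817

n = 200_000
log_sum = sum(math.log(random.choice(vals)) for _ in range(n))
geo_mean = math.exp(log_sum / n)  # (prod X_i)^(1/n), computed via logs to avoid overflow

print("e^{E log X} =", geo_limit)
print("empirical geometric mean =", geo_mean)  # matches the limit, and is < EX = 2
```

Working in logs is not just a proof device: multiplying 200,000 floats directly would overflow or underflow, while the log-sum is perfectly stable.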

  • Stock example:

    $E \log r_i = \frac{1}{2}\log 4 + \frac{1}{2}\log 0 = -\infty$

    $Y_t \to Y_0 e^{E \log r_i} = 0$, as $t \to \infty$.

    Example: $X = a$ w.p. $1/2$, $X = b$ w.p. $1/2$ (with $a, b > 0$).

    $\sqrt[n]{\prod_{i=1}^n X_i} \to e^{E \log X} = \sqrt{ab} \le \frac{a+b}{2} = EX.$

  • Asymptotic equipartition property (AEP)

    LLN states that

    $\frac{1}{n}\sum_{i=1}^n X_i \to EX$

    AEP states that, for most sequences,

    $\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X)$

    i.e., $p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}$

    Analyze using the LLN for products of random variables.


  • AEP lies at the heart of information theory:

    proof of lossless source coding,

    proof of channel capacity,

    and more...


  • AEP

    Theorem. If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then

    $-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X)$, in probability.

    Proof:

    $-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_{i=1}^n \log p(X_i) \to -E\log p(X) = H(X).$

    There are several consequences.

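For a Bernoulli source the per-symbol log-probability is easy to compute, so the convergence can be observed directly. A sketch (the value $p = 0.8$ anticipates the coin-tossing example later in the lecture; sample sizes are my own):

```python
import math
import random

random.seed(3)

p = 0.8                                               # Bernoulli(p) source
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy, ~0.7219 bits

def neg_log_prob_rate(n):
    """-(1/n) log2 p(X_1,...,X_n) for one i.i.d. Bernoulli(p) draw of length n."""
    k = sum(random.random() < p for _ in range(n))  # number of ones
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

for n in (10, 1_000, 100_000):
    print(n, neg_log_prob_rate(n), "H(X) =", H)  # the rate concentrates around H(X)
```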

  • Typical set

    The typical set $A_\epsilon^{(n)}$ contains all sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property

    $2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$


  • Not all sequences are created equal

    Coin tossing example: $X \in \{0, 1\}$, $p(1) = 0.8$

    $p(1, 0, 1, 1, 0, 1) = p^{\sum x_i}(1-p)^{6-\sum x_i} = p^4(1-p)^2 = 0.0164$

    $p(0, 0, 0, 0, 0, 0) = (1-p)^6 = 0.000064$

    In this example, if $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$, then

    $H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, \ldots, x_n) \le H(X) + \epsilon.$

    This means a binary sequence is in the typical set if its frequency of heads $k/n$ is approximately $p$.

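The two probabilities above can be checked directly; a small sketch (the helper name is mine):

```python
def seq_prob(seq, p=0.8):
    """Probability of one particular binary sequence under i.i.d. Bernoulli(p)."""
    k = sum(seq)
    return p**k * (1 - p)**(len(seq) - k)

print(seq_prob([1, 0, 1, 1, 0, 1]))  # p^4 (1-p)^2 = 0.016384
print(seq_prob([0, 0, 0, 0, 0, 0]))  # (1-p)^6   = 0.000064
```

Note the probability depends on the sequence only through the number of ones $k$, which is what makes the typical set easy to enumerate for binary sources.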


  • $p = 0.6$, $n = 25$, $k$ = number of "1"s

    k    (n choose k)   (n choose k) p^k (1-p)^{n-k}   -(1/n) log2 p(x^n)
    0    1              0.000000                       1.321928
    1    25             0.000000                       1.298530
    2    300            0.000000                       1.275131
    3    2300           0.000001                       1.251733
    4    12650          0.000007                       1.228334
    5    53130          0.000054                       1.204936
    6    177100         0.000227                       1.181537
    7    480700         0.001205                       1.158139
    8    1081575        0.003121                       1.134740
    9    2042975        0.013169                       1.111342
    10   3268760        0.021222                       1.087943
    11   4457400        0.077801                       1.064545
    12   5200300        0.075967                       1.041146
    13   5200300        0.267718                       1.017748
    14   4457400        0.146507                       0.994349
    15   3268760        0.575383                       0.970951
    16   2042975        0.151086                       0.947552
    17   1081575        0.846448                       0.924154
    18   480700         0.079986                       0.900755
    19   177100         0.970638                       0.877357
    20   53130          0.019891                       0.853958
    21   12650          0.997633                       0.830560
    22   2300           0.001937                       0.807161
    23   300            0.999950                       0.783763
    24   25             0.000047                       0.760364
    25   1              0.000003                       0.736966

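The last column of the table depends only on $k$, since every sequence with $k$ ones has the same probability. A sketch reproducing a few entries (assuming base-2 logarithms, as the bit units suggest):

```python
import math

n, p = 25, 0.6

def neg_log_prob_rate(k):
    """-(1/n) log2 p(x^n) for any length-n binary sequence with k ones."""
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

for k in (0, 13, 25):
    print(k, math.comb(n, k), round(neg_log_prob_rate(k), 6))
```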

  • Consequences of AEP

    Theorem.

    1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then for $n$ sufficiently large:

       $H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$

    2. $P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$.

    3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.

    4. $|A_\epsilon^{(n)}| \ge (1 - \epsilon)2^{n(H(X)-\epsilon)}$.


  • Property 1

    If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then

    $H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$

    Proof from the definition: $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$ if

    $2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)};$

    take $\log_2$ and divide by $-n$.

    The number of bits used to describe sequences in the typical set is approximately $nH(X)$.


  • Property 2

    $P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$ for $n$ sufficiently large.

    Proof: From the AEP: because

    $-\frac{1}{n}\log p(X_1, \ldots, X_n) \to H(X)$

    in probability, for a given $\epsilon > 0$, when $n$ is sufficiently large,

    $P\left\{\left|-\frac{1}{n}\log p(X_1, \ldots, X_n) - H(X)\right| \le \epsilon\right\} \ge 1 - \epsilon,$

    and the event on the left is exactly $(X_1, \ldots, X_n) \in A_\epsilon^{(n)}$.

    - High probability: sequences in the typical set are "most typical".
    - These sequences almost all have the same probability - "equipartition".


  • Properties 3 and 4: size of the typical set

    $(1 - \epsilon)2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$

    Proof:

    $1 = \sum_{(x_1,\ldots,x_n)} p(x_1, \ldots, x_n)$

    $\ge \sum_{(x_1,\ldots,x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n)$

    $\ge \sum_{(x_1,\ldots,x_n) \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)}$

    $= |A_\epsilon^{(n)}| \, 2^{-n(H(X)+\epsilon)}.$


  • On the other hand, $P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$ for $n$ sufficiently large, so

    $1 - \epsilon \le P\{A_\epsilon^{(n)}\} = \sum_{(x_1,\ldots,x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \le |A_\epsilon^{(n)}| \, 2^{-n(H(X)-\epsilon)},$

    which gives $|A_\epsilon^{(n)}| \ge (1 - \epsilon)2^{n(H(X)-\epsilon)}.$
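These bounds can be verified exhaustively for the Bernoulli example from the table ($p = 0.6$, $n = 25$; the choice $\epsilon = 0.1$ is mine):

```python
import math

p, n, eps = 0.6, 25, 0.1
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.970951 bits

# Enumerate A_eps^(n) by k = number of ones: all such sequences share one probability.
size, prob_mass = 0, 0.0
for k in range(n + 1):
    rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
    if H - eps <= rate <= H + eps:  # typical-set membership depends only on k
        size += math.comb(n, k)
        prob_mass += math.comb(n, k) * p**k * (1 - p)**(n - k)

print("|A| =", size, "<= 2^{n(H+eps)} =", round(2**(n * (H + eps))))
print("P{A} =", round(prob_mass, 4))  # already close to 1 at n = 25
```

Even at $n = 25$ the typical set carries most of the probability while containing only a small fraction of the $2^{25}$ sequences.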