
    Information and Entropy

    Information is a measure of uncertainty.

    Entropy:

    H(X) = -\sum_i p(x_i) \log p(x_i)

    Equivalently

    H(X) = E\left[ \log \frac{1}{p(X)} \right]

    Example: Bernoulli r.v. w.p. p, with H(p) = -p \log p - (1 - p) \log(1 - p)

    Entropy is always non-negative (why?)

    [Figure: plot of the binary entropy function H(p) versus p on [0, 1]]
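    A minimal numerical sketch of the binary entropy function plotted above (not part of the original slides; the function name binary_entropy and the use of base-2 logarithms are illustrative choices):

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Entropy is non-negative and peaks at p = 0.5 (1 bit).
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    print(f"p = {p:.2f}  H(p) = {binary_entropy(p):.4f} bits")
```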


    Joint and Conditional Entropy

    H(X, Y) = -\sum_i \sum_j p(x_i, y_j) \log p(x_i, y_j)
            = E_{X,Y}\left[ \log \frac{1}{p(X, Y)} \right]

    H(Y|X) = \sum_i p_X(x_i) \, H(Y|X = x_i)
           = -\sum_i \sum_j p(x_i) \, p(y_j|x_i) \log p(y_j|x_i)
           = -\sum_i \sum_j p(x_i, y_j) \log p(y_j|x_i)
           = E_{X,Y}\left[ \log \frac{1}{p(Y|X)} \right]
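    A sketch of these definitions on a small joint pmf (the example table and helper names are assumptions for illustration, not from the slides):

```python
import math

# Hypothetical joint pmf p(x, y) over X = {0, 1}, Y = {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H_joint(pxy):
    """H(X, Y) = -sum p(x, y) log2 p(x, y)."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

def H_cond(pxy):
    """H(Y|X) = -sum p(x, y) log2 p(y|x)."""
    px = {}
    for (x, _), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
    return -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)

print("H(X,Y) =", H_joint(p_xy))   # joint entropy
print("H(Y|X) =", H_cond(p_xy))    # conditional entropy
```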



    Chain Rule

    This is one of the most useful identities in information theory:

    H(X, Y) = H(X) + H(Y|X)

    Can you think of an intuitive explanation for it?

    Chain rule for conditional entropies:

    H(X, Y|Z) = H(X|Z) + H(Y|X, Z)

    Chain rule applied multiple times:

    H(X_1, \ldots, X_n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_{n-1}, \ldots, X_1)
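    A quick numerical check of the chain rule H(X, Y) = H(X) + H(Y|X) on an arbitrary joint pmf (the specific table is an assumption chosen for illustration):

```python
import math

p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}  # hypothetical joint pmf

def H(dist):
    """Entropy (bits) of a pmf given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p(x), then H(Y|X) computed directly from its definition.
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

H_XY = H(p_xy)
H_X = H(p_x)
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)

print(H_XY, H_X + H_Y_given_X)  # the two sides of the chain rule agree
```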


    Information Divergence

    Kullback-Leibler distance or information divergence

    D(p||q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}

    Notes:

    D(p||q) is not symmetric

    D(p||p) = 0

    Represents a notion of distance between two distributions (though it is not a true metric)

    Can characterize error probability in detection
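    A short sketch computing D(p||q) for two Bernoulli distributions, illustrating the notes above (the distributions chosen are arbitrary examples):

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # Bernoulli(0.5)
q = [0.9, 0.1]   # Bernoulli(0.9)

print(kl(p, q), kl(q, p))  # not symmetric
print(kl(p, p))            # D(p||p) = 0
```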



    D(p||q) for Bernoulli R.V.

    [Figure: surface plot of D(p||q) for Bernoulli distributions, as a function of p and q]


    Mutual Information

    Mutual information: the information of one r.v. about another

    I(X;Y) = D\big( p(x, y) \,||\, p(x)p(y) \big)
           = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}
           = E_{X,Y}\left[ \log \frac{p(X, Y)}{p(X)p(Y)} \right]

    I(X;Y) = H(X) - H(X|Y) (why?)

    Similarly, we define conditional mutual information:

    I(X;Y|Z) = H(X|Z) - H(X|Y, Z)

    Chain Rule:

    I(X; Y_1, \ldots, Y_n) = \sum_{i=1}^{n} I(X; Y_i | Y_{i-1}, \ldots, Y_1)
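    A sketch computing I(X;Y) both as D(p(x,y)||p(x)p(y)) and as H(X) - H(X|Y), illustrating the "(why?)" above (the joint pmf is an arbitrary example):

```python
import math

p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # hypothetical joint pmf

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X;Y) as the KL divergence between the joint and the product of marginals.
I_kl = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

# I(X;Y) as H(X) - H(X|Y).
H_X = -sum(p * math.log2(p) for p in p_x.values())
H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

print(I_kl, H_X - H_X_given_Y)  # both forms give the same value
```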



    Entropy Relationships

    [Figure: Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

    I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

    H(X, Y) = H(X) + H(Y) - I(X;Y)


    Jensen's Inequality

    [Figure: example functions that are convex, concave, and neither]

    f() is convex if for any 0 \leq \lambda \leq 1,

    f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)

    Jensen's Inequality: If a function f() is convex, then

    E[f(X)] \geq f(E[X])

    If f() is strictly convex, equality is achieved if and only if X is trivial (i.e., constant with probability 1).

    Proof: Use induction, definition of convexity, & continuity arguments.
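    A small numerical illustration of Jensen's inequality for the convex function f(x) = x^2 (the random variable and its pmf are arbitrary assumptions):

```python
# Discrete r.v. X with an arbitrary pmf, and the convex function f(x) = x**2.
values = [1.0, 2.0, 5.0]
probs  = [0.5, 0.3, 0.2]

E_X  = sum(p * x for p, x in zip(probs, values))
E_fX = sum(p * x**2 for p, x in zip(probs, values))

print(E_fX, E_X**2)      # E[f(X)] >= f(E[X])
assert E_fX >= E_X**2
```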



    Properties of KL and Mutual Information

    D(p||q) \geq 0

    Proof:

    D(p||q) = -\sum_i p_i \log \frac{q_i}{p_i} \geq -\log \sum_i p_i \frac{q_i}{p_i} = -\log \sum_i q_i = 0,

    where the inequality is Jensen's inequality applied to the concave function \log.

    I(X;Y) \geq 0

    Proof:

    I(X;Y) = D\big( p(x, y) \,||\, p(x)p(y) \big) \geq 0

    I(X;Y|Z) \geq 0

    Proof: I(X;Y|Z) = D\big( p(x, y|z) \,||\, p(x|z)p(y|z) \big) \geq 0 (averaged over Z)


    Some Inequalities

    H(X_1, \ldots, X_n) \leq \sum_i H(X_i)    (Independence bound)

    Proof: Use the chain rule together with "conditioning reduces entropy" (below)

    H(X) \leq \log |\mathcal{X}|    (Uniform distribution maximizes entropy)

    Proof: D(p_X || u) = \log |\mathcal{X}| - H(X) \geq 0, where u is the uniform distribution on \mathcal{X}

    H(X|Y) \leq H(X)    (Conditioning reduces entropy)

    Proof: H(X) - H(X|Y) = I(X;Y) \geq 0
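    A sketch checking these three bounds on a randomly generated joint pmf (the alphabet sizes and the use of random.random are arbitrary choices for illustration):

```python
import math, random

# Random joint pmf over a 4 x 3 alphabet.
raw = [[random.random() for _ in range(3)] for _ in range(4)]
total = sum(sum(row) for row in raw)
p_xy = [[v / total for v in row] for row in raw]

p_x = [sum(row) for row in p_xy]
p_y = [sum(p_xy[i][j] for i in range(4)) for j in range(3)]

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_X, H_Y = H(p_x), H(p_y)
H_XY = H([p_xy[i][j] for i in range(4) for j in range(3)])
H_X_given_Y = H_XY - H_Y  # chain rule

assert H_XY <= H_X + H_Y + 1e-9        # independence bound
assert H_X <= math.log2(4) + 1e-9      # uniform distribution maximizes entropy
assert H_X_given_Y <= H_X + 1e-9       # conditioning reduces entropy
print("all three inequalities hold")
```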



    Convexity/Concavity of Information Functions

    D(p||q) is convex in the pair (p, q)

    Proof: Uses the log-sum inequality

    H(X) is a concave function of the distribution p(x)

    I(X;Y) is a convex function of p(y|x) for fixed p(x), and a concave function of p(x) for fixed p(y|x).

    Proof: Uses the convexity of D(p||q).
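    A numerical spot-check of the convexity of D(p||q) in the pair (p, q): for randomly drawn pairs of Bernoulli distributions and a random mixing weight, the divergence of the mixtures never exceeds the mixture of the divergences (all sampled values here are illustrative assumptions):

```python
import math, random

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bernoulli(t):
    return [t, 1.0 - t]

for _ in range(1000):
    p1, p2 = bernoulli(random.uniform(0.01, 0.99)), bernoulli(random.uniform(0.01, 0.99))
    q1, q2 = bernoulli(random.uniform(0.01, 0.99)), bernoulli(random.uniform(0.01, 0.99))
    lam = random.random()
    p_mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    q_mix = [lam * a + (1 - lam) * b for a, b in zip(q1, q2)]
    # D(lam*p1+(1-lam)*p2 || lam*q1+(1-lam)*q2) <= lam*D(p1||q1) + (1-lam)*D(p2||q2)
    assert kl(p_mix, q_mix) <= lam * kl(p1, q1) + (1 - lam) * kl(p2, q2) + 1e-9
print("convexity held on all sampled pairs")
```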


    Data Processing Inequality

    X, Y, Z form a Markov chain (written X \to Y \to Z) if

    p(x, y, z) = p(y) p(x|y) p(z|y)

    Then

    I(X;Y) \geq I(X;Z)

    Proof:

    I(X; Y, Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y)

    Since X \to Y \to Z, the last term I(X;Z|Y) = 0, so I(X;Y) = I(X;Z) + I(X;Y|Z) \geq I(X;Z).

    In particular, X \to Y \to g(Y) is a Markov chain (why?), so

    I(X;Y) \geq I(X; g(Y)).

    Processing Y cannot increase the information it provides about X.

    Question: Then why should we ever do signal processing?

    Corollary: If X \to Y \to Z, then I(X;Y|Z) \leq I(X;Y)
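    A sketch that builds a Markov chain X \to Y \to Z from arbitrary transition probabilities and checks I(X;Y) >= I(X;Z) numerically (the input distribution and the two channels are assumptions chosen for illustration):

```python
import math

p_x = [0.3, 0.7]                        # hypothetical input distribution
p_y_given_x = [[0.8, 0.2], [0.1, 0.9]]  # channel X -> Y
p_z_given_y = [[0.6, 0.4], [0.3, 0.7]]  # channel Y -> Z

def mutual_info(p_ab):
    """I(A;B) from a joint pmf given as a nested list p_ab[a][b]."""
    p_a = [sum(row) for row in p_ab]
    p_b = [sum(p_ab[a][b] for a in range(len(p_ab))) for b in range(len(p_ab[0]))]
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(p_ab) for b, p in enumerate(row) if p > 0)

# Joint pmfs p(x, y) and p(x, z) induced by the Markov chain X -> Y -> Z.
p_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(2)] for x in range(2)]
p_xz = [[sum(p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z] for y in range(2))
         for z in range(2)] for x in range(2)]

print(mutual_info(p_xy), mutual_info(p_xz))  # I(X;Y) >= I(X;Z)
```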



    Fano's Inequality

    Want to estimate X from Y; call the estimate \hat{X} and let P_e = \mathrm{Prob}(\hat{X} \neq X).

    X \to Y \to \hat{X}

    P_e \log |\mathcal{X}| \geq H(X|Y) - H(P_e)

    Sometimes simplified to:

    P_e \geq \frac{H(X|Y) - 1}{\log |\mathcal{X}|}

    Q: Why is this useful?

    A: It shows that there are limits to our ability to communicate or estimate well, and those limits are governed by H(X|Y).
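    A numerical sanity check of the bound on a simple symmetric channel with the estimator \hat{X} = Y (the alphabet size, crossover probability, and estimator are arbitrary choices, not from the slides):

```python
import math

M = 4       # alphabet size |X|
eps = 0.2   # probability the channel corrupts the symbol (arbitrary choice)

# X uniform on {0,...,M-1}; given X = x, Y = x w.p. 1-eps, else uniform over the others.
p_xy = {}
for x in range(M):
    for y in range(M):
        p_y_given_x = (1 - eps) if y == x else eps / (M - 1)
        p_xy[(x, y)] = (1.0 / M) * p_y_given_x

p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

# With the estimator Xhat = Y, the error probability is simply eps.
Pe = eps
H_Pe = -Pe * math.log2(Pe) - (1 - Pe) * math.log2(1 - Pe)

print(Pe * math.log2(M), ">=", H_X_given_Y - H_Pe)  # Fano's inequality
```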


    Fano's Inequality (proof)

    Consider a Bernoulli R.V. indicating the error, E = \mathbf{1}\{\hat{X} \neq X\}.

    H(E, X | \hat{X}) = H(X | \hat{X}) + H(E | X, \hat{X})
                      = H(E | \hat{X}) + H(X | E, \hat{X})

    where

    H(E | X, \hat{X}) = 0 (E is a function of X and \hat{X}),
    H(X | \hat{X}) \geq H(X|Y) (data processing, since X \to Y \to \hat{X}),
    H(E | \hat{X}) \leq H(E) = H(P_e),
    H(X | E, \hat{X}) \leq P_e \log |\mathcal{X}|.

    Combining these gives H(X|Y) \leq H(P_e) + P_e \log |\mathcal{X}|.
