Information and Entropy
Information ⟷ uncertainty
Entropy:
H(X) = -Σ_i p(x_i) log p(x_i)

Equivalently,

H(X) = E[ log 1/p(X) ]
Example: Bernoulli r.v. w.p. p:  H(p) = -p log p - (1 - p) log(1 - p)
Entropy is always non-negative
(why?)
[Figure: binary entropy H(p) plotted versus p]
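As a quick numerical illustration (a minimal sketch added here, not from the original slides), the binary entropy function can be evaluated directly from the definition:

```python
from math import log2

def binary_entropy(p):
    """Entropy of a Bernoulli(p) random variable, in bits."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log 0 = 0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.1, 0.25, 0.5, 0.9):
    print(f"H({p}) = {binary_entropy(p):.4f} bits")
# H(p) peaks at p = 0.5, where H(0.5) = 1 bit.
```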
Aria Nosratinia Information Theory 2-1
Joint and Conditional Entropy
H(X, Y) = -Σ_i Σ_j p(x_i, y_j) log p(x_i, y_j)
        = E_{X,Y}[ log 1/p(X, Y) ]

H(Y|X) = Σ_i H(Y|X = x_i) p_X(x_i)
       = -Σ_i p(x_i) Σ_j p(y_j|x_i) log p(y_j|x_i)
       = -Σ_i Σ_j p(x_i, y_j) log p(y_j|x_i)
       = E_{X,Y}[ log 1/p(Y|X) ]
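A minimal Python sketch (illustrative; the joint pmf below is hypothetical) that computes H(X, Y) and H(Y|X) directly from these definitions:

```python
from math import log2

# Hypothetical joint pmf p(x, y) on {0,1} x {0,1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(X, Y) = -sum_{x,y} p(x,y) log p(x,y)
H_XY = -sum(p * log2(p) for p in p_xy.values())

# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), with p(y|x) = p(x,y)/p(x)
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items())

print(f"H(X,Y) = {H_XY:.4f} bits, H(Y|X) = {H_Y_given_X:.4f} bits")
```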
Aria Nosratinia Information Theory 2-2
Chain Rule
This is one of the most useful information equalities
H(X, Y) = H(X) + H(Y|X)
Can you think of an intuitive explanation for it?
Chain rule for conditional entropies:
H(X, Y|Z) = H(X|Z) + H(Y|X, Z)
Chain rule applied multiple times:
H(X_1, ..., X_n) = H(X_1) + H(X_2|X_1) + ... + H(X_n|X_{n-1}, ..., X_1)
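A small numerical check of the two-variable chain rule on a hypothetical joint pmf (illustrative sketch only):

```python
from math import log2

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # hypothetical p(x, y)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

H_XY = -sum(p * log2(p) for p in p_xy.values())
H_X = -sum(p * log2(p) for p in p_x.values())
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Chain rule: H(X, Y) = H(X) + H(Y|X)
assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12
print(f"H(X,Y) = {H_XY:.4f} = H(X) + H(Y|X) = {H_X + H_Y_given_X:.4f}")
```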
Aria Nosratinia Information Theory 2-3
Information Divergence
Kullback-Leibler distance or information divergence
D(p||q) = Σ_{x ∈ X} p(x) log [ p(x) / q(x) ]
Notes:
D(p||q) is not symmetric
D(p||p) = 0
Represents a notion of distance between two distributions (not a true metric)
Can characterize error probability in detection
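A minimal sketch (hypothetical distributions) that evaluates D(p||q) and illustrates the notes above:

```python
from math import log2

def kl_divergence(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]        # hypothetical distribution
q = [1 / 3, 1 / 3, 1 / 3]    # uniform reference
print(f"D(p||q) = {kl_divergence(p, q):.4f} bits")
print(f"D(q||p) = {kl_divergence(q, p):.4f} bits  (not symmetric)")
print(f"D(p||p) = {kl_divergence(p, p):.4f}")
```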
Aria Nosratinia Information Theory 2-4
D(p||q) for Bernoulli R.V.
[Figure: surface plot of D(p||q) over (p, q) for Bernoulli distributions]
Aria Nosratinia Information Theory 2-5
Mutual Information
Mutual information: the information of one r.v. about another
I(X;Y) = D( p(x, y) || p(x) p(y) )
       = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]
       = E_{X,Y}[ log p(X, Y) / (p(X) p(Y)) ]

I(X;Y) = H(X) - H(X|Y) (why?)

So we define conditional mutual information:

I(X;Y|Z) = H(X|Z) - H(X|Y, Z)

Chain Rule:

I(X; Y_1, ..., Y_n) = Σ_{i=1}^{n} I(X; Y_i | Y_{i-1}, ..., Y_1)
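A minimal sketch (hypothetical joint pmf) that computes I(X;Y) from the definition and cross-checks the identity I(X;Y) = H(X) - H(X|Y):

```python
from math import log2

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # hypothetical p(x, y)
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X;Y) = sum p(x,y) log [ p(x,y) / (p(x) p(y)) ]
I_XY = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Cross-check: I(X;Y) = H(X) - H(X|Y)
H_X = -sum(p * log2(p) for p in p_x.values())
H_X_given_Y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items())
print(f"I(X;Y) = {I_XY:.4f} bits")
print(f"H(X) - H(X|Y) = {H_X - H_X_given_Y:.4f} bits")
```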
Aria Nosratinia Information Theory 2-6
Entropy Relationships
[Figure: Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), I(X;Y), and H(X, Y)]

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
I(X;Y)  = H(X) - H(X|Y) = H(Y) - H(Y|X)
H(X, Y) = H(X) + H(Y) - I(X;Y)
Aria Nosratinia Information Theory 2-7
Jensen's Inequality
[Figure: examples of convex, concave, and neither]

f(·) is convex if for any 0 ≤ λ ≤ 1,

f(λ x_1 + (1 - λ) x_2) ≤ λ f(x_1) + (1 - λ) f(x_2)

Jensen's Inequality: If a function f(·) is convex, then

E[f(X)] ≥ f(E[X])

If f(·) is strictly convex, equality is achieved if and only if X is trivial.
Proof: Use induction, definition of convexity, & continuity arguments.
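A minimal numerical sketch of Jensen's inequality (hypothetical discrete r.v.; exp is used as the convex function):

```python
from math import exp

xs = [0.0, 1.0, 2.0, 4.0]   # hypothetical values of X
ps = [0.1, 0.4, 0.3, 0.2]   # their probabilities

f = exp  # a convex function

E_X = sum(p * x for p, x in zip(ps, xs))
E_fX = sum(p * f(x) for p, x in zip(ps, xs))

# Jensen: E[f(X)] >= f(E[X]) for convex f
print(f"E[f(X)] = {E_fX:.4f}  >=  f(E[X]) = {f(E_X):.4f}")
assert E_fX >= f(E_X)
```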
Aria Nosratinia Information Theory 2-8
Properties of KL and Mutual Information
D(p||q) ≥ 0

Proof:

D(p||q) = -Σ_i p_i log (q_i / p_i) ≥ -log Σ_i p_i (q_i / p_i) = -log Σ_i q_i = 0

(the inequality is Jensen's, applied to the concave log)

I(X;Y) ≥ 0

Proof:

I(X;Y) = D( p(x, y) || p(x) p(y) ) ≥ 0

I(X;Y|Z) ≥ 0

Proof: I(X;Y|Z) = D( p(x, y|z) || p(x|z) p(y|z) ) ≥ 0
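A quick randomized sanity check (illustrative sketch only) that D(p||q) ≥ 0 and D(p||p) = 0:

```python
import random
from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_pmf(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
for _ in range(1000):
    p, q = random_pmf(4), random_pmf(4)
    assert kl(p, q) >= 0    # non-negativity of KL divergence
    assert kl(p, p) == 0    # D(p||p) = 0 exactly (log2(1.0) == 0.0)
print("D(p||q) >= 0 and D(p||p) = 0 held in all random trials")
```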
Aria Nosratinia Information Theory 2-9
Some Inequalities
H(X_1, ..., X_n) ≤ Σ_i H(X_i)   (Independence bound)

Proof: Use chain rule

H(X) ≤ log |X|   (Uniform distribution maximizes entropy)

Proof: D(p_X || u) = log |X| - H(X) ≥ 0

H(X|Y) ≤ H(X)   (Conditioning reduces entropy)

Proof: H(X) - H(X|Y) = I(X;Y) ≥ 0
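A small numerical check of the last two bounds on a hypothetical joint pmf (illustrative sketch):

```python
from math import log2

# Hypothetical joint pmf over X in {0,1,2} and Y in {0,1}
p_xy = {(0, 0): 0.20, (0, 1): 0.10, (1, 0): 0.05, (1, 1): 0.25,
        (2, 0): 0.30, (2, 1): 0.10}
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_X = -sum(p * log2(p) for p in p_x.values())
H_X_given_Y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items())

print(f"H(X)   = {H_X:.4f} <= log|X| = {log2(len(p_x)):.4f}")   # uniform bound
print(f"H(X|Y) = {H_X_given_Y:.4f} <= H(X) = {H_X:.4f}")        # conditioning reduces entropy
```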
Aria Nosratinia Information Theory 2-10
Convexity/Concavity of Information Functions
D(p||q) is convex in the pair (p, q)
Proof: Uses the log-sum inequality
H(X) is concave
I(X;Y) is a convex function of p(y|x) for fixed p(x), and a concave
function of p(x) for fixed p(y|x).

Proof: Using the convexity of D(p||q).
Aria Nosratinia Information Theory 2-11
Data Processing Inequality
X, Y, Z form a Markov chain (written X → Y → Z) if

p(x, y, z) = p(y) p(x|y) p(z|y)
Then
I(X;Y) ≥ I(X;Z)
Proof:
I(X; Y, Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y)

The Markov property gives I(X;Z|Y) = 0, and I(X;Y|Z) ≥ 0, so I(X;Y) ≥ I(X;Z).

In particular, X → Y → g(Y) (why?), so

I(X;Y) ≥ I(X; g(Y))

Processing of Y cannot increase the information Y carries about X.
Question: Then why should we ever do signal processing?
Corollary: I(X;Y|Z) ≤ I(X;Y)
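A minimal sketch (hypothetical binary Markov chain X → Y → Z) that builds p(x, y, z) = p(x) p(y|x) p(z|y) and confirms that I(X;Z) does not exceed I(X;Y):

```python
from math import log2
from itertools import product

# Hypothetical Markov chain X -> Y -> Z over binary alphabets
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p(y|x)
p_z_given_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # p(z|y)

# Joint p(x, y, z) = p(x) p(y|x) p(z|y)
p_xyz = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in product([0, 1], repeat=3)}

def mi(pairs):
    """Mutual information (bits) from a dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in pairs.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pairs.items() if p > 0)

# Marginalize to p(x, y) and p(x, z)
p_xy, p_xz = {}, {}
for (x, y, z), p in p_xyz.items():
    p_xy[(x, y)] = p_xy.get((x, y), 0.0) + p
    p_xz[(x, z)] = p_xz.get((x, z), 0.0) + p

print(f"I(X;Y) = {mi(p_xy):.4f} bits")
print(f"I(X;Z) = {mi(p_xz):.4f} bits  (data processing: I(X;Z) <= I(X;Y))")
```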
Aria Nosratinia Information Theory 2-12
Fano's Inequality
Want to estimate X from Y; P_e = Prob(X̂ ≠ X)

X → Y → X̂

P_e log |X| ≥ H(X|Y) - H(P_e)

Sometimes simplified to:

P_e ≥ (H(X|Y) - 1) / log |X|
Q: Why is this useful?
A: Shows that there are limits to our ability to communicate or
estimate well, and the limits have to do with H(X|Y).
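A numerical sketch of Fano's bound (hypothetical ternary X observed through a noisy symmetric channel; the MAP estimator's error is compared with the simplified lower bound):

```python
from math import log2
from itertools import product

alphabet = [0, 1, 2]
eps = 0.6  # hypothetical channel: Y = X with prob 1 - eps, else eps/2 on each other symbol
p_y_given_x = {x: {y: (1 - eps) if y == x else eps / 2 for y in alphabet} for x in alphabet}
p_xy = {(x, y): (1 / 3) * p_y_given_x[x][y] for x, y in product(alphabet, repeat=2)}

p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y)
H_X_given_Y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

# Error probability of the MAP estimator xhat(y) = argmax_x p(x, y)
P_e = 1 - sum(max(p_xy[(x, y)] for x in alphabet) for y in alphabet)

# Simplified Fano bound: P_e >= (H(X|Y) - 1) / log|X|
bound = (H_X_given_Y - 1) / log2(len(alphabet))
print(f"H(X|Y) = {H_X_given_Y:.4f} bits, MAP error P_e = {P_e:.4f}, Fano lower bound = {bound:.4f}")
```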
Aria Nosratinia Information Theory 2-13
Fano's Inequality (proof)
Consider a Bernoulli R.V. indicating the error, E = 1{X̂ ≠ X}.

Expand H(E, X|X̂) with the chain rule in two ways:

H(E, X|X̂) = H(X|X̂) + H(E|X, X̂) = H(X|X̂) ≥ H(X|Y)

since H(E|X, X̂) = 0 and, by data processing, H(X|X̂) ≥ H(X|Y); also

H(E, X|X̂) = H(E|X̂) + H(X|E, X̂) ≤ H(P_e) + P_e log |X|

Combining the two expansions gives Fano's inequality.
Aria Nosratinia Information Theory 2-14