Lecture 6: Entropy Rate
• Entropy rate H(X)
• Random walk on graph
Dr. Yao Xie, ECE587, Information Theory, Duke University
Coin tossing versus poker
• Toss a fair coin and see a sequence
Head, Tail, Tail, Head · · ·
p(x1, x2, . . . , xn) ≈ 2−nH(X)
• Play card games with a friend and see a sequence
A♣, K♥, Q♦, J♠, 10♣, · · ·
p(x1, x2, . . . , xn) ≈ ?
How to model dependence: Markov chain
• A stochastic process X1, X2, · · ·
– States: each state Xi takes values in a finite set X
– Next step only depends on the previous state:
p(xn+1|xn, . . . , x1) = p(xn+1|xn).
– Transition probability Pij: the probability of moving from state i to state j
– p(xn+1) = Σxn p(xn)p(xn+1|xn)
– p(x1, x2, · · · , xn) = p(x1)p(x2|x1) · · · p(xn|xn−1)
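The chain-rule factorization above translates directly into a few lines of Python; the two-state chain and its numbers below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical two-state Markov chain; P[i, j] = p(next state = j | current state = i)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p0 = np.array([0.5, 0.5])  # initial distribution p(x1)

def joint_prob(path, p0, P):
    """p(x1, ..., xn) = p(x1) p(x2|x1) ... p(xn|xn-1)."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

print(joint_prob([0, 0, 1, 1], p0, P))  # 0.5 * 0.9 * 0.1 * 0.7
```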
Hidden Markov model (HMM)
• Used extensively in speech recognition, handwriting recognition, and machine learning.
• Markov process X1, X2, . . . , Xn, unobservable
• Observe a random process Y1,Y2, . . . , Yn, such that
Yi ∼ p(yi|xi)
• We can build a probability model
p(xn, yn) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi) · ∏_{i=1}^{n} p(yi|xi)
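The factorization p(xn, yn) = p(x1) ∏ p(xi+1|xi) ∏ p(yi|xi) can be sketched in code; the transition and emission matrices below are hypothetical toy numbers.

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM
P = np.array([[0.8, 0.2],   # transition: P[i, j] = p(next state = j | state = i)
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],   # emission: E[i, y] = p(y | x = i)
              [0.2, 0.8]])
p0 = np.array([0.5, 0.5])   # p(x1)

def hmm_joint(xs, ys, p0, P, E):
    """p(x^n, y^n) = p(x1) * prod_i p(x_{i+1}|x_i) * prod_i p(y_i|x_i)."""
    prob = p0[xs[0]]
    for a, b in zip(xs, xs[1:]):
        prob *= P[a, b]
    for x, y in zip(xs, ys):
        prob *= E[x, y]
    return prob

print(hmm_joint([0, 1], [0, 1], p0, P, E))  # 0.5 * 0.2 * 0.9 * 0.8
```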
Time-invariant Markov chain
• A Markov chain is time invariant if the conditional probability p(xn|xn−1) does not depend on n:
p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X
• For this kind of Markov chain, define the transition matrix
P = [ P11 · · · P1n
        · · ·
      Pn1 · · · Pnn ]
Simple weather model
• X = {Sunny: S, Rainy: R}
• p(S|S) = 1 − β, p(R|R) = 1 − α, p(R|S) = β, p(S|R) = α
P = [ 1 − β      β
        α      1 − α ]

[Two-state transition diagram: S and R, with S → R probability β and R → S probability α]
• Probability of seeing a sequence SSRR:
p(SSRR) = p(S)p(S|S)p(R|S)p(R|R) = p(S)(1 − β)β(1 − α)
How will such sequences behave after many days of observations?
• What sequences of observations are more typical?
• What is the probability of seeing a typical sequence?
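A quick numerical check of p(SSRR), with hypothetical values α = 0.3, β = 0.2, and the stationary distribution taken as the starting point (an assumption; the slide leaves p(S) generic):

```python
alpha, beta = 0.3, 0.2  # hypothetical transition parameters
P = {('S', 'S'): 1 - beta, ('S', 'R'): beta,
     ('R', 'S'): alpha,    ('R', 'R'): 1 - alpha}

pS = alpha / (alpha + beta)  # assume we start from the stationary distribution

seq = "SSRR"
prob = pS
for a, b in zip(seq, seq[1:]):
    prob *= P[(a, b)]
print(prob)  # pS * (1 - beta) * beta * (1 - alpha)
```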
Stationary distribution
• Stationary distribution: a distribution µ on the states such that the distribution at time n + 1 is the same as the distribution at time n.
• Our weather example:
– If µ(S) = α/(α + β), µ(R) = β/(α + β), with
P = [ 1 − β      β
        α      1 − α ]
– Then
p(Xn+1 = S) = p(S|S)µ(S) + p(S|R)µ(R)
= (1 − β) · α/(α + β) + α · β/(α + β)
= α/(α + β) = µ(S).
• How to calculate stationary distribution
– Stationary distribution µi, i = 1, · · · , |X|, satisfies
µi = Σj µj Pji   (i.e., µ = µP),   and   Σ_{i=1}^{|X|} µi = 1.
– “Detailed balancing”:
[Diagram illustrating the balance of probability flow between states]
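Numerically, µ = µP says µ is a left eigenvector of P with eigenvalue 1, so the stationary distribution can be read off an eigendecomposition. A minimal sketch for a hypothetical two-state chain (β = 0.2, α = 0.3 in the weather model's notation):

```python
import numpy as np

P = np.array([[0.8, 0.2],   # hypothetical weather chain: beta = 0.2, alpha = 0.3
              [0.3, 0.7]])

# mu = mu P  <=>  mu is a left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
mu = mu / mu.sum()           # normalize so the entries sum to 1
print(mu)                    # [alpha/(alpha+beta), beta/(alpha+beta)] = [0.6, 0.4]
```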
Stationary process
• A stochastic process is stationary if the joint distribution of any subset is invariant to time-shift
p(X1 = x1, · · · , Xn = xn) = p(X2 = x1, · · · , Xn+1 = xn).
• Example: coin tossing
p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 − p).
Entropy rate
• When Xi are i.i.d., entropy H(X1, · · · , Xn) = Σ_{i=1}^{n} H(Xi) = nH(X)
• With a dependent sequence Xi, how does H(X1, · · · , Xn) grow with n? Still linearly?
• Entropy rate characterizes the growth rate
• Definition 1: average entropy per symbol
H(X) = lim_{n→∞} (1/n) H(X1, · · · , Xn)
• Definition 2: rate of information innovation
H′(X) = lim_{n→∞} H(Xn|Xn−1, · · · , X1)
• H′(X) exists for stationary Xi:
H(Xn|X1, · · · , Xn−1) ≤ H(Xn|X2, · · · , Xn−1)   (conditioning reduces entropy)   (1)
= H(Xn−1|X1, · · · , Xn−2)   (stationarity)   (2)
– So H(Xn|X1, · · · , Xn−1) decreases as n increases
– It is also nonnegative: H(Xn|X1, · · · , Xn−1) ≥ 0
– A nonincreasing sequence bounded below must converge, so the limit exists
• H(X) = H′(X), for Xi stationary
(1/n) H(X1, · · · , Xn) = (1/n) Σ_{i=1}^{n} H(Xi|Xi−1, · · · , X1)   (chain rule)
• Each H(Xn|X1, · · · , Xn−1)→ H′(X)
• Cesàro mean: if an → a and bn = (1/n) Σ_{i=1}^{n} ai, then bn → a.
• So (1/n) H(X1, · · · , Xn) → H′(X)
AEP for stationary process
−(1/n) log p(X1, · · · , Xn) → H(X)
• p(X1, · · · , Xn) ≈ 2−nH(X)
• The typical set contains about 2nH(X) sequences, each of probability ≈ 2−nH(X)
• We can use nH(X) bits to represent a typical sequence
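The AEP can be checked empirically: the sample value −(1/n) log p(X1, · · · , Xn) concentrates around the entropy rate. A sketch for the simplest stationary process, an i.i.d. biased coin (the bias 0.3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 100_000
x = rng.random(n) < p  # i.i.d. Bernoulli(p) sequence

# Sample entropy: -(1/n) log2 p(X1, ..., Xn)
log_prob = np.where(x, np.log2(p), np.log2(1 - p)).sum()
sample_entropy = -log_prob / n

H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(sample_entropy, H)  # the two numbers should be close
```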
Entropy rate for Markov chain
• For Markov chain
H(X) = lim H(Xn|Xn−1, · · · , X1) = lim H(Xn|Xn−1) = H(X2|X1)
• By definition, p(X2 = j|X1 = i) = Pij
• Entropy rate of Markov chain
H(X) = −Σ_{i,j} µi Pij log Pij
Calculating the entropy rate is fairly easy
1. Find stationary distribution µi
2. Use transition probability Pi j
H(X) = −Σ_{i,j} µi Pij log Pij
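The two-step recipe can be packaged as one function; this is a sketch under the assumptions that P is a stochastic matrix with a unique stationary distribution and that 0 log 0 is treated as 0.

```python
import numpy as np

def entropy_rate(P):
    """H(X) = -sum_{i,j} mu_i P_ij log2 P_ij, with mu the stationary distribution."""
    # Step 1: stationary distribution (left eigenvector of P for eigenvalue 1)
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    mu = mu / mu.sum()
    # Step 2: weighted row entropies, with the convention 0 log 0 = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log2(P), 0.0)
    return -(mu * terms.sum(axis=1)).sum()

P = np.array([[0.8, 0.2],   # hypothetical two-state chain
              [0.3, 0.7]])
print(entropy_rate(P))      # bits per symbol
```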
Entropy rate of weather model

Stationary distribution: µ(S) = α/(α + β), µ(R) = β/(α + β)

P = [ 1 − β      β
        α      1 − α ]

H(X) = −µ(S)[(1 − β) log(1 − β) + β log β] − µ(R)[(1 − α) log(1 − α) + α log α]
= α/(α + β) H(β) + β/(α + β) H(α)
≤ H(2αβ/(α + β))   (concavity of the binary entropy)
≤ H(√(αβ))

Maximum when α = β = 1/2: degenerates to an independent process
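The closed form and the two bounds can be checked numerically; the α, β values below are hypothetical (both the harmonic-mean and square-root arguments stay below 1/2 here, where the binary entropy is increasing).

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.3, 0.2  # hypothetical parameters
rate = alpha / (alpha + beta) * H2(beta) + beta / (alpha + beta) * H2(alpha)

bound1 = H2(2 * alpha * beta / (alpha + beta))  # concavity bound
bound2 = H2(np.sqrt(alpha * beta))              # harmonic mean <= geometric mean
print(rate, bound1, bound2)  # rate <= bound1 <= bound2
```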
Random walk on graph
• An undirected graph with m nodes {1, . . . ,m}
• Edge (i, j) has weight Wij ≥ 0 (Wij = Wji)
• A particle walks randomly from node to node
• Random walk X1, X2, · · · : a sequence of vertices
• Given Xn = i, the next step is chosen from the neighboring nodes with probability
Pij = Wij / Σk Wik
Entropy rate of random walk on graph
• Let Wi = Σj Wij,   W = Σ_{i,j: i>j} Wij
• Stationary distribution is
µi = Wi / (2W)
• Can verify this is a stationary distribution: µP = µ
• Stationary distribution ∝ total weight of edges emanating from node i (locality)
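Both claims, µi = Wi/(2W) and µP = µ, can be verified for a small hypothetical graph:

```python
import numpy as np

# Hypothetical symmetric weight matrix of a 3-node undirected graph
W = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)

P = W / W.sum(axis=1, keepdims=True)  # P_ij = W_ij / sum_k W_ik
Wi = W.sum(axis=1)                    # W_i: total weight at node i
total = W.sum() / 2                   # W: each undirected edge counted once
mu = Wi / (2 * total)

print(mu)                             # [3/12, 5/12, 4/12]
print(np.allclose(mu @ P, mu))        # stationarity check: mu P = mu
```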
Summary
• AEP: for a stationary process X1, X2, · · ·, with Xi ∈ X, as n → ∞
p(x1, · · · , xn) ≈ 2−nH(X)
• Entropy rate
H(X) = lim_{n→∞} H(Xn|Xn−1, . . . , X1) = lim_{n→∞} (1/n) H(X1, . . . , Xn)
• Random walk on graph:
µi = Wi / (2W)