
  • Lecture 6: Entropy Rate

    • Entropy rate H(X)

    • Random walk on graph

    Dr. Yao Xie, ECE587, Information Theory, Duke University

  • Coin tossing versus poker

    • Toss a fair coin and see a sequence

    Head, Tail, Tail, Head · · ·

    $p(x_1, x_2, \ldots, x_n) \approx 2^{-nH(X)}$

    • Play card games with a friend and see a sequence

    A ♣

    K ♥

    Q ♦

    J ♠

    10 ♣ · · ·

    $p(x_1, x_2, \ldots, x_n) \approx\ ?$


  • How to model dependence: Markov chain

    • A stochastic process X1, X2, · · ·

    – States {X1, . . . , Xn}, each state Xi ∈ X
    – Next step only depends on the previous state

    p(xn+1|xn, . . . , x1) = p(xn+1|xn).

    – Transition probability

    $P_{ij}$: the transition probability of $i \to j$

    – $p(x_{n+1}) = \sum_{x_n} p(x_n)\, p(x_{n+1}|x_n)$
    – $p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2|x_1) \cdots p(x_n|x_{n-1})$ (a short numerical sketch follows)

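A minimal sketch (not from the lecture) of the factorization above: evaluating p(x1, . . . , xn) for a two-state chain. The transition matrix and initial distribution are made-up numbers.

```python
import numpy as np

P = np.array([[0.9, 0.1],    # hypothetical transitions: P[i, j] = p(j | i)
              [0.3, 0.7]])
p1 = np.array([0.5, 0.5])    # hypothetical initial distribution p(x1)

def joint_prob(states, P, p1):
    """p(x1, ..., xn) = p(x1) * prod_i p(x_{i+1} | x_i)."""
    prob = p1[states[0]]
    for a, b in zip(states, states[1:]):
        prob *= P[a, b]
    return prob

print(joint_prob([0, 0, 1, 1], P, p1))  # 0.5 * 0.9 * 0.1 * 0.7 = 0.0315
```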

  • Hidden Markov model (HMM)

    • Used extensively in speech recognition, handwriting recognition, and machine learning.

    • Markov process X1, X2, . . . , Xn, which is unobservable

    • Observe a random process Y1,Y2, . . . , Yn, such that

    Yi ∼ p(yi|xi)

    • We can build a probability model

    $p(x^n, y^n) = p(x_1) \prod_{i=1}^{n-1} p(x_{i+1}|x_i) \prod_{i=1}^{n} p(y_i|x_i)$

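A sketch of this joint probability, continuing the hypothetical chain above with a made-up emission matrix; `E[x, y]` stands in for p(y|x).

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.3, 0.7]])  # hidden-state transitions (hypothetical)
E = np.array([[0.8, 0.2], [0.4, 0.6]])  # emissions: E[x, y] = p(y | x) (hypothetical)
p1 = np.array([0.5, 0.5])               # initial distribution of X1

def hmm_joint_prob(xs, ys, P, E, p1):
    """p(x^n, y^n) = p(x1) * prod p(x_{i+1}|x_i) * prod p(y_i|x_i)."""
    prob = p1[xs[0]]
    for a, b in zip(xs, xs[1:]):
        prob *= P[a, b]
    for x, y in zip(xs, ys):
        prob *= E[x, y]
    return prob

print(hmm_joint_prob([0, 0, 1], [0, 1, 1], P, E, p1))  # 0.5*0.9*0.1 * 0.8*0.2*0.6
```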

  • Time-invariant Markov chain

    • A Markov chain is time invariant if the conditional probability p(xn|xn−1) does not depend on n

    p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X

    • For this kind of Markov chain, define the transition matrix

    $P = \begin{pmatrix} P_{11} & \cdots & P_{1n} \\ \vdots & \ddots & \vdots \\ P_{n1} & \cdots & P_{nn} \end{pmatrix}$


  • Simple weather model

    • X = {Sunny: S, Rainy: R}

    • p(S|S) = 1 − β, p(R|R) = 1 − α, p(R|S) = β, p(S|R) = α

    $P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix}$

    !"

    #"

    $%!" $%#"

    "


  • Probability of seeing a sequence SSRR:

    p(SSRR) = p(S)p(S|S)p(R|S)p(R|R) = p(S)(1 − β)β(1 − α)

    How will such sequences behave after many days of observations?

    • What sequences of observations are more typical?

    • What is the probability of seeing a typical sequence?

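A numeric illustration with made-up α and β, taking p(S) to be the stationary probability µ(S) = α/(α + β) introduced on the next slide:

```python
# p(SSRR) = p(S) * (1 - beta) * beta * (1 - alpha), hypothetical parameters
alpha, beta = 0.4, 0.2
pS = alpha / (alpha + beta)             # stationary probability of Sunny
p_SSRR = pS * (1 - beta) * beta * (1 - alpha)
print(p_SSRR)                           # (2/3) * 0.8 * 0.2 * 0.6 = 0.064
```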

  • Stationary distribution

    • Stationary distribution: a distribution µ on the states such that the distribution at time n + 1 is the same as the distribution at time n.

    • Our weather example:
    – If $\mu(S) = \frac{\alpha}{\alpha+\beta}$, $\mu(R) = \frac{\beta}{\alpha+\beta}$, and
    $P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix}$
    – Then
    $p(X_{n+1} = S) = p(S|S)\mu(S) + p(S|R)\mu(R) = (1-\beta)\frac{\alpha}{\alpha+\beta} + \alpha\frac{\beta}{\alpha+\beta} = \frac{\alpha}{\alpha+\beta} = \mu(S)$ (a quick numerical check follows)

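The quick numerical check of the calculation above, with hypothetical α and β:

```python
import numpy as np

alpha, beta = 0.4, 0.2                          # hypothetical parameters
P = np.array([[1 - beta, beta],
              [alpha, 1 - alpha]])              # states ordered (S, R)
mu = np.array([alpha, beta]) / (alpha + beta)   # claimed stationary distribution
print(mu @ P, mu)                               # both [0.6667, 0.3333]
```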

  • How to calculate the stationary distribution

    – Stationary distribution µi, i = 1, · · · , |X| satisfies

    $\mu_i = \sum_j \mu_j P_{ji}$ (i.e., $\mu = \mu P$), and $\sum_{i=1}^{|\mathcal{X}|} \mu_i = 1$ (a solver sketch follows below)

    – “Detailed balance”:

    !"

    #!"

    $%!"#%"

    $&!"#&"

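One standard way to solve µ = µP together with Σµi = 1 is as a linear system; a generic sketch (the matrix P here is an arbitrary example, not from the lecture):

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])                 # arbitrary example chain
n = P.shape[0]
# mu = mu P  <=>  (P^T - I) mu = 0; append 1^T mu = 1 to pin down the scale.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
mu, *_ = np.linalg.lstsq(A, b, rcond=None)
print(mu)                                  # [0.6667, 0.3333]
```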

  • Stationary process

    • A stochastic process is stationary if the joint distribution of any subset is invariant to time shifts

    p(X1 = x1, · · · , Xn = xn) = p(X2 = x1, · · · , Xn+1 = xn).

    • Example: coin tossing

    p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 − p).


  • Entropy rate

    • When Xi are i.i.d., the entropy is $H(X^n) = H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) = nH(X)$

    • With a dependent sequence Xi, how does $H(X^n)$ grow with n? Still linearly?

    • The entropy rate characterizes this growth rate

    • Definition 1: average entropy per symbol

    $H(X) = \lim_{n\to\infty} \frac{H(X^n)}{n}$

    • Definition 2: rate of information innovation

    $H'(X) = \lim_{n\to\infty} H(X_n|X_{n-1}, \ldots, X_1)$

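For a stationary Markov chain the two definitions can be compared directly: the chain rule gives H(X^n) = H(X1) + (n − 1)H(X2|X1), so H(X^n)/n decreases to H(X2|X1). A sketch with a hypothetical two-state chain:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])              # hypothetical transition matrix
mu = np.array([2/3, 1/3])               # its stationary distribution
H1 = H(mu)                              # H(X1) when started at stationarity
Hcond = sum(m * H(row) for m, row in zip(mu, P))   # H(X2|X1)
for n in [1, 10, 100, 1000]:
    print(n, (H1 + (n - 1) * Hcond) / n)  # H(X^n)/n, decreasing to Hcond
```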

  • H′(X) exists for stationary Xi

    $H(X_n|X_1, \ldots, X_{n-1}) \le H(X_n|X_2, \ldots, X_{n-1})$ (conditioning reduces entropy)
    $= H(X_{n-1}|X_1, \ldots, X_{n-2})$ (stationarity)

    – H(Xn|X1, · · · , Xn−1) decreases as n increases
    – It is bounded below by 0, since entropy is nonnegative
    – A decreasing sequence bounded below must have a limit


  • H(X) = H′(X) for stationary Xi

    By the chain rule, $\frac{1}{n} H(X_1, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i|X_{i-1}, \ldots, X_1)$

    • Each H(Xn|X1, · · · , Xn−1)→ H′(X)

    • Cesàro mean: if $a_n \to a$ and $b_n = \frac{1}{n} \sum_{i=1}^{n} a_i$, then $b_n \to a$

    • So $\frac{1}{n} H(X_1, \ldots, X_n) \to H'(X)$


  • AEP for stationary process

    $-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X)$

    • $p(X_1, \ldots, X_n) \approx 2^{-nH(X)}$

    • Typical sequences form a typical set of size about $2^{nH(X)}$

    • We can use nH(X) bits to represent a typical sequence

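A Monte Carlo sanity check of the AEP on the weather chain (hypothetical α, β): sample a long path and compare −(1/n) log₂ p(X1, . . . , Xn) with the entropy rate.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.4, 0.2                               # hypothetical parameters
P = np.array([[1 - beta, beta], [alpha, 1 - alpha]]) # states: 0 = S, 1 = R
mu = np.array([alpha, beta]) / (alpha + beta)

n = 100_000
x = [rng.choice(2, p=mu)]                            # X1 from the stationary law
for _ in range(n - 1):
    x.append(rng.choice(2, p=P[x[-1]]))              # X_{i+1} ~ P[X_i, :]

log_p = np.log2(mu[x[0]]) + sum(np.log2(P[a, b]) for a, b in zip(x, x[1:]))
rate = -np.sum(mu[:, None] * P * np.log2(P))         # entropy rate H(X)
print(-log_p / n, rate)                              # the two should be close
```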

  • Entropy rate for Markov chain

    • For a stationary Markov chain

    $H(X) = \lim H(X_n|X_{n-1}, \ldots, X_1) = \lim H(X_n|X_{n-1}) = H(X_2|X_1)$ (by the Markov property, then stationarity)

    • By definition, $p(X_2 = j|X_1 = i) = P_{ij}$

    • Entropy rate of Markov chain

    $H(X) = -\sum_{i,j} \mu_i P_{ij} \log P_{ij}$


  • Calculating the entropy rate is fairly easy

    1. Find the stationary distribution µi

    2. Use the transition probabilities Pij in the formula below (a code sketch follows)

    $H(X) = -\sum_{i,j} \mu_i P_{ij} \log P_{ij}$

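A minimal implementation of this two-step recipe (assuming an ergodic chain so the stationary distribution is unique; log base 2, so the answer is in bits):

```python
import numpy as np

def stationary(P):
    """Solve mu = mu P with sum(mu) = 1 (assumes a unique stationary distribution)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def entropy_rate(P):
    """H(X) = -sum_{i,j} mu_i P_ij log2 P_ij, in bits."""
    mu = stationary(P)
    safe = np.where(P > 0, P, 1.0)      # log2(1) = 0, so zero entries contribute 0
    return -np.sum(mu[:, None] * P * np.log2(safe))

print(entropy_rate(np.array([[0.8, 0.2],
                             [0.4, 0.6]])))   # ~0.805 bits per step
```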

  • Entropy rate of weather model

    Stationary distribution $\mu(S) = \frac{\alpha}{\alpha+\beta}$, $\mu(R) = \frac{\beta}{\alpha+\beta}$, and $P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix}$

    $H(X) = -\frac{\alpha}{\alpha+\beta}\left[\beta \log \beta + (1-\beta) \log(1-\beta)\right] - \frac{\beta}{\alpha+\beta}\left[\alpha \log \alpha + (1-\alpha) \log(1-\alpha)\right]$

    $= \frac{\alpha}{\alpha+\beta} H(\beta) + \frac{\beta}{\alpha+\beta} H(\alpha)$

    $\le H\!\left(\frac{2\alpha\beta}{\alpha+\beta}\right) \le H\!\left(\sqrt{\alpha\beta}\right)$

    Maximum when α = β = 1/2: degenerates to an independent process

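A numeric check that the closed form matches the general formula, with hypothetical α and β:

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.4, 0.2                               # hypothetical parameters
P = np.array([[1 - beta, beta], [alpha, 1 - alpha]])
mu = np.array([alpha, beta]) / (alpha + beta)
direct = -np.sum(mu[:, None] * P * np.log2(P))       # -sum mu_i P_ij log2 P_ij
closed = alpha / (alpha + beta) * Hb(beta) + beta / (alpha + beta) * Hb(alpha)
print(direct, closed)                                # both ~0.805 bits
```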

  • Random walk on graph

    • An undirected graph with m nodes {1, . . . ,m}

    • Edge (i, j) has weight $W_{ij} \ge 0$ ($W_{ij} = W_{ji}$)

    • A particle walks randomly from node to node

    • Random walk X1, X2, · · · : a sequence of vertices

    • Given Xn = i, the next node is chosen from the neighbors of i with probability

    $P_{ij} = \frac{W_{ij}}{\sum_k W_{ik}}$



  • Entropy rate of random walk on graph

    • Let $W_i = \sum_j W_{ij}$, $W = \sum_{i,j:\, i>j} W_{ij}$ (each edge counted once)

    • The stationary distribution is $\mu_i = \frac{W_i}{2W}$

    • Can verify this is a stationary distribution: µP = µ (see the sketch below)

    • Stationary distribution ∝ weight of the edges emanating from node i (locality)

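A sketch for a small hypothetical weighted graph: build P from W, confirm µi = Wi/2W is stationary, and compute the entropy rate with the general formula.

```python
import numpy as np

W = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]], dtype=float)  # hypothetical symmetric edge weights
Wi = W.sum(axis=1)                      # W_i = total weight at node i
P = W / Wi[:, None]                     # P_ij = W_ij / sum_k W_ik
mu = Wi / W.sum()                       # = W_i / (2W): W.sum() counts each edge twice
print(np.allclose(mu @ P, mu))          # True: mu is stationary
safe = np.where(P > 0, P, 1.0)          # zero entries contribute 0 to the sum
print(-np.sum(mu[:, None] * P * np.log2(safe)))  # entropy rate in bits
```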


  • Summary

    • AEP: for a stationary process X1, X2, · · · with Xi ∈ X, as n → ∞

    $p(x_1, \ldots, x_n) \approx 2^{-nH(X)}$

    • Entropy rate

    $H(X) = \lim_{n\to\infty} H(X_n|X_{n-1}, \ldots, X_1) = \lim_{n\to\infty} \frac{1}{n} H(X_1, \ldots, X_n)$

    • Random walk on graph: $\mu_i = \frac{W_i}{2W}$
