Entropy properties
A.J. Han Vinck, University of Duisburg-Essen
April 2013
content
• Introduction
– Entropy and some related properties
• Source coding
• Channel coding
Field of Interest
It specifically encompasses theoretical and applied aspects of
- coding, communications and communications networks
- complexity and cryptography
- detection and estimation
- learning, Shannon theory, and stochastic processes
Information theory deals with the problem of efficient and reliable transmission of information
Communication model
[Block diagram: source → analogue-to-digital conversion → compression/reduction → security → error protection → from bit to signal → digital transmission]
A generator of messages: the discrete source
source X → output x ∈ { finite set of messages }

Example:

binary source: x ∈ {0, 1} with P(x = 0) = p; P(x = 1) = 1 - p

M-ary source: x ∈ {1, 2, …, M} with Σ_i P(i) = 1.
Express everything in bits: 0 and 1
Discrete finite ensemble:
a, b, c, d ↔ 00, 01, 10, 11

k binary digits specify 2^k messages

M messages need log₂ M bits
The entropy of a source: a fundamental quantity in information theory
entropy:

The minimum average number of binary digits needed to specify a source output (message) uniquely is called the
“SOURCE ENTROPY”
SHANNON (1948):
1) Source entropy := -Σ_{i=1..M} P(i) log₂ P(i) ≤ L = Σ_{i=1..M} P(i) ℓ(i), the average number of binary digits per source output
2) minimum can be obtained !
QUESTION: how to represent a source output in digital form?
QUESTION: what is the source entropy of text, music, pictures?
QUESTION: are there algorithms that achieve this entropy?
http://www.youtube.com/watch?v=z7bVw7lMtUg
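A minimal Python sketch (my own illustration, with an arbitrary example distribution) of the entropy formula above:

```python
import math

def source_entropy(probs):
    """H = -sum P(i) log2 P(i) in bits; outcomes with P(i) = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(source_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits, below log2(4) = 2
print(source_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits: a uniform source attains log2 M
```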
Properties of entropy
A: For a source X with M different outputs: log₂ M ≥ H(X) ≥ 0

the “worst” we can do is just assign log₂ M bits to each source output

B: For a source X “related” to a source Y: H(X) ≥ H(X|Y)
Y gives additional info about X
when X and Y are independent, H(X) = H(X|Y)
Joint Entropy: H(X,Y) = H(X) + H(Y|X)
• also H(X,Y) = H(Y) + H(X|Y)
– intuition: first describe Y and then X given Y
– from this: H(X) – H(X|Y) = H(Y) – H(Y|X)
• Homework: check the formula
Cont.

As a formula:

H(X|Y) = -Σ_{x,y} P(x,y) log₂ P(x|y)

H(X,Y) = -Σ_{x,y} P(x,y) log₂ P(x,y) = -Σ_{x,y} P(x,y) log₂ [P(x) P(y|x)]
       = -Σ_x Σ_y P(x) P(y|x) log₂ P(x) - Σ_{x,y} P(x,y) log₂ P(y|x)
       = H(X) + H(Y|X)
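As a quick numerical check of the chain rule above, a minimal Python sketch; the joint distribution P(x,y) is an arbitrary example:

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Arbitrary example joint distribution P(x, y) and its marginals
Pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
Px = {0: 0.5, 1: 0.5}
Py = {0: 0.6, 1: 0.4}

# Conditional entropies computed directly from their definitions
H_X_given_Y = -sum(p * math.log2(p / Py[y]) for (x, y), p in Pxy.items())
H_Y_given_X = -sum(p * math.log2(p / Px[x]) for (x, y), p in Pxy.items())

print(H(Pxy.values()))               # H(X,Y)
print(H(Px.values()) + H_Y_given_X)  # H(X) + H(Y|X): same value
print(H(Py.values()) + H_X_given_Y)  # H(Y) + H(X|Y): same value
```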
Entropy: Proof of A
log₂ M = (ln M)(log₂ e)      (ln x = y ⇔ x = e^y, so log₂ x = y log₂ e = (ln x) log₂ e)

We use the following important inequality:

1 - 1/M ≤ ln M ≤ M - 1

Homework: draw the inequality (the curves M - 1, ln M and 1 - 1/M as functions of M).
Entropy: Proof of A
H(X) - log₂ M = -Σ_x P(x) log₂ P(x) - log₂ M
             = log₂ e · Σ_x P(x) ln [1 / (M P(x))]
             ≤ log₂ e · Σ_x P(x) (1 / (M P(x)) - 1)
             = log₂ e · (M · 1/M - 1) = 0
Entropy: Proof of B
H(X|Y) - H(X) = Σ_{x,y} P(x,y) log₂ [P(x) / P(x|y)]
             = log₂ e · Σ_{x,y} P(x,y) ln [P(x) / P(x|y)]
             ≤ log₂ e · Σ_{x,y} P(x,y) (P(x) / P(x|y) - 1)
             = log₂ e · (Σ_{x,y} P(x) P(y) - 1) = 0
The connection between X and Y
[Diagram: a channel with inputs X = 0, 1, …, M-1 (input probabilities P(X=0), …, P(X=M-1)), outputs Y = 0, 1, …, N-1, and transition probabilities P(Y=j|X=i)]

P(Y=j) = Σ_i P(X=i) P(Y=j|X=i)

P(X=i, Y=j) = P(X=i) P(Y=j|X=i) = P(Y=j) P(X=i|Y=j)   (Bayes)
Entropy: corollary
H(X,Y) = H(X) + H(Y|X)
= H(Y) + H(X|Y)
H(X,Y,Z) = H(X) + H(Y|X) + H(Z|XY)
≤ H(X) + H(Y) + H(Z)
Binary entropy
interpretation:

let a binary sequence of length n contain pn ones; then we can specify each such sequence with

log₂ 2^{n·h(p)} = n·h(p) bits,

since the number of such sequences is (n choose pn) ≈ 2^{n·h(p)}, i.e. lim_{n→∞} (1/n) log₂ (n choose pn) = h(p).

Homework: Prove the approximation using ln N! ~ N lnN for N large.
Use also: log_a x = y ⇔ log_b x = y log_b a
The Stirling approximation: N! ≈ √(2πN) · N^N · e^(-N)
The Binary Entropy:

h(p) = -p log₂ p - (1-p) log₂ (1-p)

[Figure: plot of the binary entropy function h(p) versus p, for 0 ≤ p ≤ 1]
Note:
h(p) = h(1-p)
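A small sketch (my own illustration) of h(p) and of the approximation (n choose pn) ≈ 2^{n·h(p)} used above; the values of p and n are arbitrary:

```python
import math

def h(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h(0.11), h(0.89))   # equal, since h(p) = h(1 - p)

# (1/n) * log2( n choose pn ) tends to h(p) as n grows
p = 0.25
for n in (100, 1000, 10000):
    print(n, math.log2(math.comb(n, int(p * n))) / n, h(p))
```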
homework
• Consider the following figure
[Figure: a set of points in the (X, Y) plane, with X ∈ {0, 1, 2, 3} and Y ∈ {1, 2, 3}]

All points are equally likely. Calculate H(X), H(X|Y) and H(X,Y)
mutual information I(X;Y):=
I(X;Y) := H(X) – H(X|Y)
= H(Y) – H(Y|X) ( homework: show this! )
i.e. the reduction in the description length of X given Y
note that I(X;Y) ≥ 0
or: the amount of information that Y gives about X
equivalently: I(X;Y|Z) = H(X|Z) – H(X|YZ)
the amount of information that Y gives about X given Z
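A minimal sketch (my own illustration, assuming a BSC with crossover probability 0.1 as the example channel) that computes I(X;Y) from an input distribution and channel transition probabilities:

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(Px, Pyx):
    """I(X;Y) = H(Y) - H(Y|X) for input distribution Px[i] and channel Pyx[i][j] = P(Y=j|X=i)."""
    ny = len(Pyx[0])
    Py = [sum(Px[i] * Pyx[i][j] for i in range(len(Px))) for j in range(ny)]
    H_Y_given_X = sum(Px[i] * H(Pyx[i]) for i in range(len(Px)))
    return H(Py) - H_Y_given_X   # equals H(X) - H(X|Y) as well

# Example: BSC with crossover probability 0.1 and uniform input
print(mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))  # ≈ 1 - h(0.1) ≈ 0.531 bits
```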
3 classical channels
binary symmetric (satellite), erasure (network), Z-channel (optical)

[Diagrams: the three channels, each with binary input X and output Y; the erasure channel has an additional output E (erasure)]
Homework:
find the maximum of H(X) - H(X|Y) and the corresponding input distribution
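One possible numerical approach (my own sketch): maximize I(X;Y) over the input distribution by a simple grid search. The Z-channel orientation below (input 0 noiseless, input 1 flipped with probability q) and the value q = 0.1 are assumptions for illustration only.

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def I(px0, Pyx):
    """I(X;Y) for binary input (px0, 1-px0) and transition matrix Pyx[i][j] = P(Y=j|X=i)."""
    Px = [px0, 1 - px0]
    Py = [sum(Px[i] * Pyx[i][j] for i in range(2)) for j in range(len(Pyx[0]))]
    return H(Py) - sum(Px[i] * H(Pyx[i]) for i in range(2))

# Assumed Z-channel: input 0 is noiseless, input 1 flips to 0 with probability q
q = 0.1
Z = [[1.0, 0.0], [q, 1 - q]]
best = max((I(p / 1000, Z), p / 1000) for p in range(1001))
print(best)  # (capacity estimate in bits, best P(X=0)); the optimum is not exactly uniform
```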
Example 1
• Suppose that X ∈ { 000, 001, …, 111 } with H(X) = 3 bits

• Channel: X → channel → Y = parity of X
• H(X|Y) = 2 bits: we transmitted H(X) – H(X|Y) = 1 bit of information!
We know that X|Y ∈ { 000, 011, 101, 110 } or X|Y ∈ { 001, 010, 100, 111 }

Homework: suppose the channel output gives the number of ones in X. What is then H(X) - H(X|Y)?
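A short sketch (my own check of the numbers in Example 1): X uniform over the eight 3-bit strings, Y the parity of X.

```python
import math
from collections import Counter

xs = [f"{i:03b}" for i in range(8)]   # the 8 three-bit strings, all equally likely

def parity(x):
    return x.count("1") % 2

# Given Y = y, X is uniform over the 4 strings with that parity, so H(X|Y=y) = log2(4)
counts = Counter(parity(x) for x in xs)                       # {0: 4, 1: 4}
H_X_given_Y = sum((c / 8) * math.log2(c) for c in counts.values())
print(H_X_given_Y)       # 2.0 bits
print(3 - H_X_given_Y)   # H(X) - H(X|Y) = 1 bit of transmitted information
```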
Transmission efficiency
I need on the average H(X) bits/source output to describe the source symbols X
After observing Y, I need H(X|Y) bits/source output
H(X) ≥ H(X|Y)

Reduction in description length is called the transmitted information

Transmitted R = H(X) - H(X|Y) = H(Y) - H(Y|X), from earlier calculations
We can maximize R by changing the input probabilities. The maximum is called CAPACITY (Shannon 1948)
X → channel → Y
Transmission efficiency
• Shannon shows that error correcting codes exist that have
– an efficiency k/n ≤ Capacity
• n channel uses for k information symbols
– decoding error probability → 0
• when n is very large
• Problem: how to find these codes
channel capacity: the BSC
[Diagram: BSC with input X and output Y; P(0→0) = P(1→1) = 1-p, P(0→1) = P(1→0) = p]
I(X;Y) = H(Y) – H(Y|X)
the maximum of H(Y) is 1, since Y is binary

H(Y|X) = P(X=0) h(p) + P(X=1) h(p) = h(p)
Conclusion: the capacity of the BSC is C_BSC = 1 - h(p)

Homework: draw C_BSC; what happens for p > ½?
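As a small numerical check of the result above (related to the homework), a sketch that evaluates C_BSC = 1 - h(p) at a few values of p:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Capacity of the BSC: C = 1 - h(p); symmetric around p = 1/2 and zero at p = 1/2
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(p, 1 - h(p))
```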
Practical communication system design
[Block diagram: message → encoder (code book with 2^k code words) → code word of length n → channel → received word → channel decoder (code book with errors) → message estimate]

There are 2^k code words of length n; k is the number of information bits transmitted in n channel uses.
A look at the binary symmetric channel
x ∈ {0,1}^n

e ∈ {0,1}: probability(e = 1) = p for every transmitted bit

y ∈ {0,1}^n
Conclusion: if more than one x is connected with y, we have to accept a decision error
N = 2^k equiprobable messages m1, m2, …, mN

[Diagram: the decoder looks for the code word x at distance ~ np from the received word; a different code word x' leads to a wrong message estimate m']
Encoding and decoding according to Shannon
Code: 2^k binary code words where P(0) = P(1) = ½

Channel errors: P(0 → 1) = P(1 → 0) = p

i.e. # error sequences ≈ 2^{n·h(p)}

Decoder: search around the received sequence for a code word with ~ np differences

space of 2^n binary sequences
1. For t errors: Prob(|t/n - p| > ε) → 0 for n → ∞ (law of large numbers)

2. Expected # of code words in the decoding region (code words are random):
# of possible x vectors in the region ≈ 2^{n·h(p)}, so the expected number is
2^k · 2^{n·h(p)} / 2^n = 2^{-n[1 - h(p) - k/n]} = 2^{-n[C - R]}

[Figure: illustration of the decoding error probability: code word, received word, decoding region]

Conclusion: the probability of a correct guess of x is ≈ 2^{-n(R-C)} → 0 for R > C
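A minimal sketch (my own illustration) evaluating the bound 2^{-n[C - R]} on the expected number of competing code words; the crossover probability p and rate R are assumed example values with R < C:

```python
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1              # assumed BSC crossover probability
C = 1 - h(p)         # capacity ≈ 0.531 bits per channel use
R = 0.4              # assumed code rate k/n, below capacity
for n in (100, 500, 1000):
    competitors = 2 ** (-n * (C - R))   # expected # of other code words in the decoding region
    print(n, competitors)               # vanishes as n grows, since R < C
```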
conclusion
For R > C the decoding error probability > 0
[Figure: decoding error probability Pe versus rate R = k/n, with the capacity C marked]

channel capacity: the BSC

[Figure: channel capacity of the BSC versus the bit error probability p, for 0 ≤ p ≤ 1]
Explain the behaviour!
Recommended