
Information Theory
Chapter 4: Information Channels and Capacity

Rudolf Mathar

WS 2018/19


Outline Chapter 4: Information Channels

Discrete Channel Model

Channel Capacity

Binary Channels
  Binary Symmetric Channel (BSC)
  Binary Asymmetric Channel (BAC)
  Binary Z-Channel (BZC)
  Binary Asymmetric Erasure Channel (BAEC)

Channel Coding

Decoding Rules

Error Probabilities

The Noisy Coding Theorem

Converse of the Noisy Coding Theorem


Communication Channel from an information-theoretic point of view

[Block diagram: source → source encoder → channel encoder → modulator → analog channel (affected by noise) → demodulator → channel decoder → source decoder → destination (estimation of the source output); modulator, analog channel and demodulator together act as the random channel seen by channel encoder and decoder.]


Discrete Channel Model

Discrete information channels are described by

• A pair of random variables (X, Y) with support 𝒳 × 𝒴, where X is the input r.v. with input alphabet 𝒳 = {x_1, ..., x_m} and Y is the output r.v. with output alphabet 𝒴 = {y_1, ..., y_d}.

• The channel matrix

  W = (w_{ij})_{i=1,...,m, j=1,...,d}   with   w_{ij} = P(Y = y_j | X = x_i),  i = 1, ..., m, j = 1, ..., d.

• The input distribution

  P(X = x_i) = p_i,  i = 1, ..., m,   p = (p_1, ..., p_m).


Discrete Channel Model

W = (w_{ij})_{1 ≤ i ≤ m, 1 ≤ j ≤ d}

[Channel diagram: Input X → Channel W → Output Y; an input symbol x_i is mapped to an output symbol y_j with probability w_{ij}.]

Write W composed of rows w_1, ..., w_m as

W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{pmatrix}

Lemma 4.1

H(Y) = H(pW)
H(Y | X = x_i) = H(w_i)
H(Y | X) = \sum_{i=1}^{m} p_i H(w_i)
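As an illustration, a minimal Python sketch that evaluates the quantities of Lemma 4.1 for a given input distribution p and channel matrix W (numpy assumed; the helper names are illustrative only):

    import numpy as np

    def entropy(q):
        """Shannon entropy in bits of a probability vector q, with 0*log 0 := 0."""
        q = np.asarray(q, dtype=float)
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    def channel_quantities(p, W):
        """Return H(Y), H(Y|X) and I(X;Y) for input distribution p and channel matrix W."""
        p, W = np.asarray(p, dtype=float), np.asarray(W, dtype=float)
        HY = entropy(p @ W)                                   # H(Y) = H(pW)
        HY_X = sum(pi * entropy(wi) for pi, wi in zip(p, W))  # H(Y|X) = sum_i p_i H(w_i)
        return HY, HY_X, HY - HY_X                            # I(X;Y) = H(Y) - H(Y|X)

    # Example: BSC with crossover probability 0.1 and uniform input
    print(channel_quantities([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))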


Channel Capacity

Mutual information:

I(X;Y) = H(Y) − H(Y | X)
       = H(pW) − \sum_{i=1}^{m} p_i H(w_i)
       = \sum_{i=1}^{m} p_i D(w_i \| pW)
       = I(p;W),

where D denotes the Kullback-Leibler divergence.

Aim: For a given channel W, use the input distribution that maximizes the mutual information I(X;Y).

Definition 4.2.

C = \max_{(p_1,...,p_m)} I(X;Y) = \max_{p} I(p;W)

is called the channel capacity.

Determining capacity is in general a complicated optimization problem.
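One standard numerical approach, not developed further in these slides, is the Blahut-Arimoto iteration; a minimal sketch (numpy assumed, names illustrative):

    import numpy as np

    def blahut_arimoto(W, iters=2000):
        """Approximate the capacity (in bits) of a channel matrix W by the
        standard Blahut-Arimoto fixed-point iteration."""
        W = np.asarray(W, dtype=float)
        p = np.full(W.shape[0], 1.0 / W.shape[0])    # start from the uniform input
        for _ in range(iters):
            pW = p @ W                               # output distribution for current p
            # c_i = 2^{D(w_i || pW)} for every input symbol i (divergence in bits)
            c = np.array([2.0 ** np.sum(row[row > 0] * np.log2(row[row > 0] / pW[row > 0]))
                          for row in W])
            C = np.log2(np.sum(p * c))               # current capacity estimate
            p = p * c / np.sum(p * c)                # multiplicative update of the input
        return C, p

    # BSC with crossover probability 0.1: capacity ≈ 0.531 bits at the uniform input
    print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]))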


Binary Symmetric Channel (BSC)

Example 4.3.

[BSC diagram: input 0 → output 0 with probability 1−ε and → output 1 with probability ε; input 1 → output 1 with probability 1−ε and → output 0 with probability ε.]

Input distribution p = (p_0, p_1)

Channel matrix

W = \begin{pmatrix} 1−ε & ε \\ ε & 1−ε \end{pmatrix}

Mutual Information:

I(X;Y) = I(p;W) = H(pW) − \sum_{i=1}^{m} p_i H(w_i)
       = H(p_0(1−ε) + p_1 ε,  ε p_0 + (1−ε) p_1) − H(ε, 1−ε)

The maximum of I(p;W) over all input distributions (p_0, p_1) is achieved at

(p*_0, p*_1) = (0.5, 0.5).


Binary Symmetric Channel (BSC)

Hence, p*_0 = p*_1 = 1/2 is capacity-achieving. It holds

C = \max_{(p_0,p_1)} I(X;Y) = 1 + (1−ε) \log_2(1−ε) + ε \log_2 ε.

Capacity of the BSC as a function of ε:

[Plot: C = C(ε) for ε ∈ [0, 1]; the capacity is 1 at ε = 0 and ε = 1, and drops to 0 at ε = 0.5.]
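The closed-form C(ε) is easy to evaluate numerically; a minimal sketch (numpy assumed, names illustrative):

    import numpy as np

    def bsc_capacity(eps):
        """BSC capacity in bits: C(eps) = 1 + (1-eps) log2(1-eps) + eps log2(eps) = 1 - H(eps, 1-eps)."""
        if eps in (0.0, 1.0):
            return 1.0
        return 1.0 + (1 - eps) * np.log2(1 - eps) + eps * np.log2(eps)

    for eps in (0.0, 0.1, 0.25, 0.5):
        print(eps, bsc_capacity(eps))     # 1.0, ≈ 0.531, ≈ 0.189, 0.0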


Channel Capacity (ctd.)

Consider a channel with channel matrix W. To compute the channel capacity, solve

C = \max_{p} I(p;W) = \max_{p} \sum_{i=1}^{m} p_i D(w_i \| pW).

Theorem 4.4.
The capacity of the channel W is attained at p* = (p*_1, ..., p*_m) if and only if there is a constant ζ such that

D(w_i \| p*W) = ζ  for all i = 1, ..., m with p*_i > 0,
D(w_i \| p*W) ≤ ζ  for all i = 1, ..., m with p*_i = 0.

Moreover,

C = I(p*;W) = ζ.
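A minimal numerical check of this condition for the BSC at the uniform input (numpy assumed, names illustrative): both divergences D(w_i \| p*W) coincide and equal the capacity.

    import numpy as np

    def kl(q, r):
        """Kullback-Leibler divergence D(q || r) in bits."""
        q, r = np.asarray(q, dtype=float), np.asarray(r, dtype=float)
        mask = q > 0
        return np.sum(q[mask] * np.log2(q[mask] / r[mask]))

    eps = 0.1
    W = np.array([[1 - eps, eps], [eps, 1 - eps]])
    p_star = np.array([0.5, 0.5])                     # capacity-achieving input of the BSC
    pW = p_star @ W
    print([kl(w, pW) for w in W])                     # both ≈ 0.531 = ζ
    print(1 + (1 - eps) * np.log2(1 - eps) + eps * np.log2(eps))   # C ≈ 0.531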


Channel Capacity (ctd.)

Proof of the Theorem:

Mutual information I(p;W) is a concave function of p. Hence the KKT conditions (cf., e.g., Boyd and Vandenberghe 2004) are necessary and sufficient for optimality of an input distribution p. Using the above representation, some elementary algebra shows that

\frac{\partial}{\partial p_i} I(p;W) = D(w_i \| pW) − 1.

The full set of KKT conditions now reads as

\sum_{j=1}^{m} p_j = 1
p_i ≥ 0,  i = 1, ..., m
λ_i ≥ 0,  i = 1, ..., m
λ_i p_i = 0,  i = 1, ..., m
D(w_i \| pW) − 1 + λ_i + ν = 0,  i = 1, ..., m

For p_i > 0, complementary slackness gives λ_i = 0 and hence D(w_i \| pW) = 1 − ν =: ζ; for p_i = 0, λ_i ≥ 0 gives D(w_i \| pW) ≤ ζ. This shows the assertion.


Channel Capacity (ctd.)

Denote the self-information function by ρ(q) = −q ln q, q ≥ 0.

Theorem 4.5. (G. Alirezaei, 2018)
Given a channel with square channel matrix W = (w_{ij})_{i,j=1,...,m}, assume that W is invertible with inverse T = (t_{ij})_{i,j=1,...,m}. Then, measured in nats, the capacity is

C = \ln\Big( \sum_{k} \exp\big( -\sum_{i,j} t_{ki}\, ρ(w_{ij}) \big) \Big)

and the capacity-achieving distribution is given by

p*_s = e^{-C} \sum_{k} t_{ks} \exp\big( -\sum_{i,j} t_{ki}\, ρ(w_{ij}) \big)
     = \frac{ \sum_{k} t_{ks} \exp\big( -\sum_{i,j} t_{ki}\, ρ(w_{ij}) \big) }{ \sum_{k} \exp\big( -\sum_{i,j} t_{ki}\, ρ(w_{ij}) \big) },   s = 1, ..., m.
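A direct transcription of the formula into a numerical sketch, under the theorem's assumptions that W is square and invertible, and assuming the resulting p* is a valid distribution (numpy assumed, names illustrative):

    import numpy as np

    def closed_form_capacity(W):
        """Capacity in nats and optimal input of a channel with invertible square
        matrix W, evaluated via the closed-form expression of Theorem 4.5."""
        W = np.asarray(W, dtype=float)
        T = np.linalg.inv(W)                              # T = W^{-1}
        rho = -W * np.log(np.where(W > 0, W, 1.0))        # rho(w) = -w ln w, with rho(0) = 0
        d = T @ rho.sum(axis=1)                           # d_k = sum_{i,j} t_ki rho(w_ij)
        C = np.log(np.sum(np.exp(-d)))                    # C = ln( sum_k exp(-d_k) )
        p_star = np.exp(-C) * (T.T @ np.exp(-d))          # p*_s = e^{-C} sum_k t_ks exp(-d_k)
        return C, p_star

    # BSC with eps = 0.1: C = ln 2 - h(0.1) nats (≈ 0.531 bits) and p* = (0.5, 0.5)
    eps = 0.1
    C, p = closed_form_capacity([[1 - eps, eps], [eps, 1 - eps]])
    print(C / np.log(2), p)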


Binary Asymmetric Channel (BAC)

Example 4.6.

[BAC diagram: input 0 → output 0 with probability 1−ε and → output 1 with probability ε; input 1 → output 0 with probability δ and → output 1 with probability 1−δ.]

W = \begin{pmatrix} 1−ε & ε \\ δ & 1−δ \end{pmatrix}

The capacity-achieving distribution is

p*_0 = \frac{1}{1+b},   p*_1 = \frac{b}{1+b},

with

b = \frac{aε − (1−ε)}{δ − a(1−δ)}   and   a = \exp\Big( \frac{h(δ) − h(ε)}{1 − ε − δ} \Big),

and h(ε) = H(ε, 1−ε), the entropy of (ε, 1−ε).

Note that ε = δ yields the previous result for the BSC.
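A minimal sketch that evaluates these formulas (numpy assumed, names illustrative; requires ε + δ ≠ 1); setting ε = 0 also covers the Z-channel of Example 4.7.

    import numpy as np

    def bac_optimal_input(eps, delta):
        """Capacity-achieving input (p0*, p1*) of the BAC with channel matrix
        [[1-eps, eps], [delta, 1-delta]], following the formulas of Example 4.6
        (natural logarithms; requires eps + delta != 1)."""
        h = lambda q: 0.0 if q in (0.0, 1.0) else -q * np.log(q) - (1 - q) * np.log(1 - q)
        a = np.exp((h(delta) - h(eps)) / (1 - eps - delta))
        b = (a * eps - (1 - eps)) / (delta - a * (1 - delta))
        return 1 / (1 + b), b / (1 + b)

    print(bac_optimal_input(0.1, 0.1))   # eps = delta: the BSC optimum (0.5, 0.5)
    print(bac_optimal_input(0.0, 0.5))   # Z-channel with delta = 0.5: (0.6, 0.4)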


Binary Asymmetric Channel (BAC)

Derivation of capacity for the BAC:

By Theorem 4.4 the capacity-achieving input distribution p = (p_0, p_1) satisfies

D(w_1 \| pW) = D(w_2 \| pW).

This is an equation in the variables p_0, p_1, which jointly with the condition p_0 + p_1 = 1 has the solution

p*_0 = \frac{1}{1+b},   p*_1 = \frac{b}{1+b},        (1)

with

b = \frac{aε − (1−ε)}{δ − a(1−δ)}   and   a = \exp\Big( \frac{h(δ) − h(ε)}{1 − ε − δ} \Big),

and h(ε) = H(ε, 1−ε), the entropy of (ε, 1−ε).


Binary Z-Channel (BZC)

Example 4.7.
The so-called Z-channel is a special case of the BAC with ε = 0.

[Z-channel diagram: input 0 → output 0 with probability 1; input 1 → output 1 with probability 1−δ and → output 0 with probability δ.]

The capacity-achieving distribution is obtained from the BAC by setting ε = 0.


Binary Asymmetric Erasure Channel (BAEC)

Example 4.8.

[BAEC diagram: input 0 → output 0 with probability 1−ε and → erasure symbol e with probability ε; input 1 → output 1 with probability 1−δ and → erasure symbol e with probability δ.]

W = \begin{pmatrix} 1−ε & ε & 0 \\ 0 & δ & 1−δ \end{pmatrix}

The capacity-achieving distribution is determined by finding the solution x* of

ε \log ε − δ \log δ = (1−δ) \log(δ + εx) − (1−ε) \log(ε + δ/x)

and setting

\frac{p*_0}{p*_1} = x*,   p*_0 + p*_1 = 1.


Binary Asymmetric Erasure Channel (BAEC)

Derivation of capacity for the BAEC:

By Theorem 4.4 the capacity-achieving distribution p* = (p*_0, p*_1), p*_0 + p*_1 = 1, is given by the solution of

(1−ε) \log \frac{1−ε}{p_0(1−ε)} + ε \log \frac{ε}{p_0 ε + p_1 δ} = δ \log \frac{δ}{p_0 ε + p_1 δ} + (1−δ) \log \frac{1−δ}{p_1(1−δ)}.        (2)

Substituting x = p_0/p_1, equation (2) reads equivalently as

ε \log ε − δ \log δ = (1−δ) \log(δ + εx) − (1−ε) \log(ε + δ/x).

By differentiating w.r.t. x it is easy to see that the right-hand side is monotonically increasing, so that exactly one solution x* (and hence p* = (p*_0, p*_1)) exists, which can be computed numerically.
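A minimal sketch of this numerical computation by bisection on x = p_0/p_1, exploiting that the right-hand side is increasing (numpy assumed, names illustrative):

    import numpy as np

    def baec_optimal_input(eps, delta, tol=1e-12):
        """Capacity-achieving input (p0*, p1*) of the BAEC of Example 4.8,
        W = [[1-eps, eps, 0], [0, delta, 1-delta]], found by bisection on
        x = p0/p1 (requires 0 < eps, delta < 1)."""
        lhs = eps * np.log(eps) - delta * np.log(delta)
        rhs = lambda x: (1 - delta) * np.log(delta + eps * x) - (1 - eps) * np.log(eps + delta / x)
        lo, hi = 1e-12, 1.0
        while rhs(hi) < lhs:                  # grow the bracket until it contains the root
            hi *= 2.0
        while hi - lo > tol:                  # bisect: rhs is increasing in x
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if rhs(mid) < lhs else (lo, mid)
        x = 0.5 * (lo + hi)
        return x / (1 + x), 1 / (1 + x)

    # Symmetric erasure probabilities give the uniform input:
    print(baec_optimal_input(0.2, 0.2))       # ≈ (0.5, 0.5)
    print(baec_optimal_input(0.1, 0.3))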
