Chapter 9
Gaussian Channel
Peng-Hua Wang
Graduate Inst. of Comm. Engineering
National Taipei University
Chapter Outline
Chap. 9 Gaussian Channel
9.1 Gaussian Channel: Definitions
9.2 Converse to the Coding Theorem for Gaussian Channels
9.3 Bandlimited Channels
9.4 Parallel Gaussian Channels
9.5 Channels with Colored Gaussian Noise
9.6 Gaussian Channels with Feedback
9.1 Gaussian Channel: Definitions
Introduction
  Y_i = X_i + Z_i,   Z_i ∼ N(0, N)

• X_i: input, Y_i: output, Z_i: noise. Z_i is independent of X_i.
• Without further constraints, the capacity of this channel may be infinite.
  ◦ If the noise variance N is zero, the channel can transmit an arbitrary real number with no error.
  ◦ If the noise variance N is nonzero, we can choose an infinite subset of inputs arbitrarily far apart, so that they are distinguishable at the output with arbitrarily small probability of error.
Introduction
• The most common limitation on the input is an energy or power constraint.
• We assume an average power constraint. For any codeword (x_1, x_2, . . . , x_n) transmitted over the channel, we require that

  (1/n) ∑_{i=1}^{n} x_i^2 ≤ P.

  (A small sketch of checking this constraint follows below.)
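A minimal sketch of such a check; the helper name and numbers are illustrative assumptions, not from the text:

    import numpy as np

    def satisfies_power_constraint(x, P):
        """Check the average power constraint (1/n) * sum(x_i^2) <= P."""
        x = np.asarray(x, dtype=float)
        return float(np.mean(x ** 2)) <= P

    # Hypothetical codeword of length n = 3 and power constraint P = 2.
    print(satisfies_power_constraint([1.0, -2.0, 0.5], P=2.0))  # True: mean power = 1.75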
Information Capacity
Definition 1 (Capacity) The information capacity of the Gaussian channel with power constraint P is

  C = max_{f(x): E[X^2] ≤ P} I(X;Y).

We can calculate the information capacity as follows.

  I(X;Y) = h(Y) − h(Y|X) = h(Y) − h(X + Z|X)
         = h(Y) − h(Z|X) = h(Y) − h(Z)
         ≤ (1/2) log 2πe(P + N) − (1/2) log 2πeN
         = (1/2) log(1 + P/N)

Note that E[Y^2] = E[(X + Z)^2] = E[X^2] + N ≤ P + N, that a Gaussian maximizes differential entropy for a given second moment, and that the differential entropy of a Gaussian with variance σ^2 is (1/2) log 2πeσ^2.
Information Capacity
Therefore, the information capacity of the Gaussian channel is

  C = max_{E[X^2] ≤ P} I(X;Y) = (1/2) log(1 + P/N),

and equality holds when X ∼ N(0, P). (A numerical sketch of this formula follows below.)

• Next, we will show that this capacity is achievable.
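As a quick numerical sketch, the capacity formula can be evaluated directly; the values of P and N below are assumptions chosen only for illustration:

    import math

    def gaussian_capacity(P, N):
        """Information capacity (1/2) * log2(1 + P/N) in bits per transmission."""
        return 0.5 * math.log2(1 + P / N)

    # Assumed example: power constraint P = 10, noise variance N = 1.
    print(gaussian_capacity(10.0, 1.0))  # ~1.73 bits per transmission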
Code for Gaussian Channel
Definition 2 ((M,n) code for the Gaussian channel) An (M,n) code for the Gaussian channel with power constraint P consists of the following:
1. An index set {1, 2, . . . , M}.
2. An encoding function x : {1, 2, . . . , M} → X^n, yielding codewords x^n(1), x^n(2), . . . , x^n(M), each satisfying the power constraint P:

   (1/n) ∑_{i=1}^{n} x_i^2(w) ≤ P,   w = 1, 2, . . . , M.

3. A decoding function g : Y^n → {1, 2, . . . , M}.
Definitions
Definition 3 (Conditional probability of error)

  λ_i = Pr(g(Y^n) ≠ i | X^n = x^n(i)) = ∑_{y^n: g(y^n) ≠ i} p(y^n | x^n(i))
      = ∑_{y^n} p(y^n | x^n(i)) I(g(y^n) ≠ i)

• I(·) is the indicator function.
Definitions
Definition 4 (Maximal probability of error)

  λ^(n) = max_{i ∈ {1,2,...,M}} λ_i

Definition 5 (Average probability of error)

  P_e^(n) = (1/M) ∑_{i=1}^{M} λ_i

• The decoding error is

  Pr(g(Y^n) ≠ W) = ∑_{i=1}^{M} Pr(W = i) Pr(g(Y^n) ≠ i | W = i).

  If the index W is chosen uniformly from {1, 2, . . . , M}, then P_e^(n) = Pr(g(Y^n) ≠ W).
Definitions
Definition 6 (Rate) The rate R of an (M,n) code is

  R = (log M)/n  bits per transmission.

Definition 7 (Achievable rate) A rate R is said to be achievable for a Gaussian channel with a power constraint P if there exists a sequence of (⌈2^{nR}⌉, n) codes with codewords satisfying the power constraint such that the maximal probability of error λ^(n) tends to 0 as n → ∞.

Definition 8 (Channel capacity) The capacity of a channel is the supremum of all achievable rates.
Capacity of a Gaussian Channel
Theorem 1 (Capacity of a Gaussian channel) The capacity of a Gaussian channel with power constraint P and noise variance N is

  C = (1/2) log(1 + P/N)  bits per transmission.
Sphere Packing Argument

[Figure: decoding spheres of radius √(nN) packed inside the sphere of received vectors of radius √(n(P+N)).]
Sphere Packing Argument
For each sent codeword, the received vector lies (with high probability) in a sphere of radius √(nN) around it. The received vectors have energy no greater than n(P + N), so they lie in a sphere of radius √(n(P + N)). How many codewords can we use without intersection of the decoding spheres?

  M = A_n (√(n(P + N)))^n / (A_n (√(nN))^n) = (1 + P/N)^{n/2},

where A_n is the constant in the volume of an n-dimensional sphere; for example, A_2 = π and A_3 = (4/3)π. Therefore, the capacity is

  (1/n) log M = (1/2) log(1 + P/N).
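A small numerical sketch of the sphere-packing count M and the resulting rate; the blocklength and power values are assumed:

    import math

    def sphere_packing(n, P, N):
        """Number of non-intersecting decoding spheres M = (1 + P/N)^(n/2)
        and the corresponding rate (1/n) * log2(M)."""
        M = (1.0 + P / N) ** (n / 2.0)
        return M, math.log2(M) / n

    # Assumed example: blocklength n = 100, P = 10, N = 1.
    M, rate = sphere_packing(100, 10.0, 1.0)
    print(M, rate)  # rate ~ 1.73 bits per transmission, independent of n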
R < C → Achievable
• Codebook. Let X_i(w), i = 1, 2, . . . , n, w = 1, 2, . . . , 2^{nR}, be i.i.d. ∼ N(0, P − ε). For large n,

  (1/n) ∑ X_i^2 → P − ε.

• Encoding. The codebook is revealed to both the sender and the receiver. To send the message index w, the transmitter sends the wth codeword X^n(w) in the codebook.
• Decoding. The receiver searches for the codeword that is jointly typical with the received vector. If there is one and only one such codeword X^n(w), the receiver declares Ŵ = w. Otherwise, the receiver declares an error. If the power constraint is not satisfied, the receiver also declares an error. (A small simulation sketch follows this list.)
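The following is an illustrative sketch only, not part of the slides: a random Gaussian codebook with a nearest-neighbor decoder standing in for joint-typicality decoding. All numerical values are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, P, N = 200, 10.0, 1.0   # assumed blocklength, power constraint, noise variance
    M = 256                    # a rate-R code would need M = 2^(n*R) codewords; keep M small here

    # Random Gaussian codebook with per-letter variance slightly below P.
    codebook = rng.normal(0.0, np.sqrt(P - 0.1), size=(M, n))

    w = int(rng.integers(M))                               # message index
    y = codebook[w] + rng.normal(0.0, np.sqrt(N), size=n)  # channel output Y^n = X^n(w) + Z^n

    # Nearest-neighbor decoding, a stand-in for joint-typicality decoding.
    w_hat = int(np.argmin(np.sum((codebook - y) ** 2, axis=1)))
    print(w, w_hat, w == w_hat)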
R < C → Achievable
• Probability of error. Assume that codeword 1 was sent, so Y^n = X^n(1) + Z^n. Define the events

  E_0 = { (1/n) ∑_{j=1}^{n} X_j^2(1) > P }

  and

  E_i = { (X^n(i), Y^n) is in A_ε^(n) }.

  Then an error occurs if
  ◦ the power constraint is violated ⇒ E_0 occurs;
  ◦ the transmitted codeword and the received sequence are not jointly typical ⇒ E_1^c occurs;
  ◦ a wrong codeword is jointly typical with the received sequence ⇒ E_2 ∪ E_3 ∪ · · · ∪ E_{2^{nR}} occurs.
R < C → Achievable
Let W be uniformly distributed. We have

  P_e^(n) = (1/2^{nR}) ∑ λ_i = P(E) = Pr(E | W = 1)
          = P(E_0 ∪ E_1^c ∪ E_2 ∪ E_3 ∪ · · · ∪ E_{2^{nR}})
          ≤ P(E_0) + P(E_1^c) + ∑_{i=2}^{2^{nR}} P(E_i)
          ≤ ε + ε + ∑_{i=2}^{2^{nR}} 2^{−n(I(X;Y)−3ε)}
          ≤ 2ε + 2^{−n(I(X;Y)−R−3ε)} ≤ 3ε

for n sufficiently large and R < I(X;Y) − 3ε.
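A small numerical sketch of how the bound 2ε + 2^{−n(I(X;Y)−R−3ε)} shrinks with the blocklength; the values of I(X;Y), R, and ε are assumptions:

    import math

    def error_bound(n, I, R, eps):
        """Union-bound estimate 2*eps + 2^(-n*(I - R - 3*eps)) from the achievability proof."""
        return 2 * eps + 2.0 ** (-n * (I - R - 3 * eps))

    # Assumed values: I(X;Y) = 1.73 bits (P/N = 10), R = 1.5 bits, eps = 0.01.
    for n in (10, 100, 1000):
        print(n, error_bound(n, 1.73, 1.5, 0.01))  # approaches 2*eps as n grows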
R < C → Achievable, final part
• Since the average probability of error over codebooks is less than 3ε, there exists at least one codebook C* such that Pr(E | C*) < 3ε.
  ◦ C* can be found by an exhaustive search over all codes.
• Deleting the worst half of the codewords in C*, we obtain a code with low maximal probability of error. The codewords that violate the power constraint are certainly among those deleted (why?). Hence, we have constructed a code that achieves a rate arbitrarily close to C.
9.2 Converse to the Coding Theorem for Gaussian Channels
Achievable → R < C
We will prove that if P_e^(n) → 0, then R ≤ C = (1/2) log(1 + P/N). Let W be distributed uniformly. We have the Markov chain W → X^n → Y^n → Ŵ. By Fano's inequality,

  H(W | Ŵ) ≤ 1 + nR P_e^(n) = nε_n,   where ε_n = 1/n + R P_e^(n) → 0

as P_e^(n) → 0. Now,

  nR = H(W) = I(W; Ŵ) + H(W | Ŵ)
     ≤ I(W; Ŵ) + nε_n ≤ I(X^n; Y^n) + nε_n   (data-processing inequality)
     = h(Y^n) − h(Y^n | X^n) + nε_n = h(Y^n) − h(Z^n) + nε_n
     ≤ ∑_{i=1}^{n} h(Y_i) − h(Z^n) + nε_n = ∑_{i=1}^{n} h(Y_i) − ∑_{i=1}^{n} h(Z_i) + nε_n
Achievable → R < C
  nR ≤ ∑_{i=1}^{n} (h(Y_i) − h(Z_i)) + nε_n
     ≤ ∑_i ((1/2) log 2πe(P_i + N) − (1/2) log 2πeN) + nε_n
     = ∑_i (1/2) log(1 + P_i/N) + nε_n
     ≤ (n/2) log(1 + P/N) + nε_n,

where P_i is the average power of the ith component of the codewords. The last inequality follows from the concavity of the logarithm (Jensen's inequality) together with (1/n) ∑_i P_i ≤ P, which holds since every codeword satisfies the power constraint. Thus,

  R ≤ (1/2) log(1 + P/N) + ε_n,

and letting n → ∞ gives R ≤ C.
9.3 Bandlimited Channels
Capacity of Bandlimited Channels
• Suppose the output of a band-limited channel can be represented by

  Y(t) = (X(t) + Z(t)) ∗ h(t),

  where X(t) is the input signal, Z(t) is white Gaussian noise, and h(t) is the impulse response of the channel with bandwidth W.
• By the sampling theorem, the sampling rate is 2W samples per second. If the channel is used over the time interval [0, T], then 2WT samples are transmitted.
Capacity of Bandlimited Channels
• If the noise has power spectral density N_0/2 watts/Hz, the noise power is (N_0/2)(2W) = N_0 W, and the noise energy per sample is N_0 W T / (2WT) = N_0/2. If the signal power is P, the signal energy per sample is PT/(2WT) = P/(2W).
• The capacity is (1/2) log(1 + (P/2W)/(N_0/2)) bits per sample, or

  C = W log(1 + P/(N_0 W))  bits per second.

  (A numerical sketch follows below.)
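A quick numerical sketch of the bandlimited capacity formula; the bandwidth and SNR values are assumptions (roughly telephone-channel-like):

    import math

    def bandlimited_capacity(P, N0, W):
        """Capacity W * log2(1 + P/(N0*W)) of an ideal bandlimited AWGN channel, in bits/second."""
        return W * math.log2(1 + P / (N0 * W))

    # Assumed example: W = 3000 Hz and P/(N0*W) = 1000 (about 30 dB SNR).
    W, N0 = 3000.0, 1.0
    P = 1000.0 * N0 * W
    print(bandlimited_capacity(P, N0, W))  # ~29,900 bits/second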
9.4 Parallel Gaussian Channels
Parallel Gaussian Channels
• In this section we consider k independent Gaussian channels in parallel with a common power constraint. The objective is to distribute the total power among the channels so as to maximize the capacity. The channels are modeled as

  Y_j = X_j + Z_j,   j = 1, 2, . . . , k,

  with Z_j ∼ N(0, N_j). There is a common power constraint

  E[ ∑_{j=1}^{k} X_j^2 ] ≤ P.
Parallel Gaussian Channels
The information capacity is

  C = max_{f(x_1,...,x_k): ∑ E[X_i^2] ≤ P} I(X_1, X_2, . . . , X_k; Y_1, Y_2, . . . , Y_k).

Since Z_1, Z_2, . . . , Z_k are independent,

  I(X_1, X_2, . . . , X_k; Y_1, Y_2, . . . , Y_k)
    = h(Y_1, Y_2, . . . , Y_k) − h(Y_1, Y_2, . . . , Y_k | X_1, X_2, . . . , X_k)
    = h(Y_1, Y_2, . . . , Y_k) − h(Z_1, Z_2, . . . , Z_k | X_1, X_2, . . . , X_k)
    = h(Y_1, Y_2, . . . , Y_k) − h(Z_1, Z_2, . . . , Z_k)
    = h(Y_1, Y_2, . . . , Y_k) − ∑_i h(Z_i)
    ≤ ∑_i h(Y_i) − ∑_i h(Z_i) ≤ ∑_i (1/2) log(1 + P_i/N_i),

where P_i = E[X_i^2] and ∑ P_i = P.
Parallel Gaussian Channels
Therefore, we have a constrained optimization problem:

  max ∑_i (1/2) log(1 + P_i/N_i)   subject to   ∑_i P_i ≤ P,   P_i ≥ 0.

This can be solved by Lagrange multipliers together with the Kuhn-Tucker conditions:

  −(1/2) · (1/N_i)/(1 + P_i/N_i) − μ_i + λ = 0
  −P_i ≤ 0,   ∑_i P_i − P ≤ 0
  μ_i P_i = 0,   λ(∑_i P_i − P) = 0
  μ_i ≥ 0,   λ ≥ 0
Parallel Gaussian Channels
Case I: λ = 0. We have

  P_i + N_i = −1/(2μ_i),   i.e.,   P_i = −1/(2μ_i) − N_i.

This violates the condition −P_i ≤ 0, since N_i > 0 and μ_i ≥ 0.

Case II: λ ≠ 0. We have

  P_i + N_i = 1/(2(λ − μ_i)) =
    1/(2λ) = constant,   if P_i > 0 (which implies μ_i = 0),
    1/(2(λ − μ_i)),   if P_i = 0.

We can solve for λ from ∑_i P_i = ∑_i (1/(2λ) − N_i)^+ = P.
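A minimal water-filling sketch, not from the slides, that finds the water level 1/(2λ) by bisection; the noise variances and total power are assumed values:

    import numpy as np

    def water_filling(noise, P, tol=1e-9):
        """Allocate total power P over channels with noise variances `noise`
        so that P_i = (nu - N_i)^+ and sum(P_i) = P, where nu = 1/(2*lambda)."""
        noise = np.asarray(noise, dtype=float)
        lo, hi = noise.min(), noise.max() + P   # the water level nu lies in this interval
        while hi - lo > tol:
            nu = 0.5 * (lo + hi)
            if np.sum(np.maximum(nu - noise, 0.0)) > P:
                hi = nu
            else:
                lo = nu
        powers = np.maximum(lo - noise, 0.0)
        capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise))  # bits per vector channel use
        return powers, capacity

    # Assumed example: noise variances 1, 2, 4 and total power P = 3 -> P_i = 2, 1, 0.
    print(water_filling([1.0, 2.0, 4.0], 3.0))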
Parallel Gaussian Channels

[Figure: water-filling power allocation across the parallel Gaussian channels.]
Nonlinear Optimization
For the problem

  min f(x_1, x_2, . . . , x_n)
  subject to g_j(x_1, x_2, . . . , x_n) ≤ 0,   j = 1, 2, . . . , m,

the necessary (Kuhn-Tucker) conditions for optimality are

  ∂f/∂x_i + ∑_j μ_j ∂g_j/∂x_i = 0,   i = 1, 2, . . . , n
  g_j(x_1, x_2, . . . , x_n) ≤ 0,   j = 1, 2, . . . , m
  μ_j g_j(x_1, x_2, . . . , x_n) = 0,   j = 1, 2, . . . , m
  μ_j ≥ 0,   j = 1, 2, . . . , m
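As a small sanity check, reusing the assumed numbers from the water-filling sketch above, the Kuhn-Tucker conditions can be verified numerically at the water-filling solution:

    import numpy as np

    # Assumed numbers: noise variances N = [1, 2, 4], total power P = 3, water level 1/(2*lam) = 3.
    N = np.array([1.0, 2.0, 4.0])
    P_i = np.array([2.0, 1.0, 0.0])
    lam = 1.0 / (2.0 * 3.0)                # multiplier for sum(P_i) - P <= 0
    mu = lam - 1.0 / (2.0 * (N + P_i))     # multipliers for -P_i <= 0, solved from stationarity

    print(np.all(mu >= 0))                                # dual feasibility
    print(np.allclose(mu * P_i, 0.0))                     # complementary slackness
    print(np.isclose(P_i.sum(), 3.0), np.all(P_i >= 0))   # primal feasibility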