On Error Exponents of Modulo Lattice Additive Noise Channels
Tie Liu, Pierre Moulin, and Ralf Koetter ∗
May 28, 2004, Revised July 1 and October 21, 2005
Abstract
Modulo lattice additive noise (MLAN) channels appear in the analysis of structured binning codes for Costa's dirty-paper channel and of nested lattice codes for the additive white Gaussian noise (AWGN) channel. In this paper, we derive a new lower bound on the error exponents of the MLAN channel. With a proper choice of the shaping lattice and the scaling parameter, the new lower bound coincides with the random-coding lower bound on the error exponents of the AWGN channel at the same signal-to-noise ratio (SNR) in the sphere-packing and straight-line regions. This result implies that, at least for rates close to channel capacity, (1) writing on dirty paper is as reliable as writing on clean paper; and (2) lattice encoding and decoding suffer no loss of error exponents relative to the optimal codes (with maximum-likelihood decoding) for the AWGN channel.
Keywords: Additive white Gaussian noise channel, Costa's dirty-paper channel, error exponents, lattice decoding, modulo lattice additive noise channel, nested lattice codes
∗This work was supported by NSF under ITR grants CCR 00-81268 and CCR 03-25924. The material in this paper was presented in part at the IEEE Information Theory Workshop, San Antonio, TX, October 2004. The authors are with the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (e-mail: {tieliu, moulin}@ifp.uiuc.edu; [email protected]).
1 Introduction
Consider Costa’s dirty-paper channel [1]
Y = X + S + N (1)
where the channel input X = (X1, · · · , Xn) satisfies the average power constraint

E[ (1/n) ∑_{i=1}^{n} X_i² ] ≤ PX, (2)
the interference S ∼ N(0, PS In) and the noise N ∼ N(0, PN In) are length-n white Gaussian vectors with zero mean and powers PS and PN respectively (In is the n × n identity matrix), and S and N are statistically independent. The interference S is noncausally known to the transmitter, and this knowledge can be used to encode the message over the entire block. On the other hand, only the statistics of S are known to the receiver.
The capacity of Costa's dirty-paper channel cannot exceed that obtained when S is also known to the receiver. In the latter case, (1) reduces to the zero-interference AWGN channel
Y = X + N (3)
under the same average power constraint (2). Costa [1] proved the remarkable result that the capacity of the dirty-paper channel (1) actually coincides with that of the zero-interference AWGN channel (3). That is, the lack of knowledge of the interference at the receiver does not cause any loss in channel capacity. A natural question to ask next is whether this lack of knowledge causes any degradation in channel reliability.
Costa's result [1] followed from a direct evaluation of a capacity formula for generic Gel'fand-Pinsker channels [2] and was based on a random binning argument. Zamir, Shamai and Erez [3] proposed a structured binning scheme which they showed is also capacity-achieving for Costa's dirty-paper channel. The main idea there is to use a scaled-lattice strategy to transform Costa's dirty-paper channel (1) into a modulo lattice additive noise (MLAN) channel and show that the capacity of the induced MLAN channel is asymptotically the same as that of the zero-interference AWGN channel (3). Interestingly, using essentially the same idea, Erez and Zamir [3, 4, 5, 6] cracked the long-standing problem of achieving the capacity of AWGN channels with lattice encoding and decoding.
Furthermore, Erez and Zamir [5, 6] studied the error exponents of the MLAN channel and showed that they are lower-bounded by the Poltyrev exponents, which were previously derived in the context of coding for the unconstrained AWGN channel [7]. However, the Poltyrev
exponent is strictly inferior to the random-coding lower bound on the error exponents of the AWGN channel at the same SNR for all rates below channel capacity. On the other hand, the random-coding lower bound on the error exponents of the AWGN channel is known to be tight in the sphere-packing region [8], [9, p. 338]. Therefore, determining the reliability function of the MLAN channel (and hence of Costa's dirty-paper channel) remained an open problem.
In this paper, we derive a new lower bound on the error exponents of the MLAN channel. By a data-processing argument, it is also a lower bound on the error exponents of Costa's dirty-paper channel. With a proper choice of the shaping lattice and the scaling parameter, the new lower bound coincides with the random-coding lower bound on the error exponents of the AWGN channel at the same SNR in the sphere-packing and straight-line regions. Therefore, at least for rates close to channel capacity, writing on dirty paper is as reliable as writing on clean paper.
Before aliasing, the effective noise in a MLAN channel is not strictly Gaussian but rather approaches a Gaussian distribution as the dimension of the shaping lattice tends to infinity (when the shaping lattice is appropriately chosen). As illustrated by Forney [10], this vanishing "non-Gaussianity" does not affect the channel capacity. It does, however, impact the error exponents, because channel reliability is known to be determined by the large deviations of the channel law rather than by its limiting behavior. It turns out that this fact has important consequences for the optimal choice of the lattice-scaling parameter α. Selecting α according to the minimum mean-square error (MMSE) principle (this choice of α is then denoted by αMMSE) is asymptotically optimal for reliable communication at the capacity limit. However, αMMSE is strictly suboptimal in maximizing the new lower bound (on the error exponents) for all rates below channel capacity. The best error exponents are achieved by using lattice-scaling parameters determined by large-deviation analysis.
The rest of the paper is organized as follows. In Section 2, we formalize the transformation from Costa's dirty-paper channel to the MLAN channel and summarize the known results on the capacity and error exponents of the MLAN channel. In Section 3, we derive a new lower bound on the error exponents of the MLAN channel. In Section 4, we give some numerical examples to illustrate the new results. In Section 5, we extend our results to the AWGN channel with lattice encoding and decoding. Finally, we give some concluding remarks in Section 6.
2 The (Λ, α)-MLAN Channel
We first recall some notation and results from lattice theory. An n-dimensional real lattice Λ is a discrete additive subgroup of R^n defined as Λ = {uG : u ∈ Z^n}, where G is an n × n full-rank generator matrix and u is a vector with integer components. (All vectors in this paper are row vectors.) The basic Voronoi region V0(Λ) is the set of points x ∈ R^n closer to 0 than to any other point in Λ, i.e.,

V0(Λ) def= {x : ‖x‖ ≤ ‖x − λ‖, ∀λ ∈ Λ} (4)

where ties are broken in any systematic fashion such that V0(Λ) includes one and only one representative from each coset of Λ in R^n. The second moment per dimension associated with Λ is defined as
σ²(Λ) def= [1/(nV(Λ))] ∫_{V0(Λ)} ‖x‖² dx. (5)

Here, V(Λ) def= Vol(V0(Λ)) is the volume of V0(Λ), and ‖x‖ denotes the Euclidean norm of x.
The normalized second moment is defined as

G(Λ) def= V(Λ)^{−2/n} σ²(Λ) ≥ (2πe)^{−1} (6)

where (2πe)^{−1} is the normalized second moment of an n-dimensional ball as n → ∞.
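As a concrete illustration (our own, not from the paper), consider the one-dimensional lattice Λ = ∆Z, whose basic Voronoi region is [−∆/2, ∆/2). Definitions (5) and (6) then give σ²(Λ) = ∆²/12 and G(Λ) = 1/12, which indeed exceeds (2πe)^{−1} ≈ 0.0585. A short numerical check, with ∆ = 4 as an arbitrary choice:

```python
import math

def second_moment(delta, samples=200001):
    """Second moment per dimension of the 1-D lattice delta*Z, eq. (5):
    (1/(n*V)) * integral of ||x||^2 over V0 = [-delta/2, delta/2), n = 1, V = delta.
    Computed by the composite trapezoidal rule."""
    h = delta / (samples - 1)
    xs = [(-delta / 2) + i * h for i in range(samples)]
    integral = (sum(x * x for x in xs) - 0.5 * (xs[0] ** 2 + xs[-1] ** 2)) * h
    return integral / delta

delta = 4.0
sigma2 = second_moment(delta)              # should equal delta^2 / 12
G = delta ** (-2.0) * sigma2               # eq. (6) with n = 1: V^{-2/n} * sigma^2
print(sigma2, G, 1.0 / (2 * math.pi * math.e))
```

The scalar lattice is a poor quantizer (G = 1/12 ≈ 0.0833); Rogers-good lattices drive G(Λ) down to the ball bound (2πe)^{−1} as the dimension grows.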
The covering radius rcov(Λ) is the radius of the smallest n-dimensional ball centered at the origin that contains V0(Λ). The effective radius reff(Λ) is the radius of an n-dimensional ball whose volume is equal to V(Λ). A lattice Λ is good for covering if the covering efficiency

ρcov(Λ) def= rcov(Λ)/reff(Λ) → 1 (7)

and is good for mean-square error quantization if G(Λ) → (2πe)^{−1} for sufficiently large dimension of Λ. Following a result of Rogers, Zamir and Feder [11] showed that there exist lattices which are simultaneously good for covering and for mean-square error quantization. They referred to such lattices as Rogers-good.
The direct product of two lattices Λ1 and Λ2 is defined as

Λ1 × Λ2 def= {(λ1, λ2) : λ1 ∈ Λ1, λ2 ∈ Λ2}, (8)

which results in a new lattice with basic Voronoi region V0(Λ1) × V0(Λ2) and covering radius

√( r²cov(Λ1) + r²cov(Λ2) ). (9)
It can be shown that the covering efficiency of the k-fold direct product lattice Λ^k satisfies

lim_{k→∞} ln ρcov(Λ^k) = ln ρcov(Λ) + (1/2) ln(2πe G*_n) + (1/2) ln(1 + 2/n) (10)

where G*_n denotes the normalized second moment of an n-dimensional ball.
Figure 1: MLAN channel transformation of Costa’s dirty-paper channel Y = X + S + N.
2.1 MLAN Channel Transformation
Let Λ be an n-dimensional lattice with second moment per dimension PX, and let QΛ(·) be the corresponding (Euclidean) lattice quantizer. Let U ∼ Unif(V0(Λ)) be a dither vector uniformly distributed over V0(Λ). Referring to Figure 1, consider the following modulo-Λ transmission scheme for Costa's dirty-paper channel (1).
• Transmitter: For any v ∈ V0(Λ), the transmitter sends
x = [v − αs− u] mod Λ (11)
where α ∈ [0, 1] is a scaling parameter, and x mod Λ def= x−QΛ(x).
• Receiver: The receiver computes
y′ = [αy + u] mod Λ. (12)
Due to the uniform distribution of the dither vector U, for any v, the channel input X is also uniformly distributed over V0(Λ) [6, Lemma 1], [10, Lemma 2]. Thus, the average transmitted power is PX and the input constraint (2) is satisfied. The resulting channel is the (Λ, α)-MLAN channel defined below.
Lemma 1 ([4, 5, 6]) The channel defined by (1), (11) and (12) is equivalent in distribution to the (Λ, α)-MLAN channel

y′ = [v + N′eff] mod Λ (13)

with

N′eff = [(1 − α)U + αN] mod Λ. (14)
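The mechanism behind Lemma 1 is already visible in a scalar toy case. Writing x = v − αs − u − QΛ(v − αs − u), a little algebra gives αy + u = v − (1 − α)x + αn − QΛ(v − αs − u), so the interference s cancels exactly, realization by realization; the distributional statement (13)-(14) then follows because −(1 − α)X has the same distribution as (1 − α)U for a fresh dither. A minimal sketch of this cancellation for n = 1 and Λ = ∆Z (the spacing, scaling parameter and powers below are arbitrary choices of ours):

```python
import math, random

random.seed(0)

def mod_lattice(x, delta):
    """Centered mod: reduce x into the Voronoi region [-delta/2, delta/2) of delta*Z."""
    return x - delta * math.floor(x / delta + 0.5)

delta, alpha = 2.0, 0.7   # hypothetical scalar shaping lattice and scaling parameter
P_S, P_N = 5.0, 0.3       # hypothetical interference and noise powers

max_err = 0.0
for _ in range(1000):
    v = random.uniform(-delta / 2, delta / 2)    # message point in V0(Lambda)
    s = random.gauss(0, math.sqrt(P_S))          # interference, known at the transmitter
    n = random.gauss(0, math.sqrt(P_N))          # noise, unknown everywhere
    u = random.uniform(-delta / 2, delta / 2)    # dither
    x = mod_lattice(v - alpha * s - u, delta)    # transmitter, eq. (11)
    y = x + s + n                                # Costa's channel, eq. (1)
    y_prime = mod_lattice(alpha * y + u, delta)  # receiver front end, eq. (12)
    # per-realization identity: the interference s has dropped out
    equivalent = mod_lattice(v - (1 - alpha) * x + alpha * n, delta)
    max_err = max(max_err, abs(y_prime - equivalent))
print(max_err)
```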
2.2 Summary of Known Results
The capacity-achieving distribution for the (Λ, α)-MLAN channel (13) is V ∼ Unif(V0(Λ)). Define

C(SNR, α) def= (1/2) ln[ SNR / ((1 − α)²SNR + α²) ] (15)
where SNR def= PX/PN. The channel capacity (in nats per dimension) is given by

CΛ(SNR, α) = (1/n) I(V; Y′) (16)
= (1/n) h(Y′) − (1/n) h(N′eff) (17)
= (1/2) ln[ PX/G(Λ) ] − (1/n) h(N′eff) (18)
≥ (1/2) ln[ PX/G(Λ) ] − (1/n) h(Neff) (19)
≥ (1/2) ln[ PX / E[(1/n)‖Neff‖²] ] − (1/2) ln(2πe G(Λ)) (20)
= C(SNR, α) − (1/2) ln(2πe G(Λ)) (21)

where Neff def= (1 − α)U + αN is the effective noise of the (Λ, α)-MLAN channel (13) before the "mod Λ" operation (the aliasing). Here, (18) follows from the uniform distribution of Y′ over V0(Λ); (19) follows from h(N′eff) ≤ h(Neff), because the many-to-one mapping "mod Λ" can only reduce the (differential) entropy; (20) follows from the fact that the entropy of Neff is upper-bounded by that of a white Gaussian vector with the same second moment [9, p. 372]; and (21) follows from the definition of C(SNR, α) in (15).
Next, α may be chosen to maximize C(SNR, α). The maximizing α is αMMSE def= SNR/(1 + SNR), and the corresponding value of C(SNR, α) is the zero-interference AWGN capacity

CAWGN(SNR) def= (1/2) ln(1 + SNR). (22)
Finally, choosing Λ to be Rogers-good so that G(Λ) ↓ (2πe)^{−1} as n → ∞, we obtain CΛ(SNR, α) ↑ CAWGN(SNR). We conclude that the capacity of the MLAN channel asymptotically approaches that of the AWGN channel at the same SNR, in the limit as the lattice dimension n tends to infinity.
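The claim that αMMSE = SNR/(1 + SNR) maximizes (15) and yields (22) is easy to confirm numerically. A quick check (our own, with SNR = 3 as an arbitrary example):

```python
import math

def C(snr, alpha):
    """C(SNR, alpha) of eq. (15), in nats per dimension."""
    return 0.5 * math.log(snr / ((1 - alpha) ** 2 * snr + alpha ** 2))

snr = 3.0
a_mmse = snr / (1 + snr)                     # alpha_MMSE = 0.75
c_star = C(snr, a_mmse)
c_awgn = 0.5 * math.log(1 + snr)             # eq. (22)
# coarse grid search over alpha in (0, 1) confirms a_mmse is the maximizer
grid_best = max(C(snr, a / 1000.0) for a in range(1, 1000))
print(c_star, c_awgn, grid_best)
```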
An estimation-theoretic explanation for the choice α = αMMSE was given by Forney [10]. Note that, with this choice of α, the effective noise N′eff in the (Λ, α)-MLAN channel (13)
involves a uniform component (1 − α)U and hence is not strictly Gaussian before aliasing. The lower bound on the right side of (20), on the other hand, is asymptotically tight because, when the shaping lattice Λ is chosen to be Rogers-good, the dither vector U uniformly distributed over V0(Λ) approaches in entropy rate a white Gaussian vector with the same second moment [11].
Erez and Zamir [5, 6] also studied the error exponents of the MLAN channel. They showed that the error exponent EΛ(R; SNR, α) of the (Λ, α)-MLAN channel (13) satisfies

EΛ(R; SNR, α) ≥ EP( e^{2(C(SNR,α) − R − ζ2(Λ))} ) − ζ1(Λ) (23)

where EP(·) is the Poltyrev exponent given by

2EP(µ) = µ − ln(eµ),  1 ≤ µ ≤ 2,
       = ln(eµ/4),    2 ≤ µ ≤ 4,
       = µ/4,         µ ≥ 4, (24)

and ζ1(Λ) and ζ2(Λ) are defined as

ζ1(Λ) def= ln ρcov(Λ) + (1/2) ln(2πe G*_n) + 1/n (25)

and

ζ2(Λ) def= ln ρcov(Λ) + (1/2) ln(2πe G(Λ)). (26)
This succinct parametrization (24) of the Poltyrev exponent is due to Forney, Trott and Chung [12]¹. The parameter µ represents the "volume-to-noise ratio" of the channel. (The factor of 2 before EP(µ) shows that everything should really be measured per two real dimensions.)
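The three branches of (24) can be coded up directly; a short check (our own) confirms that they meet continuously at µ = 2 and µ = 4 and that the exponent vanishes at µ = 1 (i.e., at capacity):

```python
import math

def poltyrev_2E(mu):
    """2*E_P(mu) from eq. (24); mu = e^{2(C-R)} is the volume-to-noise ratio, mu >= 1."""
    if mu <= 2:
        return mu - math.log(math.e * mu)   # sphere-packing region
    if mu <= 4:
        return math.log(math.e * mu / 4)    # straight-line region
    return mu / 4                           # expurgation region

print(poltyrev_2E(1.0), poltyrev_2E(2.0), poltyrev_2E(4.0), poltyrev_2E(8.0))
```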
The error exponent ECosta(R; SNR) of Costa's dirty-paper channel (1) satisfies

ECosta(R; SNR) ≥ sup_α sup_Λ EΛ(R; SNR, α) (27)
≥ sup_α sup_Λ { EP( e^{2(C(SNR,α) − R − ζ2(Λ))} ) − ζ1(Λ) } (28)
= sup_α EP( e^{2(C(SNR,α) − R)} ) (29)
= EP( e^{2(CAWGN(SNR) − R)} ) (30)

where (27) follows from the data-processing argument; (28) follows from (23); (29) is attained by lattices which are Rogers-good; and (30) is uniquely attained by α = αMMSE.
¹While the parametrization (24) is indeed due to [12], Poltyrev's parametrization [7] differs from that of [12] only by some multiplicative factors like 2 and 2π.
Recall from [9, pp. 338-343] that the random-coding lower bound on the error exponent EAWGN(R; SNR) of the AWGN channel (3) is given by

EAWGN(R; SNR) ≥ Erc_AWGN( e^{2(CAWGN(SNR) − R)}; SNR ) (31)

where

2Erc_AWGN(µ; SNR) = [SNR/(2(SNR + 1))] { (SNR + 1 + µ) − (SNR + 1 − µ) √( 1 + 4(SNR + 1)/(SNR(SNR + 1 − µ)) ) }
+ ln{ (SNR + 1)/µ − [SNR(SNR + 1 − µ)/(2µ)] ( √( 1 + 4(SNR + 1)/(SNR(SNR + 1 − µ)) ) − 1 ) } (32)

for 1 ≤ µ ≤ [2(SNR + 1)/SNR] ( 1 + SNR/2 − √(1 + SNR²/4) );

2Erc_AWGN(µ; SNR) = ( 1 + SNR/2 − √(1 + SNR²/4) ) + ln{ [µ/(2(SNR + 1))] ( √(1 + SNR²/4) + 1 ) } (33)

for [2(SNR + 1)/SNR] ( 1 + SNR/2 − √(1 + SNR²/4) ) ≤ µ ≤ [8(SNR + 1)/SNR²] ( √(1 + SNR²/4) − 1 ); and

2Erc_AWGN(µ; SNR) = (SNR/2) ( 1 − √( 1 − µ/(1 + SNR) ) ) (34)

for µ ≥ [8(SNR + 1)/SNR²] ( √(1 + SNR²/4) − 1 ).
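The three branches (32)-(34) are pieces of one continuous curve. As a sanity check on the piecewise form (our own numerical experiment; the SNR value is an arbitrary choice), the sketch below evaluates each branch and verifies that adjacent branches agree at the two boundary points, and that the exponent vanishes at µ = 1:

```python
import math

def rc_awgn_2E(mu, snr):
    """2*E_rc_AWGN(mu; SNR) from eqs. (32)-(34)."""
    q = math.sqrt(1 + snr ** 2 / 4)
    mu1 = 2 * (snr + 1) / snr * (1 + snr / 2 - q)   # sphere-packing / straight-line boundary
    mu2 = 8 * (snr + 1) / snr ** 2 * (q - 1)        # straight-line / expurgation boundary
    if mu <= mu1:                                    # eq. (32)
        r = math.sqrt(1 + 4 * (snr + 1) / (snr * (snr + 1 - mu)))
        t1 = snr / (2 * (snr + 1)) * ((snr + 1 + mu) - (snr + 1 - mu) * r)
        t2 = math.log((snr + 1) / mu - snr * (snr + 1 - mu) / (2 * mu) * (r - 1))
        return t1 + t2
    if mu <= mu2:                                    # eq. (33)
        return (1 + snr / 2 - q) + math.log(mu / (2 * (snr + 1)) * (q + 1))
    return snr / 2 * (1 - math.sqrt(1 - mu / (1 + snr)))  # eq. (34)

snr = 1.0
q = math.sqrt(1.25)
mu1 = 2 * (snr + 1) / snr * (1 + snr / 2 - q)
mu2 = 8 * (snr + 1) / snr ** 2 * (q - 1)
# gaps between adjacent branches evaluated at the boundaries
gap1 = abs(rc_awgn_2E(mu1, snr) - ((1 + snr / 2 - q) + math.log(mu1 / (2 * (snr + 1)) * (q + 1))))
gap2 = abs(rc_awgn_2E(mu2, snr) - snr / 2 * (1 - math.sqrt(1 - mu2 / (1 + snr))))
print(rc_awgn_2E(1.0, snr), gap1, gap2)
```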
Figure 2 compares the Poltyrev exponent EP(e^{2(CAWGN(SNR) − R)}) with the random-coding lower bound Erc_AWGN(e^{2(CAWGN(SNR) − R)}; SNR) on the error exponents of the AWGN channel at the same SNR. Clearly, the Poltyrev exponent is strictly inferior to the random-coding lower bound on the error exponents of the AWGN channel, and the gap is particularly large at low rates in the low-SNR regime.
Erez and Zamir [6] proved (23) by directly evaluating an error-exponent formula for generic modulo-additive noise channels with transmitter side information [13]. Note that, in their derivation, Erez and Zamir again used Gaussian-type bounds, e.g., see [6, Appendix A]. However, these bounds might be loose because, while channel capacity is determined by the limiting distribution of the channel noise, what also matters for the error
Figure 2: Comparison of the Poltyrev exponent (solid lines) and the random-coding lower bound on the error exponents of the AWGN channel (dashed lines), both normalized by SNR. Diamonds ("◇") and circles ("◦") separate the sphere-packing, straight-line and expurgation regions of the Poltyrev exponent and of the random-coding lower bound on the error exponents of the AWGN channel, respectively.
exponents is how the noise distribution approaches that limit. The rationale behind using the Gaussian-type bounds offered in [6] seems to be only a computational concern: the error-exponent formula [13] is an adaptation of the Gallager bound [9] (to modulo-additive noise channels with transmitter side information), and the Gallager bound is known to be hard to evaluate for channels with memory, e.g., the MLAN channel, because it cannot be factored into single-letter expressions.
3 A New Lower Bound on the Error Exponents of the (Λ, α)-MLAN Channel
In this section, we derive a new lower bound on the error exponents of the MLAN channel. Unlike in [6], our derivation does not depend on any previous results on the error exponents of side-information channels. Instead, we shall start from first principles and proceed with an estimate of the distance distribution of the optimal codes. Such an approach makes it possible to analyze the error probability of the MLAN channel geometrically rather than via the Gallager bound. As we shall see, the geometry of the problem plays a central role in the analysis of the typical error events, therefore allowing for a large-deviation analysis directly in the high-dimensional space.
3.1 The Encoder and Decoder
A (k, R)-block code for the n-dimensional (Λ, α)-MLAN channel (13) is a set of M = ⌈e^{knR}⌉ codewords

{ (v_w^(1), · · · , v_w^(k)) : v_w^(i) ∈ V0(Λ), i = 1, · · · , k; w = 1, · · · , M } (35)

where k is the block length and R is the transmission rate measured in nats per dimension (rather than in nats per channel use). When the sub-codeword v_w^(i) is input to the (Λ, α)-MLAN channel (13), the output is

y′^(i) = [ v_w^(i) + N′eff^(i) ] mod Λ, i = 1, · · · , k (36)

where

N′eff^(i) def= [ (1 − α)U^(i) + αN^(i) ] mod Λ, (37)

and the U^(i) and N^(i) are independently and identically distributed as Unif(V0(Λ)) and N(0, PN In), respectively.
Define v_w def= (v_w^(1), · · · , v_w^(k)) and y′ def= (y′^(1), · · · , y′^(k)). The channel from v_w to y′ can be equivalently viewed as a super (Λ^k, α)-MLAN channel, where the shaping lattice Λ^k is the k-fold direct product of Λ. (Direct products of lattices are defined in (8).) The effective noise of the (Λ^k, α)-MLAN channel is

N′eff = [ (1 − α)U + αN ] mod Λ^k (38)

where U def= (U^(1), · · · , U^(k)) ∼ Unif(V0(Λ^k)) and N def= (N^(1), · · · , N^(k)) ∼ N(0, PN Ikn).
Our encoding and decoding scheme is as follows.
• Encoder: Map each message w ∈ {1, · · · ,M} to a codeword vw.
• Decoder: The estimated message is given by

ŵ = arg min_{w∈{1,··· ,M}} ( min_{λ∈Λ^k} ‖y′ − (v_w + λ)‖ ). (39)

In words, the decoder first decodes the received vector y′ to the nearest (in the Euclidean metric) codeword in an extended codebook

C def= ∪_{w=1}^{M} { v_w + Λ^k }. (40)

It then decides that the transmitted message is the one corresponding to the coset representative of the estimated codeword in V0(Λ^k). Note that the decision rule (39) might be slightly suboptimal, but this will only strengthen our achievability results. An illustration of the above decoding procedure with k = 1, n = 2 and M = 9 is shown in Figure 3.
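For the scalar lattice Λ = ∆Z (k = n = 1), the inner minimization over λ ∈ Λ^k in (39) collapses to a centered mod-∆ reduction of the difference y′ − v_w, so the whole decoder fits in a few lines. A minimal sketch (our own; the codebook values are hypothetical):

```python
import math

def mod_lattice(x, delta):
    """Reduce x to the Voronoi region [-delta/2, delta/2) of delta*Z."""
    return x - delta * math.floor(x / delta + 0.5)

def decode(y_prime, codebook, delta):
    """Decision rule (39): nearest codeword in the extended codebook
    { v_w + lambda : lambda in delta*Z }; the min over lambda is |.| mod delta."""
    return min(range(len(codebook)),
               key=lambda w: abs(mod_lattice(y_prime - codebook[w], delta)))

delta = 3.0
codebook = [-1.0, 0.0, 1.0]   # three messages inside V0(Lambda) = [-1.5, 1.5)
# a received point displaced from codeword 0 by a small effective noise
w_hat = decode(mod_lattice(-1.0 + 0.2, delta), codebook, delta)
print(w_hat)  # → 0
```

Note how `mod_lattice` in `decode` handles the aliasing: a received point that wraps around the Voronoi boundary, e.g. y′ = −1.4 for transmitted codeword −1.0, is still resolved to the correct coset representative.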
3.2 The Error Probability
The probability of decoding error Pe,w given the transmitted message w is

Pe,w = Pr[ (v_w + N′eff) ∉ V_C(v_w) ] (41)

where V_C(v_w) is the nearest-neighbor decoding region of v_w in C. The extended codebook C has an infinite number of codewords and hence an infinite rate. The right side of (41) is thus reminiscent of Poltyrev's notion of coding for the unconstrained AWGN channel [7], for which the decoding error probability is measured against the density of any infinite input
Figure 3: An illustration of the decoding procedure. Filled circles ("•") denote the codewords within the basic Voronoi region V0(Λ), and circles ("◦") denote their translations in the extended codebook C. The number next to a codeword is the index of the associated message. Dotted lines separate the decoding regions of codewords in C. The received vector y′ = [v1 + N′eff] mod Λ correctly decodes to message 1.
constellation. The main difference is that in a MLAN channel the noise is additive (i.e., independent of the transmitted codeword) but generally non-Gaussian (due to the uniform component and the aliasing).

In fact, Poltyrev's technique [7] for estimating the decoding error probability is based on the distance distribution of the input code and can be used to upper-bound the nearest-neighbor decoding error probability of any unconstrained additive noise channel in which the noise is spherically distributed. To apply Poltyrev's technique [7] to the decoding error probability (41), we first need to "sphericalize" the effective noise N′eff.
Lemma 2 The decoding error probability (41) is bounded from above as

Pe,w ≤ Pr[ (v_w + Neff) ∉ V_C(v_w) ] (42)

where Neff = (1 − α)U + αN is the effective noise of the (Λ^k, α)-MLAN channel before aliasing.

Proof: We first note that

{ (v_w + N′eff) ∈ V_C(v_w) } = ∪_{λ∈Λ^k} { (v_w + Neff) ∈ V_C(v_w + λ) }. (43)

Since 0 is always a lattice point, the statement of the lemma follows from the fact that { (v_w + N′eff) ∉ V_C(v_w) } is a subset of { (v_w + Neff) ∉ V_C(v_w) }. □
Lemma 3 ([6]) Let m = kn be the dimension of the code and let B ∼ Unif( B_m(0, √(mP′X)) ) be a random vector uniformly distributed over the m-dimensional ball of center 0 and radius √(mP′X), with P′X def= r²cov(Λ^k)/m. Assume B and N are statistically independent. Then

Pr[ (v_w + Neff) ∉ V_C(v_w) ] ≤ e^{m ln ρcov(Λ^k)} Pr[ (v_w + Z) ∉ V_C(v_w) ] (44)

with

Z def= (1 − α)B + αN. (45)

Note that the random vector Z defined in (45) is spherically distributed. Furthermore, if we suppose the shaping lattice Λ is Rogers-good, by (9) and (10) we have lim_{k→∞} ln ρcov(Λ^k) = ε1(Λ) and P′X = (1 + ε2(Λ)) PX, where

ε1(Λ) def= ln ρcov(Λ) + (1/2) ln(2πe G*_n) + (1/2) ln(1 + 2/n) (46)

and

ε2(Λ) def= [ ρ²cov(Λ) G*_n / G(Λ) ] (1 + 2/n) − 1, (47)

both of which approach zero as the lattice dimension n → ∞. In this case, the "spherical" upper bound on the right side of (44) only incurs an asymptotically small increase in the noise power and an exponentially small boost in the decoding error probability. A rigorous proof of Lemma 3 can be found in [6, Appendix A].
3.3 Random-Coding Analysis
We now turn to the right side of (44) and derive a random-coding upper bound on

λ(v_w) def= Pr[ (v_w + Z) ∉ V_C(v_w) ] (48)

assuming that the codewords v_w, w = 1, · · · , M, are independently and identically chosen according to Unif(V0(Λ^k)). Following the footsteps of Poltyrev [7], we have

λ(v_w) ≤ Pr[ (v_w + Z) ∉ V_C(v_w) and ‖Z‖ ≤ d ] + Pr[ ‖Z‖ ≥ d ] (49)
≤ Σ_{ v∈C\{v_w}: ‖v−v_w‖≤2d } Pr[ Z ∈ D_m(d, ‖v − v_w‖) ] + Pr[ ‖Z‖ ≥ d ], ∀ d > 0 (50)

where D_m(d, ρ) is the section of the m-dimensional ball B_m(0, d) cut off by the hyperplane that slices B_m(0, d) at distance ρ/2 from the center, and (50) follows from the union bound. Note that, on the right side of (50), we used the fact that Z is spherically distributed, so the pairwise error probability Pr[Z ∈ D_m(d, ‖v − v_w‖)] is only a function of the Euclidean distance between v and v_w. We may then rewrite (50) using the distance distribution of C with respect to v_w. For ∆ > 0, let M_i(v_w) be the number of codewords v ∈ C such that (i − 1)∆ < ‖v − v_w‖ ≤ i∆. We have

λ(v_w) ≤ Σ_{i=1}^{⌈2d/∆⌉} M_i(v_w) Pr[ Z ∈ D_m(d, (i − 1)∆) ] + Pr[ ‖Z‖ ≥ d ]. (51)
Since the v_w, w = 1, · · · , M, are independently and identically chosen according to Unif(V0(Λ^k)) and C is generated by tiling R^m with translations of V0(Λ^k) relative to Λ^k, the ensemble average of M_i(v_w) is proportional to the volume of the spherical shell B_m(v_w, i∆) \ B_m(v_w, (i − 1)∆). Thus, we have

E[M_i(v_w)] = [ M/V(Λ^k) ] · Vol( B_m(v_w, i∆) \ B_m(v_w, (i − 1)∆) ) (52)
≤ [ M/V(Λ^k) ] · [ mπ^{m/2}/Γ(m/2 + 1) ] (i∆)^{m−1} ∆ (53)
= e^{mδ} [ mπ^{m/2}/Γ(m/2 + 1) ] (i∆)^{m−1} ∆ (54)

with

δ def= (1/m) ln[ M/V(Λ^k) ] (55)

and the geometric expression mπ^{m/2}/Γ(m/2 + 1) being the surface area of a unit m-dimensional ball. Averaging (51) over the ensemble and letting ∆ → 0, we obtain

E[λ(v_w)] ≤ e^{mδ} [ mπ^{m/2}/Γ(m/2 + 1) ] ∫_0^{2d} ρ^{m−1} Pr[ Z ∈ D_m(d, ρ) ] dρ + Pr[ ‖Z‖ ≥ d ]. (56)
We note that the right side of (56) is independent of the choice of message w, so it may also serve as an upper bound on the ensemble average of

λ def= (1/M) Σ_{w=1}^{M} λ(v_w). (57)
We thus have proved the following result.
Lemma 4 There exists a (k, R)-block code for the (Λ, α)-MLAN channel (13) such that

λ ≤ e^{mδ} [ mπ^{m/2}/Γ(m/2 + 1) ] ∫_0^{2d} ρ^{m−1} Pr[ Z ∈ D_m(d, ρ) ] dρ + Pr[ ‖Z‖ ≥ d ], ∀ d > 0. (58)
The upper bound on the right side of (58) can be improved for small values of R by means of an expurgation procedure. The result is summarized in the following lemma.

Lemma 5 There exists a (k, R)-block code for the (Λ, α)-MLAN channel (13) such that

λ ≤ 16 ( e^{mδ′} [ mπ^{m/2}/Γ(m/2 + 1) ] ∫_{√m ρ0}^{2d} ρ^{m−1} Pr[ Z ∈ D_m(d, ρ) ] dρ + Pr[ ‖Z‖ ≥ d ] ), ∀ d > 0. (59)
Proof: See Appendix A. □
Next, we provide some results regarding the tail of the spherical noise vector Z defined in (45). As we shall see, the distribution of Z exhibits rather different decay behavior from the exponentially-quadratic decay of Gaussian tails.
Lemma 6 Let f_Z(z) be the probability density function of Z defined in (45). We have

f_Z(√m z) ≤ [ (m − 1) Γ(m/2 + 1) / ( √(4πα²PN) Γ((m + 1)/2) ) ] exp[ −(m − 2) E_Z( ‖z‖²/σ′²; SNR′, α ) ], ∀ z ∈ R^m (60)

with σ′² def= (1 − α)²P′X + α²PN and SNR′ def= P′X/PN. The exponent E_Z(·) satisfies

2E_Z(µ; SNR, α) = E_0(µ; SNR, α) + ln(2πeσ′²µ),  0 ≤ µ ≤ µ_0(SNR, α),
               = E_sp(µ; SNR, α) + ln(2πeσ′²µ),  µ ≥ µ_0(SNR, α), (61)

where

E_0(µ; SNR, α) def= ln[ (1 − α)²SNR / ( ((1 − α)²SNR + α²) µ ) ], (62)

µ_0(SNR, α) def= max{ ( (1 − α)²SNR − α² ) / ( (1 − α)²SNR + α² ), 0 }, (63)

E_sp(µ; SNR, α) def= [ ((1 − α)²SNR + α²)/α² ] ( 1 + µ − 2µ/g_1(µ; SNR, α) ) − ln g_1(µ; SNR, α), (64)

and

g_1(µ; SNR, α) def= [ √( (4(1 − α)⁴SNR² + 4α²(1 − α)²SNR) µ + α⁴ ) − α² ] / [ 2(1 − α)²SNR ]. (65)
Proof: See Appendix B. □
Figure 4 displays E_Z(µ; SNR, α), normalized by SNR, as a function of µ for different values of α. Note that, whereas a Gaussian vector (α = 1) has a normal tail, a uniform vector over a ball (α = 0) has no tail. For α between 0 and 1, the distribution of the spherical noise vector Z "improves" from a normal tail to no tail as α decreases.
Lemma 7 Let λ_Z(ξ) def= E[ e^{ξ‖Z‖²} ] be the moment-generating function of ‖Z‖². Then

λ_Z(ξ) ≤ exp[ −m( (1/2) ln(1 − 2α²PNξ) − (1 − α)²P′Xξ / (1 − 2α²PNξ) ) ] (66)

for any 0 ≤ ξ < (2α²PN)^{−1}.
Figure 4: Exponent E_Z(µ; SNR, α) of (61), normalized by SNR, as a function of µ for different values of α.
Proof: See Appendix C. □
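One way to see where (66) comes from (our own sketch, not the proof in Appendix C): conditioning on B = b, the vector (1 − α)b + αN is Gaussian with mean (1 − α)b, so its squared norm has the standard noncentral-Gaussian MGF; since ‖b‖² ≤ mP′X, that conditional MGF never exceeds the right side of (66), with equality when ‖b‖² = mP′X. The sketch below (all parameter values hypothetical) checks this pointwise domination numerically:

```python
import math, random

random.seed(1)
m, alpha, Px_prime, Pn = 8, 0.6, 2.0, 0.5   # hypothetical dimensions and powers

def conditional_mgf(xi, b_norm_sq):
    """E[exp(xi * ||(1-alpha) b + alpha N||^2)] for fixed b with ||b||^2 = b_norm_sq,
    via the noncentral-Gaussian MGF."""
    c = 1 - 2 * alpha ** 2 * Pn * xi
    return c ** (-m / 2) * math.exp(xi * (1 - alpha) ** 2 * b_norm_sq / c)

def bound(xi):
    """Right side of eq. (66)."""
    c = 1 - 2 * alpha ** 2 * Pn * xi
    return math.exp(-m * (0.5 * math.log(c) - (1 - alpha) ** 2 * Px_prime * xi / c))

xi = 0.4 / (2 * alpha ** 2 * Pn)   # a valid xi in [0, (2 alpha^2 Pn)^{-1})
ok = all(conditional_mgf(xi, random.uniform(0, m * Px_prime)) <= bound(xi) * (1 + 1e-12)
         for _ in range(1000))
tight = abs(conditional_mgf(xi, m * Px_prime) - bound(xi)) / bound(xi)
print(ok, tight)
```

Averaging the conditional MGF over B then gives λ_Z(ξ) ≤ (66).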
Combining Lemmas 2, 3, 4 and 5, and investigating the asymptotic behavior of the exponents on the right sides of (56) and (59) (with the help of Lemmas 6 and 7), we obtain a new lower bound on the error exponents of the (Λ, α)-MLAN channel (13), stated in the following theorem.
Theorem 8 The error exponent EΛ(R; α) of the (Λ, α)-MLAN channel (13) satisfies

EΛ(R; α) ≥ E( e^{2(C(SNR′,α) − R − ε1(Λ))}; SNR′, α ) − ε1(Λ) (67)

where C(·) is defined in (15), SNR′ def= (1 + ε2(Λ)) SNR, and ε1(·) and ε2(·) are defined in (46) and (47), respectively. The exponent E(·) satisfies

2E(µ; SNR, α) = Esp(µ; SNR, α),  1 ≤ µ ≤ µcr(SNR, α),
             = Esl(µ; SNR, α),  µcr(SNR, α) ≤ µ ≤ 2µcr(SNR, α),
             = Eex(µ; SNR, α),  µ ≥ 2µcr(SNR, α), (68)

where Esp(·) is defined in (64),

Esl(µ; SNR, α) def= Esp( µcr(SNR, α); SNR, α ) − ln[ µcr(SNR, α)/µ ], (69)

Eex(µ; SNR, α) def= Esp( g2(µ; SNR, α); SNR, α ) − ln( 1 − µ/(4 g2(µ; SNR, α)) ), (70)

µcr(SNR, α) def= [ (1 − α)²SNR + 3α² + √( ((1 − α)²SNR + 3α²)² − 8α⁴ ) ] / [ 2((1 − α)²SNR + α²) ], (71)

and g2(µ; SNR, α) is the unique zero of the function

f(x) = √( (4(1 − α)⁴SNR² + 4α²(1 − α)²SNR) x + α⁴ ) + α² − 2((1 − α)²SNR + α²) x + 2α²µ/(4x − µ),  µ/4 ≤ x ≤ µ/2, µ ≥ 2µcr. (72)
Proof: See Appendix D. □
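A useful consistency check on (64)-(65) (our own numerical experiment, not from the paper): as α → 1 the effective noise Z becomes purely Gaussian, so Esp(µ; SNR, α) should approach the sphere-packing branch µ − ln(eµ) of the Poltyrev exponent (24). Indeed, one can verify that g1(µ; SNR, α) → µ as α → 1, which collapses (64) to µ − 1 − ln µ. Numerically, with arbitrary test values SNR = 1 and µ = 1.5:

```python
import math

def g1(mu, snr, alpha):
    """g1 of eq. (65)."""
    a2, b = alpha ** 2, (1 - alpha) ** 2 * snr
    return (math.sqrt((4 * b ** 2 + 4 * a2 * b) * mu + a2 ** 2) - a2) / (2 * b)

def esp(mu, snr, alpha):
    """Sphere-packing exponent Esp of eq. (64) (a value of 2*E)."""
    a2, b = alpha ** 2, (1 - alpha) ** 2 * snr
    g = g1(mu, snr, alpha)
    return (b + a2) / a2 * (1 + mu - 2 * mu / g) - math.log(g)

snr, mu = 1.0, 1.5
target = mu - math.log(math.e * mu)            # Poltyrev sphere-packing branch of (24)
for alpha in (0.9, 0.99, 0.999):
    print(alpha, esp(mu, snr, alpha), target)  # converges to the target as alpha -> 1
gap = abs(esp(mu, snr, 0.9999) - target)
```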
Proposition 9 The error exponent ECosta(R; SNR) of Costa's dirty-paper channel is lower-bounded by the random-coding lower bound Erc_AWGN(R; SNR) on the error exponents of the AWGN channel at the same SNR in the sphere-packing and straight-line regions, i.e.,

ECosta(R; SNR) ≥ Erc_AWGN(R; SNR), Rex_AWGN(SNR) ≤ R ≤ CAWGN(SNR) (73)

where

Rex_AWGN(SNR) def= (1/2) ln( 1/2 + (1/2)√(1 + SNR²/4) ). (74)
Proof: We have

ECosta(R; SNR) ≥ sup_α sup_Λ EΛ(R; SNR, α) (75)
≥ sup_α sup_Λ { E( e^{2(C(SNR′,α) − R − ε1(Λ))}; SNR′, α ) − ε1(Λ) } (76)
= sup_α E( e^{2(C(SNR,α) − R)}; SNR, α ) (77)

where (76) follows from Theorem 8, and (77) follows from the facts that ε1(Λ) → 0 and SNR′ → SNR when Λ is chosen to be Rogers-good. The desired result (73) then follows from the explicit solution to the optimization problem on the right side of (77). The optimal lattice-scaling parameter αLD (the subscript "LD" stands for "large deviation") is given by

αLD = 1 + SNR/2 − √(1 + SNR²/4),  Rex_AWGN(SNR) ≤ R ≤ Rcr_AWGN(SNR),
    = √(β²/4 + β) − β/2,          Rcr_AWGN(SNR) ≤ R ≤ CAWGN(SNR), (78)

where

Rcr_AWGN(SNR) def= (1/2) ln( 1/2 + SNR/4 + (1/2)√(1 + SNR²/4) ) (79)

and β def= SNR(1 − e^{−2R}). □
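Two properties of (78) can be confirmed numerically (our own check, with SNR = 1 as an arbitrary example): the two branches agree at R = Rcr_AWGN(SNR), and at R = CAWGN(SNR) the large-deviation-optimal parameter αLD coincides with αMMSE = SNR/(1 + SNR), consistent with the discussion in the introduction:

```python
import math

def alpha_mmse(snr):
    return snr / (1 + snr)

def alpha_ld(rate, snr):
    """alpha_LD of eq. (78)."""
    q = math.sqrt(1 + snr ** 2 / 4)
    r_cr = 0.5 * math.log(0.5 + snr / 4 + 0.5 * q)   # eq. (79)
    if rate <= r_cr:
        return 1 + snr / 2 - q
    beta = snr * (1 - math.exp(-2 * rate))
    return math.sqrt(beta ** 2 / 4 + beta) - beta / 2

snr = 1.0
cap = 0.5 * math.log(1 + snr)
r_cr = 0.5 * math.log(0.5 + snr / 4 + 0.5 * math.sqrt(1 + snr ** 2 / 4))
gap_cr = abs(alpha_ld(r_cr, snr) - alpha_ld(r_cr + 1e-12, snr))  # branch continuity
gap_cap = abs(alpha_ld(cap, snr) - alpha_mmse(snr))              # alpha_LD -> alpha_MMSE at capacity
print(gap_cr, gap_cap, alpha_ld(0.9 * cap, snr) <= alpha_mmse(snr))
```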
3.4 On the Achievability of the Poltyrev Exponents
Here, we comment on the achievability of the Poltyrev exponents for the MLAN channel. In our derivation of the new lower bound (67), Lemmas 2 and 3 are used to connect the error probability of the (Λ^k, α)-MLAN channel to that of an unconstrained additive noise channel in which the noise is a weighted sum of a white Gaussian vector and a spherically-uniform vector. It is very tempting to go further down that road and connect the error probability of the (Λ^k, α)-MLAN channel to that of an unconstrained AWGN channel. The following lemma establishes the connection.
Lemma 10 Let G ∼ N(0, PX Im) be statistically independent of N ∼ N(0, PN Im). For Z defined in (45), we have

Pr[ (v_w + Z) ∉ V_C(v_w) ] ≤ e^{m ε3(Λ^k)} Pr[ (v_w + N1) ∉ V_C(v_w) ] (80)

where

N1 def= (1 − α)G + αN (81)

and

lim_{k→∞} ε3(Λ^k) = (1/2) ( P′X/PX − ln(P′X/PX) − 1 ). (82)
Now (80) is a Gaussian-type bound in that, in contrast to Z, N1 is strictly Gaussian. The achievability of the Poltyrev exponents for the MLAN channel thus follows from Lemmas 2, 3, 10 and Poltyrev's results on the error exponents of the unconstrained AWGN channel [7]. This gives an alternative proof of Erez and Zamir's result on the achievability of the Poltyrev exponents for the MLAN channel, though the use of Gaussian-type bounds is in the same spirit as theirs.
4 Numerical Examples and Discussion
In this section, we provide some numerical examples to illustrate the results of Section 3. In Figures 5-7, we plot the exponents EP(e^{2(C(SNR,αMMSE) − R)}), Erc_AWGN(e^{2(CAWGN(SNR) − R)}; SNR), E(e^{2(C(SNR,αMMSE) − R)}; SNR, αMMSE) and E(e^{2(C(SNR,αLD) − R)}; SNR, αLD), all normalized by SNR, as functions of R for SNR = −10, 0, 10 dB, respectively. We have also plotted αLD, normalized by αMMSE, as a function of R for the same SNRs. A few observations and remarks are now in order.
1. Fix α = αMMSE. We observe from these examples that EP(e^{2(C(SNR,αMMSE) − R)}) is strictly smaller than E(e^{2(C(SNR,αMMSE) − R)}; SNR, αMMSE) for all rates below channel capacity. Therefore, Erez and Zamir's conjecture (in a preliminary version of [6]) on the asymptotic optimality of the Poltyrev exponents for the MLAN channel is not true.
2. In the high-SNR regime, E(e^{2(C(SNR,αLD) − R)}; SNR, αLD) ≈ EP(e^{2(C(SNR,αMMSE) − R)}) for all rates below channel capacity (e.g., see Figure 7). This suggests that the MLAN channel (with a proper choice of the shaping lattice and the scaling parameter) is asymptotically equivalent to Poltyrev's unconstrained AWGN channel at the same volume-to-noise ratio in the limit as SNR tends to infinity. The reason is that, in the high-SNR regime, the optimal scaling parameter αLD ≈ 1 and the effective noise becomes Gaussian before aliasing.
Figure 5: (a) Error exponents, normalized by SNR, as a function of R for SNR = −10 dB; (b) the optimal lattice-scaling parameter αLD, normalized by αMMSE, as a function of R for SNR = −10 dB.
Figure 6: (a) Error exponents, normalized by SNR, as a function of R for SNR = 0 dB; (b) the optimal lattice-scaling parameter αLD, normalized by αMMSE, as a function of R for SNR = 0 dB.
Figure 7: (a) Error exponents, normalized by SNR, as a function of R for SNR = 10 dB; (b) the optimal lattice-scaling parameter αLD, normalized by αMMSE, as a function of R for SNR = 10 dB.
3. αMMSE is strictly suboptimal in maximizing E(e^{2(C(SNR,α) − R)}; SNR, α) except at the rate equal to channel capacity. This is because, when the Gaussian-type bounds are used, α affects the Poltyrev exponent EP(e^{2(C(SNR,α) − R)}) only through the variance of the Gaussian noise, of which αMMSE is the unique minimizer (equivalently, αMMSE uniquely maximizes the volume-to-noise ratio e^{2(C(SNR,α) − R)}). The new lower bound, on the other hand, takes into account the tail heaviness of the effective noise, which is also controlled by the scaling parameter α (e.g., see Figure 4). The deviation of αLD from αMMSE indicates that there is a tradeoff between tail heaviness and variance of the noise in optimizing the error exponent of the MLAN channel. Whereas αMMSE is second-moment optimal, αLD is large-deviation optimal. Note that, while a Gaussian distribution has a normal tail, a uniform distribution over a ball has no tail. One would thus expect α to be smaller to favor the latter as we balance the large-deviation exponents. When the transmission rate approaches the channel capacity, we begin to exit the large-deviation regime and enter the central-limit-theorem regime. "Large-deviation optimal" is replaced by "second-moment optimal" (which implies "mutual-information optimal" in this case), and αLD approaches αMMSE.
4. This tail-heaviness/variance tradeoff is reminiscent of the Gaussian arbitrarily varying channel, in which the worst-case noise is equivalent in distribution (induced by the stochastic encoder/decoder) to the sum of a white Gaussian vector and a vector uniform over the surface of a ball. A surprising result of [14] is that the error exponents of the Gaussian arbitrarily varying channel are actually better than those of the AWGN channel at the same SNR.
5. Even though E(e^{2(C(SNR,αLD)−R)}; SNR, αLD) = E_rc^AWGN(e^{2(C_AWGN(SNR)−R)}; SNR) in the sphere-packing and straight-line regions, a gap remains in the expurgation region between the new lower bound and the random-coding lower bound on the error exponents of the AWGN channel at the same SNR. We suspect that this gap is inherent to the "inflation-receiving" scheme and can only be bridged by exploring more complicated receiving schemes, possibly involving the local geometry of the input code.
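The tail-heaviness/variance tradeoff in item 3 can be observed numerically. The following Monte Carlo sketch is our own illustration (the dimension, powers, and threshold are arbitrary toy assumptions, not values from the paper): it estimates the tail of the effective noise Z = (1 − α)B + αN for αMMSE and for a slightly smaller α.

```python
import numpy as np

rng = np.random.default_rng(0)
m, P_X, P_N = 8, 4.0, 1.0            # toy dimension and powers (assumed)
snr = P_X / P_N
a_mmse = snr / (1.0 + snr)           # MMSE scaling alpha = SNR/(1+SNR)

def tail_prob(alpha, thresh, trials=200_000):
    """Empirical Pr[||Z||^2/m > thresh] for Z = (1-alpha)B + alpha*N, with B
    uniform over the ball of radius sqrt(m*P_X) and N white Gaussian."""
    g = rng.standard_normal((trials, m))
    b = g / np.linalg.norm(g, axis=1, keepdims=True)                 # direction
    b *= np.sqrt(m * P_X) * rng.random(trials)[:, None] ** (1 / m)   # radius
    n = np.sqrt(P_N) * rng.standard_normal((trials, m))
    z = (1 - alpha) * b + alpha * n
    return np.mean(np.sum(z * z, axis=1) / m > thresh)

# Shrinking alpha weights the tail-free uniform component more heavily,
# at the price of a larger overall variance.
for alpha in (a_mmse, 0.95 * a_mmse):
    print(f"alpha = {alpha:.3f}  tail = {tail_prob(alpha, 1.5):.5f}")
```
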
5 Lattice Encoding and Decoding for AWGN Channels
Motivated by their structured binning scheme for Costa's dirty-paper channel, Erez and Zamir [4, 5, 6] showed that nested lattice codes in conjunction with lattice decoding can achieve the capacity of the AWGN channel, thus settling the long-standing open problem of achieving the capacity of AWGN channels with lattice encoding and decoding. In this section, we extend our results to AWGN channels and show that Erez and Zamir's lattice encoding and decoding scheme is not only capacity-achieving, but also error-exponent-lossless relative to the optimal codes (with maximum-likelihood decoding), for rates sufficiently close to the channel capacity.

Figure 8: MLAN channel transformation of the AWGN channel Y = X + N.
The key idea of [4, 5, 6] is to use an inflated-lattice strategy to transform the AWGN channel (3) into a MLAN channel. A diagram of the transformation scheme is shown in Figure 8; the similarity between Figures 8 and 1 is obvious. The resulting (Λ, α)-MLAN channel is again given by (13). Recall that our analysis of the error exponents of the (Λ, α)-MLAN channel (13) is based on a random-code ensemble in which the codewords are independently and identically chosen according to a uniform distribution over the basic Voronoi region V0(Λ) of the shaping lattice Λ. What makes the MLAN transformation interesting is that the same random-coding performance can be attained by the more structured nested lattice codes.
A lattice Λ (the coarse lattice) is nested in Λ1 (the fine lattice) if Λ ⊆ Λ1, i.e., if Λ is a sublattice of Λ1. The set of coset leaders of Λ relative to Λ1,

C def= Λ1 ∩ V0(Λ), (83)

is called a nested lattice code. The rate of the nested lattice code is

R = (1/n) ln |C| = (1/n) ln (V(Λ)/V(Λ1)). (84)
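As a toy sanity check of (84), consider our own hypothetical example (not one used in the paper): fine lattice Λ1 = Z² and coarse lattice Λ = 3Z², so that V(Λ)/V(Λ1) = 9.

```python
import math
from itertools import product

n, q = 2, 3   # dimension and per-dimension nesting ratio (hypothetical toy values)

# Coset leaders of Λ = 3Z^2 relative to Λ1 = Z^2: one fine-lattice point per
# coset, i.e. the residues {0, 1, 2}^2, giving |C| = V(Λ)/V(Λ1) = 9.
codebook = list(product(range(q), repeat=n))

R = math.log(len(codebook)) / n   # rate (84), in nats per dimension
print(len(codebook), R)           # 9 codewords, R = ln 3
```
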
When the nested lattice code C is used, the extended codebook becomes the fine lattice Λ1. The decoding rule (39) is equivalent to producing an estimate

ĉ = QΛ1(y′) mod Λ (85)

for the transmitted codeword c. Note that (85) describes a (minimum-Euclidean-distance) lattice decoder which finds the closest lattice point, ignoring the boundary of the code. Such an unconstrained search preserves the lattice symmetry in the decoding process and reduces complexity. An ensemble of "good" nested lattice codes can be constructed using the following steps [15, 16]:
1. Let p be a prime number. Draw a generating vector g = (g1, · · · , gn) in which the gi, i = 1, · · · , n, are independently and identically chosen according to Unif({0, · · · , p − 1}).

2. Define the codebook

C def= {x ∈ Z_p^n : x = qg mod p, q = 0, · · · , p − 1}. (86)

3. Apply Construction A [16] to lift C to R^n and form the lattice

Λ′1 = p^{−1}C + Z^n. (87)

4. Note that the cubic lattice Z^n may be viewed as nested in Λ′1. Let G be the generator matrix of a Rogers-good lattice. Apply the linear transformation G to Λ′1. It follows that Λ = Z^n G is a sublattice of Λ1 = Λ′1 G.
Since linear transformations preserve the nesting ratio, the coding rate (84) is

R = (1/n) ln (V(Z^n)/V(Λ′1)) = (1/n) ln p. (88)

For a given rate R, we therefore must choose p = ⌈e^{nR}⌉p, where ⌈·⌉p denotes rounding up to the smallest prime number not less than the argument.
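The steps above can be sketched numerically. The toy sketch below is our own (with assumed small n and p, and with step 4 skipped, i.e., G taken to be the identity so that the coarse lattice is Λ = Z^n): it builds a Construction-A fine lattice and runs the unconstrained mod-Λ lattice decoder of (85).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 11                      # toy dimension and prime (assumed values)
g = rng.integers(0, p, size=n)    # generating vector, i.i.d. Unif{0,...,p-1}

# Step 2: rank-1 linear code over Z_p with p codewords qg mod p.
code = np.array([(q * g) % p for q in range(p)])

def nearest_fine_point(y):
    """Minimum-Euclidean-distance quantization to Λ1' = p^{-1}C + Z^n:
    for each coset p^{-1}c + Z^n, round componentwise and keep the best."""
    best, best_d = None, np.inf
    for c in code:
        shift = c / p
        cand = shift + np.round(y - shift)   # nearest point of that coset
        d = np.sum((y - cand) ** 2)
        if d < best_d:
            best, best_d = cand, d
    return best

# Lattice decoding as in (85): quantize to the fine lattice, reduce mod Λ = Z^n.
x = nearest_fine_point(rng.random(n))         # a fine-lattice "codeword"
y = x + 0.005 * rng.standard_normal(n)        # very light additive noise
c_hat = np.mod(nearest_fine_point(y), 1.0)    # coset leader in V0(Z^n) = [0,1)^n
print(np.allclose(np.mod(x, 1.0), c_hat))     # True: transmitted coset recovered
```
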
Note that p increases exponentially with the dimension n. For large p, the resulting ensemble is "matched" to the (Λ, α)-MLAN channel in that the codewords of the nested code become uniform over the basic Voronoi region V0(Λ) [15]. Hence, a typical member of the ensemble approaches the optimal random-coding exponents of this channel [6, Appendix C]. In light of Theorem 8 and Proposition 9, we conclude that lattice encoding and decoding suffers no loss of error exponents relative to the optimal codes (with maximum-likelihood decoding) for rates sufficiently close to the channel capacity.
6 Concluding Remarks
We derived a new lower bound on the error exponents of the MLAN channel. Whereas Erez and Zamir derived the Poltyrev exponents as a lower bound on the error exponents of their scheme, we established the new lower bound using Poltyrev's random-coding bounds as a starting point. (Our development thus gives a concise rationale for why αMMSE is the unique value of α that maximizes the Erez-Zamir-Poltyrev exponents.) The new lower bound is obtained by seeking the tradeoff between the tail heaviness and the variance of the effective noise that maximizes the error exponents of the MLAN channel. As a consequence, the optimal lattice-scaling parameter αLD becomes rate-adaptive and is chosen according to the large-deviation principle. The fact that αLD differs from αMMSE is hardly surprising, considering that the MMSE estimator is optimal when the quantity to be optimized is mutual information (in a linear Gaussian channel), but not necessarily for other optimization problems.
With a proper choice of the shaping lattice and the scaling parameter, the new lower bound coincides with the random-coding lower bound on the error exponents of the AWGN channel at the same SNR in the sphere-packing and straight-line regions. Therefore, at least for rates close to channel capacity, (1) writing on dirty paper is as reliable as writing on clean paper; and (2) lattice encoding and decoding suffer no loss of error exponents relative to the optimal codes (with maximum-likelihood decoding) for the AWGN channel.
Finally, we point out that the main element currently missing from this paper is an explanation for the surprising zero loss of error exponents (at least for rates close to channel capacity) under the MLAN channel transformation (with the optimal choice of the shaping lattice and the scaling parameter α). Our large-deviation analysis discovers this fact "by surprise," i.e., by comparing the resulting expression with the optimal one. We believe that such a coincidence should have a more fundamental explanation; the pursuit of one is a worthy subject for future research.
A Proof of Lemma 5
We start from a code ensemble of M′ = 2M codewords vw, w = 1, · · · , M′, independently and identically drawn according to Unif(V0(Λk)). Define

C′ def= ∪_{w=1}^{M′} {vw + Λk} (89)

and denote by Mρ(vw) the number of codewords v ∈ C′ \ {vw} such that ‖v − vw‖ < √(mρ). Let ρ0 satisfy the equation

E[(1/M′) Σ_{w=1}^{M′} Mρ0(vw)] = 0.05. (90)

Thus,

ρ0 = (1/m) (0.05 · V(Λk) / (M′ · Vol(Bm(0, 1))))^{2/m}. (91)

Given ρ0, define

λ1(vw) def= Pr[(vw + Z) ∉ VC′(vw) and ‖Z‖ < √(mρ0)] (92)

and

λ2(vw) def= Pr[(vw + Z) ∉ VC′(vw) and ‖Z‖ ≥ √(mρ0)]. (93)
Following the derivation of Lemma 4, we obtain

E[(1/M′) Σ_{w=1}^{M′} λ2(vw)] ≤ (e^{mδ′} m π^{m/2}/Γ(m/2 + 1)) ∫_{√(mρ0)}^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ + Pr[‖Z‖ ≥ d], ∀ d > 0, (94)

where

δ′ = δ + (1/m) ln 2. (95)

Define the events

A def= { (1/M′) Σ_{w=1}^{M′} λ2(vw) < 4 ( (e^{mδ′} m π^{m/2}/Γ(m/2 + 1)) ∫_{√(mρ0)}^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ + Pr[‖Z‖ ≥ d] ) } (96)

and

B def= { (1/M′) Σ_{w=1}^{M′} Mρ0(vw) < 0.2 }. (97)

By (90), (94) and the Chebyshev inequality, we have Pr[A^c] ≤ 0.25 and Pr[B^c] ≤ 0.25. It follows that

Pr[A ∩ B] = 1 − Pr[A^c ∪ B^c] (98)
≥ 1 − Pr[A^c] − Pr[B^c] (99)
≥ 0.5, (100)

i.e., there exists a set of M′ codewords vw, w = 1, · · · , M′, such that

(1/M′) Σ_{w=1}^{M′} λ2(vw) < 4 ( (e^{mδ′} m π^{m/2}/Γ(m/2 + 1)) ∫_{√(mρ0)}^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ + Pr[‖Z‖ ≥ d] ) (101)

and

(1/M′) Σ_{w=1}^{M′} Mρ0(vw) < 0.2. (102)

Similarly, applying the Chebyshev inequality to (101) and (102), we conclude that there exists a subset of at least 0.5M′ = M codewords from vw, w = 1, · · · , M′, such that

λ2(vw) < 16 ( (e^{mδ′} m π^{m/2}/Γ(m/2 + 1)) ∫_{√(mρ0)}^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ + Pr[‖Z‖ ≥ d] ) (103)

and

Mρ0(vw) < 0.8 (104)

for every vw in the chosen subset. Note that (104) implies that λ1(vw) = 0 and hence λ(vw) = λ1(vw) + λ2(vw) = λ2(vw). This completes the proof of Lemma 5.
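The counting argument above is just the Markov-type inequality applied twice: at most a quarter of the codewords can exceed four times the ensemble average of each quantity, so at least half satisfy both per-codeword bounds simultaneously. A quick numerical sketch (our own, with synthetic stand-ins for λ2(vw) and Mρ0(vw)):

```python
import numpy as np

rng = np.random.default_rng(7)
M2 = 10_000                               # M' codewords (synthetic stand-ins)
a = rng.exponential(1e-3, M2)             # plays the role of lambda_2(v_w)
b = rng.poisson(0.05, M2).astype(float)   # plays the role of M_rho0(v_w)

# Markov's inequality (applied to the empirical distribution): at most 1/4 of
# the points exceed 4x the mean, so at least half satisfy BOTH bounds at once.
good = (a < 4 * a.mean()) & (b < 4 * b.mean())
print(good.mean() >= 0.5)                 # True for any nonnegative data
```
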
B Proof of Lemma 6
Denote by fN and fB the probability density functions of the random vectors N ∼ N(0, PNI) and B ∼ Unif(Bm(0, √(mP′X))), respectively. Since Z = (1 − α)B + αN, in which B and N are statistically independent, we have

fZ(√m z) = (1/(α^m (1 − α)^m)) ∫_{R^m} fN((√m z − x)/α) fB(x/(1 − α)) dx (105)
= (m^{m/2}/(α^m (1 − α)^m)) ∫_{R^m} fN((√m z − √m x)/α) fB(√m x/(1 − α)) dx (106)
= (Γ(m/2 + 1)/(2π² α² PN (1 − α)² P′X)^{m/2}) ∫_{Bm(0, √((1−α)² P′X))} exp[−m‖z − x‖²/(2α² PN)] dx, (107)

where (106) follows from a linear change of variable from x to √m x. Expanding the integral on the right side of (107) in spherical coordinates (refer to Figure 9), we obtain

∫_{Bm(0, √((1−α)² P′X))} exp[−m‖z − x‖²/(2α² PN)] dx
= ((m − 1) π^{(m−1)/2}/Γ((m + 1)/2)) ∫_0^{√((1−α)² P′X)} ∫_0^π exp[−m (r² + ‖z‖² − 2r‖z‖ cos θ)/(2α² PN)] r^{m−1} (sin θ)^{m−2} dr dθ. (108)

Now, define

A(r, θ) def= (r² + ‖z‖² − 2r‖z‖ cos θ)/(2α² PN) − ln(r sin θ). (109)

The double integral on the right side of (108) can be bounded from above as

∫_0^{√((1−α)² P′X)} ∫_0^π exp[−m (r² + ‖z‖² − 2r‖z‖ cos θ)/(2α² PN)] r^{m−1} (sin θ)^{m−2} dr dθ
≤ √((1 − α)² P′X) ∫_0^{√((1−α)² P′X)} ∫_0^π exp[−(m − 2) A(r, θ)] dr dθ (110)
≤ π (1 − α)² P′X exp[−(m − 2) min_{(r,θ)} A(r, θ)], (111)

where the minimization is over the set {(r, θ) : 0 ≤ r ≤ √((1 − α)² P′X), 0 ≤ θ ≤ π}. Simple calculations yield

min_{(r,θ)} A(r, θ) =
  (1/2) ln(e/(α² PN)), for 0 ≤ ‖z‖²/σ′² ≤ µ0(SNR′, α);
  Esp(‖z‖²/σ′²; SNR′, α) + (1/2) ln(e‖z‖²/(α² PN (1 − α)² P′X)), for ‖z‖²/σ′² ≥ µ0(SNR′, α), (112)

where µ0(·) is defined in (63). Substituting (108), (111) and (112) into (107), we obtain the desired result (60). This completes the proof of Lemma 6.
Figure 9: Plane of cone of half-angle θ.
C Proof of Lemma 7
The moment-generating function λZ(·) of ‖Z‖² is given by

λZ(ξ) = E[e^{ξ‖Z‖²}] (113)
= E[E[e^{ξ‖Z‖²} | B]] (114)
= E[exp[−(m/2) ln(1 − 2α² PN ξ) + ((1 − α)² ξ/(1 − 2α² PN ξ)) ‖B‖²]] (115)

for any ξ < (2α² PN)^{−1}. Here, (115) follows from the fact that, conditioned on B, (α² PN)^{−1}‖Z‖² follows the noncentral chi-square distribution

χ²_m((1 − α)² ‖B‖²/(α² PN)). (116)

The desired result (66) thus follows from (115) and the fact that ‖B‖² ≤ mP′X with probability one. This completes the proof of Lemma 7.
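The noncentral chi-square fact behind (115) is easy to check by simulation. The sketch below is our own (the dimension, power, and ξ are arbitrary toy values): it compares a Monte Carlo estimate of E[e^{ξ‖Z‖²} | B] against the closed-form conditional expression used in (115).

```python
import numpy as np

rng = np.random.default_rng(3)
m, P_N, alpha = 6, 1.0, 0.8          # toy parameters (assumed)
b = rng.standard_normal(m)           # one fixed realization of B
xi = 0.1                             # any xi < 1/(2 alpha^2 P_N) = 0.78125

# Monte Carlo estimate of E[exp(xi ||Z||^2) | B] for Z = (1-alpha)B + alpha*N.
N = np.sqrt(P_N) * rng.standard_normal((400_000, m))
z2 = np.sum(((1 - alpha) * b + alpha * N) ** 2, axis=1)
mc = np.mean(np.exp(xi * z2))

# Closed form from the noncentral chi-square MGF, as used in (115).
t = 1 - 2 * alpha**2 * P_N * xi
exact = t ** (-m / 2) * np.exp((1 - alpha) ** 2 * xi * np.dot(b, b) / t)
print(abs(mc / exact - 1) < 0.05)    # True: close agreement
```
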
D Proof of Theorem 8
We now use Lemmas 6 and 7 (combined with some geometric analysis) to investigate the exponent of the right side of (56). Let d = √(mρd) for some ρd > 0 and define

I1(d) def= Pr[‖Z‖ ≥ d] and I2(d) def= (e^{mδ} m π^{m/2}/Γ(m/2 + 1)) ∫_0^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ. (117)

First, I1(d) can be bounded from above as

I1(d) = Pr[‖Z‖² ≥ mρd] (118)
≤ exp[−ξmρd + ln λZ(ξ)] (119)
≤ exp[−m (ξρd + (1/2) ln(1 − 2α² PN ξ) − (1 − α)² P′X ξ/(1 − 2α² PN ξ))] (120)

for any 0 ≤ ξ < (2α² PN)^{−1}. Here, (119) follows from the Chernoff bound on ‖Z‖², and (120) follows from Lemma 7. Choosing ξ to minimize the right side of (120), we obtain

−(2/m) ln I1(d) ≥
  Esp(ρd/σ′²; SNR′, α), for ρd/σ′² ≥ 1;
  0, for 0 < ρd/σ′² ≤ 1. (121)
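The Chernoff step (118)-(120) can be illustrated numerically. The sketch below is our own (all parameter values are toy assumptions; `P_X2` stands in for P′X): it optimizes the exponent of (120) over ξ on a grid and verifies that the resulting bound dominates an empirical tail estimate with ‖B‖² at its worst case mP′X.

```python
import numpy as np

rng = np.random.default_rng(5)
m, P_N, P_X2, alpha = 16, 1.0, 2.0, 0.7    # toy values (assumed)
rho_d = 2.2 * (alpha**2 * P_N + (1 - alpha) ** 2 * P_X2)   # above the mean

# Chernoff exponent from (120), maximized over 0 <= xi < 1/(2 alpha^2 P_N).
xis = np.linspace(0.0, 0.999 / (2 * alpha**2 * P_N), 2000)
expo = (xis * rho_d + 0.5 * np.log(1 - 2 * alpha**2 * P_N * xis)
        - (1 - alpha) ** 2 * P_X2 * xis / (1 - 2 * alpha**2 * P_N * xis))
bound = np.exp(-m * expo.max())            # upper bound on Pr[||Z||^2 >= m rho_d]

# Empirical tail with ||B||^2 at its worst case m P'_X (B on the sphere).
g = rng.standard_normal((300_000, m))
b = np.sqrt(m * P_X2) * g / np.linalg.norm(g, axis=1, keepdims=True)
n = np.sqrt(P_N) * rng.standard_normal((300_000, m))
z2 = np.sum(((1 - alpha) * b + alpha * n) ** 2, axis=1)
print(np.mean(z2 >= m * rho_d) <= bound)   # True: the bound dominates the tail
```
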
Second, following a change of variable from ρ to √(mρ), I2(d) can be written as

I2(d) = (e^{mδ} m^{m/2} π^{m/2}/Γ(m/2)) ∫_0^{4ρd} ρ^{m/2−1} Pr[Z ∈ Dm(√(mρd), √(mρ))] dρ. (122)

Expanding Pr[Z ∈ Dm(√(mρd), √(mρ))] in spherical coordinates yields

Pr[Z ∈ Dm(√(mρd), √(mρ))]
= ((m − 1) π^{(m−1)/2}/Γ((m + 1)/2)) ∫_{√(mρ/4)}^{√(mρd)} ∫_0^{arccos √(mρ/(4r²))} φZ(r) r^{m−1} (sin θ)^{m−2} dr dθ (123)
= ((m − 1) m^{m/2} π^{(m−1)/2}/Γ((m + 1)/2)) ∫_{ρ/4}^{ρd} ∫_0^{arccos √(ρ/(4r))} φZ(√m r) r^{m/2−1} (sin θ)^{m−2} dr dθ (124)
≤ ((m − 1) m^{m/2} π^{(m−1)/2}/Γ((m + 1)/2)) ∫_{ρ/4}^{ρd} φZ(√m r) (r − ρ/4)^{m/2−1} dr, (125)

where φZ(r) def= fZ(x) for r = ‖x‖. Here, (124) follows from the change of variable from r to √(mr), and (125) follows from the monotonicity of sin θ for 0 ≤ θ ≤ arccos √(ρ/(4r)).
Substituting (60) of Lemma 6 into (125), we obtain

Pr[Z ∈ Dm(√(mρd), √(mρ))]
≤ Am ∫_{ρ/4}^{ρd} exp[−(m − 2) (EZ(r/σ′²) − (1/2) ln(r − ρ/4))] dr (126)
≤ ρd Am exp[−(m − 2) min_{ρ/4 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln(r − ρ/4)}], (127)

where

Am def= (m − 1)² m^{m/2} π^{m/2} Γ(m/2 + 1) / (2α² PN Γ²((m + 1)/2)). (128)
Substituting (127) into (122) gives

I2(d) ≤ Bm ∫_0^{4ρd} exp[−(m − 2) min_{ρ/4 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln(rρ − ρ²/4)}] dρ (129)
≤ 4ρd Bm exp[−(m − 2) min_{0 ≤ ρ ≤ 4ρd} min_{ρ/4 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln(rρ − ρ²/4)}] (130)
= 4ρd Bm exp[−(m − 2) min_{0 ≤ r ≤ ρd} min_{0 ≤ ρ ≤ 4r} {EZ(r/σ′²) − (1/2) ln(rρ − ρ²/4)}] (131)
= 4ρd Bm exp[−(m − 2) min_{0 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln max_{0 ≤ ρ ≤ 4r} {rρ − ρ²/4}}] (132)
= 4ρd Bm exp[−(m − 2) min_{0 ≤ r ≤ ρd} {EZ(r/σ′²) − ln r}], (133)

where

Bm def= ρd (m − 1)² e^{mδ} m^{m+1} π^m / (4α² PN Γ²((m + 1)/2)). (134)

Simple calculations yield

2 min_{0 ≤ r ≤ ρd} {EZ(r/σ′²) − ln r} =
  Esl(ρd/σ′²; SNR′, α) − ln(ρd/(2πe)), for ρd/σ′² ≥ µcr(SNR′, α);
  Esp(ρd/σ′²; SNR′, α) − ln(ρd/(2πe)), for µ0(SNR′, α) ≤ ρd/σ′² ≤ µcr(SNR′, α);
  E0(ρd/σ′²; SNR′, α) − ln(ρd/(2πe)), for 0 < ρd/σ′² ≤ µ0(SNR′, α). (135)
Note from (55) that δ → (1/2) ln(e^{2R} G(Λ)/PX) as the block length k → ∞. Applying the Stirling approximation

Γ(z) = √(2π) e^{−z} z^{z−1/2} (1 + 1/(12z) + 1/(288z²) + · · ·), (136)

we obtain from (134) that

lim_{k→∞} {−(1/m) ln(4ρd Bm)} = (1/2) ln(PX/(4π² e^{2(R+1)} G(Λ))). (137)

Substituting (135) and (137) into (133), we conclude that

lim sup_{k→∞} {−(2/m) ln I2(d)} ≥
  Esl(ρd/σ′²; SNR′, α) − ln(2π e^{2R+1} G(Λ) ρd/PX), for ρd/σ′² ≥ µcr(SNR′, α);
  Esp(ρd/σ′²; SNR′, α) − ln(2π e^{2R+1} G(Λ) ρd/PX), for µ0(SNR′, α) ≤ ρd/σ′² ≤ µcr(SNR′, α);
  E0(ρd/σ′²; SNR′, α) − ln(2π e^{2R+1} G(Λ) ρd/PX), for 0 < ρd/σ′² ≤ µ0(SNR′, α). (138)

Finally, choosing

ρd = PX/(2π e^{2R+1} G(Λ)), (139)

we obtain from (58), (121) and (138) that

lim sup_{k→∞} {−(2/m) ln λ} ≥
  Esl(µ; SNR′, α), for µ ≥ µcr(SNR′, α);
  Esp(µ; SNR′, α), for 1 ≤ µ ≤ µcr(SNR′, α);
  0, for 0 < µ ≤ 1, (140)

where µ = ρd/σ′² = e^{2(C(SNR′,α)−R−ε1(Λ))}.
To investigate the exponent of the right side of (59), let us define

I2(d) def= (e^{mδ} m π^{m/2}/Γ(m/2 + 1)) ∫_{√(mρ0)}^{2d} ρ^{m−1} Pr[Z ∈ Dm(d, ρ)] dρ. (141)

Following similar steps as in the derivation of the upper bound for I2(d), we have

I2(d) ≤ 4ρd Bm exp[−(m − 2) min_{ρ0 ≤ ρ ≤ 4ρd} min_{ρ/4 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln(rρ − ρ²/4)}] (142)
= 4ρd Bm exp[−(m − 2) min_{ρ0/4 ≤ r ≤ ρd} min_{ρ0 ≤ ρ ≤ 4r} {EZ(r/σ′²) − (1/2) ln(rρ − ρ²/4)}] (143)
= 4ρd Bm exp[−(m − 2) min_{ρ0/4 ≤ r ≤ ρd} {EZ(r/σ′²) − (1/2) ln max_{ρ0 ≤ ρ ≤ 4r} {rρ − ρ²/4}}], (144)

where Bm is defined in (134). Recall from (91) and (139) that ρ0 → ρd as the block length k → ∞, and note that

max_{ρ0 ≤ ρ ≤ 4r} {rρ − ρ²/4} =
  ρ0 r − ρ0²/4, for ρ0/4 ≤ r ≤ ρ0/2;
  r², for r ≥ ρ0/2. (145)

After some (rather complicated) calculations, we arrive at

lim sup_{k→∞} {−(2/m) ln I2(d)} ≥
  Eex(ρd/σ′²; SNR′, α), for ρd/σ′² ≥ 2µcr(SNR′, α);
  Esl(ρd/σ′²; SNR′, α), for µcr(SNR′, α) ≤ ρd/σ′² ≤ 2µcr(SNR′, α);
  Esp(ρd/σ′²; SNR′, α), for µ0(SNR′, α) ≤ ρd/σ′² ≤ µcr(SNR′, α);
  E0(ρd/σ′²; SNR′, α), for 0 < ρd/σ′² ≤ µ0(SNR′, α). (146)

Putting together (59), (121) and (146), we have

lim sup_{k→∞} {−(2/m) ln λ} ≥
  Eex(µ; SNR′, α), for µ ≥ 2µcr(SNR′, α);
  Esl(µ; SNR′, α), for µcr(SNR′, α) ≤ µ ≤ 2µcr(SNR′, α);
  Esp(µ; SNR′, α), for 1 ≤ µ ≤ µcr(SNR′, α);
  0, for 0 < µ ≤ 1, (147)

where µ = ρd/σ′² = e^{2(C(SNR′,α)−R−ε1(Λ))}.

Apply Lemmas 2, 3, 4 and 5 successively. The desired result (67) then follows from a comparison of (140) and (147). This completes the proof of Theorem 8.
Acknowledgment
The authors wish to thank Prof. G. David Forney, Jr. and two anonymous referees for their very helpful comments and suggestions.
References
[1] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. 29, pp. 439-441, May 1983.

[2] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Prob. Contr. Inform. Theory, vol. 9, no. 1, pp. 19-31, 1980.

[3] R. Zamir, S. Shamai (Shitz), and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. Inform. Theory, vol. 48, pp. 1250-1276, June 2002.

[4] U. Erez and R. Zamir, "Lattice decoding can achieve (1/2) log(1 + SNR) on the AWGN channel using nested codes," in Proc. IEEE Int. Symp. Inform. Theory, Washington, DC, June 2001, p. 125.

[5] U. Erez and R. Zamir, "Lattice decoded nested codes achieve the Poltyrev exponent," in Proc. IEEE Int. Symp. Inform. Theory, Lausanne, Switzerland, June-July 2002, p. 395.

[6] U. Erez and R. Zamir, "Achieving (1/2) log(1 + SNR) on the AWGN channel with lattice encoding and decoding," IEEE Trans. Inform. Theory, vol. 50, pp. 2293-2314, October 2004.

[7] G. Poltyrev, "On coding without restrictions for the AWGN channel," IEEE Trans. Inform. Theory, vol. 40, pp. 409-417, March 1994.

[8] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Syst. Tech. J., vol. 38, pp. 611-656, May 1959.

[9] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[10] G. D. Forney, Jr., "On the role of MMSE estimation in approaching the information-theoretic limits of linear Gaussian channels: Shannon meets Wiener," in Proc. 41st Annual Allerton Conf. Comm., Contr., and Computing, Monticello, IL, October 2003, pp. 430-439.

[11] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, pp. 1152-1159, July 1996.

[12] G. D. Forney, Jr., M. D. Trott, and S.-Y. Chung, "Sphere-bound-achieving coset codes and multilevel coset codes," IEEE Trans. Inform. Theory, vol. 46, pp. 820-850, May 2000.

[13] U. Erez and R. Zamir, "Error exponents of modulo-additive noise channels with side information at the transmitter," IEEE Trans. Inform. Theory, vol. 47, pp. 210-218, January 2001.

[14] T. G. Thomas and B. Hughes, "Exponential error bounds for random codes on Gaussian arbitrarily varying channels," IEEE Trans. Inform. Theory, vol. 37, pp. 643-649, May 1991.

[15] U. Erez, S. Litsyn, and R. Zamir, "Lattices which are good for (almost) everything," IEEE Trans. Inform. Theory, vol. 51, pp. 3401-3416, October 2005.

[16] H. A. Loeliger, "Averaging bounds for lattice and linear codes," IEEE Trans. Inform. Theory, vol. 43, pp. 1767-1773, November 1997.
Biography
Tie Liu received his B.S. (with honors) and M.S. degrees in 1998 and 2000, respectively, both from the Department of Electrical Engineering of Tsinghua University, Beijing, China. From 2000 to 2001 he was a graduate student at the Massachusetts Institute of Technology, Cambridge, MA. Since 2001 he has been with the University of Illinois at Urbana-Champaign, Urbana, IL, where he obtained an M.S. degree in Mathematics in December 2004 and is currently pursuing a Ph.D. degree in Electrical Engineering. His research interests are in the areas of information theory, communication theory and systems, and signal processing.
Pierre Moulin received his doctoral degree in 1990, after which he worked for Bell Communications Research for five years. He joined the University of Illinois at Urbana-Champaign in 1996, where he is currently Professor in the Department of Electrical and Computer Engineering and Affiliate Professor in the Department of Statistics. His fields of professional interest are information theory, image and video processing, statistical signal processing, compression, and information hiding. He has served on the editorial boards of the IEEE Transactions on Information Theory and the IEEE Transactions on Image Processing, and is the editor-in-chief of the upcoming IEEE Transactions on Information Forensics and Security. He is an IEEE Fellow, a recipient of the 1997 and 2002 best paper awards from the IEEE Signal Processing Society, a 2003 Associate of the UIUC Center for Advanced Study, and a 2005 Sony Faculty Scholar.
Ralf Koetter (S'91, M'96) received a Diploma in Electrical Engineering from the Technical University of Darmstadt, Germany, in 1990 and a Ph.D. degree from the Department of Electrical Engineering at Linköping University, Sweden. From 1996 to 1998, he was a visiting scientist at the IBM Almaden Research Lab, San Jose, California. Dr. Koetter was a Visiting Assistant Professor at the University of Illinois at Urbana-Champaign and a Visiting Scientist at CNRS in Sophia Antipolis, France. He joined the faculty of the University of Illinois at Urbana-Champaign in 1999 and is currently an Associate Professor at the Coordinated Science Laboratory of the University. Dr. Koetter's research interests include coding and information theory and their applications to communication systems.

In the years 1999-2001, he served as Associate Editor for coding theory and techniques of the IEEE Transactions on Communications. In 2000, he began a term as Associate Editor for coding theory of the IEEE Transactions on Information Theory. He received an IBM Invention Achievement Award in 1997, an NSF CAREER Award in 2000, and an IBM Partnership Award in 2001. He is a member of the Board of Governors of the IEEE Information Theory Society. In 2004, he received the IEEE Information Theory Society Paper Award.