Exact Asymptotics in Channel Coding

Aaron Wagner (joint work with Yücel Altuğ)
I.I.D. Sums

• Let $X_i$ be i.i.d. with zero mean and unit variance, and fix $\varepsilon > 0$. What is the asymptotic behavior of $\sum_{i=1}^N X_i$?

• Weak Law of Large Numbers:
$$\lim_{N\to\infty} \Pr\Big(\sum_{i=1}^N X_i > \varepsilon N\Big) = 0$$

• Large Deviations:
$$\lim_{N\to\infty} -\frac{1}{N}\log \Pr\Big(\sum_{i=1}^N X_i > \varepsilon N\Big) = \Lambda^*(\varepsilon)$$

• Central Limit Theorem:
$$\lim_{N\to\infty} \Pr\Big(\sum_{i=1}^N X_i > \varepsilon \sqrt{N}\Big) = Q(\varepsilon)$$
I.I.D. Sums

• Exact Asymptotics [Bahadur-Rao ’60]:
$$\lim_{N\to\infty} \frac{\Pr\big(\sum_{i=1}^N X_i > N\varepsilon\big)}{c_N\, 2^{-N\Lambda^*(\varepsilon)}} = 1, \qquad c_N = \Theta(N^{-0.5})$$

• Moderate Deviations: if $\beta \in (1/2, 1)$:
$$\lim_{N\to\infty} -\frac{1}{N^{2\beta-1}}\log \Pr\Big(\sum_{i=1}^N X_i > \varepsilon N^{\beta}\Big) = \Lambda^*_{\mathcal{N}}(\varepsilon)$$
where $\Lambda^*_{\mathcal{N}}$ is the rate function of the standard Gaussian.
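To make the Bahadur-Rao regime concrete, here is a minimal numerical sketch (not from the talk), assuming standard Gaussian $X_i$, for which $\Lambda^*(\varepsilon) = \varepsilon^2/2$ nats and the tail $\Pr(\sum_{i=1}^N X_i > N\varepsilon) = Q(\varepsilon\sqrt{N})$ is available in closed form, so the $\Theta(N^{-0.5})$ pre-factor can be read off directly:

```python
# Sketch: Bahadur-Rao pre-factor for Gaussian X_i (closed-form tail).
import math

def Q(x):
    """Standard Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

eps = 0.1
for N in [100, 400, 1600, 6400]:
    tail = Q(eps * math.sqrt(N))            # Pr(sum_{i=1}^N X_i > N*eps)
    ld = math.exp(-N * eps**2 / 2)          # e^{-N Lambda*(eps)}, the LD estimate
    ratio = tail / ld                       # the sub-exponential pre-factor c_N
    print(N, ratio, ratio * math.sqrt(N))   # ratio*sqrt(N) -> 1/(eps*sqrt(2*pi))
```

The last column converges to a constant, confirming $c_N = \Theta(N^{-0.5})$ in this case.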
Channel Coding

Discrete memoryless channel $W(y|x)$, with input $x \in \mathcal{X}$ and output $y \in \mathcal{Y}$.

[Diagram: message in $\{0,1\}^{NR}$ → Encoder → $X^N$ → $W$ → $Y^N$ → Decoder → message estimate in $\{0,1\}^{NR}$]

Parameters:
• N: blocklength = # channel uses
• R: data rate (bits per channel use)
• Pe: error probability

$$P_e(N,R) := \min\{P_e : \text{code is } (N,R)\}$$
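As a toy model of this setting, here is a small simulation sketch (not from the talk; the function `dmc` is ours) that pushes $N$ input symbols through a DMC by sampling each output independently from $W(\cdot|x)$:

```python
# Minimal sketch of N uses of a discrete memoryless channel.
import numpy as np

rng = np.random.default_rng(0)

def dmc(x, W):
    """x: length-N array of input indices; W: |X| x |Y| row-stochastic matrix."""
    return np.array([rng.choice(W.shape[1], p=W[xi]) for xi in x])

W = np.array([[0.9, 0.1],
              [0.1, 0.9]])                 # BSC with crossover probability 0.1
x = rng.integers(0, 2, size=20)            # N = 20 channel uses
y = dmc(x, W)
print(np.count_nonzero(x != y), "bit flips out of", x.size)
```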
Channel Coding Theorem [Shannon ’48]

Channel capacity:
$$C := \max_{P_X} I(P_X; W)$$

If $R < C$: $\lim_{N\to\infty} P_e(N,R) = 0$. If $R > C$: $\lim_{N\to\infty} P_e(N,R) = 1$.

Speed of convergence?
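The capacity $C = \max_{P_X} I(P_X; W)$ can be computed numerically; below is a hedged sketch of the standard Blahut-Arimoto iteration (the algorithm is classical, but this code is ours, not the talk's):

```python
# Blahut-Arimoto iteration for C = max_{P_X} I(P_X; W), in bits.
import numpy as np

def capacity(W, iters=1000):
    """W: |X| x |Y| row-stochastic matrix."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])     # start from the uniform input
    for _ in range(iters):
        q = p @ W                                  # induced output distribution
        ratio = np.where(W > 0, W / q, 1.0)        # avoid log(0): 0*log(...) = 0
        d = (W * np.log2(ratio)).sum(axis=1)       # D(W(.|x) || q) per input x
        p *= np.exp2(d)                            # multiplicative update
        p /= p.sum()
    q = p @ W
    ratio = np.where(W > 0, W / q, 1.0)
    return float(p @ (W * np.log2(ratio)).sum(axis=1))

W = np.array([[0.9, 0.1], [0.1, 0.9]])             # BSC(0.1)
print(capacity(W))                                 # ~0.531 = 1 - h(0.1)
```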
Error Exponents

$P_e(N,R)$ decays exponentially with N:
$$E(R) := \limsup_{N\to\infty} -\frac{1}{N}\log P_e(N,R)$$

[Figure: the sphere-packing exponent $E_{SP}(R)$ (upper bound) and the random-coding exponent $E_r(R)$ (lower bound) versus $R$, with markers at $R_\infty$, $R_{cr}$, and $C$; the two curves coincide for the rates of interest, between $R_{cr}$ and $C$.]

What is the sub-exponential pre-factor? Focus on rates above $R_{cr}$.
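For rates between $R_{cr}$ and $C$, $E(R) = E_r(R) = E_{SP}(R)$. The sketch below (ours, not the talk's) evaluates Gallager's random-coding exponent $E_r(R) = \max_{0 \le \rho \le 1}\,[E_0(\rho, P) - \rho R]$ for a BSC with the uniform input, which is optimal by symmetry:

```python
# Gallager's random-coding exponent for a fixed input distribution, in bits.
import numpy as np

def E0(rho, W, p):
    # E_0(rho, P) = -log2 sum_y ( sum_x P(x) W(y|x)^{1/(1+rho)} )^{1+rho}
    inner = (p[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def Er(R, W, p):
    rhos = np.linspace(0.0, 1.0, 2001)             # grid search over rho
    return max(E0(rho, W, p) - rho * R for rho in rhos)

W = np.array([[0.9, 0.1], [0.1, 0.9]])             # BSC(0.1)
p = np.array([0.5, 0.5])                           # optimal input by symmetry
for R in [0.1, 0.3, 0.5]:
    print(R, Er(R, W, p))
```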
Pre-factors From the Literature

Upper bounds:
• Fano ’61: $2 \cdot 2^{-NE(R)}$
• Gallager ’65: $2^{-NE(R)}$
• [Universal] Csiszár et al. ’77: $O(N^{2|\mathcal{X}||\mathcal{Y}|})\, 2^{-NE(R)}$

Lower bounds:
• Shannon et al. ’67: $\Omega(2^{-\sqrt{N}})\, 2^{-NE(R)}$
• Valembois-Fossorier ’04: $\Omega(2^{-\sqrt{N}})\, 2^{-NE(R)}$
• Haroutunian ’68: $\Omega(N^{-|\mathcal{X}||\mathcal{Y}|})\, 2^{-NE(R)}$
Initial Guess

By analogy with the exact asymptotics for i.i.d. sums [Bahadur-Rao ’60],
$$\lim_{N\to\infty} \frac{\Pr\big(\sum_{i=1}^N X_i > N\varepsilon\big)}{c_N\, 2^{-N\Lambda^*(\varepsilon)}} = 1,$$
one might guess a $\Theta(N^{-0.5})$ pre-factor.
Elias ’55

• Binary Erasure Channel (BEC): each input bit is erased (output “?”) with probability $q$ and received correctly with probability $1-q$.
$$P_e(N,R) = \Theta\big(N^{-0.5}\, 2^{-NE(R)}\big)$$

• Binary Symmetric Channel (BSC): each input bit is flipped with probability $p$ and received correctly with probability $1-p$.
$$P_e(N,R) = \Theta\big(N^{-0.5(1+|E'(R)|)}\, 2^{-NE(R)}\big)$$

Is it a “zoo”?
Symmetric Channels

Theorem [Altuğ-Wagner ’12]:
• If the channel is regular, then $P_e(N,R) = \Theta\big(N^{-0.5(1+|E'(R)|)}\, 2^{-NE(R)}\big)$.
• If the channel is irregular, then $P_e(N,R) = \Theta\big(N^{-0.5}\, 2^{-NE(R)}\big)$.
General Channels

Theorem [Altuğ-Wagner ’12]: For any $\varepsilon > 0$,
• If the channel is regular, then
$$P_e(N,R) \le \frac{K_1(R,\varepsilon)}{N^{0.5(1+|E'(R^-)|-\varepsilon)}}\; 2^{-NE(R)}$$
• If the channel is irregular, then
$$P_e(N,R) \le \frac{K_2(R)}{N^{0.5}}\; 2^{-NE(R)}$$
General Channels (converse)

Theorem [Altuğ-Wagner ’12]: For any $(N,R)$ constant composition code and any $\varepsilon > 0$,
$$P_e \ge \frac{K_3(R,\varepsilon)}{N^{0.5(1+|E'(R^-)|+\varepsilon)}}\; 2^{-NE(R)}$$
Summary of Results

Pre-factor bounds | Symmetric | Asymmetric
Regular | $O(N^{-0.5(1+|E'(R)|)})$, $\Omega(N^{-0.5(1+|E'(R)|)})$ | $O(N^{-0.5(1+|E'(R^-)|)})$, $\Omega(N^{-0.5(1+|E'(R^-)|)})$*
Irregular | $O(N^{-0.5})$, $\Omega(N^{-0.5})$ | $O(N^{-0.5})$, $\Omega(N^{-0.5(1+|E'(R^-)|)})$*

*Constant composition codes only (shown in red on the original slide).

What is symmetric? What is regular?
Symmetric Channels

Definition (Gallager): A channel (stochastic matrix) W is symmetric if its output columns can be partitioned so that, within each part, the columns are permutations of each other and the rows are permutations of each other.

Symmetric (partition the outputs as $\{1,2\}$ and $\{3\}$):
$$\begin{pmatrix} 1-\varepsilon & 0 & \varepsilon \\ 0 & 1-\varepsilon & \varepsilon \end{pmatrix}$$

Not symmetric:
$$\begin{pmatrix} 3/4 & 1/4 \\ 1/3 & 2/3 \end{pmatrix}$$
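A hedged sketch of a symmetry test (ours): group the output columns by their multiset of entries, which is the natural candidate partition, and check the row-permutation condition within each group. This is a sufficient check, not an exhaustive search over all partitions of the outputs.

```python
# Test Gallager symmetry under the natural partition of the outputs.
import numpy as np
from collections import defaultdict

def looks_symmetric(W):
    groups = defaultdict(list)
    for y in range(W.shape[1]):                    # group columns by multiset
        groups[tuple(np.sort(W[:, y]).round(12))].append(y)
    for cols in groups.values():
        sub = np.sort(W[:, cols], axis=1)          # sort each row of the part
        if not np.allclose(sub, sub[0]):           # rows must be permutations
            return False                           # (columns already are, by grouping)
    return True

eps = 0.1
print(looks_symmetric(np.array([[1 - eps, 0, eps],
                                [0, 1 - eps, eps]])))   # True
print(looks_symmetric(np.array([[3/4, 1/4],
                                [1/3, 2/3]])))          # False
```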
Regular Channels

Definition: A symmetric channel W is regular if for some inputs x, z and output y:
$$W(y|x) > 0, \qquad W(y|z) > 0, \qquad W(y|x) \ne W(y|z).$$
Regular Channels

• Regular: the BSC with crossover probability $p$; e.g., $W(0|0) = 1-p$ and $W(0|1) = p$ are both positive and unequal.
• Irregular: the BEC; whenever two inputs can produce the same output, the transition probabilities coincide.
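The definition translates directly into a test; a minimal sketch (ours), assuming the reading of the definition above: the channel is regular iff some output column has two positive, unequal entries.

```python
# Regularity test: some y with W(y|x) > 0, W(y|z) > 0, W(y|x) != W(y|z).
import numpy as np

def is_regular(W, tol=1e-12):
    for y in range(W.shape[1]):
        pos = W[:, y][W[:, y] > tol]               # positive entries of column y
        if pos.size >= 2 and pos.max() - pos.min() > tol:
            return True
    return False

p = q = 0.1
BSC = np.array([[1 - p, p], [p, 1 - p]])
BEC = np.array([[1 - q, q, 0], [0, q, 1 - q]])     # outputs: 0, ?, 1
print(is_regular(BSC))   # True: W(0|0) = 0.9, W(0|1) = 0.1
print(is_regular(BEC))   # False: shared outputs have equal probabilities
```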
Summary of Results (revisited)

Why this slope-dependent pre-factor?
Why the Slope Thingy?

Binary Symmetric Channel (BSC) with crossover probability $p$:
$$P_e(N,R) = \Theta\big(N^{-0.5(1+|E'(R)|)}\, 2^{-NE(R)}\big)$$
Packing Spheres

• Binary symmetric channel: codewords $U_1, \dots, U_m \in \{0,1\}^N$, each decoded with a Hamming sphere of radius $r_{R,N}$ around it.

[Figure: codewords scattered in $\{0,1\}^N$, each surrounded by a Hamming sphere of radius $r_{R,N}$.]

• The channel applies i.i.d. bit flips $T_1, \dots, T_N$, so
$$P_e(N,R) \approx \Pr\Big(\sum_{i=1}^N T_i > r_{R,N}\Big) \approx c_N\, 2^{-N\cdot E(R)}\,?$$
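For the BSC this heuristic is fully computable: $\sum_i T_i$ is Binomial$(N,p)$, so the tail and the large-deviations estimate $2^{-N D(r/N \,\|\, p)}$ can be compared exactly. A small sketch (ours; the radius below is an arbitrary illustrative choice):

```python
# Exact binomial tail vs. its exponential estimate for BSC bit flips.
import math

def binom_tail(N, p, r):
    """Pr(sum of N Bernoulli(p) flips > r)."""
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(r + 1, N + 1))

def D(a, p):
    """KL divergence D(Bern(a) || Bern(p)) in bits."""
    return a * math.log2(a / p) + (1 - a) * math.log2((1 - a) / (1 - p))

N, p, r = 1000, 0.05, 100                  # radius = 2x the typical 50 flips
tail = binom_tail(N, p, r)
est = 2.0 ** (-N * D(r / N, p))
print(tail, est, tail / est)               # ratio: the sub-exponential pre-factor
```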
How to Compute $r_{R,N}$?

Volume of a Hamming sphere with radius $r_{R,N}$:
$$\approx \frac{1}{\sqrt{N}}\, 2^{N h(r_{R,N}/N)}, \qquad h(\theta) = -\theta\log\theta - (1-\theta)\log(1-\theta)$$

Volume of each decoding region: $\dfrac{2^N}{2^{NR}}$

So
$$r_{R,N} \approx N \cdot h^{-1}\Big(1 - R + \frac{\log N}{N}\Big)$$
and
$$P_e(N,R) \approx \Pr\Big(\sum_{i=1}^N T_i > r_{R,N}\Big) \approx \frac{1}{\sqrt{N}}\, 2^{-N\cdot E(R - (\log N)/N)}.$$
Expanding $E$ around $R$ converts the $(\log N)/N$ rate shift into the slope-dependent polynomial pre-factor.
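A numerical sketch of this radius computation (ours; assumes base-2 logarithms and inverts $h$ on $[0, 1/2]$ by bisection):

```python
# Compute r_{R,N} ~ N * h^{-1}(1 - R + (log N)/N).
import math

def h(t):
    """Binary entropy in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def h_inv(v):
    """Inverse of h on [0, 1/2], by bisection (0 <= v <= 1)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < v else (lo, mid)
    return (lo + hi) / 2

R = 0.5
for N in [100, 1000, 10000]:
    r = N * h_inv(min(1.0, 1 - R + math.log2(N) / N))
    print(N, r)    # grows linearly in N with slope h^{-1}(1 - R)
```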
Proof Technique

• Overall approach: mimic existing proofs for exponents, but use exact asymptotics instead of large deviations.
• Upper bound: Fano ’61, Gallager ’65, Csiszár et al. ’77.
• Lower bound: Shannon et al. ’67, Haroutunian ’68, Blahut ’74.
Summary of Results (revisited)

What about constants and lower-order behavior?
A Criticism of Error Exponents

“The venerable large-deviations information theory school seems to have made little headway in convincing engineers to transmit at, say, ... half of capacity for the sake of astronomically low block error probability in the asymptotic regime.”
— Sergio Verdú (2007 Shannon lecture)
Normal Approximation

• Fix an $\varepsilon$ with $0 < \varepsilon < 0.5$.
• $R^*(N,\varepsilon) := \max\{R : P_e(N,R) \le \varepsilon\}$
• Theorem [Strassen ’62, Hayashi ’09, Polyanskiy-Poor-Verdú ’10]: as $N \to \infty$,
$$R^*(N,\varepsilon) = C - \sqrt{\frac{V}{N}}\, Q^{-1}(\varepsilon) + O\Big(\frac{\log N}{N}\Big)$$
where $V$ is the channel dispersion.

Analogous to the CLT. Is it a better approach?
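The approximation is easy to evaluate; here is a sketch for the BSC (ours), assuming the closed forms $C = 1 - h(p)$ and $V = p(1-p)\log_2^2\frac{1-p}{p}$ from the dispersion literature, and dropping the $O(\log N / N)$ term:

```python
# Normal approximation R*(N, eps) ~ C - sqrt(V/N) * Q^{-1}(eps) for a BSC.
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(eps):
    """Bisection for Q^{-1} on (0, 0.5)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if Q(mid) < eps else (mid, hi)
    return (lo + hi) / 2

p, eps = 0.11, 1e-3
C = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)     # 1 - h(p), bits
V = p * (1 - p) * math.log2((1 - p) / p) ** 2             # dispersion, bits^2
for N in [200, 1000, 5000]:
    print(N, C - math.sqrt(V / N) * Q_inv(eps))
```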
Fix some large N.

[Figure: $P_e(N,R)$ versus $R$ up to $C$. For $R$ within $O(1/\sqrt{N})$ of $C$, $P_e$ is $O(1)$: the normal-approximation regime. For $R$ bounded away from $C$, $P_e \approx 2^{-NE(R)}$: the error-exponent regime. In between: moderate deviations.]
Moderate Deviations

• Error exponents: rate $R = C - O(1)$, and
$$\lim_{N\to\infty} -\frac{\log P_e(R,N)}{N} = E(R).$$
• Normal approximation: rate $R = C - O\big(\tfrac{1}{\sqrt{N}}\big)$, and $P_e(R,N) \approx$ const.
• What is in between? Rates of the form $C - \varepsilon_N$ with $\tfrac{1}{\sqrt{N}} \ll \varepsilon_N \ll 1$.
Moderate Deviations

Theorem [Altuğ-Wagner ’10]: Let $R_N = C - \varepsilon_N$ and assume
$$\lim_{N\to\infty} \varepsilon_N = 0, \qquad \lim_{N\to\infty} \varepsilon_N \sqrt{N} = \infty.$$
Then
$$\lim_{N\to\infty} \frac{-\log P_e(N,R_N)}{\varepsilon_N^2 N} = \frac{1}{2V},$$
i.e., $P_e(N,R_N) \approx 2^{-\varepsilon_N^2 N/(2V)}$.
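To see the interpolation numerically, a sketch (ours): the choice $\varepsilon_N = N^{-1/3}$ is one arbitrary sequence satisfying both conditions, and $V$ is the BSC(0.11) dispersion from the previous sketch.

```python
# Moderate-deviations approximation Pe(N, C - eps_N) ~ 2^{-eps_N^2 N / (2V)}.
V = 0.891                                   # BSC(0.11) dispersion, bits^2
for N in [10**3, 10**4, 10**5, 10**6]:
    eps_N = N ** (-1.0 / 3.0)               # eps_N -> 0 and eps_N*sqrt(N) -> inf
    exponent = eps_N**2 * N / (2 * V)       # = N^{1/3} / (2V)
    print(N, eps_N, 2.0 ** (-exponent))
```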
Proof Comments

First idea: take existing finite-N bounds and simply apply the new asymptotics.
• Achievability: works.
• Converse: fails. The bound
$$P_e(N,R) \ge \frac{K(R)}{N^{0.5(1+|E'(R)|)}}\; 2^{-NE(R)}$$
has a constant $K(R)$ that appears to blow up as $R$ approaches $C$.

Characterizing the lower-order factors would unify error exponents and moderate deviations.
Analogies

I.I.D. Sums ↔ Channel Coding
1. WLLN ↔ Channel Coding Theorem
2. LDP ↔ Error Exponents
3. CLT ↔ Normal Approximation
4. Exact Asymptotics ↔ Exact asymptotics in channel coding
5. MDP ↔ Moderate deviations in channel coding

Too many results! Can we unify them?
Conclusion

• Exact pre-factor bounds for channel coding (see the summary table above).
• Normal approximation in channel coding.
• Moderate deviations in channel coding.
• Lower-order results could unify the regimes.