
Lecture 6: Polar Coding

I-Hsiang Wang

Department of Electrical Engineering
National Taiwan University

[email protected]

December 5, 2016


In Pursuit of Shannon's Limit

Since 1948, Shannon's theory has drawn the sharp boundary between the possible and the impossible in data compression and data transmission.

Once fundamental limits are characterized, the next natural question is:

How to achieve these limits with acceptable complexity?

For lossless source coding, it did not take us too long to find optimal schemes with low complexity:

Huffman Code (1952): optimal for memoryless sources

Lempel-Ziv (1977): optimal for stationary ergodic sources

On the other hand, for channel coding and lossy source coding, it turns out to be much harder. It has been the holy grail for coding theorists to find codes that achieve Shannon's limit with low complexity.


In Pursuit of Capacity-Achieving Codes

Two barriers in pursuing low-complexity capacity-achieving codes:

1 Lack of explicit construction. Shannon's proof only shows that there exist coding schemes that achieve capacity.

2 Lack of structure to reduce complexity. In the proofs of coding theorems, complexity issues are often neglected, while codes with enough structure are hard to prove capacity-achieving.

Since the 90s, several practical codes were found to approach capacity, such as turbo codes and low-density parity-check (LDPC) codes. They perform well empirically, but lack rigorous proofs of optimality.

The first provably capacity-achieving coding scheme with acceptable complexity is the polar code, introduced by Erdal Arıkan in 2007.

Later, in 2012, spatially coupled LDPC codes were also shown to achieve capacity (Shrinivas Kudekar, Tom Richardson, and Rüdiger Urbanke).


[Slide: facsimile of the first page of E. Arıkan, "Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.]

The paper won the 2010 Information Theory Society Best Paper Award.


Overview

When Arıkan introduced polar codes in 2007, he focused on achieving capacity for general binary-input memoryless symmetric channels (BMSCs), including the BSC, the BEC, etc.

Later, polar codes were shown to be optimal in many other settings, including lossy source coding, non-binary-input channels, multiple access channels, channel coding with encoder side information (Gelfand-Pinsker), source coding with side information (Wyner-Ziv), etc.

Instead of giving a comprehensive introduction, we shall focus on polar coding for channel coding. The outline is as follows:

1 First we introduce the concept of channel polarization.

2 Second we explore polar coding for binary input channels.

3 Finally we briefly talk about polar coding for source coding (source polarization).


Notations

In channel coding, we use the DMC N times where N is the blocklength of the coding scheme.

Since the channel is the main focus, we use the following notations throughout this lecture:

W to denote the channel P_{Y|X}

P to denote the input distribution P_X

I(P, W) to denote I(X; Y)

Since we focus on BMSCs, and X ∼ Ber(1/2) achieves the channel capacity of any BMSC, we shall use I(W) (a slight abuse of notation) to denote I(P, W) when the input P is Ber(1/2).

In other words, the channel capacity of a BMSC W is I(W).

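As a quick numerical companion (a sketch, not part of the original slides): I(P, W) can be computed straight from its definition for a channel given as a row-stochastic matrix. The function name and the test values below are illustrative choices.

```python
import numpy as np

def mutual_information(P, W):
    """I(P, W) in bits: input distribution P (length 2), channel matrix W (2 x |Y|)."""
    Py = P @ W                                    # output distribution P_Y
    terms = [P[x] * W[x, y] * np.log2(W[x, y] / Py[y])
             for x in range(2) for y in range(W.shape[1])
             if P[x] > 0 and W[x, y] > 0]
    return sum(terms)

p = 0.11
W_bsc = np.array([[1 - p, p], [p, 1 - p]])        # BSC(p) as a transition matrix
print(mutual_information(np.array([0.5, 0.5]), W_bsc))  # I(W) = 1 - Hb(0.11), about 0.5
```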

1 Polarization
  Basic Channel Transformation
  Channel Polarization

2 Polar Coding
  Encoding and Decoding Architectures
  Performance Analysis

Single Usage of Channel W vs. N Usages of Channel W

[Figure: a single use, X → W → Y; and N uses, M → ENC → X1, X2, …, XN → (N copies of W) → Y1, Y2, …, YN → DEC → M̂]

Arıkan's Idea

[Figure: U1, U2, …, UN → Pre-Processing → X1, …, XN → (N copies of W) → Y1, …, YN → Post-Processing → V1, V2, …, VN; equivalently, N synthesized channels W1, …, WN mapping Ui → Vi]

Apply special transforms to both input and output.

Roughly N I(W) channels with capacity ≈ 1

Roughly N (1 − I(W)) channels with capacity ≈ 0

Equivalently, some perfect channels and some useless channels −→ Polarization

Coding becomes extremely simple: simply use those perfect channels for uncoded transmission, and throw those useless channels away.

Polarization: Basic Channel Transformation

Arıkan's Basic Channel Transformation

Consider two channel uses of W:

[Figure: U1, U2 → (X1, X2) → two copies of W → (Y1, Y2)]

Apply the pre-processor: X1 = U1 ⊕ U2, X2 = U2, where U1 ⊥⊥ U2 and U1, U2 ∼ Ber(1/2).

We now have two synthetic channels induced by the above procedure:

W− : U1 → V1 ≜ (Y1, Y2)
W+ : U2 → V2 ≜ (Y1, Y2, U1)

The above transform yields the following two crucial phenomena:

I(W−) ≤ I(W) ≤ I(W+)   (Polarization)
I(W−) + I(W+) = 2 I(W)   (Conservation of Information)

Example: Binary Erasure Channel

Example 1
Let W be a BEC with erasure probability ε ∈ (0, 1), so that I(W) = 1 − ε. Find the values of I(W−) and I(W+), and verify the above properties.

sol: Intuitively W− is worse than W and W+ is better than W:

For W−, the input is U1 and the output is (Y1, Y2): only when both Y1 and Y2 are not erased can one figure out U1! =⇒ W− is a BEC with erasure probability 1 − (1 − ε)² = 2ε − ε².

For W+, the input is U2 and the output is (Y1, Y2, U1): as long as one of Y1 and Y2 is not erased, one can figure out U2! =⇒ W+ is a BEC with erasure probability ε².

Hence, I(W−) = 1 − 2ε + ε² and I(W+) = 1 − ε².

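A two-line numerical check of this example (a sketch, not part of the slides):

```python
eps = 0.3                                    # any erasure probability in (0, 1)
I, I_minus, I_plus = 1 - eps, 1 - (2*eps - eps**2), 1 - eps**2
assert I_minus <= I <= I_plus                # polarization
assert abs(I_minus + I_plus - 2*I) < 1e-12   # conservation of information
```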

Example: Binary Symmetric Channel

Example 2
Let W be a BSC with crossover probability p ∈ (0, 1), so that I(W) = 1 − Hb(p). Find the values of I(W−) and I(W+). (A numerical sketch follows below.)

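Example 2 is left to the reader. One standard route, sketched here as an illustration (not the official solution): Y1 ⊕ Y2 is a sufficient statistic for U1, so W− is equivalent to a BSC with crossover probability 2p(1 − p); I(W+) then follows from conservation of information (Theorem 1, next slide).

```python
import numpy as np

def Hb(q):                                  # binary entropy in bits
    return -q*np.log2(q) - (1 - q)*np.log2(1 - q)

p = 0.11
I = 1 - Hb(p)
I_minus = 1 - Hb(2*p*(1 - p))               # W- behaves like a BSC(2p(1-p))
I_plus = 2*I - I_minus                      # conservation: I(W-) + I(W+) = 2 I(W)
print(I_minus, I, I_plus)                   # I(W-) <= I(W) <= I(W+)
```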

Basic Properties

Theorem 1
For any BMSC W and the induced channels {W−, W+} from Arıkan's basic transformation, we have

I(W−) ≤ I(W) ≤ I(W+), with equality iff I(W) = 0 or 1.

I(W−) + I(W+) = 2 I(W)

pf: We prove the conservation of information first (the second equality below uses U1 ⊥⊥ U2):

I(W−) + I(W+) = I(U1; Y1, Y2) + I(U2; Y1, Y2, U1) = I(U1; Y1, Y2) + I(U2; Y1, Y2 | U1)
= I(U1, U2; Y1, Y2) = I(X1, X2; Y1, Y2) = I(X1; Y1) + I(X2; Y2) = 2 I(W).

Moreover, I(W+) = I(X2; Y1, Y2, U1) ≥ I(X2; Y2) = I(W), which together with conservation gives I(W−) ≤ I(W), and hence the first property holds.
(Proof of the condition for equality is left as an exercise.)


Extremal Channels

[Figure 12.4, taken from Chap. 12.1 of Moser [4]: the difference I(W+) − I(W−) in bits, plotted against I(W). Unless the channel is already extreme, the difference is strictly positive; the BEC achieves the largest difference and the BSC the smallest.]

If we plot the "information stretch" I(W+) − I(W−) vs. the original I(W), it turns out that among all BMSCs:

BEC maximizes the stretch

BSC minimizes the stretch

Lower boundary: 2 Hb(2p(1 − p)) − 2 Hb(p), where p = Hb⁻¹(1 − I(W)).

Upper boundary: 2 I(W)(1 − I(W)).

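A sketch recreating the two boundary formulas numerically (Hb⁻¹ is computed by bisection; the sample values of I(W) are illustrative):

```python
import numpy as np

def Hb(q):
    return -q*np.log2(q) - (1 - q)*np.log2(1 - q)

def Hb_inv(h):                              # inverse of Hb on [0, 1/2], by bisection
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Hb(mid) < h else (lo, mid)
    return (lo + hi) / 2

for c in [0.1, 0.3, 0.5, 0.7, 0.9]:         # c = I(W)
    upper = 2*c*(1 - c)                     # BEC boundary
    p = Hb_inv(1 - c)
    lower = 2*Hb(2*p*(1 - p)) - 2*Hb(p)     # BSC boundary
    print(c, round(lower, 4), round(upper, 4))   # lower <= upper for every c
```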

Polarization: Channel Polarization

Recursive Application of Arıkan's Transformation

Duplicate W, apply the transformation, and get W− and W+.

Duplicate W− (and W+).

Apply the transformation on W−, and get W−− and W−+.

Apply the transformation on W+, and get W+− and W++.

⋮

We can keep going and going, until the desired blocklength is reached.

[Figure: the recursion tree, built from copies of W at the leaves.]

Polarized Channels after Recursive Application

After one recursion and getting W− and W+, let us duplicate them.

[Figure: four copies of W, with outputs Y1, Y2, Y3, Y4]

Apply the transformation on W−:

W−− : U1 → ((Y1, Y2), (Y3, Y4)) = Y^4
W−+ : U2 → ((Y1, Y2), (Y3, Y4), U1) = (Y^4, U1)

Apply the transformation on W+:

W+− : U3 → ((Y1, Y2, U1 ⊕ U2), (Y3, Y4, U2)) = (Y^4, U^2)
W++ : U4 → ((Y1, Y2, U1 ⊕ U2), (Y3, Y4, U2), U3) = (Y^4, U^3)

Putting things together, we have:

W−− : U1 → (Y^4, ∅)
W−+ : U2 → (Y^4, U^1)
W+− : U3 → (Y^4, U^2)
W++ : U4 → (Y^4, U^3)

Recursive Application of Arıkan's Transformation

With proper naming of the inputs (do it yourself), ℓ-times recursion generates a system with N = 2^ℓ channel uses.

The N polarized channels are W^{s1,…,sℓ}, sj ∈ {+, −}, where

W^{s1,…,sℓ} : Ui → (Y^N, U^{i−1}).

If we set − ↔ 0 and + ↔ 1, then the index i of channel W^{s1,…,sℓ} is i = 1 + Σ_{j=1}^{ℓ} sj 2^{ℓ−j}, one plus the number whose binary representation is (s1, …, sℓ) (MSB → LSB).

In the following we use W^(i)_N to denote the W^{s1,…,sℓ} generated above, for i = 1, …, N, N = 2^ℓ.
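A tiny sketch of this index mapping (the helper names are hypothetical):

```python
def signs_to_index(signs):                  # e.g. '+-' -> 3, matching W^{+-} = W^(3)
    bits = ''.join('1' if s == '+' else '0' for s in signs)
    return 1 + int(bits, 2)

def index_to_signs(i, l):                   # inverse map for N = 2^l channels
    return ''.join('+' if b == '1' else '-' for b in format(i - 1, f'0{l}b'))

assert signs_to_index('--') == 1 and signs_to_index('++') == 4
assert index_to_signs(2, 2) == '-+'         # consistent with the slide: W^{-+} : U2
```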

Channel Polarization

Theorem 2 (Channel Polarization)
For any BMSC W, the polarized channels {W^(i)_N | i = 1, …, N} (N = 2^ℓ) satisfy the following: for all a, b such that 0 < a < b < 1,

lim_{N→∞} (1/N) |{i : I(W^(i)_N) ∈ [0, a)}| = 1 − I(W),

lim_{N→∞} (1/N) |{i : I(W^(i)_N) ∈ (b, 1]}| = I(W),

lim_{N→∞} (1/N) |{i : I(W^(i)_N) ∈ [a, b]}| = 0.

Interpretation: When N is sufficiently large, roughly N I(W) of them are noiseless (capacity ≈ 1), and N (1 − I(W)) of them are useless (capacity ≈ 0).


[Figure: the fraction (1/N) |{i : I(W^(i)_N) ≤ ε}| plotted against ε ∈ [0, 1], for N = 2^0, 2^1, 2^2, 2^4, 2^8, 2^12, 2^20, and N = ∞. As N grows, the curve converges to a flat line at height 1 − I(W) on (0, 1): the capacities of the synthetic channels concentrate at 0 and 1.]
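These plots can be reproduced for the BEC, where the basic transformation stays within the BEC family (Example 1: ε− = 2ε − ε², ε+ = ε²). A sketch printing the fractions of near-perfect and near-useless synthetic channels, which approach I(W) and 1 − I(W) respectively:

```python
import numpy as np

def bec_capacities(eps, l):
    """I(W^(i)_N), i = 1..N = 2^l, for W = BEC(eps), in the slides' index order."""
    e = np.array([eps])
    for _ in range(l):
        ee = np.empty(2 * len(e))
        ee[0::2] = 2*e - e**2               # minus-children (worse channels)
        ee[1::2] = e**2                     # plus-children (better channels)
        e = ee
    return 1 - e

for l in [2, 4, 8, 12, 16, 20]:
    I = bec_capacities(0.5, l)              # I(W) = 0.5
    print(l, (I > 0.99).mean(), (I < 0.01).mean())  # both tend to 0.5 as l grows
```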

Proof of Channel Polarization

pf: Define the averaged first and second moments of {W^(i)_{2^ℓ} | i = 1, …, 2^ℓ} as follows (2^ℓ = N):

μ_ℓ ≜ (1/2^ℓ) Σ_{i=1}^{2^ℓ} I(W^(i)_{2^ℓ}),   ν_ℓ ≜ (1/2^ℓ) Σ_{i=1}^{2^ℓ} (I(W^(i)_{2^ℓ}))².

Due to the conservation of information of Arıkan's transformation (Theorem 1), μ_ℓ = I(W) for all ℓ.

As for the averaged second moment, note that

(1/2)((I(W+))² + (I(W−))²) = ((1/2)(I(W+) + I(W−)))² + ((1/2)(I(W+) − I(W−)))²
= I(W)² + Δ(W)²,

where Δ(W) ≜ (1/2)(I(W+) − I(W−)).


Hence,

ν_{ℓ+1} = (1/2^ℓ) Σ_{i=1}^{2^ℓ} [ I(W^(i)_{2^ℓ})² + Δ(W^(i)_{2^ℓ})² ] ≥ ν_ℓ + κ(a, b)² θ_ℓ(a, b),   (1)

where

κ(a, b) ≜ min{ Δ(W_BSC(a)), Δ(W_BSC(b)) },

θ_ℓ(a, b) ≜ (1/2^ℓ) |{ i : I(W^(i)_{2^ℓ}) ∈ [a, b] }|,

and W_BSC(x) denotes the BSC with I(W) = x. Since the BSC minimizes the stretch and (as the figure shows) its curve attains its minimum over [a, b] at an endpoint, every index with I(W^(i)_{2^ℓ}) ∈ [a, b] satisfies Δ(W^(i)_{2^ℓ}) ≥ κ(a, b), which gives the inequality in (1).

Hence, {ν_ℓ} forms a non-decreasing sequence.

Meanwhile, since all channels are binary-input, I(W^(i)_{2^ℓ}) ≤ 1, and therefore ν_ℓ ≤ 1.

[Figure: I(W+) − I(W−) vs. I(W), with the BSC lower boundary highlighted; modified from Chap. 12.1 of Moser [4].]


Polarization Channel Polarization

Hence, ν0 ≤ ν1 ≤ . . . ≤ νℓ ≤ . . . ≤ 1 =⇒ limℓ→∞ νℓ exists.

By (1), we have

θℓ(a, b) ≤νℓ+1 − νℓκ(a, b)2

=⇒ limℓ→∞

θℓ(a, b) = 0. (since limℓ→∞ νℓ exists.)

Finally, define αℓ(a) ≜ 12ℓ

∣∣∣{i : I(W(i)

2ℓ

)∈ [0, a)

}∣∣∣ and βℓ(b) ≜ 12ℓ

∣∣∣{i : I(W(i)

2ℓ

)∈ (b, 1]

}∣∣∣.Observe that

I (W ) = µℓ ≤ a · αℓ(a) + b · θℓ(a, b) + 1 · βℓ(b) = a+ (b− a)θℓ(a, b) + (1− a)βℓ(b)

1− I (W ) = 1− µℓ ≤ 1− 0 · αℓ(a)− a · θℓ(a, b)− b · βℓ(b) = (1− b) + (b− a)θℓ(a, b) + bαℓ(a)

It is then not hard to show that lim infℓ→∞ βℓ(b) ≥ I (W ) and lim infℓ→∞ αℓ(b) ≥ 1− I (W ).

Proof is complete by sandwich principle.


From Channel Polarization to Polar Coding

Recall the original goal:

[Figure: U1, …, UN → Pre-Processing → X1, …, XN → (N copies of W) → Y1, …, YN → Post-Processing → V1, …, VN]

Caveat: What we have done, however, is the following: we created N = 2^ℓ polarized channels

W^(i)_N : Ui → Vi = (Y^N, U^{i−1}).

However, we cannot obtain the true U^{i−1} from the channel output Y^N.

This issue can be fixed by successive decoding, where Vi ≜ (Y^N, Û^{i−1}) instead of (Y^N, U^{i−1}).

Encoding is based on those "synthetic" polarized channels {W^(i)_N}, and the i-th synthetic channel is a good approximation as long as Û^{i−1} = U^{i−1} with high probability.


Polar Coding: Encoding and Decoding Architectures

Overview of Polar Coding Architecture

1 Preparation
Generate the N = 2^ℓ synthetic polarized channels {W^(i)_N | i = 1, 2, …, N}.

2 Encoding
To encode K information bits into an N-bit codeword, the encoder picks a subset A ⊆ [1 : N] of synthetic polarized channels from the N channels above, based on their qualities:

For each i ∈ A, use Ui to send an information bit.
For each i ∈ F ≜ A^c, fix Ui to a dummy bit u*_i (frozen bits).

3 Decoding is based on successive cancellation, where
the decoded Ûi is determined by (Y^N, Û^{i−1}) if i ∈ A;
the decoded Ûi = u*_i, the pre-fixed dummy frozen bit, if i ∈ F.


Synthetic Polarized Channels

For the i-th synthesized channel, its input is Ui, its output is (Y^N, U^{i−1}), and the channel law is

W^(i)_N(y^N, u^{i−1} | ui) = (1/2^{N−1}) Σ_{u_{i+1}=0}^{1} ⋯ Σ_{u_N=0}^{1} P(y^N | u^N),

where P(y^N | u^N) = Π_{i=1}^{N} W(yi | xi). The relationship between x^N and u^N is described in the next slide.

Recursive Relation of Channel Laws

W^(2k−1)_{2N}(y^{2N}, u^{2k−2} | u_{2k−1}) = Σ_{u_{2k}=0,1} (1/2) W^(k)_N(y^{1:N}, u^{2k−2}_odd ⊕ u^{2k−2}_even | u_{2k−1} ⊕ u_{2k}) W^(k)_N(y^{N+1:2N}, u^{2k−2}_even | u_{2k})

W^(2k)_{2N}(y^{2N}, u^{2k−1} | u_{2k}) = (1/2) W^(k)_N(y^{1:N}, u^{2k−2}_odd ⊕ u^{2k−2}_even | u_{2k−1} ⊕ u_{2k}) W^(k)_N(y^{N+1:2N}, u^{2k−2}_even | u_{2k})


Relation between U^N and X^N

As mentioned before, with a proper "bit-reversal permutation" of the indices of the Ui's, one obtains reindexed inputs Ũi for which the relationship between X^N and Ũ^N is characterized by the ℓ-fold Kronecker product:

X^N = Ũ^N · G_N,

where G_N = G2^{⊗ℓ} ≜ G2 ⊗ ⋯ ⊗ G2 (ℓ times), N = 2^ℓ, and G2 = [1 0; 1 1]. Easy to check: (G_N)^{−1} = G_N.

The "bit-reversal permutation" σ_br is described as follows: for i = 1 + Σ_{j=1}^{ℓ} sj 2^{ℓ−j},

σ_br(i) = 1 + Σ_{j=1}^{ℓ} sj 2^{j−1}.

In other words, the binary representation of σ_br(i) − 1 is the reverse of that of i − 1, and vice versa. We shall use R_N to denote the matrix representation of σ_br. Easy to check: (R_N)^{−1} = R_N. Hence,

X^N = U^N · R_N G_N   and   U^N = X^N · G_N R_N.

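These identities are easy to verify numerically; a sketch over GF(2) (the code and variable names are mine, not the slides'):

```python
import numpy as np

def G_mat(l):                               # G_N = G2 kron ... kron G2 (l times)
    G2 = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    M = np.array([[1]], dtype=np.uint8)
    for _ in range(l):
        M = np.kron(M, G2)
    return M

def R_mat(l):                               # bit-reversal permutation matrix R_N
    N = 2**l
    P = np.zeros((N, N), dtype=np.uint8)
    for i in range(N):
        P[i, int(format(i, f'0{l}b')[::-1], 2)] = 1
    return P

l = 3; N = 2**l
GN, RN, I = G_mat(l), R_mat(l), np.eye(N, dtype=np.uint8)
assert ((GN @ GN) % 2 == I).all()           # (G_N)^{-1} = G_N
assert ((RN @ RN) % 2 == I).all()           # (R_N)^{-1} = R_N
u = np.random.randint(0, 2, N)
x = (u @ RN @ GN) % 2                       # X^N = U^N R_N G_N
assert ((x @ GN @ RN) % 2 == u).all()       # U^N = X^N G_N R_N
```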

Encoding

Two things need to be specified for polar encoding:

1 Determine the active set A and the frozen set F.

2 Determine what to send on the indices of the frozen set.

Selection of the Frozen Set
Let K denote the number of information bits to be delivered. Then, in principle, one should choose A and F such that |A| = K and, for all i ∈ A and j ∈ F, channel W^(i)_N has "better quality" than channel W^(j)_N.

*How to evaluate the "quality" of the synthetic polarized channels {W^(i)_N}? Discussed later.

Setting Values of the Frozen Bits
The values of the frozen bits are known to both encoder and decoder; they are part of the codebook design.


Encoding Architecture

x = u R_N G_N

where

u ≜ [u1 u2 … uN] denotes the uncoded bits (the union of information and frozen bits).

x ≜ [x1 x2 … xN] denotes the coded bits (codeword).

G_N ≜ G2 ⊗ ⋯ ⊗ G2 (ℓ times) is the encoding matrix (N = 2^ℓ and G2 = [1 0; 1 1]).

R_N is the bit-reversal permutation matrix.

[Figure: the encoding circuit mapping U1, U2, …, UN (info bits on A, frozen bits on F) to X1, X2, …, XN]

Rate R = |A| / N.


Decoding

Successive Cancellation Decoding (SC Decoding)
Upon receiving y^N, the decoder starts to decode ui from i = 1 to i = N in a sequential manner, following the rule below:

ûi = u*_i if i ∈ F.

ûi = arg max_{u ∈ {0,1}} W^(i)_N(y^N, û^{i−1} | u) if i ∈ A.

In words, the decoder performs bit-wise sequential decoding. (A code sketch follows below.)

Note: For i ∈ A, the decoding rule for that bit is not maximum likelihood decoding, because it does not make use of all frozen bits. In particular, {u*_j : j ∈ F, j > i} are not harnessed when decoding Ui.

[Figure: the SC decoding circuit, one DEC block per bit.]

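To make the procedure concrete, here is a compact recursive SC decoder sketch (my own illustration, not the slides' code). It adopts the equivalent convention x = u G_N without the bit-reversal R_N (dropping R_N merely relabels which Kronecker row each Ui uses; the sequence of synthetic channels W^(i) seen by the SC decoder is unchanged), works with log-likelihood ratios, and fixes all frozen bits to 0. The frozen set is chosen with the Bhattacharyya-parameter recursion from the next subsection, used here as a heuristic for the BSC.

```python
import numpy as np

def f(a, b):                                   # LLR of u XOR v from the LLRs of u and v
    t = np.tanh(a / 2) * np.tanh(b / 2)
    return 2 * np.arctanh(np.clip(t, -0.9999999, 0.9999999))

def g(a, b, s):                                # LLR of v once the decision s on u XOR v is fixed
    return b + (1 - 2 * s) * a

def encode(u):                                 # x = u G_N in the no-bit-reversal convention
    if len(u) == 1:
        return u
    h = len(u) // 2
    return np.concatenate([encode(u[:h] ^ u[h:]), encode(u[h:])])

def sc_decode(llr, frozen):
    """Returns (decoded u, re-encoded partial sums); frozen bits are fixed to 0."""
    if len(llr) == 1:
        u = np.array([0]) if frozen[0] else np.array([int(llr[0] < 0)])
        return u, u
    h = len(llr) // 2
    a, b = llr[:h], llr[h:]
    u1, x1 = sc_decode(f(a, b), frozen[:h])      # first half sees the W- copies
    u2, x2 = sc_decode(g(a, b, x1), frozen[h:])  # second half sees the W+ copies
    return np.concatenate([u1, u2]), np.concatenate([x1 ^ x2, x2])

p, l = 0.02, 8                                 # BSC(0.02), N = 256, rate 1/2
N = 2 ** l
z = np.array([2 * np.sqrt(p * (1 - p))])       # Z(W) of the BSC (see Lemma 1 below)
for _ in range(l):
    zz = np.empty(2 * len(z))
    zz[0::2], zz[1::2] = 2*z - z**2, z**2      # heuristic: treat the Z(W-) bound as equality
    z = zz
frozen = np.zeros(N, dtype=bool)
frozen[np.argsort(z)[N // 2:]] = True          # freeze the N/2 least reliable indices
u = np.random.randint(0, 2, N)
u[frozen] = 0
y = encode(u) ^ (np.random.rand(N) < p).astype(int)
llr = (1 - 2 * y) * np.log((1 - p) / p)        # channel LLRs for the BSC
u_hat, _ = sc_decode(llr, frozen)
print((u_hat == u).all())                      # True for most noise realizations
```

At this short blocklength the printout is True for most noise realizations; the performance analysis below quantifies how fast the failure probability decays with N.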

Polar Coding: Performance Analysis

Probability of Error

Under SC decoding, the probability of error of the proposed polar coding scheme depends on (1) the channel W, (2) the blocklength N, (3) the code rate K/N, (4) the frozen set F ⊂ [1 : N], and (5) the frozen bits u_F.

Notation: here we use u_F and u_A to denote the frozen bits and the information bits, respectively.

First, define the average (over all codewords) probability of error with given frozen bits u_F:

P^(N)_e(K/N, F, u_F) ≜ P{Û^N ≠ U^N} = Σ_{u_A ∈ {0,1}^K} 2^{−K} · P{∃ i ∈ A s.t. Ûi ≠ Ui | U_A = u_A}.

Next, we further average over uniformly randomly chosen frozen bits u_F and define

P^(N)_e(K/N, F) ≜ Σ_{u_F ∈ {0,1}^{N−K}} 2^{−(N−K)} · P^(N)_e(K/N, F, u_F).


Upper Bounding the Probability of Error of Polar Coding (1)

Observe that

P^(N)_e(K/N, F) = P{ ∪_{i∈A} {Ûi ≠ Ui, Û^{i−1} = U^{i−1}} }
≤ Σ_{i∈A} P{Ûi ≠ Ui, Û^{i−1} = U^{i−1}},   (2)

where the Ui are i.i.d. ∼ Ber(1/2) for all i = 1, 2, …, N. The inequality is due to the union bound.

Recall that the decoding function is Ûi(Y^N, Û^{i−1}) = arg max_{u ∈ {0,1}} W^(i)_N(Y^N, Û^{i−1} | u). Hence,

P{Ûi(Y^N, Û^{i−1}) ≠ Ui, Û^{i−1} = U^{i−1}} = P{Ûi(Y^N, U^{i−1}) ≠ Ui, Û^{i−1} = U^{i−1}}
≤ P{Ûi(Y^N, U^{i−1}) ≠ Ui}.   (3)


Now things boil down to upper bounding P{Ûi(Y^N, U^{i−1}) ≠ Ui}, where

Ûi(y^N, u^{i−1}) = arg max_{u ∈ {0,1}} W^(i)_N(y^N, u^{i−1} | u).   (4)

Key Observation: P{Ûi(Y^N, U^{i−1}) ≠ Ui} is the optimal error probability of a binary detection problem, since the bitwise decoder (4) above is the corresponding MAP (= ML) detection rule!

[Figure: Ui ∼ Ber(1/2) → W^(i)_N → (Y^N, U^{i−1}) → MAP (= ML) detector → Ûi]

Next, we introduce Z(W) as an upper bound on the error probability of a binary detection problem with input X ∼ Ber(1/2) and observation Y following the probability transition law W(y|x). Naturally it measures the reliability of a channel W.


Bit-wise Decoding Error Probability

Lemma 1
For a binary detection problem with input X ∼ Ber(1/2) and observation Y following the probability transition law W(y|x), the optimal probability of error satisfies (note: ML is optimal)

P{X̂_ML(Y) ≠ X} ≤ Z(W),  where Z(W) ≜ Σ_{y∈Y} √(W(y|0) · W(y|1)).   (5)

pf: Recall the ML detection rule: X̂_ML(y) = x if W(y|x) ≥ W(y|x ⊕ 1). Hence,

P{X̂_ML(Y) ≠ X} = E_{X,Y}[ 1{W(Y|X) < W(Y|X ⊕ 1)} ] ≤ E_{X,Y}[ √(W(Y|X ⊕ 1) / W(Y|X)) ].

It is not hard to verify that Z(W) = E_{X,Y}[ √(W(Y|X ⊕ 1) / W(Y|X)) ] (left as an exercise).

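Both the lemma and the exercise can be sanity-checked numerically on a randomly drawn binary-input DMC (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((2, 5)); W /= W.sum(axis=1, keepdims=True)  # random B-DMC, |Y| = 5
Z = np.sqrt(W[0] * W[1]).sum()                             # Bhattacharyya parameter
E = 0.5 * sum(W[x, y] * np.sqrt(W[1 - x, y] / W[x, y])
              for x in range(2) for y in range(5))
assert abs(Z - E) < 1e-12                  # the exercise: Z(W) equals that expectation
Pe = 0.5 * np.minimum(W[0], W[1]).sum()    # exact ML error probability
assert Pe <= Z                             # Lemma 1
```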

Properties of the Reliability Function Z(·)

Proofs of the following properties are omitted here.

1 Range of Z: 0 ≤ Z(W) ≤ 1. (By Cauchy-Schwarz)

2 Polarization: under Arıkan's transformation,

Z(W+) = (Z(W))²,   Z(W−) ≤ 2 Z(W) − (Z(W))²

Z(W+) + Z(W−) ≤ 2 Z(W). (Reliability is improved by the polarization)
Z(W+) ≤ Z(W) ≤ Z(W−).

3 Relation with I(W): 1 − Z(W) ≤ I(W) ≤ 1 − (Z(W))².

I(W) ≈ 1 ⇐⇒ Z(W) ≈ 0
I(W) ≈ 0 ⇐⇒ Z(W) ≈ 1

Hence, one can expect that channel polarization (Theorem 2) still holds if we change the measure of "goodness" from capacity to the reliability function.


Upper Bounding the Probability of Error of Polar Coding (2)

Combining (2), (3), and Lemma 1, we arrive at a nice upper bound on P^(N)_e(K/N, F):

P^(N)_e(K/N, F) ≤ Σ_{i∈A} Z(W^(i)_N).   (6)

Implications of Upper Bound (6):

1 How to choose the frozen set F? If we would like to minimize (6), we should choose A and F such that Z(W^(i)_N) ≤ Z(W^(j)_N) for all i ∈ A and j ∈ F. In other words, we use Z(·) to evaluate the quality of the synthetic polarized channels.

2 Suppose we can compute the asymptotic limit of the proportion of synthetic polarized channels whose Z(·) is smaller than some δ_N = o(N^{−1}). If R is less than this limit, then for sufficiently large N we can further upper bound (6) by N R · δ_N, which vanishes as N → ∞.

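For the BEC, Z(BEC(ε)) = ε and the recursions in property 2 hold with equality, so implication 1 can be carried out exactly and bound (6) evaluated in a few lines (a sketch; the parameter choices are illustrative):

```python
import numpy as np

def bec_Z(eps, l):
    """Exact Z(W^(i)_N) for W = BEC(eps), N = 2^l (both recursions are equalities)."""
    z = np.array([eps])
    for _ in range(l):
        zz = np.empty(2 * len(z))
        zz[0::2], zz[1::2] = 2*z - z**2, z**2   # minus-children, plus-children
        z = zz
    return z

eps, l, R = 0.4, 10, 0.5                  # I(W) = 0.6 > R
z = bec_Z(eps, l)
N = 2**l; K = int(R * N)
A = np.argsort(z)[:K]                     # active set: the K most reliable channels
print(z[A].sum())                         # union bound (6) on the block error probability
```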

Speed of Channel Polarization

In other words, we would like to have a theorem which gives us the following result:

lim_{N→∞} (1/N) |{i : Z(W^(i)_N) < δ_N}| = I(W).

This is a stronger version of channel polarization than Theorem 2.

To see this, note that we can easily replace I(W^(i)_N) by 1 − Z(W^(i)_N) in Theorem 2, and the results remain to hold for constants a and b, where a, b = Θ(1), invariant with N.

However, the desired theorem requires replacing a and b by δ_N and 1 − δ_N respectively, where δ_N = o(N^{−1}). The proof of Theorem 2 presented before cannot be extended to this case.


Nevertheless, Arıkan and Telatar proved an even stronger result, where δ_N = 2^{−N^β}, β ∈ (0, 1/2). Below we present this result without proof.

Theorem 3 (Rate of Channel Polarization [Arıkan-Telatar ISIT09])

Direct Part: For β ∈ (0, 1/2),

lim_{N→∞} (1/N) |{i : Z(W^(i)_N) < 2^{−N^β}}| = I(W)   (7)

lim_{N→∞} (1/N) |{i : Z(W^(i)_N) > 1 − 2^{−N^β}}| = 1 − I(W)   (8)

Converse Part: For β > 1/2, if I(W) < 1,

lim_{N→∞} (1/N) |{i : Z(W^(i)_N) < 2^{−N^β}}| = 0.


Coding Theorem for Polar Coding

Theorem 4 (Polar Coding Achieves the Capacity of BMSCs)

Suppose the frozen set F is chosen such that Z(W^(i)_N) ≤ Z(W^(j)_N) for all i ∈ A and j ∈ F. Then,

lim_{N→∞} P^(N)_e(R, F) · 2^{N^β} = 0   (9)

for any rate R < I(W) and β ∈ (0, 1/2). In other words, P^(N)_e(R, F) = o(2^{−N^β}).

Note: (9) guarantees that the probability of error vanishes as N → ∞ for some choice of frozen bits u_F, as long as R < I(W), the channel capacity. Hence, it shows that polar codes can achieve the capacity of the channel W.

Remark: In fact, for symmetric channels, it can be shown that (9) remains true even if we replace P^(N)_e(R, F) by P^(N)_e(R, F, u_F) for any u_F ∈ {0,1}^{N(1−R)}. This will be explored in HW4.


pf: Fix some β′ ∈ (β, 1/2). Since R < I(W), by Theorem 3, for N sufficiently large,

|{i : Z(W^(i)_N) < 2^{−N^{β′}}}| > N R.

Since we pick A and F such that |A| = N R and all synthetic polarized channels with indices in A have smaller Z(·) than those in F, we have

Z(W^(i)_N) < 2^{−N^{β′}},  ∀ i ∈ A.

Hence, by the upper bound (6), we conclude that

P^(N)_e(R, F) < N R · 2^{−N^{β′}}  =⇒  P^(N)_e(R, F) · 2^{N^β} < N R · 2^{−(N^{β′} − N^β)}.

The proof is complete by observing lim_{N→∞} N R · 2^{−(N^{β′} − N^β)} = 0.
