Eurecom Institute
A Unified Framework for Searching the Tree: A Classification of Sequential Decoding Algorithms
Mohamed Oussama Damen
ECE Department, University of Waterloo, ON, Canada
E-mail: [email protected]
Joint work with A. D. Murugan, H. El Gamal (OSU) and G. Caire (Eurecom)
Sophia Antipolis, June 28, 2005
Outline of the Talk
• The closest lattice point search (CLPS) and joint maximum likelihood (ML) detection and decoding.
• Searching the tree: Preprocessing stage.
– Taming the channel: Left preprocessing.
– Inducing sparsity: Right preprocessing.
– Forming the tree.
• Restricting the paths explored in the tree: Branch and Bound (BB), or testing the cost functions of child nodes against a bounding function.
Outline of the Talk (Cont’d)
• Generic Branch and Bound (GBB) Algorithm.
– Breadth First Search.
– Depth First Search.
– Best First Search.
– Iterative Best First Search.
• The Chosen Algorithm.
• Analytical and Numerical Results.
• Conclusions.
Lattice Codes over Linear Gaussian Channels
• Let Λ = {λ = Gx : x ∈ Z^m} ⊂ R^m be a lattice, where G ∈ R^{m×m} is the lattice generator matrix. Let v ∈ R^m be a vector and R ⊂ R^m be a measurable region.

A lattice code C(Λ, v, R) is the set of points of Λ + v inside the shaping region R, i.e., C(Λ, v, R) = {Λ + v} ∩ R.

Equivalently, C(Λ, v, R) is the set of points c + v, c = Gx with x ∈ U, where U ⊂ Z^m is the code information set.
• We consider the following (baseband) communication model
r = H(c + v) + z (1)
where r ∈ R^n is the received signal, z ∼ N(0, I) is the AWGN vector, and H ∈ R^{n×m} is the linear channel matrix.
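A minimal numerical sketch of this model, with hypothetical dimensions, random G and H, and a toy information set U = {0, 1}^4:

```python
# Sketch of the model r = H(c + v) + z in Eq. (1); sizes and U are toy choices.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 4
G = rng.standard_normal((m, m))        # lattice generator matrix
v = 0.5 * np.ones(m)                   # translation vector
U = [np.array(u) for u in np.ndindex(2, 2, 2, 2)]  # toy information set in Z^m

x = U[rng.integers(len(U))]            # information vector, uniform over U
c = G @ x                              # lattice point c = Gx
H = rng.standard_normal((n, m))        # linear channel matrix
z = rng.standard_normal(n)             # AWGN, z ~ N(0, I)
r = H @ (c + v) + z                    # received signal, as in (1)
```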
Maximum Likelihood (ML) Decoding and CLPS
• Problem: First, an information vector x is uniformly generated over U. Then the signal c + v, with c = Gx, is transmitted over the channel (1) with matrix H. Assuming H, G and v are known to the receiver, the ML decoding rule is given by

x̂ = arg min_{x ∈ U ⊂ Z^m} |r − Hv − HGx|^2    (2)

• Solution: The optimization problem (2) can be viewed as a constrained version of the CLPS with lattice generator matrix HG and constraint set U.
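For tiny problems, rule (2) can be evaluated by brute force; the sketch below is the exhaustive baseline that the tree-search machinery of the following slides replaces:

```python
# Brute-force ML detection per Eq. (2); cost is O(|U|), so this only scales
# to very small information sets.
import numpy as np

def ml_decode(r, H, G, v, U):
    """Return argmin over x in U of |r - Hv - HGx|^2."""
    y = r - H @ v                      # strip the translation vector
    A = H @ G                          # effective lattice generator
    costs = [np.sum((y - A @ x) ** 2) for x in U]
    return U[int(np.argmin(costs))]
```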
ML Decoding and CLPS: Examples (I)
• Lattice codes over MIMO flat fading channels: The complex baseband received signal is given by

r_t^c = √(ρ/M) H^c c_t^c + z_t^c,   t = 1, . . . , T

which can be put in the form (1) by collecting the signals over T symbol periods ⇒ c^c = [c_1^T, . . . , c_T^T]^T, and then separating the real and imaginary parts:

c ← [Re{c^c}^T, Im{c^c}^T]^T,   H ← I_T ⊗ [ Re{H^c}  −Im{H^c}
                                            Im{H^c}   Re{H^c} ]
with n = 2NT and m = 2MT .
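A sketch of this complex-to-real conversion. One assumption made here: the real and imaginary parts are stacked per symbol period, so that I_T ⊗ [·] stays block diagonal; the slide's c = [Re{c^c}^T, Im{c^c}^T]^T is the same data up to a fixed permutation.

```python
# Real-valued representation of the complex MIMO model over T symbol periods.
import numpy as np

def real_model(Hc, c_blocks):
    """Hc: N x M complex channel; c_blocks: list of T complex M-vectors."""
    Hr = np.block([[Hc.real, -Hc.imag],
                   [Hc.imag,  Hc.real]])      # 2N x 2M real representation
    T = len(c_blocks)
    H = np.kron(np.eye(T), Hr)                # I_T Kronecker the block above
    # stack [Re c_t ; Im c_t] per symbol period so H stays block diagonal
    c = np.concatenate([np.concatenate([b.real, b.imag]) for b in c_blocks])
    return H, c                               # n = 2NT rows, m = 2MT columns
```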
ML Decoding and CLPS: Examples (II)
LAST codes over MIMO channels:
1. Typically, the shaping region R of C(Λ, v, R) can be an m-dimensional sphere, the Voronoi region of a sublattice Λ′ ⊂ Λ, or U = Z_Q^m (e.g., linear dispersion codes, or V-BLAST when T = 1 and G = κI with κ a power normalizing factor).

2. Lattice codes from Construction A: Include all algebraic STCs (stacking construction, Hammons-El Gamal, Lu-Kumar, ...). Λ = C + QZ^m, where C ⊆ Z_Q^m is a linear code over Z_Q with generator matrix in systematic form [I, P^T]^T. A generator matrix of Λ is given by

G = [ I   0
      P   QI ].
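A small sketch of this generator matrix, with the parity matrix P and alphabet size Q as assumed inputs:

```python
# Construction A generator G = [[I, 0], [P, Q*I]] for Lambda = C + Q Z^m,
# where the code C over Z_Q has systematic generator [I, P^T]^T.
import numpy as np

def construction_a_generator(P, Q):
    p, k = P.shape                     # p = m - k parity rows
    return np.block([[np.eye(k), np.zeros((k, p))],
                     [P,         Q * np.eye(p)]])
```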
ML Decoding and CLPS: Examples (III)
• Coset codes over ISI channels: For simplicity, we consider a baseband real SISO-ISI channel with input/output related by r_i = Σ_{ℓ=0}^{L} h_ℓ c_{i−ℓ} + z_i, with (h_0, . . . , h_L) as the channel. With L-zero padding, the ISI received signal fits (1) with the banded Toeplitz channel matrix

H = ⎡ h_0                  ⎤
    ⎢ h_1  h_0             ⎥
    ⎢  ⋮   h_1   ⋱         ⎥
    ⎢ h_L   ⋮    ⋱   h_0   ⎥
    ⎢      h_L   ⋱   h_1   ⎥
    ⎢            ⋱    ⋮    ⎥
    ⎣                h_L   ⎦ .
• The lattice generator matrix of the coset code is obtained via Construction A.
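A sketch of the banded Toeplitz matrix above, assuming a block of K input symbols with L-zero padding so that H is (K + L) × K:

```python
# Convolution (Toeplitz) channel matrix for the zero-padded ISI model.
import numpy as np
from scipy.linalg import toeplitz

def isi_channel_matrix(h, K):
    """h = (h_0, ..., h_L); returns the (K + L) x K matrix with H @ c = h * c."""
    h = np.asarray(h, dtype=float)
    col = np.concatenate([h, np.zeros(K - 1)])       # first column
    row = np.concatenate([h[:1], np.zeros(K - 1)])   # first row: (h_0, 0, ..., 0)
    return toeplitz(col, row)
```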
ML Decoding and CLPS: Examples (IV)
• Rotated QAM constellations over a SISO fast fading channel:

r = diag(h_1, . . . , h_N) Gx + z

with GG^T = I and U = Z_Q^m.
• Linearly precoded OFDM:
r^c = diag(H_1, . . . , H_N) G^c x^c + z^c.
• CDMA: r = S diag(w_1, . . . , w_K) x + z, with U = Z_Q^m.
Sphere/Sequential Decoding Basic Ideas
CLPS: Preprocessing and Search Stages (I)
• The system we have at hand
r ← r − Hv = HGx + z.
• Joint detection (correlation induced by H) and decoding (correlation induced by G) ⇒ Form a combined search tree: HG = QR, with Q unitary and R upper triangular.
• Conventional sphere/sequential decoders can be seen as ZF-DFEs with some reprocessing capability of their tentative decisions, to obtain

x̂ = arg min_{x ∈ U} |Q^T r − Rx|^2
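A quick numerical check of this step for a square, full-rank HG, where Q is truly unitary and the two metrics coincide exactly:

```python
# |r - HGx|^2 == |Q^T r - Rx|^2 when Q is square unitary (n = m, full rank).
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))        # plays the role of HG
Q, R = np.linalg.qr(A)                 # Q unitary, R upper triangular
r = rng.standard_normal(m)
x = rng.integers(-2, 3, size=m)
assert np.isclose(np.sum((r - A @ x) ** 2), np.sum((Q.T @ r - R @ x) ** 2))
```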
[Figure: sequential decoding of Q^T r ≈ Rx: the current variable, decision feedback from previous decisions, and determination of the current search interval.]
CLPS: Preprocessing and Search Stages (II)
• The above conventional application of sphere/sequential decoding suffers from two inconveniences:

1. When rank(HG) < m or HG is ill-conditioned ⇒ the spread of the diagonal elements of R is large and the search can be very complex.

2. Enforcing x ∈ U is very difficult when U has a complicated shape ⇒ (naive) lattice decoding can solve this problem by searching over Z^m (instead of U), but it is far from ML in general.

• Solution: Preprocessing!

• In addition, preprocessing H and G can have a great effect on the complexity of the search stage, making the tree more “friendly” (improving the quality of the ZF-DFE).
The Preprocessing Stage
• Left preprocessing (→ × H): Modifies H and z such that the resulting CLPS is no longer equivalent to ML, but has a much better conditioned “channel” matrix, which makes lattice decoding near-optimal.

• Right preprocessing (G × ←): When the boundary region is removed, we have the freedom of choosing the lattice basis most convenient for the search algorithm.

Left preprocessing is applied only to the channel matrix; right preprocessing is applied to the whole matrix HG. Important: preprocessing must not destroy the code structure.
Taming the Channel: Left Preprocessing
• Uncoded system (G = I): Forming the tree H = QR can be seen as multiplying by the ZF-DFE feedforward matrix Q^T and then searching the tree obtained via the ZF-DFE backward matrix R.

• MMSE-DFE outperforms ZF-DFE in terms of SINR: H̃ := [H ; I] = Q̃R_1.

• Let Q_1 be the upper n × m part of Q̃ ⇒ the transformed CLPS

min_{x ∈ U} |Q_1^T r − R_1 Gx|^2    (3)

is not equivalent to (2) (because Q_1 is not unitary).

• The additive noise w = y′ − R_1 Gx has a Gaussian component Q_1^T z and a non-Gaussian (signal-dependent) component (Q_1^T H − R_1)(c + v). But it is white!
Inducing Sparsity: Right Preprocessing (I)
• Motivation: To obtain the tree ⇒ QR-decompose R_1 G = QR. The sparser R, the smaller the search complexity (e.g., R = I).
• Problem: Find a unimodular matrix T such that the QR decomposition of R_1 G T^{−1} minimizes the sparsity index of R:

S(R) := max_{i ∈ {1, . . . , m}}  ( Σ_{j=i+1}^{m} r_{i,j}^2 ) / r_{i,i}^2 .
• Exact solution is very difficult to obtain, but there are very good approximations.
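The sparsity index itself is cheap to evaluate; a direct sketch:

```python
# S(R): worst-case ratio of off-diagonal to diagonal energy over the rows.
import numpy as np

def sparsity_index(R):
    m = R.shape[0]
    return max(np.sum(R[i, i + 1:] ** 2) / R[i, i] ** 2 for i in range(m))
```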
Inducing Sparsity: Right Preprocessing (II)
• Lattice reduction: LLL algorithm (with deep insertion). Find a new lattice basis with reduced vectors, H G T_1^{−1}.

• Column permutation of G T_1^{−1}: the V-BLAST greedy ordering finds a permutation matrix Σ that maximizes the minimum diagonal element of R, min r_{i,i} (i.e., minimizes S(R)).

• Right-multiply by T^{−1} = T_1^{−1} Σ^{−1}.

• Right multiplication by unimodular matrices does not alter lattice decoding, since TZ^m = Z^m.
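As one concrete illustration of the ordering idea, here is a sketch of sorted QR decomposition (SQRD), a common greedy heuristic that picks, at each Gram-Schmidt step, the remaining column with the smallest residual norm; it approximates, but is not identical to, the V-BLAST ordering cited above.

```python
# Sorted QR decomposition (SQRD): greedy column pivoting during modified
# Gram-Schmidt; tends to raise the minimum diagonal of R.
import numpy as np

def sorted_qr(A):
    n, m = A.shape
    Q, R, perm = A.astype(float), np.zeros((m, m)), np.arange(m)
    for i in range(m):
        k = i + int(np.argmin(np.sum(Q[:, i:] ** 2, axis=0)))  # pivot column
        Q[:, [i, k]], R[:, [i, k]] = Q[:, [k, i]], R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] /= R[i, i]
        R[i, i + 1:] = Q[:, i] @ Q[:, i + 1:]
        Q[:, i + 1:] -= np.outer(Q[:, i], R[i, i + 1:])
    return Q, R, perm                  # on return, A[:, perm] = Q @ R
```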
[Figure: 2-D example. (a) The translated Z^2 lattice and QAM constellation; (b) the received lattice (constellation) after channel distortion; (c) after MMSE-DFE left preprocessing; (d) boundary control after right preprocessing.]
Preprocessing: Form the Tree (Finally!)
• Given H, G: QR-decompose Q_1^T H G T^{−1} after left and right preprocessing.
• The preprocessed system in (1) becomes (with convenient notation):

⎡ y_m ⎤   ⎡ r_{m,m}    · · ·      · · ·    r_{m,1}   ⎤ ⎡ x_m ⎤   ⎡ w_m ⎤
⎢  ⋮  ⎥ = ⎢   0     r_{m−1,m−1}  · · ·    r_{m−1,1} ⎥ ⎢  ⋮  ⎥ + ⎢  ⋮  ⎥    (4)
⎣ y_1 ⎦   ⎢   ⋮         ⋱          ⋱         ⋮      ⎥ ⎣ x_1 ⎦   ⎣ w_1 ⎦
          ⎣   0       · · ·        0      r_{1,1}   ⎦
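The ZF-DFE (Babai point) solution of (4) is back-substitution with rounding. A sketch in the standard orientation, where row i decides x_i, so decisions start from the single-unknown equation, matching the bottom row of (4):

```python
# Babai point / ZF-DFE: round-and-feed-back on an upper-triangular system.
import numpy as np

def babai_point(R, y):
    """y = Rx + w with R upper triangular; returns the rounded DFE estimate."""
    m = R.shape[0]
    x = np.zeros(m, dtype=int)
    for i in range(m - 1, -1, -1):     # last row first (one unknown)
        x[i] = round((y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i])
    return x
```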
[Figure: a search tree with the Root on top and Levels 1–4 below; highlighted nodes are points with cost function within bounds.]
The Tree Search Stage: Branch and Bound (BB)
• Exhaustive tree search can be prohibitive, if not impossible (e.g., lattice decoding).

• BB reduces the complexity of the tree search by determining whether an intermediate node x_1^k, on extending, has any chance of yielding the desired leaf node.

• This decision is taken by comparing the cost function assigned to the node by the search algorithm, f(x_1^k), against a bounding function, t_k.

• BB maintains a list L of ACTIVE nodes to be extended, and ends when L is empty.

• Different BB algorithms differ in their cost function, bounding function, and the rules to generate and sort nodes.
The Search Stage: Generic Branch and Bound (GBB)
• A unified framework for sphere/sequential decoding that encompasses many of the existing ones as special cases.

• A classification of the various algorithms depending on the different functions used to generate, sort and compare the nodes.

• Illuminates the structure of sequential decoding, allowing the system designer to choose the algorithm most adapted to the problem.
GBB(f, t, sort, gen, g1, g2): the generic algorithm, reconstructed from the flowchart:

1. Create the ACTIVE list: L ← {root}, n_c ← 1.
2. If L is empty, exit and return x̂.
3. x_1^k ← top{L}. If x_1^k is not valid (it fails the bound test against t), remove x_1^k from L and go to 2.
4. If k = m: x̂ ← arg min(d(y, x), d(y, x̂)); t ← g1(t, f(x_1^m)); remove x_1^m from L; go to 2.
5. Otherwise: L ← L ∪ gen(x_1^k); update f(x_1^k), f(x_1^{k+1}); n_c ← n_c + 1; t ← g2(t, n_c, L); if all children of x_1^k have been generated, remove x_1^k from L; sort(L); go to 2.
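A Python skeleton of this flowchart. The node interface (is_leaf, cost) and the scalar bounding function are simplifying assumptions made here for readability (the slides allow a vector t); the six plug-ins are passed as callables:

```python
# Generic Branch and Bound skeleton following the GBB flowchart above.
import math

def gbb(root, is_leaf, f, t0, sort, gen, g1, g2, cost):
    """gen(node): next not-yet-generated child, or None when exhausted
       (gen is also assumed to update f's internal state);
    f(node): cost compared against the bounding function t;
    g1/g2: bound updates at leaves / after each node generation;
    sort(L): in-place ordering rule; cost(leaf): final metric d(y, x)."""
    L, t, nc = [root], t0, 1           # create the ACTIVE list
    best, best_cost = None, math.inf
    while L:                           # exit when L is empty
        node = L[0]                    # top of the ACTIVE list
        if f(node) > t:                # invalid node: prune it
            L.pop(0)
            continue
        if is_leaf(node):              # k = m
            if cost(node) < best_cost:
                best, best_cost = node, cost(node)
            t = g1(t, f(node))
            L.pop(0)
            continue
        child = gen(node)
        if child is None:              # all children generated
            L.pop(0)
            continue
        L.append(child)
        nc += 1
        t = g2(t, nc, L)
        sort(L)                        # the ordering rule defines the search
    return best
```

The choice of sort is what turns this skeleton into the BrFS, DFS and BeFS variants classified next.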
GBB Classification: Breadth First Search (BrFS)
• The bounding function is fixed, g1(t, f(x_1^m)) = t, and the cost function f(x_1^k) is never updated.

• Optimal Pohst enumeration: t_k = C_0, f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i) ≤ C_0.

• Heuristic statistical pruning (increasing radii t_k < t_{k+1}) or elliptical pruning (f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i)/e_k). In general, when g2 is not used, BrFS ≡ the Wozencraft decoder.

• M- and T-algorithms, where g2(·, ·, ·) restricts the search space. Level k is above level k + 1, and within the same level nodes are sorted in increasing order of f(·).

• Advantages: Suitable for soft outputs. Robust against variations in SNR and channel conditions.

• Disadvantages: Large average complexity. Does not reduce complexity at high SNR and/or under better channel conditions.
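A minimal sketch of the M-algorithm on the triangular system (4), in the standard orientation; the symbol alphabet (e.g., the PAM levels of each real dimension) is an assumed input:

```python
# Breadth-first M-algorithm: expand level by level, keep the M best paths.
import numpy as np

def m_algorithm(R, y, alphabet, M=8):
    m = len(y)
    paths = [((), 0.0)]                       # (partial path, accumulated cost)
    for i in range(m - 1, -1, -1):            # one tree level per row of R
        new = []
        for part, cost in paths:
            xv = np.array(part, dtype=float)  # holds x_{i+1}, ..., x_{m-1}
            back = R[i, i + 1:] @ xv if len(xv) else 0.0   # decision feedback
            for s in alphabet:
                w = (y[i] - R[i, i] * s - back) ** 2       # branch metric
                new.append(((s,) + part, cost + w))
        paths = sorted(new, key=lambda p: p[1])[:M]        # g2: keep M best
    return np.array(paths[0][0])              # best surviving leaf
```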
GBB Classification: Depth First Search (DFS)
• Order nodes in L in reverse order of generation, and g1(t, f(x_1^m)) = [min(t_1, f(x_1^m)), . . . , min(t_m, f(x_1^m))]^T. Different DFS variants arise depending on gen and g2.

• Modified Viterbo-Boutros algorithm: g2(t, n_c) = t, and f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i). For any node x_1^k and its interval [a_0, a_1] of children (i.e., child nodes satisfying Σ_{i=1}^{k+1} w_i(x_1^i) ≤ C_0), gen generates the child nodes lexicographically.

• Schnorr-Euchner: gen generates child nodes in increasing order of accumulated squared distance.
[Figure: the children interval [a_0, a_1] around the unquantized decision-feedback point; Schnorr-Euchner enumerates children in the zig-zag order 1, 2, 3, 4, 5, 6, 7.]
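The zig-zag child ordering is easy to state in code; a sketch (ties broken toward the rounded value):

```python
def se_order(center, num):
    """First `num` integers nearest to the real `center`, nearest first."""
    first = round(center)
    s = 1 if center >= first else -1          # side of the first step
    out = [first]
    for k in range(1, num):
        out += [first + s * k, first - s * k] # alternate around the center
    return out[:num]
```

For example, se_order(2.3, 5) returns [2, 3, 1, 4, 0], in increasing distance from 2.3.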
GBB Classification: Best First Search (BeFS)
• Nodes in L are sorted in increasing order of their cost functions, and g1(t, f(x_1^m)) = [min(t_1, f(x_1^m)), . . . , min(t_m, f(x_1^m))]^T.

• In BeFS, the search can be terminated once a leaf node reaches the top of L.

• The stack algorithm is BeFS: First, let g2(t, n_c) = t. Second, if x_1^k is a leaf node, then f(x_1^k) = −∞; else let x_{1,g}^{k+1} be the best child of x_1^k not generated yet, and set f(x_1^k) = Σ_{i=1}^{k+1} w_i(x_{1,g}^{k+1}) − b(k + 1), with b ≥ 0 the bias.

• Theorem 1: The stack algorithm with b = 0 generates the least number of nodes among all optimal tree search algorithms (Xu et al., Globecom 2004).

• Theorem 2: The increasing-radii BrFS algorithm with {t : t_k = bk + δ} generates at least as many nodes as the stack algorithm with the same bias b.
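A sketch of the stack decoder on the triangular system (4), with the sorted ACTIVE list kept as a heap and the biased metric above. Two simplifications are assumed: a fixed number of children is generated per node (rather than the lazy best-child rule), and the search runs over Z, i.e., lattice decoding without boundary control:

```python
# Best-first (stack) search with bias b; b = 0 gives the optimal CLPS,
# large b collapses toward the Babai point sketched earlier.
import heapq
import numpy as np

def stack_decode(R, y, b=0.0, children_per_node=3):
    def se_order(center, num):                # zig-zag rule, as sketched above
        first = round(center)
        s = 1 if center >= first else -1
        out = [first]
        for k in range(1, num):
            out += [first + s * k, first - s * k]
        return out[:num]

    m = len(y)
    heap = [(0.0, 0.0, ())]                   # (biased cost, cost, partial path)
    while heap:
        fval, cost, path = heapq.heappop(heap)
        if len(path) == m:                    # leaf on top of the list: stop
            return np.array(path)
        i = m - 1 - len(path)                 # current row of R
        xv = np.array(path, dtype=float)
        back = R[i, i + 1:] @ xv if len(xv) else 0.0
        center = (y[i] - back) / R[i, i]      # unquantized DFE point
        for s in se_order(center, children_per_node):
            new_cost = cost + (y[i] - R[i, i] * s - back) ** 2
            heapq.heappush(
                heap, (new_cost - b * (len(path) + 1), new_cost, (s,) + path))
    return None
```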
GBB Classification: BeFS (Cont’d)
• In the stack algorithm, two cost functions are updated for each node generated.

• The stack algorithm offers a natural solution to the problem of choosing the initial radius (or radii): t_k = ∞.

• The stack allows a systematic approach to trading off performance for complexity: b = 0 ⇒ optimal CLPS; b = ∞ ⇒ MMSE-Babai point decoder.

• In general, for systems with small dimension m and/or high SNRs/“friendly” channels, one can obtain near-optimal performance with a relatively large value of b (i.e., reduced complexity).

• Disadvantage: The memory required to maintain the active list L can be prohibitive.
GBB Classification: Iterative Best First Search
• Modified, memory-efficient BeFS algorithms.

• Store only one node at a time, and allow nodes to be visited more than once.

• The search progresses in contours of increasing bounding functions, allowing more and more nodes to be generated at each step, and finally terminates once a leaf node is obtained.

• The Fano decoder is the iterative BeFS variant of the stack algorithm: the same set of nodes is visited, but the Fano decoder requires no memory and can visit nodes more than once.
The Chosen GBB Algorithm
Left preprocessing (MMSE-DFE) and right preprocessing (combined lattice reduction and greedy ordering), followed by the Fano (or stack) search stage for lattice, not ML, decoding.
Some Analytical and Numerical Results
• An analytical characterization of the performance-complexity tradeoff for sequential and sphere decoders with arbitrary HG and U still appears intractable.

• Consider a V-BLAST configuration over flat Rayleigh fading with ZF-DFE preprocessing (only for the analytical results).

• Theorem 3: The stack algorithm and the Fano decoder with any finite bias b achieve the same diversity as the ML decoder when applied to a V-BLAST configuration.

• Theorem 4: In a V-BLAST system with Q^2-QAM, the average complexity per dimension of the stack algorithm, for a sufficiently large bias b, is linear in m when the SNR ρ grows linearly with m and r = n − m ≥ 0.
Figure 1: Complexity (average complexity per dimension) and performance (frame error rate) vs. SNR of SE enumeration and the Fano decoder (b = 1), each with ZF-DFE and MMSE-DFE based preprocessing, for a 20 × 20 16-QAM V-BLAST system.
Figure 2: Complexity (average complexity per dimension) and performance (frame error rate) vs. SNR of the Fano decoder with ZF-DFE (b = 1 and b = 1.5) and MMSE-DFE (b = 1.5) based preprocessing, for a 30 × 30 4-QAM V-BLAST system.
Figure 3: Expected complexity per dimension and average error rates (frame and bit) of the Fano decoder as a function of the bias b, for a 20 × 20 4-QAM V-BLAST system with ZF-DFE preprocessing.
Figure 4: MMSE-Babai point decoder: average bit error rate vs. E_b/N_0 of MMSE-DFE preprocessing with DFE for a 4 × 4, 4-QAM V-BLAST system (M = N = 4, R = 8 bits/channel use), compared with ML decoding and the YWWF (or Babai) decoder.
Figure 5: Under-determined systems: block error rate vs. SNR of rate-1 and rate-3 TAST codes (3 Tx, 1 Rx, 6 bits/channel use) under MMSE-DFE lattice decoding and ML decoding, with M = 3 and N = 1.
Figure 6: Construction A, algebraic space-time codes: frame error rate vs. SNR of MMSE-DFE lattice decoding and ML decoding, for an algebraic STC based on the Golay (12, 24) code (M = 2, T = 12, N = 1, R = 1 bpcu) and for a Hammons-El Gamal stacking-construction STC (M = T = 3, N = 1, R = 1 bpcu).
Figure 7: Construction A, ISI channels: frame and bit error rates vs. E_b/N_0 of the PSP (4-state) and Fano (4-state and 1024-state) algorithms for convolutional codes over the ISI channel with response (0.848, −0.424, 0.2545, −0.1696, 0.0848).
Conclusions
• A unified framework for tree search decoding in wireless communication applications.

• Identified the roles of two different, but inter-related, components of the decoder, namely 1) preprocessing and 2) tree search.

• MMSE-DFE filtering (left preprocessing) and lattice reduction with greedy column ordering (right preprocessing) allow for near-optimal lattice decoding.

• By relaxing the boundary control, we build a generic framework for designing tree search strategies for joint detection and decoding.

• BeFS algorithms are the most efficient tree search algorithms, and iterative BeFS (the Fano decoder) trades off complexity for memory. Through analytical and numerical results, we have shown that the proposed framework solves many communication problems with reduced complexity and near-optimal performance.
Thank You