Eurecom Institute
A Unified Framework for Searching the Tree: A Classification of Sequential Decoding Algorithms
Mohamed Oussama Damen
ECE Department, University of Waterloo, ON, Canada
E-mail: [email protected]
Joint work with A. D. Murugan, H. El Gamal (OSU) and G. Caire (Eurecom)
Sophia Antipolis, June 28, 2005
Outline of the Talk
• The closest lattice point search (CLPS) and joint maximum likelihood (ML) detection and decoding.
• Searching the tree: Preprocessing stage.
– Taming the channel: Left preprocessing.
– Inducing sparsity: Right preprocessing.
– Forming the tree.
• Restricting the paths explored in the tree: Branch and Bound (BB), or testing the cost functions of child nodes against a bounding function.
Outline of the Talk (Cont’d)
• Generic Branch and Bound (GBB) Algorithm.
– Breadth First Search.
– Depth First Search.
– Best First Search.
– Iterative Best First Search.
• The Chosen Algorithm.
• Analytical and Numerical Results.
• Conclusions.
Lattice Codes over Linear Gaussian Channels
• Let Λ = {λ = Gx : x ∈ Z^m} ⊂ R^m be a lattice, where G ∈ R^{m×m} is the lattice generator matrix. Let v ∈ R^m be a vector and R ⊂ R^m be a measurable region.

A lattice code C(Λ, v, R) is the set of points of Λ + v inside the shaping region R, i.e., C(Λ, v, R) = {Λ + v} ∩ R.

Equivalently, C(Λ, v, R) is the set of points c + v, c = Gx with x ∈ U, where U ⊂ Z^m is the code information set.
• We consider the following (baseband) communication model
r = H(c + v) + z (1)
where r ∈ R^n is the received signal, z ∼ N(0, I) is the AWGN vector, and H ∈ R^{n×m} is the linear channel matrix.
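A minimal numerical sketch of this model, with hypothetical dimensions, random G and H, and a toy information set U = {0, 1}^4:

```python
# Sketch of the model r = H(c + v) + z in Eq. (1); sizes and U are toy choices.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 4
G = rng.standard_normal((m, m))        # lattice generator matrix
v = 0.5 * np.ones(m)                   # translation vector
U = [np.array(u) for u in np.ndindex(2, 2, 2, 2)]  # toy information set in Z^m

x = U[rng.integers(len(U))]            # information vector, uniform over U
c = G @ x                              # lattice point c = Gx
H = rng.standard_normal((n, m))        # linear channel matrix
z = rng.standard_normal(n)             # AWGN, z ~ N(0, I)
r = H @ (c + v) + z                    # received signal, as in (1)
```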
Maximum Likelihood (ML) Decoding and CLPS
• Problem: First, an information vector x is uniformly generated over U. Then the signal c + v, with c = Gx, is transmitted over the channel (1) with matrix H. Assuming H, G and v are known to the receiver, the ML decoding rule is given by

x̂ = arg min_{x ∈ U ⊂ Z^m} |r − Hv − HGx|^2    (2)

• Solution: The optimization problem (2) can be viewed as a constrained version of the CLPS with lattice generator matrix HG and constraint set U.
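For tiny problems, rule (2) can be evaluated by brute force; the sketch below is the exhaustive baseline that the tree-search machinery of the following slides replaces:

```python
# Brute-force ML detection per Eq. (2); cost is O(|U|), so this only scales
# to very small information sets.
import numpy as np

def ml_decode(r, H, G, v, U):
    """Return argmin over x in U of |r - Hv - HGx|^2."""
    y = r - H @ v                      # strip the translation vector
    A = H @ G                          # effective lattice generator
    costs = [np.sum((y - A @ x) ** 2) for x in U]
    return U[int(np.argmin(costs))]
```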
ML Decoding and CLPS: Examples (I)
• Lattice codes over MIMO flat fading channels: The complex baseband received signal is given by

r_t^c = √(ρ/M) H^c c_t^c + z_t^c,   t = 1, . . . , T

which can be put in the form (1) by collecting the signals over T symbol periods ⇒ c^c = [c_1^T, . . . , c_T^T]^T, and then separating the real and imaginary parts:

c ← [Re{c^c}^T, Im{c^c}^T]^T,   H ← I_T ⊗ [ Re{H^c}  −Im{H^c}
                                            Im{H^c}   Re{H^c} ]
with n = 2NT and m = 2MT .
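A sketch of this complex-to-real conversion. One assumption made here: the real and imaginary parts are stacked per symbol period, so that I_T ⊗ [·] stays block diagonal; the slide's c = [Re{c^c}^T, Im{c^c}^T]^T is the same data up to a fixed permutation.

```python
# Real-valued representation of the complex MIMO model over T symbol periods.
import numpy as np

def real_model(Hc, c_blocks):
    """Hc: N x M complex channel; c_blocks: list of T complex M-vectors."""
    Hr = np.block([[Hc.real, -Hc.imag],
                   [Hc.imag,  Hc.real]])      # 2N x 2M real representation
    T = len(c_blocks)
    H = np.kron(np.eye(T), Hr)                # I_T Kronecker the block above
    # stack [Re c_t ; Im c_t] per symbol period so H stays block diagonal
    c = np.concatenate([np.concatenate([b.real, b.imag]) for b in c_blocks])
    return H, c                               # n = 2NT rows, m = 2MT columns
```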
ML Decoding and CLPS: Examples (II)
LAST codes over MIMO channels:
1. Typically, the shaping region R of C(Λ, v, R) can be an m-dimensional sphere, the Voronoi region of a sublattice Λ′ ⊂ Λ, or U = Z_Q^m (e.g., linear dispersion codes, or V-BLAST when T = 1 and G = κI with κ a power normalizing factor).

2. Lattice codes from Construction A: Include all algebraic STCs (stacking construction, Hammons-El Gamal, Lu-Kumar, ...). Λ = C + QZ^m, where C ⊆ Z_Q^m is a linear code over Z_Q with generator matrix in systematic form [I, P^T]^T. A generator matrix of Λ is given by

G = [ I   0
      P   QI ].
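A small sketch of this generator matrix, with the parity matrix P and alphabet size Q as assumed inputs:

```python
# Construction A generator G = [[I, 0], [P, Q*I]] for Lambda = C + Q Z^m,
# where the code C over Z_Q has systematic generator [I, P^T]^T.
import numpy as np

def construction_a_generator(P, Q):
    p, k = P.shape                     # p = m - k parity rows
    return np.block([[np.eye(k), np.zeros((k, p))],
                     [P,         Q * np.eye(p)]])
```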
ML Decoding and CLPS: Examples (III)
• Coset codes over ISI channels: For simplicity, we consider a baseband real SISO-ISI channel with input/output related by r_i = Σ_{ℓ=0}^{L} h_ℓ c_{i−ℓ} + z_i, with (h_0, . . . , h_L) as the channel. With L-zero padding, the ISI received signal fits (1) with the banded Toeplitz channel matrix

H = ⎡ h_0                  ⎤
    ⎢ h_1  h_0             ⎥
    ⎢  ⋮   h_1   ⋱         ⎥
    ⎢ h_L   ⋮    ⋱   h_0   ⎥
    ⎢      h_L   ⋱   h_1   ⎥
    ⎢            ⋱    ⋮    ⎥
    ⎣                h_L   ⎦ .
• The lattice generator matrix of the coset code is obtained via Construction A.
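A sketch of the banded Toeplitz matrix above, assuming a block of K input symbols with L-zero padding so that H is (K + L) × K:

```python
# Convolution (Toeplitz) channel matrix for the zero-padded ISI model.
import numpy as np
from scipy.linalg import toeplitz

def isi_channel_matrix(h, K):
    """h = (h_0, ..., h_L); returns the (K + L) x K matrix with H @ c = h * c."""
    h = np.asarray(h, dtype=float)
    col = np.concatenate([h, np.zeros(K - 1)])       # first column
    row = np.concatenate([h[:1], np.zeros(K - 1)])   # first row: (h_0, 0, ..., 0)
    return toeplitz(col, row)
```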
ML Decoding and CLPS: Examples (IV)
• Rotated QAM constellations over a SISO fast fading channel:

r = diag(h_1, . . . , h_N) Gx + z

with GG^T = I and U = Z_Q^m.
• Linearly precoded OFDM:
r^c = diag(H_1, . . . , H_N) G^c x^c + z^c.
• CDMA: r = S diag(w_1, . . . , w_K) x + z, with U = Z_Q^m.
Sphere/Sequential Decoding Basic Ideas
CLPS: Preprocessing and Search Stages (I)
• The system we have at hand
r ← r − Hv = HGx + z.
• Joint detection (correlation induced by H) and decoding (correlation induced by G) ⇒ Form a combined search tree: HG = QR, with Q unitary and R upper triangular.
• Conventional sphere/sequential decoders can be seen as ZF-DFEs with some reprocessing capability of their tentative decisions, to obtain

x̂ = arg min_{x ∈ U} |Q^T r − Rx|^2
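A quick numerical check of this step for a square, full-rank HG, where Q is truly unitary and the two metrics coincide exactly:

```python
# |r - HGx|^2 == |Q^T r - Rx|^2 when Q is square unitary (n = m, full rank).
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))        # plays the role of HG
Q, R = np.linalg.qr(A)                 # Q unitary, R upper triangular
r = rng.standard_normal(m)
x = rng.integers(-2, 3, size=m)
assert np.isclose(np.sum((r - A @ x) ** 2), np.sum((Q.T @ r - R @ x) ** 2))
```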
[Figure: sequential decoding of Q^T r ≈ Rx: the current variable, decision feedback from previous decisions, and determination of the current search interval.]
CLPS: Preprocessing and Search Stages (II)
• The above conventional application of sphere/sequential decoding suffers from two inconveniences:

1. When rank(HG) < m or HG is ill-conditioned ⇒ the spread of the diagonal elements of R is large and the search can be very complex.

2. Enforcing x ∈ U is very difficult when U has a complicated shape ⇒ (naive) lattice decoding can solve this problem by searching over Z^m (instead of U), but it is far from ML in general.

• Solution: Preprocessing!

• In addition, preprocessing H and G can have a great effect on the complexity of the search stage, making the tree more “friendly” (improving the quality of the ZF-DFE).
The Preprocessing Stage
• Left preprocessing (→ × H): Modifies H and z such that the resulting CLPS is no longer equivalent to ML, but has a much better conditioned “channel” matrix, which makes lattice decoding near-optimal.

• Right preprocessing (G × ←): When the boundary region is removed, we have the freedom of choosing the lattice basis most convenient for the search algorithm.

Left preprocessing is applied only to the channel matrix; right preprocessing is applied to the whole matrix HG. Important: preprocessing must not destroy the code structure.
Taming the Channel: Left Preprocessing
• Uncoded system (G = I): Forming the tree H = QR can be seen as multiplying by the ZF-DFE feedforward matrix Q^T and then searching the tree obtained via the ZF-DFE backward matrix R.

• MMSE-DFE outperforms ZF-DFE in terms of SINR: H̃ := [H ; I] = Q̃R_1.

• Let Q_1 be the upper n × m part of Q̃ ⇒ the transformed CLPS

min_{x ∈ U} |Q_1^T r − R_1 Gx|^2    (3)

is not equivalent to (2) (because Q_1 is not unitary).

• The additive noise w = y′ − R_1 Gx has a Gaussian component Q_1^T z and a non-Gaussian (signal-dependent) component (Q_1^T H − R_1)(c + v). But it is white!
Inducing Sparsity: Right Preprocessing (I)
• Motivation: To obtain the tree ⇒ QR-decompose R_1 G = QR. The sparser R, the smaller the search complexity (e.g., R = I).
• Problem: Find a unimodular matrix T such that the QR decomposition of R_1 G T^{−1} minimizes the sparsity index of R:

S(R) := max_{i ∈ {1, . . . , m}}  ( Σ_{j=i+1}^{m} r_{i,j}^2 ) / r_{i,i}^2 .
• Exact solution is very difficult to obtain, but there are very good approximations.
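The sparsity index itself is cheap to evaluate; a direct sketch:

```python
# S(R): worst-case ratio of off-diagonal to diagonal energy over the rows.
import numpy as np

def sparsity_index(R):
    m = R.shape[0]
    return max(np.sum(R[i, i + 1:] ** 2) / R[i, i] ** 2 for i in range(m))
```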
Inducing Sparsity: Right Preprocessing (II)
• Lattice reduction: LLL algorithm (with deep insertion). Find a new lattice basis with reduced vectors, H G T_1^{−1}.

• Column permutation of G T_1^{−1}: the V-BLAST greedy ordering finds a permutation matrix Σ that maximizes the minimum diagonal element of R, min r_{i,i} (i.e., minimizes S(R)).

• Right-multiply by T^{−1} = T_1^{−1} Σ^{−1}.

• Right multiplication by unimodular matrices does not alter lattice decoding, since TZ^m = Z^m.
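As one concrete illustration of the ordering idea, here is a sketch of sorted QR decomposition (SQRD), a common greedy heuristic that picks, at each Gram-Schmidt step, the remaining column with the smallest residual norm; it approximates, but is not identical to, the V-BLAST ordering cited above.

```python
# Sorted QR decomposition (SQRD): greedy column pivoting during modified
# Gram-Schmidt; tends to raise the minimum diagonal of R.
import numpy as np

def sorted_qr(A):
    n, m = A.shape
    Q, R, perm = A.astype(float), np.zeros((m, m)), np.arange(m)
    for i in range(m):
        k = i + int(np.argmin(np.sum(Q[:, i:] ** 2, axis=0)))  # pivot column
        Q[:, [i, k]], R[:, [i, k]] = Q[:, [k, i]], R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] /= R[i, i]
        R[i, i + 1:] = Q[:, i] @ Q[:, i + 1:]
        Q[:, i + 1:] -= np.outer(Q[:, i], R[i, i + 1:])
    return Q, R, perm                  # on return, A[:, perm] = Q @ R
```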
[Figure: 2-D example. (a) The translated Z^2 lattice and QAM constellation; (b) the received lattice (constellation) after channel distortion; (c) after MMSE-DFE left preprocessing; (d) boundary control after right preprocessing.]
Preprocessing: Form the Tree (Finally!)
• Given H, G: QR-decompose Q_1^T H G T^{−1} after left and right preprocessing.
• The preprocessed system in (1) becomes (with convenient notation):

⎡ y_m ⎤   ⎡ r_{m,m}    · · ·      · · ·    r_{m,1}   ⎤ ⎡ x_m ⎤   ⎡ w_m ⎤
⎢  ⋮  ⎥ = ⎢   0     r_{m−1,m−1}  · · ·    r_{m−1,1} ⎥ ⎢  ⋮  ⎥ + ⎢  ⋮  ⎥    (4)
⎣ y_1 ⎦   ⎢   ⋮         ⋱          ⋱         ⋮      ⎥ ⎣ x_1 ⎦   ⎣ w_1 ⎦
          ⎣   0       · · ·        0      r_{1,1}   ⎦
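The ZF-DFE (Babai point) solution of (4) is back-substitution with rounding. A sketch in the standard orientation, where row i decides x_i, so decisions start from the single-unknown equation, matching the bottom row of (4):

```python
# Babai point / ZF-DFE: round-and-feed-back on an upper-triangular system.
import numpy as np

def babai_point(R, y):
    """y = Rx + w with R upper triangular; returns the rounded DFE estimate."""
    m = R.shape[0]
    x = np.zeros(m, dtype=int)
    for i in range(m - 1, -1, -1):     # last row first (one unknown)
        x[i] = round((y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i])
    return x
```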
[Figure: a search tree with the Root on top and Levels 1–4 below; highlighted nodes are points with cost function within bounds.]
The Tree Search Stage: Branch and Bound (BB)
• Exhaustive tree search can be prohibitive, if not impossible (e.g., lattice decoding).

• BB reduces the complexity of the tree search by determining whether an intermediate node x_1^k, on extending, has any chance of yielding the desired leaf node.

• This decision is taken by comparing the cost function assigned to the node by the search algorithm, f(x_1^k), against a bounding function, t_k.

• BB maintains a list L of ACTIVE nodes to be extended, and ends when L is empty.

• Different BB algorithms differ in their cost function, bounding function, and the rules to generate and sort nodes.
The Search Stage: Generic Branch and Bound (GBB)
• A unified framework for sphere/sequential decoding that encompasses many of the existing ones as special cases.

• A classification of the various algorithms depending on the different functions used to generate, sort and compare the nodes.

• Illuminates the structure of sequential decoding, allowing the system designer to choose the algorithm most adapted to the problem.
GBB(f, t, sort, gen, g1, g2): the generic algorithm, reconstructed from the flowchart:

1. Create the ACTIVE list: L ← {root}, n_c ← 1.
2. If L is empty, exit and return x̂.
3. x_1^k ← top{L}. If x_1^k is not valid (it fails the bound test against t), remove x_1^k from L and go to 2.
4. If k = m: x̂ ← arg min(d(y, x), d(y, x̂)); t ← g1(t, f(x_1^m)); remove x_1^m from L; go to 2.
5. Otherwise: L ← L ∪ gen(x_1^k); update f(x_1^k), f(x_1^{k+1}); n_c ← n_c + 1; t ← g2(t, n_c, L); if all children of x_1^k have been generated, remove x_1^k from L; sort(L); go to 2.
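A Python skeleton of this flowchart. The node interface (is_leaf, cost) and the scalar bounding function are simplifying assumptions made here for readability (the slides allow a vector t); the six plug-ins are passed as callables:

```python
# Generic Branch and Bound skeleton following the GBB flowchart above.
import math

def gbb(root, is_leaf, f, t0, sort, gen, g1, g2, cost):
    """gen(node): next not-yet-generated child, or None when exhausted
       (gen is also assumed to update f's internal state);
    f(node): cost compared against the bounding function t;
    g1/g2: bound updates at leaves / after each node generation;
    sort(L): in-place ordering rule; cost(leaf): final metric d(y, x)."""
    L, t, nc = [root], t0, 1           # create the ACTIVE list
    best, best_cost = None, math.inf
    while L:                           # exit when L is empty
        node = L[0]                    # top of the ACTIVE list
        if f(node) > t:                # invalid node: prune it
            L.pop(0)
            continue
        if is_leaf(node):              # k = m
            if cost(node) < best_cost:
                best, best_cost = node, cost(node)
            t = g1(t, f(node))
            L.pop(0)
            continue
        child = gen(node)
        if child is None:              # all children generated
            L.pop(0)
            continue
        L.append(child)
        nc += 1
        t = g2(t, nc, L)
        sort(L)                        # the ordering rule defines the search
    return best
```

The choice of sort is what turns this skeleton into the BrFS, DFS and BeFS variants classified next.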
GBB Classification: Breadth First Search (BrFS)
• The bounding function is fixed, g1(t, f(x_1^m)) = t, and the cost function f(x_1^k) is never updated.

• Optimal Pohst enumeration: t_k = C_0, f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i) ≤ C_0.

• Heuristic statistical pruning (increasing radii t_k < t_{k+1}) or elliptical pruning (f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i)/e_k). In general, when g2 is not used, BrFS ≡ the Wozencraft decoder.

• M- and T-algorithms, where g2(·, ·, ·) restricts the search space. Level k is above level k + 1, and within the same level nodes are sorted in increasing order of f(·).

• Advantages: Suitable for soft outputs. Robust against variations in SNR and channel conditions.

• Disadvantages: Large average complexity. Does not reduce complexity at high SNR and/or under better channel conditions.
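A minimal sketch of the M-algorithm on the triangular system (4), in the standard orientation; the symbol alphabet (e.g., the PAM levels of each real dimension) is an assumed input:

```python
# Breadth-first M-algorithm: expand level by level, keep the M best paths.
import numpy as np

def m_algorithm(R, y, alphabet, M=8):
    m = len(y)
    paths = [((), 0.0)]                       # (partial path, accumulated cost)
    for i in range(m - 1, -1, -1):            # one tree level per row of R
        new = []
        for part, cost in paths:
            xv = np.array(part, dtype=float)  # holds x_{i+1}, ..., x_{m-1}
            back = R[i, i + 1:] @ xv if len(xv) else 0.0   # decision feedback
            for s in alphabet:
                w = (y[i] - R[i, i] * s - back) ** 2       # branch metric
                new.append(((s,) + part, cost + w))
        paths = sorted(new, key=lambda p: p[1])[:M]        # g2: keep M best
    return np.array(paths[0][0])              # best surviving leaf
```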
GBB Classification: Depth First Search (DFS)
• Order nodes in L in reverse order of generation, and g1(t, f(x_1^m)) = [min(t_1, f(x_1^m)), . . . , min(t_m, f(x_1^m))]^T. Different DFS variants arise depending on gen and g2.

• Modified Viterbo-Boutros algorithm: g2(t, n_c) = t, and f(x_1^k) = Σ_{i=1}^{k} w_i(x_1^i). For any node x_1^k and its interval [a_0, a_1] of children (i.e., child nodes satisfying Σ_{i=1}^{k+1} w_i(x_1^i) ≤ C_0), gen generates the child nodes lexicographically.

• Schnorr-Euchner: gen generates child nodes in increasing order of accumulated squared distance.
[Figure: the children interval [a_0, a_1] around the unquantized decision-feedback point; Schnorr-Euchner enumerates children in the zig-zag order 1, 2, 3, 4, 5, 6, 7.]
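The zig-zag child ordering is easy to state in code; a sketch (ties broken toward the rounded value):

```python
def se_order(center, num):
    """First `num` integers nearest to the real `center`, nearest first."""
    first = round(center)
    s = 1 if center >= first else -1          # side of the first step
    out = [first]
    for k in range(1, num):
        out += [first + s * k, first - s * k] # alternate around the center
    return out[:num]
```

For example, se_order(2.3, 5) returns [2, 3, 1, 4, 0], in increasing distance from 2.3.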
GBB Classification: Best First Search (BeFS)
• Nodes in L are sorted in increasing order of their cost functions, and g1(t, f(x_1^m)) = [min(t_1, f(x_1^m)), . . . , min(t_m, f(x_1^m))]^T.

• In BeFS, the search can be terminated once a leaf node reaches the top of L.

• The stack algorithm is BeFS: First, let g2(t, n_c) = t. Second, if x_1^k is a leaf node, then f(x_1^k) = −∞; else let x_{1,g}^{k+1} be the best child of x_1^k not generated yet, and set f(x_1^k) = Σ_{i=1}^{k+1} w_i(x_{1,g}^{k+1}) − b(k + 1), with b ≥ 0 the bias.

• Theorem 1: The stack algorithm with b = 0 generates the least number of nodes among all optimal tree search algorithms (Xu et al., Globecom 2004).

• Theorem 2: The increasing-radii BrFS algorithm with {t : t_k = bk + δ} generates at least as many nodes as the stack algorithm with the same bias b.
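A sketch of the stack decoder on the triangular system (4), with the sorted ACTIVE list kept as a heap and the biased metric above. Two simplifications are assumed: a fixed number of children is generated per node (rather than the lazy best-child rule), and the search runs over Z, i.e., lattice decoding without boundary control:

```python
# Best-first (stack) search with bias b; b = 0 gives the optimal CLPS,
# large b collapses toward the Babai point sketched earlier.
import heapq
import numpy as np

def stack_decode(R, y, b=0.0, children_per_node=3):
    def se_order(center, num):                # zig-zag rule, as sketched above
        first = round(center)
        s = 1 if center >= first else -1
        out = [first]
        for k in range(1, num):
            out += [first + s * k, first - s * k]
        return out[:num]

    m = len(y)
    heap = [(0.0, 0.0, ())]                   # (biased cost, cost, partial path)
    while heap:
        fval, cost, path = heapq.heappop(heap)
        if len(path) == m:                    # leaf on top of the list: stop
            return np.array(path)
        i = m - 1 - len(path)                 # current row of R
        xv = np.array(path, dtype=float)
        back = R[i, i + 1:] @ xv if len(xv) else 0.0
        center = (y[i] - back) / R[i, i]      # unquantized DFE point
        for s in se_order(center, children_per_node):
            new_cost = cost + (y[i] - R[i, i] * s - back) ** 2
            heapq.heappush(
                heap, (new_cost - b * (len(path) + 1), new_cost, (s,) + path))
    return None
```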
GBB Classification: BeFS (Cont’d)
• In the stack algorithm, two cost functions are updated for each node generated.

• The stack algorithm offers a natural solution to the problem of choosing the initial radius (or radii): t_k = ∞.

• The stack allows a systematic approach to trading off performance for complexity: b = 0 ⇒ optimal CLPS; b = ∞ ⇒ MMSE-Babai point decoder.

• In general, for systems with small dimension m and/or high SNRs/“friendly” channels, one can obtain near-optimal performance with a relatively large value of b (i.e., reduced complexity).

• Disadvantage: The memory required to maintain the active list L can be prohibitive.
GBB Classification: Iterative Best First Search
• Modified, memory-efficient BeFS algorithms.

• Store only one node at a time, and allow nodes to be visited more than once.

• The search progresses in contours of increasing bounding functions, allowing more and more nodes to be generated at each step, and finally terminates once a leaf node is obtained.

• The Fano decoder is the iterative BeFS variant of the stack algorithm: the same set of nodes is visited, but the Fano decoder requires no memory and can visit nodes more than once.
The Chosen GBB Algorithm
Left preprocessing (MMSE-DFE) and right preprocessing (combined lattice reduction and greedy ordering), followed by the Fano (or stack) search stage for lattice, not ML, decoding.
Some Analytical and Numerical Results
• An analytical characterization of the performance-complexity tradeoff for sequential and sphere decoders with arbitrary HG and U still appears intractable.

• Consider a V-BLAST configuration over flat Rayleigh fading with ZF-DFE preprocessing (only for the analytical results).

• Theorem 3: The stack algorithm and the Fano decoder with any finite bias b achieve the same diversity as the ML decoder when applied to a V-BLAST configuration.

• Theorem 4: In a V-BLAST system with Q^2-QAM, the average complexity per dimension of the stack algorithm, for a sufficiently large bias b, is linear in m when the SNR ρ grows linearly with m and r = n − m ≥ 0.
Figure 1: Complexity (average complexity per dimension) and performance (frame error rate) vs. SNR of SE enumeration and the Fano decoder (b = 1), each with ZF-DFE and MMSE-DFE based preprocessing, for a 20 × 20 16-QAM V-BLAST system.
Figure 2: Complexity (average complexity per dimension) and performance (frame error rate) vs. SNR of the Fano decoder with ZF-DFE (b = 1 and b = 1.5) and MMSE-DFE (b = 1.5) based preprocessing, for a 30 × 30 4-QAM V-BLAST system.
Figure 3: Expected complexity per dimension and average error rates (frame and bit) of the Fano decoder as a function of the bias b, for a 20 × 20 4-QAM V-BLAST system with ZF-DFE preprocessing.
Figure 4: MMSE-Babai point decoder: average bit error rate vs. E_b/N_0 of MMSE-DFE preprocessing with DFE for a 4 × 4, 4-QAM V-BLAST system (M = N = 4, R = 8 bits/channel use), compared with ML decoding and the YWWF (or Babai) decoder.
Figure 5: Under-determined systems: block error rate vs. SNR of rate-1 and rate-3 TAST codes (3 Tx, 1 Rx, 6 bits/channel use) under MMSE-DFE lattice decoding and ML decoding, with M = 3 and N = 1.
Figure 6: Construction A, algebraic space-time codes: frame error rate vs. SNR of MMSE-DFE lattice decoding and ML decoding, for an algebraic STC based on the Golay (12, 24) code (M = 2, T = 12, N = 1, R = 1 bpcu) and for a Hammons-El Gamal stacking-construction STC (M = T = 3, N = 1, R = 1 bpcu).
Figure 7: Construction A, ISI channels: frame and bit error rates vs. E_b/N_0 of the PSP (4-state) and Fano (4-state and 1024-state) algorithms for convolutional codes over the ISI channel with response (0.848, −0.424, 0.2545, −0.1696, 0.0848).
Conclusions
• A unified framework for tree search decoding in wireless communication applications.

• Identified the roles of two different, but inter-related, components of the decoder, namely 1) preprocessing and 2) tree search.

• MMSE-DFE filtering (left preprocessing) and lattice reduction with greedy column ordering (right preprocessing) allow for near-optimal lattice decoding.

• By relaxing the boundary control, we build a generic framework for designing tree search strategies for joint detection and decoding.

• BeFS algorithms are the most efficient tree search algorithms, and iterative BeFS (the Fano decoder) trades off complexity for memory. Through analytical and numerical results, we have shown that the proposed framework solves many communication problems with reduced complexity and near-optimal performance.
Thank You