
    REDUCED-COMPLEXITY ALGORITHMS FOR DECODING AND

    EQUALIZATION

    A Dissertation

    Submitted to the Graduate School

    of the University of Notre Dame

    in Partial Fulfillment of the Requirements

    for the Degree of

    Doctor of Philosophy

    by

    Marcin Sikora, M.S.

    Daniel J. Costello, Jr., Director

    Graduate Program in Electrical Engineering

    Notre Dame, Indiana

    May 2008


    This document is in the public domain.


    REDUCED-COMPLEXITY ALGORITHMS FOR DECODING AND

    EQUALIZATION

    Abstract

    by

    Marcin Sikora

Finite state machines (FSMs) and detection problems involving them are frequently encountered in digital communication systems for noisy channels. One type of FSM arises naturally in transmission over band-limited frequency-selective channels, when bits are modulated onto complex symbols using a memoryless mapper and passed through a finite impulse response (FIR) filter. Another type of FSM, mapping sequences of information bits into longer sequences of coded bits, is the convolutional code. The detection problem for FSMs, termed decoding in the context of convolutional codes and equalization for frequency-selective channels, involves either finding the most likely input sequence given noisy observations of the output sequence (hard-output decoding) or determining the a posteriori probability of individual information bits (soft-output decoding). These problems are commonly solved either by running a search algorithm on the tree representation of all FSM sequences or by means of dynamic programming on the trellis representation of the FSM.

This work presents novel approaches to decoding and equalization based on tree search. For the decoding of convolutional codes, two novel supercode heuristics are proposed to guide the search procedure, reducing the average number of incorrect nodes visited. For soft-output decoding and equalization, a new approach to the generation of soft output within the M-algorithm-based search is presented. Both techniques, when applied simultaneously, yield a particularly efficient soft-output decoder for large-memory convolutional codes. Finally, a short block code is presented which, when repeated and concatenated with a strong outer convolutional code, yields an iteratively decodable coding scheme with excellent convergence and minimum distance properties. With the help of the proposed soft-output decoder for the outer convolutional code, this concatenation also has low decoding complexity.


CONTENTS

FIGURES

TABLES

ACKNOWLEDGMENTS

CHAPTER 1: INTRODUCTION
  1.1 Problem background
  1.2 Previous work
  1.3 Contribution and outline

CHAPTER 2: SUPERCODE HEURISTICS FOR TREE SEARCH DECODING
  2.1 Introduction
    2.1.1 Tree search decoding
    2.1.2 The algorithms A and A*
    2.1.3 Supercode heuristics
  2.2 Construction and trellis representation of supercodes
  2.3 Supercode A*-type heuristic for ML decoding
  2.4 Supercode A-type heuristic for sub-ML decoding
  2.5 Simulation results
  2.6 Summary

CHAPTER 3: SOFT-OUTPUT EQUALIZATION WITH THE M*-BCJR ALGORITHM
  3.1 Introduction
  3.2 Communication system
  3.3 SISO equalization
    3.3.1 The BCJR algorithm
    3.3.2 The RS-BCJR algorithm
    3.3.3 The M-BCJR algorithm
  3.4 The M*-BCJR algorithm
  3.5 Simulation results
  3.6 Summary

CHAPTER 4: SERIAL CONCATENATIONS WITH SIMPLE BLOCK INNER CODES
  4.1 Introduction
  4.2 Soft-output decoding of the GSPC code
  4.3 Bounds on ML performance of SCCs with an inner GSPC code
    4.3.1 An idealized interleaver
    4.3.2 A uniform interleaver
    4.3.3 Comparison with simulation results
  4.4 EXIT chart analysis for GSPC codes
  4.5 Design examples
  4.6 Conclusions

CHAPTER 5: SOFT-OUTPUT DECODING OF CONVOLUTIONAL CODES WITH THE M*-BCJR ALGORITHM
  5.1 Introduction
  5.2 The communication system
  5.3 The M*-BCJR algorithm
    5.3.1 Algorithm description
    5.3.2 Impact of survivor selection on the performance of M*-BCJR
  5.4 Survivor selection with supercode heuristic
    5.4.1 Supercode heuristic
    5.4.2 Construction and trellis representation of supercodes
  5.5 Simulation results
  5.6 Conclusions

BIBLIOGRAPHY


FIGURES

2.1 Average number of path extensions per coded bit performed by the stack algorithm with the ML supercode heuristic $h_S$.

2.2 Average number of path extensions per coded bit performed by the stack algorithm with the sub-ML supercode heuristic $h^\eta_S$.

3.1 Communication system with turbo equalization.

3.2 Part of the system to be soft-inverted by the SISO equalizer.

3.3 Trellis section a) before and b) after merging an excess state $s'_i$ into a surviving state $s_i$.

3.4 Bit error rate of the M*-BCJR and RS-BCJR for a) scenario 1 (BPSK) and b) scenario 2 (16QAM).

3.5 Number of states vs. $E_b/N_0$ needed to reach the reference $P_e$ for a) scenario 1 (BPSK, $P_e = 10^{-4}$) and b) scenario 2 (16QAM, $P_e = 10^{-3}$).

4.1 Serially concatenated coding with an inner block code.

4.2 Generalized single parity check encoder.

4.3 Trellis for soft-output decoding of the generalized parity check code.

4.4 Comparison of simulation results and BER bounds for the 2-state, rate 1/2 outer convolutional code.

4.5 Dependence between the GSPC code parameters and the EXIT curve shape.

4.6 EXIT charts for a) SCC 1 with 16-state outer code, b) SCC 2 with 256-state outer code.

4.7 BER performance and uniform interleaver limit for a) SCC 1 with 16-state outer code, b) SCC 2 with 256-state outer code.

5.1 Serial concatenation of strong outer CC and inner GSPC.

5.2 Performance of an iteratively decoded SCC with (2,1,8) outer CC decoded using the standard M*-BCJR after 8 iterations.

5.3 Performance of the M*-BCJR algorithm aided by a genie in survivor selection.

5.4 Performance of the M*-BCJR algorithm with the supercode heuristic for a) M = 16 and b) M = 32.


TABLES

2.1 SUPERCODE PARAMETERS

3.1 SIMULATED TURBO-EQUALIZATION SCENARIOS


    ACKNOWLEDGMENTS

First I would like to sincerely thank my parents and family back home in Poland. Without their love, inspiration, and constant support for my actions and endeavors, my Ph.D. studies at Notre Dame and this thesis would never have come to be.

I would like to express my gratitude to my advisor, Professor Daniel Costello, for his guidance and support throughout my studies. It was his encouragement, insight, and clever use of deadlines that each time led me from the desert of failed ideas and self-doubt to the promised land of novelty and excitement. He has contributed to my growth both as a researcher and as a person, and words cannot sufficiently express my thanks.

I am thankful to Professors Nicholas Laneman, Martin Haenggi, Thomas Fuja, and Oliver Collins for their support and helpful discussions throughout my studies. Finally, I would like to thank my friends at Notre Dame, especially Ali Pusane, Christian Koller, and Faruck Morcos, for making my stay in South Bend an unforgettable experience.


    CHAPTER 1

    INTRODUCTION

    1.1 Problem background

The amount of information that our world digitally gathers, transfers, stores, and processes each day is rapidly growing. As we push the physical limits of fiber optic cables, radio spectrum, magnetic disks, and silicon memory, these storage and communication media become increasingly unreliable, corrupting the information with random noise and complex distortions. The goal of communication theory is the design of communication systems that allow reliable recovery of the original information, represented as a sequence of bits, with complexity, latency, and chance of error kept as small as possible.

Within such systems, the transmitter is responsible for performing the mapping from the set of all possible information sequences into a set of bits, symbols, or waveforms suitable for transmission over the channel, while the receiver establishes the probable information sequence based on the observed channel output. The overall reliability of communication, measured in terms of the average rate of erroneously recovered bits (bit error rate, BER), depends on both of these elements. If the mapping implemented by the transmitter does not separate distinct sequences far enough in the signal space, the channel is likely to make the received signal appear more similar to an incorrect signal than to the actually transmitted one.


Additional errors arise if the receiver algorithm does not perform maximum likelihood (ML) detection, but rather an approximation of it.

It is not difficult to find good transmitter mappings which, combined with an ML receiver, provide very reliable transmission. In fact, in his landmark paper [29], Shannon demonstrated that mappings generated randomly have a very high chance of being very reliable. It is much more difficult to find mappings with transmitters and receivers of reasonable complexity. Since, in general, the computational complexity of the receiver is larger than that of the transmitter, it is the availability of efficient detection algorithms that ultimately limits the achievable performance. This fact is the main motivation for the development of new reduced-complexity detection algorithms which, at the same computational cost, can handle more complex detection tasks, leading to an improved BER.

The main results presented in this thesis are novel reduced-complexity techniques for solving detection problems involving finite state machines (FSMs) with noise-corrupted outputs. The two main cases in which such FSMs arise in communication systems are convolutional codes (CCs) [7, 19] and inter-symbol interference (ISI) channels [26]. The detection problems for CCs will be referred to as decoding and those for ISI channels as equalization. While quite different in origin, both of these FSMs can be decoded and equalized using very similar algorithms. This is because the set of all possible output sequences of an FSM can be combinatorially represented by a tree or trellis.

The two important detection problems associated with FSMs are maximum likelihood sequence detection (MLSD) and a posteriori probability (APP) computation. The former involves finding the most likely input sequence (or output sequence) given the sequence of channel outputs and is often referred to as


hard-output detection. Hard-output decoding is particularly desirable if the resulting sequence of information bits is the immediate input to the higher layers of the communication system, as MLSD guarantees a minimum probability of block error. However, MLSD does not provide any reliability information about individual bits in the sequence, a disadvantage in concatenated systems where the detector output is fed as input to an outer decoder. In such cases the computation of the APP for each individual bit in the information sequence, termed soft-output detection [13], is more advantageous. Soft-output decoders are the central element of turbo decoders [4], in which parallel or serial concatenations of interleaved codes are decoded by iteratively performing soft-output decoding of the component codes. Similarly, soft-output equalizers can be used to provide reliability information to the decoders or for turbo equalization [18].

    1.2 Previous work

    The classical solutions to the FSM detection problems can be divided into

    trellis- and tree-based algorithms. The Viterbi [39] and the Bahl-Cocke-Jelinek-Raviv (BCJR) [2] algorithms use the trellis representation of the set of all pos-

    sible input-output FSM sequence mappings and dynamic programming to solve

    the MLSE and APP problems, respectively. They are guaranteed to obtain their

    solutions in a fixed amount of time independently of the noise level, but their

    complexity scales linearly with the size of the trellis, which is exponential in the

    memory length of the FSM. The Viterbi and BCJR algorithms have been sucess-

    fully used with CCs and ISIs with trellis sizes up to couple of hundred, or in rare

    cases couple of thousands states per time unit, but their application to trellises

    with more than a hundred states is rarely justifiable in terms of complexity.

    3


Tree-search algorithms are well suited for solving the MLSD problem, and an entire class of sequential algorithms [1] has been developed. The most notable examples are the Fano algorithm [8], the stack algorithms [17, 41], and the M-algorithm [1]. All of these techniques attempt to find the ML sequence by searching through the tree of partial sequences, only following the most promising paths. As a result, the decoding time is variable and can be much shorter than that of the Viterbi algorithm at high signal-to-noise ratios (SNRs), even for FSMs with large memory. At low SNRs, however, the decoding time of sequential algorithms rapidly increases, limiting their utility.

Just as sequential decoding can be perceived as a reduced-complexity alternative to Viterbi decoding, there have been attempts to develop less complex algorithms capable of generating soft outputs. The M-BCJR [9] and LISS [12] algorithms use partial paths generated by a sequential procedure to generate soft outputs. The Soft-Output Viterbi Algorithm (SOVA) [11] augments the Viterbi algorithm to obtain bit reliability information. The Reduced-State BCJR (RS-BCJR) [5], applicable only to the equalization problem, executes the BCJR algorithm on a trellis with shortened FSM memory. The BEAST algorithm [21] generates soft output from a list of the several most likely input sequences. The price for the complexity reduction provided by these techniques is a lower quality of the soft outputs, which causes a higher overall BER and negatively impacts the convergence of turbo decoding methods.

    1.3 Contribution and outline

    This work describes novel hard-output and soft-output detection techniques for

    FSMs which operate at lower computational complexities than previous methods.



The first of these methods, presented in Chapter 2, extends classical sequential decoding to encompass heuristic tree search. The use of appropriate path cost heuristics can lead to a faster tree search at the cost of an additional fixed computational burden. A particularly effective mechanism for obtaining such heuristics, based on the concept of a supercode, leads to an overall complexity reduction for hard-output decoding of convolutional codes at low SNR.

Chapter 3 presents a soft-output algorithm called M*-BCJR, partially based on the sequential M-algorithm. The M*-BCJR algorithm dynamically builds a reduced-complexity trellis with a low number of states and executes the BCJR algorithm on it. The main novelty of this method is the trellis construction process, which utilizes a specially designed state absorbing operation and state distance metric. The algorithm is particularly well suited for turbo equalization, offering a better performance-complexity tradeoff than existing methods. The limitations of the M*-BCJR as a soft-output decoder for CCs with larger memory are explored and subsequently remedied in Chapter 5. It is shown that the primary performance limitation is the inaccurate survivor state selection during the construction of the reduced trellis. It is further shown that the supercode heuristic introduced in Chapter 2 can be successfully used to improve the selection accuracy, hence improving the quality of the soft outputs.

The results pertaining to soft-output decoding of large-memory CCs are preceded by Chapter 4, which discusses the applicability of such codes as components of turbo codes. Traditionally, turbo codes are constructed from convolutional codes with short memory, allowing efficient BCJR decoding and generally leading to good convergence of the iterative decoding. However, when shorter block lengths are required, such turbo codes can have relatively poor error floor performance


due to low minimum distance. Chapter 4 presents a serial concatenation of a strong outer CC and a novel inner block code, which assures good minimum distance and excellent convergence. Furthermore, this scheme greatly benefits from the application of the M*-BCJR algorithm to the decoding of the outer code.


    CHAPTER 2

    SUPERCODE HEURISTICS FOR TREE SEARCH DECODING

    2.1 Introduction

    2.1.1 Tree search decoding

After their invention by Elias [7], the first practical approach to decoding convolutional codes (CCs) was a tree search technique due to Wozencraft [40]. Wozencraft's decoder attempts to follow the most promising paths through the tree of partial codewords until a complete codeword is found, and it achieves this by performing two primary tasks: calculating path metrics for visited partial codewords and generating new paths by extending old paths with high metrics. This tree search principle, accompanied by the requirement that the metric for paths of length L (in coded bits) only depends on the first L received channel symbols, is referred to as sequential decoding. Despite many improvements to Wozencraft's original algorithm proposed in the literature, both to the path metric computation [8] and to the path extension procedure [1, 8, 17, 41], the ultimate limit to sequential decoding is the computational cutoff rate, the rate above which the average number of visited paths (and hence computations) is unbounded. This limitation precludes sequential decoding from practical operation at rates close to capacity. However, as we demonstrate in this paper, this limitation need not apply to tree search decoding in general, when path metrics utilize the entire received sequence.


    2.1.2 The algorithms A and A*

One of the best understood tree search decoding algorithms is the Zigangirov-Jelinek stack algorithm [17, 41], which is a special case of the best-first algorithm A for informed tree search studied in the computer science literature [16, 24, 25, 28]. When the A algorithm is applied to a shortest (or longest) path problem, the above references offer a standard approach to designing path metrics. In terms of the maximum likelihood (ML) decoding problem, defined as

$$\hat{x}_{ML} = \arg\max_{x \in C} \sum_{i=1}^{N} \log_2 P(y_i \mid x_i), \qquad (2.1)$$

the metric for a partial codeword $x_{1,L}$ should have the general form

$$f(x_{1,L}) = g(x_{1,L}) + h(x_{1,L}), \qquad (2.2)$$

where the first term measures the path cost accumulated so far,

$$g(x_{1,L}) = \sum_{i=1}^{L} \log_2 P(y_i \mid x_i), \qquad (2.3)$$

and the second term is a heuristic estimate of the remaining cost of reaching the end of the tree if the best extension of $x_{1,L}$ were followed, i.e.,

$$h(x_{1,L}) \approx h_{ideal}(x_{1,L}), \qquad (2.4)$$

$$h_{ideal}(x_{1,L}) = \max_{x \in C(x_{1,L})} \sum_{i=L+1}^{N} \log_2 P(y_i \mid x_i), \qquad (2.5)$$

where $C(x_{1,L})$ is the set of codewords in $C$ beginning with $x_{1,L}$. Depending on the accuracy and properties of the heuristic function $h(x_{1,L})$, algorithms with


different computational complexities and probabilities of missing the ML path can be obtained. In particular, if the exact value of $h_{ideal}$ is used as the heuristic, the algorithm A is guaranteed to find the ML codeword in N steps, i.e., it will never diverge from the ML path. A practical heuristic, of course, should trade off some of the accuracy for ease of computation, such that the joint computational effort of obtaining the path metrics and searching the tree is as small as possible. The authors of [25, 28] suggest that the general approach to obtaining good heuristics is to replace the original problem defining $h_{ideal}$ with relaxed versions that are simpler to solve. The supercode heuristic introduced in this paper adheres to this idea.
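To make the role of the metric (2.2) concrete, the following minimal Python sketch implements a generic best-first (stack) tree search driven by $f = g + h$; the tree interface (extend, is_leaf), the cost function g, and the heuristic h are illustrative assumptions rather than the exact formulation of [17, 41].

```python
import heapq

def stack_decode(root, extend, is_leaf, g, h):
    """Generic best-first (algorithm A) tree search.

    root    -- initial partial path
    extend  -- function: path -> list of child paths
    is_leaf -- function: path -> True if the path is a full codeword
    g       -- function: path -> accumulated log-likelihood (2.3)
    h       -- heuristic: path -> estimate of the remaining cost (2.4)
    Returns the first full codeword popped from the stack, which is the
    ML codeword whenever h satisfies the A* condition (2.6).
    """
    # f = g + h is to be maximized; heapq is a min-heap, so negate it.
    stack = [(-(g(root) + h(root)), 0, root)]
    tie = 1  # tie-breaker so paths themselves are never compared
    while stack:
        _, _, path = heapq.heappop(stack)
        if is_leaf(path):
            return path
        for child in extend(path):
            heapq.heappush(stack, (-(g(child) + h(child)), tie, child))
            tie += 1
    return None
```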

A much celebrated variant of the algorithm A with path metric (2.2), called A*, is obtained when the heuristic function is not just an approximation, but an upper bound, to (2.5), i.e.,

$$h(x_{1,L}) \geq h_{ideal}(x_{1,L}). \qquad (2.6)$$

The above condition guarantees that A* always finds the ML solution to the decoding problem. However, the optimality guarantee offered by A* is very costly in terms of the number of paths visited by the search algorithm, and heuristics of similar complexity that aim at satisfying (2.4) rather than (2.6) conclude the search much faster. The ML guarantee is also not essential for practical decoding, as long as the decoding errors caused by missing the ML solution are less frequent than the errors due to the actually transmitted codeword not being ML.

Our reason for mentioning the A* algorithm is the fact that, as we demonstrate in subsequent sections, our supercode approach naturally leads to heuristics satisfying (2.6). Although these heuristics considerably reduce the number of computations compared to previous A* decoding metrics proposed in the literature


[15], they still compare unfavorably to the standard Fano metric [8], which does not satisfy (2.6). Similar observations have been made by other authors who applied the concept of the A and A* algorithms to the sequential decoding problem [14, 34]. The main result of this paper is a supercode heuristic that generalizes the Fano metric, sacrificing (2.6) for a considerable reduction in the number of visited paths and offering practical decoding at rates inaccessible to previously proposed sequential algorithms.

2.1.3 Supercode heuristics

The heuristics proposed in this paper involve solving the maximization (or a problem similar to the maximization) of (2.5) for a supercode $S(x_{1,L})$ of the code $C(x_{1,L})$ (i.e., $S(x_{1,L}) \supseteq C(x_{1,L})$), rather than for the code $C$ itself. Before we discuss when and why such a problem would be simpler, or even feasible, we first present a practical way to implement the exact $h_{ideal}$ for a terminated convolutional code.

Examination of (2.5) reveals that $h_{ideal}(x_{1,L})$ is only a function of the final state $s_L$ of the encoder that outputs $x_{1,L}$, and thus the total number of unique values of $h_{ideal}(s_L)$ is equal to the total number of states in the trellis representation of $C$. Therefore, instead of computing $h_{ideal}(x_{1,L})$ each time the tree search procedure visits a path $x_{1,L}$, the values of $h_{ideal}(s_L)$ can be precomputed for all states before the actual tree search begins. This precomputation can be performed by first setting the metric of the final trellis state $h_{ideal}(s_{N+1})$ to zero and then recursively computing the values for earlier states through add-compare-select operations, i.e.,

$$h_{ideal}(s_L) = \max_{(s_L, x_L, s_{L+1})} \left( \log_2 P(y_L \mid x_L) + h_{ideal}(s_{L+1}) \right), \qquad (2.7)$$


where the maximization is performed over all trellis branches starting at $s_L$. This procedure can be immediately recognized as the Viterbi algorithm run backwards (one that stores state metrics rather than survivor information, since it solves the max rather than the arg max problem). If we additionally recall that algorithm A with an ideal metric never diverges from the optimal path, acting just like the backtracking procedure of the Viterbi algorithm run in the forward direction, we can conclude that algorithm A with an ideal heuristic precomputed using (2.7) is equivalent to the (inverted) Viterbi algorithm. We will subsequently refer to the ideal heuristic precomputed in this way as the Viterbi heuristic. Moreover, we will call such a precomputation task a backward step, and the subsequent tree search a forward step.
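The backward step might be sketched as follows, assuming the trellis is given, per time index, as a list of branches with the branch gains $\log_2 P(y_L \mid x_L)$ already evaluated; this data layout, and the all-zero terminal state indexed 0, are assumptions made for illustration.

```python
import math

def backward_viterbi_heuristic(branches, num_sections):
    """Precompute h_ideal(s_L) for every state by running the Viterbi
    recursion (2.7) backwards through the trellis.

    branches[L] -- list of (s_from, gain, s_to) tuples for section L,
                   where gain = log2 P(y_L | x_L) for that branch.
    Returns a list h, where h[L] maps each state at depth L to its
    heuristic value.
    """
    h = [dict() for _ in range(num_sections + 1)]
    h[num_sections][0] = 0.0  # terminal state of the terminated code
    for L in range(num_sections - 1, -1, -1):
        for s_from, gain, s_to in branches[L]:
            if s_to not in h[L + 1]:
                continue  # state cannot reach the terminal state
            cand = gain + h[L + 1][s_to]
            # add-compare-select: keep the best continuation (2.7)
            if cand > h[L].get(s_from, -math.inf):
                h[L][s_from] = cand
    return h
```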

The actual practicality of the Viterbi heuristic is limited by the number of states in the code trellis, which for convolutional codes grows exponentially with the constraint length. Our approach, detailed in subsequent sections, relies on finding supercodes that have a trellis representation with a lower number of states than the original code. We present one such technique for codes of rate $R \geq 1/2$, based on deleting rows from the parity check matrix of the original code, in Section 2.2. In Section 2.3 we demonstrate how solving the likelihood maximization problem over a supercode naturally leads to an upper bound on $h_{ideal}$, yielding an A*-type heuristic. In Section 2.4 we further apply the concept of a supercode to generalize the Fano metric and obtain a robust non-A* heuristic. We illustrate the performance of the proposed techniques using a memory 8, rate 1/2 convolutional code as an example in Section 2.5 and draw some conclusions in Section 2.6.


    2.2 Construction and trellis representation of supercodes

    Consider a rate k/n non-recursive convolutional code with total encoder mem-

    ory k. If we terminate this code after K = k information bits, we will obtain alinear block code with block length N = n( + ). The set of all codewords can

    then be defined either using the generator matrix G, i.e.,

    C = x GF(2)N|x = uG for some u GF(2)K , (2.8)or using the parity check matrix H, i.e.,

    C = x GF(2)N|HxT = 0 . (2.9)The trellis representation of C can be obtained from either of these definitions,although typically the generator matrix is used, since the resulting trellises ad-

    ditionally store the encoder mapping from u to x. However, we found that for

    the purpose of obtaining and representing supercodes, the parity check matrix is

    more useful.

    Ifhj is the j-th row of the parity check matrix H then the j-th parity check

    equation is hjxT = 0. We will say that the parity check hj is active at time L if

    both hj1,L and hjL+1,N contain nonzero entries, and we use J(L) to denote the set

    of such parity checks. For a given codeword x we can define a syndrome at time

    L as a |J(L)|-tuple

    sL =h

    j

    1,LxT

    1,LjJ(L) . (2.10)

    The possible values that can be taken by the syndrome sL correspond exactly to

    the states at time L in the parity check trellis. For the terminated convolutional

    12


code considered here, the number of parity checks active at times $L = n, 2n, 3n, \ldots$ is at most $(n-k)\nu$, and the number of parity check trellis states does not exceed $2^{(n-k)\nu}$. It is worth pointing out that, since the analogous trellis based on (2.8) has $2^{k\nu}$ states, the parity check trellis is less efficient at representing codes with rates $R < 1/2$.

Let us now suppose that a fraction of the parity checks of $H$ have been omitted. If $P$ is the set of parity check rows that have not been deleted, then the resulting new linear block code has the form

$$S_P = \left\{ x \in GF(2)^N \mid h^j x^T = 0 \text{ for all } j \in P \right\}. \qquad (2.11)$$

Clearly $S_P \supseteq C$. Moreover, the parity check trellis of $S_P$ has fewer states than the original trellis, since only $|P \cap J(L)|$ parity checks are active at time L. In the most extreme case of $P = \emptyset$, $S_\emptyset = GF(2)^N$ and the parity check trellis has only one state at each time index L.

In the case of convolutional codes, it is desirable that the number of trellis states of the supercode does not vary with time. If we aim at constructing a supercode that has at most $2^M$ states at times $L = n, 2n, 3n, \ldots$, the set $P$ cannot include more than M out of every consecutive set of $(n-k)\nu$ parities, i.e.,

$$\left| \{ L, \ldots, L + (n-k)\nu - 1 \} \cap P \right| \leq M \quad \text{for all } L. \qquad (2.12)$$

The easiest way to construct parity check sets with this property is to make them repetitive with period $(n-k)\nu$. In particular, we can initialize $P$ to an arbitrary M-element subset of $\{1, \ldots, (n-k)\nu\}$ and then keep adding elements $j + (n-k)\nu$ whenever $j$ is already in the set.


Since in this construction the initial M-element subset entirely defines $P$, it is convenient to represent $P$ by a binary $(n-k)\nu$-tuple $p$, with $p_j = 1$ if $j \in P$ and $p_j = 0$ otherwise.
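A minimal sketch of this periodic construction follows; the total number of parity rows of the terminated code is supplied as an assumed input, and the example values are illustrative.

```python
def build_parity_set(p, num_parities):
    """Unroll a periodic deletion pattern into the retained-parity set P.

    p            -- binary tuple of length (n-k)*nu; p[j] = 1 means parity
                    check j+1, and every later shift of it by the period,
                    is retained.
    num_parities -- total number of rows of H for the terminated code.
    """
    period = len(p)
    P = set()
    for j, keep in enumerate(p, start=1):   # 1-based parity indices
        if keep:
            P.update(range(j, num_parities + 1, period))
    return P

# Illustrative example: the 16-state supercode pattern of Table 2.1 for
# the (2,1,8) code, with an assumed total of 64 parity rows.
P = build_parity_set((1, 0, 1, 0, 1, 0, 1, 0), num_parities=64)
```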

2.3 Supercode A*-type heuristic for ML decoding

The concept of a supercode presented in the previous section naturally leads to the following heuristic for tree search decoding:

$$h_S(x_{1,L}) = \max_{x \in S(x_{1,L})} \sum_{i=L+1}^{N} \log_2 P(y_i \mid x_i). \qquad (2.13)$$

By comparing (2.13) to (2.5), it is immediately clear that $h_S$ satisfies the A* condition (2.6), ensuring ML decoding. Similar to the implementation of $h_{ideal}$ discussed in the introduction, the decoder first precomputes all unique values of $h_S$ by performing a Viterbi-like backward pass on the trellis representing $S$. During the forward pass, the tree search procedure must be able to easily map from the current path $x_{1,L}$ to the appropriate state in the trellis for $S$ to facilitate the metric lookup operation. For supercodes $S_P$ obtained by the parity check deletion described in the previous section, this task is accomplished by

$$x_{1,L} \mapsto s_{P,L} = \left( h^j_{1,L} x^T_{1,L} \right)_{j \in J(L) \cap P}, \qquad (2.14)$$

which need not be performed anew for each new path, but can be recursively updated during path extension.
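The recursive update of (2.14) might be sketched as follows: the supercode trellis state is the tuple of partial syndromes of the retained checks, and extending the path by one coded bit XORs one more term of each inner product into the state. The row layout and GF(2) arithmetic over plain Python ints are assumptions made for illustration.

```python
def extend_syndrome(syndrome, H_rows, P_active, x_bit, L):
    """Update the supercode trellis state s_{P,L} -> s_{P,L+1} when the
    path x_{1,L} is extended by the coded bit x_bit at position L+1.

    syndrome -- dict mapping retained parity index j to the partial
                inner product h^j_{1,L} x^T_{1,L} computed so far
    H_rows   -- dict mapping parity index j to its row of H (list of 0/1)
    P_active -- retained parity checks active around position L+1
    """
    new_syndrome = dict(syndrome)
    for j in P_active:
        # accumulate one more term of the inner product, in GF(2)
        new_syndrome[j] = new_syndrome.get(j, 0) ^ (H_rows[j][L] & x_bit)
    return new_syndrome
```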


In the extreme case of $S_\emptyset = GF(2)^N$, the heuristic (2.13) simplifies to

$$h_{S_\emptyset}(x_{1,L}) = \sum_{i=L+1}^{N} \max_{x_i \in \{0,1\}} \log_2 P(y_i \mid x_i). \qquad (2.15)$$

If we consider further the entire path metric $f_{S_\emptyset}(x_{1,L})$ obtained from (2.2) and (2.15) and subtract a term $h_{S_\emptyset}(\emptyset)$ independent of $x_{1,L}$, we obtain the equivalent metric

$$f_0(x_{1,L}) = \sum_{i=1}^{L} \left( \log_2 P(y_i \mid x_i) - \max_{x_i \in \{0,1\}} \log_2 P(y_i \mid x_i) \right). \qquad (2.16)$$

In this form, the overall path metric does not depend on the received symbols beyond position L, and thus tree search decoding utilizing this metric can be regarded as a sequential decoder. Metric (2.16) is in fact equivalent to the one used in the Maximum Likelihood Sequential Decoding Algorithm (MLSDA) presented in [15]. At the other extreme, if $S = C$ is used, the metric (2.13) yields the Viterbi algorithm, as described in the introduction.

2.4 Supercode A-type heuristic for sub-ML decoding

Despite guaranteeing ML decoding and a faster tree search compared to the MLSDA, the A* heuristic presented in the previous section requires many more node extensions than the simple, non-A* Fano metric, unless the supercode used is only minimally simpler than the original code. This observation indicates that, at the price of a very small degradation in bit error rate (BER) performance, significant computational savings can be achieved by following rule (2.4) rather than (2.6). By using the Fano metric as a starting point for our derivation, we can obtain an attractive sub-ML supercode heuristic with the same complexity as the A* variant, but leading to a much faster tree search.


The Fano metric, when used in sequential decoders, usually appears in the form

$$f_{Fano}(x_{1,L}) = \sum_{i=1}^{L} \left( \log_2 \frac{P(y_i \mid x_i)}{P(y_i)} - R \right). \qquad (2.17)$$

By adding terms independent of $x_{1,L}$ and writing out $P(y_i)$ using the total probability rule, we obtain the equivalent metric

$$f_{Fano}(x_{1,L}) = \sum_{i=1}^{L} \log_2 P(y_i \mid x_i) + \sum_{i=L+1}^{N} \left( \log_2 \sum_{x_i \in \{0,1\}} \frac{P(y_i \mid x_i)}{2} + R \right), \qquad (2.18)$$

which conforms to the general form (2.2) of the algorithm A metric. The heuristic part of (2.18) can be further rewritten as

$$h_{Fano}(x_{1,L}) = \log_2 \frac{\sum_{x \in S_\emptyset(x_{1,L})} \prod_{i=L+1}^{N} P(y_i \mid x_i)}{|S_\emptyset(x_{1,L})|} + \log_2 |C(x_{1,L})|, \qquad (2.19)$$

which reveals an embedded averaging operation over the set of all possible binary sequences starting with $x_{1,L}$. The presence of such averaging is not surprising, since the Fano metric was derived for random codes. In our attempt to generalize the Fano metric, we now propose to change the domain of this averaging from $S_\emptyset$ to one of the supercodes defined in Section 2.2. We might interpret this change as viewing the code $C$ as a random code with codewords drawn from $S_P$ rather than from $S_\emptyset$. Performing such a substitution in (2.19) and additionally splitting the


terms involving the set cardinalities, we obtain

$$h^1_S(x_{1,L}) = \max^*_{x \in S(x_{1,L})} \sum_{i=L+1}^{N} \left( \log_2 P(y_i \mid x_i) - R_S + R \right), \qquad (2.20)$$

where $R_S$ is the rate of the supercode $S$ and the $\max^*$ operation is defined as

$$\max^*_{i \in I} b_i \triangleq \log_2 \sum_{i \in I} 2^{b_i}. \qquad (2.21)$$

Just as in the case of the supercode heuristic presented in Section 2.3, $h^1_S$ can only assume as many unique values as there are states in the trellis of $S$. Furthermore, it can be precomputed recursively by

$$h^1_S(s_L) = \max^*_{(s_L, x_L, s_{L+1})} \left( \log_2 P(y_L \mid x_L) - R_S + R + h^1_S(s_{L+1}) \right), \qquad (2.22)$$

which makes this task identical to the backward step of the BCJR algorithm (except for the bias term $-R_S + R$).

The heuristic $h^1_S$ can be further fine-tuned to account for the fact that, when we use it with $S = C$, it does not reduce to the Viterbi metric. The difference lies in the summarizing operation, which for the Viterbi metric is max and for $h^1_S$ is $\max^*$. To allow for a smooth transition between these two operations, we introduce

$$h^\eta_S(x_{1,L}) = \max^\eta_{x \in S(x_{1,L})} \sum_{i=L+1}^{N} \left( \log_2 P(y_i \mid x_i) - R_S + R \right), \qquad (2.23)$$


where the generalized $\max^\eta$ is defined as

$$\max^\eta_{i \in I} b_i \triangleq \eta \log_2 \sum_{i \in I} 2^{b_i/\eta}. \qquad (2.24)$$

Just like max and $\max^*$, the $\max^\eta$ operation is commutative and associative, and the multiargument version can be obtained by successive application of the two-argument version, given as

$$\max{}^\eta(b_1, b_2) = \eta \log_2 \left( 2^{b_1/\eta} + 2^{b_2/\eta} \right) = \max(b_1, b_2) + \eta \log_2 \left( 1 + 2^{-|b_1 - b_2|/\eta} \right),$$

which is used in the backward pass each time several branches leave the same state. By varying the parameter $\eta$ between 0 and 1, $\max^\eta$ becomes more similar to one or the other operation. Unfortunately, other than using $\eta = 1$ for $S = S_\emptyset$ and $\eta = 0$ for $S = C$, we have so far been unable to devise an analytical method of choosing $\eta$. We have observed, however, that the parameter $\eta$ can be used to control the tradeoff between the speed of the tree search and the BER. In the following section we selected the $\eta$'s by trial and error to achieve a desired BER.
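For reference, the operation (2.24) in its numerically stable two-argument form can be sketched as follows; as $\eta \to 0$ it approaches plain max, and at $\eta = 1$ it is the $\max^*$ of (2.21).

```python
import math

def max_eta(b1, b2, eta):
    """Generalized maximum (2.24): eta * log2(2^(b1/eta) + 2^(b2/eta)).

    Implemented in the stable form
    max(b1, b2) + eta * log2(1 + 2^(-|b1 - b2| / eta)).
    eta -> 0 recovers plain max (the Viterbi metric); eta = 1 gives
    the max* of (2.21).
    """
    if eta == 0.0:
        return max(b1, b2)
    return max(b1, b2) + eta * math.log2(1.0 + 2.0 ** (-abs(b1 - b2) / eta))

def max_eta_many(values, eta):
    """Multiargument version by successive two-argument application."""
    acc = values[0]
    for b in values[1:]:
        acc = max_eta(acc, b, eta)
    return acc
```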

2.5 Simulation results

We have examined the performance of the proposed heuristics for decoding a (2,1,8) convolutional code with $G = (457, 755)_8$, terminated after 2048 information bits and used for transmission over the binary-input additive white Gaussian noise channel. The supercodes, defined by the parity check deletion patterns $p$ introduced in Section 2.2, which were used in conjunction with the $h_S$ and $h^\eta_S$ heuristics, are listed in Table 2.1.


TABLE 2.1

SUPERCODE PARAMETERS

Deletion pattern p         Rate R_S   Trellis states   Correction η
(1, 0, 0, 0, 1, 0, 0, 0)    14/16           4             0.955
(1, 0, 0, 1, 0, 0, 1, 0)    13/16           8             0.945
(1, 0, 1, 0, 1, 0, 1, 0)    12/16          16             0.94
(1, 1, 0, 1, 1, 0, 1, 0)    11/16          32             0.9
(1, 1, 1, 0, 1, 1, 1, 0)    10/16          64             0.75
(1, 1, 1, 1, 1, 1, 1, 0)     9/16         128             0.0

The number of node extensions performed by the stack algorithm with the $h_S$ heuristic, for a range of energies per information bit over noise spectral density $E_b/N_0$, is presented in Fig. 2.1. The figure demonstrates the tradeoff between the complexity of the supercode (measured in the number of trellis states per trellis segment) and the complexity of the tree search. Although using more and more complex supercodes leads to significant improvements over the MLSDA, it is clear that the ML decoding guarantee (given that decoding actually finishes) severely penalizes the A* approach compared to the Fano metric.

Alternatively, when the $h^\eta_S$ heuristic is employed, a significant improvement compared to the Fano metric can be achieved. This fact is illustrated in Fig. 2.2. The particular values of the parameter $\eta$ used for each supercode are included in Table 2.1. Since we determined that lower values of $\eta$ generally lead to lower numbers of computations but higher BERs, we selected the values to be simulated as the lowest $\eta$'s for which the BER performance loss compared to ML decoding is limited to 0.1 dB.


Figure 2.1. Average number of path extensions per coded bit performed by the stack algorithm with the ML supercode heuristic $h_S$.

Figure 2.2. Average number of path extensions per coded bit performed by the stack algorithm with the sub-ML supercode heuristic $h^\eta_S$.


2.6 Summary

Despite the limitation of sequential decoding to rates below the cutoff rate, general tree search decoding can finish decoding in a small number of steps if a reliable heuristic path metric is used. We have presented two such heuristics based on the concept of a supercode, both of which can be precomputed on a trellis of the supercode before the actual tree search begins. Although this preprocessing step requires performing additional computations, many more computations are saved during the tree search phase, especially at rates above the cutoff rate.


    CHAPTER 3

SOFT-OUTPUT EQUALIZATION WITH THE M*-BCJR ALGORITHM

3.1 Introduction

Efficient communication over channels introducing inter-symbol interference (ISI) often requires the receiver to perform channel equalization. Turbo equalization [18] is a technique in which decoding and equalization are performed iteratively, similar to the turbo decoding of serially concatenated convolutional codes [3]. As depicted in Figure 3.1, the key element of a receiver employing this method is a soft-input soft-output (SISO) demodulator/equalizer (from now on referred to as just an equalizer), accepting a priori likelihoods of the coded bits from the SISO decoder and producing their a posteriori likelihoods based on the noisy received signal.

The SISO algorithm that computes the exact values of the a posteriori likelihoods is the BCJR algorithm [2]. The complexity of a BCJR equalizer is proportional to the number of states in the trellis representing the modulation alphabet and the ISI, and thus it is exponential in both the length of the channel impulse response (CIR) and the number of bits per symbol in the modulator. This can be a serious drawback in some scenarios, e.g., transmission at a high data rate over a radio channel, where a large signal bandwidth translates to a long CIR and a high spectral efficiency translates to a large modulation alphabet. Needed in such


cases are alternative SISO equalizers with the ability to achieve large complexity savings at the cost of a small performance degradation.

There have been two main trends in the design of such SISOs. The first one relies on reducing the effective length of the channel impulse response, either by linear processing (see, e.g., [38]) or by interference cancellation via decision feedback. A particularly good algorithm in this category is the reduced-state BCJR (RS-BCJR) [5], which performs the cancellation of the final channel taps on a per-survivor basis. Iterative decoding with the RS-BCJR is very stable, thanks to the high quality of the soft outputs, but the receiver cannot use the signal power contained in the cancelled part of the CIR. Another trend is to adapt hard-output sequential algorithms [1] to produce soft outputs. Examples in this category are the M-BCJR and T-BCJR algorithms [9], based on the M- and T-algorithms, and the LISS algorithm [12], based on list sequential decoding. These algorithms have no problem using the signal energy from the whole CIR and offer much more flexibility in choosing the desired complexity. However, their reliance on ignoring unpromising paths in the trellis or tree causes a bias in the soft output (there are more explored paths with one value of a particular input bit than another), which negatively affects the convergence of iterative decoding.

In this paper we present a new SISO equalization algorithm, inspired by both the M-BCJR and the RS-BCJR, which shares many of their advantages but few of their weaknesses. We call this algorithm the M*-BCJR algorithm, since it resembles the M-BCJR in preserving only a fixed number of trellis states with the largest forward metric. Instead of deleting the excess states, however, the M*-BCJR dynamically merges them with the surviving states, a process that shares some similarity to the static state merging done on a per-survivor basis


by the RS-BCJR. For the sake of simpler notation, we present the operation of all BCJR-based algorithms, including the M*-BCJR, in the probability domain. Each of them, however, can be implemented in the log domain for better numerical stability.

The rest of the paper is structured as follows. Section 3.2 describes the communication system and the task of the SISO equalizer and introduces the notation. Section 3.3 reviews the structure of the BCJR, M-BCJR, and RS-BCJR algorithms, helping us to introduce the M*-BCJR in Section 3.4. Section 3.5 presents simulation results, and conclusions are given in Section 3.6.

3.2 Communication system

A communication system with turbo equalization is depicted in Figure 3.1. The information bits are first arranged into blocks and encoded with a convolutional code. The blocks of coded bits are permuted using an interleaver and mapped onto a sequence of complex symbols by the modulator. (In general, the modulator can have memory, but for simplicity we will assume a memoryless mapper.) The channel acts as a discrete-time finite impulse response (FIR) filter introducing ISI, the output of which is further corrupted by additive white Gaussian noise (AWGN). We assume the receiver knows the ISI channel coefficients and the noise variance, and it attempts to recover the information bits by iteratively performing SISO equalization and decoding.

The part of the system significant from the point of view of the equalizer is shown in Figure 3.2. Let $a = (a_1, a_2, \ldots, a_L)$ denote a sequence of $LK$ bits entering the modulator, arranged into L groups $a_i = (a_i^1, a_i^2, \ldots, a_i^K)$ of K bits. Each K-tuple $a_i$ selects a complex-valued output symbol $x_i$ from a constellation of size $2^K$


Figure 3.1. Communication system with turbo equalization.

Figure 3.2. Part of the system to be soft-inverted by the SISO equalizer.

to be transmitted. The sequence of symbols $y = (y_1, y_2, \ldots, y_{L+S})$ obtained at the receiver is modeled as

$$y_i = \sum_{j=0}^{S} h_j x_{i-j} + n_i, \qquad (3.1)$$

where S is the memory of the channel, $h_j$, $j = 0, 1, \ldots, S$, are the channel coefficients, and $n_i$, $i = 1, 2, \ldots, L+S$, are i.i.d. zero-mean complex-valued Gaussian random variables with variance $\sigma^2$ per complex dimension. Equation (3.1) assumes that $x_i$ is zero outside $i = 1, 2, \ldots, L$.
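A minimal NumPy sketch of the model (3.1) follows; the channel coefficients, symbols, and noise scaling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def isi_channel(x, h, sigma2):
    """Simulate (3.1): y_i = sum_j h_j x_{i-j} + n_i for i = 1, ..., L+S.

    x      -- length-L array of transmitted complex symbols
    h      -- length-(S+1) array of channel coefficients h_0, ..., h_S
    sigma2 -- noise variance (illustrative scaling: each real and
              imaginary part is drawn with variance sigma2)
    """
    L, S = len(x), len(h) - 1
    # Full convolution yields exactly L+S samples, implementing the
    # assumption below (3.1) that x_i = 0 outside i = 1, ..., L.
    y = np.convolve(x, h)
    noise = rng.normal(0.0, np.sqrt(sigma2), (L + S, 2)) @ np.array([1.0, 1.0j])
    return y + noise

# Illustrative 3-tap channel driven by a few BPSK symbols.
y = isi_channel(np.array([1.0, -1.0, 1.0, 1.0]),
                np.array([0.8, 0.5, 0.3]), sigma2=0.1)
```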

The SISO equalizer for the above channel takes the received symbols $y$ and the a priori log-likelihood ratios $L_a(a_i^k)$ for each bit $a_i^k$, defined as

$$L_a(a_i^k) = \log \frac{P(a_i^k = +1)}{P(a_i^k = -1)}, \qquad (3.2)$$


and outputs the a posteriori L-values $L(a_i^k)$,

$$L(a_i^k) = \log \frac{P(a_i^k = +1 \mid y)}{P(a_i^k = -1 \mid y)}. \qquad (3.3)$$

The values actually fed to the SISO decoder are the extrinsic L-values, computed as $L_e(a_i^k) = L(a_i^k) - L_a(a_i^k)$.

Let $\lambda(a)$ denote the joint probability that $a$ was transmitted and $y$ was received. Then (3.3) can be expressed as

$$L(a_i^k) = \log \frac{\sum_{a:\, a_i^k = +1} \lambda(a)}{\sum_{a:\, a_i^k = -1} \lambda(a)}, \qquad (3.4)$$

where the summations are performed over all $a$ consistent with $a_i^k = \pm 1$. Furthermore,

$$\lambda(a) = P(a) \prod_{i=1}^{L+S} \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2\sigma^2} \left\| y_i - \sum_{j=0}^{S} h_j x_{i-j} \right\|^2 \right), \qquad (3.5)$$

where $h_j$, $j = 0, 1, \ldots, S$, and $\sigma^2$ are assumed known at the receiver and $P(a)$ is obtained from $L_a$ as

$$P(a) = \prod_{i=1}^{L} \prod_{k=1}^{K} P(a_i^k), \qquad (3.6)$$

with

$$P(a_i^k = \pm 1) = \frac{\exp(\pm L_a(a_i^k))}{1 + \exp(\pm L_a(a_i^k))}. \qquad (3.7)$$

Since the number of paths involved in the summations of (3.4) is extremely large for realistic values of K and L, a practical algorithm seeks to simplify or approximate this calculation.
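For reference, (3.7) and the extrinsic-value computation amount to the following one-liners (array storage of the L-values is an assumption):

```python
import numpy as np

def prob_from_llr(La):
    """(3.7): P(a_i^k = +1) and P(a_i^k = -1) from the a priori L-value."""
    p_plus = np.exp(La) / (1.0 + np.exp(La))
    return p_plus, 1.0 - p_plus

def extrinsic(L_posterior, La):
    """Extrinsic L-values L_e = L - L_a passed on to the SISO decoder."""
    return L_posterior - La
```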


3.3 SISO equalization

3.3.1 The BCJR algorithm

The classical algorithm for efficiently computing (3.4) by exploiting the trellis structure of the set of all paths is the BCJR algorithm [2]. By defining the state $s_i$ at time i as the past S input K-tuples, $s_i = (a_{i-1}, \ldots, a_{i-S})$, and a branch metric $\gamma(s_i, a_i)$ as

$$\gamma(s_i, a_i) = P(a_i) \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2\sigma^2} \left\| y_i - \sum_{j=0}^{S} h_j x_{i-j} \right\|^2 \right), \qquad (3.8)$$

the path metric can be factored into

$$\lambda(a) = \prod_{i=1}^{L+S} \gamma(s_i, a_i). \qquad (3.9)$$

For indices outside the range $i = 1, \ldots, L$, the variables $a_i$ are regarded as empty sequences with $P(a_i = \emptyset) = 1$.

For every trellis branch $b_i = (s_i, a_i, s_{i+1})$ starting in state $s_i$, labeled by the input bits $a_i$, and ending in state $s_{i+1}$, the BCJR algorithm computes the sum of the path metrics $\lambda(a)$ over all paths passing through this branch as

$$\sum_{a:\, b_i} \lambda(a) = \alpha(s_i) \gamma(s_i, a_i) \beta(s_{i+1}). \qquad (3.10)$$

The computation of the forward state metrics $\alpha(s_i)$ is performed in the forward recursion for $i = 1, 2, \ldots, L+S-1$:

$$\alpha(s_{i+1}) = \sum_{b_i = (s_i, a_i, s_{i+1})} \alpha(s_i) \gamma(s_i, a_i), \qquad (3.11)$$


with the initial state value $\alpha(s_1) = 1$. Similarly, the backward recursion computes the backward state metrics $\beta(s_i)$ for $i = L+S, L+S-1, \ldots, 2$:

$$\beta(s_i) = \sum_{b_i = (s_i, a_i, s_{i+1})} \gamma(s_i, a_i) \beta(s_{i+1}), \qquad (3.12)$$

with the terminal state value $\beta(s_{L+S+1}) = 1$. With all the $\alpha$'s, $\gamma$'s, and $\beta$'s computed, the summations over paths in (3.4) can be replaced by summations over branches,

$$L(a_i^k) = \log \frac{\sum_{b_i:\, a_i^k = +1} \alpha(s_i) \gamma(s_i, a_i) \beta(s_{i+1})}{\sum_{b_i:\, a_i^k = -1} \alpha(s_i) \gamma(s_i, a_i) \beta(s_{i+1})}. \qquad (3.13)$$

The completion phase, in which (3.13) is evaluated for every $a_i^k$, concludes the algorithm.
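The three recursions can be sketched compactly in the probability domain for the case K = 1; the trellis layout below (a list of sections, each a list of branches with precomputed $\gamma$'s) is an illustrative assumption, and a practical implementation would work in the log domain as noted in the introduction.

```python
import math
from collections import defaultdict

def bcjr(sections):
    """Probability-domain BCJR for K = 1 on a generic trellis.

    sections[i] -- list of branches (s_from, bit, gamma, s_to), where
                   gamma is the branch metric (3.8), already evaluated,
                   and bit is the label a_i in {+1, -1}.
    States labeled 0 at both ends play the roles of s_1 and s_{L+S+1}.
    Returns the a posteriori L-values (3.13), one per trellis section.
    """
    n = len(sections)
    alpha = [defaultdict(float) for _ in range(n + 1)]
    beta = [defaultdict(float) for _ in range(n + 1)]
    alpha[0][0] = 1.0                      # alpha(s_1) = 1
    beta[n][0] = 1.0                       # beta(s_{L+S+1}) = 1
    for i in range(n):                     # forward recursion (3.11)
        for s, _, g, t in sections[i]:
            alpha[i + 1][t] += alpha[i][s] * g
    for i in range(n - 1, -1, -1):         # backward recursion (3.12)
        for s, _, g, t in sections[i]:
            beta[i][s] += g * beta[i + 1][t]
    L_values = []
    for i in range(n):                     # completion phase (3.13)
        num = den = 0.0
        for s, bit, g, t in sections[i]:
            p = alpha[i][s] * g * beta[i + 1][t]
            if bit == +1:
                num += p
            else:
                den += p
        L_values.append(math.log(num / den))
    return L_values
```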

The complexity of the BCJR equalizer is proportional to the number of trellis states, $2^{KS}$. The following subsections describe the operation of the RS-BCJR [5] and M-BCJR [9] algorithms, which preserve the general structure of the BCJR but instead operate on dynamically built simplified trellises with a number of states controlled via a parameter. In the original form of both algorithms, the construction of this simplified trellis occurs during the forward recursion and is based on the values of the forward state metrics, while the backward recursion and the completion phase just reuse the same trellis.

    3.3.2 The RS-BCJR algorithm

    The way we will describe the operation of the RS-BCJR algorithm is slightly

    different from the presentation in [5], but is in fact equivalent.



Let us consider two states in the trellis,

$$s_i = (a_{i-1}, \ldots, a_{i-S'}, a_{i-S'-1}, \ldots, a_{i-S}), \qquad (3.14)$$

$$s'_i = (a_{i-1}, \ldots, a_{i-S'}, a'_{i-S'-1}, \ldots, a'_{i-S}), \qquad (3.15)$$

differing only in the last $S - S'$ binary K-tuples. Furthermore, consider two partial paths beginning in states $s_i$ and $s'_i$ and corresponding to the same partial input sequence $a_{[i,L]} = (a_i, \ldots, a_L)$. Both paths are guaranteed to merge after $S - S'$ time indices, and hence their partial path metrics are

$$\lambda(s_i, a_{[i,L]}) = \prod_{j=i}^{i+S-S'-1} \gamma(s_j, a_j) \prod_{j=i+S-S'}^{L} \gamma(s_j, a_j), \qquad (3.16)$$

$$\lambda(s'_i, a_{[i,L]}) = \prod_{j=i}^{i+S-S'-1} \gamma(s'_j, a_j) \prod_{j=i+S-S'}^{L} \gamma(s_j, a_j). \qquad (3.17)$$

Additionally, close examination of (3.8) reveals that the difference between $\gamma(s_j, a_j)$ and $\gamma(s'_j, a_j)$ for $j = i, \ldots, i+S-S'-1$ is not large. Hence, the difference between $\lambda(s_i, a_{[i,L]})$ and $\lambda(s'_i, a_{[i,L]})$ is also not large.

The RS-BCJR equalizer relies on the above observation and, for some predefined $S'$, declares states differing only in the last $S - S'$ binary K-tuples indistinguishable. Every such set of states is subsequently reduced to a single state, by selecting the state with the highest forward metric and merging all remaining states into it. Here, we define the merging of the state $s'_i$ into $s_i$ as updating the forward metric $\alpha(s_i) := \alpha(s_i) + \alpha(s'_i)$, redirecting all trellis branches ending at $s'_i$ into $s_i$, and deleting $s'_i$ from the trellis. This reduction is performed during the forward recursion, and the $\gamma$'s for the paths that originate from removed states need never be computed. The trellis that results has only $2^{KS'}$ states, compared


to $2^{KS}$ in the original trellis. The same trellis is then reused in the backward recursion and the completion stage.

The RS-BCJR equalizer is particularly effective when the final coefficients of the ISI channel are small in magnitude. Furthermore, the reduced-state trellis retains the same branch-to-state ratio (branch density) and has the same number of branches with $a_i^k = +1$ and $a_i^k = -1$ for any i and k, properties that ensure a high quality for the soft outputs and good convergence of iterative decoding. Unfortunately, the RS-BCJR algorithm cannot use the signal power in the final $S - S'$ channel taps, effectively reducing the minimum Euclidean distance between paths. Moreover, the number of surviving states can only be set to a power of $2^K$, which can be a problem for large K (e.g., for a system with 16QAM modulation, equalization using 16 states could result in poor performance, while 256 states could exceed acceptable complexity).
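The reduction rule might be sketched as follows, with states represented as tuples of the past input K-tuples (most recent first); the data layout is an assumption made for illustration.

```python
from collections import defaultdict

def rs_bcjr_reduce(alpha, S_prime):
    """Group full states by their S' most recent K-tuples and merge each
    group into the member with the largest forward metric.

    alpha -- dict mapping state tuples (a_{i-1}, ..., a_{i-S}) to alpha(s_i)
    Returns (reduced alpha, dict mapping each removed state to its survivor).
    """
    groups = defaultdict(list)
    for state in alpha:
        groups[state[:S_prime]].append(state)
    reduced, redirect = {}, {}
    for members in groups.values():
        survivor = max(members, key=lambda s: alpha[s])
        # merging: the survivor absorbs the forward metrics of the others,
        # and their incoming branches are redirected to it
        reduced[survivor] = sum(alpha[s] for s in members)
        for s in members:
            if s is not survivor:
                redirect[s] = survivor
    return reduced, redirect
```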

    3.3.3 The M-BCJR algorithm

    The M-BCJR algorithm is based on the M-algorithm [1], originally designed

    for the problem of maximum likelihood sequence estimation. The M-algorithm

    keeps track only of the M most likely paths at the same depth, throwing away

    any excess paths. In the M-BCJR equalizer this idea is applied to the trellis

    states during the forward recursion. At every level i, when all (si) have been

    computed, the M states with the largest forward metrics are retained, and all

    remaining states are deleted from the trellis (together with all the branches that

    lead to or depart from them). The same trellis is then reused in the backward

    recursion and completion phase.

    In [9] it was shown that the M-BCJR algorithm performs well when the state


reduction ratio 2^{KS}/M is not very large. Also, unlike the RS-BCJR algorithm, it can use the power from all the channel taps. For small M, however, the reduced trellis is very sparse, i.e., the branch-to-state ratio is much smaller than in the full trellis, and there is often a disproportion between the number of branches labeled with a_i^k = +1 and a_i^k = −1 for any i and k. These factors reduce the quality of the soft outputs and the convergence performance, and may require an alternative way of computing the a posteriori likelihoods (like the Bayesian estimation approach presented in [23]). Finally, the M-BCJR algorithm requires performing a partial sort (finding the M largest elements out of M·2^K) at every trellis section, which increases the complexity per state.
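Viewed as code, this per-section pruning is just a partial selection of the M best metrics; a small Python sketch (ours, for illustration only):

    import heapq

    def prune_states(alphas, M):
        """M-BCJR pruning sketch: keep the M states with the largest forward
        metrics; all other states, and the branches touching them, are deleted."""
        if len(alphas) <= M:
            return dict(alphas)
        return dict(heapq.nlargest(M, alphas.items(), key=lambda kv: kv[1]))

heapq.nlargest performs the partial sort in O(n log M) time per trellis section, which is the selection overhead noted above.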

3.4 The M*-BCJR algorithm

In this section we demonstrate how the concept of state merging present in the RS-BCJR equalizer can be used to enhance the performance of the M-BCJR algorithm. We call the resulting algorithm the M*-BCJR algorithm.

During the forward recursion the M*-BCJR algorithm retains a maximum of M states for any time index i. Unlike the M-BCJR algorithm, however, the excess states are not deleted, but merely merged into some of the surviving states. This means that none of the branches seen so far are deleted from the trellis; they are just redirected into a more likely state. The forward recursion of the algorithm can be described as follows:

1. Set i := 1. For the initial trellis state s_1, set α(s_1) := 1. Also, fix the set of states surviving at depth 1 to be S_1 := {s_1}.

2. Initialize the set of surviving states at depth i + 1 to an empty set, S_{i+1} := ∅.

3. For every state s_i in the set S_i, and every branch b = (s_i, a_i, s_{i+1}) originating from that state, compute the metric γ(s_i, a_i), and add s_{i+1} to the set S_{i+1}.

4. For every state s_{i+1} in S_{i+1}, compute the forward state metric α(s_{i+1}) as the sum of α(s_i)γ(s_i, a_i) over all branches b = (s_i, a_i, s_{i+1}) visited in step 3 that end in s_{i+1}.

5. If the number of states in S_{i+1} is no more than M, proceed to step 8. Otherwise continue with step 6.

6. Determine the M states in S_{i+1} with the largest value of the forward state metric. Remove all remaining states from S_{i+1} and put them in a temporary set S̃_{i+1}.

7. Go over all states s̃_{i+1} in the set S̃_{i+1} and perform the following tasks for each of them:

- Find the state s_{i+1} in S_{i+1} that differs from s̃_{i+1} by the least number of final K-tuples a_j.

- Redirect all branches ending in s̃_{i+1} to s_{i+1}.

- Add α(s̃_{i+1}) to the metric α(s_{i+1}).

- Delete s̃_{i+1} from the set S̃_{i+1}.

8. Increment i by 1. If i ≤ L + S − 1, go to step 2. Otherwise the forward recursion is finished.

The merging of s̃_i into s_i in step 7 is also illustrated in Figure 3.3. The backward recursion and the completion phase are subsequently performed only over states remaining in the sets S_i and only over visited branches (i.e., branches for which the metrics were calculated in step 3).
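The following Python sketch (ours, not the dissertation's implementation) walks through steps 1-8 under a few stated assumptions: states are encoded as tuples of the last S binary K-tuples, most recent first; branches_from(s) enumerates the branches leaving a state; and gamma(i, s, a) stands in for the branch metric of (3.8).

    import heapq

    def mstar_forward(s1, branches_from, gamma, M, num_sections):
        """Sketch of the M*-BCJR forward recursion, steps 1-8.

        Returns the per-depth survivor metrics and the merge (redirect) maps.
        """
        alphas = [{s1: 1.0}]                   # step 1: alpha(s_1) := 1, S_1 = {s_1}
        merges = []
        for i in range(num_sections):
            nxt = {}                           # step 2: S_{i+1} := empty set
            for s, a_s in alphas[-1].items():  # step 3: visit all branches
                for a, s_next in branches_from(s):
                    # step 4: alpha(s_{i+1}) = sum of alpha(s_i) * gamma(s_i, a_i)
                    nxt[s_next] = nxt.get(s_next, 0.0) + a_s * gamma(i, s, a)
            merge_map = {}
            if len(nxt) > M:                   # steps 5-6: select the M best
                keep = dict(heapq.nlargest(M, nxt.items(), key=lambda kv: kv[1]))
                for s_t, a_t in nxt.items():   # step 7: merge each excess state
                    if s_t in keep:
                        continue
                    # fewest differing final K-tuples = longest shared prefix
                    # of the most recent K-tuples
                    target = max(keep, key=lambda s: shared_prefix(s, s_t))
                    keep[target] += a_t        # alpha(s) := alpha(s) + alpha(s~)
                    merge_map[s_t] = target    # branches into s~ are redirected
                nxt = keep
            alphas.append(nxt)
            merges.append(merge_map)           # step 8: next trellis section
        return alphas, merges

    def shared_prefix(s, t):
        """Number of initial (most recent) K-tuples on which two states agree."""
        n = 0
        for x, y in zip(s, t):
            if x != y:
                break
            n += 1
        return n

The merge maps recorded here are what allow the backward recursion and the completion stage to traverse only the visited branches, following each redirected branch to its surviving end state.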


Figure 3.3. Trellis section a) before and b) after merging an excess state s̃_i into a surviving state s_i.

Just as for the M-BCJR, the M*-BCJR algorithm can use the power from all channel taps and offers full freedom in choosing the number of surviving states M. At the same time, the M*-BCJR never deletes visited branches, and hence it retains the branch density of the full trellis and avoids a disproportion between the number of branches labeled with a_i^k = +1 and a_i^k = −1. As a result, the soft outputs generated by the M*-BCJR equalizer ensure good convergence of the iterative receiver. Complexity-wise, the algorithm requires some additional processing per state (due to step 7) and some additional memory per branch (the ending state must be remembered for each branch). However, if we regard the calculation of the branch metrics as the dominant operation, the complexities of the M-BCJR, RS-BCJR, and M*-BCJR equalizers are the same for fixed M = 2^{KS′}.

    3.5 Simulation results

To evaluate the performance of the M*-BCJR equalizer, we considered two

    turbo-equalization systems. Both systems used a recursive, memory 5, rate 1/2

    terminated convolutional code as an outer code. The first system used BPSK


TABLE 3.1

SIMULATED TURBO-EQUALIZATION SCENARIOS

                          Scenario 1                      Scenario 2
Outer code                CC(2,1,5)                       CC(2,1,5)
Modulation                BPSK                            16QAM
Channel memory S          4                               2
CIR {h_0, ..., h_S}       {0.45, 0.25, 0.15, 0.1, 0.05}   {1, 1, 1}
BCJR states               16                              256
Interleaver size          1024                            4096
No. of iterations         6                               6

modulation and a 5-tap channel (maximum 16 states), and a block of 507 information bits (size 1024 DRP [6] interleaver). The second system used 16QAM modulation, but only a 3-tap channel (maximum 256 states), and a block of 2043 information bits (size 4096 DRP interleaver). The remaining parameters and the channel impulse responses are summarized in Table 3.1.

Both systems were simulated with the M*-BCJR and RS-BCJR equalizers, for several values of M and S′. In each case we allowed the receiver to perform 6 iterations. The bit error rates Pe for a range of Eb/No (average energy per bit over noise spectral density) are plotted in Figure 3.4. To better illustrate the complexity-performance tradeoffs achievable with both algorithms, we also plotted the number of states M or 2^{KS′} against the Eb/No needed to achieve a certain Pe (10^−4 for system 1 and 10^−3 for system 2) in Figure 3.5.

The simulations demonstrate the superior performance of the M*-BCJR equalizer.


Figure 3.4. Bit error rate of the M*-BCJR and RS-BCJR equalizers for a) scenario 1 (BPSK) and b) scenario 2 (16QAM).


Figure 3.5. Number of states vs. Eb/No to reach the reference Pe for a) scenario 1 (BPSK, Pe = 10^−4) and b) scenario 2 (16QAM, Pe = 10^−3).


In scenario 1, the M*-BCJR equalizer with 3 states outperforms the RS-BCJR with 8 states by 0.1 dB for Pe below 10^−4. When both algorithms use 4 states, the M*-BCJR equalizer offers a 0.7 dB gain compared to the RS-BCJR. In scenario 2, the M*-BCJR with 16 states achieves almost a 3 dB gain over the RS-BCJR with the same number of states.

    3.6 Summary

We have examined the problem of complexity reduction in turbo equalization for systems with large constellation sizes and/or long channel impulse responses. We have defined the operation of merging one state into another and used it to give an alternative interpretation of the RS-BCJR algorithm. Finally, we modified the M-BCJR algorithm, replacing the deletion of excess states with the merging of these states into the surviving states. The resulting algorithm, called the M*-BCJR algorithm, was shown to generate reduced-complexity trellises more suitable for SISO equalization than those obtained by the RS-BCJR and M-BCJR algorithms. Simulation results demonstrated very good performance for turbo-equalization systems employing the M*-BCJR, exceeding that of the RS-BCJR even with much smaller complexities.


    CHAPTER 4

    SERIAL CONCATENATIONS WITH SIMPLE BLOCK INNER CODES

    4.1 Introduction

Serially concatenated codes (SCCs) [3] are one of the error control techniques that offer good error protection and efficient decoding using iterative turbo decoders [4]. A rate R_O·R_I SCC encoder first collects a block of information bits and encodes it using a rate R_O outer code. The resulting intermediate coded bit sequence is permuted using an interleaver and subsequently encoded using a rate R_I inner code. At the receiver, decoding is implemented using soft-input soft-output (SISO) decoders for each of the component codes, where the extrinsic information about the intermediate sequence is iteratively exchanged between the two decoders.

    When an SCC is used to communicate over a binary-input additive white

    Gaussian noise (AWGN) channel, its performance is typically characterized by

    the average bit error rate (BER) as a function of the ratio Eb/N0 of the energy

    per information bit to the one-sided noise power spectral density. When plotted,

the BER curve shows three distinct regions. The region of very low Eb/N0 is characterized by high error rates resulting from the inability of the iterative decoder to

    converge. In the region of high Eb/N0, called the error floor region, the iterative

    decoder almost always converges to the minimum-distance codeword, performing


nearly maximum likelihood (ML) decoding after just a few iterations. Finally, in

    the middle Eb/N0 region, called the waterfall region and characterized by a rapid

    drop in the BER, iterative decoding converges only for some received sequences,

    and a large number of iterations may be required to approach the ML solution.

    An SCC with good performance is characterized by the waterfall region located

at a low Eb/N0 and the error floor region located at a low BER. The standard design tools that provide good predictions about the performance of an SCC in the

    error floor and waterfall regions are uniform interleaver analysis [3] and extrinsic

    information transfer (EXIT) charts [35], respectively.

    In this paper we consider an SCC with an inner block code, as illustrated in Fig.

    4.1. This is in contrast to the usual practice of using recursive convolutional codes

    as inner codes, since inner block codes provide no asymptotic interleaver gain [3],

    i.e., the error floor does not decrease indefinitely with increasing interleaver size.

    However, for moderate and fixed interleaver sizes this is not a serious drawback.

Suppose that the outer code has a minimum output weight d_min^O and the inner block code has a minimum output weight d_1^I corresponding to an input sequence with weight one. Then, as long as the interleaver is able to spread the low-weight outer codewords in such a way that every nonzero bit is placed in a separate inner block, the minimum distance of the SCC can be as high as d_min^O · d_1^I. Based on this straightforward observation, we can summarize our design criteria for a good SCC as follows:

- choose the outer code with a large d_min^O,

- choose the inner block code with a large d_1^I, and

- the outer and inner codes should have well-matched EXIT characteristics.


[Figure 4.1 shows the system blocks: an outer (n,k,m) CC encoder, an inner GSPC encoder, the binary-input AWGN channel, and the inner and outer SISO decoders.]

Figure 4.1. Serially concatenated coding with an inner block code.

Figure 4.2. Generalized single parity check encoder.

Perhaps the simplest block encoder that provides a large d_1^I can be obtained by modifying a single parity check (SPC) code as illustrated in Fig. 4.2. The encoder for this (K, L) generalized single parity check (GSPC) code computes a parity bit for the K information bits and then adds it modulo 2 to the first L information bits. Clearly, an input sequence with weight 1 produces an output sequence with weight L + 2 or L, depending on the bit location. Despite its simple structure, we show in the following sections that an SCC utilizing such an inner code can perform very well in both the waterfall and error floor regions of the BER curve.
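To make the mapping concrete, here is a minimal Python sketch of the GSPC encoding rule (ours; the bit-list interface is an assumption):

    def gspc_encode(u, L):
        """Encode one block with the (K, L) GSPC code: compute the parity of
        all K input bits, add it modulo 2 to the first L bits, and append it."""
        p = sum(u) % 2
        return [(b ^ p) if n < L else b for n, b in enumerate(u)] + [p]

    # Weight-1 inputs illustrate the two cases (here K = 8, L = 4):
    # sum(gspc_encode([1, 0, 0, 0, 0, 0, 0, 0], 4)) == 4  # 1 in first L bits: weight L
    # sum(gspc_encode([0, 0, 0, 0, 0, 0, 0, 1], 4)) == 6  # otherwise: weight L + 2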

The rest of the paper is organized as follows. Section 4.2 presents a SISO decoder for the GSPC code. Section 4.3 derives bounds, based on the uniform interleaver analysis, on the ML-decoding performance of the SCC. Section 4.4 examines the


relation between the parameters K and L of the GSPC code and the shape of its EXIT curve. Section 4.5 presents simulation results for designed SCCs utilizing a GSPC code. Finally, some conclusions are drawn in Section 4.6.

    4.2 Soft-output decoding of the GSPC code

A SISO decoder for an SPC code accepts a priori L-values La(u_n) for each information bit u_n, n = 1, …, K, channel L-values L(x_n) for each coded bit x_n, n = 1, …, K + 1, and produces extrinsic L-values Le(u_n). These L-values are respectively defined as

    La(u_n) = log( Pr{u_n = 0} / Pr{u_n = 1} ),

    L(x_n) = log( Pr{x_n = 0} / Pr{x_n = 1} ),

    Le(u_n) = log( Pr{u_n = 0 | x_1, …, x_{K+1}} / Pr{u_n = 1 | x_1, …, x_{K+1}} ) − La(u_n).

Soft-output decoding of SPC codes has been thoroughly studied in the literature in the context of product codes [13], low density parity check (LDPC) codes [37], repeat-accumulate codes [36], and others. Despite the similarities between SPC and GSPC codes, such as having identical codebooks for even L (but different input-output mappings), the techniques commonly used for decoding SPC codes (e.g., the ⊞ operation in [13]) cannot be easily generalized to GSPC codes. Instead we propose to perform SISO decoding using the special 4-state trellis illustrated in Fig. 4.3. The trellis has K sections, with the first L being of type I and the remaining K − L of type II. The trellis state (l_n, g_n) at time n = 0, 1, …, K consists of two bits: the local parity bit generated by all input bits preceding a


given trellis section (l_n = ∑_{m=1}^{n} u_m) and the global parity bit generated by all input bits (g_n = ∑_{m=1}^{K} u_m), where both sums are taken modulo 2. Consistency requires that only (l_0, g_0) = (0, 0) and (l_0, g_0) = (0, 1) are valid starting states and only (l_K, g_K) = (0, 0) and (l_K, g_K) = (1, 1) are valid ending states. A trellis branch exists between states (l_{n−1}, g_{n−1}) and (l_n, g_n) if g_{n−1} = g_n. Each branch is labeled with a (u_n, x_n) pair, which depends on the starting state, the ending state, and the trellis section type. For type I sections u_n = l_{n−1} + l_n and x_n = l_{n−1} + l_n + g_n, while for type II sections u_n = x_n = l_{n−1} + l_n.

The extrinsic L-values for the input bits can be obtained by performing a BCJR-like processing [2] on the trellis defined above. First, we compute the branch metrics γ_n(l_{n−1}, l_n, g_n) for every connected pair of states (l_{n−1}, g_{n−1} = g_n) and (l_n, g_n) as

    γ_n(l_{n−1}, l_n, g_n) = ((−1)^{u_n} / 2) La(u_n) + ((−1)^{x_n} / 2) L(x_n).

Then we recursively obtain the forward state metrics α_n(l_n, g_n) and the backward state metrics β_n(l_n, g_n) as

    α_n(l_n, g_n) = max( α_{n−1}(0, g_n) + γ_n(0, l_n, g_n), α_{n−1}(1, g_n) + γ_n(1, l_n, g_n) ),

    β_n(l_n, g_n) = max( β_{n+1}(0, g_n) + γ_{n+1}(l_n, 0, g_{n+1}), β_{n+1}(1, g_n) + γ_{n+1}(l_n, 1, g_{n+1}) ).


The initial values for the state metrics are

    α_0(0, 0) = α_0(0, 1) = 0,
    α_0(1, 0) = α_0(1, 1) = −∞,
    β_K(0, 0) = (1/2) L(x_{K+1}),
    β_K(1, 1) = −(1/2) L(x_{K+1}),
    β_K(0, 1) = β_K(1, 0) = −∞,

where the infinite values imply invalid states and the values assigned to the backward state metrics account for the parity bit x_{K+1}. The final extrinsic L-values Le(u_n) are then computed as

    Le(u_n) = max_{u_n=0} ( α_{n−1}(l_{n−1}, g_n) + γ_n(l_{n−1}, l_n, g_n) + β_n(l_n, g_n) )
            − max_{u_n=1} ( α_{n−1}(l_{n−1}, g_n) + γ_n(l_{n−1}, l_n, g_n) + β_n(l_n, g_n) ),

where the max operations are performed over all branches (l_{n−1}, l_n, g_n) with either u_n = 0 or u_n = 1.
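Putting the pieces together, the following self-contained Python sketch (ours; the names and list-based interface are assumptions) implements the recursions above with plain max and, in accordance with the definition of Le(u_n), subtracts La(u_n) at the end:

    import math

    NEG = -math.inf

    def gspc_siso(La, Lx, L):
        """SISO decoding of a (K, L) GSPC block on the 4-state trellis.

        La: a priori L-values La(u_n), length K.
        Lx: channel L-values L(x_n), length K + 1 (last entry is the parity bit).
        Returns the extrinsic L-values Le(u_n), n = 1, ..., K.
        """
        K = len(La)

        def labels(n, l_prev, l_cur, g):
            # Type I sections (n <= L) add the global parity to the coded bit.
            u = l_prev ^ l_cur
            x = u ^ g if n <= L else u
            return u, x

        def gamma(n, l_prev, l_cur, g):
            u, x = labels(n, l_prev, l_cur, g)
            return (-1) ** u * La[n - 1] / 2 + (-1) ** x * Lx[n - 1] / 2

        states = [(0, 0), (0, 1), (1, 0), (1, 1)]
        alpha = [{s: NEG for s in states} for _ in range(K + 1)]
        beta = [{s: NEG for s in states} for _ in range(K + 1)]
        alpha[0][(0, 0)] = alpha[0][(0, 1)] = 0.0  # valid starting states
        beta[K][(0, 0)] = Lx[K] / 2                # ending states fix x_{K+1} = g
        beta[K][(1, 1)] = -Lx[K] / 2

        for n in range(1, K + 1):                  # forward recursion
            for l, g in states:
                alpha[n][(l, g)] = max(alpha[n - 1][(0, g)] + gamma(n, 0, l, g),
                                       alpha[n - 1][(1, g)] + gamma(n, 1, l, g))
        for n in range(K - 1, -1, -1):             # backward recursion
            for l, g in states:
                beta[n][(l, g)] = max(beta[n + 1][(0, g)] + gamma(n + 1, l, 0, g),
                                      beta[n + 1][(1, g)] + gamma(n + 1, l, 1, g))

        Le = []
        for n in range(1, K + 1):                  # completion stage
            best = {0: NEG, 1: NEG}
            for l_prev, g in states:
                for l in (0, 1):
                    u, _ = labels(n, l_prev, l, g)
                    m = (alpha[n - 1][(l_prev, g)] + gamma(n, l_prev, l, g)
                         + beta[n][(l, g)])
                    best[u] = max(best[u], m)
            # the difference of the two max terms contains +La(u_n) once;
            # subtracting it leaves only the extrinsic part, per the
            # definition of Le(u_n)
            Le.append(best[0] - best[1] - La[n - 1])
        return Le

Since every branch with u_n = 0 carries +La(u_n)/2 in its metric and every branch with u_n = 1 carries −La(u_n)/2, the difference of the two max terms contains La(u_n) exactly once, which is why the sketch removes it at the end.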

    4.3 Bounds on ML performance of SCCs with an inner GSPC code

    It is possible to obtain accurate bounds on the BER performance of linear codes

under ML decoding that depend on the code structure entirely via the multiplicities of codewords of given input and output weights. Additional techniques, such

    as a uniform interleaver analysis, offer approximations to the codeword weight

    spectrum of an SCC based entirely on the weight spectra of the inner and outer


code [3]. In this section we will apply these techniques to SCCs with an inner GSPC code.

Figure 4.3. Trellis for soft-output decoding of the generalized parity check code.

Let A^O(W, H) denote the input-output weight enumerating function (IOWEF) of the outer code, defined as

    A^O(W, H) = ∑_{w=0}^{∞} ∑_{h=0}^{∞} A^O_{w,h} W^w H^h,

where A^O_{w,h} denotes the number of codewords with input weight w and output weight h. By analogy, let A^I(H, X) and A^C(W, X) denote the IOWEFs of the inner code and the SCC, respectively. If the IOWEF of the SCC is known, the BER under ML decoding is upper-bounded by

    Pb(Eb/N0) ≤ ∑_{x=0}^{∞} ∑_{w=1}^{∞} w A^C_{w,x} Q(√(2 x R_O Eb/N0)).

The exact computation of A^C(W, X), however, is infeasible for most practical codes. Nevertheless, approximations to A^C(W, X) can still yield meaningful bounds, as long as they accurately predict the multiplicities of the low-weight codewords. Below we obtain two such approximations using different assumptions about the


    interleaver: an idealized interleaver and a uniform interleaver. Both approaches

    share the convenient property of depending on the structure of the outer code

    only via its weight enumerating function.

    4.3.1 An idealized interleaver

    Consider an interleaver that is capable of spreading every codeword of the outer

    code in such a way that every bit equal to one is mapped to a separate block of

    the inner code. Of course, except for pathological cases, such an interleaver would

    be impossible to construct. However, it is feasible to construct one that spreads

    out all the low-weight codewords that determine the performance of the SCC for

    medium and large Eb/N0. Hence, this idealized interleaver can provide a useful

    asymptotic approximation to performance for very large interleaver sizes.

When a (K, L) GSPC code is used to encode an input sequence with a single u_n = 1 and all remaining u_k = 0, k ≠ n, the resulting codeword has Hamming weight L if n ≤ L and L + 2 if n > L. Assuming that on average a fraction L/K of the input K-tuples will have n ≤ L, we can say that an outer codeword with weight h will result in a concatenated codeword with weight h(L + 2(K − L)/K). Hence

    Pb(Eb/N0) ≤ ∑_{h=0}^{∞} ∑_{w=1}^{∞} w A^O_{w,h} Q(√(2 h (L + 2(K − L)/K) R_O Eb/N0)).

    An example where a practical interleaver might provide performance close to the

    idealized interleaver is an SCC with a convolutional outer code and an S-random or

    linear interleaver, since the low-weight codewords are usually generated by single

    error events.
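As a numerical illustration (our sketch, not from the dissertation; it assumes the standard BPSK union-bound term Q(√(2 d R_O Eb/N0)) used in the reconstruction above), the idealized-interleaver bound can be evaluated from a truncated outer weight spectrum:

    from math import sqrt, erfc

    def Q(z):
        """Gaussian tail probability Q(z)."""
        return 0.5 * erfc(z / sqrt(2.0))

    def idealized_bound(A_outer, K, L, R_O, EbN0):
        """Idealized-interleaver BER bound for an SCC with an inner (K, L) GSPC.

        A_outer: dict {(w, h): multiplicity} holding the dominant low-weight
        terms of the outer IOWEF; EbN0 is the linear (not dB) Eb/N0 ratio.
        """
        spread = L + 2.0 * (K - L) / K  # average inner output weight per 1-bit
        return sum(w * A * Q(sqrt(2.0 * h * spread * R_O * EbN0))
                   for (w, h), A in A_outer.items())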


    4.3.2 A uniform interleaver

The uniform interleaver is a probabilistic device that maps any input sequence to each of its possible permutations with equal probability. For an interleaver size of NK (see Fig. 4.1), a codeword of weight h generated by the outer encoder can be permuted into C(NK, h) distinct and equiprobable bit sequences, where C(n, k) denotes the binomial coefficient. Then the IOWEF coefficients of the SCC can be computed as

    A^C_{w,x} = ∑_{h=0}^{∞} C(NK, h)^{−1} A^O_{w,h} A^I(h, x).

In our case the inner code, of input length NK, consists of N independent blocks of length K. The IOWEF of the N block codes can be obtained from the IOWEF A^P(H, X) for a single block as

    A^I(H, X) = [A^P(H, X)]^N.

A^P(H, X) for a single (K, L) GSPC code can be obtained from the conditional weight enumerating functions A^P(h, X), which we define separately for odd and even h. For an input sequence with even weight h we clearly have

    A^P_even(h, X) = C(K, h) X^h.

The situation is more complicated for odd h, since the output weight depends on how many of the 1 bits are placed in the first L positions. In this case we obtain

    A^P_odd(h, X) = ∑_{i=max(0, L+h−K)}^{min(h, L)} C(L, i) C(K − L, h − i) X^{L+h+1−2i}.


The IOWEF A^P(H, X) then equals

    A^P(H, X) = ∑_{h=0}^{∞} H^{2h} A^P_even(2h, X) + ∑_{h=0}^{∞} H^{2h+1} A^P_odd(2h + 1, X).

The resulting A^P(H, X) can then be used in the previous equations to yield the desired bound.
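The single-block enumerator is simple to compute exactly. A short Python sketch (ours) that tabulates the coefficients A^P_{h,x} directly from the two formulas above:

    from math import comb

    def gspc_block_iowef(K, L):
        """Coefficients {(h, x): A^P_{h,x}} of the single-block GSPC IOWEF."""
        A = {}
        for h in range(K + 1):
            if h % 2 == 0:
                A[(h, h)] = comb(K, h)       # even h: output weight x = h
            else:
                lo, hi = max(0, L + h - K), min(h, L)
                for i in range(lo, hi + 1):  # i ones among the first L bits
                    x = L + h + 1 - 2 * i
                    A[(h, x)] = A.get((h, x), 0) + comb(L, i) * comb(K - L, h - i)
        return A

Raising the corresponding two-variable polynomial to the N-th power (a 2-D convolution of coefficient arrays) gives A^I(H, X), which, combined with the outer IOWEF and the C(NK, h)^{−1} normalization, yields the coefficients A^C_{w,x} used in the bound.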

    4.3.3 Comparison with simulation results

    In Fig. 4.4 we present a comparison between the actual BER curves and

    the bounds obtained using idealized and uniform interleavers, and we note good

    agreement between the theoretical and experimental results. The SCC consists

    of a rate 1/2, 2-state convolutional outer code (CC) with generator polynomial

    G(D) = [1, 1 + D], a 4096-bit dithered relative prime (DRP) interleaver [6], and

    an inner GSPC code with K = 8 and L = 0, 2, 4, 6, 8.

    4.4 EXIT chart analysis for GSPC codes

    As a complement to the error floor analysis of the previous section, the EXIT

    charts proposed by ten Brink [35] can be used to predict the convergence threshold

    of iterative decoding, and hence the performance in the waterfall region. EXIT

    charts characterize the average mutual information between the information or

coded bits and their extrinsic L-values at the output of a SISO decoder as a function of the mutual information carried by the a priori L-values. When these mutual information transfer curves for the inner and outer codes are plotted against each other for a certain value of Eb/N0, the lowest value of Eb/N0 for which the curves


do not cross corresponds to the convergence threshold of the iterative decoder.

Figure 4.4. Comparison of simulation results and BER bounds for the 2-state, rate 1/2 outer convolutional code.

The EXIT curves for the GSPC code for some selected values of K and L are plotted in Fig. 4.5. The relation between the curve shapes and the code parameters can be summarized in the following observations:

- The value of I_E(I_A = 1) depends directly on the parameter L (the higher L, the higher I_E(I_A = 1)).

- The value of I_E(I_A = 0) also depends on L (the higher L, the lower I_E(I_A = 0)).

- The steepness of the I_E(I_A) curve in the neighborhood of I_A = 1 depends on K (the higher K, the steeper the curve).


Figure 4.5. Dependence between the GSPC code parameters and the EXIT curve shape.

    4.5 Design examples

    In this section we present two SCCs with an inner GSPC code that were

    designed using the following steps. First, an outer code with a good tradeoff

    between minimum distance and SISO decoding complexity was selected. The

    second step involved choosing the parameter K of the inner GSPC code. Smaller

    K allows for larger N for the same interleaver size, which in turn improves the

    process of spreading low-weight outer codewords into separate inner code blocks,

    making the system perform closer to the idealized interleaver bound. On the other

    hand, too small a value of K leads to a reduced overall code rate and limits the

range of the parameter L that can be selected. We found that the useful values for K are in the range of 8 to 24. Finally, the value of L was chosen so that the outer

    and inner EXIT charts match as closely as possible.

The parameters of the two SCCs are as follows. We selected the outer codes to be the rate 1/2 convolutional codes G_SCC1 = [27_o, 31_o] and G_SCC2 = [561_o, 753_o] (octal notation), which are the 16-state and 256-state codes with the largest possible free distance [19]. The inner codes that provided the best match turned out to be the (K, L) = (12, 8)


and (K, L) = (16, 8) GSPC codes, respectively. Furthermore, we used a 4032-bit dithered relative prime (DRP) interleaver [6]. The EXIT charts illustrating the degree of matching achieved are plotted in Fig. 4.6, while the corresponding BER curves and the uniform interleaver bounds are plotted in Fig. 4.7. As can be observed, both schemes perform about 1.5 dB away from the Shannon limit at a BER of 10^−6 with 16 iterations and about 1.3 dB away with 32 iterations, and according to the uniform interleaver bounds, both codes show potentially very low error floors at BERs of 10^−10 (SCC1) and 10^−15 (SCC2). In addition, the EXIT charts in Fig. 4.6 indicate that waterfall region performance within 0.3 dB of the Shannon limit can be obtained with larger interleaver sizes.

    4.6 Conclusions

    In this paper we showed that SCCs with simple inner block codes can achieve

    very good performance in both the waterfall and error floor regions of the BER

curve. We proposed a simple block inner code providing a large value of d_1^I and a SISO algorithm needed for iterative decoding. We examined their properties using both a uniform interleaver analysis and EXIT charts, and two example SCCs exhibiting very promising performance were designed.


Figure 4.6. EXIT charts for a) SCC 1 with the 16-state outer code, b) SCC 2 with the 256-state outer code.


[Figure 4.7, panel a): bit error rate vs. Eb/N0 for SCC 1 (16-state CC + (12, 8) GSPC), showing curves for 1, 2, 4, 8, 16, and 32 iterations, the uniform interleaver bound, and the Shannon limit for BPSK at R = 6/13 (Eb/N0 ≈ 0.0 dB); panel b) shows the corresponding results for SCC 2.]