A. Balatsoukas-Stimming*, N. Preyss*, A. Cevrero*, A. Burg*, C. Roth†

*Department of Electrical Engineering, EPFL, Lausanne, Switzerland
†Integrated Systems Laboratory, ETHZ, Zurich, Switzerland
E-mail: {alexios.balatsoukas, nicholas.preyss, alessandro.cevrero, andreas.burg}@epfl.ch, [email protected]

A Parallelized Layered QC-LDPC Decoder for IEEE 802.11ad

IEEE 802.11ad: Multi-gigabit throughput for wireless LAN


>10× higher throughput offers new wireless opportunities:
• Raw HD streaming
• Instant media library sync
• Ultra-high throughput IP links

IEEE 802.11ad requires high-speed baseband signal processing at low power consumption

Challenges:
• Complex channel conditions due to high delay spread
• Large device variations of analog front-ends
• Gbit/s bit rates (1.54 Gbps mandatory, 3.08 and 6.16 Gbps optional)

Layered Decoding Schedule

• Performance highly affected by message-passing schedule
• Flooding Schedule: all variable-to-check messages updated, then all check-to-variable messages updated. Highly parallelizable, slow convergence
• Layered Schedule: variable-to-check and check-to-variable messages for 1st check node, then 2nd, etc. Fast convergence, low parallelism

• Twofold reduction in number of iterations ≈ twofold reduction in energy consumption

• But: very challenging to achieve multi-gigabit throughput
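The layered schedule described above can be illustrated with a small sketch. This is not the poster's decoder: it is a plain software model of layered Min-Sum decoding with early termination, and the function name `layered_min_sum`, the toy (7,4) Hamming parity checks, and the input LLR values are all assumptions chosen for illustration.

```python
# Minimal sketch of layered Min-Sum decoding (illustrative, not the poster's design).
# Each check node is one "layer": its messages are updated and the posterior LLRs
# are refreshed immediately, so later checks already see the improved values.

# Toy parity checks of a (7,4) Hamming code (assumed example, not an 802.11ad code).
HAMMING_CHECKS = [
    [0, 1, 3, 4],
    [0, 2, 3, 5],
    [1, 2, 3, 6],
]


def layered_min_sum(checks, channel_llr, max_iters=5):
    """Return (hard_decisions, iterations_used)."""
    llr = list(channel_llr)                       # posterior LLR per variable node
    r = [{v: 0.0 for v in vs} for vs in checks]   # check-to-variable messages
    bits = [1 if x < 0 else 0 for x in llr]
    for it in range(1, max_iters + 1):
        for c, vs in enumerate(checks):
            # Variable-to-check messages: remove this check's old contribution.
            q = {v: llr[v] - r[c][v] for v in vs}
            for v in vs:
                others = [q[u] for u in vs if u != v]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                r[c][v] = sign * min(abs(x) for x in others)
                llr[v] = q[v] + r[c][v]           # immediate posterior update
        bits = [1 if x < 0 else 0 for x in llr]
        # Early termination: stop as soon as all parity checks are satisfied.
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in checks):
            return bits, it
    return bits, max_iters


if __name__ == "__main__":
    # All-zero codeword with one unreliable, wrongly signed bit (index 0).
    print(layered_min_sum(HAMMING_CHECKS, [-1, 2, 2, 2, 2, 2, 2]))
```

In this toy case the single wrong bit is corrected within the first pass over the three checks, at which point the early-termination test stops the decoder.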

Parallelized Decoder Architecture

802.11ad Channel Coding: QC-LDPC Codes

Application to IEEE 802.11ad requires:
• Very high throughput
• Low power

Highly conflicting requirements!

Solution:

Control Sequence Optimization

• Z=42 and N=16 are fixed by IEEE 802.11ad
• I=5 (number of iterations) is fixed to satisfy QoS requirements
• L (sequence length) can be optimized
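To see why L is the quantity worth optimizing, consider a rough throughput model. The model is an assumption, not taken from the poster: one codeword of N·Z bits is decoded every I·L clock cycles, where the clock frequency `f_clk_hz` is a hypothetical parameter. The function names are likewise illustrative.

```python
# Rough throughput model (assumption): a codeword of n*z bits leaves the decoder
# every iterations*seq_len clock cycles. f_clk_hz is a hypothetical parameter.

def codeword_throughput_bps(f_clk_hz, seq_len, z=42, n=16, iterations=5):
    """Codeword bits per second under the assumed model."""
    bits_per_codeword = n * z
    cycles_per_codeword = iterations * seq_len
    return bits_per_codeword * f_clk_hz / cycles_per_codeword


def relative_gain(length_reduction):
    """Throughput ratio after shortening the sequence by the given fraction,
    assuming throughput is inversely proportional to the sequence length."""
    return 1.0 / (1.0 - length_reduction)


if __name__ == "__main__":
    print(round(relative_gain(0.13), 3))
```

Under this assumed model, Z, N and I only scale the result, so at a given clock frequency any shortening of the control sequence translates directly into throughput: a sequence that is 13% shorter would give roughly 15% more throughput.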

Optimization Method

Detailed view of COMB unit
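The figure is not reproduced in this text. As a hedged illustration of what a combining step has to achieve in a Min-Sum decoder, the sketch below merges two partial check-node summaries into one. The summary format (smallest magnitude, second-smallest magnitude, index of the smallest, sign product) and the function name `comb` are assumptions, not details confirmed by the poster.

```python
# Sketch of merging two partial Min-Sum summaries (illustrative assumption).
# Each summary is (min1, min2, idx, sign): the smallest and second-smallest
# magnitudes seen so far, the global index of the smallest, and the sign product.

def comb(a, b):
    a_min1, a_min2, a_idx, a_sign = a
    b_min1, b_min2, b_idx, b_sign = b
    if a_min1 <= b_min1:
        min1, idx = a_min1, a_idx
        min2 = min(a_min2, b_min1)   # runner-up is either a's second or b's first
    else:
        min1, idx = b_min1, b_idx
        min2 = min(b_min2, a_min1)
    return min1, min2, idx, a_sign * b_sign


if __name__ == "__main__":
    print(comb((1, 3, 0, -1), (2, 5, 4, 1)))
```

The merged summary is the same as the one obtained by scanning all inputs of both halves in a single pass, which is what lets two blocks of a row be processed at once without changing the decoding result.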

• Layered decoding & early termination → low power
• Additional parallelization → high throughput

• Re-arrange rows and columns of parity-check matrix to minimize pipeline stalls → higher throughput

• (Almost) free lunch: only LLR access order changes
• Parallelization overhead: ~10%
• Average length reduction: ~13%
• Reduction in max. length: ~13%
• Result: 3.12 Gbps min. throughput

Conclusion

Low-power layered LDPC decoder is feasible when multi-gigabit throughput is required

Careful assignment of processing units to parity-check matrix blocks leads to very efficient parallelization


• Doubly parallelized architecture:
1. Two blocks of every row of H processed simultaneously
2. COMB unit combines partial results to ensure proper operation

• Processing units and shifters doubled
• No additional memory required
• Simple routing preserved

• Throughput:

Synthesis Results

• Parity-check matrix
1. Consists of 42×42 cyclic permutation matrices
2. Illustrates parity constraints imposed on bits by the code
3. Is used to decode codewords via Min-Sum (MS) message-passing
4. Represents graph in which columns are variable nodes, rows are check nodes
5. Various coding rates are used depending on channel conditions

Figure: Parity-check matrix of rate ½ code
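The block structure of such a matrix can be sketched as follows. The base matrix used here is a made-up toy example, not one of the IEEE 802.11ad matrices, and both the shift direction and the use of -1 for an all-zero block are conventions assumed for illustration. The function name `expand_qc` is likewise hypothetical.

```python
# Sketch: expand a base matrix of cyclic shifts into a binary parity-check matrix.
# Entry s >= 0 stands for a z-by-z identity matrix cyclically shifted by s columns;
# entry -1 stands for a z-by-z all-zero block (assumed conventions).

def expand_qc(base, z):
    rows = len(base) * z
    cols = len(base[0]) * z
    h = [[0] * cols for _ in range(rows)]
    for bi, block_row in enumerate(base):
        for bj, shift in enumerate(block_row):
            if shift < 0:
                continue                          # all-zero block
            for r in range(z):
                h[bi * z + r][bj * z + (r + shift) % z] = 1
    return h


if __name__ == "__main__":
    toy_base = [[0, 1],
                [-1, 2]]                          # hypothetical toy base matrix
    for row in expand_qc(toy_base, 3):
        print(row)
```

With Z=42 and N=16 block columns, as used in IEEE 802.11ad, the same expansion yields a matrix with 16·42 = 672 columns, one per codeword bit.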

Reference Architecture

• MIN and SEL units perform basic functions of MS decoding on Z independent rows of H simultaneously
• Parity-check matrix blocks are processed serially in a pipeline
• Memory reads/writes dictated by control sequence, data dependencies avoided by pipeline stalling
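As a hedged illustration of the Min-Sum check-node update that such units carry out, the sketch below splits the work into a minimum-tracking step and a selection step. The exact division of labor between the MIN and SEL units is not spelled out in this text, so the split shown here, and the names `min_unit` and `sel_unit`, are assumptions. The sketch handles a single row, whereas the hardware operates on Z rows in parallel.

```python
# Sketch of a Min-Sum check-node update for one row of H (illustrative assumption).

def min_unit(q):
    """Scan the inputs serially and track the two smallest magnitudes,
    the position of the smallest, and the product of all signs."""
    min1 = min2 = float("inf")
    idx = -1
    sign = 1
    for i, x in enumerate(q):
        mag = abs(x)
        if x < 0:
            sign = -sign
        if mag < min1:
            min2, min1, idx = min1, mag, i
        elif mag < min2:
            min2 = mag
    return min1, min2, idx, sign


def sel_unit(q, summary):
    """For each edge, select the minimum over all *other* edges and apply
    the sign product with the edge's own sign removed."""
    min1, min2, idx, sign = summary
    out = []
    for i, x in enumerate(q):
        mag = min2 if i == idx else min1
        own_sign = -1 if x < 0 else 1
        out.append(sign * own_sign * mag)
    return out


if __name__ == "__main__":
    q = [-1, 2, 2, 2]
    print(sel_unit(q, min_unit(q)))
```

Keeping only the two smallest magnitudes is sufficient because every outgoing message equals the smallest magnitude, except on the edge that supplied it, which receives the second smallest.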