A. Balatsoukas-Stimming*, N. Preyss*, A. Cevrero*, A. Burg*, C. Roth†
*Department of Electrical Engineering, EPFL, Lausanne, Switzerland
†Integrated Systems Laboratory, ETHZ, Zurich, Switzerland
E-mail: {alexios.balatsoukas, nicholas.preyss, alessandro.cevrero, andreas.burg}@epfl.ch, [email protected]
A Parallelized Layered QC-LDPC Decoder for IEEE 802.11ad
IEEE 802.11ad: Multi-gigabit throughput for wireless LAN
More than 10x higher throughput offers new wireless opportunities:
• Raw HD streaming
• Instant media library sync
• Ultra-high throughput IP links
IEEE 802.11ad requires high-speed baseband signal processing at low power consumption
Challenges:
• Complex channel conditions due to high delay spread
• Large device variations of analog front-ends
• Gbit/s bit rates (1.54 Gbps mandatory, 3.08 and 6.16 Gbps optional)
Layered Decoding Schedule
• Performance is highly affected by the message-passing schedule
• Flooding schedule: all variable-to-check messages are updated, then all check-to-variable messages are updated. Highly parallelizable, slow convergence
• Layered schedule: variable-to-check and check-to-variable messages are updated for the 1st check node, then the 2nd, etc. Fast convergence, low parallelism
• Twofold reduction in number of iterations ≈ twofold reduction in energy consumption
• But: very challenging to achieve multi-gigabit throughput
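The layered schedule described above can be illustrated with a minimal sketch. It is a toy floating-point min-sum decoder, not the decoder presented here: the function name `layered_min_sum`, the list-of-indices code representation, and the LLR sign convention (positive means bit 0) are assumptions made for illustration only.

```python
def layered_min_sum(checks, channel_llr, max_iters=5):
    """Toy layered min-sum decoder (illustrative sketch only).

    checks: list of check nodes, each a list of variable indices (degree >= 2).
    channel_llr: channel LLRs; assumed convention: positive means bit 0.
    Returns (hard_decisions, iterations_used).
    """
    llr = list(channel_llr)                        # running posterior LLRs
    c2v = [[0.0] * len(row) for row in checks]     # check-to-variable messages
    hard = [1 if x < 0 else 0 for x in llr]
    for iteration in range(1, max_iters + 1):
        # Layered schedule: finish one check node completely before the next,
        # so later check nodes already see the updated LLRs.
        for c, row in enumerate(checks):
            # Variable-to-check: remove this check's previous contribution.
            v2c = [llr[v] - c2v[c][k] for k, v in enumerate(row)]
            for k, v in enumerate(row):
                others = v2c[:k] + v2c[k + 1:]
                negative = sum(1 for q in others if q < 0) % 2
                magnitude = min(abs(q) for q in others)
                c2v[c][k] = -magnitude if negative else magnitude
                llr[v] = v2c[k] + c2v[c][k]
        hard = [1 if x < 0 else 0 for x in llr]
        # Early termination once every parity check is satisfied.
        if all(sum(hard[v] for v in row) % 2 == 0 for row in checks):
            return hard, iteration
    return hard, max_iters


# Toy (7,4)-style code with three checks; bit 2 is received unreliably.
toy_checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
decoded, used = layered_min_sum(toy_checks, [2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0])
```

A flooding decoder would instead compute all variable-to-check messages from the LLRs of the previous iteration before updating any check node, which is why it parallelizes easily but typically needs more iterations.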
Parallelized Decoder Architecture
802.11ad Channel Coding: QC-LDPC Codes
Application to IEEE 802.11ad requires:
• Very high throughput
• Low power
Highly conflicting requirements!
Solution:
Control Sequence Optimization
• Z=42 and N=16 are fixed by IEEE 802.11ad
• I=5 (number of iterations) is fixed to satisfy QoS requirements
• L (sequence length) can be optimized
Optimization Method
Detailed view of COMB unit
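When two blocks of a row of H are processed simultaneously, each processing unit only sees part of the row, so the partial min-sum results have to be merged. The sketch below shows one plausible way to do that merge. The internal structure of the COMB unit is not given in the text, so the state tuple `(min1, min2, index, parity)` and the functions `partial_min` and `comb` are assumptions for illustration.

```python
def partial_min(messages, offset=0):
    """Scan part of a row; return (min1, min2, index_of_min1, sign_parity).

    offset is the position of messages[0] within the full row (assumed layout).
    """
    min1 = min2 = float("inf")
    index = -1
    parity = 0
    for k, q in enumerate(messages):
        mag = abs(q)
        if mag < min1:
            min1, min2, index = mag, min1, offset + k
        elif mag < min2:
            min2 = mag
        if q < 0:
            parity ^= 1
    return min1, min2, index, parity


def comb(a, b):
    """Merge two partial results into the result for the whole row."""
    a1, a2, a_idx, a_par = a
    b1, b2, b_idx, b_par = b
    if a1 <= b1:
        # Overall minimum is in part a; second minimum is a's runner-up or b's minimum.
        return a1, min(a2, b1), a_idx, a_par ^ b_par
    return b1, min(b2, a1), b_idx, a_par ^ b_par


row = [2.0, -0.5, 3.0, 1.0, -4.0, 0.75]
merged = comb(partial_min(row[:3]), partial_min(row[3:], offset=3))
```

Under these assumptions the merged state matches what a single sequential scan of the full row would produce, which is the property needed for the doubled datapath to behave like the original one.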
• Layered decoding & early termination → low power
• Additional parallelization → high throughput
• Re-arrange rows and columns of parity-check matrix to minimize pipeline stalls → higher throughput
• (Almost) free lunch: only LLR access order changes
• Parallelization overhead: ~10%
• Average length reduction: ~13%
• Reduction in max. length: ~13%
• Result: 3.12 Gbps min. throughput
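The effect of reordering on pipeline stalls can be illustrated with a toy model. This is not the optimization method used for the results above: the stall model (an updated block column becomes readable `depth` cycles after it was read), the greedy per-layer search, and the names `sequence_length` and `optimize_order` are all assumptions made for illustration.

```python
from itertools import permutations


def sequence_length(layers, depth):
    """Cycles needed to issue all block reads under a toy hazard model.

    layers: list of layers, each a list of block-column indices in read order.
    depth: assumed pipeline latency between reading a block column and its
           updated LLRs being available again.
    """
    ready = {}   # block column -> earliest cycle at which it may be read again
    cycle = 0
    for layer in layers:
        for col in layer:
            cycle = max(cycle, ready.get(col, 0))   # stall on a pending update
            ready[col] = cycle + depth
            cycle += 1
    return cycle


def optimize_order(layers, depth):
    """Greedy heuristic: pick, layer by layer, the read order with fewest cycles."""
    chosen = []
    for layer in layers:
        best = min(permutations(layer),
                   key=lambda p: sequence_length(chosen + [list(p)], depth))
        chosen.append(list(best))
    return chosen


toy_layers = [[0, 1, 2, 3], [3, 4, 5, 0]]
before = sequence_length(toy_layers, depth=3)
after = sequence_length(optimize_order(toy_layers, depth=3), depth=3)
```

In this toy example the second layer initially reads block column 3 immediately after the first layer wrote it, which forces stalls. Reading the independent columns first hides the latency, and only the access order has changed.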
Conclusion
Low-power layered LDPC decoder is feasible when multi-gigabit throughput is required
Careful assignment of processing units to parity-check matrix blocks leads to very efficient parallelization
• Doubly parallelized architecture:
1. Two blocks of every row of H processed simultaneously
2. COMB unit combines partial results to ensure proper operation
• Processing units and shifters doubled
• No additional memory required
• Simple routing preserved
• Throughput:
Synthesis Results
• Parity-check matrix
1. Consists of 42x42 cyclic permutation matrices
2. Illustrates parity constraints imposed on bits by the code
3. Is used to decode codewords via Min-Sum (MS) message-passing
4. Represents graph in which columns are variable nodes, rows are check nodes
5. Various coding rates are used depending on channel conditions
Parity-check matrix of rate ½ code
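How a matrix of cyclic permutation blocks is assembled can be sketched as follows. The base matrix here is a made-up toy, not one of the IEEE 802.11ad matrices, and both the shift direction and the use of -1 for an all-zero block are assumed conventions.

```python
def expand_qc(base, z):
    """Expand a base matrix of cyclic shifts into a binary parity-check matrix.

    base[i][j] >= 0: z x z identity matrix cyclically shifted by that amount.
    base[i][j] < 0:  z x z all-zero block (assumed convention).
    """
    rows, cols = len(base), len(base[0])
    h = [[0] * (cols * z) for _ in range(rows * z)]
    for i in range(rows):
        for j in range(cols):
            shift = base[i][j]
            if shift < 0:
                continue
            for r in range(z):
                h[i * z + r][j * z + (r + shift) % z] = 1
    return h


# Hypothetical 2 x 4 base matrix, expanded with Z = 42.
toy_base = [[0, 1, -1, 2],
            [3, -1, 0, 5]]
H = expand_qc(toy_base, 42)
```

Because every block is either zero or a permutation matrix, the Z rows within one block row never share a variable node, which is what allows them to be processed independently.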
Reference Architecture
• MIN and SEL units perform basic functions of MS decoding on Z independent rows of H simultaneously
• Parity-check matrix blocks are processed serially in a pipeline
• Memory reads/writes dictated by control sequence, data dependencies avoided by pipeline stalling
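The min-sum check-node update for a single row can be split into a scanning step and a selection step, sketched below. The exact division of work between the MIN and SEL units is not spelled out in the text, so the functions `min_unit` and `sel_unit` and the state they exchange are assumptions; a hardware implementation would run Z such rows side by side.

```python
def min_unit(v2c):
    """Scan the variable-to-check messages of one row serially.

    Returns (min1, min2, index_of_min1, sign_parity).
    """
    min1 = min2 = float("inf")
    index = -1
    parity = 0
    for k, q in enumerate(v2c):
        mag = abs(q)
        if mag < min1:
            min1, min2, index = mag, min1, k
        elif mag < min2:
            min2 = mag
        if q < 0:
            parity ^= 1
    return min1, min2, index, parity


def sel_unit(state, k, q):
    """Select the check-to-variable message for position k of the row."""
    min1, min2, index, parity = state
    # The position holding the minimum receives the second minimum instead.
    magnitude = min2 if k == index else min1
    # Exclude the message's own sign from the overall sign parity.
    negative = parity ^ (1 if q < 0 else 0)
    return -magnitude if negative else magnitude


row_msgs = [2.0, -0.5, 3.0, 1.0]
state = min_unit(row_msgs)
c2v = [sel_unit(state, k, q) for k, q in enumerate(row_msgs)]
```

Storing only two minima, one index, and one parity bit per row is what makes serial processing of the blocks practical: the outgoing messages are reconstructed on demand instead of being stored individually.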