Tinoosh Mohsenin and Bevan M. Baas
VLSI Computation Lab, ECE Department, University of California, Davis
Split-Row: A Reduced Complexity, High Throughput Low Density Parity Check (LDPC) Decoder Architecture
Outline
  Introduction to LDPC Codes
  Split-Row Decoder Algorithm
  Error Performance Comparison
  Decoder Implementation Results
  Conclusion
Error Correction in Communication Systems
Error correction is widely used in most communication systems.
[Figure: Binary information → Encoder (redundancy added) → Encoded information → Noise → Noisy information → Decoder (error detection and correction) → Corrected information]
LDPC Codes Applications
Standards:
  10 Gigabit Ethernet (10GBASE-T): 2006
  Digital Video Broadcasting (DVB-S2): 2005
  Next generation of WiFi and WiMAX
Problems with current LDPC decoders:
  Insufficient memory bandwidth
  High interconnect complexity
[www.ieee802.org/3/an/]
LDPC Coding
[Figure: Transmitter: an image is encoded by adding parity bits (encoded image). Noisy channel. Receiver: the received image is decoded iteratively (iteration 1 and iteration 14 shown) into the decoded image. Modified images from [MacKay 2001].]
Performs row and column operations iteratively.
100001010
010100001
001010100
001100010
100010001
010001100
[Figure labels: "Row Processing" along the rows of the matrix, "Column Processing" along its columns]
LDPC Decoding: Message Passing Algorithm
[Figure: decoding flow. Received information from channel → row processing → column processing → error correction → parity check. Row processing passes α messages to column processing, and column processing passes β messages back to row processing.]
Serial Decoders
One or a few row and column processing units.
Features:
  Simple
  Small area
  Small number of memories
Disadvantages:
  Low memory bandwidth
  Low throughput: 100 Kbps to 10 Mbps
[Figure: one memory (Mem) connected to a row processor (Row) and a column processor (Col)]
Full Parallel Decoders
Row and column processors are directly mapped according to the parity check matrix
High throughput
Disadvantages:
  Large circuit area
  High interconnect complexity
Example: 2048-bit 10GBASE-T code
  Row weight = 32, column weight = 6, quantization = 5 bits
  139 mm² in 0.18 µm CMOS
  122,000 long inter-processor wires
  1.3 Gbps
[Figure: row processors Row1 to Row384 wired to column processors Col1 to Col2048, with wire counts 5 x 384 x 32 = 61440 and 5 x 2048 x 6 = 61440]
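The two wire counts shown for this example can be reproduced with a short calculation. This is only a sketch: the helper name `message_wires` is hypothetical, and reading the first product as the row-processor side and the second as the column-processor side is an assumption rather than something the slide states.

```python
def message_wires(processors, weight, bits):
    """Wires needed if each processor drives `weight` messages of `bits` bits each."""
    return processors * weight * bits

# 10GBASE-T example: 384 rows of weight 32, 2048 columns of weight 6, 5-bit messages.
row_side = message_wires(384, 32, 5)    # assumed: row-processor outputs
col_side = message_wires(2048, 6, 5)    # assumed: column-processor outputs
total = row_side + col_side

print(row_side, col_side, total)
```

Under these assumptions the two directions together come to 122,880 wires, which is close to the roughly 122,000 long inter-processor wires quoted for this decoder.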
Key Features of Split-Row Decoder
Row processing (dominates decoder complexity):
  Increased parallelism
  Reduced number of memory accesses
  Reduced processor complexity
Results:
  Smaller decoder area and higher utilization
  Lower interconnect complexity
  Higher throughput
  Simpler hardware implementation
Standard vs. Split-Row Decoder
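[Figure: Standard decoder: one memory (Mem) with row and column processors covering N columns, row weight = Wr. Split-Row decoder: two halves, (MemA, RowA, ColA) and (MemB, RowB, ColB), each covering N/2 columns with row weight = Wr/2 and linked by the Sign A and Sign B signals.]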
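Split-Row Algorithm: Mathematical View
By normalizing the α values with a scale factor S < 1, the error performance of the Split-Row decoder is improved:
$$\alpha_{ij}^{Split} = S \times \prod_{j':\,h_{ij'}=1,\ j'\neq j} \mathrm{sign}(\beta_{ij'}) \times \min_{j':\,h_{ij'}^{Split}=1,\ j'\neq j} |\beta_{ij'}|$$
The magnitude part of the row processor output α is larger for the Split-Row decoder.
Standard MinSum:
$$\alpha_{ij} = \prod_{j':\,h_{ij'}=1,\ j'\neq j} \mathrm{sign}(\beta_{ij'}) \times \min_{j':\,h_{ij'}=1,\ j'\neq j} |\beta_{ij'}|$$
Split-Row:
$$\alpha_{ij}^{Split} = \prod_{j':\,h_{ij'}=1,\ j'\neq j} \mathrm{sign}(\beta_{ij'}) \times \min_{j':\,h_{ij'}^{Split}=1,\ j'\neq j} |\beta_{ij'}|$$
$$|\alpha_{ij}^{Split}| \geq |\alpha_{ij}|$$
In each expression the product is the sign part and the minimum is the magnitude part. $h^{Split}$ denotes the half of H that contains column j, so the Split-Row minimum is taken over that half of the row only.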
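The difference between the two row updates can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the row is split into two equal halves, each half takes the minimum over its own inputs only, and the sign of the other half arrives as a single value (the Sign A and Sign B signals in the figures).

```python
def sign(x):
    return -1 if x < 0 else 1

def sign_product(values):
    s = 1
    for v in values:
        s *= sign(v)
    return s

def minsum_row(betas):
    """Standard MinSum: sign and minimum are taken over the whole row, excluding j."""
    out = []
    for j in range(len(betas)):
        others = betas[:j] + betas[j + 1:]
        out.append(sign_product(others) * min(abs(b) for b in others))
    return out

def split_row(betas_a, betas_b, scale):
    """Split-Row sketch: minimum over the own half only, sign over the whole row."""
    sign_a = sign_product(betas_a)   # sent from half A to half B
    sign_b = sign_product(betas_b)   # sent from half B to half A

    def half(own, other_sign):
        out = []
        for j in range(len(own)):
            others = own[:j] + own[j + 1:]
            s = sign_product(others) * other_sign
            out.append(scale * s * min(abs(b) for b in others))
        return out

    return half(betas_a, sign_b), half(betas_b, sign_a)

# One row of weight 6, split into two halves of weight 3 (values are made up).
left = [2.0, -0.5, 1.5]
right = [-3.0, 0.25, 1.0]
standard = minsum_row(left + right)
split_a, split_b = split_row(left, right, scale=1.0)
scaled_a, scaled_b = split_row(left, right, scale=0.5)
```

With `scale=1.0` every Split-Row output has the same sign as the standard output and a magnitude at least as large, which is the overestimate that the scale factor S < 1 is meant to compensate.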
Bit Error Rate Performance Comparison
Code length: 1536 bits
Message length: 1155 bits
Row weight: 16
Column weight: 4
No. of iterations: 15
MS: MinSum
MS Split-Row: MinSum Split-Row
S: Scale factor
[Figure: bit error probability (10^-8 to 10^-1) versus Eb/N0 (0 to 7 dB). Curves: MS, S=0.6; MS Split-Row, S=0.3; MS Split-Row, S=0.4; MS Split-Row, S=0.5; MS Split-Row, S=1.0. A gap of 0.6 dB is marked on the plot.]
Bit Error Rate Performance Comparison
Code length: 2048 bits
Message length: 1723 bits
Row weight: 32
Column weight: 6
No. of iterations: 15
MS: MinSum
MS Split-Row: MinSum Split-Row
S: Scale factor
[Figure: bit error probability (10^-8 to 10^-1) versus Eb/N0 (0 to 7 dB). Curves: MS, S=0.6; MS Split-Row, S=0.3; MS Split-Row, S=0.4; MS Split-Row, S=0.5; MS Split-Row, S=1.0. A gap of 0.3 dB is marked on the plot.]
A Full-Parallel Decoder Implementation
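LDPC code example:
  Code length = 1536 bits
  Message length = 770 bits
  Row weight = 6
  Column weight = 3
In the Split-Row decoder:
  The wires between the two halves are 3% of the total wires.
  Row processors in each half are 2.7 times smaller.
  Each row processor in each half is connected to only 3 column processors.
[Figure: parity check matrix with 1536 columns (row weight = 6) and 768 rows (column weight = 3), handled by one Row+Col block in the standard decoder and by Row+Col Left and Row+Col Right blocks in the Split-Row decoder. Standard: one row processor (Row) connected to Col1 to Col6. Split-Row: RowA connected to Col1 to Col3 and RowB connected to Col4 to Col6, linked by Sign A and Sign B.]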
Full Parallel Decoder Architecture
0.18 µm CMOS technology, 6 metal layers
Split-Row: each half includes 768 row processors and 768 column processors
[Figure: chip layouts. Standard MinSum: 1536 input registers, 1536 row + 1536 column processors, 1536 output registers; 4.7 mm per side. Split-Row: 1536 input registers, Row+Col processors left and right linked by SignA 0 to SignA 767 and SignB 0 to SignB 767, 1536 output registers; 4.1 mm per side.]
Split-Row vs. Standard Decoder
1536-bit (3,6) quasi-cyclic LDPC code.
The number of quantization bits is set to 5 bits per message.
For the throughput computation, the number of decoding iterations is set to 15.
Reported numbers are based on chip implementation results in 0.18 µm.

                       Avg. wire  Chip    Clk     Throughput  CAD tool P&R
                       length     size    freq.               Run time  Req. mem
                       (mm)       (mm²)   (MHz)   (Gbps)      (mins)    (GB)
Standard MinSum        0.224      22.1    32      3.2         320       3.9
Split-Row (this work)  0.142      16.8    53      5.4         193       2.3
Improvement            1.58×      1.3×    1.7×    1.7×        1.65×     1.7×
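The improvement row can be checked against the two rows above it. The snippet below is a small sketch: the dictionary keys are hypothetical labels for the table columns, and the ratio is taken as Split-Row over standard for clock frequency and throughput, and as standard over Split-Row for the other columns, so that every value above 1 means Split-Row is better.

```python
standard = {"wire_mm": 0.224, "area_mm2": 22.1, "clk_mhz": 32,
            "gbps": 3.2, "pnr_min": 320, "mem_gb": 3.9}
split_row = {"wire_mm": 0.142, "area_mm2": 16.8, "clk_mhz": 53,
             "gbps": 5.4, "pnr_min": 193, "mem_gb": 2.3}

# Columns where a larger number is the better result.
higher_is_better = {"clk_mhz", "gbps"}

def improvement(key):
    """Ratio oriented so that a value above 1 favours Split-Row."""
    a, b = standard[key], split_row[key]
    return b / a if key in higher_is_better else a / b

ratios = {key: improvement(key) for key in standard}
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

The computed ratios agree with the reported improvement figures to within rounding.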
Conclusion
The Split-Row decoding method provides a significant reduction in circuit area.
It results in:
  Reduced wire interconnect complexity
  Increased circuit area utilization
  Increased speed
  Simpler implementation
It offers a good tradeoff between hardware complexity and error performance.
Acknowledgments
  Intel Corporation
  UC Micro
  NSF Grant No. 0430090
  UCD Faculty Research Grant
Message Passing (Row processing)
MinSum:
$$\alpha_{ij} = \prod_{j':\,h_{ij'}=1,\ j'\neq j} \mathrm{sign}(\beta_{ij'}) \times \min_{j':\,h_{ij'}=1,\ j'\neq j} |\beta_{ij'}|$$
[Figure: parity check matrix H and the decoding flow: initial value (received information from channel) → row processing → column processing → error correction → parity check, with α and β messages exchanged between row and column processing]
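A minimal sketch of the MinSum row update, assuming the usual form in which each output takes the product of the signs and the minimum of the magnitudes of all other inputs in the row. The function name `minsum_row` and the sample values are illustrative only.

```python
def sign(x):
    return -1 if x < 0 else 1

def minsum_row(betas):
    """Return one alpha per input: sign product and minimum magnitude of the OTHER inputs."""
    alphas = []
    for j in range(len(betas)):
        others = betas[:j] + betas[j + 1:]
        s = 1
        for b in others:
            s *= sign(b)
        alphas.append(s * min(abs(b) for b in others))
    return alphas

# A row with three connected columns (values are made up).
alphas = minsum_row([2.0, -0.5, 1.5])
print(alphas)
```

Because only the smallest and second-smallest magnitudes can ever be selected, a hardware row processor needs to track just two minima (labelled Min1 and Min2 in the architecture figure).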
Message Passing (Column processing)
$$\beta_{ij} = \lambda_j + \sum_{i':\,h_{i'j}=1,\ i'\neq i} \alpha_{i'j}$$
λj is the received information.
[Figure: parity check matrix H and the decoding flow: initial value → row processing → column processing → error correction → parity check, with α and β messages exchanged between row and column processing]
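The column update can be sketched as follows, assuming the standard form in which each outgoing β is the received value λj plus all incoming α messages except the one from the destination row. The function name `column_update` is hypothetical, and returning the full sum alongside the β values is a convenience added here.

```python
def column_update(lam_j, alphas):
    """Return (betas, total): one beta per connected row, plus the full sum for column j."""
    total = lam_j + sum(alphas)
    # Subtracting the row's own alpha is the same as summing over all other rows.
    betas = [total - a for a in alphas]
    return betas, total

# A column of weight 2 (values are made up).
betas, total = column_update(1.0, [0.5, -2.0])
print(betas, total)
```

Computing the total once and subtracting each row's own contribution avoids recomputing the sum for every outgoing message.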
Hard decision: Vi is set to 0 or 1 according to the sign of yi.
[Figure: decoding flow (initial value → row processing → column processing → error correction → parity check) and parity check matrix H, with labels λ1, α, α, and y1]
Parity check: if H · V̂ᵗ = 0, stop decoding; if H · V̂ᵗ ≠ 0, repeat decoding.
[Figure: decoding flow (initial value → row processing → column processing → error correction → parity check) and parity check matrix H multiplied by the hard-decision vector V̂ (entries 0 to 8)]
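Putting the four steps together (row processing, column processing, error correction, parity check), a complete decoding loop for the small 6 x 9 example matrix might look like the sketch below. It is not the authors' implementation: the function name `minsum_decode` is hypothetical, and the code assumes that a non-negative value maps to bit 0 and a negative value to bit 1.

```python
H = [
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
]

def sign(x):
    return -1 if x < 0 else 1

def minsum_decode(H, lam, max_iter=15):
    """Return (hard_decisions, iterations_used) for received values lam."""
    m, n = len(H), len(H[0])
    rows = [[j for j in range(n) if H[i][j]] for i in range(m)]
    cols = [[i for i in range(m) if H[i][j]] for j in range(n)]
    # Initial value: beta messages start as the received information.
    beta = {(i, j): lam[j] for i in range(m) for j in rows[i]}
    hard = [0 if x >= 0 else 1 for x in lam]
    for iteration in range(1, max_iter + 1):
        # Row processing (MinSum).
        alpha = {}
        for i in range(m):
            for j in rows[i]:
                others = [beta[(i, k)] for k in rows[i] if k != j]
                s = 1
                for b in others:
                    s *= sign(b)
                alpha[(i, j)] = s * min(abs(b) for b in others)
        # Column processing and error correction (hard decision).
        for j in range(n):
            total = lam[j] + sum(alpha[(i, j)] for i in cols[j])
            for i in cols[j]:
                beta[(i, j)] = total - alpha[(i, j)]
            hard[j] = 0 if total >= 0 else 1
        # Parity check: stop when every row of H is satisfied.
        if all(sum(hard[j] for j in rows[i]) % 2 == 0 for i in range(m)):
            return hard, iteration
    return hard, max_iter

# All-zero code word with the first received value pushed to the wrong sign.
decoded, used = minsum_decode(H, [-1.0] + [2.0] * 8)
print(decoded, used)
```

In this example the two checks that involve the first bit both pull it back toward 0, so the parity check passes after a single iteration.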
LDPC Codes
An LDPC code is defined by a binary matrix called the parity check matrix H.
Rows define parity check equations (constraints) between encoded symbols in a code word, and the number of columns is the length of the code.
V is a valid code word if H · Vᵗ = 0.
The decoder in the receiver checks whether the condition H · Vᵗ = 0 holds.
Example: parity check matrix for a (9, 5) LDPC code, row weight = 3, column weight = 2:
100001010
010100001
001010100
001100010
100010001
010001100
H · Vᵗ ≠ 0: there is an error
H · Vᵗ = 0: there is no error
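The code word test can be written directly from the definition. The sketch below uses the example matrix exactly as printed on this slide (which has three ones in every row and two in every column); the function names `syndrome` and `is_codeword` are hypothetical.

```python
H = [
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
]

def syndrome(H, v):
    """Compute H times v transposed over GF(2), one bit per parity check equation."""
    return [sum(h * b for h, b in zip(row, v)) % 2 for row in H]

def is_codeword(H, v):
    """True when every parity check equation is satisfied."""
    return not any(syndrome(H, v))

row_weights = [sum(row) for row in H]
col_weights = [sum(col) for col in zip(*H)]

all_zero = [0] * 9              # the all-zero word satisfies every check
one_error = [1] + [0] * 8       # same word with the first bit flipped

print(is_codeword(H, all_zero), syndrome(H, one_error))
```

Flipping a single bit violates exactly the checks in which that bit takes part, here the first and fifth rows, which is the information the decoder uses to locate errors.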
Row and Column Processor Architecture
[Figure: column processor (Col. Proc.) built from adders that combine the input with the incoming messages; row processor (Row Proc.) that computes the signs and the two smallest magnitudes (Min1, Min2) of its inputs In1 to In6 using Comp blocks (inputs in1, in2; outputs H, L) and Sort_3 blocks (inputs in1 to in3; outputs H, M, L), with SignA and SignB exchanged between the left and right Row+Col processor halves]
[Figure: bit error probability (10^-7 to 10^-1) versus Eb/N0 (0 to 8 dB)]
Throughput = Clk × Code length / Imax
P = C f V²
C = kεWL/d
[Figure: wire of length L and width W at spacing d]
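As a worked instance of the throughput formula, the sketch below plugs in the clock frequencies, code length, and iteration count reported for the two 1536-bit decoders. The function name `throughput_gbps` is hypothetical, and Imax is taken to be the number of decoding iterations (15).

```python
def throughput_gbps(clk_mhz, code_length_bits, max_iterations):
    """Throughput = Clk * Code length / Imax, converted to Gbps."""
    bits_per_second = clk_mhz * 1e6 * code_length_bits / max_iterations
    return bits_per_second / 1e9

standard = throughput_gbps(32, 1536, 15)    # standard MinSum decoder
split_row = throughput_gbps(53, 1536, 15)   # Split-Row decoder

print(f"{standard:.2f} Gbps, {split_row:.2f} Gbps")
```

The results, about 3.28 and 5.43 Gbps, are in line with the 3.2 and 5.4 Gbps listed in the comparison table.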
What is the critical path, and how do you make sure that the sign is computed correctly?
  Answer: The critical path is the sign computation, which depends on the other half. The statistical timing analysis in place and route reports the slowest path delay, which ensures that the circuit works correctly.
Why does the decoder chip become smaller when you split it in half?
  Answer: First, the size and total number of column processors do not change. The main benefit comes from the row processors, each of which shrinks by more than a factor of two. Inside a row processor there are several stages of comparators, and their number drops by more than half when the number of inputs is halved.
You mentioned the design is power efficient, but you did not report any power numbers.
  Answer: For this paper we did not obtain power numbers, but power can be estimated from the fact that most of the energy is spent in the wires (P = 1/2 C f V²). Assuming it scales down linearly with wire length, the estimate is about a 1.58× reduction, matching the reduction in average wire length.
Are there other works close to your design?
Which applications can tolerate this error performance loss?
  Answer: This is a very broad question. It depends on the power budget and on how low a BER you need.
What is the difference between Viterbi and LDPC codes?
What is the difference between turbo and LDPC codes?
If I don't know the answer: I was not involved in that part of the project, but from what I know ...
Review the previous works.
If asked why the chip figure is not square?
If somebody asks: the method you proposed did not decrease the number of wires, so how can you say that it decreases the interconnect complexity?
  Answer: Note that we are talking about long wires. Because when there is a large number of wires connecting one
Hard decision vs. soft:
  In hard decision decoding, each received symbol is thresholded to yield a single received bit as input to the decoding algorithm, and the messages passed between variable and check nodes are single bits only.
  In soft decision decoding, multiple bits are used to represent each received symbol and the messages passed between variable and check nodes
How did you compute