Made with OpenOffice.org
Computer Architecture 10
Fast Adders
Carry Problem
Addition is the primary mechanism in implementing arithmetic operations
Slow addition directly affects the total performance of the computer
Complex (and fast) addition schemes increase the cost of the final implementation
The choice of adder structure must be tailored to the potential applications
RCA
RCA – Ripple Carry Adder
Simplest, Smallest and Slowest (SSS), but ...
Worst-case carry propagation is Θ(k)
[Figure: k-digit ripple-carry adder]
(Computer Architecture I)
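A minimal sketch of the ripple-carry scheme, assuming unsigned operands held in Python integers. The names `full_adder` and `ripple_carry_add` are illustrative and do not come from the slides. The loop makes the Θ(k) chain visible: every stage waits for the carry of the stage before it.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry


def ripple_carry_add(x, y, k, c0=0):
    """Add two k-bit integers stage by stage; returns (k-bit sum, carry_out)."""
    carry, total = c0, 0
    for i in range(k):  # the carry ripples through all k stages in order
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(11, 6, 4)` yields the 4-bit sum 1 with carry-out 1, that is 17.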
Bit-Serial RCA
VLSI implementation advantages:
  small pin count
  reduced wire length
  high clock rate
  small space
  low power consumption
Alternative to pipeline units for parallel processing
[Figure: Bit-serial RCA implementation; labels: X (shift), Y (shift), FA with carry c, X+Y]
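The bit-serial idea can be sketched as one full adder reused every clock cycle, with the carry held between cycles. This is a behavioural model only; the function name `bit_serial_add` and the list-of-bits representation (least significant bit first) are assumptions made for the example.

```python
def bit_serial_add(x_bits, y_bits):
    """Add two equal-length bit lists (LSB first) with a single full adder.

    Returns the sum bits, LSB first, with the final carry appended.
    """
    carry = 0  # models the single carry flip-flop
    out = []
    for a, b in zip(x_bits, y_bits):  # one clock cycle per bit position
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    out.append(carry)
    return out
```

Here 11 + 6 is written as `bit_serial_add([1, 1, 0, 1], [0, 1, 1, 0])`, which gives `[1, 0, 0, 0, 1]`, the bits of 17.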
Coping with Carry
Detect the end of carry propagation
  asynchronous adders
Speed up the propagation (CPA – Carry Propagate Adders)
  CLA / CSLA / CSKA – Carry Lookahead / Select / Skip Adders
Limit the carry propagation
  CSA – Carry Save Adder
  Estimate & Parallel Carry Adders
Eliminate the carry propagation
  Carry-free RNS operations
Adders Overview
[Figure: adder taxonomy (not complete)]
1-bit adders: Half Adder (HA), Full Adder (FA), Bit Counter (m,k)
CPA: RCA, CSKA, CSLA, CLA
3-operand and multi-operand adders: CSA, Adder Array, Adder Tree
Asynchronous Adder
Based on RCA - detection of carry completion
Average longest chain of propagation ~ $\log_2 k$
Not suitable for synchronous processing
[Figure: carry-completion stage; labels: extended carry (c, d) from previous stage, $c_i$, $d_i$, $a_i + b_i$, $a_i \cdot b_i$, $c_{i+1}$, $d_{i+1}$, "Complete" signal combined with signals from other bit positions]
(c,d) – extended carry
  (0,0) – Carry not yet known
  (0,1) – Carry 0
  (1,0) – Carry 1
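A behavioural sketch of carry-completion detection using the (c, d) encoding above. The update rule is an assumption: c becomes 1 when a carry is generated or a known 1 is propagated, and d becomes 1 when the carry is absorbed (both operand bits 0) or a known 0 is propagated. The extracted figure labels do not show complemented inputs, so the absorb term is a guess at the intended logic. The name `completion_time` is illustrative.

```python
def completion_time(a, b, k, c0=0):
    """Simulate a k-bit adder with two-rail carries.

    Returns (sum including carry-out, number of steps until all carries are known).
    """
    carries = [(0, 0)] * (k + 1)            # (0, 0): carry not yet known
    carries[0] = (1, 0) if c0 else (0, 1)   # the carry-in is known at the start
    steps = 0
    while any(cd == (0, 0) for cd in carries):
        nxt = list(carries)
        for i in range(k):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            c, d = carries[i]
            gen, absorb, prop = ai & bi, (1 - ai) & (1 - bi), ai ^ bi
            nxt[i + 1] = (gen | (prop & c), absorb | (prop & d))
        carries = nxt
        steps += 1                           # one stage delay per step
    total = carries[k][0] << k
    for i in range(k):
        total |= (((a >> i) ^ (b >> i) ^ carries[i][0]) & 1) << i
    return total, steps
```

With `completion_time(0b1111, 0b0001, 4)` the carry generated at bit 0 has to travel through every stage, so completion takes 4 steps; with `completion_time(0b1010, 0b1010, 4)` every carry is decided locally and completion takes 1 step.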
CLA
CLA – Carry Lookahead Adder
Fast but Complex
Worst-case carry propagation is Θ(log k)
[Figure: k-digit CLA; ∑ blocks in groups of 4 under 1st-level LA logic, groups combined by 2nd-level LA logic]
(Computer Architecture I)
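As a reminder of the lookahead principle, here is a sketch of one 4-bit group in which every carry is written directly in terms of the generate and propagate signals and the group carry-in, rather than the previous carry. The function names and the choice of XOR for the propagate signal are assumptions for the example.

```python
def cla4_carries(a, b, c0=0):
    """Return ([c1, c2, c3, c4], p) for a 4-bit group, carries in flattened form."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: a_i XOR b_i
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)
    return [c1, c2, c3, c4], p


def cla4_add(a, b, c0=0):
    """4-bit addition using the lookahead carries; returns (sum, carry_out)."""
    carries, p = cla4_carries(a, b, c0)
    cin = [c0] + carries[:3]
    total = sum((p[i] ^ cin[i]) << i for i in range(4))
    return total, carries[3]
```

Each carry expression is a two-level AND-OR form, which is why a group has constant delay and a k-bit adder built from a tree of such groups reaches Θ(log k).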
Carry Select Adder (CSLA)
The oldest logarithmic-time adder
The additions are performed in parallel according to alternative scenarios: carry = 0 or 1
The final selection of results is made with the computed value of carry
CSLAs have Θ(log k) addition time, but with high hardware complexity
[Figure: carry-select structure with alternative carry-in values 0 and 1 and carry-in $c_0$]
One-Level CSLA
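[Figure: one-level CSLA built from three k/2-bit RCAs. One RCA adds bits ½k−1...0 with carry-in $c_0$ and produces the low k/2 bits of the result and $c_{k/2}$. Two RCAs add bits k−1...½k, one with carry-in 0 and one with carry-in 1. A 2→1 mux (k/2 buses) controlled by $c_{k/2}$ selects the high k/2 bits of the result and $c_{out}$]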
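The one-level structure can be sketched as follows, assuming k is even and the operands are unsigned k-bit integers. The helper `rca` stands in for a ripple-carry block (it uses Python's `+` internally), and both function names are illustrative.

```python
def rca(x, y, width, cin):
    """Stand-in for a width-bit ripple-carry adder: returns (sum, carry_out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(x, y, k, c0=0):
    """One-level carry-select addition of two k-bit integers (k even)."""
    h = k // 2
    mask = (1 << h) - 1
    lo, c_half = rca(x & mask, y & mask, h, c0)   # low half, real carry-in
    hi0, cout0 = rca(x >> h, y >> h, h, 0)        # high half assuming carry 0
    hi1, cout1 = rca(x >> h, y >> h, h, 1)        # high half assuming carry 1
    hi, cout = (hi1, cout1) if c_half else (hi0, cout0)  # the 2->1 mux
    return (hi << h) | lo, cout
```

In hardware the three `rca` calls run concurrently, so the delay is that of one k/2-bit adder plus the mux.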
Block Propagation in CSLA
[Figure: CSLA built from k/4-bit RCA blocks over bit ranges k−1...¾k, ¾k−1...½k, ½k−1...¼k and ¼k−1...0; the upper blocks are duplicated for carry-in 0 and 1; block carries $c_0$, $c_{k/4}$, $c_{k/2}$, $c_{3k/4}$, $c_k$; results k−1...¾k, ¾k−1...½k, ½k−1...¼k, ¼k−1...0]
Not optimal due to block-carry propagation delay
Two-Level CSLA
[Figure: two-level CSLA built from k/4-bit RCA blocks over bit ranges k−1...¾k, ¾k−1...½k, ½k−1...¼k and ¼k−1...0; the upper blocks are duplicated for carry-in 0 and 1; block carries $c_0$, $c_{k/4}$, $c_{k/2}$, $c_{3k/4}$, $c_k$; results k−1...½k, ½k−1...¼k, ¼k−1...0]
Block-carry propagation is fast, but the design is complex
Propagation Chains
[Figure: carry propagation chains; the longest chain is marked "worst-case carry propagation"]
Carries can be generated, propagated or absorbed
Propagation chains are evaluated in parallel
How to speed up the worst-case propagation?
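To make the three cases concrete, the sketch below labels each bit position and measures the longest chain. The definition of chain length used here (a generating position plus the run of propagating positions that follows it) is an assumption for the example, as are the function names.

```python
def classify(a, b, k):
    """Label each bit position, LSB first: g (generate), p (propagate), a (absorb)."""
    labels = []
    for i in range(k):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        labels.append("g" if ai & bi else "p" if ai ^ bi else "a")
    return "".join(labels)


def longest_chain(a, b, k):
    """Longest carry chain: one generate followed by consecutive propagates."""
    best = run = 0
    for label in classify(a, b, k):
        if label == "g":
            run = 1            # a new carry starts here
        elif label == "p" and run:
            run += 1           # the live carry moves one stage further
        else:
            run = 0            # absorbed, or nothing to propagate
        best = max(best, run)
    return best
```

For 0b1111 + 0b0001 the labels are `"gppp"` and the chain spans all 4 positions, which is the worst case for a 4-bit adder.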
Carry Skip Adder (CSKA)
[Figure: 4-bit RCA block (FA FA FA FA) with carry-in $c_i$, carry-out $c_{i+1}$ and a carry-skip path controlled by p]
$p = p_i \cdot p_{i+1} \cdot p_{i+2} \cdot p_{i+3}$
Carry-in is propagated through n stages if p=1
Propagate condition p is easily computable
Propagation condition: $p_i = a_i + b_i$
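A functional sketch of one skip block and a chain of such blocks. It reads "+" in the propagation condition as logical OR and "*" as AND, which is an assumption about the slide notation; the names `skip_block` and `cska_add` are illustrative. The skip path does not change the result, only how quickly the block carry-out becomes valid in hardware.

```python
def skip_block(a, b, cin, n=4):
    """n-bit RCA block with carry skip; returns (sum, carry_out, p)."""
    carry, total, p = cin, 0, 1
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p &= ai | bi                              # block p is the AND of all p_i
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai | bi))   # ripple carry inside the block
    cout = carry | (p & cin)                      # OR gate: ripple carry or skipped carry-in
    return total, cout, p


def cska_add(x, y, k=16, b=4, c0=0):
    """Chain k/b skip blocks of b bits each; returns (k-bit sum, carry_out)."""
    carry, total = c0, 0
    mask = (1 << b) - 1
    for lo in range(0, k, b):
        s, carry, _ = skip_block((x >> lo) & mask, (y >> lo) & mask, carry, b)
        total |= s << lo
    return total, carry
```

`cska_add(0xFFFF, 1)` returns `(0, 1)`: the carry generated in the lowest block can bypass the remaining blocks, because each of them has p = 1.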
CSKA
16-bit CSKA Adder
The longest delay due to carry propagation:
  propagation through bits 1-3 and OR
  skip bits 4-11
  propagation through bits 12-14
(bit 0 and the last bit do not contribute to carry-propagation delay)
[Figure: four 4-bit RCA blocks, each with carry skip p, chained through $c_0$, $c_4$, $c_8$, $c_{12}$, $c_{16}$; the worst-case path is marked carry-propagate, carry-skip, carry-propagate]
CSKA – Delay Analysis
Assume: 1 block skip-delay = 1 bit carry-propagation
k – bits, b – bits in skip-block (fixed-size)
$T_{fixCSKA}$ – carry-propagation delay in fixed-block size CSKA (number of stages the carry must be propagated through)
$T_{fixCSKA} = (b - 1) + 0.5 + (k/b - 2) + (b - 1) = 2b + k/b - 3.5$
  (in block 0 + OR gate + all skips + in last block)
e.g. 32-bit: $T_{fixCSKA} = 12.5$ (b=4, k=32)
[Figure: four 4-bit RCA blocks, each with carry skip p, chained through $c_0$, $c_4$, $c_8$, $c_{12}$, $c_{16}$]
Fixed-Size Skip Blocks
Optimal size of fixed-size skip blocks:
$dT_{fixCSKA}/db = 0$
$d(2b + k/b - 3.5)/db = 2 - k/b^2 = 0$
$b_{opt} = (k/2)^{1/2} \rightarrow T_{fixCSKA-opt} = 2(2k)^{1/2} - 3.5$
e.g. 16-bit adder → $b_{opt} \approx 3$, $T_{fixCSKA} \approx 8$
e.g. 32-bit adder → $b_{opt} = 4$, $T_{fixCSKA} = 12.5$
e.g. 64-bit adder → $b_{opt} \approx 6$, $T_{fixCSKA} \approx 19$
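The figures above can be reproduced with a few lines. This only evaluates the slide formulas; the function names are illustrative, and the delay unit is the same "stage" used in the analysis.

```python
import math


def t_fix(k, b):
    """Worst-case delay of a k-bit CSKA with fixed b-bit blocks: 2b + k/b - 3.5."""
    return 2 * b + k / b - 3.5


def b_opt(k):
    """Block size that minimises t_fix: sqrt(k/2), before rounding to an integer."""
    return math.sqrt(k / 2)


def t_fix_opt(k):
    """Delay at the optimal block size: 2*sqrt(2k) - 3.5."""
    return 2 * math.sqrt(2 * k) - 3.5


for k in (16, 32, 64):
    print(k, round(b_opt(k), 2), round(t_fix_opt(k), 2))
```

Since a real block size must be an integer, the attainable delay is `t_fix(k, b)` for the rounded b, which is why the 16-bit and 64-bit values are quoted as approximate.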
Variable-Size Skip Blocks
Variable-size skip blocks shorten the propagation
Optimal configuration is to have the longest block in the middle and the shortest at both ends
t – number of blocks (even number)
b – bits in smallest skip block
$k = t \cdot (b + t/4 - 1/2)$
$b = k/t - t/4 + 1/2$
[Figure: chain of skip/adder blocks with sizes b, b+1, ..., b+t/2−1, b+t/2−1, ..., b+1, b]
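The relation between k, t and b follows from summing the block sizes in the figure. A small sketch, with illustrative function names, that builds the size list and checks it against the closed form:

```python
def block_sizes(t, b):
    """Block widths b, b+1, ..., b+t/2-1, b+t/2-1, ..., b+1, b for an even t."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]


def total_bits(t, b):
    """Closed form from the slide: k = t * (b + t/4 - 1/2)."""
    return t * (b + t / 4 - 1 / 2)
```

For t = 8 and b = 1 the sizes are `[1, 2, 3, 4, 4, 3, 2, 1]`, which add up to 20, matching `total_bits(8, 1)`.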
Variable-Size Skip Blocks
$T_{varCSKA}$ – carry-propagation delay in variable-block size CSKA (number of stages the carry must be propagated through)
$T_{varCSKA} = 2(b - 1) + 0.5 + (t - 2) = 2k/t + t/2 - 2.5$
Optimal number of variable-size skip blocks:
$dT_{varCSKA}/dt = 0 \rightarrow -2k/t^2 + 1/2 = 0$
$t_{opt} = 2k^{1/2} \rightarrow T_{varCSKA-opt} = 2k^{1/2} - 2.5$
[Figure: chain of skip/adder blocks annotated with the delay terms b−1, 0.5 + b−1 and t−2]
Variable-Size Skip Blocks
$t_{opt} = 2k^{1/2}$
$T_{varCSKA-opt} = 2k^{1/2} - 2.5$
$b_{opt} \approx 1$
e.g. 16-bit adder → $t_{opt} = 8$, $T_{varCSKA} = 5.5$
e.g. 32-bit adder → $t_{opt} \approx 12$, $T_{varCSKA} \approx 9$
e.g. 64-bit adder → $t_{opt} = 16$, $T_{varCSKA} = 13.5$
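The same kind of check works for the variable-size case. The sketch evaluates the delay expression $2k/t + t/2 - 2.5$ and its optimum; function names are illustrative.

```python
import math


def t_var(k, t):
    """Worst-case delay of a k-bit CSKA with t variable-size blocks."""
    return 2 * k / t + t / 2 - 2.5


def t_opt(k):
    """Number of blocks that minimises t_var: 2*sqrt(k), before rounding."""
    return 2 * math.sqrt(k)


def t_var_opt(k):
    """Delay at the optimal number of blocks: 2*sqrt(k) - 2.5."""
    return 2 * math.sqrt(k) - 2.5


for k in (16, 32, 64):
    print(k, round(t_opt(k), 2), round(t_var_opt(k), 2))
```

For k = 32 the unrounded optimum is about 11.3 blocks; the slide value of 12 is the nearest even number of blocks, and `t_var(32, 12)` is still about 8.8.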
Multilevel CSKA
First-level skip blocks get propagate-condition signal from block adders
Second-level skip blocks get propagate-condition signal from first-level skip blocks
Carry can be propagated over a group of skip blocks
[Figure: a row of eight skip/adder blocks, a row of four skip blocks and a row of two skip blocks]
Multilevel CSKA
Multilevel structures are the result of complex optimizations and are usually not regular
[Figure: irregular multilevel CSKA; labels: adders, first-level skip blocks, second-level skip blocks]
Hybrid Fast Adders
Combining RCA, CLA, CSKA and others makes it possible to satisfy various criteria:
  high performance
  cost-effectiveness
  low power consumption
Example – CSLA+CSKA+CLA
[Figure: CSKA blocks arranged in pairs for carry-in 0 and 1 (carry select), with CL logic]
Multioperand Adders
Applications in multiplication, vector and matrix arithmetic and others
Adding n numbers of k bits each, the total sum has $k + \log_2 n$ bits
[Figure: two dot diagrams of a 4-bit multiplication x*y; in the first, four 4-dot partial-product rows are summed into an 8-dot product; the second shows 28 partial-product dots summed into an 8-dot product]
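The width claim can be checked directly: the largest possible sum of n unsigned k-bit numbers is $n(2^k - 1)$. The sketch below assumes unsigned operands and rounds $\log_2 n$ up when n is not a power of two; `sum_width` is an illustrative name.

```python
import math


def sum_width(n, k):
    """Bits needed for the largest possible sum of n unsigned k-bit numbers."""
    return (n * (2**k - 1)).bit_length()


# the bound k + ceil(log2 n) holds for every combination tried here
for n in range(1, 33):
    for k in range(1, 17):
        assert sum_width(n, k) <= k + math.ceil(math.log2(n))
```

For instance, eight 4-bit operands sum to at most 120, which needs 7 = 4 + 3 bits.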
Serial Implementation
Latency Θ(n log k)
  faster than linear dependence on number of operands
  logarithmic dependence for scaling the operand size
[Figure: serial implementation; labels: n operands, shift register, k bits, fast adder Θ(log k), partial sum, k+log n bits]
Tree Implementation with CPA
Tree of 2-operand adders (RCA are best here!)
For n operands, n−1 adders are needed (costly)
Latency Θ(k + log n) – scales well with n
[Figure: tree of RCAs adding n k-bit operands; intermediate sums are k+1 and k+2 bits wide, the result is k+3 bits]
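A sketch of the tree reduction that also counts adders and levels, so the n−1 adder count can be observed. The function name `adder_tree` is illustrative, Python's `+` stands in for each 2-operand adder, and the input is assumed to be non-empty.

```python
def adder_tree(operands):
    """Sum operands pairwise, level by level; returns (sum, adders_used, levels)."""
    level, adders, levels = list(operands), 0, 0
    while len(level) > 1:
        # each pair of neighbours is handled by one 2-operand adder
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:          # an odd operand passes to the next level
            nxt.append(level[-1])
        level, levels = nxt, levels + 1
    return level[0], adders, levels
```

`adder_tree(range(1, 9))` returns `(36, 7, 3)`: eight operands need seven adders arranged in three levels.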
Look into Tree of RCAs
[Figure: a single RCA at level i (HA and FA cells) feeding a single RCA at level i+1]
Adders at higher levels need not wait for full carry propagation from lower-level adders
All adders start just one FA-delay after the previous level
Tree Implementation with CSA
Tree of 3-operand carry-save adders
CSAs reduce n operands to 2 operands, Θ(log n)
Final fast CPA is needed, Θ(log k)
Latency Θ(log n + log k) – scales well with n & k
[Figure: tree of CSAs reducing n k-bit operands to two, followed by a CPA]
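A minimal sketch of the carry-save reduction, assuming non-negative integer operands. `csa` turns three words into a sum word and a carry word without propagating any carry; `csa_tree_add` repeats this until two words remain and then performs one ordinary addition, which plays the role of the final CPA. Both names are illustrative.

```python
def csa(x, y, z):
    """3-operand carry-save adder: returns (s, c) with x + y + z == s + c."""
    s = x ^ y ^ z                                 # bitwise sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1        # carries, moved one position up
    return s, c


def csa_tree_add(operands):
    """Reduce operands with CSA levels, then add the last two; returns (sum, levels)."""
    ops, levels = list(operands), 0
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):       # every full group of three
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:]) # leftover operands pass through
        ops, levels = nxt, levels + 1
    return sum(ops), levels                        # final carry-propagate addition
```

`csa_tree_add(range(1, 9))` returns `(36, 4)`: eight operands are reduced to two in four CSA levels before the single carry-propagate addition.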