EE241 - Spring 2004
Advanced Digital Integrated Circuits
Borivoje Nikolic
Lecture 19
Advanced Adder Designs
Announcements
• Feedback on midterm mailed to you
• Homework #3 due today
• Homework #4 posted
Arithmetic Circuits
• Chapter 11, Rabaey, 2nd ed.
• Selected journal publications
• Books:
  • K. Hwang, “Computer Arithmetic: Principles, Architecture and Design,” John Wiley and Sons, 1979.
  • E. E. Swartzlander, “Computer Arithmetic,” Vol. 1 & 2, IEEE Computer Society Press, 1990.
  • S. Waser, M. Flynn, “Introduction to Arithmetic for Digital Systems Designers,” Holt, Rinehart and Winston, 1982.
  • I. Koren, “Computer Arithmetic Algorithms,” Brookside, 1998.
  • B. Parhami, “Computer Arithmetic,” Oxford, 2000.
  • V. Oklobdzija, “High-Speed VLSI Arithmetic Units: Adders and Multipliers,” in Chandrakasan et al.
Full Adder
[Figure: full adder (FA) with inputs A, B, Cin and outputs Sum, Cout]
The Ripple-Carry Adder
[Figure: 4-bit ripple-carry adder built from four FA cells; inputs A0-A3, B0-B3 and Ci,0, sums S0-S3; each carry Co,k feeds the next stage (Co,0 = Ci,1), ending in Co,3]
Worst-case delay is linear in the number of bits:
$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$
$t_d = O(N)$
Goal: Make the fastest possible carry path circuit
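A minimal behavioral sketch of the ripple-carry structure, assuming ideal Boolean full adders (no circuit timing). The carry out of each stage is the carry in of the next, which is why the worst-case path runs through every bit. `worst_case_delay` simply evaluates the first-order estimate above; the numbers passed to it are arbitrary illustrative units, not measured delays.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))
    return s, cout


def ripple_carry_add(x, y, n_bits, cin=0):
    """Add two n-bit unsigned integers with a chain of n full adders.

    The carry out of bit i is the carry in of bit i + 1, so a carry may
    have to pass through every stage.  Returns (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


def worst_case_delay(n_bits, t_carry, t_sum):
    """First-order estimate t_adder ~ (N - 1) * t_carry + t_sum."""
    return (n_bits - 1) * t_carry + t_sum


# Worst case: 0b1111 + 0b0001 propagates a carry through all four stages.
print(ripple_carry_add(0b1111, 0b0001, 4))            # (0, 1)
print(worst_case_delay(16, t_carry=1.0, t_sum=1.5))   # 16.5
```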
Inversion Property
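[Figure: a full adder (FA), and the same full adder with all inputs A, B, Ci and outputs S, Co inverted]
$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$
$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$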
Taking Out Inverters
[Figure: 4-bit ripple chain of FA’ cells (A0-A3, B0-B3, S0-S3, Ci,0 through Co,3) with alternating odd and even cells]
Exploit Inversion Property
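The sketch below checks the inversion property exhaustively and then models a ripple chain with no inverters on the carry path. `fa_prime` is a hypothetical name for an FA' cell, assumed here to be a full adder whose outputs stay complemented. Which bit positions the slide calls "odd" and "even" cells is an assumption; here bit 0 takes true operands and bit 1 takes complemented operands, alternating from there.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))
    return s, cout


def inversion_property_holds():
    """Check S(~A,~B,~Ci) == ~S(A,B,Ci) and Co(~A,~B,~Ci) == ~Co(A,B,Ci)."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = full_adder(a, b, c)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or co_inv != 1 - co:
            return False
    return True


def fa_prime(a, b, cin):
    """Hypothetical FA' cell: a full adder with complemented outputs."""
    s, co = full_adder(a, b, cin)
    return 1 - s, 1 - co


def ripple_without_carry_inverters(x, y, n_bits, cin=0):
    """Ripple adder of FA' cells with no inverter on the carry path.

    Even-indexed cells get true operands and a true carry and produce
    complemented outputs.  Odd-indexed cells get complemented operands and
    the complemented carry; by the inversion property they produce true
    outputs.  Returns (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n_bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        if i % 2 == 0:
            s_bar, carry = fa_prime(a, b, carry)      # carry now complemented
            s = 1 - s_bar                             # only the sum is re-inverted
        else:
            s, carry = fa_prime(1 - a, 1 - b, carry)  # carry is true again
        total |= s << i
    # With an odd number of bits the last cell is even-indexed,
    # so its carry is still complemented.
    cout = 1 - carry if n_bits % 2 else carry
    return total, cout
```

The operand inverters move off the critical carry path and onto the A and B inputs, which are available early.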
The Mirror Adder
[Figure: mirror adder schematic; carry stage with Kill, Generate, "1"-Propagate and "0"-Propagate paths producing Co, followed by the sum stage producing S]
24 transistors
Mirror Adder Cell
[Figure: mirror adder cell layout with inputs A, B, Ci, outputs Co, S, and VDD and GND rails]
Sizing Mirror Adder
[Figure: mirror adder with transistor sizes annotated on the Generate, Kill, 0-Propagate and 1-Propagate paths and on the Co and S stages]
Fanout (effective) ~2
Full Adder Implementation
[Figure: standard CMOS and multiplexer-based full adders]
Courtesy of IEEE Press, New York. 2000
TG-Based Full Adder
[Figure: transmission-gate full adder with P generation, sum generation and carry generation sections; inputs A, B, Ci, outputs S, Co]
Full Adder in DPL
Manchester Carry Chain
[Figure: static and dynamic (clocked by φ) Manchester carry chain stages with Gi, Pi, Ki, Ci and Co]
Manchester Carry Chain
Kilburn, et al, IEE Proc, 1959.
• Implement P with pass transistors
• Implement G with a pull-up, kill (delete) with a pull-down
• Use dynamic logic to reduce complexity and increase speed
[Figure: 4-bit dynamic Manchester carry chain clocked by φ, with inputs Ci,0, P0-P3, G0-G3 and outputs C0-C3]
Sizing Manchester Carry Chain
[Figure: RC model of the chain: resistors R1-R6, capacitors C1-C6, transistors M0-M4 and MC, discharge transistor]
$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$
[Plots: speed (normalized by 0.69RC) and area (in minimum-size devices) versus k, for k from 1 to 3]
Tapering?
Sizing Manchester Carry Chain
Delay equation (for uniform stages, R_j = R and C_i = C):
$t_p = 0.69 \sum_{i=1}^{N} C \sum_{j=1}^{i} R = 0.69\,\frac{N(N+1)}{2}\,RC$
Delay is quadratic with N.
Progressive sizing should help?
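A small sketch of the delay model above, listing stage resistances and capacitances from the driving end. It evaluates $0.69 \sum_i C_i \sum_{j \le i} R_j$ for arbitrary per-stage values, so it can also be used to try tapered sizings; the unit values below are illustrative, not device data.

```python
def chain_delay(resistances, capacitances):
    """Delay 0.69 * sum_i C_i * (sum_{j<=i} R_j) of an RC ladder.

    resistances[k] and capacitances[k] belong to stage k, counted from the
    driving end; each C_i is charged through all resistors upstream of it."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one R and one C per stage")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += c * upstream_r
    return 0.69 * delay


# Uniform chain: should equal 0.69 * N(N+1)/2 * R * C.
n = 8
print(chain_delay([1.0] * n, [1.0] * n), 0.69 * n * (n + 1) / 2)
```

Because every C_i sees the sum of all upstream resistances, doubling N roughly quadruples the delay, which is what motivates tapering and the k = 4-8 chain limit.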
Sizing Manchester Carry Chain
Stick Diagram
[Figure: stick diagram with a propagate/generate row (Pi, Gi, φ) and an inverter/sum row, carries Ci-1, Ci, Ci+1, between VDD and GND]
$t_p = 0.69\,\frac{N(N+1)}{2}\,RC = 0.69\,\frac{N(N+1)}{2}\,\frac{R}{W}\,(C_{fix} + C\,W)$
C_fix: fixed capacitance at the node (pull-down and pull-up diffusions, metal, plus inverter), ~15 fF
C ~ 2 fF/µm
R ~ 10 kΩ·µm
When CW > C_fix, sizing gives only small improvements and increases the loading of the input stage
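A quick numerical sketch of the sizing equation using the slide's rough values (R ~ 10 kΩ·µm, C ~ 2 fF/µm, C_fix ~ 15 fF). `manchester_delay` is a hypothetical helper name; widths are in µm and the result is in seconds.

```python
def manchester_delay(n, width_um, r_unit=10e3, c_per_um=2e-15, c_fix=15e-15):
    """t_p = 0.69 * N(N+1)/2 * (R/W) * (C_fix + C*W).

    r_unit is in ohm*um (so R/W is in ohm), c_per_um in F/um, c_fix in F."""
    return 0.69 * n * (n + 1) / 2 * (r_unit / width_um) * (c_fix + c_per_um * width_um)


n = 4
for w in (1.0, 2.0, 7.5, 15.0, 30.0):
    print(f"W = {w:5.1f} um -> t_p = {manchester_delay(n, w) * 1e12:6.1f} ps")

# Asymptote for very wide devices: only the self-loading term R*C remains.
print(f"limit -> {0.69 * n * (n + 1) / 2 * 10e3 * 2e-15 * 1e12:.1f} ps")
```

With these numbers CW equals C_fix at W = 7.5 µm; beyond that the delay can at best halve again, while the input capacitance keeps growing with W.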
Manchester Carry Chain
• Length of chain is limited to k = 4-8
• Standard solution: add inverters
• The overall N-bit adder delay is the sum of N/k segments (linear)
Carry-Skip Adder
[Figure: 4-bit ripple block of full adders with P0-P3, G0-G3, Ci,0 and carries Co,0-Co,3; the same block with a multiplexer that bypasses the carry to Co,3, controlled by BP = P0P1P2P3]
Idea: if (P0 and P1 and P2 and P3 = 1), then Co,3 = Ci,0; otherwise the block itself kills or generates the carry.
Bypass (Skip)
MacSorley, Proc. IRE, 1/61
Lehman, Burla, IRE Trans. on Comp., 12/61
Carry-Skip Adder
[Figure: 16-bit carry-skip adder in four 4-bit groups (bits 0-3, 4-7, 8-11, 12-15), each with setup, carry propagation and sum stages; the critical path from Ci,0 is marked]
$t_d = (k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP} + (k-1)\,t_{RCA}$
For N-bit adder with k-bit groups
Carry-Skip Adder
Courtesy of IEEE Press, New York. 2000
Carry-Skip Adder
$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$
Critical path delay with constant groups
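A sketch that evaluates the constant-group critical-path formula $t_d = 2(k-1)\,t_{RCA} + (N/k - 2)\,t_{SKIP}$ for each group size k that divides N, with t_RCA and t_SKIP in arbitrary units. Setting the derivative with respect to k to zero gives the continuous optimum $k = \sqrt{N\,t_{SKIP} / (2\,t_{RCA})}$; the function names are illustrative.

```python
import math


def skip_delay(n_bits, k, t_rca, t_skip):
    """Constant k-bit groups: ripple k-1 bits in the first group, skip the
    N/k - 2 middle groups, ripple k-1 bits in the last group."""
    if n_bits % k or n_bits // k < 2:
        raise ValueError("k must divide n_bits into at least two groups")
    return 2 * (k - 1) * t_rca + (n_bits // k - 2) * t_skip


def best_group_size(n_bits, t_rca, t_skip):
    """Smallest-delay divisor k >= 2 (ties go to the smaller k)."""
    candidates = [k for k in range(2, n_bits) if n_bits % k == 0 and n_bits // k >= 2]
    return min(candidates, key=lambda k: skip_delay(n_bits, k, t_rca, t_skip))


def continuous_optimum(n_bits, t_rca, t_skip):
    """k minimizing t_d if k could take any real value."""
    return math.sqrt(n_bits * t_skip / (2 * t_rca))


for k in (2, 4, 8, 16, 32):
    print(k, skip_delay(64, k, t_rca=1.0, t_skip=1.0))
print("best k:", best_group_size(64, 1.0, 1.0), "continuous:", continuous_optimum(64, 1.0, 1.0))
```

The square-root dependence of the optimal k on N is why the bypass adder only pays off beyond a handful of bits.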
[Plot: t_p versus N for the ripple adder and the bypass adder; 4..8 is marked on the N axis]
Carry-Skip Adder
Variable Group Length
Oklobdzija, Barnes, Arith’85
321 cNcctd ++=
Carry-Skip Adder
Courtesy of IEEE Press, New York. 2000
Carry-Skip Adder
Variable Block Lengths
Oklobdzija, Barnes, Arith’85
Manchester Chain with Carry-Skip
[Figure: 4-bit Manchester carry chain (P0-P3, G0-G3, Ci,0) with bypass (BP) transistors producing Co,3]
Delay model:
PTL with SA-F/F Implementation
Matsui,JSSC 12/94
Conditional Sum Adders
Sklansky, Trans. on Comp., 6/60
$s_i^0 = x_i \oplus y_i$
$s_i^1 = \overline{x_i \oplus y_i}$
$c_i^0 = x_i \cdot y_i$
$c_i^1 = x_i + y_i$
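A behavioral sketch of conditional-sum addition, assuming the word width is a power of two. Each bit starts with both versions (s^0, c^0) and (s^1, c^1) from the equations above; adjacent blocks are then merged pairwise, with the lower block's carry acting as the select of a 2-way multiplexer on the upper block's two versions. The function name is illustrative.

```python
def conditional_sum_add(x, y, n_bits):
    """Return {0: (sum, carry_out), 1: (sum, carry_out)} for carry-in 0 and 1."""
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two")
    # Level 0: one block per bit, holding both conditional results.
    blocks = []
    for i in range(n_bits):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        blocks.append({0: (xi ^ yi, xi & yi),          # s^0, c^0
                       1: (1 - (xi ^ yi), xi | yi)})   # s^1, c^1
    width = 1
    # Each level halves the number of blocks: log2(n) multiplexer levels.
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):  # lo = less significant
            m = {}
            for cin in (0, 1):
                lo_sum, lo_carry = lo[cin]
                hi_sum, hi_carry = hi[lo_carry]          # 2-way mux on hi's versions
                m[cin] = (lo_sum | (hi_sum << width), hi_carry)
            merged.append(m)
        blocks = merged
        width *= 2
    return blocks[0]


print(conditional_sum_add(0b1011, 0b0110, 4))  # both results for 11 + 6
```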
Conditional Sum Adders
TG Conditional Sum
[Figure: conditional cell and conditional sum adder built from 2-way MUXes]
Rothermel, JSSC 89
TG Conditional Sum
• Serial connection of transmission gates
• Chain length = 1 + log2 n
[Figure: signal propagation]
DPL Conditional Sum
[Figure: CLA and “conditional carry select” blocks]
DPL Conditional Sum
Block Conditional Sums
Carry-Select Adder
[Figure: carry-select block: setup (P, G), "0" and "1" carry propagation, a multiplexer selected by Co,k-1 that passes the chosen carry vector to sum generation; block output Co,k+3]
Carry Select Adder: Critical Path
[Figure: 16-bit carry-select adder in four 4-bit stages (bits 0-3, 4-7, 8-11, 12-15), each with setup, "0" and "1" carry chains, multiplexer and sum generation; carries Ci,0, Co,3, Co,7, Co,11, Co,15 and sums S0-3, S4-7, S8-11, S12-15]
Linear Carry Select
[Figure: linear carry-select adder with four 4-bit stages (bits 0-3, 4-7, 8-11, 12-15), annotated with relative signal arrival times from (1) to (10)]
Square Root Carry Select
[Figure: square-root carry-select adder with stage widths growing by one bit (bits 0-1, 2-4, 5-8, 9-13, 14-19), annotated with relative signal arrival times from (1) to (9)]
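A unit-delay sketch comparing equal and growing block sizes in a carry-select adder. Assumptions (not from the slides): setup takes 1 unit, each bit of a ripple chain 1 unit, each multiplexer 1 unit, and every block, including the first, ends in a multiplexer. Each block's "0" and "1" carries are ready at t_setup + size · t_carry; its multiplexer fires once both those and the incoming carry are available.

```python
def carry_select_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1):
    """Arrival time of the final carry-out under the unit-delay model above."""
    carry_ready = 0
    for size in block_sizes:
        candidates_ready = t_setup + size * t_carry    # "0" and "1" chains in parallel
        carry_ready = max(carry_ready, candidates_ready) + t_mux
    return carry_ready


def growing_blocks(n_bits, first=2):
    """Block sizes first, first+1, ... (the last one trimmed) covering n_bits."""
    sizes, size = [], first
    while sum(sizes) < n_bits:
        sizes.append(min(size, n_bits - sum(sizes)))
        size += 1
    return sizes


print(carry_select_delay([4] * 5))                 # linear, 20 bits
print(carry_select_delay(growing_blocks(20)))      # square-root style, 20 bits
```

Growing the blocks lets each block's local chains finish just as the selected carry arrives, so the multiplexer chain never waits; with M blocks covering N bits, M grows roughly as the square root of 2N.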
Carry-Lookahead Adders
• Adder trees: radix of a tree, minimum-depth trees, sparse trees
• Logic manipulations: conventional vs. Ling, stack height limiting
Propagate and Generate Signals
Define three new variables that depend ONLY on a_i, b_i:
• Generate: $g_i = a_i b_i$
• Propagate: $p_i = a_i + b_i$ (could be XOR as well)
• Delete: $d_i = \overline{a_i}\,\overline{b_i}$
Can also derive expressions for s and c_out based on g_i and p_i:
$s_i(g_i, p_i) = (p_i\,\overline{g_i}) \oplus c_{in,i}$
$c_{out,i}(g_i, p_i) = g_i + p_i\,c_{in,i}$
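A behavioral sketch of these equations with the OR form of propagate, which is why the sum uses $p_i\,\overline{g_i}$ (equal to $a_i \oplus b_i$ when $p_i = a_i + b_i$). The carry still ripples here; the point is only that s and c_out need nothing but g_i, p_i and the incoming carry. Function names are illustrative.

```python
def gp_signals(x, y):
    """Bitwise generate and (OR-form) propagate for whole words."""
    return x & y, x | y


def add_with_gp(x, y, n_bits, cin=0):
    """Add using only g_i, p_i and the incoming carry.  Returns (sum, carry_out)."""
    g, p = gp_signals(x, y)
    c = cin
    s = 0
    for i in range(n_bits):
        gi, pi = (g >> i) & 1, (p >> i) & 1
        s |= ((pi & (1 - gi)) ^ c) << i   # s_i = (p_i * ~g_i) XOR c_in
        c = gi | (pi & c)                 # c_out = g_i + p_i * c_in
    return s, c


print(add_with_gp(0b0111, 0b0101, 4))  # 7 + 5 = 12 -> (12, 0)
```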
A0,B0 A1,B1 AN-1,BN-1...
Ci,0 P0 Ci,1 P1Ci,N-1 PN-1
...
Carry Lookahead Adder
Weinberger, Smith, 1958.
Lookahead Adder
$c_i = g_i + p_i\,c_{i-1}$
Lookahead Equations
Position i: $c_i = g_i + p_i\,c_{i-1}$
Position i + 1: $c_{i+1} = g_{i+1} + p_{i+1}\,c_i = g_{i+1} + p_{i+1}(g_i + p_i\,c_{i-1}) = g_{i+1} + p_{i+1}\,g_i + p_{i+1}\,p_i\,c_{i-1}$
Carry exists if:
• generated in stage i + 1
• generated in stage i and propagated through i + 1
• propagated through both i and i + 1 (from an incoming carry c_{i-1})
Lookahead Adder
• Unrolling of the carry recurrence can be continued
• If unrolled to level k, the result is a two-level AND-OR structure
• AND fan-in = k + 1, OR fan-in = k + 1
• k + 1 transistors in the MOS stack
• Limits k to 2-4
• Later referred to as the radix of an adder
Lookahead Adder
[Figure: mirror implementation of the 4-bit lookahead carry Co,3 from G0-G3, P0-P3 and Ci,0]
Block Lookahead
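Fourth bit carry:
$c_{i+3} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i + p_{i+3} p_{i+2} p_{i+1} p_i\,c_{i-1}$
Block generate and block propagate:
$G_{i+3:i} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_{i+3:i} = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{i+3} = G_{i+3:i} + P_{i+3:i}\,c_{i-1}$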
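A sketch that computes 4-bit block generate and propagate, checks exhaustively that the carry out of the fourth bit, G + P·c_{i-1}, matches rippling c_i = g_i + p_i c_{i-1}, and then reuses the same function one level up to form a 16-bit "super-group" carry. Lists are least-significant bit first; the function names are illustrative.

```python
from itertools import product


def block_gp(g, p):
    """g, p: four values, least significant first.  Returns (G, P) of the block."""
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    P = p3 & p2 & p1 & p0
    return G, P


def ripple_carry_out(g, p, c_in):
    """Reference: apply c = g_i + p_i * c bit by bit."""
    c = c_in
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
    return c


def block_lookahead_matches_ripple():
    for bits in product((0, 1), repeat=9):
        g, p, c_in = list(bits[0:4]), list(bits[4:8]), bits[8]
        G, P = block_gp(g, p)
        if (G | (P & c_in)) != ripple_carry_out(g, p, c_in):
            return False
    return True


def carry_out_16(x, y, cin=0):
    """Two-level lookahead: four 4-bit groups, then one group of groups."""
    g = [(x >> i) & (y >> i) & 1 for i in range(16)]
    p = [((x >> i) | (y >> i)) & 1 for i in range(16)]
    groups = [block_gp(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]) for k in range(4)]
    G_star, P_star = block_gp([G for G, _ in groups], [P for _, P in groups])
    return G_star | (P_star & cin)


print(block_lookahead_matches_ripple(), carry_out_16(0xFFFF, 0x0000, 1))
```

Each added level of grouping adds a constant number of gate delays while multiplying the covered width by four, which is the origin of the logarithmic delay.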
Block Lookahead
Can create groups of groups, or ‘super-groups’:
$G^*_{j+3:j} = G_{j+3} + P_{j+3} G_{j+2} + P_{j+3} P_{j+2} G_{j+1} + P_{j+3} P_{j+2} P_{j+1} G_j$
$P^*_{j+3:j} = P_{j+3} P_{j+2} P_{j+1} P_j$
Delay is $t_d = c_1 \log N$
Block Lookahead
From Oklobdzija
Carry Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\quad = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically.
Tree Adders
$P_G = p_m \cdot p_l$
$G_G = g_m + p_m \cdot g_l$
m: more significant, l: less significant
Start from the input P, G and continue up the tree: 2-bit groups, then 4-bit groups, …
$(g, p) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m \cdot g_l,\; p_m \cdot p_l)$
Kogge, Stone, Trans. on Comp., ’73
Radix 2
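A behavioral sketch of a radix-2 Kogge-Stone prefix adder built on the • operator above. Assumptions: propagate is taken as XOR so the same p_i also forms the sum, and the carry-in is folded in as a generate term below bit 0. At each level every position combines with the position `distance` bits lower, and the distance doubles, so all group (g, p) pairs down to bit 0 are formed in ceil(log2 N) levels.

```python
def dot(hi, lo):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l)."""
    g_m, p_m = hi
    g_l, p_l = lo
    return g_m | (p_m & g_l), p_m & p_l


def kogge_stone_add(x, y, n_bits, cin=0):
    """Returns (sum, carry_out).  prefix[i] ends up as the group (g, p) of bits i..0."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n_bits)]
    prefix = list(gp)
    prefix[0] = dot(prefix[0], (cin, 0))     # carry-in acts as a generate below bit 0
    distance = 1
    while distance < n_bits:                 # one tree level per iteration
        prefix = [dot(prefix[i], prefix[i - distance]) if i >= distance else prefix[i]
                  for i in range(n_bits)]    # built from the previous level only
        distance *= 2
    carries = [cin] + [g for g, _ in prefix]  # carries[i] = carry into bit i
    s = 0
    for i in range(n_bits):
        s |= (gp[i][1] ^ carries[i]) << i     # s_i = p_i XOR c_i
    return s, carries[n_bits]


print(kogge_stone_add(0xBEEF, 0x1234, 16))
```

Every position gets its own dot operator at every level, which gives minimum depth at the cost of many cells and long wires; sparse trees trade some of that back.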
Tree Adders: Radix 2
[Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) to (A15, B15), outputs S0 to S15]
Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) to (a15, b15), outputs S0 to S15]
Sparse Trees
[Figure: 16-bit radix-2 sparse tree with sparseness of 2 (Han-Carlson); inputs (a0, b0) to (a15, b15), outputs S0 to S15]
Full vs. Sparse Trees
Sparse trees have fewer transistors and wires:
• Less power
• Less input loading
Recovering missing carries:
• Ripple (extra gate delay)
• Precompute (extra fanout)
Complex precompute can get into the critical path
[Plot: total transistor width (unit width/bit) versus adder delay (FO4) for radix-4 Kogge-Stone, radix-4 2-sparse and radix-4 4-sparse trees; annotation: -23.3%]
Tree Adders: Other Trees
[Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) to (A15, B15), outputs S0 to S15]
Other Sparse Trees
Mathew, VLSI’02
Ling Adder
Variation of CLA
Ling, IBM J. Res. Dev, 5/81
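Conventional:
$G_i = g_i + p_i \cdot G_{i-1}$, $S_i = p_i \oplus G_{i-1}$, $p_i = a_i \oplus b_i$, $g_i = a_i \cdot b_i$
Ling’s equations:
$H_i = g_i + t_{i-1} \cdot H_{i-1}$, $S_i = (t_i \oplus H_i) + g_i\,t_{i-1} H_{i-1}$, $t_i = a_i + b_i$, $g_i = a_i \cdot b_i$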
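A behavioral sketch of Ling's equations. It uses the identity $G_i = t_i H_i$ (true carry from pseudo-carry), which holds because $g_i$ implies $t_i$. To inject a carry-in, it assumes $t_{-1} = 1$ and $H_{-1} = c_{in}$, so that $G_{-1} = t_{-1} H_{-1} = c_{in}$; that boundary choice is an illustration, not part of the slide.

```python
def ling_add(x, y, n_bits, cin=0):
    """Add with pseudo-carries H_i = g_i + t_{i-1} H_{i-1}.

    Sum: S_i = (t_i XOR H_i) + g_i t_{i-1} H_{i-1}.
    Carry-out: G_{n-1} = t_{n-1} H_{n-1}.  Returns (sum, carry_out)."""
    t_prev, h_prev = 1, cin               # assumed boundary: G_{-1} = cin
    s = 0
    for i in range(n_bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        g, t = a & b, a | b
        h = g | (t_prev & h_prev)                     # pseudo-carry H_i
        s_i = (t ^ h) | (g & t_prev & h_prev)         # Ling sum
        s |= s_i << i
        t_prev, h_prev = t, h
    return s, t_prev & h_prev                          # real carry G_{n-1}


print(ling_add(0b0011, 0b0001, 4))  # 3 + 1 = 4 -> (4, 0)
```

The recurrence for H has the same shape as the conventional carry recurrence, so any lookahead or prefix tree can compute H instead of G; the real carry is recovered with one extra AND with t_i, folded into the sum.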
Ling Adder
Conventional CLA: $G_i = g_i + p_i \cdot G_{i-1}$
Also: $G_i = g_i + t_i \cdot G_{i-1}$, $H_i = g_i + t_{i-1} \cdot G_{i-1}$
Ling’s equation shifts the index of the pseudo-carry
Doran, Trans. on Comp., 9/88
Propagates information on two bits
Ling Adder
Conventional radix-4:
$G_3 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$
Ling radix-4:
$H_3 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
• Reduces the stack height (or width)
• Reduces input loading
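An exhaustive check, over all pairs of 4-bit operands with carry-in 0, that the conventional radix-4 carry $G_3$ equals the real carry out of bit 3, that both forms of the Ling pseudo-carry $H_3$ agree, and that $G_3 = t_3 H_3$. The largest AND term drops from four literals ($t_3 t_2 t_1 g_0$) to three ($t_2 t_1 g_0$), which is the stack-height saving.

```python
from itertools import product


def radix4_identities_hold():
    for a, b in product(range(16), repeat=2):
        g = [(a >> i) & (b >> i) & 1 for i in range(4)]
        t = [((a >> i) | (b >> i)) & 1 for i in range(4)]
        # Conventional radix-4 group generate (largest AND term: 4 literals)
        G3 = g[3] | (t[3] & g[2]) | (t[3] & t[2] & g[1]) | (t[3] & t[2] & t[1] & g[0])
        # Ling pseudo-carry, long and simplified forms (largest AND term: 3 literals)
        H3 = g[3] | (t[2] & g[2]) | (t[2] & t[1] & g[1]) | (t[2] & t[1] & t[0] & g[0])
        H3_short = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
        carry_out = ((a + b) >> 4) & 1
        if G3 != carry_out or H3 != H3_short or (t[3] & H3) != G3:
            return False
    return True


print(radix4_identities_hold())  # True
```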
Ling vs. CLA
[Plot: energy (pJ) versus delay (FO4) for R2 Ling, R2 CLA, R4 Ling and R4 CLA adders]
R. Zlatanovici, ESSCIRC’03
Static vs. Dynamic
[Plot: energy (pJ) versus delay (FO4) for compound domino R2, domino R2, domino R4 and static R2 adders]
Stack Height Limiting
Transform conventional G, P
Park, VLSI Circ’00
HP Adder
Naffziger, ISSCC’96
01234 ppppi =
HP Adder – Differential Domino
Carry ripple
Sum select
Hybrid Adders
Dobberpuhl, JSSC 11/92
DEC Alpha 21064
DEC Adder
Combination:
• 8-bit tapered pre-discharged Manchester carry chains, with Cin = 0 and Cin = 1
• 32-bit LSB carry-lookahead
• 32-bit MSB conditional sum adder
• Carry-select on most significant bits
• Latch-based timing