VLSI Arithmetic: Adders & Multipliers
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Prof. V.G. Oklobdzija VLSI Arithmetic 2
Introduction
• Digital Computer Arithmetic belongs to Computer Architecture; however, it is also an aspect of logic design.
• The objective of Computer Arithmetic is to develop appropriate algorithms that utilize the available hardware in the most efficient way.
• Ultimately, speed, power and chip area are the most often used measures, making a strong link between the algorithms and the technology of implementation.
Basic Operations
• Addition
• Multiplication
• Multiply-Add
• Division
• Evaluation of Functions
• Multi-Media
Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block of most arithmetic circuits.
[Figure: full-adder block with inputs a_i, b_i, C_in and outputs s_i, C_out]
The sum and carry outputs are described as:
$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$$
$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$$
Carry status: rows with a_i ≠ b_i propagate the incoming carry; rows with a_i = b_i = 1 generate a carry.
Inputs Outputs
ci ai bi si ci+1
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
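The truth table can be reproduced with a few lines of Python. This is only a sketch for checking the table; the function name `full_adder` is an illustrative choice, not something from the slides.

```python
def full_adder(a, b, c):
    """Return (s, c_out) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                        # sum bit
    c_out = (a & b) | (a & c) | (b & c)  # majority of the three inputs
    return s, c_out


# Print the rows in the slide's column order: ci ai bi si ci+1
for ci in (0, 1):
    for ai in (0, 1):
        for bi in (0, 1):
            si, c_next = full_adder(ai, bi, ci)
            print(ci, ai, bi, si, c_next)
```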
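Full-Adder Implementation
Full-adder operation is defined by the equations:
$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$$
$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$
Carry-Propagate p_i and Carry-Generate g_i:
$$p_i = a_i \oplus b_i, \qquad g_i = a_i \cdot b_i$$
A one-bit adder could be implemented as shown.
[Figure: one-bit adder with inputs a_i, b_i, c_in and outputs s_i, c_out]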
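High-Speed Addition
$$s_i = p_i \oplus c_i$$
$$c_{i+1} = g_i + p_i c_i$$
$$p_i = a_i \oplus b_i, \qquad g_i = a_i \cdot b_i$$
A one-bit adder could be implemented more efficiently because a MUX is faster.
[Figure: multiplexer-based one-bit adder with inputs a_i, b_i, c_in and outputs s_i, c_out]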
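As a rough illustration of the multiplexer idea, the sketch below assumes the MUX is steered by p_i and passes c_in when p_i = 1 and a_i otherwise (when p_i = 0 the two operand bits are equal, so a_i coincides with g_i). The exact wiring on the slide may differ; `mux` and `mux_adder_bit` are hypothetical names.

```python
def mux(select, in0, in1):
    """Two-input multiplexer: returns in1 when select is 1, else in0."""
    return in1 if select else in0


def mux_adder_bit(a, b, c_in):
    """One-bit adder whose carry-out is chosen by a multiplexer."""
    p = a ^ b                 # propagate
    s = p ^ c_in              # sum
    c_out = mux(p, a, c_in)   # p = 0 implies a == b, so a equals g = a & b
    return s, c_out
```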
The Ripple-Carry Adder
[Figure: four full adders (FA) in a chain with inputs A0..A3, B0..B3, sums S0..S3 and carries Ci,0, Co,0 (= Ci,1), Co,1, Co,2, Co,3]
Worst-case delay is linear in the number of bits:
$$t_{adder} = (N-1)\,t_{carry} + t_{sum}$$
$$t_d = O(N)$$
Goal: Make the fastest possible carry path circuit
From Rabaey
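A behavioural sketch of the ripple-carry adder and of its delay estimate follows. It models logic values only, and the time units passed to `ripple_delay` are arbitrary; both function names are illustrative.

```python
def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit integers bit by bit, letting the carry ripple upward."""
    carry = c_in
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))  # c_{i+1} = g_i + p_i c_i
        total |= s << i
    return total, carry


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay: (N - 1) carry stages followed by one sum stage."""
    return (n - 1) * t_carry + t_sum
```

For example, `ripple_carry_add(11, 6, 4)` returns the 4-bit sum together with the carry out of the most significant bit.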
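Inversion Property
[Figure: a full adder with inputs A, B, Ci and outputs S, Co, next to the same cell with all inputs and outputs inverted]
$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$
$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$$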
From Rabaey
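The inversion property is easy to confirm exhaustively. The check below is a small sketch; the helper names are illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (s, c_out) of a one-bit full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inversion_property_holds():
    """Inverting all three inputs must invert both outputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or c_out_inv != 1 - c_out:
            return False
    return True
```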
Minimize Critical Path by Reducing Inverting Stages
[Figure: four-bit ripple-carry chain of alternating even and odd FA’ cells with inputs A0..A3, B0..B3, sums S0..S3 and carries Ci,0, Co,0 .. Co,3]
Exploit Inversion Property
Note: need 2 different types of cells (even cell and odd cell)
From Rabaey
Ripple Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library:
[Figure: three adder stages (bits i, i+1, i+2) with inputs a, b, sums s_i .. s_{i+2} and carries c_in, c_i, c_{i+1}, c_out; the critical path runs along the carry chain]
Oklobdzija, ISCAS’88
Manchester Carry-Chain Realization of the Carry Path
• Simple and very popular scheme for implementation of the carry signal path
[Figure: Manchester carry-chain stage between Carry in and Carry out, with a propagate device, a generate device, and a predischarge & kill device, connected to Vdd]
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.
Manchester Carry Chain (CMOS)
[Figure: CMOS Manchester carry chain starting at Ci,0, with propagate devices P0..P4 and generate devices G0..G4 connected to VDD]
Kilburn, et al, IEE Proc, 1959.
• Implement P with pass-transistors
• Implement G with pull-up, kill (delete) with pull-down
• Use dynamic logic to reduce the complexity and speed up
Pass-Transistor Realization in DPL
[Figure: DPL pass-transistor full adder built from XOR/XNOR, AND/NAND, OR/NOR, multiplexer and buffer stages, with inputs A, B, C and outputs S, CO]
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
[Figure: four-bit full-adder chain with signals P0 G0 .. P3 G3 and carries Ci,0, Co,0 .. Co,3, shown without and with a bypass multiplexer controlled by BP = P0 P1 P2 P3]
Idea: If (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else “kill” or “generate”.
From Rabaey
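The bypass idea can be sketched for a single block as follows. The code only models logic values, not timing, and the function names are hypothetical. When the block propagate BP is 1 the carry-in is forwarded directly; otherwise some bit kills or generates, so the rippled carry-out no longer depends on the carry-in.

```python
def block_signals(a_bits, b_bits):
    """Per-bit propagate and generate signals, LSB first."""
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    return p, g


def ripple_block_carry(p, g, c_in):
    """Carry-out of the block obtained by rippling through every bit."""
    c = c_in
    for pi, gi in zip(p, g):
        c = gi | (pi & c)
    return c


def skip_block_carry(p, g, c_in):
    """Carry-out with a bypass multiplexer controlled by BP = P0 P1 P2 P3."""
    bp = all(p)
    if bp:
        return c_in                        # bypass: carry-in skips the block
    return ripple_block_carry(p, g, c_in)  # some bit kills or generates
```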
Carry-Skip Adder: N bits, k bits/group, r = N/k groups
[Figure: r ripple-carry groups of k bits each, with group propagate signals P_0 .. P_{r-1} driving AND-OR skip gates between C_in and C_out]
Critical path, delay = 2(k-1) + (N/2 - 2)
Carry-Skip Adder
$$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{2} - 2\right) t_{SKIP}$$
[Plot: delay t_p versus N for the ripple adder and the bypass adder, with the crossover marked at 4..8]
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with groups G_0 .. G_m and group propagate signals P_0 .. P_m; inputs a_0, b_0 .. a_{N-1}, b_{N-1}; sums S_0 .. S_{N-1}; the carry signal path alternates between rippling and skipping from C_in to C_out]
[Figure: carry chain with block-size labels 1, 1, 3, 3, 4, 4, 5, 5, 6 (32 bits in total) and the longest path marked = 9]
Any-point-to-any-point delay = 9 as compared to 12 for CSKA
Carry-chain block size determination for a 32-bit Variable Block Adder(Oklobdzija, Barnes: IBM 1985)
Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: delay model of a four-bit block: carry chain from Ci,0 to Co,3 with P0..P3, G0..G3 and bypass signals BP]
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith’85
321 cNcctd
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
• No closed form solution for delay
• It is a dynamic programming problem
Delay Comparison: Variable Block Adder(Oklobdzija, Barnes: IBM 1985)
[Plot: delay versus adder size N (4 to 60 bits) for VBA, multi-level VBA, and CLA]
Fan-Out Dependency
Fan-In Dependency
Delay Comparison: Variable Block Adder(Oklobdzija, Barnes: IBM 1985)
Carry-Lookahead Adder(Weinberger and Smith)
A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”,
National Bureau of Standards, Circ. 591, p.3-12, 1958.
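$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$
$$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$$
$$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$$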
Carry-Lookahead Adder
$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$
$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$
$$c_{4(j+1)} = G_j + P_j c_{4j}$$
One gate delay to calculate p, g
One to calculate P and two for G
Three gate delays to calculate C4(j+1)
Compare that to 8 in RCA!
[Figure: four-bit P, G group: inputs a_i, b_i .. a_{i+3}, b_{i+3} produce g, p per bit; the group block produces G_j, P_j and the carries C4j+1, C4j+2, C4j+3 and C4(j+1) from Cin]
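A four-bit lookahead group can be sketched directly from these equations. The function below is an illustrative model of the logic (not of gate delays); `cla4` is a hypothetical name, and bit 0 plays the role of bit i.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead group: returns (sum, carry_out, G, P)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate per bit
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # generate per bit

    # Every carry is written directly in terms of p, g and c0.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)

    # Group generate and propagate, then the group carry-out.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c0)

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, c4, G, P
```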
Carry-Lookahead Adder (Weinberger and Smith)
$$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$$
$$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$$
$$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$$
[Figure: second-level lookahead block combining G_j, P_j .. G_{j+3}, P_{j+3} into G*, P* and producing C4j+1, C4j+2, C4j+3 and C4(j+1) from C4j]
Additional two gate delays
C16 will take a total of 5 vs. 32 for RCA!
32-bit Carry Lookahead Adder
[Figure: 32-bit CLA hierarchy with inputs a_i, b_i and carries Cin, C4, C8, C12, C16, C20, C24, C28, Cout]
• Individual adders generating: g_i, p_i, and sum S_i
• Carry-lookahead blocks of 4 bits generating: G_i, P_i, and C_in for the adders
• Carry-lookahead super-blocks of 4-bit blocks generating: G*_i, P*_i, and C_in for the 4-bit blocks
• Group producing final carry C_out and C_16
Critical path delay = 1 (for g_i, p_i) + 2x2 (for G, P) + 3x2 (for C_in) + 1 XOR (for Sum) = approx. 12 gate delays
Carry-Lookahead Adder(Weinberger and Smith: original derivation )
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with Parallel-Prefix Adders!
Delay Optimized CLA
B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4, October 1991
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) Fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
Two-Levels of Logic Implementation of the Carry Block
Two-Levels of Logic Implementation of the Carry-Lookahead Block
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
Delay Optimized CLA: Lee-Oklobdzija ‘91
Delay: Two-level BCLA Delay: Three-level BCLA
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) 2-level BCLA =8.5nS (b.) 3-level BCLA =8.9nS
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”,
Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
Critical path in Motorola's 64-bit CLA
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
[Figure: block diagram of Motorola's 64-bit CLA: PG blocks form group signals such as G3:0, P3:0 up to G63:0, P63:0, and carry blocks produce the carries C0, C4, C8, ... up to C63 and C64]
Motorola's 64-bit CLA
conventional PG Block
Motorola's 64-bit CLA
Modified PG Block
Intermediate propagate signals Pi:0 are generated to speed-up C3
Ling’s Adder
Huey Ling, “High-Speed Binary Adder”
IBM Journal of Research and Development, Vol.25, No.3, 1981.
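Ling Adder
Variation of CLA:
$$G_i = g_i + p_i G_{i-1}$$
$$S_i = p_i \oplus G_{i-1}$$
$$p_i = a_i \oplus b_i, \qquad g_i = a_i b_i$$
Ling’s equations:
$$H_i = g_i + t_{i-1} H_{i-1}$$
$$S_i = t_i \oplus H_i + g_i t_{i-1} H_{i-1}$$
$$t_i = a_i + b_i, \qquad g_i = a_i b_i$$
Ling, IBM J. Res. Dev, 5/81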
Ling Adder
$$G_i = g_i + p_i G_{i-1}$$
$$G_i = g_i + g_i G_{i-1} + p_i G_{i-1} = g_i + (g_i + p_i) G_{i-1}$$
$$G_i = g_i + t_i G_{i-1}$$
Ling’s equation:
$$H_i = g_i + t_{i-1} G_{i-1}$$
Propagates information on two bits
Doran, Trans on Comp 9/88
Ling Adder
Conventional:
$$G_3 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$$
Ling:
$$H_3 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$
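Ling's recurrence can be checked against ordinary addition with a short bit-serial model. This is only a sketch: it assumes a carry-in of 0 (so the term t_{-1} H_{-1} is taken as 0) and evaluates the recurrence sequentially rather than in lookahead form; `ling_add` is a hypothetical name.

```python
def ling_add(a, b, n):
    """Add two n-bit integers using Ling's pseudo-carry H (carry-in = 0)."""
    total = 0
    th_prev = 0  # t_{i-1} & H_{i-1}, which equals the true carry G_{i-1}
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi
        t = ai | bi
        h = g | th_prev                  # H_i = g_i + t_{i-1} H_{i-1}
        s = (t ^ h) | (g & th_prev)      # S_i = t_i xor H_i + g_i t_{i-1} H_{i-1}
        total |= s << i
        th_prev = t & h                  # G_i = t_i H_i
    return total, th_prev
```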
S. Naffziger, ISSCC’96
Results: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘96
• 0.5u Technology
• Speed: 0.930 nS
• Nominal process, 80C, V=3.3V
Conditional-Sum Adder
J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic
Computers, EC-9, p.226-231, 1960.
Carry-Select Adder
O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June
1962, p.340-34
Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962
Addition under the assumption of Cin = 0 and Cin = 1.
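The select principle can be sketched as follows: each k-bit block is added twice, once for each assumed carry-in, and the actual carry then picks one of the two results. The block adder here is plain integer addition, n is assumed to be a multiple of k, and all names are illustrative.

```python
def add_block(a, b, c_in, k):
    """k-bit block addition; returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << k) - 1), total >> k


def carry_select_add(a, b, n, k, c_in=0):
    """n-bit addition built from k-bit blocks in carry-select mode."""
    mask = (1 << k) - 1
    result = 0
    carry = c_in
    for lo in range(0, n, k):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        sum0, c0 = add_block(a_blk, b_blk, 0, k)  # assumes Cin = 0
        sum1, c1 = add_block(a_blk, b_blk, 1, k)  # assumes Cin = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # the MUX
        result |= s << lo
    return result, carry
```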
Carry Select Adder: combining two 32-b VBAs in select mode
Delay = VBA32 + MUX
Addition Under Non-equal Signal Arrival Profile
Assumption
P. Stelling , V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer
Academic Publishers, Vol.14, No.3, December 1996
Signal Arrival Profile from the Parallel Multiplier Partial-Product Reduction Tree
Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June, 1995
Performing Multiply-Add Operation in the Multiply Time
P. Stelling, V. G. Oklobdzija, " Achieving Multiply-Accumulate Operation in the
Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific
Grove, California, July 5 - 9, 1997.
Final Adder: Implementation
Recurrence Solver Based Adders
Kogge and Stone, IEEE Trans on Computers, August 1973
Bilgory and Gajski, 18th DAC, 1981
Brent and Kung, IEEE Trans on Computers, March 1982
• 1973, Kogge and Stone published a general recurrence scheme for parallel computation
• 1979, Brent and Kung published a Tech. Report on regular layout for parallel adders
• 1980, Guibas and Vuillemin developed a layout scheme based on the recurrence equation for addition
• 1980, Ladner and Fischer published “parallel prefix computation”, Journal of the ACM
• 1981, Bilgory and Gajski published a paper on recurrence structures for automatic cell generation
Recurrence Solver Based Adders
They are based on the recurrence equation for P, G
(what is new there since Weinberger ?!!):
$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$
$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$
Or:
$$G_i = g_i + p_i G_{i-1} \quad \text{and} \quad P_i = p_i P_{i-1}$$
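One way to solve this recurrence in parallel is the doubling scheme usually associated with Kogge and Stone: at each level every position combines its (G, P) pair with the pair d positions below, and d doubles per level. The sketch below models that scheme on Python lists, with carry-in 0 assumed; it is an illustration, not the layout from any of the cited papers, and `prefix_add` is a hypothetical name.

```python
def prefix_add(a, b, n):
    """n-bit addition with carries from a doubling parallel-prefix scheme."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    d = 1
    while d < n:
        G_new, P_new = G[:], P[:]
        for i in range(d, n):
            # Combine the span ending at i with the span ending at i - d.
            G_new[i] = G[i] | (P[i] & G[i - d])
            P_new[i] = P[i] & P[i - d]
        G, P = G_new, P_new
        d *= 2
    # G[i] is now the carry out of bit i; the carry into bit i is G[i - 1].
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[n - 1]
```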
[Figure: 16-bit recurrence-solver network: a row generating (g_i, p_i) for i = 1..16, followed by carry-generation levels producing C1 .. C16]
Carry-Lookahead Adder (Weinberger and Smith)
Just to remind you! Please notice the similarity with Parallel-Prefix Adders!
Multiplexer Based Adder
Farooqui and Oklobdzija, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999
Multiplexer Based Adder
• Based on the realization that MUX circuit is faster than a logic gate due to its transmission gate implementation.
• Based on Carry-Lookahead method (W-S), or recurrence solver.
Multiplexer Based Adder
A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
[Figure: multiplexer tree that forms the pair signals g01, g23, p23 and the group signals g03, p03 from inputs a0..a3, b0..b3]
[Figure: 16-bit adder: four 4-bit MUX-based group carry generators produce G, P for bits 0-3, 4-7, 8-11 and 12-15; MUX-and-NOR and MUX-and-NAND stages combine them into G0-7, P0-7, G8-15, P8-15, G0-11, P0-11, G0-15, P0-15 and the carries C3, C7, C11, C15; pairs of 4-bit sum blocks (Cin = 0 and Cin = 1) are selected to give Sum0-3, Sum4-7, Sum8-11 and Sum12-15]
[Figure: 4-bit MUX-based sum block with inputs a0..a2, b0..b2 and Cin, signals p0..p3 and g0..g2, and outputs Sum0 .. Sum3]
• Results in a very fast structure
• 7-MUX delays for a 64-b adder
• Delay using standard cell 0.25u, 2.5V, 25°C:

Adder Size (bits)   Delay (pS)
8                   625
16                  665
32                  710
64                  903
DEC "Alpha" 21064 Adder
• Combination:
  – 8-bit tapered pre-discharged Manchester Carry Chains, with Cin = 0 and Cin = 1
  – 32-bit LSB Carry Lookahead Adder
  – 32-bit MSB Conditional-Sum Adder
  – Carry-Select on most significant 32-bits
  – Latches in the middle: pipelined addition
[Figure: 64-bit datapath organized by byte (input operands byte 0 to byte 7): PGK cells, carry chains, lookahead, latches, switches and dual switches, multiplexers selecting between the Cin = 1 and Cin = 0 results, and latch & XOR stages producing the result bytes]
DEC "Alpha" 21064 Adder: Results
• The first 200MHz processor
• Built using 0.75u technology
• V=3.3V, 30W
• Pipelined (two-latches) allowing 5nS throughput and 10nS latency
Conclusion: VLSI Implementation of Addition
• Currently, implementation parameters are not reflected in the algorithms used for development
• Layout and wire-delay effects are largely neglected, and this is becoming intolerable in the next generation of technology
• Transistor sizing has a large effect which can outweigh the algorithm
• There is a great disconnect between algorithm and implementation
• New rules and measures of goodness are needed
Multiplication
Parallel Multiplier Implementation
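Multiplication Algorithm:
$$P = XY = X \times \sum_{i=0}^{n-1} y_i r^i = \sum_{i=0}^{n-1} X \times y_i r^i$$
$$p^{(0)} = 0 \quad \text{initially}$$
$$p^{(j+1)} = \frac{1}{r}\left(p^{(j)} + r^n X y_j\right) \quad \text{for } j = 0, \dots, n-1$$
$$p^{(n)} = XY \quad \text{after } n \text{ steps}$$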
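The recurrence can be followed step by step in code. The sketch below works for any radix r on non-negative integers, with y given as n radix-r digits; the integer division is exact because every term accumulated so far is a multiple of r. The name `multiply_recurrence` is illustrative.

```python
def multiply_recurrence(x, y, n, r=2):
    """Multiply x by an n-digit radix-r number y via p(j+1) = (p(j) + r^n x y_j) / r."""
    p = 0                                # p(0) = 0
    for j in range(n):
        y_j = (y // r**j) % r            # j-th radix-r digit of y
        p = (p + r**n * x * y_j) // r    # add the shifted multiple, shift right
    return p                             # p(n) = x * y
```

With the operands of the 12 x 12 example used later in the slides, `multiply_recurrence(0xB54, 0xB1B, 12)` gives the same value as `0xB54 * 0xB1B`.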
Parallel Multipliers
[Figure: reduction of the partial-product array shown in steps 0 through 4]
4:2 Compressor
[Figure: 4-2 compressor block with inputs I1, I2, I3, I4 and Ci, and outputs C, S and Co]
Re-designed 4:2 Compressor with 3 XOR Delay
[Figure: 4:2 compressor with inputs I1, I2, I3, I4 and Cin, an internal multiplexer, and outputs S, C and Cout]
A Method for Generation of Fast Parallel Multipliers
by
Vojin G. Oklobdzija, David Villeger, Simon S. Liu
Electrical and Computer Engineering
University of California, Davis
Partial Product Matrix Divided into Vertical Compressor Slices
[Figure: partial-product matrix cut into vertical slices, with horizontal propagation between slices and the carry and sum connections to the final carry-propagate adder]
Idea!
Signal Delays in a Full Adder (3,2) Counter
[Figure: full adder with inputs A, B, Cin and outputs Sum, Carry, with the fast input and the fast output marked]
Three-Dimensional optimization Method: TDM
(Oklobdzija, Villeger, Liu, 1996)
[Figure: two full adders (inputs A, B, Cin; outputs Sum, Carry) connected as a 4:2 compressor with inputs I1..I4 and Cin and output Cout; 3 XOR delays]
Modified 4:2 Compressor with Optimal Interconnections of two Full Adders
[Figure: two full adders with inputs In 1 .. In 4 and Carry In, and outputs Sum, Carry and Carry-Out; the longest path is 3 XOR gates]
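The arithmetic behaviour of a 4:2 compressor made of two full adders can be sketched as below. The wiring is an assumption (three inputs into the first adder, its sum plus the fourth input and the carry-in into the second); the slide's optimized interconnection concerns which signal goes to which pin for timing, which this logic-level model does not capture.

```python
def full_adder(a, b, c):
    """Return (sum, carry) of a one-bit full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(i1, i2, i3, i4, c_in):
    """4:2 compressor from two full adders; returns (s, c, c_out)."""
    s1, c_out = full_adder(i1, i2, i3)  # c_out does not depend on c_in
    s, c = full_adder(s1, i4, c_in)
    return s, c, c_out
```

The five input bits always satisfy i1 + i2 + i3 + i4 + c_in = s + 2 (c + c_out), which is what lets the cell reduce four partial-product rows to two.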
Example of a 12 X 12 Multiplication
(Partial Product for X*Y = B54 * B1B)
Partial-product rows, one per multiplier bit from y0 to y11:
y0  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y1  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y2  = 0:  0 0 0 0 0 0 0 0 0 0 0 0
y3  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y4  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y5  = 0:  0 0 0 0 0 0 0 0 0 0 0 0
y6  = 0:  0 0 0 0 0 0 0 0 0 0 0 0
y7  = 0:  0 0 0 0 0 0 0 0 0 0 0 0
y8  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y9  = 1:  1 0 1 1 0 1 0 1 0 1 0 0
y10 = 0:  0 0 0 0 0 0 0 0 0 0 0 0
y11 = 1:  1 0 1 1 0 1 0 1 0 1 0 0
[Figure: 3-dimensional view of partial product reduction: a Vertical Compressor Slice (VCS) with column bits 0 0 1 1 0 1 0 reduced by full adders (FA) along the time axis and feeding the final adder]
Method
[Figure: two panels, "TDM Arrangement" and "Worst Case", each showing a full adder (inputs a, b, cin; outputs s, c) annotated with signal arrival times]
Example of Delay Optimization
[Figure: two panels, each with four full adders (inputs a, b, cin; outputs s, c) spanning bit positions (n-1) and (n). "Example of an Optimized Interconnection": signals marked 0 xor, 1 xor and 2 xor lead to outputs at 3 xor and 3 xor. "Example of a not Optimized Interconnection": the same signals lead to outputs at 4 xor and 3 xor.]
The 9th Vertical Compressor Slice of a Multiplier
[Figure: one half adder (inputs A, B) and six full adders (inputs A, B, Cin), each with outputs C and S, annotated with signal arrival times ranging from 0 to 5 XOR delays]
Computer Tools
Algorithm for Automatic Generation of Partial Product Array.
Initialize:
Form 2N-1 lists L_i (i = 0, ..., 2N-2), each consisting of p_i elements, where:
p_i = i+1 for i ≤ N-1 and p_i = 2N-1-i for i ≥ N
An element of a list L_i (j = 0, ..., p_i-1) is a pair <n_j, d_j>_i where:
n_j : is a unique node-identifying name
d_j : is the delay associated with that node, i.e. the delay of a signal arriving at node n_j with respect to some reference point.
For i = 0, 1 and 2N-2: connect the nodes from the corresponding lists L_i directly to the CPA.
For i = 2 to i = 2N-3 {Partial Product Array Generation}
Begin For
  If length of L_i is even Then
  Begin If
    sort the elements of L_i in ascending order of delay
    connect an HA to the first 2 elements of L_i, starting with the slowest input
    Ds = max {Delay(A) + D_A-s, Delay(B) + D_B-s}
    Dc = max {Delay(A) + D_A-c, Delay(B) + D_B-c}
    remove 2 elements from L_i
    insert the pair <Ds, NetName> into L_i
    insert the pair <Dc, NetName> into L_i+1
    decrement the length of L_i
    increment the length of L_i+1
  End If;
  While length of L_i > 3
  Begin While
    sort the elements of L_i in ascending order of delay
    connect an FA to the first 3 elements of L_i, starting with the slowest input of the FA:
    Ds = max {Delay(A) + D_A-s, Delay(B) + D_B-s, Delay(Ci) + D_Ci-s}
    Dc = max {Delay(A) + D_A-c, Delay(B) + D_B-c, Delay(Ci) + D_Ci-c}
    remove 3 elements from L_i
    insert the pair <Ds, NetName> into L_i
    insert the pair <Dc, NetName> into L_i+1
    subtract 2 from the length of L_i
    increment the length of L_i+1
  End While;
  sort the elements of L_i
  connect an FA to the last 3 nodes of L_i
  connect the S and C to the bits i and i+1 of the CPA
End For;
End Method;
Delays
Delay(S) = MAX {Delay(A) + D_A-S, Delay(B) + D_B-S, Delay(Cin) + D_Cin-S}
Delay(C) = MAX {Delay(A) + D_A-C, Delay(B) + D_B-C, Delay(Cin) + D_Cin-C}
In our case the delays in an FA are:
FA: A→S = B→S = 2 XOR delays
FA: Cin→S = A→C = B→C = Cin→C = 1 XOR delay.
In an HA:
HA: A→S = B→S = 1 XOR delay, while A→C = B→C = 0.5 XOR delay.
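The column-by-column procedure, together with these delay figures, can be sketched in Python as below. Several points are assumptions made for the sketch rather than statements from the slides: every partial-product bit is taken to arrive at time 0, only arrival times are tracked (no net names), and a column left with fewer than three signals is handed to the CPA directly. The function names are hypothetical.

```python
# Input-to-output delays in XOR units, ordered slowest input first.
FA_SUM = (2.0, 2.0, 1.0)     # A, B, Cin -> S
FA_CARRY = (1.0, 1.0, 1.0)   # A, B, Cin -> C
HA_SUM = (1.0, 1.0)          # A, B -> S
HA_CARRY = (0.5, 0.5)        # A, B -> C


def adder_delays(arrivals, to_sum, to_carry):
    """Earliest signals go to the slowest inputs; returns (Ds, Dc)."""
    ins = sorted(arrivals)
    ds = max(t + d for t, d in zip(ins, to_sum))
    dc = max(t + d for t, d in zip(ins, to_carry))
    return ds, dc


def reduce_columns(n):
    """Return cpa, where cpa[i] lists arrival times of bits sent to CPA bit i."""
    width = 2 * n - 1
    cols = [[0.0] * (i + 1 if i <= n - 1 else width - i) for i in range(width)]
    cpa = [[] for _ in range(width + 1)]
    for i in range(width):
        col = cols[i]
        if i in (0, 1, width - 1):           # these go straight to the CPA
            cpa[i].extend(col)
            continue
        if len(col) % 2 == 0:                # even length: start with an HA
            col.sort()
            ds, dc = adder_delays(col[:2], HA_SUM, HA_CARRY)
            col[:2] = [ds]
            cols[i + 1].append(dc)
        while len(col) > 3:                  # FAs on the earliest three signals
            col.sort()
            ds, dc = adder_delays(col[:3], FA_SUM, FA_CARRY)
            col[:3] = [ds]
            cols[i + 1].append(dc)
        if len(col) == 3:                    # final FA feeds the CPA
            ds, dc = adder_delays(col, FA_SUM, FA_CARRY)
            cpa[i].append(ds)
            cpa[i + 1].append(dc)
        else:                                # assumption: short column, no FA
            cpa[i].extend(col)
    return cpa
```

Calling `reduce_columns(4)` returns, for each CPA bit position, the arrival times of the at most two signals that the final adder has to combine, i.e. the kind of non-uniform arrival profile discussed earlier for the final adder.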
Equivalent XOR Delays
[Plot: delay (XOR levels, 0 to 24) versus multiplier width (0 to 100) for TDM, Fadavi-Ardekani, 9:2, 4:2 and 3,2 schemes]
Comparison between TDM and other representative schemes, in XOR levels.

Multiplier word-length   Wallace Tree [7]   4:2 Tree [11]   Fadavi-Ardekani [16]   TDM
3                        2                  2               2                      2
4                        4                  3               3                      3
6                        6                  6               5                      5
8                        8                  6               7                      5
9                        8                  8               7                      6
11                       10                 9               8                      7
12                       10                 9               8                      7
16                       12                 9               10                     8
19                       12                 12              11                     9
24                       14                 12              12                     10
32                       16                 12              13                     11
42                       16                 15              14                     12
53                       18                 15              15                     13
64                       20                 15              16                     14
95                       20                 18              17                     15
Critical Path Delay [CMOS: Leff = 1 µm, T = 25°C, Vcc = 5V]

N = 24-bits   4:2 Design   9:2 Design   Fadavi-Ardekani   TDM Design
Delay [nS]    14.0         13.0         11.7              10.5
Competing Approaches
Organization of Hitachi's DPL multiplier
[Figure: two 54-bit operands feed a Booth's encoder and a Wallace's tree of 4-2 compressors, followed by a 108-b CLA adder with Conditional Carry Selection (CCS) producing the 108-bit result]
Hitachi's 4:2 compressor structure
[Figure: 4:2 compressor built from six multiplexers with inputs I1, I2, I3, I4 and Ci and outputs S, C and Co; the critical path is 3 gates]
DPL multiplexer circuit
[Figure: DPL multiplexer with data inputs D0, D1, select S and output OUT, shown with complementary signals]
RECOMMENDATIONS
Conclusion
1. The key to improving multiplier speed was in optimizing interconnections, not the compressor circuit (as it was believed for so long).
2. With the increase in wire delay it is important to make a connection between layout topology and algorithm for optimal interconnection of the PPRT.
3. Using one of the “fast adders” (CLA) as a final adder was actually counterproductive. A simple final adder, optimized for the signal arrival profile, yields better results with less hardware.
4. It is possible to further optimize the PPRT and FA so that a Multiply-Add operation (fused) can be performed in multiply time.
5. For larger multipliers / adders (as used in cryptography), the optimization procedures described yield even better results.
See: http://www.ece.ucdavis.edu/acsel/Publications.html
Read This !
1. E. Swartzlander, "Computer Arithmetic". Vol. 1&2, IEEE Computer Society Press, 1990.
2. K. Hwang, "Computer Arithmetic : Principles, Architecture and Design", John Wiley and Sons, 1979.
3. M. Ercegovac, “Digital Systems and Hardware/Firmware Algorithms”, Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons, 1985.
4. A. Chandrakasan, W. Bowhill, F Fox, Editors, "Design of High Performance Microprocessors Circuits", IEEE Press, July 2000.
5. V. G. Oklobdzija, “High-Performance System Design: Circuits and Logic”, IEEE Press, July 1999.
Also: http://www.ece.ucdavis.edu/acsel/Publications.html
THE END