bharat-biyani
Designed a 21b x 21b multiplier using the Booth-2 algorithm by constructing schematics of the decoder, partial product generation and compression, and a carry lookahead adder. Performed Hspice simulation to verify correct functionality, library characterization of the assembled netlist using Siliconsmart ACE, and RTL synthesis of the generated library. Timing and power consumption were analyzed through static timing analysis using Synopsys PrimeTime.
21b x 21b Multiplier Design
EE7325 Page 1
Project Description
• 21b x 21b multiplier design with emphasis on speed.
• The schematic is designed in IBM 130 nm process technology.
• Input operands are positive.
• The design is verified using Hspice.
• Siliconsmart ACE is used to characterize the cells.
• Power and delay are obtained from PrimeTime.
• The design uses Booth-2; the partial products are compressed, and a carry propagate adder is used.
Introduction
Multiplication is a heavily used arithmetic operation, particularly in signal processing and scientific applications. It is hardware intensive, and the main design criteria are high speed, low cost, and small VLSI area. Classic multiplication is realized by repeated cycles of shifting and adding, so the main concern is to speed up the underlying multi-operand addition of partial products. This approach can be slow when there are many partial products, because the output must wait until every sum has been performed. We therefore use Booth's algorithm, which roughly halves the number of partial products and in turn reduces the hardware and delay needed to sum them.
Booth’s Algorithm
The Booth-2 (radix-4) algorithm examines overlapping groups of three adjacent bits of the N-bit multiplier, (Y2i+1, Y2i, Y2i-1), including an implicit bit below the least significant bit, Y-1 = 0. Each group selects a partial product derived from the multiplicand: zero, the multiplicand times 1 (no change), times 2 (shift left by one bit), times -1 (2's complement), or times -2 (2's complement and shift left by one bit). The encodings are shown in Table 1. Each partial product after the first is shifted left by two bits relative to the previous one. The product is equal to the sum of these terms. This algorithm reduces the number of partial products from n to about n/2.
Y2i+1   Y2i   Y2i-1   Recoded Digit
  0      0      0           0
  0      0      1           1
  0      1      0           1
  0      1      1           2
  1      0      0          -2
  1      0      1          -1
  1      1      0          -1
  1      1      1           0
Table 1: Booth Encoding
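The recoding in Table 1 can be sketched in software as a quick sanity check. The following is a minimal behavioral model, not the schematic itself; the function names `booth2_digits` and `booth2_multiply` are hypothetical, and the multiplier is assumed to be an unsigned 21-bit value that is zero-extended above its MSB.

```python
def booth2_digits(y, n_bits=21):
    """Recode an unsigned n_bits-wide multiplier into radix-4 Booth digits.

    Each digit is -2*Y(2i+1) + Y(2i) + Y(2i-1), exactly as in Table 1.
    """
    n_groups = n_bits // 2 + 1      # 11 groups for a 21-bit operand
    padded = y << 1                 # append the implicit bit Y(-1) = 0
    digits = []
    for i in range(n_groups):
        triplet = (padded >> (2 * i)) & 0b111
        y_hi = (triplet >> 2) & 1   # Y(2i+1)
        y_mid = (triplet >> 1) & 1  # Y(2i)
        y_lo = triplet & 1          # Y(2i-1)
        digits.append(-2 * y_hi + y_mid + y_lo)
    return digits


def booth2_multiply(x, y, n_bits=21):
    """Sum the Booth partial products, each shifted two bits further left."""
    total = 0
    for i, digit in enumerate(booth2_digits(y, n_bits)):
        total += (digit * x) << (2 * i)
    return total
```

For a 21-bit operand this produces 11 digits, each in the set {-2, -1, 0, 1, 2}, which corresponds to the 11 partial products used in the design.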
Architecture
In this project, we have designed a high-speed, low-power 21 bit x 21 bit multiplier using the Booth-2 algorithm. The 21-bit multiplier operand is divided into 11 groups, each covering two new bits plus one bit of overlap with the previous group. Each group is passed into a Booth encoder, whose output bits correspond to the operations described in Table 1. Each set of these selection bits is sent to a Booth decoder block. The decoded bits are then used to select bits of the multiplicand through a partial product multiplexer (PPMUX), which outputs the appropriate partial product bits for the selected operation. The partial products are then sign extended, and the rows of partial products are compressed. The compressed output bits are added using a low-area carry propagate adder to produce the final product. A standard array multiplier would typically require 21 partial products; this implementation needs only 11, which significantly reduces the area and improves the speed. The block diagram of the complete design is shown in Figure 2.
Figure 1: 21bx21b multiplier
Figure 2: Complete Block Diagram of the Multiplier
[Figure 2 block labels: Booth Encoder (inputs Y2i+1, Y2i, Y2i-1); PPMUX (inputs X2i-1, X2i; selections 1, 2, 0, -1, -2); Add Block (0 or 1); 11:2 Compression; Adder; Result]
Component Design
• Booth Encoder
The Booth encoding block is designed using Table 1. Each recoded digit is generated by logic on the corresponding input combinations. The gate-level and transistor-level schematics for one of the five recoded digits, the digit 0, are shown below.

Y2i+1   Y2i   Y2i-1   Recoded Digit
  0      0      0           0
  1      1      1           0
Figure 3: Gate level schematic for code 0
Figure 4: Schematic for code zero
Figure 5: Complete schematic of the Booth Encoder
Figure 6: Symbol view of the Booth encoder
• Partial Product Multiplexer (PPMUX)
Pass transistor logic (PTL) is used to form the PPMUX. The encoded bits decide which bits of the multiplicand are selected, and the PPMUX outputs the corresponding bits of the partial product. The schematic of the PPMUX, which acts as the Booth decoder, is shown in Figure 7. Using PTL for the decoder greatly reduces the area.
Figure 7: Schematic of the Partial Product Multiplexer
Figure 8: Symbol of the Partial Product Multiplexer
• Add Block
If the recoded digit is negative, the 2's complement of the multiplicand must be taken. The 2's complement is formed by complementing all the bits and adding one at the LSB. The complement operation is done by the PPMUX, and an add block is designed to add the one. The add block is also built using PTL, is controlled by the recoded Booth digits, and is shown below.
Figure 9: Schematic of the Add block
Figure 10: Symbol view of the Add block
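The split between the PPMUX (bitwise complement) and the add block (the extra one at the LSB) can be illustrated with a small behavioral model. This is only a sketch under assumptions: the function name `ppmux_row`, the 43-bit row width (borrowed from the adder width), and the handling of the recoded digit 0 as an all-zero row are not taken from the schematics.

```python
WIDTH = 43                      # assumed row width, matching the 43-bit adder
MASK = (1 << WIDTH) - 1


def ppmux_row(x, digit, width=WIDTH):
    """Return (row_bits, add_one) for one Booth digit in {-2, -1, 0, 1, 2}.

    row_bits models the PPMUX output; add_one models the add block output.
    """
    mask = (1 << width) - 1
    magnitude = {0: 0, 1: x, 2: x << 1}[abs(digit)]
    if digit < 0:
        # PPMUX complements every bit; the add block supplies the +1.
        return (~magnitude) & mask, 1
    return magnitude & mask, 0


def row_value(x, digit, width=WIDTH):
    """Combine the PPMUX row and the add-block bit, modulo 2**width."""
    row, add_one = ppmux_row(x, digit, width)
    return (row + add_one) & ((1 << width) - 1)
```

For every digit, `row_value(x, digit)` equals `digit * x` modulo 2^43, which is the 2's complement identity the add block relies on.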
• Compression Module
In our design, the partial product array has 11 rows and 46 columns. We use 3:2 compressors (i.e. full adders) for compression. The idea is to compress the partial product bits in each column down to 2 rows, which are then added to obtain the output product. The carries generated in one column are dropped into the next column, where they are added to that column's sum. In this way, rippling of the carry is avoided. The compression of each column depends on the number of inputs in that column and the number of carries passed from the previous column. The 21st column has the maximum number of inputs, which is 11. So the approximate number of full adders required is given by,
N = Xin + Cin - D

where
Xin = number of inputs to be compressed = 11
Cin = carries passed from the previous column = n - 3 = 8
D = number of drops = 1

So N = Xin + Cin - D = 11 + 8 - 1 = 18, and
Number of full adders = N / 2 = 18 / 2 = 9 full adders
The block diagram of the compression block is shown in Figure 11.
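The count of 9 full adders can also be seen from a word-level carry-save model: each 3:2 step replaces three rows with two, so going from 11 rows to 2 takes 9 steps. The sketch below is an illustration only; the helper names are hypothetical, and the sequential ordering of the steps is an assumption that does not reproduce the tree wiring of Figure 11.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three rows: a bitwise full adder on every column."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (b & c) | (a & c)) << 1  # carries drop one column left
    return sum_row, carry_row


def compress_to_two(rows):
    """Reduce a list of non-negative rows to two rows; count the 3:2 steps."""
    rows = list(rows)
    steps = 0
    while len(rows) > 2:
        a, b, c = rows[:3]
        rows = rows[3:] + list(carry_save_add(a, b, c))
        steps += 1
    return rows[0], rows[1], steps
```

The two returned rows still need one carry propagate addition to produce the final product, which is the job of the adder described later.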
Figure 11: 11:2 Compression
[Figure 11 diagram labels: nine full adders; inputs in0 to in10; carries C1 to C8; outputs S and C]
Figure 12: Partial products generated form inputs to the compression block
The full adder is designed using a mirror carry (MC) stage and a mirror sum (MS) stage, which gives the least possible area for an adder: 12 transistors (MC) plus 16 transistors (MS), for a total of 28 transistors with no diffusion breaks. The design is only about 5% slower than the NAND-based sum but has a much smaller area. The transistor-level schematic of the full adder is shown in Figure 13.
Figure 13: Schematic of Full Adder
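The logic of a mirror adder can be checked exhaustively in a few lines. This sketch assumes the standard mirror-adder equations, in which the sum stage reuses the inverted carry; it is a gate-level model only and does not reproduce the transistor netlist of Figure 13. The function name is hypothetical.

```python
def mirror_full_adder(a, b, cin):
    """Return (sum, cout) for single-bit inputs using mirror-adder logic."""
    # Mirror carry stage: produces the inverted carry.
    cout_n = 1 - ((a & b) | (cin & (a | b)))
    # Mirror sum stage: reuses the inverted carry to form the inverted sum.
    sum_n = 1 - ((a & b & cin) | ((a | b | cin) & cout_n))
    # Output inverters restore the true polarities.
    return 1 - sum_n, 1 - cout_n
```

Over all eight input combinations, `2 * cout + sum` equals `a + b + cin`.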
The half adder sum is the XOR of the two bits, and the carry is their AND. The XOR is implemented using a NOR2 + AOI21 combination, and the AND is implemented as NAND2 + INV. The schematic of the half adder is shown in Figure 14.
Figure 14: Schematic of the Half Adder
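The NOR2 + AOI21 decomposition of the XOR can be verified with a small gate-level model. The gate helper names below are hypothetical, and the wiring (the NOR2 output feeding the single OR input of the AOI21) is the usual arrangement, assumed here rather than read from Figure 14.

```python
def nor2(a, b):
    return 1 - (a | b)


def nand2(a, b):
    return 1 - (a & b)


def inv(a):
    return 1 - a


def aoi21(a, b, c):
    """AND-OR-INVERT: NOT((a AND b) OR c)."""
    return 1 - ((a & b) | c)


def half_adder(a, b):
    """Return (sum, carry) built from NOR2 + AOI21 and NAND2 + INV."""
    total = aoi21(a, b, nor2(a, b))   # NOT(ab + NOT(a + b)) == a XOR b
    carry = inv(nand2(a, b))          # a AND b
    return total, carry
```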
• Carry Propagate Adder Design
Since the main concern for the design is speed, a carry lookahead adder with two trees is used. It is faster than a ripple carry adder because it computes the carry bits ahead of the sums, which reduces the time the higher-order bits wait for their carry inputs. However, it occupies more area than a ripple carry adder.
Figure 15: Schematic of the 43 bit Carry Lookahead Adder with two adders
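The lookahead idea, computing carries from generate and propagate signals rather than rippling them, can be sketched as follows. Note the assumption: this model uses a generic parallel-prefix (Kogge-Stone style) combination of generate and propagate terms, which is not necessarily the two-tree structure of Figure 15. The function name `cla_add` and the zero carry-in are also assumptions.

```python
def cla_add(a, b, width=43):
    """Add two unsigned values modulo 2**width using generate/propagate prefixes."""
    mask = (1 << width) - 1
    generate = a & b            # bit i creates a carry on its own
    propagate = a ^ b           # bit i passes an incoming carry along
    group_g, group_p = generate, propagate
    distance = 1
    while distance < width:
        # Merge each group with the group 'distance' bits below it.
        group_g = (group_g | (group_p & (group_g << distance))) & mask
        group_p = (group_p & (group_p << distance)) & mask
        distance <<= 1
    carries = (group_g << 1) & mask   # carry into bit i is the carry out of bit i-1
    return (propagate ^ carries) & mask
```

The loop runs about log2(width) times, which mirrors why a lookahead structure is faster than a ripple chain of `width` stages.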
The adder has a worst case delay of 372ps as shown in Figure 16.
Figure 16: Worst case delay of the CLA
Simulation Results
Figure 17 depicts the symbol view of the complete multiplier.
Figure 17: Symbol view of the complete multiplier
The functionality of the multiplier is tested using the following inputs, among others.
Case 1: X = Y = 1 1111 1111 1111 1111 1111
Expected Output: 0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01
Figure 18 shows the .mt0 file which depicts the obtained output
Figure 18: .mt0 file for Case 1
The obtained output, written MSB first, is 0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01
Case 2: X = 0 1010 1010 1010 1010 1010
Y = 1 0101 0101 0101 0101 0101
Expected Output: 0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010
Figure 19 shows the .mt0 file which depicts the obtained output
Figure 19: .mt0 file for Case 2
The obtained output, written MSB first, is 0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010
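The expected outputs of the two test cases can be cross-checked independently of the simulator. The short script below parses the bit strings quoted in Case 1 and Case 2 (MSB first) and compares them against ordinary integer multiplication; the helper names are hypothetical.

```python
def from_bits(text):
    """Parse a space-grouped, MSB-first bit string into an integer."""
    return int(text.replace(" ", ""), 2)


# (X, Y, expected product) as quoted in Case 1 and Case 2.
CASES = [
    ("1 1111 1111 1111 1111 1111",
     "1 1111 1111 1111 1111 1111",
     "0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01"),
    ("0 1010 1010 1010 1010 1010",
     "1 0101 0101 0101 0101 0101",
     "0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010"),
]


def check_cases(cases=CASES):
    """Return one boolean per case: does X * Y equal the expected product?"""
    return [from_bits(x) * from_bits(y) == from_bits(p) for x, y, p in cases]
```

Both cases evaluate to true, so the expected outputs listed above are consistent with unsigned multiplication of the stated operands.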
PrimeTime Report
• Timing Report
****************************************
Report : timing
        -path_type full
        -delay_type min_max
        -input_pins
        -max_paths 1
        -transition_time
        -capacitance
        -sort_by slack
Design : final_design
Version: I-2013.12-SP3
Date   : Sat Aug 9 20:41:11 2014
****************************************

Startpoint: I1/I1227/b (internal pin)
Endpoint: m0 (output port)
Path Group: (none)
Path Type: min

Point                              Cap   Trans    Incr    Path
----------------------------------------------------------------
I1/I1227/b (xor)                          0.00    0.00    0.00 f
I1/I1227/out (xor)               15.00    8.60    7.88    7.88 f
m0 (out)                                  8.60    0.00    7.88 f
data arrival time                                         7.88
----------------------------------------------------------------
(Path is unconstrained)

Startpoint: I4/I53/I60/ximinusone (internal pin)
Endpoint: m41 (output port)
Path Group: (none)
Path Type: max

Point                              Cap   Trans    Incr    Path
----------------------------------------------------------------
I4/I53/I60/ximinusone (ppmux)             0.00    0.00    0.00 f
I4/I53/I60/out (ppmux)            0.12    2.22    2.02    2.02 f
I4/I1409/cin (fulladder)                  2.22    0.00    2.02 f
I4/I1409/cout (fulladder)         0.04    0.12    0.61    2.63 f
I4/I1420/b (fulladder)                    0.12    0.00    2.63 f
I4/I1420/sum (fulladder)          0.03    0.05    0.24    2.87 f
I4/I1422/cin (fulladder)                  0.05    0.00    2.87 f
I4/I1422/sum (fulladder)          0.03    0.05    0.24    3.11 f
I4/I1423/cin (fulladder)                  0.05    0.00    3.11 f
I4/I1423/sum (fulladder)          0.04    0.05    0.24    3.35 f
I4/I1424/b (fulladder)                    0.05    0.00    3.35 f
I4/I1424/sum (fulladder)          0.03    0.05    0.22    3.57 f
I1/I888/a (xor)                           0.05    0.00    3.57 f
I1/I888/out (xor)                 0.06    0.10    0.13    3.71 f
I1/I1102/b (and)                          0.10    0.00    3.71 f
I1/I1102/out (and)                0.01    0.02    0.09    3.79 f
I1/I1132/a (or)                           0.02    0.00    3.79 f
I1/I1132/out (or)                 0.02    0.03    0.10    3.90 f
I1/I1034/a (and)                          0.03    0.00    3.90 f
I1/I1034/out (and)                0.01    0.02    0.06    3.96 f
I1/I1064/a (or)                           0.02    0.00    3.96 f
I1/I1064/out (or)                 0.01    0.03    0.09    4.05 f
I1/I1071/b (or)                           0.03    0.00    4.05 f
I1/I1071/out (or)                 0.01    0.03    0.08    4.13 f
I1/I1074/b (or)                           0.03    0.00    4.13 f
I1/I1074/out (or)                 0.02    0.03    0.09    4.22 f
I1/I1190/b (or)                           0.03    0.00    4.22 f
I1/I1190/out (or)                 0.06    0.05    0.11    4.34 f
I1/I1179/a (and)                          0.05    0.00    4.34 f
I1/I1179/out (and)                0.01    0.02    0.06    4.40 f
I1/I1191/a (or)                           0.02    0.00    4.40 f
I1/I1191/out (or)                 0.05    0.05    0.12    4.52 f
I1/I1180/a (and)                          0.05    0.00    4.52 f
I1/I1180/out (and)                0.01    0.02    0.06    4.58 f
I1/I1192/a (or)                           0.02    0.00    4.58 f
I1/I1192/out (or)                 0.03    0.04    0.11    4.69 f
I1/I1174/a (and)                          0.04    0.00    4.69 f
I1/I1174/out (and)                0.01    0.02    0.06    4.75 f
I1/I1196/a (or)                           0.02    0.00    4.75 f
I1/I1196/out (or)                 0.02    0.03    0.10    4.85 f
I1/I1267/a (xor)                          0.03    0.00    4.85 f
I1/I1267/out (xor)               15.00   29.06   20.97   25.82 r
m41 (out)                                29.06    0.00   25.82 r
data arrival time                                        25.82
----------------------------------------------------------------
(Path is unconstrained)
• Power Report
****************************************
Report : Averaged Power
Design : final_design
Version: I-2013.12-SP3
Date   : Sat Aug 9 20:41:12 2014
****************************************

Attributes
----------
i - Including register clock pin internal power
u - User defined power group

                 Internal    Switching   Leakage     Total
Power Group      Power       Power       Power       Power      (     %)  Attrs
--------------------------------------------------------------------------------
clock_network    0.0000      0.0000      0.0000      0.0000     ( 0.00%)  i
register         0.0000      0.0000      0.0000      0.0000     ( 0.00%)
combinational    0.0122      0.2191      9.688e-07   0.2312     (99.53%)
sequential       0.0000      0.0000      0.0000      0.0000     ( 0.00%)
memory           0.0000      0.0000      0.0000      0.0000     ( 0.00%)
io_pad           0.0000      0.0000      0.0000      0.0000     ( 0.00%)
black_box        5.952e-04   5.074e-04   2.286e-07   1.103e-03  ( 0.47%)

Net Switching Power = 0.2196    (94.51%)
Cell Internal Power = 0.0127    ( 5.49%)
Cell Leakage Power  = 1.197e-06 ( 0.00%)
                      ---------
Total Power         = 0.2323    (100.00%)
Conclusion
A 21 bit x 21 bit unsigned multiplier has been successfully designed and simulated. The results are summarized below.
Worst Case Delay: 25.82
Total Power: 0.2323 mW