8/3/2019 A 1616 MUX Based Multiplier Design
1/9
International Journal of Electronic Engineering Research
Volume 1 Number 1 (2009) pp. 53-61
Research India Publications
http://www.ripublication.com/ijeer.htm
A 16×16 MUX Based Multiplier Design Using Optimized Static CMOS Logic Style
Abhijit Asati* and Chandrashekhar**
* Lecturer, Electrical & Electronics Engineering Group, BITS, Pilani, India
** Director, Central Electronics Engineering Research Institute, Pilani, India
Abstract
The simpler VLSI implementation of array multipliers makes them preferable for
smaller operand sizes, in spite of their linear time complexity. In general,
array multipliers have poor space complexity, O(n²): they require approximately
n² cells to produce a product, so as the operand size grows the circuit takes
large area and power. In this paper we present a MUX based 16×16 unsigned
multiplier circuit, which utilizes an efficient partial product generation and
partial product addition technique. The time and space complexity of such a
multiplier is much better than that of simpler array multiplier techniques. The
multiplier has been designed using optimized static CMOS logic cells to provide
the best area, power and delay performance. The multiplier circuit is
implemented using conventional CMOS logic in the 0.6 µm N-well CMOS process
(SCN_SUBM, lambda = 0.3) of MOSIS, and simulated after parasitic extraction.
The simulation results show a large reduction in propagation delay and average
power compared to the tree multiplier implementation of [3].
Keywords: MUX based, array, Wallace tree, Booth encoding, partial product,
complexity, operand size
Introduction
In Digital Signal Processor implementations, such as standard Digital Signal
Processors and ASIC Digital Signal Processors, the multiplier is used as a
fundamental building block. The performance of different signal processing
algorithms like frequency domain filtering (FIR and IIR), frequency-time
transformations (FFT), correlation etc. depends on the performance of the
multiplier implementation. In most real-time DSP processing tasks, the
multiplier block must operate at high speed while consuming little layout area
and low power. The multiplication algorithms differ in the means of
partial product generation and partial product addition [1]. The array
multipliers have linear time complexity, i.e. O(n), so their delay degrades for
larger operand sizes. Array multipliers also have poor space complexity, O(n²):
they require approximately n² cells to produce a product, so as the operand
size grows the circuit takes large area and power [2], [4], [5]. The number of
partial product rows can be reduced by a factor of k using radix-m Booth
encoding (where m = 2^k). By using Booth radix-4 (m = 4 = 2²) encoding the
partial product rows can be halved [3]; therefore the number of logic cells
required to generate the partial products is reduced to n²/2 [2]. Further, in
Wallace tree accumulation the ripple effect is reduced, so the product is
produced in far less time: the time complexity is reduced to O(log n), but the
tree requires larger gate and routing area than a regular array and is hence
unsuitable for VLSI implementation [2]. The hardware reduction of the Booth
encoding scheme can be combined with accelerated Wallace tree accumulation of
partial products to obtain the reduced time complexity of O(log n), which is
well suited to multipliers with large operand sizes [2], [3]. As discussed
earlier, for smaller operand sizes the tree based architectures may have
smaller gate delay but consume more silicon area due to increased routing and
encoding overheads; array multipliers, on the other hand, have larger gate
delay but shorter routing length. The MUX based array multipliers give a faster
and more compact implementation due to efficient partial product generation and
efficient partial product addition. In this paper we present an implementation
of a 16×16 multiplier design using the MUX based array technique and static
CMOS logic cells. These static CMOS logic cells provide the best area, power
and delay performance, as described in [6]. The VLSI implementation of the
multiplier circuit is done in the 0.6 µm N-well CMOS process (SCN_SUBM,
lambda = 0.3) of MOSIS, using conventional CMOS logic. Simulation results are
compared with a fast Booth encoded Wallace tree multiplier implementation, as
in [3]. Section II discusses the conventional static CMOS logic design style,
Section III explains the MUX based multiplier algorithm, Section IV illustrates
the multiplication logic, and Section V describes the schematics of the 4×4
multiplier and the 16×16 multiplier. Physical implementation and results are
described in Section VI. Section VII concludes the paper.
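As an aside on the Booth radix-4 encoding mentioned above, the following Python sketch recodes an n-bit two's-complement operand into n/2 signed digits, which is why the number of partial product rows is halved. It is an illustration only: the function names `booth_radix4_digits` and `booth_value` are hypothetical, and the multiplier proposed in this paper is unsigned and does not use Booth encoding.

```python
def booth_radix4_digits(value: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement integer into n/2 radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i carries the weight 4**i.
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number of bits")
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n-bit two's complement")
    # Python's >> is an arithmetic shift, so this yields two's-complement bits.
    bits = [(value >> i) & 1 for i in range(n)]
    digits = []
    previous = 0  # implicit bit to the right of the LSB
    for i in range(0, n, 2):
        # Overlapping 3-bit window (b[i+1], b[i], b[i-1]) -> one signed digit.
        digits.append(-2 * bits[i + 1] + bits[i] + previous)
        previous = bits[i + 1]
    return digits


def booth_value(digits: list[int]) -> int:
    """Reassemble the integer represented by the radix-4 digits."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    digits = booth_radix4_digits(7, 16)
    print(len(digits), digits, booth_value(digits))
```

A 16-bit operand yields 8 digits, so only 8 partial product rows are generated instead of 16.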
Conventional static CMOS Logic Design style
A static logic gate generates its output corresponding to the applied input
voltages after a certain time delay, and it can preserve its output level (or
state) as long as the power supply is provided. In steady state each gate
output is connected to either Vdd or Gnd through a low-resistive path;
therefore, for a static input, the output levels are preserved, whereas the
operation of dynamic logic circuits relies on temporary storage of signal
values on the capacitance of dynamic circuit nodes. Conventional static logic
style offers a versatile implementation of logic functions based on the static
or steady state behavior of simple CMOS structures. It is most suitable and
widely accepted for many VLSI circuit implementations due to its important
properties like high speed, low power, large noise margins, no logic
degradation and validity of logic design
style at scaled down technologies. A logic gate with a fan-in of n requires 2n
devices (n N-type + n P-type). Two logic blocks, the N-block and the P-block,
form a CMOS gate. The topology of the N-block is the dual of that of the
P-block. Since both blocks have an equal number of transistors, the transistor
count may increase. The channel widths of series connected n-channel MOS
transistors (NMOS) or p-channel MOS transistors (PMOS) have to be increased to
obtain a reasonable conducting current to drive capacitive loads. The increase
in size of the PMOS transistors results in a significant area overhead, and
also an increased gate input capacitance, which may lead to high dynamic power
dissipation. The higher gate input capacitance loads the previous stage,
thereby increasing the delay. The ratio of PMOS/NMOS transistor widths should
be chosen optimally to achieve good noise margin, higher speed and lower power
consumption, as described in [7], [9]. The short-circuit currents of a static
CMOS gate can be minimized by appropriately sizing transistors for equal rise
and fall times. The schematics of the 1-bit full adder, 2-input AND, 3-input
AND, 2-input MUX and 2-input function implemented in conventional static CMOS
logic are shown in Figure 1. The full adder cell, designed using the principle
of symmetry, has 28 transistors, as described in [6], [8]. The 28-transistor
version performs considerably better than the 40-transistor version [6]. The
32-bit adder designed using complementary CMOS has a power delay product of
less than half that of the CPL version [6]. The 2-input AND cell, 3-input AND
cell, 2-input MUX and other cells also provide a better power delay product.
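The duality of the N-block and the P-block described above can be illustrated with a small switch-level sketch in Python. It models an n-input NAND gate, chosen here only as a convenient example of a static CMOS gate; the function names are hypothetical and no electrical behavior (sizing, delay, power) is modeled.

```python
from itertools import product


def transistor_count(fan_in: int) -> int:
    """A static CMOS gate with fan-in n uses n NMOS plus n PMOS devices."""
    return 2 * fan_in


def static_cmos_nand(inputs: tuple[int, ...]) -> int:
    """Switch-level model of an n-input static CMOS NAND gate."""
    # N-block: series NMOS chain, conducts to Gnd only when every input is 1.
    pull_down = all(b == 1 for b in inputs)
    # P-block: parallel PMOS devices, conducts to Vdd when any input is 0.
    pull_up = any(b == 0 for b in inputs)
    # Dual networks: exactly one of them conducts for every input pattern,
    # so the output is always driven through a low-resistive path.
    if pull_up == pull_down:
        raise AssertionError("N-block and P-block are not complementary")
    return 1 if pull_up else 0


if __name__ == "__main__":
    for pattern in product((0, 1), repeat=3):
        print(pattern, static_cmos_nand(pattern))
    print("devices for fan-in 3:", transistor_count(3))
```

The check inside `static_cmos_nand` never fires, which reflects the property that a static gate output is connected to either Vdd or Gnd in steady state.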
Figure 1: Schematics using the conventional static CMOS logic design style of
(a) complex full adder cell using the principle of symmetry, (b) 2-input AND
gate, (c) 3-input AND gate, (d) 2-input MUX.
MUX based multiplier algorithm
It is an unsigned multiplier algorithm in which one bit of the multiplier and
one bit of the multiplicand are processed in parallel. The algorithm is
symmetric, i.e., the multiplier and multiplicand can be interchanged. According
to this algorithm, the sum of the two operands, progressively computed, is a
useful quantity that is used in the computation of certain partial products.
The different quantities are computed one bit at each step of the algorithm and
the appropriate quantity is then selected in the next step, if so required. The
parallel implementation of this algorithm yields an iterative type array.
Compared to the implementation based on the modified Booth's algorithm, it
consumes the same amount of circuitry but yields faster multiplication. This
multiplexer-based architecture performs parallel computation of the partial
sums of the two operands together, which simplifies tasks such as compression
and accumulation. It also performs favorably with regard to processing speed
compared to other regular array architectures. The multiplication logic can be
explained using equations 1 to 5.
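Let

$$X = x_{n-1} x_{n-2} \ldots x_0, \qquad Y = y_{n-1} y_{n-2} \ldots y_0, \qquad P = XY \tag{1}$$

$X_j$ and $Y_j$ are the binary numbers obtained by truncating X and Y
respectively up to the (j+1)th bit:

$$X_j = x_j x_{j-1} \ldots x_0, \qquad Y_j = y_j y_{j-1} \ldots y_0, \qquad 0 \le j \le n-1 \tag{2}$$

$$P_j = X_j Y_j = (x_j 2^j + X_{j-1})(y_j 2^j + Y_{j-1}) = P_{j-1} + x_j y_j 2^{2j} + Z_j 2^j \tag{3}$$

where

$$Z_j = x_j Y_{j-1} + y_j X_{j-1} =
\begin{cases}
0 & \text{if } x_j = 0,\ y_j = 0 \\
X_{j-1} & \text{if } x_j = 0,\ y_j = 1 \\
Y_{j-1} & \text{if } x_j = 1,\ y_j = 0 \\
X_{j-1} + Y_{j-1} & \text{if } x_j = 1,\ y_j = 1
\end{cases} \tag{4}$$

Applying equation 3 for j = 1, ..., n-1 with $P_0 = x_0 y_0$ gives

$$P = P_{n-1} = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^j \tag{5}$$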
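The multiplication logic of equations 1 to 5 can be checked numerically with a short Python sketch. It works on integers rather than gates, and the function name `mux_based_multiply` is hypothetical: each step j adds the AND term x_j·y_j·2^(2j) and one of the four candidate values for Z_j (0, X_{j-1}, Y_{j-1}, or their sum), selected by the bit pair (x_j, y_j) as a 4:1 multiplexer would.

```python
def mux_based_multiply(x: int, y: int, n: int) -> int:
    """Multiply two unsigned n-bit integers via the MUX-based decomposition."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")

    product = (x & 1) & (y & 1)          # P_0 = x_0 * y_0
    for j in range(1, n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        mask = (1 << j) - 1
        x_prev = x & mask                # X_{j-1}: the low j bits of X
        y_prev = y & mask                # Y_{j-1}: the low j bits of Y
        # 4:1 selection on (x_j, y_j): 00 -> 0, 01 -> X, 10 -> Y, 11 -> X + Y
        z = (0, x_prev, y_prev, x_prev + y_prev)[(xj << 1) | yj]
        product += (xj & yj) << (2 * j)  # AND term x_j * y_j * 2^(2j)
        product += z << j                # MUX term Z_j * 2^j
    return product


if __name__ == "__main__":
    print(mux_based_multiply(7, 3, 4))
```

Because the selection depends only on (x_j, y_j), swapping the operands leaves the result unchanged, which matches the symmetry noted in the text.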
Illustration of the Multiplication Logic
Example 1 shows the multiplication process for two binary 4-bit numbers using
the MUX-based approach. The multiplication process shows that the number of
rows remains the same, but the number of partial product bits to be compressed
in a particular column is now restricted to only 3 bits; this makes compression
much faster and easier. If the carry bits C1, C2, C3 shown in Example 1 are
taken care of, then the number of bits to be added in a particular column will
be only 2. Two columns can be added simultaneously using a 2-bit CLA, which
also accepts the carry input C1, C2 or C3 of the particular column (this is
possible because these carries occur in alternate columns). Thus the first step
in the algorithm is the generation of partial product rows and the second step
performs the addition of these partial products together with compression.
Compared to other regular array multipliers it will therefore be faster. It
produces its output in time T = (n+1) FA_2CLA, where FA_2CLA is the delay of a
2-bit CLA adder, with a timing overhead of one 4:1 MUX delay, while a regular
array multiplier takes an approximate delay of T = (2n) FA. The large area
overhead will be due to the routing needed between these MUXes.
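The two delay expressions can be compared with a minimal Python sketch. The function and parameter names are hypothetical, FA is taken to mean one full-adder delay, and the unit delays used in the usage lines are placeholders rather than measured cell delays.

```python
def mux_array_delay(n: int, t_cla2: float, t_mux4: float) -> float:
    """T = (n + 1) * (2-bit CLA delay), plus one 4:1 MUX delay of overhead."""
    return (n + 1) * t_cla2 + t_mux4


def regular_array_delay(n: int, t_fa: float) -> float:
    """T = (2n) * (full-adder delay) for a regular array multiplier."""
    return 2 * n * t_fa


if __name__ == "__main__":
    n = 16
    # Placeholder unit delays: every cell is assumed to take one time unit.
    print("MUX based array:", mux_array_delay(n, t_cla2=1.0, t_mux4=1.0))
    print("regular array  :", regular_array_delay(n, t_fa=1.0))
```

With equal unit delays this gives 18 against 32 delay units for n = 16. The real ratio depends on the actual delays of the 2-bit CLA, the 4:1 MUX and the full adder, which this sketch does not model.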
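Example 1: X0Y0, X1Y1, X2Y2 and X3Y3, at the positions shown below, have to be
added to the appropriate terms selected by the 4:1 MUXes, based on the select
lines shown in the first column.
Let X = X3X2X1X0 = 0111 = (+7)10 and Y = Y3Y2Y1Y0 = 0011 = (+3)10.
In each entry, the symbolic part explains the operation to be performed by the
algorithm and the numeric part shows the application of the algorithm to the
selected inputs X and Y.
Working of MUX: select lines 00/01/10/11 correspond to I1/I2/I3/I4.

| Select line for 4:1 MUX | Terms (I1/I2/I3/I4 = values) | P7 | P6 | P5 | P4 | P3 | P2 | P1 | P0 |
|---|---|---|---|---|---|---|---|---|---|
| (AND terms) | X3Y3, X2Y2, X1Y1, X0Y0 = 0, 0, 1, 1 | | 0 | | 0 | | 1 | | 1 |
| X1Y1 = 11 | 0/0/0/C1 = 0/0/0/1 | | | | | | 1 | | |
| | 0/X0/Y0/S0 = 0/1/1/0 | | | | | | | 0 | |
| X2Y2 = 10 | 0/0/0/C2 = 0/0/0/1 | | | | 0 | | | | |
| | 0/X1/Y1/S1 = 0/1/1/1 | | | | | 1 | | | |
| | 0/X0/Y0/S0 = 0/1/1/0 | | | | | | 1 | | |
| X3Y3 = 00 | 0/0/0/C3 = 0/0/0/1 | | 0 | | | | | | |
| | 0/X2/Y2/S2 = 0/1/0/0 | | | 0 | | | | | |
| | 0/X1/Y1/S1 = 0/1/1/1 | | | | 0 | | | | |
| | 0/X0/Y0/S0 = 0/1/1/0 | | | | | 0 | | | |
| Product = (21)10 | | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |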
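The sum bits S0, S1, S2 and carries C1, C2, C3 used in Example 1 come from the progressively computed sum of the two operands. The Python sketch below reproduces them with a ripple of full adders; the helper names are hypothetical and the ripple structure is an assumption made for illustration, not a description of the cells used in the layout.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) of three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def progressive_sum(x: int, y: int, n: int) -> tuple[list[int], list[int]]:
    """Return (S, C) for the n-bit sum X + Y, least significant bit first.

    S[i] is the i-th sum bit; C[j] is the carry out of the low j bits,
    so X_{j-1} + Y_{j-1} = C[j] * 2^j + sum(S[i] * 2^i for i < j).
    """
    s_bits: list[int] = []
    c_bits: list[int] = [0]              # C[0] = 0: no carry into bit 0
    for i in range(n):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, c_bits[-1])
        s_bits.append(s)
        c_bits.append(c)
    return s_bits, c_bits


if __name__ == "__main__":
    S, C = progressive_sum(0b0111, 0b0011, 4)
    print("S0..S2 =", S[:3])
    print("C1..C3 =", C[1:4])
```

For X = 0111 and Y = 0011 this gives S0, S1, S2 = 0, 1, 0 and C1 = C2 = C3 = 1, the values listed in the example. The same S bits are reused by every later step, which is why only one new sum bit and one carry have to be produced per step.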
Schematic 4×4 multiplier and 16×16 multiplier
The logic explained in Example 1 can be shown through a schematic that uses 4:1
multiplexers and AND gates, as shown in Figure 2. The multiplexers are used to
choose the $Z_j$ for the $Z_j 2^j$ terms (refer to equation 5), while AND gates
are used to produce the $x_j y_j 2^{2j}$ terms. The complete logic structure to
accumulate the partial product terms utilizes Cell-I and Cell-II, which are
shown in Figure 3 [2]. A similar technique can be used in the design of the
16×16 multiplier.
[Figure 2 schematic: four AND2 gates produce the terms X0Y0·2^0, X1Y1·2^2,
X2Y2·2^4 and X3Y3·2^6; nine 4:1 MUXes, with data inputs (0, Xi, Yi, Si) or
(0, 0, 0, Cj) and select lines XjYj, produce the terms Z1·2^1, Z2·2^2 and
Z3·2^3.]
Figure 2: Logic for MUX based multiplier implementation.
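The network of Figure 2 can be sketched at bit level in Python as follows. The names `mux4`, `and2`, `mux_network_product` and `mux_count` are hypothetical, and the sum bits S and carries C are obtained here from ordinary integer addition as a behavioral stand-in for the adder cells.

```python
def mux4(s1: int, s0: int, i1: int, i2: int, i3: int, i4: int) -> int:
    """4:1 multiplexer: select lines 00/01/10/11 pick I1/I2/I3/I4."""
    return (i1, i2, i3, i4)[(s1 << 1) | s0]


def and2(a: int, b: int) -> int:
    """2-input AND gate."""
    return a & b


def mux_count(n: int) -> int:
    """Step j uses j sum-bit MUXes plus one carry MUX."""
    return sum(j + 1 for j in range(1, n))


def mux_network_product(x: int, y: int, n: int = 4) -> int:
    """Accumulate the AND terms and MUX outputs for unsigned n-bit x, y."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    product = 0
    for j in range(n):                       # AND terms x_j * y_j * 2^(2j)
        product += and2(xb[j], yb[j]) << (2 * j)
    for j in range(1, n):
        mask = (1 << j) - 1
        total = (x & mask) + (y & mask)      # X_{j-1} + Y_{j-1} (behavioral)
        for i in range(j):                   # MUX inputs (0, X_i, Y_i, S_i)
            s_i = (total >> i) & 1
            product += mux4(xb[j], yb[j], 0, xb[i], yb[i], s_i) << (i + j)
        c_j = (total >> j) & 1               # MUX inputs (0, 0, 0, C_j)
        product += mux4(xb[j], yb[j], 0, 0, 0, c_j) << (2 * j)
    return product


if __name__ == "__main__":
    print(mux_network_product(0b0111, 0b0011), mux_count(4))
```

For n = 4 the sketch uses nine multiplexers and four AND gates. The final accumulation is done here with plain integer addition, whereas the hardware uses Cell-I and Cell-II for that step.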
[Figure 3 schematic: Cell-I combines a 4:1 MUX and a full adder (FA), with
inputs Xj, Yj, Xi, Yi, Si, Sin, Cin and outputs Sout, Cout; Cell-II combines
two AND2 gates and two full adders, with inputs Xj, Yj, Cj, Sin, Cin and
outputs Sout, Cout, Cj+1. Both cells carry the annotations Xi = Xj, Yi = Yj,
Si = Sj.]
Figure 3: Cell-I and Cell-II used in MUX-based multiplier implementation.
Figure 4: Photomicrograph of a 16×16 MUX based multiplier.
Physical implementation and Result
The layout for a 16×16 MUX based unsigned multiplier circuit, shown in
Figure 4, is implemented in the 0.6 µm N-well CMOS process (SCN_SUBM,
lambda = 0.3) of MOSIS, using conventional CMOS logic. A schematic library
consisting of 7 functional cells is defined for the static CMOS design style,
comprising the 1-bit full adder, 2-input AND, 3-input AND, 2-input MUX, 2-input
XOR, 2-input OR and 3-input OR functions. Corresponding to the schematic
library, physical libraries were designed in the conventional CMOS logic design
style using the design principles of [7], [8], [9], [10]. Three different
versions of each physical library were developed by sizing the W/L ratio of the
NMOS transistor to values of 3, 5 and 7 respectively (W/L values smaller than 3
were also experimented with but not considered further, as they resulted in
parasitic dominated slower speeds due to weak transistor drive and were not
considered good candidates for high performance). The layout assemblies for
the 16-bit multiplier were carried out using these cell libraries and the
automatic place and route tool L-Edit (SPR) from M/s Tanner Research Inc. It
was noticed that the physical library utilizing a W/L ratio of 3 for the NMOS
transistor gave the smallest average switching energy-delay product.
The generated layouts were simulated after parasitic extraction using the
circuit simulator ELDO SPICE. The supply voltage VDD is kept at 3.3 V. Table 1
compares important parameters, namely propagation delay and power dissipation
at a 20 MHz data rate, with the tree based implementation of [3]. Table 2 shows
the maximum power, leakage power, transistor count, core area, total routing
length and number of vias.
Table 1

| Algorithm (technology) | VDD (V) | Propagation delay (ns) | Average power (mW) |
|---|---|---|---|
| Proposed (0.6 µm) | 3.3 | 14.15 | 22.05 |
| BEWM, ref [3] (1.25 µm) | 5 | 60 | 100 |
Table 2

| Algorithm (technology) | Maximum power (mW) | Leakage power (nW) | Transistor count | Core area (mm²) | Total routing length (mm) | Number of vias |
|---|---|---|---|---|---|---|
| Proposed (0.6 µm) | 623.46 | 53.34 | 10168 | 23.76 | 1386.71 | 3452 |
Comparing the two multiplier architectures shows that the proposed MUX based
array multiplier reduces the propagation delay to about 0.236 times (14.15 ns
versus 60 ns) and the average power consumption to about 0.22 times (22.05 mW
versus 100 mW) that of the tree based implementation. The maximum instantaneous
power, leakage power, transistor count, core area, total routing length and
number of vias are also shown for judging the VLSI implementation
characteristics.
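The comparison factors can be recomputed from the Table 1 entries with a few lines of Python. The helper names are hypothetical, and the power-delay product is a derived figure that the paper does not report. The two designs also use different technologies and supply voltages, so the ratios are not a like-for-like architectural comparison.

```python
def ratio(proposed: float, reference: float) -> float:
    """Fraction of the reference value that the proposed design needs."""
    return proposed / reference


def power_delay_product_pj(power_mw: float, delay_ns: float) -> float:
    """Power-delay product in picojoules (1 mW * 1 ns = 1 pJ)."""
    return power_mw * delay_ns


# Values copied from Table 1.
PROPOSED = {"delay_ns": 14.15, "power_mw": 22.05}
BEWM_REF3 = {"delay_ns": 60.0, "power_mw": 100.0}

delay_ratio = ratio(PROPOSED["delay_ns"], BEWM_REF3["delay_ns"])
power_ratio = ratio(PROPOSED["power_mw"], BEWM_REF3["power_mw"])

if __name__ == "__main__":
    print(f"delay ratio: {delay_ratio:.3f}")
    print(f"power ratio: {power_ratio:.3f}")
    print(f"proposed PDP: {power_delay_product_pj(**PROPOSED):.1f} pJ")
    print(f"reference PDP: {power_delay_product_pj(**BEWM_REF3):.1f} pJ")
```

Both ratios come out close to one quarter, which is the figure quoted in the conclusion.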
Conclusion
This paper presents a 16-bit MUX based unsigned multiplier implementation using
an optimized static CMOS logic style. The multiplier algorithm performs
efficient partial product generation and addition, which makes its time and
space complexity better than that of other array multipliers. The simulation
results, compared with a faster tree multiplier implementation, show a
reduction in propagation delay and in average switching power to approximately
1/4 in each case.
References
[1] A. Hesham, "Technology scaling effects on multipliers," IEEE Transactions
on Computers, Vol. 47, No. 11, pp. 1201-1215, November 1998.
[2] Z. Kiamal, "Multiplexer-based array multipliers," IEEE Transactions on
Computers, Vol. 48, No. 1, pp. 15-23, January 1999.
[3] F. Jalil, "M × N Booth encoded multiplier generator using optimized Wallace
trees," IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
Vol. 1, No. 2, pp. 120-125, June 1993.
[4] V. Chanramouli, "Self-timed design in GaAs: case study on a high-speed,
parallel multiplier," IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, Vol. 4, No. 1, pp. 146-149, March 1996.
[5] P. Kornerup, "A systolic, linear-array multiplier for a class of
right-shift algorithms," IEEE Transactions on Computers, Vol. 43, No. 8,
pp. 892-898, August 1994.
[6] Reto Zimmermann and Wolfgang Fichtner, "Low-Power Logic Styles: CMOS Versus
Pass-Transistor Logic," IEEE Journal of Solid-State Circuits, Vol. 32, No. 7,
pp. 1079-1090, July 1997.
[7] Mohab Anis, Mohamed Allam and Mohamed Elmasry, "Impact of Technology
Scaling on CMOS Logic Styles," IEEE Transactions on Circuits and Systems-II:
Analog and Digital Signal Processing, Vol. 49, No. 8, pp. 577-587, August 2002.
[8] S. M. Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis
and Design, Third Edition, McGraw-Hill, 2003.
[9] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design,
Addison-Wesley, 1994.
[10] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, Digital
Integrated Circuits, Second Edition, Prentice-Hall of India Private Limited,
2004.