A 16×16 MUX Based Multiplier Design

    International Journal of Electronic Engineering Research

    Volume 1 Number 1 (2009) pp. 53–61

    Research India Publications

    http://www.ripublication.com/ijeer.htm

    A 16×16 MUX Based Multiplier Design Using Optimized Static CMOS Logic Style

    Abhijit Asati* and Chandrashekhar**

    * Lecturer, Electrical & Electronics Engineering Group, BITS, Pilani, India

    ** Director, Central Electronics Engineering Research Institute, Pilani, India

    Abstract

    Their simpler VLSI implementation makes array multipliers preferable for smaller operand
    sizes, in spite of their linear time complexity. In general, array multipliers have poor
    space complexity, O(n²): they require approximately n² cells to produce a product, so as
    the operand size grows the circuit takes large area and power. In this paper we present a
    MUX based 16×16 unsigned multiplier circuit, which utilizes an efficient partial product
    generation and partial product addition technique. The time and space complexity of such a
    multiplier is much better than that of simpler array multiplier techniques. The multiplier
    has been designed using optimized static CMOS logic cells to provide the best area, power
    and delay performance. The multiplier circuit is implemented using conventional CMOS logic
    in the 0.6 µm, N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS, and simulated after
    parasitic extraction. The simulation results show a large reduction in propagation delay
    and average power compared to the tree multiplier implementation of [3].

    Keywords: MUX based, array, Wallace tree, Booth encoding, partial product,
    complexity, operand size.

    Introduction

    In Digital Signal Processor implementations, such as standard Digital Signal Processors
    and ASIC Digital Signal Processors, the multiplier is used as a fundamental building
    block. The performance of signal processing algorithms such as frequency domain filtering
    (FIR and IIR), frequency-time transformations (FFT), correlation, etc. depends on the
    performance of the multiplier implementation. In most real-time DSP processing tasks, the
    multiplier block must operate at high speed while consuming little layout area and low
    power. The multiplication algorithms differ in the means of partial product generation and
    partial product addition [1]. Array multipliers have linear time complexity, i.e. O(n), so
    their delay degrades for larger operand sizes. Array multipliers also have poor space
    complexity, O(n²): they require approximately n² cells to produce a product, so as the
    operand size grows the circuit takes large area and power [2], [4], [5]. The number of
    partial product rows can be reduced by a factor of n using radix-m Booth encoding (where
    m = 2^n). With Booth radix-4 (m = 4 = 2²) encoding the number of partial product rows is
    halved [3]; the number of logic cells required to generate the partial products is
    therefore reduced to n²/2 [2]. Further, Wallace tree accumulation reduces the ripple
    effect and so produces the product in far less time: the time complexity is reduced to
    O(log n), but it requires larger gate and routing area than a regular array and is hence
    unsuitable for VLSI implementation [2]. The hardware reduction of the Booth encoding
    scheme can be combined with accelerated Wallace tree accumulation of the partial products
    to obtain the reduced time complexity of O(log n); this combination is well suited to
    multipliers with large operand sizes [2], [3]. As discussed earlier, for smaller operand
    sizes the tree based architectures may have smaller gate delay but consume more silicon
    area due to increased routing and encoding overheads; array multipliers, on the other
    hand, have larger gate delay but need less routing length. MUX based array multipliers
    give a faster and more compact implementation due to efficient partial product generation
    and efficient partial product addition. In this paper we present an implementation of a
    16×16 multiplier design using the MUX based array technique and static CMOS logic cells.
    These static CMOS logic cells provide the best area, power and delay performance, as
    described in [6]. The VLSI implementation of the multiplier circuit is done in the 0.6 µm,
    N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS, using conventional CMOS logic.
    Simulation results are compared with a faster Booth encoded Wallace tree multiplier
    implementation, as in [3]. Section II discusses the conventional static CMOS logic design
    style, Section III explains the MUX based multiplier algorithm, Section IV illustrates the
    multiplication logic, and Section V describes the schematics of the 4×4 and 16×16
    multipliers. Physical implementation and results are described in Section VI. Section VII
    concludes the paper.
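
    The halving of partial product rows by Booth radix-4 encoding mentioned above can be
    illustrated with a minimal Python sketch. It assumes a signed two's-complement multiplier
    with an even bit width (the design in this paper is unsigned), and the function names
    booth_radix4_digits and booth_multiply are hypothetical names chosen for illustration.

```python
def booth_radix4_digits(y_pattern: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement bit pattern into n/2 digits in {-2,-1,0,1,2}."""
    if n % 2:
        raise ValueError("this sketch assumes an even bit width n")
    bits = [(y_pattern >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        # digit = y[2k-1] + y[2k] - 2*y[2k+1]
        digits.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the signed n-bit value y using n/2 partial product rows."""
    digits = booth_radix4_digits(y & ((1 << n) - 1), n)
    # Each digit contributes one row, weighted by 4**k.
    return sum(d * x * 4**k for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(0b0011, 4))  # two rows instead of four
    print(booth_multiply(7, 3, 4))
```

    Each recoded digit selects one of 0, ±x or ±2x, so an n-bit multiplier yields n/2 rows.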

    Conventional static CMOS Logic Design style

    A static logic gate generates its output corresponding to the applied input voltages
    after a certain time delay, and it preserves its output level (or state) as long as the
    power supply is provided. In steady state each gate output is connected to either Vdd or
    Gnd through a low-resistance path, so for a static input the output levels are preserved,
    whereas the operation of dynamic logic circuits relies on temporary storage of signal
    values on the capacitance of dynamic circuit nodes. The conventional static logic style
    offers a versatile implementation of logic functions based on the static or steady state
    behavior of simple CMOS structures. It is most suitable and widely accepted for many VLSI
    circuit implementations due to important properties such as high speed, low power, large
    noise margins, no logic degradation and validity of the logic design style at scaled down
    technologies. A logic gate with a fan-in of n requires 2n (n N-type + n P-type) devices.
    Two logic blocks, the N-block and the P-block, form a CMOS gate. The topology of the
    N-block is the dual of that of the P-block. Since both blocks have an equal number of
    transistors, the transistor count may increase. The channel widths of series connected
    n-channel MOS transistors (NMOS) or p-channel MOS transistors (PMOS) have to be increased
    to obtain a reasonable conducting current to drive capacitive loads. The increase in PMOS
    size results in a significant area overhead and also an increased gate input capacitance,
    which may lead to high dynamic power dissipation. The higher gate input capacitance loads
    the previous stage and thereby increases the delay. The ratio of PMOS to NMOS transistor
    widths should be chosen optimally to achieve good noise margins, higher speed and lower
    power consumption, as described in [7], [9]. The short-circuit currents of a static CMOS
    gate can be minimized by appropriately sizing the transistors for equal rise and fall
    times. The schematics of the 1-bit full adder, 2-input AND, 3-input AND and 2-input MUX
    functions implemented using the conventional static CMOS logic design style are shown in
    Figure 1. The full adder cell, designed using the principle of symmetry, has 28
    transistors, as described in [6], [8]. The 28-transistor full adder performs considerably
    better than the 40-transistor version [6]. A 32-bit adder designed using complementary
    CMOS has a power-delay product of less than half that of the CPL version [6]. The 2-input
    AND cell, 3-input AND cell, 2-input MUX and other cells also provide a better power-delay
    product.
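
    The duality of the N-block and P-block, and the 2n device count for a fan-in of n, can be
    sketched with a switch-level model. The following is a minimal Python sketch, assuming a
    2-input NAND gate as the example (the paper's own cells are the full adder, AND and MUX);
    the function names are hypothetical.

```python
from itertools import product


def nand2_networks(a: int, b: int) -> tuple[bool, bool]:
    """Return (pull_up_conducts, pull_down_conducts) for a static CMOS NAND2."""
    pull_down = bool(a and b)        # N-block: two NMOS in series to Gnd
    pull_up = (not a) or (not b)     # P-block: two PMOS in parallel to Vdd (the dual)
    return bool(pull_up), pull_down


def nand2(a: int, b: int) -> int:
    """Output level of the gate; exactly one network conducts in steady state."""
    pull_up, pull_down = nand2_networks(a, b)
    assert pull_up != pull_down, "output must see exactly one low-resistance path"
    return 1 if pull_up else 0


def transistor_count(fan_in: int) -> int:
    """A static CMOS gate with fan-in n uses n NMOS plus n PMOS devices."""
    return 2 * fan_in


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, nand2(a, b))
    print(transistor_count(2))
```

    Because one of the two networks always conducts, the output is held at Vdd or Gnd for any
    static input, which is the property the section describes.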

    Figure 1: Schematics using the conventional static CMOS logic design style of (a) complex
    full adder cell using the principle of symmetry, (b) 2-input AND gate, (c) 3-input AND
    gate, (d) 2-input MUX.

    MUX based multiplier algorithm

    This is an unsigned multiplier algorithm in which one bit of the multiplier and one bit of
    the multiplicand are processed in parallel. The algorithm is symmetric, i.e., the
    multiplier and multiplicand can be interchanged. In this algorithm the sum of the two
    operands, progressively computed, is a useful quantity that is used in the computation of
    certain partial products. The different quantities are computed one bit at each step of
    the algorithm, and the appropriate quantity is then selected in the next step if required.
    The parallel implementation of this algorithm yields an iterative array. Compared to an
    implementation based on the modified Booth's algorithm, it consumes the same amount of
    circuitry but yields faster multiplication. This multiplexer-based architecture computes
    the partial sums of the two operands in parallel, which simplifies tasks such as
    compression and accumulation. It also performs favorably in processing speed compared to
    other regular array architectures. The multiplication logic can be explained using
    equations 1 to 5.

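    Let

    \[ X = x_{n-1} x_{n-2} \cdots x_0, \qquad Y = y_{n-1} y_{n-2} \cdots y_0, \qquad P = XY \tag{1} \]

    X_j and Y_j are the binary numbers obtained by truncating X and Y at the (j+1)th bit,
    i.e. the j least significant bits of X and Y respectively:

    \[ X_j = x_{j-1} x_{j-2} \cdots x_0, \qquad Y_j = y_{j-1} y_{j-2} \cdots y_0, \qquad 0 < j \le n-1 \tag{2} \]

    With P_j = X_j Y_j and P_1 = x_0 y_0,

    \[ P_{j+1} = X_{j+1} Y_{j+1} = (x_j 2^j + X_j)(y_j 2^j + Y_j) = x_j y_j 2^{2j} + Z_j 2^j + P_j \tag{3} \]

    where

    \[ Z_j = x_j Y_j + y_j X_j =
       \begin{cases}
         0         & \text{if } x_j = 0,\ y_j = 0 \\
         X_j       & \text{if } x_j = 0,\ y_j = 1 \\
         Y_j       & \text{if } x_j = 1,\ y_j = 0 \\
         X_j + Y_j & \text{if } x_j = 1,\ y_j = 1
       \end{cases} \tag{4} \]

    Applying (3) repeatedly from P_1 up to P_n = P gives

    \[ P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^j \tag{5} \]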
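
    The four-way choice of Z_j (0, X_j, Y_j or X_j + Y_j) and the accumulation of the
    x_j·y_j·2^(2j) and Z_j·2^j terms can be checked with a behavioral Python sketch. This is
    only a functional model under the definitions above, not the gate-level array; the names
    mux4 and mux_multiply are hypothetical.

```python
def mux4(sel_x: int, sel_y: int, i1: int, i2: int, i3: int, i4: int) -> int:
    """4:1 MUX: select lines 00/01/10/11 pick I1/I2/I3/I4."""
    return (i1, i2, i3, i4)[(sel_x << 1) | sel_y]


def mux_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit product built from x_j*y_j*2^(2j) and Z_j*2^j terms."""
    product = 0
    for j in range(n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        mask = (1 << j) - 1
        x_trunc = x & mask          # X_j: the j least significant bits of X
        y_trunc = y & mask          # Y_j: the j least significant bits of Y
        # Z_j is chosen by the select lines (x_j, y_j) among 0, X_j, Y_j, X_j + Y_j.
        z_j = mux4(xj, yj, 0, x_trunc, y_trunc, x_trunc + y_trunc)
        product += (xj & yj) << (2 * j)   # AND-gate term
        product += z_j << j               # MUX-selected term
    return product


if __name__ == "__main__":
    print(mux_multiply(0b0111, 0b0011, 4))
```

    For X = 0111 and Y = 0011 the terms are 1 + 4 (AND terms) and 4 + 12 (Z_1·2 and Z_2·4),
    which sum to 21.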

    Illustration of the Multiplication Logic

    Example 1 shows the multiplication process for two binary 4-bit numbers using the
    MUX-based approach. The multiplication process shows that the number of rows remains the
    same, but the number of partial product bits to be compressed in a particular column is
    now restricted to only 3 bits; this makes compression much faster and easier. If the carry
    bits C1, C2, C3 shown in Example 1 are handled separately, then only 2 bits need to be
    added in any particular column. Two columns can then be added simultaneously using a 2-bit
    CLA, which also accepts the carry input C1, C2 or C3 of the particular column (this is
    possible because these carries occur in alternate columns). Thus the first step of the
    algorithm is the generation of the partial product rows and the second step performs the
    addition of these partial products together with compression. Compared to other regular
    array multipliers it is therefore faster. It produces its output in time
    T = (n+1)·FA_2CLA, where FA_2CLA is the delay of a 2-bit CLA adder, with a timing overhead
    of one 4:1 MUX delay, while a regular array multiplier takes an approximate delay of
    T = (2n)·FA. The large area overhead is due to the routing needed between these MUXes.
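
    As a worked instance of the two delay expressions for the 16-bit case, the following
    LaTeX block substitutes n = 16. The symbols t_CLA2, t_MUX4 and t_FA are notation
    introduced here for the 2-bit CLA delay, the 4:1 MUX delay and the full adder delay;
    reading "FA" as the delay of one full adder is an assumption.

```latex
\[ T_{\text{MUX}} \approx (n+1)\, t_{\text{CLA2}} + t_{\text{MUX4}}, \qquad
   T_{\text{array}} \approx 2n\, t_{\text{FA}} \]
\[ n = 16: \qquad T_{\text{MUX}} \approx 17\, t_{\text{CLA2}} + t_{\text{MUX4}}, \qquad
   T_{\text{array}} \approx 32\, t_{\text{FA}} \]
```

    Under these expressions the MUX based array is the faster of the two whenever
    17·t_CLA2 + t_MUX4 is smaller than 32·t_FA.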

    Example 1: The terms X0Y0, X1Y1, X2Y2 and X3Y3, at the positions shown below, have to be
    added to the appropriate terms selected by the 4:1 MUXes, based on the select lines shown
    in the first column.

    Let X = X3X2X1X0 = 0111 = (+7)10 and Y = Y3Y2Y1Y0 = 0011 = (+3)10.

    In each MUX entry, the part before the equals sign gives the operation to be performed by
    the algorithm and the part after it shows the application of the algorithm to the selected
    inputs X and Y.

    Working of MUX: select lines 00/01/10/11 correspond to I1/I2/I3/I4.

    Select line   P7   P6   P5   P4   P3   P2   P1   P0
    XjYj terms         0         0         1         1
    X1Y1 = 11                              1    0
    X2Y2 = 10                    0    1    1
    X3Y3 = 00          0    0    0    0
    Product       0    0    0    1    0    1    0    1    = (21)10

    The XjYj terms row holds X3Y3, X2Y2, X1Y1 and X0Y0 in columns P6, P4, P2 and P0.

    MUX inputs (I1/I2/I3/I4) behind each selected bit:
    X1Y1 = 11:  P2: 0/0/0/C1 = 0/0/0/1;  P1: 0/X0/Y0/S0 = 0/1/1/0
    X2Y2 = 10:  P4: 0/0/0/C2 = 0/0/0/1;  P3: 0/X1/Y1/S1 = 0/1/1/1;  P2: 0/X0/Y0/S0 = 0/1/1/0
    X3Y3 = 00:  P6: 0/0/0/C3 = 0/0/0/1;  P5: 0/X2/Y2/S2 = 0/1/0/0;  P4: 0/X1/Y1/S1 = 0/1/1/1;  P3: 0/X0/Y0/S0 = 0/1/1/0
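
    The column occupancy claimed for the 4-bit example (at most 3 bits per column, or 2 bits
    once the carries C1, C2, C3 are set aside) can be tallied with a short Python sketch. It
    assumes the bit placement of Example 1: the AND term XjYj sits in column 2j, the sum bits
    of Zj occupy columns j to 2j-1, and the carry Cj sits in column 2j. The function name
    column_heights is hypothetical, and the tally makes no claim for other operand sizes.

```python
def column_heights(n: int, include_carries: bool = True) -> list[int]:
    """Number of partial product bit positions per column, index 0 = P0."""
    heights = [0] * (2 * n)
    for j in range(n):
        heights[2 * j] += 1              # AND term x_j*y_j at weight 2^(2j)
        for k in range(j):               # sum bits S0..S(j-1) of Z_j, columns j..2j-1
            heights[j + k] += 1
        if j >= 1 and include_carries:
            heights[2 * j] += 1          # carry C_j of Z_j at column 2j
    return heights


if __name__ == "__main__":
    print(column_heights(4))                          # heights for P0..P7
    print(column_heights(4, include_carries=False))   # carries handled separately
```

    For n = 4 the carries land only in the even columns P2, P4 and P6, which is why they can
    be fed to the carry inputs of 2-bit adders covering alternate column pairs.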

    Schematic of 4×4 multiplier and 16×16 multiplier

    The logic explained in Example 1 can be shown through a schematic, which uses 4:1
    multiplexers and AND gates, as shown in Figure 2. The multiplexers are used to choose the
    Z_j for the Z_j·2^j terms (refer to equation 5), while the AND gates are used to produce
    the x_j·y_j·2^(2j) terms. The complete logic structure to accumulate the partial product
    terms utilizes Cell-I and Cell-II, which are shown in Figure 3 [2]. A similar technique
    can be used in the design of the 16×16 multiplier.
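
    A behavioral reading of Cell-I (a 4:1 MUX feeding a full adder) can be sketched in Python.
    The wiring assumed here, with the MUX output added to Sin and Cin, is inferred from the
    labels of Figure 3 and is an assumption rather than the authors' netlist; the names
    full_adder and cell_i are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def cell_i(xj: int, yj: int, xi: int, yi: int, si: int,
           sin: int, cin: int) -> tuple[int, int]:
    """Assumed Cell-I behavior: MUX picks 0/Xi/Yi/Si, FA adds it to Sin and Cin."""
    # Select lines (xj, yj) = 00/01/10/11 choose I1/I2/I3/I4 = 0/Xi/Yi/Si.
    selected = (0, xi, yi, si)[(xj << 1) | yj]
    return full_adder(selected, sin, cin)   # (Sout, Cout)


if __name__ == "__main__":
    # Select lines 01 pick Xi = 1; adding Sin = 1 and Cin = 1 gives Sout = 1, Cout = 1.
    print(cell_i(0, 1, 1, 0, 0, 1, 1))
```

    The 4:1 selection matches the "Working of MUX" rule of Example 1, so one such cell handles
    one bit position of one Z_j row.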

    Figure 2: Logic for MUX based multiplier implementation. The schematic shows 4:1 MUXes,
    with select lines XjYj and inputs 0, Xi, Yi, Si or 0, 0, 0, Cj, producing the Z1·2^1,
    Z2·2^2 and Z3·2^3 terms, and AND2 gates producing the 2^0·X0Y0, 2^2·X1Y1, 2^4·X2Y2 and
    2^6·X3Y3 terms.

    Figure 3: Cell-I and Cell-II used in MUX-based multiplier implementation. Cell-I contains
    a 4:1 MUX and a full adder (FA), with signals Xj, Yj, Xi, Yi, Si, Cin, Sin, Cout and Sout.
    Cell-II contains AND2 gates and full adders, with the additional carry signals Cj and
    Cj+1. The labels Xi = Xj, Yi = Yj and Si = Sj appear on both cells.

    Figure 4: Photomicrograph of a 16×16 MUX based multiplier.

    Physical implementation and Result

    The layout of the 16×16 MUX based unsigned multiplier circuit shown in Figure 4 is
    implemented in the 0.6 µm, N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS, using
    conventional CMOS logic. A schematic library consisting of 7 functional cells is defined
    for the static CMOS design style, comprising the 1-bit full adder, 2-input AND, 3-input
    AND, 2-input MUX, 2-input XOR, 2-input OR and 3-input OR functions. Corresponding to the
    schematic library, physical libraries were designed in the conventional CMOS logic design
    style using the design principles of [7], [8], [9], [10]. Three different versions of each
    physical library were developed by sizing the W/L ratio of the NMOS transistors to values
    of 3, 5 and 7 respectively (W/L values smaller than 3 were also experimented with but not
    considered further, as they resulted in parasitic-dominated slower speeds due to the weak
    drive of the transistors and were not good candidates for high performance). The layout
    assemblies for the 16-bit multiplier were carried out using these cell libraries and the
    automatic place and route tool L-Edit (SPR) from M/s Tanner Research Inc. It was noticed
    that the physical library utilizing a W/L ratio of 3 for the NMOS transistors gave the
    smallest average switching energy-delay product.

    The generated layouts were simulated after parasitic extraction using the circuit
    simulator ELDO SPICE. The supply voltage VDD is kept at 3.3 V. Table 1 compares important
    parameters, namely propagation delay and power dissipation at a 20 MHz data rate, with the
    tree based implementation of [3]. Table 2 shows the maximum power, leakage power,
    transistor count, core area, total routing length and number of vias.

    Table 1

    Algorithm (technology)      VDD (V)   Propagation delay (ns)   Average power (mW)
    Proposed (0.6 µm)           3.3       14.15                    22.05
    BEWM, ref. [3] (1.25 µm)    5         60                       100

    Table 2

    Algorithm (technology)   Maximum power (mW)   Leakage power (nW)   Transistor count   Core area (mm²)   Total routing length (mm)   Number of vias
    Proposed (0.6 µm)        623.46               53.34                10168              23.76             1386.71                     3452

    Comparing the two multiplier architectures shows that the proposed MUX based array
    multiplier reduces the propagation delay to about 0.236 times (14.15 ns against 60 ns)
    and the average power consumption to about 0.22 times (22.05 mW against 100 mW) that of
    the tree based implementation. The maximum instantaneous power, leakage power, transistor
    count, core area, total routing length and number of vias are also given for judging the
    VLSI implementation characteristics.
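
    The ratios quoted above follow directly from the Table 1 entries, and can be recomputed
    with a few lines of Python. The power-delay product is an extra derived figure added here
    for illustration; it is not reported in the paper. Note that the two rows of Table 1 use
    different technologies and supply voltages (0.6 µm at 3.3 V against 1.25 µm at 5 V).

```python
# Values copied from Table 1.
proposed = {"delay_ns": 14.15, "power_mw": 22.05}
bewm = {"delay_ns": 60.0, "power_mw": 100.0}


def pdp_pj(entry: dict) -> float:
    """Power-delay product in picojoules (ns x mW = pJ)."""
    return entry["delay_ns"] * entry["power_mw"]


delay_ratio = proposed["delay_ns"] / bewm["delay_ns"]
power_ratio = proposed["power_mw"] / bewm["power_mw"]

if __name__ == "__main__":
    print(f"delay ratio: {delay_ratio:.4f}")
    print(f"power ratio: {power_ratio:.4f}")
    print(f"PDP proposed: {pdp_pj(proposed):.1f} pJ, BEWM: {pdp_pj(bewm):.1f} pJ")
```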

    Conclusion

    This paper presents a 16-bit MUX based unsigned multiplier implementation using an
    optimized static CMOS logic style. The multiplier algorithm performs efficient partial
    product generation and addition, which makes its time and space complexity better than
    that of other array multipliers. Compared with a faster tree multiplier implementation,
    the simulation results show a reduction in propagation delay and in average switching
    power to approximately 1/4 in each case.

    References

    [1] A. Hesham, "Technology scaling effects on multipliers", IEEE Transactions on
        Computers, Vol. 47, No. 11, pp. 1201-1215, November 1998.
    [2] Z. Kiamal, "Multiplexer-based array multipliers", IEEE Transactions on Computers,
        Vol. 48, No. 1, pp. 15-23, January 1999.
    [3] F. Jalil, "M×N Booth encoded multiplier generator using optimized Wallace trees",
        IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 1, No. 2,
        pp. 120-125, June 1993.
    [4] V. Chanramouli, "Self-timed design in GaAs-case study on a high-speed, parallel
        multiplier", IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
        Vol. 4, No. 1, pp. 146-149, March 1996.
    [5] P. Kornerup, "A systolic, linear-array multiplier for a class of right-shift
        algorithms", IEEE Transactions on Computers, Vol. 43, No. 8, pp. 892-898,
        August 1994.
    [6] Reto Zimmermann and Wolfgang Fichtner, "Low-Power Logic Styles: CMOS Versus
        Pass-Transistor Logic", IEEE Journal of Solid-State Circuits, Vol. 32, No. 7,
        pp. 1079-1090, July 1997.
    [7] Mohab Anis, Mohamed Allam and Mohamed Elmasry, "Impact of Technology Scaling on CMOS
        Logic Styles", IEEE Transactions on Circuits and Systems-II: Analog and Digital
        Signal Processing, Vol. 49, No. 8, pp. 577-587, August 2002.
    [8] S. M. Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis and
        Design, Third Edition, McGraw-Hill, 2003.
    [9] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley, 1994.
    [10] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, Digital Integrated
        Circuits, Second Edition, Prentice-Hall of India Private Limited, 2004.