Muk 2 by 2

  • 8/10/2019 Muk 2 by 2

    CHAPTER 3

    PROPOSED SYSTEM

    3.1 2X2 VEDIC MULTIPLIER

    The method is explained below for two 2-bit numbers A and B, where A = a1a0 and B = b1b0, as shown in figure 3.1. Firstly, the least significant bits (LSB) are multiplied, which gives the LSB of the final product. Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand. The sum gives the second bit of the final product, and the carry is added to the partial product obtained by multiplying the most significant bits, to give a sum and a carry. The sum is the third bit and the carry becomes the fourth bit of the final product.

    s0 = a0b0; (1)

    c1s1 = a1b0 + a0b1; (2)

    c2s2 = c1 + a1b1; (3)

    The final result will be c2s2s1s0. This multiplication method is applicable for all cases.

    Figure 3.1 The Vedic Multiplication Method for two 2-bit binary numbers
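    Equations (1) to (3) can be checked numerically. The following is a minimal Python sketch, not part of the original design; the function name vedic_2x2 and the intermediate names cross and top are chosen here for illustration only.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit numbers using equations (1) to (3)."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1

    s0 = a0 & b0                      # (1) vertical product of the LSBs
    cross = (a1 & b0) + (a0 & b1)     # (2) crosswise sum, value 0..2
    c1, s1 = cross >> 1, cross & 1
    top = c1 + (a1 & b1)              # (3) carry plus vertical product of the MSBs
    c2, s2 = top >> 1, top & 1

    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0   # c2 s2 s1 s0


# Exhaustive check over all 16 operand pairs.
assert all(vedic_2x2(a, b) == a * b for a in range(4) for b in range(4))
print(format(vedic_2x2(3, 3), "04b"))  # 3 x 3 = 9, prints 1001
```

    The case 3 x 3 is the only one in which both c1 and c2 are set, so it exercises every term of the three equations.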

    The 2X2 Vedic multiplier (VM) module is implemented using four two-input AND gates and two half-adders, as displayed in its block diagram in Figure 3.2. The same method can be extended to a larger number of input bits. The Vedic multiplier is based on the Urdhva Tiryakbhyam Sutra.

    Figure 3.2 Block Diagram of 2X2 Vedic Multiplier

    3.1.1 HARDWARE REALIZATION OF 2X2 MULTIPLIER BLOCK

    The hardware realization of the 2 X 2 multiplier block is illustrated in Figure 3.3. For the sake of simplicity, the usage of clock and registers is not shown; emphasis has been laid on understanding of the algorithm.

    Figure 3.3 Hardware realization of 2 X 2 block
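    The gate-level structure described above (four AND gates feeding two half-adders) can be modelled in a few lines of Python. This is only a behavioural sketch under the assumption that the half-adders are wired as in equations (2) and (3); the names half_adder and vedic_2x2_gates and the wire names x, y, z are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits: one XOR gate and one AND gate."""
    return x ^ y, x & y


def vedic_2x2_gates(a1, a0, b1, b0):
    """Return the product bits (p3, p2, p1, p0) of a1a0 x b1b0."""
    # Four two-input AND gates form the partial products.
    p0 = a0 & b0
    x = a1 & b0
    y = a0 & b1
    z = a1 & b1
    # First half-adder adds the two crosswise terms.
    p1, c1 = half_adder(x, y)
    # Second half-adder adds that carry to the product of the MSBs.
    p2, p3 = half_adder(c1, z)
    return p3, p2, p1, p0


print(vedic_2x2_gates(1, 1, 1, 1))  # 11 x 11 -> (1, 0, 0, 1)
```

    No clock or register appears in this model, in line with the simplification made in the text.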

    3.1.2 EXAMPLE OF 2X2 VEDIC MULTIPLICATION

    Examples of decimal and binary Vedic multiplication for the 2X2 case are shown below in Figures 3.4 and 3.5.

    Figure 3.4 Vedic multiplication of decimal numbers

    Figure 3.5 Vedic multiplication of binary numbers
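    For the decimal case, the same vertical and crosswise steps can be sketched in Python for two 2-digit numbers. The operands 23 and 14 are chosen here for illustration and are not taken from Figure 3.4; the function name vedic_decimal_2digit is hypothetical.

```python
def vedic_decimal_2digit(x, y):
    """Multiply two 2-digit decimal numbers vertically and crosswise."""
    x1, x0 = divmod(x, 10)
    y1, y0 = divmod(y, 10)

    right = x0 * y0                # vertical: units digits
    middle = x1 * y0 + x0 * y1     # crosswise
    left = x1 * y1                 # vertical: tens digits

    carry, d0 = divmod(right, 10)            # units digit of the result
    carry, d1 = divmod(middle + carry, 10)   # tens digit of the result
    high = left + carry                      # remaining leading digits
    return high * 100 + d1 * 10 + d0


print(vedic_decimal_2digit(23, 14))  # prints 322
```

    For 23 x 14 the three partial results are 12, 11 and 2; carrying from right to left gives the digits 2, 2 and 3, that is 322.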

    3.1.3 ALGORITHM

    Here X0 and Y0 are the least significant bits (LSB) and X1 and Y1 are the most significant bits (MSB).

      X1 X0

    * Y1 Y0

    F E D C

    STEP 1: CP1 = X0*Y0 = C1C0

    STEP 2: C = C0

    STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

    STEP 4: D = D0 + C1

    STEP 5: CP3 = X1*Y1 = E1E0

    STEP 6: E = E0 + D1 (plus any carry out of STEP 4)

    STEP 7: F = E1 (plus any carry out of STEP 6)

    Where CP = cross product, X = multiplicand and Y = multiplier.

    3.2 VEDIC MULTIPLIER FOR 4X4 BIT

    The block diagram of the 4X4 bit Vedic multiplier is shown in figure 3.6. To get the final product, four 2X2 bit Vedic multipliers and three 4-bit ripple carry (RC) adders are required. In this proposal, the first 4-bit RC adder is used to add the two 4-bit operands obtained from cross multiplication of the two middle 2X2 bit multiplier modules. The second 4-bit RC adder is used to add two 4-bit operands, i.e. the concatenated 4-bit operand (00 & the two most significant output bits of the right-hand-most 2X2 multiplier module) and one 4-bit operand we get as the output sum of [...] hand most of 2X2 multiplier module. Early literature speaks about Vedic multipliers based on array multiplier structures. Similarly, an 8X8 bit multiplier is constructed using four 4 bit multipliers, a 16X16 bit multiplier is constructed using four 8 bit multipliers, and so on.

    Figure 3.6 Block diagram of 4X4 Vedic Multiplier

    Here, instead of following serial addition, the addition tree has been modified to look like a Wallace tree, thus reducing the levels of addition to 2 instead of 3. The two lower bits of q0 pass directly to the output, while the upper bits of q0 are fed into the addition tree. The bits being fed to the addition tree are further illustrated by the diagram in figure 3.7.

    Figure 3.7 Addition of partial products in 4 x 4 block
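    The composition of a 4X4 multiplier from four 2X2 blocks can be sketched as follows. This is a behavioural Python model under stated assumptions: only q0 is named in the text, so q1, q2 and q3 are hypothetical names for the other three 2X2 outputs, mul2x2 is a stand-in for the 2X2 block, and the grouping into two addition levels is one plausible reading that may differ from the exact wiring in figure 3.7.

```python
def mul2x2(a, b):
    """Stand-in for the 2X2 Vedic block: 2-bit operands, 4-bit product."""
    assert 0 <= a < 4 and 0 <= b < 4
    return a * b


def vedic_4x4(a, b):
    """Multiply two 4-bit numbers using four 2X2 blocks."""
    a_hi, a_lo = a >> 2, a & 0b11
    b_hi, b_lo = b >> 2, b & 0b11

    q0 = mul2x2(a_lo, b_lo)
    q1 = mul2x2(a_hi, b_lo)
    q2 = mul2x2(a_lo, b_hi)
    q3 = mul2x2(a_hi, b_hi)

    low = q0 & 0b11                 # two lower bits of q0 go straight to the output
    # Addition tree on the remaining bits (weights relative to bit 2).
    cross = q1 + q2                 # first level of addition
    outer = (q3 << 2) | (q0 >> 2)   # concatenation only, no adder needed
    upper = cross + outer           # second level of addition
    return (upper << 2) | low


print(vedic_4x4(0b1111, 0b1111))  # prints 225
```

    Because the upper two bits of q0 and the shifted q3 do not overlap, they can be concatenated rather than added, which is why two levels of addition are sufficient in this sketch.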

    3.2.1 Algorithm for 4 x 4 bit Vedic multiplier using Urdhva Tiryakbhyam (Vertically and Crosswise) for two binary numbers

    CP = Cross Product (Vertically and Crosswise)

                X3 X2 X1 X0    Multiplicand
                Y3 Y2 Y1 Y0    Multiplier
    ------------------------------------------
     H  G  F  E  D  C  B  A
    ------------------------------------------
    P7 P6 P5 P4 P3 P2 P1 P0    Product
    ------------------------------------------

    PARALLEL COMPUTATION METHODOLOGY

    1. CP of (X0) and (Y0): X0 * Y0 = A

    2. CP of (X1 X0) and (Y1 Y0): X1 * Y0 + X0 * Y1 = B

    3. CP of (X2 X1 X0) and (Y2 Y1 Y0): X2 * Y0 + X0 * Y2 + X1 * Y1 = C

    4. CP of (X3 X2 X1 X0) and (Y3 Y2 Y1 Y0): X3 * Y0 + X0 * Y3 + X2 * Y1 + X1 * Y2 = D

    5. CP of (X3 X2 X1) and (Y3 Y2 Y1): X3 * Y1 + X1 * Y3 + X2 * Y2 = E

    6. CP of (X3 X2) and (Y3 Y2): X3 * Y2 + X2 * Y3 = F

    7. CP of (X3) and (Y3): X3 * Y3 = G
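    The seven cross products A to G and their reduction to the product bits P0 to P7 can be sketched in Python as follows. The function names cross_products and propagate are hypothetical; the carry handling between columns is the ordinary positional carry and is written out explicitly here because the listing above gives only the column sums.

```python
def cross_products(x, y, n=4):
    """Column sums of the vertical-and-crosswise scheme (A..G for n = 4)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    cols = []
    for k in range(2 * n - 1):
        # Column k collects every product X_i * Y_j with i + j == k.
        cols.append(sum(xb[i] & yb[k - i]
                        for i in range(n) if 0 <= k - i < n))
    return cols


def propagate(cols):
    """Turn column sums into product bits, carrying between columns."""
    bits, carry = [], 0
    for total in cols:
        total += carry
        bits.append(total & 1)   # result bit of this column
        carry = total >> 1       # the carry may be wider than one bit
    bits.append(carry)           # the final carry supplies the top bit
    return bits                  # least significant bit first


cols = cross_products(0b1111, 0b1111)
print(cols)                      # [1, 2, 3, 4, 3, 2, 1]
print(propagate(cols))           # [1, 0, 0, 0, 0, 1, 1, 1], i.e. 11100001 = 225
```

    All column sums depend only on the input bits, so they can be formed in parallel; only the carry propagation is sequential.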

    3.2.2 EXAMPLE OF 4X4 VEDIC MULTIPLICATION

    An example of 4X4 Vedic multiplication is shown below in figure 3.8.

    Figure 3.8 Line diagram for multiplication of two 4-bit numbers

    Firstly, the least significant bits are multiplied, which gives the least significant bit of the product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next stage, the sum obtained by the crosswise and vertical multiplication and addition of three bits of the two numbers from the least significant position. Next, all the four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage of multiplication and addition of three bits, excluding the LSB. The same operation continues until the multiplication of the two MSBs gives the MSB of the product. For example, if in some intermediate step we get 110, then 0 will act as the result bit (referred to as rn) and 11 as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number.

    Thus we get the following expressions:

    r0 = a0b0;

    c1r1 = a1b0 + a0b1;

    c2r2 = c1 + a2b0 + a1b1 + a0b2;

    c3r3 = c2 + a3b0 + a2b1 + a1b2 + a0b3;

    c4r4 = c3 + a3b1 + a2b2 + a1b3;

    c5r5 = c4 + a3b2 + a2b3;

    c6r6 = c5 + a3b3

    With c6r6r5r4r3r2r1r0 being the final product. Hence this is the general mathematical formula applicable to all cases of multiplication.

    3.2.3 HARDWARE ARCHITECTURE

    This hardware design is very similar to that of the famous array multiplier, where an array of adders is required to arrive at the final product. The hardware architecture of the 4x4 multiplier is shown in figure 3.9.

    Figure 3.9 Hardware Architecture

    3.3 8 X 8 MULTIPLIER

    The 8 X 8 multiplier is made by using four 4 X 4 multiplier blocks. Here, the multiplicands are of bit size n = 8, whereas the result is of 16 bit size. The input is broken into smaller chunks of size n/2 = 4 for both inputs, that is a and b, just as in the case of the 4 X 4 multiplier block. These newly formed chunks of 4 bits are given as input to the 4 X 4 multiplier block, where again the new chunks are broken into even smaller chunks of size n/4 = 2 and fed to the 2 X 2 multiply block. The block diagram of the 8X8 Vedic multiplier is shown in figure 3.10.

    Figure 3.10 Block diagram of 8X8 Vedic multiplier

    The results produced at the output of each 4 X 4 bit multiply block, which are of 8 bits, are sent for addition to an addition tree, as shown in figure 3.11 below. Here, one fact must be kept in mind: each 4 X 4 multiply block works as illustrated in figure 3.6. In the 8 X 8 multiply block, the lower 4 bits of q0 are passed directly to the output and the remaining bits are fed to the addition tree, as shown in figure 3.11.

    Figure 3.11 Addition of Partial Products in 8 X 8 block

    3.3.1 ALGORITHM FOR 8X8 BIT MULTIPLICATION

    Here X0 and Y0 are the least significant 4-bit halves (LSB) and X1 and Y1 are the most significant 4-bit halves (MSB) of the operands. The LSB bits are A0,A1,A2,A3 and B0,B1,B2,B3, and the MSB bits are A7,A6,A5,A4 and B7,B6,B5,B4.

    A = A7A6A5A4 A3A2A1A0
           X1       X0

    B = B7B6B5B4 B3B2B1B0
           Y1       Y0

      X1 X0

    * Y1 Y0

    ---------------------------------------------------------

    F E D C

    STEP 1: CP = X0 * Y0 = C

    STEP 2: CP = X1 * Y0 + X0 * Y1 = D

    STEP 3: CP = X1 * Y1 = E

    Where CP = Cross Product.

    Each multiplication operation is an embedded parallel 4x4 multiply module.
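    The three steps above can be sketched in Python as follows. This is a behavioural illustration only: mul4x4 is a hypothetical stand-in for the embedded 4x4 multiply module, and the shifts by 4 and 8 bit positions express the weights of the cross products D and E, which the step list leaves implicit.

```python
def mul4x4(a, b):
    """Stand-in for the 4x4 Vedic block: 4-bit operands, 8-bit product."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def vedic_8x8(a, b):
    """Multiply two 8-bit numbers from their 4-bit halves."""
    x1, x0 = a >> 4, a & 0xF    # A7..A4 and A3..A0
    y1, y0 = b >> 4, b & 0xF    # B7..B4 and B3..B0

    c = mul4x4(x0, y0)                      # STEP 1
    d = mul4x4(x1, y0) + mul4x4(x0, y1)     # STEP 2
    e = mul4x4(x1, y1)                      # STEP 3

    # Each cross product is weighted by the position of its 4-bit chunk.
    return c + (d << 4) + (e << 8)


print(format(vedic_8x8(0b11111111, 0b00001001), "016b"))  # prints 0000100011110111
```

    The four calls to mul4x4 are independent of one another, so in hardware the four 4x4 modules can operate in parallel before the additions take place.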

    3.3.2 EXAMPLE OF 8X8 VEDIC MULTIPLICATION

    An example of 8X8 vedic multiplication of binary numbers is shown

    in figure 3.12 below.

    Figure 3.12 Example of 8X8 Vedic multiplication

    Consider the 8x8 bit multiplication of 11111111 and 00001001. When multiplying a higher number of bits, divide the bits equally and apply the same analysis that was used for the 4x4 multiplication. That is, 11111111 is treated as 1111 and 1111, and 00001001 is treated as 0000 and 1001. So the four different multiplications will be 1111 x 1001 (twice, each giving 10000111) and 1111 x 0000 (twice, each giving 00000000). The adder now adds 00000000 and 10000111, giving the sum 10000111 with no carry out, and the next adder adds this result to 00001000, giving the sum 10001111. Since no carry is generated from either of the adders, the adder will give both sum and carry out as zero, so nothing is to be added to 0000, and the final result will be:

    S0=1, S1=1, S2=1, S3=0, S4=1, S5=1, S6=1, S7=1, S8=0, S9=0, S10=0, S11=1, S12=0, S13=0, S14=0, S15=0.

    The final answer happens to be 0000100011110111.

    3.4 16 X 16 BIT MULTIPLIER

    The 16 X 16 multiplier is made by using four 8 X 8 multiplier blocks. Here, the multiplicands are of bit size n = 16, whereas the result is of 32 bit size. The input is broken into smaller chunks of size n/2 = 8 for both inputs, that is a and b. These newly formed chunks of 8 bits are given as input to the 8 X 8 multiplier block, where again these new chunks are broken into even smaller chunks of size n/4 = 4 and fed to the 4 X 4 multiplier block, just as in the case of the 8 X 8 multiply block. Again, the new chunks are divided in half to get chunks of size 2, which are fed to the 2 X 2 multiplier block. The results produced at the output of each 8 X 8 bit multiply block, which are of 16 bits, are sent for addition to an addition tree, as shown in figure 3.13.

    Figure 3.13 Block diagram of 16 X 16 Multiply block

    Here, as shown in figure 3.14, the lower 8 bits of q0 pass directly to the result, while the higher bits are fed into the addition tree.

    The addition of partial products is shown in figure 3.14 below.

    Figure 3.14 Addition of Partial products in 16 X 16 block

    3.4.1 ALGORITHM OF 16X16 VEDIC MULTIPLICATION

    Here X0 and Y0 are the least significant 8-bit halves (LSB) and X1 and Y1 are the most significant 8-bit halves (MSB) of the operands. The LSB bits are A0,A1,A2,A3,A4,A5,A6,A7 and B0,B1,B2,B3,B4,B5,B6,B7, and the MSB bits are A8,A9,A10,A11,A12,A13,A14,A15 and B8,B9,B10,B11,B12,B13,B14,B15.

    A = A15A14A13A12A11A10A9A8 A7A6A5A4A3A2A1A0
                  X1                  X0

    B = B15B14B13B12B11B10B9B8 B7B6B5B4B3B2B1B0
                  Y1                  Y0

      X1 X0

    * Y1 Y0

    F E D C

    STEP 1: CP1 = X0*Y0 = C1C0

    STEP 2: C = C0

    STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

    STEP 4: D = D0 + C1

    STEP 5: CP3 = X1*Y1 = E1E0

    STEP 6: E = E0 + D1 (plus any carry out of STEP 4)

    STEP 7: F = E1 (plus any carry out of STEP 6)

    Where CP = cross product.
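    STEP 1 to STEP 7 can be traced in Python for operands split into two halves. This is a minimal sketch, not the thesis implementation: the function name seven_step_multiply and the names mask, carry_d and carry_e are hypothetical, the half-width products use Python's built-in multiplication in place of the 8 X 8 blocks, and the carries out of STEP 4 and STEP 6 are propagated explicitly.

```python
def seven_step_multiply(a, b, n=16):
    """Follow STEP 1 to STEP 7 for n-bit operands split into n/2-bit halves."""
    h = n // 2
    mask = (1 << h) - 1
    x1, x0 = a >> h, a & mask
    y1, y0 = b >> h, b & mask

    cp1 = x0 * y0                      # STEP 1: C1C0
    c1, c = cp1 >> h, cp1 & mask       # STEP 2: C = C0
    cp2 = x1 * y0 + y1 * x0            # STEP 3: D1D0 (D1 may be h + 1 bits wide)
    d1, d0 = cp2 >> h, cp2 & mask
    d_sum = d0 + c1                    # STEP 4
    d, carry_d = d_sum & mask, d_sum >> h
    cp3 = x1 * y1                      # STEP 5: E1E0
    e1, e0 = cp3 >> h, cp3 & mask
    e_sum = e0 + d1 + carry_d          # STEP 6, with the carry from STEP 4
    e, carry_e = e_sum & mask, e_sum >> h
    f = e1 + carry_e                   # STEP 7, with the carry from STEP 6

    return (f << 3 * h) | (e << 2 * h) | (d << h) | c   # F E D C


print(hex(seven_step_multiply(0xFFFF, 0xFFFF)))  # prints 0xfffe0001
```

    The operand 0xFFFF is a useful test value because it produces the largest possible carries in STEP 4 and STEP 6.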

    3.5 32 X 32 VEDIC MULTIPLIER

    The 32 X 32 multiplier is made by using four 16 X 16 multiplier blocks, as shown in figure 3.15. Here, the multiplicands are of bit size n = 32, whereas the result is of 64 bit size. The input is broken into smaller chunks of size n/2 = 16 for both inputs, that is a and b.

    Figure 3.15 Block diagram of 32X32 Vedic multiplier

    These newly formed chunks of 16 bits are given as input to the 16 X 16 multiplier block, where again these new chunks are broken into even smaller chunks of size n/4 = 8 and fed to the 8 X 8 multiply block, just as in the case of the 16 X 16 block. Again, the new chunks are divided in half to get chunks of size 4, which are then fed to the 4 X 4 multiply block. These chunks are in turn fed to the 2 X 2 multiplier, and the resultant bits are sent for addition to an addition tree.

    3.5.1 ALGORITHM OF 32X32 VEDIC MULTIPLICATION

    Here X0 and Y0 are the least significant halves (LSB) and X1 and Y1 are the most significant halves (MSB) of the operands. Both the LSB and the MSB halves consist of 16 bits.

    A = A31-A16 A15-A0
           X1     X0

    B = B31-B16 B15-B0
           Y1     Y0

      X1 X0

    * Y1 Y0

    F E D C

    STEP 1: CP1 = X0*Y0 = C1C0

    STEP 2: C = C0

    STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

    STEP 4: D = D0 + C1

    STEP 5: CP3 = X1*Y1 = E1E0

    STEP 6: E = E0 + D1 (plus any carry out of STEP 4)

    STEP 7: F = E1 (plus any carry out of STEP 6)

    Where CP = cross product.

    3.6 64 X 64 VEDIC MULTIPLIER

    The 64 X 64 multiplier is made by using four 32 X 32 multiplier blocks. Here, the multiplicands are of bit size n = 64, whereas the result is of 128 bit size. The input is broken into smaller chunks of size n/2 = 32 for both inputs, that is a and b. These newly formed chunks of 32 bits are given as input to the 32 X 32 multiplier block, where again these new chunks are broken into even smaller chunks of size n/4 = 16 and fed to the 16 X 16 multiply block, just as in the case of the 32 X 32 block. Again, the new chunks are divided in half to get chunks of size 8, which are then fed to the 8 X 8 multiply block. These are in turn fed to the 4 X 4 multiplier and then to the 2 X 2 multiplier, and the final resultant bits are sent for addition to an addition tree, as shown in figure 3.16.

    Figure 3.16 64 X 64 VEDIC MULTIPLIER
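    The repeated halving of the operands, from 64 bits down to the 2 X 2 block, is naturally expressed as a recursion. The following Python sketch illustrates the idea under stated assumptions: the function name vedic_multiply and the names q1, q2 and q3 are hypothetical (only q0 appears in the text), and the 2 X 2 base case uses built-in multiplication in place of the gate-level block.

```python
def vedic_multiply(a, b, n):
    """Multiply two n-bit numbers recursively (n a power of two, n >= 2)."""
    if n == 2:
        # Base case: the 2 X 2 block. Plain multiplication stands in for it.
        return a * b
    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask

    # Four half-size multiplier blocks.
    q0 = vedic_multiply(a_lo, b_lo, h)
    q1 = vedic_multiply(a_hi, b_lo, h)
    q2 = vedic_multiply(a_lo, b_hi, h)
    q3 = vedic_multiply(a_hi, b_hi, h)

    low = q0 & mask                           # lower half of q0 passes straight through
    upper = (q0 >> h) + q1 + q2 + (q3 << h)   # addition tree on the remaining bits
    return (upper << h) | low


x = (1 << 64) - 1
print(vedic_multiply(x, x, 64) == x * x)  # prints True
```

    With n = 64 the recursion passes through the 32, 16, 8 and 4 bit levels before reaching the 2 X 2 base case, matching the chain of blocks described above.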

    3.6.1 ALGORITHM OF 64X64 VEDIC MULTIPLICATION

    Here X0 and Y0 are the least significant halves (LSB) and X1 and Y1 are the most significant halves (MSB) of the operands. Both the LSB and the MSB halves consist of 32 bits.

    A = A63-A32 A31-A0
           X1     X0

    B = B63-B32 B31-B0
           Y1     Y0

      X1 X0

    * Y1 Y0

    F E D C

    STEP 1: CP1 = X0*Y0 = C1C0

    STEP 2: C = C0

    STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

    STEP 4: D = D0 + C1

    STEP 5: CP3 = X1*Y1 = E1E0

    STEP 6: E = E0 + D1 (plus any carry out of STEP 4)

    STEP 7: F = E1 (plus any carry out of STEP 6)

    Where CP = cross product.

    3.7 RIPPLE CARRY ADDER

    The arrangement of ripple carry adders shown in figure 3.17 helps to reduce delay. A simple ripple carry adder is a digital circuit that produces the arithmetic sum of two binary numbers. It can be constructed from a number of full adders connected in cascade, with the carry output of each full adder connected to the carry input of the next full adder in the chain. Each full adder takes as input a cin, which is the cout of the previous adder. This kind of adder is called a ripple carry adder, since each carry bit ripples to the next full adder.

    The first full adder may be replaced by a half adder. The layout of a ripple carry adder is simple, which allows for fast design time.

    Figure 3.17 Circuit Diagram of 4 bit Ripple Carry Adder
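    The cascade of full adders can be modelled bit by bit in Python. This is a behavioural sketch only; the function names full_adder and ripple_carry_add and the width parameter are chosen here for illustration.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two width-bit numbers; return (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        # The cout of stage i becomes the cin of stage i + 1.
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= bit << i
    return total, carry


print(ripple_carry_add(0b1011, 0b0110))  # 11 + 6 = 17, prints (1, 1)
```

    With cin fixed at 0, the first stage reduces to a half adder, which is why the first full adder may be replaced by one.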

    3.8 MULTIPLY ACCUMULATE UNIT

    The multiply-accumulate operation is one of the basic arithmetic operations extensively used in modern digital signal processing (DSP). Most DSP arithmetic, such as digital filtering, convolution and the fast Fourier transform (FFT), requires high-performance multiply-accumulate operations. The multiply-accumulator (MAC) unit always lies in the critical path that determines the speed of the overall hardware system. Therefore, a high-speed MAC that is capable of supporting multiple precisions and parallel operations is highly desirable.

    3.8.1 BASIC MAC ARCHITECTURE

    Basically, a MAC unit employs a fast multiplier fitted in the data path, and the multiplied output of the multiplier is fed into a fast adder which is set to zero initially. The result of addition is stored in an accumulator register. The MAC unit should be able to produce output in one clock cycle, and the new result of addition is added to the previous one and stored in the accumulator register. Figure 3.18 below shows the basic MAC architecture.

    Here the multiplier that has been used is a Vedic multiplier using the Urdhva Tiryakbhyam Sutra, and it has been fitted into the MAC design.

    Figure 3.18 Basic MAC architecture

    3.8.2 MAC UNIT USING VEDIC MULTIPLIER

    In the MAC unit, the data inputs A and B are stored in two data registers, that is Data a_reg and Data b_reg. Then the inputs are fed into a Vedic multiplier, which stores the result in Multiply_reg. The contents of Multiply_reg are continuously fed into a conventional adder and the result is stored in dataout_reg. Here, the MAC unit makes use of two clocks, one for the operation of the MAC unit and the other one, namely clk2, for the multiplier. The frequency of clk2 should be 4 times the frequency of the MAC unit clock for proper operation. A divide-by-4 clock circuit may be used here in future, which takes clk2 as the parent clock and produces clk as the daughter clock, which is 4 times slower than the parent clock but with 50% duty cycle. The faster clock clk2 is used for the multiplier, while the slower clock clk is used for the MAC unit. The data coming as input to the MAC may vary with clock clk.

    The signal clr, when applied, forces the contents of all the data registers, that is Data a_reg, Data b_reg, Multiply_reg and dataout_reg, to zero. The clken signal is used to enable the MAC operation. Figure 3.19 shows the architecture of the MAC.

    Figure 3.19 MAC using Vedic Multiplier
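    The register-level behaviour of the MAC unit can be sketched in Python. This is a simplified behavioural model, not the HDL design: the class name MacUnit, the method names and the width parameter are hypothetical, one call to step stands for one cycle of the slower clock clk, the multiplier on clk2 is assumed to finish within that cycle, and the accumulator width is left unbounded.

```python
class MacUnit:
    """Behavioural model of the MAC data path (one step = one clk cycle)."""

    def __init__(self, width=8):
        self.width = width
        self.clear()

    def clear(self):
        """clr: force all data registers to zero."""
        self.a_reg = 0
        self.b_reg = 0
        self.multiply_reg = 0
        self.dataout_reg = 0

    def step(self, a, b, clken=True):
        """Advance one clk cycle; return the accumulator contents."""
        if not clken:
            return self.dataout_reg          # MAC operation disabled
        mask = (1 << self.width) - 1
        self.a_reg, self.b_reg = a & mask, b & mask
        # The multiplier runs on the faster clk2, so its result is taken
        # to be ready within the same clk cycle.
        self.multiply_reg = self.a_reg * self.b_reg
        self.dataout_reg += self.multiply_reg  # accumulate
        return self.dataout_reg


mac = MacUnit()
mac.step(3, 4)
print(mac.step(5, 6))  # 3*4 + 5*6, prints 42
```

    Calling clear() corresponds to asserting clr, after which the accumulation starts again from zero.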

    Multiply-accumulate is an important part of real-time digital signal processing (DSP), with applications ranging from digital filtering to image processing. It is a very common basic-level operation seen in many DSP designs and algorithms. Two numbers are multiplied together and added into an accumulator register. As shown in figures 3.20 and 3.21, the basic MAC unit consists of a multiplier, an adder and an accumulator.

    Figure 3.20 Architecture of Vedic Multiplier

    Figure 3.21 Architecture of Booth Multiplier

    In general, a MAC unit uses a conventional multiplier, which multiplies the multiplier and the multiplicand by generating partial products and adding them to compute the final product. The key to the proposed MAC unit is to enhance the performance of the MAC using a Vedic multiplier, and to compare the Vedic, Booth and conventional multipliers in terms of the computation required to generate the partial products and to add them to get the final result of the multiplication.

    3.9 ADVANTAGES

    i) The Vedic multiplier is faster than the array multiplier and the Booth multiplier. As the number of bits increases from 4X4 bits to 32X32 bits, the timing delay is greatly reduced for the Vedic multiplier as compared to other multipliers. The Vedic multiplier has the greatest advantage over other multipliers in gate delays and regularity of structure.

    ii) Power dissipation is much lower than that of Booth multipliers.

    3.10 APPLICATIONS

    i) MAC.

    ii) DSP applications (FIR, IIR filters).

    3.11 TOOLS USED

    3.11.1 SOFTWARE USED

    i) ModelSim 6.3 for simulation:

    ModelSim is a popular hardware simulation and debug environment primarily targeted at smaller ASIC and FPGA designs. ModelSim provides a complete HDL simulation environment that enables you to verify the functional and timing models of your design, and your HDL source code. It is optimized for use with all configurations of Xilinx ISE products.

    ii)Xilinx 10.1 for synthesis:

    Xilinx ISE is a software tool produced by Xilinx for synthesis and

    analysis of HDL designs, which enables the developer to synthesize ("compile")

    their designs, perform timing analysis, examine RTL diagrams, simulate a design's

    reaction to different stimuli, and configure the target device with the programmer.

    3.11.2 HARDWARE USED

    FIELD PROGRAMMABLE GATE ARRAY (FPGA)

    FPGAs are programmable semiconductor devices that are based around a matrix of Configurable Logic Blocks (CLBs) connected through programmable interconnects. As opposed to Application Specific Integrated Circuits (ASICs), where the device is custom built for the particular design, FPGAs can be programmed to the desired application or functionality requirements. One-Time Programmable (OTP) FPGAs are also available. In our project we are using a Spartan 3 FPGA kit.

    SPARTAN 3

    The Spartan 3 trainer (xc3s400, pq208) is useful to realize and verify digital designs. The user can write Verilog/VHDL code and verify the results by implementing the design physically in the target device (FPGA). With the help of this kit, the user can simulate and observe various input and output conditions to verify the implemented design.