Muk 2 by 2

8/10/2019 Muk 2 by 2


CHAPTER 3

PROPOSED SYSTEM

3.1 2X2 VEDIC MULTIPLIER

The method is explained below for two 2-bit numbers A and B, where A = a1a0 and B = b1b0, as shown in figure 3.1. Firstly, the least significant bits (LSB) are multiplied, which gives the LSB of the final product. Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand. The sum gives the second bit of the final product, and the carry is added to the partial product obtained by multiplying the most significant bits, giving a sum and a carry. The sum is the third bit of the product and the carry becomes the fourth bit of the final product.

$s_0 = a_0 b_0$  (1)

$c_1 s_1 = a_1 b_0 + a_0 b_1$  (2)

$c_2 s_2 = c_1 + a_1 b_1$  (3)

The final result is $c_2 s_2 s_1 s_0$. This multiplication method is applicable for all cases.

Figure 3.1 The Vedic Multiplication Method for two 2-bit binary numbers
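Equations (1) to (3) can be checked exhaustively with a small behavioural model. The sketch below is in Python purely for illustration (the design itself would be written in an HDL); the function name `vedic_2x2` is hypothetical and not taken from the project code.

```python
def vedic_2x2(a: int, b: int) -> int:
    """Multiply two 2-bit numbers a = a1a0 and b = b1b0 using equations (1)-(3)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # (1) vertical: product of the two LSBs
    s0 = a0 & b0

    # (2) crosswise: the two cross products are added; c1 is the carry, s1 the sum bit
    cross = (a1 & b0) + (a0 & b1)
    s1, c1 = cross & 1, cross >> 1

    # (3) vertical on the MSBs, plus the carry from (2)
    top = c1 + (a1 & b1)
    s2, c2 = top & 1, top >> 1

    # Final result c2 s2 s1 s0
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


# Exhaustive check against ordinary multiplication
for a in range(4):
    for b in range(4):
        assert vedic_2x2(a, b) == a * b

print(bin(vedic_2x2(0b11, 0b11)))  # prints 0b1001 (3 x 3 = 9)
```

Only 16 input pairs exist, so the loop covers every case the 2X2 block can see.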


The 2X2 Vedic multiplier (VM) module is implemented using four two-input AND gates and two half-adders, as displayed in its block diagram in Figure 3.2. The same method can be extended to any number of input bits. The Vedic multiplier is based on the Urdhva Tiryakbhyam Sutra.

Figure 3.2 Block Diagram of 2X2 Vedic Multiplier

3.1.1 HARDWARE REALIZATION OF 2X2 MULTIPLIER BLOCK

The hardware realization of the 2 X 2 multiplier block is illustrated in Figure 3.3. For the sake of simplicity, the clock and registers are not shown; instead, emphasis has been laid on understanding the algorithm.

Figure 3.3 Hardware realization of 2 X 2 block
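Figure 3.3 is not reproduced in this text, but the gate count stated above (four two-input AND gates and two half-adders) is enough for a gate-level sketch. The Python functions below model each gate as a bitwise operation; the signal names p00 to p11 and q0 to q3 are hypothetical labels and may differ from those in Figure 3.3.

```python
from __future__ import annotations


def and_gate(x: int, y: int) -> int:
    """Two-input AND gate on single bits."""
    return x & y


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Half adder: returns (sum, carry) = (x XOR y, x AND y)."""
    return x ^ y, x & y


def vm2x2_gates(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int, int, int]:
    """Gate-level 2X2 Vedic multiplier; returns the product bits (q3, q2, q1, q0)."""
    # Four AND gates generate the partial products
    p00 = and_gate(a0, b0)   # vertical, LSBs
    p10 = and_gate(a1, b0)   # crosswise
    p01 = and_gate(a0, b1)   # crosswise
    p11 = and_gate(a1, b1)   # vertical, MSBs

    q0 = p00
    q1, c1 = half_adder(p10, p01)   # first half adder: sum of the cross products
    q2, q3 = half_adder(p11, c1)    # second half adder: MSB product plus carry
    return q3, q2, q1, q0


# Compare against integer multiplication for all 16 input combinations
for a in range(4):
    for b in range(4):
        q3, q2, q1, q0 = vm2x2_gates(a >> 1, a & 1, b >> 1, b & 1)
        assert ((q3 << 3) | (q2 << 2) | (q1 << 1) | q0) == a * b
```

The second half adder is enough for the top bit because the carry c1 and the product p11 are single bits, so their sum never exceeds 2.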


3.1.2 EXAMPLE OF 2X2 VEDIC MULTIPLICATION

Examples of 2X2 Vedic multiplication of decimal and binary numbers are shown below in Figures 3.4 and 3.5.

Figure 3.4 Vedic multiplication of decimal numbers

Figure 3.5 Vedic multiplication of binary numbers
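Since the figures are not reproduced in this text, two small worked examples (chosen here for illustration, not necessarily the operands used in Figures 3.4 and 3.5) show the three steps: vertical on the right, crosswise in the middle, vertical on the left.

```latex
\begin{aligned}
&\textbf{Decimal: } 12 \times 13 \\
&\text{Vertical (units): } 2 \times 3 = 6 \\
&\text{Crosswise: } 1 \times 3 + 2 \times 1 = 5 \\
&\text{Vertical (tens): } 1 \times 1 = 1 \\
&\Rightarrow 12 \times 13 = 156 \\[6pt]
&\textbf{Binary: } 11 \times 11 \ (3 \times 3) \\
&s_0 = a_0 b_0 = 1 \\
&c_1 s_1 = a_1 b_0 + a_0 b_1 = 1 + 1 = 10_2 \;\Rightarrow\; s_1 = 0,\ c_1 = 1 \\
&c_2 s_2 = c_1 + a_1 b_1 = 1 + 1 = 10_2 \;\Rightarrow\; s_2 = 0,\ c_2 = 1 \\
&\Rightarrow c_2 s_2 s_1 s_0 = 1001_2 = 9
\end{aligned}
```

In the decimal case, a crosswise sum of 10 or more would carry into the next digit, exactly as c1 carries into the MSB position in the binary case.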


3.1.3 ALGORITHM

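Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most significant bits (MSB).

X1X0

* Y1Y0

FEDC

STEP 1: CP1 = X0*Y0 = C1C0

STEP 2: C = C0

STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

STEP 4: D = D0 + C1

STEP 5: CP3 = X1*Y1 = E1E0

STEP 6: E = E0 + D1

STEP 7: F = E1

Any carry produced in Step 4 is added in Step 6, and any carry produced in Step 6 is added in Step 7, so that FEDC is the complete product.

Where CP = cross product

X = multiplicand, Y = multiplier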
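Steps 1 to 7 describe one "FEDC" combination step. The Python sketch below follows them literally for operands split into two halves of h bits each, so h = 1 gives the 2X2 bit case and larger h gives the block-level form used later for 8X8 and wider multipliers. The function name `fedc_multiply` and the parameter `h` are hypothetical; Python's `*` stands in for the smaller multiplier blocks, and the carries out of Steps 4 and 6 are propagated explicitly.

```python
def fedc_multiply(x: int, y: int, h: int) -> int:
    """Multiply X1X0 by Y1Y0 following Steps 1-7, where each half is h bits wide.

    h = 1 is the 2X2 bit case; larger h gives the block form used for 8X8 and up.
    Python's '*' stands in for the smaller multiplier blocks.
    """
    mask = (1 << h) - 1
    x0, x1 = x & mask, x >> h
    y0, y1 = y & mask, y >> h

    cp1 = x0 * y0                  # Step 1: CP1 = C1C0
    c1, c0 = cp1 >> h, cp1 & mask
    c = c0                         # Step 2: C = C0

    cp2 = x1 * y0 + y1 * x0        # Step 3: CP2 = D1D0
    d1, d0 = cp2 >> h, cp2 & mask
    d_sum = d0 + c1                # Step 4: D = D0 + C1
    d, carry_d = d_sum & mask, d_sum >> h

    cp3 = x1 * y1                  # Step 5: CP3 = E1E0
    e1, e0 = cp3 >> h, cp3 & mask
    e_sum = e0 + d1 + carry_d      # Step 6: E = E0 + D1, plus the carry from Step 4
    e, carry_e = e_sum & mask, e_sum >> h

    f = e1 + carry_e               # Step 7: F = E1, plus the carry from Step 6
    return (f << 3 * h) | (e << 2 * h) | (d << h) | c


print(bin(fedc_multiply(0b11, 0b11, 1)))   # prints 0b1001
```

For 11 x 11 the carry out of Step 6 is what sets F; without it the result would be 0001 instead of 1001.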

3.2 VEDIC MULTIPLIER FOR 4X4 BIT

The block diagram of the 4X4 bit Vedic multiplier is shown in figure 3.6. To get the final product, four 2-bit Vedic multipliers and three 4-bit ripple carry adders are required. In this proposal, the first 4-bit RC adder is used to add the two 4-bit operands obtained from the cross multiplication of the two middle 2X2 bit multiplier modules. The second 4-bit RC adder is used to add two 4-bit operands, i.e. the concatenated 4-bit operand (00 & the two most significant output bits of the right-hand-most 2X2 multiplier module) and one 4-bit operand we get as the output sum of hand most of 2X2 multiplier module. Early literature speaks about Vedic multipliers based on array multiplier structures. Similarly, an 8X8 bit multiplier is constructed using four 4-bit multipliers, a 16X16 bit multiplier is constructed using four 8-bit multipliers, and so on.

Figure 3.6 Block diagram of 4X4 Vedic Multiplier

Here, instead of following serial addition, the addition tree has been modified to resemble a Wallace tree, thus reducing the levels of addition to 2 instead of 3. The two lower bits of q0 pass directly to the output, while the upper bits of q0 are fed into the addition tree. The bits being fed to the addition tree are further illustrated by the diagram in figure 3.7.

Figure 3.7 Addition of partial products in 4 x 4 block
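The sketch below composes a 4X4 multiplier from four 2X2 blocks and three 4-bit ripple carry additions. The text above does not fully specify the wiring of the third adder, so the arrangement used here is an assumption consistent with the description rather than a copy of Figure 3.6: the first adder adds the two middle cross products, the second adds 00 concatenated with the upper two bits of q0 to the first sum, and the third adds the left-hand-most product q3 to the remaining upper bits and carries. All function names are hypothetical.

```python
def vm2x2(a: int, b: int) -> int:
    """2X2 Vedic block from equations (1)-(3); returns a 4-bit product."""
    a0, a1, b0, b1 = a & 1, (a >> 1) & 1, b & 1, (b >> 1) & 1
    t = (a1 & b0) + (a0 & b1)          # crosswise sum (carry in bit 1)
    u = (t >> 1) + (a1 & b1)           # carry plus vertical MSB product
    return (u << 2) | ((t & 1) << 1) | (a0 & b0)


def rca4(x: int, y: int) -> tuple:
    """4-bit ripple carry adder built from full-adder equations; returns (sum, carry_out)."""
    s, carry = 0, 0
    for i in range(4):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return s, carry


def vm4x4(a: int, b: int) -> int:
    """4X4 multiplier from four 2X2 blocks and three 4-bit RC adders (assumed wiring)."""
    al, ah, bl, bh = a & 0b11, a >> 2, b & 0b11, b >> 2
    q0 = vm2x2(al, bl)   # right-hand-most module
    q1 = vm2x2(ah, bl)   # middle modules (cross products)
    q2 = vm2x2(al, bh)
    q3 = vm2x2(ah, bh)   # left-hand-most module

    s1, c1 = rca4(q1, q2)            # adder 1: the two middle cross products
    s2, c2 = rca4(s1, q0 >> 2)       # adder 2: {00, q0[3:2]} + adder-1 sum
    # adder 3: q3 + {carries, upper two bits of the adder-2 sum}
    s3, _ = rca4(q3, ((c1 + c2) << 2) | (s2 >> 2))

    # q0[1:0] pass straight through, s2[1:0] are the middle bits, s3 is the top nibble
    return (s3 << 4) | ((s2 & 0b11) << 2) | (q0 & 0b11)


assert all(vm4x4(a, b) == a * b for a in range(16) for b in range(16))
```

Passing q0[1:0] straight to the output mirrors the statement above that the two lower bits of q0 bypass the addition tree.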


3.2.1 Algorithm for 4 x 4 bit Vedic multiplier Using Urdhva Tiryakbhyam

(Vertically and crosswise) for two Binary numbers

CP = Cross Product (Vertically and Crosswise)

X3 X2 X1 X0 Multiplicand

Y3 Y2 Y1 Y0 Multiplier

------------------------------------------------------------------

H G F E D C B A

----------------------------------------------------------------

P7 P6 P5 P4 P3 P2 P1 P0 Product

--------------------------------------------------------------------

PARALLEL COMPUTATION METHODOLOGY

1. CP X0 = X0 * Y0 = A

Y0

2. CP X1 X0 = X1 * Y0+X0 * Y1 = B

Y1 Y0

3. CP X2 X1 X0 = X2 * Y0 +X0 * Y2 +X1 * Y1 = C

Y2 Y1 Y0

4. CP X3 X2 X1 X0 = X3 * Y0 +X0 * Y3+X2 * Y1 +X1 * Y2 = D

Y3 Y2 Y1 Y0

5. CP X3 X2 X1 = X3 * Y1+X1 * Y3+X2 * Y2 = E

Y3 Y2 Y1

6. CP X3 X2 = X3 * Y2+X2 * Y3 = F

Y3 Y2

7. CP X3 = X3 * Y3 = G

Y3

Where CP = cross product
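The seven cross products listed above follow a single pattern: column k collects every partial product whose bit indices add up to k. The block below is a compact restatement in the notation of this section, not an additional step of the algorithm.

```latex
\begin{aligned}
CP_k &= \sum_{\substack{i + j = k \\ 0 \le i, j \le 3}} X_i \, Y_j, \qquad k = 0, 1, \dots, 6 \\
(A, B, C, D, E, F, G) &= (CP_0, CP_1, CP_2, CP_3, CP_4, CP_5, CP_6) \\
X \times Y &= \sum_{k=0}^{6} CP_k \, 2^{k}
\end{aligned}
```

Because each CP_k can exceed 1, its carries spill into the higher columns; H is the carry that remains after column G has been added.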

3.2.2 EXAMPLE OF 4X4 VEDIC MULTIPLICATION

An example of 4X4 Vedic multiplication is shown below in figure 3.8.

Figure 3.8 Line diagram for multiplication of two 4-bit numbers

Firstly, the least significant bits are multiplied, which gives the least significant bit of the product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next stage, the sum obtained by the crosswise and vertical multiplication and addition of three bits of the two numbers from the least significant position. Next, all four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage multiplication and addition of three bits, excluding the LSB. The same operation continues until the multiplication of the two MSBs, which gives the MSB of

the product. For example, if in some intermediate step we get 110, then 0 will act as the result bit (referred to as rn) and 11 as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number. Thus we get the following expressions:

$r_0 = a_0 b_0$

$c_1 r_1 = a_1 b_0 + a_0 b_1$

$c_2 r_2 = c_1 + a_2 b_0 + a_1 b_1 + a_0 b_2$

$c_3 r_3 = c_2 + a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3$

$c_4 r_4 = c_3 + a_3 b_1 + a_2 b_2 + a_1 b_3$

$c_5 r_5 = c_4 + a_3 b_2 + a_2 b_3$

$c_6 r_6 = c_5 + a_3 b_3$

with $c_6 r_6 r_5 r_4 r_3 r_2 r_1 r_0$ being the final product. Hence this is the general mathematical formula applicable to all cases of 4X4 multiplication.

3.2.3 HARDWARE ARCHITECTURE

This hardware design is very similar to that of the well-known array multiplier, where an array of adders is required to arrive at the final product. The hardware architecture of the 4x4 multiplier is shown in figure 3.9.

Figure 3.9 Hardware Architecture
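The expressions r0 = a0b0 through c6r6 = c5 + a3b3 in Section 3.2.2 can be evaluated column by column, which is essentially what the adder array of Figure 3.9 does in hardware. In this Python sketch (the function name is hypothetical), each column sum keeps its LSB as the result bit rk and passes the remaining, possibly multi-bit, value on as the carry ck.

```python
def vedic_4x4_columns(a: int, b: int) -> int:
    """4X4 multiplication using the column expressions ck rk of Section 3.2.2."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    product, carry = 0, 0
    for k in range(7):                                   # columns 0 to 6
        column = carry + sum(abits[i] & bbits[k - i]     # cross products with i + j = k
                             for i in range(4) if 0 <= k - i < 4)
        product |= (column & 1) << k                     # rk: the result bit
        carry = column >> 1                              # ck: may be a multi-bit number
    return product | (carry << 7)                        # c6 forms the top of the product


assert all(vedic_4x4_columns(a, b) == a * b for a in range(16) for b in range(16))
```

For 1111 x 1111, column 3 evaluates to 110, which is exactly the intermediate case described in the text: 0 is the result bit and 11 is the carry.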

3.3 8 X 8 MULTIPLIER

The 8 X 8 multiplier is made by using four 4 X 4 multiplier blocks. Here, the multiplicands are of bit size (n = 8), whereas the result is of 16 bit size. The input is broken into smaller chunks of size n/2 = 4 for both inputs, that is a and b, just as in the case of the 4 X 4 multiplier block. These newly formed chunks of 4 bits are given as input to the 4 X 4 multiplier blocks, where again the new chunks are broken into even smaller chunks of size n/4 = 2 and fed to 2 X 2 multiply blocks. The block diagram of the 8X8 Vedic multiplier is shown in figure 3.10.

Figure 3.10 Block diagram of 8X8 Vedic multiplier

The results produced at the outputs of the 4 X 4 bit multiply blocks, each of which is 8 bits wide, are sent for addition to an addition tree, as shown in figure 3.11 below. Here, one fact must be kept in mind: each 4 X 4 multiply block works as illustrated in figure 3.6. In the 8 X 8 multiply block, the lower 4 bits of q0 are passed directly to the output and the remaining bits are fed to the addition tree, as shown in figure 3.11.

Figure 3.11 Addition of Partial Products in 8 X 8 block

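3.3.1 ALGORITHM FOR 8X8 BIT MULTIPLICATION

Here X0, Y0 are the least significant halves (LSB) and X1, Y1 are the most significant halves (MSB) of the operands. The LSB bits are A0, A1, A2, A3 and B0, B1, B2, B3, and the MSB bits are A7, A6, A5, A4 and B7, B6, B5, B4.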

A= A7A6A5A4 A3A2A1A0

X1 X0

B= B7B6B5B4 B3B2B1B0

Y1 Y0

X1 X0

* Y1 Y0

---------------------------------------------------------

FEDC

STEP 1:CP = X0 * Y0 = C

STEP 2:CP = X1 * Y0 + X0 * Y1 = D


STEP 3:CP = X1 * Y1 = E

Where CP = Cross Product.

Each Multiplication operation is an embedded parallel 4x4 Multiply module.
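Because X1 and Y1 carry a weight of 2^4 relative to X0 and Y0, the three cross products of Steps 1 to 3 combine as shown below. The numerical line uses the operands of the 8X8 example in Section 3.3.2 (11111111 and 00001001).

```latex
\begin{aligned}
A &= 2^{4} X_1 + X_0, \qquad B = 2^{4} Y_1 + Y_0 \\
A \times B &= 2^{8}\,\underbrace{X_1 Y_1}_{E} + 2^{4}\,\underbrace{(X_1 Y_0 + X_0 Y_1)}_{D} + \underbrace{X_0 Y_0}_{C} \\
\text{e.g. } X_1 = X_0 = 15,\ Y_1 = 0,\ Y_0 = 9:\quad
A \times B &= 2^{8} \cdot 0 + 2^{4} \cdot 135 + 135 = 2295
\end{aligned}
```

The shifted terms overlap one another (D and C are both 8 bits wide but are offset by only 4 bits), which is why an addition tree is needed after the four 4x4 blocks.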

3.3.2 EXAMPLE OF 8X8 VEDIC MULTIPLICATION

An example of 8X8 Vedic multiplication of binary numbers is shown in figure 3.12 below.

Figure 3.12 Example of 8X8 Vedic multiplication

Let's say we multiply the 8-bit numbers 11111111 and 00001001. When multiplying numbers with a larger number of bits, divide the bits equally and do the same analysis that was used for 4x4 multiplication. This means 11111111 should be treated as 1111 and 1111. Similarly, 00001001 should be treated as 0000 and 1001. So the four different multiplications will be X0*Y0 = 1111*1001 = 10000111, X1*Y0 = 1111*1001 = 10000111, X0*Y1 = 1111*0000 = 00000000 and X1*Y1 = 1111*0000 = 00000000. Now the first adder adds 00000000 and 10000111, giving the sum 10000111 with no carry out, and the second adder adds this result to 00001000 (0000 concatenated with the upper four bits of X0*Y0), giving the sum 10001111. Since neither adder generates a carry, the upper four bits of this sum (1000) are simply added to X1*Y1 = 00000000, giving 00001000, so the final result will be:

S0=1, S1=1, S2=1, S3=0, S4=1, S5=1, S6=1, S7=1, S8=0, S9=0, S10=0, S11=1, S12=0, S13=0, S14=0, S15=0.

The final answer is 0000100011110111 (that is, 255 x 9 = 2295).

3.4 16 X 16 BIT MULTIPLIER

The 16 X 16 multiplier is made by using four 8 X 8 multiplier blocks. Here, the multiplicands are of bit size (n = 16), whereas the result is of 32 bit size. The input is broken into smaller chunks of size n/2 = 8 for both inputs, that is a and b. These newly formed chunks of 8 bits are given as input to the 8 X 8 multiplier blocks, where again these new chunks are broken into even smaller chunks of size n/4 = 4 and fed to 4 X 4 multiplier blocks, just as in the case of the 8 X 8 multiply block. Again, the new chunks are divided in half to get chunks of size 2, which are fed to 2 X 2 multiplier blocks. The results produced at the outputs of the 8 X 8 bit multiply blocks, each of which is 16 bits wide, are sent for addition to an addition tree, as shown in figure 3.13.

Figure 3.13 Block diagram of 16 X 16 Multiply block

Here, as shown in figure 3.14, the lower 8 bits of q0 pass directly to the result, while the higher bits are fed into the addition tree. The addition of partial products is shown in figure 3.14 below.

Figure 3.14 Addition of Partial products in 16 X 16 block
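The same halving applies at every width, so the 8 X 8 and 16 X 16 blocks (and the wider ones that follow) can be modelled by one recursive function. The sketch below is a behavioural model in Python under stated assumptions: the function name is hypothetical, the 2 X 2 base case uses equations (1) to (3), and Python's `+` stands in for the addition tree, with the lower half of q0 passed straight to the output as described above. It reproduces the 8X8 worked example of Section 3.3.2.

```python
def vedic_multiply(a: int, b: int, n: int) -> int:
    """n X n Vedic multiplier (n a power of two, n >= 2) built from four (n/2) X (n/2) blocks."""
    if n == 2:
        # 2 X 2 base block, equations (1)-(3)
        a0, a1, b0, b1 = a & 1, (a >> 1) & 1, b & 1, (b >> 1) & 1
        t = (a1 & b0) + (a0 & b1)
        return (((t >> 1) + (a1 & b1)) << 2) | ((t & 1) << 1) | (a0 & b0)
    h = n // 2
    mask = (1 << h) - 1
    al, ah, bl, bh = a & mask, a >> h, b & mask, b >> h
    q0 = vedic_multiply(al, bl, h)   # X0 * Y0
    q1 = vedic_multiply(ah, bl, h)   # X1 * Y0
    q2 = vedic_multiply(al, bh, h)   # X0 * Y1
    q3 = vedic_multiply(ah, bh, h)   # X1 * Y1
    # Addition tree: the lower h bits of q0 go straight to the output
    middle = q1 + q2 + (q0 >> h)
    upper = q3 + (middle >> h)
    return (upper << n) | ((middle & mask) << h) | (q0 & mask)


result = vedic_multiply(0b11111111, 0b00001001, 8)
print(format(result, "016b"))   # prints 0000100011110111
assert vedic_multiply(0xFFFF, 0xFFFF, 16) == 0xFFFF * 0xFFFF
```

For the 8X8 example, `middle` equals 10001111, the same value as the second adder's sum in the worked example, and `upper` equals 00001000.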

3.4.1 ALGORITHM OF 16X16 VEDIC MULTIPLICATION

Here X0, Y0 are the least significant halves (LSB) and X1, Y1 are the most significant halves (MSB). The LSB bits are A0, A1, A2, A3, A4, A5, A6, A7 and B0, B1, B2, B3, B4, B5, B6, B7, and the MSB bits are A8, A9, A10, A11, A12, A13, A14, A15 and B8, B9, B10, B11, B12, B13, B14, B15.

A= A15A14A13A12A11A10A9A8 A7A6A5A4A3A2A1A0

X1 X0

B= B15B14B13B12B11B10B9B8 B7B6B5B4B3B2B1B0

Y1 Y0

X1X0

* Y1Y0

FEDC

STEP 1: CP1 = X0*Y0 = C1C0

STEP 2: C = C0

STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

STEP 4: D = D0 + C1

STEP 5: CP3 = X1*Y1 = E1E0

STEP 6: E = E0 + D1

STEP 7: F = E1

Any carry produced in Step 4 is added in Step 6, and any carry produced in Step 6 is added in Step 7.

Where CP = cross product

3.5 32 X 32 VEDIC MULTIPLIER

The 32 X 32 multiplier is made by using four 16 X 16 multiplier blocks, as shown in figure 3.15. Here, the multiplicands are of bit size (n = 32), whereas the result is of 64 bit size. The input is broken into smaller chunks of size n/2 = 16 for both inputs, that is a and b.

Figure 3.15 Block diagram of 32X32 Vedic multiplier


These newly formed chunks of 16 bits are given as input to the 16 X 16 multiplier blocks, where again these new chunks are broken into even smaller chunks of size n/4 = 8 and fed to 8 X 8 multiply blocks, just as in the case of the 16 X 16 block. Again, the new chunks are divided in half to get chunks of size 4, which are then fed to 4 X 4 multiply blocks, and these are finally split into 2-bit chunks for the 2 X 2 multipliers. The resulting partial products are then sent for addition to an addition tree.
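Each level of this hierarchy replaces one block by four blocks of half the width, so the number of 2 X 2 multipliers grows by a factor of four per level. The short Python sketch below (the function name is hypothetical) lists the levels for a given width; it only counts multiplier blocks and says nothing about adder cost or delay.

```python
from __future__ import annotations


def decomposition(n: int) -> list[tuple[int, int]]:
    """Return (block width, number of blocks) for each level, from n X n down to 2 X 2."""
    levels = []
    blocks = 1
    while n >= 2:
        levels.append((n, blocks))
        n //= 2          # each block is split into half-width blocks...
        blocks *= 4      # ...four of them per parent block
    return levels


print(decomposition(32))
# prints [(32, 1), (16, 4), (8, 16), (4, 64), (2, 256)]
```

For the 64 X 64 multiplier of Section 3.6, `decomposition(64)` ends at (2, 1024).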

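3.5.1 ALGORITHM OF 32X32 VEDIC MULTIPLICATION

Here X0, Y0 are the least significant halves (LSB) and X1, Y1 are the most significant halves (MSB). Both LSB and MSB consist of 16 bits.

A=A31-A16 A15-A0

X1 X0

B= B31-B16 B15-B0

Y1 Y0

X1X0

* Y1Y0

FEDC

STEP 1: CP1 = X0*Y0 = C1C0

STEP 2: C = C0

STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

STEP 4: D = D0 + C1

STEP 5: CP3 = X1*Y1 = E1E0

STEP 6: E = E0 + D1

STEP 7: F = E1

Any carry produced in Step 4 is added in Step 6, and any carry produced in Step 6 is added in Step 7.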



3.6 64 X 64 VEDIC MULTIPLIER

The 64 X 64 multiplier is made by using four 32 X 32 multiplier blocks. Here, the multiplicands are of bit size (n = 64), whereas the result is of 128 bit size. The input is broken into smaller chunks of size n/2 = 32 for both inputs, that is a and b. These newly formed chunks of 32 bits are given as input to the 32 X 32 multiplier blocks, where again these new chunks are broken into even smaller chunks of size n/4 = 16 and fed to 16 X 16 multiply blocks, just as in the case of the 32 X 32 block. Again, the new chunks are divided in half to get chunks of size 8, which are then fed to 8 X 8 multiply blocks; these in turn use 4 X 4 and finally 2 X 2 multipliers, and the resulting partial products are sent for addition to an addition tree, as shown in figure 3.16.

Figure 3.16 64 X 64 VEDIC MULTIPLIER


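3.6.1 ALGORITHM OF 64X64 VEDIC MULTIPLICATION

Here X0, Y0 are the least significant halves (LSB) and X1, Y1 are the most significant halves (MSB). Both LSB and MSB consist of 32 bits.

A=A63-A32 A31-A0

X1 X0

B= B63-B32 B31-B0

Y1 Y0

X1X0

* Y1Y0

FEDC

STEP 1: CP1 = X0*Y0 = C1C0

STEP 2: C = C0

STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0

STEP 4: D = D0 + C1

STEP 5: CP3 = X1*Y1 = E1E0

STEP 6: E = E0 + D1

STEP 7: F = E1

Any carry produced in Step 4 is added in Step 6, and any carry produced in Step 6 is added in Step 7.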


3.7 RIPPLE CARRY ADDER

The ripple carry adder (RCA), whose circuit is shown in figure 3.17, is used for the additions in the multiplier. A simple ripple carry adder is a digital circuit that produces the arithmetic sum of two binary numbers. It can be constructed from a number of full adders connected in cascade, with the carry output of each full adder connected to the carry input of the next full adder in the chain. Each full adder takes a carry input cin, which is the cout of the previous adder. This kind of adder is called a ripple carry adder, since each carry bit ripples to the next full adder.

The first full adder may be replaced by a half adder. The layout of a ripple carry adder is simple, which allows for fast design time, although its delay grows linearly with the number of bits because the carry must pass through every stage.

Figure 3.17 Circuit Diagram of 4 bit Ripple Carry Adder
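The cascade described above can be sketched bit by bit. In this Python model (function names hypothetical), stage 0 is a half adder, as the text allows, and every later stage is a full adder whose carry input is the previous stage's carry output; the loop makes the stage-by-stage ripple of the carry explicit.

```python
def half_adder(a: int, b: int) -> tuple:
    """Return (sum, carry) for two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Return (sum, cout) for two bits plus a carry in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 4) -> tuple:
    """Add two width-bit numbers; returns (sum, carry_out)."""
    total, carry = 0, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        if i == 0:
            s, carry = half_adder(xi, yi)          # first stage has no carry in
        else:
            s, carry = full_adder(xi, yi, carry)   # carry ripples from stage i - 1
        total |= s << i
    return total, carry


print(ripple_carry_add(0b1011, 0b0110))   # prints (1, 1): 11 + 6 = 17 = 1 0001
```

Replacing the first stage with a half adder works only when there is no external carry input, which is the case for the adders in the multiplier tree above.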

3.8 MULTIPLY ACCUMULATE UNIT

The multiply-accumulate operation is one of the basic arithmetic operations extensively used in modern digital signal processing (DSP). Many DSP algorithms, such as digital filtering, convolution and the fast Fourier transform (FFT), require high-performance multiply-accumulate operations. The multiply-accumulator (MAC) unit usually lies in the critical path that determines the speed of the overall hardware system. Therefore, a high-speed MAC that is capable of supporting multiple precisions and parallel operations is highly desirable.

3.8.1 BASIC MAC ARCHITECTURE

Basically, a MAC unit employs a fast multiplier fitted in the data path, and the output of the multiplier is fed into a fast adder, which is set to zero initially. The result of the addition is stored in an accumulator register. The MAC unit should be able to produce an output in one clock cycle, and each new result is added to the previous one and stored in the accumulator register. Figure 3.18 below shows the basic MAC architecture.

Here the multiplier that has been used is a Vedic multiplier using the Urdhva Tiryakbhyam Sutra, and it has been fitted into the MAC design.

Figure 3.18 Basic MAC architecture

3.8.2 MAC UNIT USING VEDIC MULTIPLIER

In the MAC unit, the data inputs A and B are stored in two data registers, that is data a_reg and data b_reg. Then the inputs are fed into a Vedic multiplier, which stores the result in Multiply_reg. The contents of Multiply_reg are continuously fed into a conventional adder, and the result is stored in dataout_reg. Here, the MAC unit makes use of two clocks: one, clk, for the operation of the MAC unit and the other one, namely clk2, for the multiplier. The frequency of clk2 should be 4 times the frequency of clk for proper operation. A divide-by-4 clock divider circuit may be used here in future, which takes clk2 as the parent clock and produces clk as the daughter clock, 4 times slower than the parent clock but with a 50% duty cycle. The faster clock clk2 is used for the multiplier, while the slower clock clk is used for the MAC unit. The data coming as input to the MAC may vary with clock clk.
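One common way to build the divide-by-4 circuit mentioned above is a 2-bit counter clocked by clk2 whose most significant bit is used as clk: it stays low for two clk2 cycles and high for two, which gives the 50% duty cycle. The Python sketch below only simulates that waveform; the counter-based structure and the function name are assumptions about how the future divider could be built, not part of the implemented design.

```python
def divide_by_4(num_edges: int) -> list:
    """Level of clk after each rising edge of clk2, using the MSB of a 2-bit counter."""
    counter = 0
    clk = []
    for _ in range(num_edges):
        counter = (counter + 1) % 4   # 2-bit counter advanced on every clk2 edge
        clk.append(counter >> 1)      # MSB: two edges low, two edges high
    return clk


print(divide_by_4(8))   # prints [0, 1, 1, 0, 0, 1, 1, 0]
```

Over any whole number of clk periods, clk is high for exactly half of the clk2 edges.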

When the clr signal is applied, the contents of all the data registers, that is data a_reg, data b_reg, Multiply_reg and dataout_reg, are forced to zero. The clken signal is used to enable the MAC operation. Figure 3.19 shows the architecture of the MAC.

Figure 3.19 MAC using Vedic Multiplier
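The register-level behaviour described above (a_reg, b_reg, Multiply_reg, dataout_reg, clr and clken) can be summarised in a small behavioural model. In this Python sketch the class name `VedicMAC` is hypothetical, Python's `*` stands in for the Vedic multiplier, the multiply is assumed to complete within one clk period (which is what the 4x faster clk2 is for), and accumulator overflow is ignored.

```python
class VedicMAC:
    """Behavioural sketch of the MAC of Figure 3.19 (register names follow the text)."""

    def __init__(self, width: int = 8):
        self.mask = (1 << width) - 1     # input data width
        self.clear()

    def clear(self) -> None:
        """clr: force all data registers to zero."""
        self.a_reg = 0
        self.b_reg = 0
        self.multiply_reg = 0
        self.dataout_reg = 0

    def clock(self, a: int, b: int, clken: int = 1) -> int:
        """One rising edge of clk; registers hold their values when clken is 0."""
        if clken:
            self.a_reg, self.b_reg = a & self.mask, b & self.mask
            self.multiply_reg = self.a_reg * self.b_reg    # stand-in for the Vedic multiplier
            self.dataout_reg += self.multiply_reg          # accumulate
        return self.dataout_reg


mac = VedicMAC()
for a, b in [(3, 4), (5, 6), (7, 8)]:
    mac.clock(a, b)
print(mac.dataout_reg)   # prints 98 (3*4 + 5*6 + 7*8)
```

A hardware version would spread these register updates over separate clock edges; collapsing them into one call keeps the sketch focused on what clr and clken do.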

Multiply-accumulate is an important part of real-time digital signal processing (DSP), with applications ranging from digital filtering to image processing. Multiply and accumulate is a very common basic-level operation seen in many DSP designs and algorithms: two numbers are multiplied together and the product is added into an accumulator register. As shown in figures 3.20 and 3.21, the basic MAC unit consists of a multiplier, an adder and an accumulator.

Figure 3.20 Architecture of Vedic Multiplier

Figure 3.21 Architecture of Booth Multiplier


In general, a MAC unit uses a conventional multiplier, which multiplies the multiplier and multiplicand by generating partial products and then adding them to compute the final product. The aim of the proposed MAC unit is to enhance the performance of the MAC by using a Vedic multiplier, and to compare the Vedic, Booth and conventional multipliers in terms of the computation required to generate the partial products and to add them to get the final result of the multiplication.

3.9 ADVANTAGES

i) The Vedic multiplier is faster than the array multiplier and the Booth multiplier. As the number of bits increases from 4X4 bits to 32x32 bits, the timing delay of the Vedic multiplier is greatly reduced compared to other multipliers. The Vedic multiplier has the greatest advantage over other multipliers in terms of gate delays and regularity of structure.

ii) Power dissipation is lower than that of Booth multipliers.

3.10 APPLICATIONS

i) MAC.

ii) DSP applications (FIR, IIR filters).

3.11 TOOLS USED

3.11.1 SOFTWARE USED

i) ModelSim 6.3 for simulation:

ModelSim is a popular hardware simulation and debug environment, primarily targeted at smaller ASIC and FPGA designs. ModelSim provides a complete HDL simulation environment that enables you to verify the functional and timing models of your design and your HDL source code. It is optimized for use with all configurations of Xilinx ISE products.

ii) Xilinx 10.1 for synthesis:

Xilinx ISE is a software tool produced by Xilinx for synthesis and

analysis of HDL designs, which enables the developer to synthesize ("compile")

their designs, perform timing analysis, examine RTL diagrams, simulate a design's

reaction to different stimuli, and configure the target device with the programmer.

3.11.2 HARDWARE USED

FIELD PROGRAMMABLE GATE ARRAY (FPGA)

FPGAs are programmable semiconductor devices that are based around a matrix of Configurable Logic Blocks (CLBs) connected through programmable interconnects. As opposed to Application Specific Integrated Circuits (ASICs), where the device is custom built for the particular design, FPGAs can be programmed to the desired application or functionality requirements. Although One-Time Programmable (OTP) FPGAs are available, most FPGAs are SRAM based and can be reprogrammed as the design evolves. In our project, we are using a Spartan 3 FPGA kit.

SPARTAN 3

The Spartan 3 trainer xc3s400 pq208 is useful to realize and verify digital designs. The user can write Verilog/VHDL code and verify the results by implementing it physically in the target device (FPGA). With the help of this kit, the user can simulate and observe various input and output conditions to verify the implemented design.
