8/10/2019 Muk 2 by 2
CHAPTER 3
PROPOSED SYSTEM
3.1 2X2 VEDIC MULTIPLIER
The method is explained below for two 2-bit numbers A and B, where
A=a1a0 and B=b1b0, as shown in figure 3.1. Firstly, the least significant bits (LSB)
are multiplied, which gives the LSB of the final product. Then, the LSB of the
multiplicand is multiplied with the next higher bit of the multiplier and added to
the product of the LSB of the multiplier and the next higher bit of the multiplicand. The
sum gives the second bit of the final product, and the carry is added to the partial
product obtained by multiplying the most significant bits to give the sum and carry.
The sum is the third bit and the carry becomes the fourth bit of the final
product.
s0 = a0b0; (1)
c1s1 = a1b0 + a0b1; (2)
c2s2 = c1 + a1b1; (3)
The final result will be c2s2s1s0. This multiplication method is applicable for all
cases.
Figure 3.1 The Vedic Multiplication Method for two 2-bit binary numbers
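Equations (1) to (3) can be checked with a short Python sketch. This is only an illustration of the arithmetic, not the hardware description used in the project; the function name `vedic_2x2` is hypothetical.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit integers following equations (1)-(3)."""
    if not (0 <= a <= 3 and 0 <= b <= 3):
        raise ValueError("operands must be 2-bit values (0..3)")
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    s0 = a0 & b0                     # (1) vertical product of the LSBs
    cross = (a1 & b0) + (a0 & b1)    # (2) crosswise sum, value 0..2
    s1, c1 = cross & 1, cross >> 1
    top = c1 + (a1 & b1)             # (3) carry plus product of the MSBs
    s2, c2 = top & 1, top >> 1

    # Final result is the bit string c2 s2 s1 s0.
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


if __name__ == "__main__":
    print(format(vedic_2x2(0b11, 0b11), "04b"))  # 1001
```

For A = B = 11 the sketch gives s0 = 1, c1s1 = 10 and c2s2 = 10, so the product is 1001 (decimal 9).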
The 2X2 Vedic multiplier (VM) module is implemented using four two-input
AND gates and two half-adders, as displayed in its block diagram in
Figure 3.2. The same method can be extended to a larger number of input bits. The Vedic
multiplier is based on the Urdhva Tiryakbhyam Sutra.
Figure 3.2 Block Diagram of 2X2 Vedic Multiplier
3.1.1 HARDWARE REALIZATION OF 2X2 MULTIPLIER BLOCK
The hardware realization of the 2 X 2 multiplier block is illustrated in
Figure 3.3. For the sake of simplicity, usage of clock and registers is not shown,
but emphasis has been laid on understanding of the algorithm.
Figure 3.3 Hardware realization of 2 X 2 block
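The gate-level structure (four AND gates feeding two half-adders) can be modelled in Python as a rough sketch. The wiring below is one plausible arrangement consistent with the description; the names `half_adder` and `vedic_2x2_gates` are illustrative and do not come from the report.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits: XOR gives the sum, AND the carry."""
    return x ^ y, x & y


def vedic_2x2_gates(a1, a0, b1, b0):
    """Return product bits (p3, p2, p1, p0) of a1a0 x b1b0."""
    # Four two-input AND gates form the partial products.
    g0 = a0 & b0
    g1 = a1 & b0
    g2 = a0 & b1
    g3 = a1 & b1
    p0 = g0                        # LSB needs no adder
    p1, c1 = half_adder(g1, g2)    # first half-adder: crosswise terms
    p2, p3 = half_adder(g3, c1)    # second half-adder: MSB product plus carry
    return p3, p2, p1, p0
```

No clock or registers are modelled here, in line with the simplification made in Figure 3.3.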
3.1.2 EXAMPLE OF 2X2 VEDIC MULTIPLICATION
Examples of decimal and binary 2X2 Vedic multiplication are
shown below in Figures 3.4 and 3.5.
Figure 3.4 Vedic multiplication of decimal numbers
Figure 3.5 Vedic multiplication of binary numbers
3.1.3 ALGORITHM
Here X0, Y0 are the least significant bits (LSB) and X1, Y1 are the most
significant bits (MSB).
  X1 X0
* Y1 Y0
-------
F E D C
STEP 1: CP1 = X0*Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1*Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product, X = multiplicand and Y = multiplier.
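The seven steps work in any number base, which is why the same procedure covers both decimal and binary operands. A minimal Python sketch follows, assuming two-digit operands; the function name `vedic_two_digit` and the `base` parameter are illustrative. For brevity the sketch adds the incoming carry to each cross product before splitting it into digit and carry, which gives the same digits as steps 4 and 6 while also handling a carry produced by those additions.

```python
def vedic_two_digit(x, y, base=10):
    """Multiply two 2-digit numbers in `base`; return digits [F, E, D, C]."""
    if not (0 <= x < base * base and 0 <= y < base * base):
        raise ValueError("operands must have at most two digits")
    x1, x0 = divmod(x, base)
    y1, y0 = divmod(y, base)

    c1, c = divmod(x0 * y0, base)                   # steps 1-2: CP1 = C1C0
    d1, d = divmod(x1 * y0 + y1 * x0 + c1, base)    # steps 3-4: CP2 plus C1
    f, e = divmod(x1 * y1 + d1, base)               # steps 5-7: CP3 plus D1
    return [f, e, d, c]


if __name__ == "__main__":
    print(vedic_two_digit(23, 14))               # [0, 3, 2, 2]
    print(vedic_two_digit(0b11, 0b11, base=2))   # [1, 0, 0, 1]
```

The first call reproduces 23 x 14 = 322 in decimal, and the second reproduces 11 x 11 = 1001 in binary.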
3.2 VEDIC MULTIPLIER FOR 4X4 BIT
Block diagram of the 4X4 bit Vedic multiplier is shown in figure 3.6. To
get the final product, four 2X2 bit Vedic multipliers and three 4-bit ripple
carry (RC) adders are required. In this proposal, the first 4-bit RC adder is used to add
the two 4-bit operands obtained from cross multiplication of the two middle 2X2 bit
multiplier modules. The second 4-bit RC adder is used to add two 4-bit operands,
i.e. a concatenated 4-bit operand (00 & the two most significant output bits of the right-hand-most
2X2 multiplier module) and one 4-bit operand obtained as the output sum of the first
RC adder. The third 4-bit RC adder takes the output of the left-hand-most 2X2
multiplier module as one of its operands. Early literature speaks about Vedic
multipliers based on array multiplier structures. Similarly, an 8X8
bit multiplier is constructed using four 4 bit multipliers, a 16X16 bit multiplier is
constructed using four 8 bit multipliers, and so on.
Figure 3.6 Block diagram of 4X4 Vedic Multiplier
Here, instead of following serial addition, the addition tree has been
modified to a Wallace-tree look-alike, thus reducing the levels of addition to 2
instead of 3. Here, the two lower bits of q0 pass directly to the output, while the upper bits
of q0 are fed into the addition tree. The bits being fed to the addition tree are further
illustrated by the diagram in figure 3.7.
Figure 3.7 Addition of partial products in 4 x 4 block
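The composition of the 4X4 block from four 2X2 blocks can be sketched in Python as follows. This is a behavioural illustration only: the names `q0` to `q3` follow the text, while the function names and the exact grouping of the additions are assumptions and do not reproduce the adder wiring of Figure 3.6.

```python
def vedic_2x2(a, b):
    """2X2 Vedic block: vertical and crosswise products of 2-bit operands."""
    a0, a1, b0, b1 = a & 1, a >> 1, b & 1, b >> 1
    cross = (a1 & b0) + (a0 & b1)
    top = (cross >> 1) + (a1 & b1)
    return (top << 2) | ((cross & 1) << 1) | (a0 & b0)


def vedic_4x4(a, b):
    """Multiply two 4-bit integers using four 2X2 blocks."""
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2
    q0 = vedic_2x2(a_lo, b_lo)     # right-hand-most block
    q1 = vedic_2x2(a_hi, b_lo)     # middle blocks (cross multiplication)
    q2 = vedic_2x2(a_lo, b_hi)
    q3 = vedic_2x2(a_hi, b_hi)     # left-hand-most block

    low = q0 & 0b11                # two lower bits of q0 pass directly to output
    middle = q1 + q2 + (q0 >> 2)   # cross products plus upper bits of q0
    upper = (q3 << 2) + middle     # combine with the left-hand-most block
    return (upper << 2) | low
```

Only the upper bits of q0 take part in the additions, which is why the lowest two product bits are available without any adder delay.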
3.2.1 Algorithm for 4 x 4 bit Vedic multiplier Using Urdhva Tiryakbhyam
(Vertically and crosswise) for two Binary numbers
CP = Cross Product (Vertically and Crosswise)
X3 X2 X1 X0 Multiplicand
Y3 Y2 Y1 Y0 Multiplier
------------------------------------------------------------------
H G F E D C B A
----------------------------------------------------------------
P7 P6 P5 P4 P3 P2 P1 P0 Product
--------------------------------------------------------------------
PARALLEL COMPUTATION METHODOLOGY
1. CP of (X0) and (Y0): X0 * Y0 = A
2. CP of (X1 X0) and (Y1 Y0): X1 * Y0 + X0 * Y1 = B
3. CP of (X2 X1 X0) and (Y2 Y1 Y0): X2 * Y0 + X0 * Y2 + X1 * Y1 = C
4. CP of (X3 X2 X1 X0) and (Y3 Y2 Y1 Y0): X3 * Y0 + X0 * Y3 + X2 * Y1 + X1 * Y2 = D
5. CP of (X3 X2 X1) and (Y3 Y2 Y1): X3 * Y1 + X1 * Y3 + X2 * Y2 = E
6. CP of (X3 X2) and (Y3 Y2): X3 * Y2 + X2 * Y3 = F
7. CP of (X3) and (Y3): X3 * Y3 = G
Where CP =cross product
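The seven cross products A to G are independent of one another, which is what allows them to be computed in parallel. As an illustrative Python sketch (the function name `cross_products` and the parameter `n` are assumptions), each cross product is the sum of all bit products Xi * Yj with i + j equal to the column index:

```python
def cross_products(x, y, n=4):
    """Return the column sums [A, B, C, ...] for two n-bit operands."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    columns = []
    for k in range(2 * n - 1):
        # Column k collects every product X_i * Y_(k-i) that exists.
        columns.append(sum(xb[i] & yb[k - i]
                           for i in range(n) if 0 <= k - i < n))
    return columns


if __name__ == "__main__":
    print(cross_products(0b1111, 0b1111))  # [1, 2, 3, 4, 3, 2, 1]
```

Weighting column k by 2^k and summing gives the product; the carries between columns are what remain to be resolved by the adders.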
3.2.2 EXAMPLE OF 4X4 VEDIC MULTIPLICATION
Example of 4X4 Vedic multiplication is shown below in figure 3.8.
Figure 3.8 Line diagram for multiplication of two 4 - bit numbers
Firstly, least significant bits are multiplied which gives the least
significant bit of the product (vertical). Then, the LSB of the multiplicand is
multiplied with the next higher bit of the multiplier and added with the product of
LSB of multiplier and next higher bit of the multiplicand (crosswise). The sum
gives second bit of the product and the carry is added in the output of next stage
sum obtained by the crosswise and vertical multiplication and addition of three
bits of the two numbers from least significant position. Next, all the four bits are
processed with crosswise multiplication and addition to give the sum and carry.
The sum is the corresponding bit of the product and the carry is again added to the
next stage multiplication and addition of three bits except the LSB. The same
operation continues until the multiplication of the two MSBs to give the MSB of
the product. For example, if in some intermediate step, we get 110, then 0 will act
as result bit (referred as rn) and 11 as the carry (referred as cn). It should be clearly
noted that cn may be a multi-bit number.
Thus we get the following expressions:
r0=a0b0 ;
c1r1=a1b0+a0b1 ;
c2r2=c1+a2b0+a1b1 + a0b2 ;
c3r3=c2+a3b0+a2b1 + a1b2 + a0b3 ;
c4r4=c3+a3b1+a2b2 + a1b3 ;
c5r5=c4+a3b2+a2b3 ;
c6r6=c5+a3b3
With c6r6r5r4r3r2r1r0 being the final product. Hence this is the general
mathematical formula applicable to all cases of multiplication.
3.2.3 HARDWARE ARCHITECTURE
This hardware design is very similar to that of the famous array
multiplier where an array of adders is required to arrive at the final product.
Figure 3.9 Hardware Architecture
Hardware architecture of 4x4 multiplier is shown in figure 3.9.
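The expressions for r0 to r6 and the carries c1 to c6 can be traced with a small Python sketch. It is a behavioural illustration under the convention stated above (the lowest bit of each stage total is the result bit and everything above it is the carry); the function name `vedic_4x4_bits` is hypothetical.

```python
def vedic_4x4_bits(a, b):
    """Multiply two 4-bit integers stage by stage using c_n r_n = c_(n-1) + column sum."""
    ab = [(a >> i) & 1 for i in range(4)]
    bb = [(b >> i) & 1 for i in range(4)]
    carry = 0                      # there is no carry into stage 0
    product = 0
    for n in range(7):
        column = sum(ab[i] & bb[n - i] for i in range(4) if 0 <= n - i < 4)
        total = carry + column
        r_n = total & 1            # result bit of this stage
        carry = total >> 1         # carry c_n, which may be a multi-bit number
        product |= r_n << n
    # The final carry c6 becomes the most significant bit of the product.
    return product | (carry << 7)


if __name__ == "__main__":
    print(format(vedic_4x4_bits(0b1111, 0b1111), "08b"))  # 11100001
```

A stage total of 110 (decimal 6) is split into r_n = 0 and carry 11, matching the example in the text.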
3.3 8 X 8 MULTIPLIER
The 8 X 8 multiplier is made by using four 4 X 4 multiplier blocks.
Here, the multiplicands are of bit size n = 8, whereas the result is of 16 bit size.
The input is broken into smaller chunks of size n/2 = 4 for both inputs, that is a
and b, just as in the case of the 4 X 4 multiplier block. These newly formed chunks of
4 bits are given as input to the 4 X 4 multiplier block, where again the new chunks
are broken into even smaller chunks of size n/4 = 2 and fed to the 2 X 2 multiply
block. Block diagram of the 8X8 Vedic multiplier is shown in figure 3.10.
Figure 3.10 Block diagram of 8X8 Vedic multiplier
The results produced at the output of the 4 X 4 bit multiply blocks, each of which
is of 8 bits, are sent for addition to an addition tree, as shown in figure 3.11
below. Here, one fact must be kept in mind: each 4 X 4 multiply block works
as illustrated in figure 3.6. In the 8 X 8 multiply block, the lower 4 bits of q0 are passed
directly to the output and the remaining bits are fed to the addition tree, as shown in
figure 3.11.
Figure 3.11 Addition of Partial Products in 8 X 8 block
3.3.1 ALGORITHM FOR 8X8 BIT MULTIPLICATION
Here X0, Y0 are the least significant 4-bit halves and X1, Y1 are the most
significant 4-bit halves of the operands. The LSB bits are A0,A1,A2,A3 and B0,B1,B2,B3, and the MSB bits are
A7,A6,A5,A4 and B7,B6,B5,B4.
A = A7A6A5A4 A3A2A1A0
       X1       X0
B = B7B6B5B4 B3B2B1B0
       Y1       Y0
  X1 X0
* Y1 Y0
---------------------------------------------------------
FEDC
STEP 1: CP = X0 * Y0 = C
STEP 2: CP = X1 * Y0 + X0 * Y1 = D
STEP 3: CP = X1 * Y1 = E
Where CP = Cross Product.
Each multiplication operation is an embedded parallel 4x4 multiply module.
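The three steps can be sketched in Python as below. This is a behavioural illustration only: Python's built-in `*` on the 4-bit halves stands in for the embedded 4x4 multiply modules, the function name `vedic_8x8` is hypothetical, and the operands 11111111 and 00001001 are used purely as sample inputs.

```python
def vedic_8x8(a, b):
    """Multiply two 8-bit integers from cross products of their 4-bit halves."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be 8-bit values")
    x1, x0 = a >> 4, a & 0xF
    y1, y0 = b >> 4, b & 0xF

    c = x0 * y0                # STEP 1 (stand-in for one 4x4 module)
    d = x1 * y0 + x0 * y1      # STEP 2 (two 4x4 modules plus an adder)
    e = x1 * y1                # STEP 3 (one 4x4 module)

    # Each cross product is weighted by the position of its halves.
    return (e << 8) + (d << 4) + c


if __name__ == "__main__":
    print(format(vedic_8x8(0b11111111, 0b00001001), "016b"))  # 0000100011110111
```

Because the cross products overlap when shifted, the final additions play the role of the addition tree in the hardware.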
3.3.2 EXAMPLE OF 8X8 VEDIC MULTIPLICATION
An example of 8X8 vedic multiplication of binary numbers is shown
in figure 3.12 below.
Figure 3.12 Example of 8X8 Vedic multiplication
Consider the 8x8 bit multiplication of 11111111 and 00001001. While
doing multiplication for a higher number of bits, divide the bits equally and do
the same analysis that was used for 4x4 multiplication. This means 11111111 should be
treated as 1111 and 1111. Similarly, 00001001 should be treated as 0000 and 1001.
So the four different multiplications will be 1111 x 1001 = 10000111 (twice) and
1111 x 0000 = 00000000 (twice). Now the adder will add 00000000 and
10000111, giving the sum 10000111 with no carry out, and the next adder will add this
result to 00001000, giving the sum 10001111. Since no carry
is generated from either of the adders, the adder will give both sum and carry out as
zero, so nothing is to be added to 0000, and the final result will be:
S0=1,S1=1,S2=1,S3=0,S4=1,S5=1,S6=1,S7=1,S8=0,S9=0,S10=0,S11=1,
S12=0,S13=0,S14=0,S15=0. The final answer happens to be 0000100011110111.
3.4 16 X 16 BIT MULTIPLIER
The 16 X 16 multiplier is made by using four 8 X 8 multiplier blocks.
Here, the multiplicands are of bit size n = 16, whereas the result is of 32 bit size.
The input is broken into smaller chunks of size n/2 = 8 for both inputs, that is a
and b. These newly formed chunks of 8 bits are given as input to the 8 X 8 multiplier
block, where again these new chunks are broken into even smaller chunks of size
n/4 = 4 and fed to the 4 X 4 multiplier block, just as in the case of the 8 X 8 multiply block.
Again, the new chunks are divided in half to get chunks of size 2, which are fed to
the 2 X 2 multiplier block. The results produced at the output of the 8 X 8 bit multiply
blocks, each of which is of 16 bits, are sent for addition to an addition tree, as shown in
figure 3.13.
Figure 3.13 Block diagram of 16 X 16 Multiply block
Here, as shown in figure 3.14, the lower 8 bits of q0 pass directly
to the result, while the higher bits are fed into the addition tree.
The addition of partial products is shown in figure 3.14 below.
Figure 3.14 Addition of Partial products in 16 X 16 block
3.4.1 ALGORITHM OF 16X16 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant halves and X1, Y1 are the
most significant halves of the operands. The LSB bits are
A0,A1,A2,A3,A4,A5,A6,A7 and B0,B1,B2,B3,B4,B5,B6,B7, and the MSB bits are
A8,A9,A10,A11,A12,A13,A14,A15 and B8,B9,B10,B11,B12,B13,B14,B15.
A = A15A14A13A12A11A10A9A8 A7A6A5A4A3A2A1A0
              X1                  X0
B = B15B14B13B12B11B10B9B8 B7B6B5B4B3B2B1B0
              Y1                  Y0
  X1X0
* Y1Y0
FEDC
STEP 1: CP1 = X0*Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1*Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product
3.5 32 X 32 VEDIC MULTIPLIER
The 32 X 32 multiplier is made by using four 16 X 16 multiplier blocks,
as shown in figure 3.15. Here, the multiplicands are of bit size n = 32, whereas the
result is of 64 bit size. The input is broken into smaller chunks of size n/2 = 16 for
both inputs, that is a and b.
Figure 3.15 Block diagram of 32X32 Vedic multiplier
These newly formed chunks of 16 bits are given as input to the 16 X 16
multiplier block, where again these new chunks are broken into even smaller
chunks of size n/4 = 8 and fed to the 8 X 8 multiply block, just as in the case of the 16 X 16
block. Again the new chunks are divided in half to get chunks of size 4, which are then
fed to the 4 X 4 multiply block. The chunks are divided once more and fed to the 2 x 2 multiplier
blocks, and the resultant bits are sent for addition to an addition tree.
3.5.1 ALGORITHM OF 32X32 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant halves and X1, Y1 are the most
significant halves of the operands. Both the LSB and MSB halves consist of 16 bits.
A = A31-A16 A15-A0
       X1      X0
B = B31-B16 B15-B0
       Y1      Y0
  X1X0
* Y1Y0
FEDC
STEP 1: CP1 = X0*Y0 = C1C0
STEP 2: C = C0
STEP 3: CP2 = X1*Y0 + Y1*X0 = D1D0
STEP 4: D = D0 + C1
STEP 5: CP3 = X1*Y1 = E1E0
STEP 6: E = E0 + D1
STEP 7: F = E1
Where CP = cross product
3.6 64 X 64 VEDIC MULTIPLIER
The 64 X 64 multiplier is made by using four 32 X 32 multiplier
blocks. Here, the multiplicands are of bit size n = 64, whereas the result is of 128
bit size. The input is broken into smaller chunks of size n/2 = 32 for both inputs,
that is a and b. These newly formed chunks of 32 bits are given as input to the 32 X 32
multiplier block, where again these new chunks are broken into even smaller
chunks of size n/4 = 16 and fed to the 16 X 16 multiply block, just as in the case of the 32 X
32 block. Again the new chunks are divided in half to get chunks of size 8, which are
then fed to the 8 X 8 multiply block. The chunks are divided further for the 4 X 4
multiplier blocks and then for the 2 X 2 blocks, and the final resultant bits are sent
for addition to an addition tree, as shown in figure 3.16.
Figure 3.16 64 X 64 VEDIC MULTIPLIER
3.6.1 ALGORITHM OF 64X64 VEDIC MULTIPLICATION
Here X0, Y0 are the least significant halves and X1, Y1 are the most
significant halves of the operands. Both the LSB and MSB halves consist of 32 bits.
A=A63-A32 A31-A0
X1 X0
B= B63-B32 B31-B0
Y1 Y0
X1X0
* Y1Y0
FEDC
STEP 1:CP1=X0*Y0=C1C0
STEP 2:C=C0
STEP 3:CP2=X1*Y0+Y1*X0=D1D0
STEP 4:D=D0+C1
STEP 5:CP3=X1*Y1=E1E0
STEP 6:E=E0+D1
STEP 7:F=E1
Where CP=cross product
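Since the 4X4, 8X8, 16X16, 32X32 and 64X64 multipliers all repeat the same split into halves, the whole hierarchy can be sketched as one recursive Python function. This is a behavioural sketch under the assumption that the operand width `n` is a power of two; the names `vedic_multiply` and `vedic_2x2` are illustrative, and Python integer addition stands in for the addition tree.

```python
def vedic_2x2(a, b):
    """Base case: 2X2 Vedic block (vertical and crosswise bit products)."""
    a0, a1, b0, b1 = a & 1, a >> 1, b & 1, b >> 1
    cross = (a1 & b0) + (a0 & b1)
    top = (cross >> 1) + (a1 & b1)
    return (top << 2) | ((cross & 1) << 1) | (a0 & b0)


def vedic_multiply(a, b, n):
    """Multiply two n-bit integers (n a power of two, n >= 2) by recursive halving."""
    if n == 2:
        return vedic_2x2(a, b)
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = a >> half, a & mask       # X1 = upper half, X0 = lower half
    y1, y0 = b >> half, b & mask

    c = vedic_multiply(x0, y0, half)                                   # CP1
    d = vedic_multiply(x1, y0, half) + vedic_multiply(x0, y1, half)    # CP2
    e = vedic_multiply(x1, y1, half)                                   # CP3
    return (e << n) + (d << half) + c
```

For example, `vedic_multiply(a, b, 64)` expands into four 32-bit products, each of which expands into four 16-bit products, and so on down to the 2X2 blocks.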
3.7 RIPPLE CARRY ADDER
The ripple carry adders are arranged as shown in figure 3.17, an
arrangement chosen to help reduce delay. A simple ripple carry adder is a digital circuit that produces the
arithmetic sum of two binary numbers. It can be constructed from a number of full
adders connected in cascade, with the carry output of each adder connected to the carry
input of the next full adder in the chain. Each full adder takes as input a cin, which is the cout of
the previous adder. This kind of adder is called a ripple carry adder, since each
carry bit ripples to the next full adder.
The first full adder may be replaced by a half adder. The layout of a ripple
carry adder is simple, which allows for fast design time.
Figure 3.17 Circuit Diagram of 4 bit Ripple Carry Adder
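The cascade of full adders can be modelled with a short Python sketch. It is a behavioural illustration, not the circuit of Figure 3.17 itself; the function names and the `width` parameter are assumptions.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two `width`-bit integers; return (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(width):
        # The cout of stage i becomes the cin of stage i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110))  # (1, 1), i.e. 11 + 6 = 17
```

The loop makes the dependency visible: stage i cannot finish until the carry from stage i - 1 is known, which is the rippling that gives the adder its name.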
3.8 MULTIPLY ACCUMULATE UNIT
The multiply-accumulate operation is one of the basic arithmetic
operations extensively used in modern digital signal processing (DSP). Most
arithmetic, such as digital filtering, convolution and the fast Fourier transform (FFT),
requires high-performance multiply-accumulate operations. The
multiply-accumulator (MAC) unit always lies in the critical path that determines the speed
of the overall hardware system. Therefore, a high-speed MAC that is capable of
supporting multiple precisions and parallel operations is highly desirable.
3.8.1 BASIC MAC ARCHITECTURE
Basically a MAC unit employs a fast multiplier fitted in the data path,
and the multiplied output of the multiplier is fed into a fast adder which is set to zero
initially. The result of addition is stored in an accumulator register. The MAC unit
should be able to produce output in one clock cycle, and the new result of addition
is added to the previous one and stored in the accumulator register. Figure 3.18
below shows the basic MAC architecture.
Here the multiplier that has been used is a Vedic Multiplier using the
Urdhva Tiryakbhyam Sutra and has been fitted into the MAC design.
Figure 3.18 Basic MAC architecture
3.8.2 MAC UNIT USING VEDIC MULTIPLIER
In the MAC unit, the data inputs A and B are stored in two data
registers, that is data a_reg and data b_reg. Then the inputs are fed into a Vedic
multiplier, which stores the result in Multiply_reg. The contents of Multiply_reg
are continuously fed into a conventional adder and the result is stored in
dataout_reg. Here, the MAC unit makes use of two clocks, one for the operation of the
MAC unit and the other one, namely clk2, for the multiplier. The frequency of clk2
should be 4 times the frequency of the MAC unit clock for proper operation. A divide-by-4
clock circuit may be used here in future, which takes clk2 as the parent clock and
produces clk as the daughter clock, which is 4 times slower than the parent clock
but has a 50% duty cycle. The faster clock clk2 is used for the multiplier while the
slower clock clk is used for the MAC unit. The data coming as input to the MAC
may vary with clock clk.
The signal clr, when applied, forces the contents of all the data
registers, that is data a_reg, data b_reg, Multiply_reg and dataout_reg, to
zero. The clken signal is used to enable the MAC operation. Figure 3.19
shows the architecture of the MAC.
Figure 3.19 MAC using Vedic Multiplier
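The register-level behaviour of the MAC unit can be sketched in Python as follows. This is a simplified behavioural model under several assumptions: one call to `step` represents one cycle of the slower clock clk, the faster clock clk2 is not modelled, and Python's `*` stands in for the Vedic multiplier. The class and method names are hypothetical; the register and signal names follow the text.

```python
class MacUnit:
    """Behavioural model of a multiply-accumulate unit with clr and clken."""

    def __init__(self):
        self.clr()

    def clr(self):
        """clr: force all data registers to zero."""
        self.a_reg = 0
        self.b_reg = 0
        self.multiply_reg = 0
        self.dataout_reg = 0

    def step(self, a, b, clken=True):
        """One cycle of clk: latch inputs, multiply, then accumulate."""
        if not clken:
            return self.dataout_reg          # MAC operation disabled, hold state
        self.a_reg, self.b_reg = a, b
        # Stand-in for the Vedic multiplier writing to Multiply_reg.
        self.multiply_reg = self.a_reg * self.b_reg
        # Conventional adder: accumulate into dataout_reg.
        self.dataout_reg += self.multiply_reg
        return self.dataout_reg


if __name__ == "__main__":
    mac = MacUnit()
    mac.step(3, 4)
    print(mac.step(5, 6))  # 42, i.e. 3*4 + 5*6
```

Register widths and overflow are deliberately left out of this sketch, since Python integers are unbounded.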
Multiply-accumulate is an important part of real-time digital signal
processing (DSP), with applications ranging from digital filtering to image processing.
It is a very common basic-level operation seen in many DSP
designs/algorithms. Two numbers are multiplied together and added into an
accumulator register. As shown in figures 3.20 and 3.21, the basic MAC unit consists of a
multiplier, an adder and an accumulator.
Figure 3.20 Architecture of Vedic Multiplier
Figure 3.21 Architecture of Booth Multiplier
In general, a MAC unit uses a conventional multiplier, which multiplies
the multiplier and multiplicand by generating partial products and adding them
to compute the final product. The key to the proposed MAC unit is to enhance the
performance of the MAC using a Vedic multiplier, and to compare the Vedic, Booth
and conventional multipliers in terms of the computation required to generate the
partial products and add them to get the final result of the multiplication.
3.9 ADVANTAGES
i) The Vedic multiplier is faster than the array multiplier and the Booth multiplier. As the
number of bits increases from 4X4 bits to 32x32 bits, the timing delay is greatly
reduced for the Vedic multiplier as compared to other multipliers. The Vedic multiplier has
the greatest advantage over other multipliers in terms of gate delays and
regularity of structure.
ii) Power dissipation is much lower than that of Booth multipliers.
3.10 APPLICATIONS
i)MAC.
ii)DSP applications(FIR,IIR filters).
3.11 TOOLS USED
3.11.1 SOFTWARE USED
i) ModelSim 6.3 for simulation:
ModelSim is a popular hardware simulation and debug environment
primarily targeted at smaller ASIC and FPGA designs. ModelSim provides a
complete HDL simulation environment that enables you to verify the functional
and timing models of your design, and your HDL source code. It is optimized for
use with all configurations of Xilinx ISE products.
ii) Xilinx ISE 10.1 for synthesis:
Xilinx ISE is a software tool produced by Xilinx for synthesis and
analysis of HDL designs, which enables the developer to synthesize ("compile")
their designs, perform timing analysis, examine RTL diagrams, simulate a design's
reaction to different stimuli, and configure the target device with the programmer.
3.11.2 HARDWARE USED
FIELD PROGRAMMABLE GATE ARRAY (FPGA)
FPGAs are programmable semiconductor devices that are based
around a matrix of Configurable Logic Blocks (CLBs) connected through
programmable interconnects. As opposed to Application Specific Integrated
Circuits (ASICs), where the device is custom built for the particular design,
FPGAs can be programmed to the desired application or functionality
requirements. One-Time Programmable (OTP) FPGAs are also available. In
our project we are using the Spartan 3 FPGA kit.
SPARTAN 3
The Spartan 3 trainer xc3s400 pq208 is useful to realize and verify
digital designs. The user can write Verilog/VHDL code and verify the results by
implementing it physically in the target device (FPGA). With the help of this kit,
the user can simulate and observe various input and output conditions to verify the
implemented design.