DESIGN AND IMPLEMENTATION OF GATE
DIFFUSION INPUT BASED VEDIC
MULTIPLIER
A THESIS
by
SHOBA M
submitted to Pondicherry University in fulfilment for the award of the
degree of
DOCTOR OF PHILOSOPHY
in
ELECTRONICS AND COMMUNICATION ENGINEERING
DEPARTMENT OF ELECTRONICS ENGINEERING
SCHOOL OF ENGINEERING AND TECHNOLOGY
PONDICHERRY UNIVERSITY
PONDICHERRY - 605 014, INDIA
AUGUST 2016
DECLARATION
I certify to the best of my knowledge that the work reported in this
thesis has not been previously submitted for a degree, nor is it based on any
other dissertation for which a degree or award was conferred on an earlier
occasion to any other candidate.
I also certify that the thesis has been written by me and any help that
I have received in my research work has been acknowledged. In addition, I
certify that all information sources and literature used are indicated in the
thesis.
Place: Pondicherry (M.SHOBA)
Date :
BONAFIDE CERTIFICATE
It is certified that this thesis titled “DESIGN AND
IMPLEMENTATION OF GATE DIFFUSION INPUT BASED VEDIC
MULTIPLIER” is the bonafide work of Mrs. M. SHOBA who carried out
the research under my supervision. Further certified that to the best of my
knowledge the work reported herein does not form part of any other thesis or
dissertation on the basis of which a degree or award was conferred on an
earlier occasion for this or any other candidate.
Dr. R. NAKKEERAN
(Research Supervisor)
Associate Professor and Head
Dept. of Electronics Engineering
School of Engg. and Technology
Pondicherry University
Pondicherry- 605 014
Place: Pondicherry
Date :
ABSTRACT
The objective of this research is to design a multiplier that offers
better performance in terms of area, delay and power consumption. The
performance of the multiplier can be greatly influenced by the chosen logic
style. In this work, Gate Diffusion Input (GDI) logic is considered. It is a low
power design technique which can implement any function with low transistor
count. However, this logic has the drawback of producing a reduced voltage
swing at its outputs, i.e. the output high (or low) voltage deviates from
VDD (or GND) by the threshold voltage Vt. The existing techniques for obtaining
full swing suffer from either a higher transistor count or high power
consumption. To overcome this issue, a method is proposed in which an
additional PMOS or NMOS transistor is placed at the output, depending on
whether a VDD or GND output voltage is required, respectively. Based on this
approach, a set of full swing GDI gates, namely AND, OR, XOR and XNOR, is proposed.
Further, three full adders are designed with the help of these full swing gates.
In addition, a new architecture for 4-2 compressor design is proposed in this
thesis, which is based on simplification of its Boolean output expression. The
partial sharing of architecture between the sum and carry outputs minimizes the
hardware components, which in turn reduces the area. Moreover, the
removal of redundant hardware minimizes spurious switching activity,
thus saving power. Also, the performance of the parallel adders, namely
Ripple Carry Adder (RCA), Carry Select Adder (CslA) and Carry Look
Ahead (CLA) adder, is improved using the proposed gates and adder in GDI
logic.
Further, the implementation of the multiplier based on Vedic
mathematics is discussed. Vedic mathematics is an ancient Indian
system of mathematics derived from the Vedic sutras. Urdhva
Triyagbhyam (UT) is one of the Vedic sutras; it literally means
vertically and crosswise, and describes how the multiplication operation is
performed. The existing UT multiplier designs exhibit shorter
delay at the expense of larger area. This issue can be mitigated by dividing
the multiplication into two stages and performing the computation in each
stage in parallel. Moreover, the deployment of compressor based partial
product accumulation decreases the delay. The proposed multiplier is
implemented using GDI logic. Finally, a new architecture for the hierarchy
multiplier design is proposed by employing a carry select adder and a Binary to
Excess 1 Converter (BEC). The use of the BEC eliminates the n/4
adders present in the conventional hierarchy multiplier, where n denotes the
multiplier input width in bits, thereby improving its speed of operation. The
building blocks of the hierarchy multiplier are designed using GDI logic.
The power consumption and delay of all the proposed modules and the
related existing designs are analyzed through SPICE simulation using a 45 nm
technology model, and their area is calculated from the layout. Further, the
robustness of all the proposed modules with respect to process variations is
validated by Monte Carlo simulation.
ACKNOWLEDGEMENT
The journey of my doctoral studies at Pondicherry University has not
been a painless mission and would never have been possible without the help
and support of several people to whom I want to express my earnest gratitude.
Primarily, I would like to express my profound and sincere gratitude to
my research supervisor, Dr. R. Nakkeeran, Associate Professor and Head,
Department of Electronics Engineering, School of Engineering and
Technology, Pondicherry University, Puducherry for his constant
encouragement, insightful discussions, inspiring words and invaluable
guidance during numerous technical discussions that have found their way
into this dissertation. His wide knowledge and logical way of thinking have
been of great value to overcome the obstacles in my research.
I would also like to thank my doctoral committee members
Dr. P. Sivaprakasam, Associate Professor, Department of Physics,
Pondicherry University, Puducherry and Dr. T. Shanmuganantham,
Assistant Professor, Department of Electronics Engineering, Pondicherry
University, Puducherry for their valuable remarks, recommendation and
suggestion at all stages of my research.
I am extremely grateful to Dr. S. Kanmani, Professor, Department of
Information Technology, Pondicherry Engineering College, Puducherry for
her constant motivation, guidance and moral support during my research
period.
I would like to thank my seniors and fellow researchers, Dr. J. William,
Dr. M. Thachayani, Mr. M. Ramasamy, Dr. K. Thirumalaivasan,
Dr. R. Ramya, Dr. S. Robinson, Mr. M. Rathinasabapathy,
Dr. A. Rajesh, Mr. G. Idayachandran, Mrs. S. Fouziya Sulthana, Mr.
Enamul Haq Sheik, Finitha Jose and Mrs. Anitha Soman for their
motivation, help and moral support.
It is also my pleasure to express my grateful thanks to
Prof. R. Subramanian, former Dean, School of Engineering and Technology,
and Dr. P. Dhanavanthan, Dean, School of Engineering and Technology,
Pondicherry University, Puducherry for facilitating my research
work.
I would like to acknowledge the support from University Grants
Commission, Government of India under Junior Research Fellowship
scheme.
I greatly appreciate the support of the Department of Electronics
Engineering office staff members including Mr. N. Gokulan,
Mr. B. Santhanakrishnan, Mr. K. Kaliamoorthy and Mr. N. Soureche.
I am greatly indebted to my parents, sister, husband, in-laws and my
baby for their endless love and unconditional support to pursue my interests,
which were vital for the completion of my Ph.D study. I received much help
from unknown hands. A very special thanks to all of them.
Finally, I thank the almighty God for the endless blessings to complete
this work successfully.
M. SHOBA
TABLE OF CONTENTS
CHAPTER NO. TITLE
ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 PREAMBLE
1.2 OBJECTIVES
1.3 LITERATURE SURVEY
1.3.1 Logic Styles
1.3.2 Full Adder
1.3.3 4-2 Compressor
1.3.4 Parallel Adders
1.3.5 Vedic Multiplier
1.3.6 Hierarchy Multiplier
1.4 ORGANIZATION OF THE THESIS
2 DESIGN OF FULL SWING GATES AND FULL ADDER USING GDI LOGIC
2.1 INTRODUCTION
2.2 GDI LOGIC
2.2.1 Design of Gates using GDI Logic
2.2.2 Full Swing AND, OR, XOR and XNOR Gates
2.3 FULL ADDER DESIGNS
2.4 RESULTS AND DISCUSSION
2.4.1 Performance Analysis of AND, OR, XOR and XNOR Gates
2.4.2 Performance Analysis of Full Adder
2.5 SUMMARY
3 AREA AND ENERGY EFFICIENT 4-2 COMPRESSOR DESIGN USING GDI LOGIC
3.1 INTRODUCTION
3.2 RELATED WORKS OF 4-2 COMPRESSOR
3.3 METHODOLOGY
3.3.1 Proposed 4-2 Compressor
3.3.2 GDI Logic
3.4 RESULTS AND DISCUSSION
3.5 SUMMARY
4 PERFORMANCE IMPROVEMENT OF PARALLEL ADDERS USING GDI LOGIC
4.1 INTRODUCTION
4.2 AN OVERVIEW OF PARALLEL ADDERS
4.2.1 Ripple Carry Adder
4.2.2 Carry Look Ahead Adder
4.2.3 Carry Select Adder
4.3 RESULTS AND DISCUSSION
4.4 SUMMARY
5 AREA AND ENERGY EFFICIENT VEDIC MULTIPLIER IMPLEMENTATION
5.1 INTRODUCTION
5.2 AN OVERVIEW OF URDHAVA TRIYAGBHYAM MULTIPLICATION SCHEME
5.2.1 UT Algorithm for Decimal Number System
5.2.2 UT Algorithm for Binary Number System
5.3 PROPOSED MULTIPLIER
5.4 RESULTS AND DISCUSSION
5.5 SUMMARY
6 HIERARCHY MULTIPLIER ARCHITECTURE BASED ON VEDIC MATHEMATICS AND GDI LOGIC
6.1 INTRODUCTION
6.2 AN OVERVIEW OF HIERARCHY MULTIPLIER
6.3 METHODOLOGY
6.3.1 Proposed Hierarchy Multiplier
6.3.2 Base Multiplier
6.3.3 Carry Select Adder
6.3.4 Binary to Excess 1 Converter
6.4 RESULTS AND DISCUSSION
6.4.1 Proposed Hierarchy Multiplier
6.4.2 Binary to Excess 1 Converter
6.5 SUMMARY
7 CONCLUSION AND FUTURE WORK
7.1 CONCLUSION
7.2 SCOPE FOR FUTURE WORK
LIST OF FIGURES
FIGURE NO. TITLE
2.1 Basic GDI cell
2.2 GDI based gates (a) AND (b) OR (c) XOR and (d) XNOR
2.3 Proposed full swing gates using GDI logic (a) AND (b) OR (c) XOR and (d) XNOR
2.4 Schematic of the proposed full adders based on (a) Design 1 (b) Design 2 and (c) Design 3
2.5 Layout of the proposed AND gate
2.6 Layout of the proposed OR gate
2.7 Layout of the proposed XOR gate
2.8 Layout of the proposed XNOR gate
2.9 Layouts of the proposed full adders based on (a) Design 1 (b) Design 2 and (c) Design 3
3.1 42C (a) Block diagram and (b) Base architecture
3.2 Proposed 42C architecture
3.3 GDI logic based (a) XOR and (b) MUX
3.4 Layout of the proposed 42C
4.1 N bit RCA architecture
4.2 Performance comparison of parallel adders (a) Delay (b) Power consumption (c) Area and (d) PDP
4.3 Layout of 32 bit RCA using proposed adder
4.4 Proposed gates based 32 bit CslA adder layout (a) Conventional (Ref. [171]) (b) BEC based (Ref. [125]) and (c) Modified (Ref. [89])
4.5 Layout of 32 bit CLA using proposed gates
4.6 Performance analysis of parallel adders under process variation (a) Delay (b) Power consumption and (c) PDP
5.1 Multiplication of 2x2 decimal number using UT algorithm
5.2 Block diagram representation of the proposed Vedic multiplier
5.3 Internal architecture of the proposed Vedic multiplier (a) First stage and (b) Second stage
5.4 Layout of the proposed Vedic multiplier
6.1 Representation of hierarchy multiplier
6.2 Proposed 16 bit hierarchy multiplier
6.3 Block diagrammatic representation of base multiplier
6.4 4 bit BEC circuit
6.5 Layout of the proposed 16 bit hierarchy multiplier
6.6 Layout of the proposed 8 bit BEC
LIST OF TABLES
TABLE NO. TITLE
2.1 Different logic function realization using GDI cell
2.2 Operational characteristics of gates using GDI logic
2.3 Operational characteristics of the proposed full swing GDI gates
2.4 Performance comparison of the proposed gates with existing designs
2.5 Performance analysis of the gates under process variation
2.6 Performance comparison of the proposed full adders with existing designs
2.7 Performance analysis of the full adders under process variation
3.1 Performance comparison of the proposed 4-2 compressor with existing designs
3.2 Performance analysis of 4-2 compressors under process variation
5.1 Performance comparison of 8 bit proposed multiplier with existing designs
5.2 Performance analysis of multipliers under process variation
6.1 Performance comparison of the proposed 16 bit hierarchy multiplier with other multipliers
6.2 Performance analysis of 16 bit hierarchy multiplier under process variation
6.3 Performance comparison of 8 bit BEC
LIST OF SYMBOLS
GND - Ground Potential
L - Length of the transistor
VDD - Supply Voltage
Vt - Threshold Voltage
Vtp - Threshold voltage of PMOS transistor
Vtn - Threshold voltage of NMOS transistor
W - Width of the transistor
LIST OF ABBREVIATIONS
42C - 4-2 Compressor
ALU - Arithmetic and Logic Unit
BEC - Binary to Excess 1 Converter
CLA - Carry Look ahead Adder
CSA - Carry Save Array
CMOS - Complementary Metal Oxide Semiconductor
CPL - Complementary Pass transistor Logic
CslA - Carry select Adder
DRC - Design Rule Check
DSP - Digital Signal Processor
FFT - Fast Fourier Transform
FTL - Feed Through Logic
GDI - Gate Diffusion Input
LVS - Layout Versus Schematic
MAC - Multiply and ACcumulate
McCMOS - Multi channel CMOS
NMOS - N Metal Oxide Semiconductor
PDP - Power Delay Product
PMOS - P Metal Oxide Semiconductor
PTL - Pass Transistor Logic
RCA - Ripple Carry Adder
ROM - Read Only Memory
SPICE - Simulation Program with Integrated Circuit Emphasis
SRPL - Swing Restored Pass transistor Logic
ULPD - Ultra Low Power Diode
UT - Urdhava Triyagbhyam
VLSI - Very Large Scale Integration
CHAPTER 1
INTRODUCTION
1.1 PREAMBLE
An increase in the level of integration in modern Very Large Scale
Integration (VLSI) technology has made it possible to integrate many
complex components on a single chip. Moreover, the analog circuit techniques
used in front end wireless communication are increasingly required to move to
the digital domain to save power. In most of these applications, multipliers
are an essential component and determine the overall circuit performance with
respect to speed, power consumption and size. Hence, the goal of this research
work is to design a multiplier with less delay, low power consumption and
compact area.
In general, the performance of a multiplier in terms of delay, power
consumption and area can be improved by two methods. The first is based on
efficient implementation of the multiplier function, whereas the other relies on
proper selection of the logic style for its implementation. Various
multiplication methods for realizing low power and high speed multipliers
have been introduced in the last few decades. However, in these multiplication
techniques, the intermediate computation involved in the multiplier operation
reduces the speed exponentially with the width of the multiplier
input. This becomes a critical issue for a higher number of input bits.
The issue can be mitigated by adding the partial products in parallel,
an approach adopted from Vedic mathematics based multiplication. Hence, this
work explores possible improvements to an existing Vedic multiplier for
better performance.
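The idea of grouping partial products by column can be illustrated with a short behavioural sketch. It assumes the generic vertically-and-crosswise (UT) scheme for unsigned binary operands, not the two-stage architecture proposed later in the thesis; the function name `ut_multiply` and its parameters are hypothetical and chosen only for illustration.

```python
def ut_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers column by column.

    Behavioural sketch of the generic vertically-and-crosswise scheme;
    it is an illustration, not the architecture proposed in the thesis.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result = 0
    carry = 0
    for k in range(2 * n - 1):
        # Every partial product a_i AND b_j with i + j == k belongs to
        # column k; in hardware these terms can all be formed in parallel.
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(n) if 0 <= k - i < n)
        total = column + carry
        result |= (total & 1) << k   # low bit becomes product bit k
        carry = total >> 1           # the rest moves to the next column
    # Whatever carry remains forms the most significant part of the product.
    result |= carry << (2 * n - 1)
    return result


if __name__ == "__main__":
    print(ut_multiply(13, 11, 4))
```

Only the column sums depend on the operand bits directly, so the cross products of all columns are available at the same time and the remaining cost lies in accumulating them.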
As stated earlier, the logic styles used for realizing the multipliers
have a significant influence on the speed, size, power consumption and wiring
complexity. Numerous logic styles in the classes of static Complementary
Metal Oxide Semiconductor (CMOS), dynamic, transmission gate, Pass
Transistor Logic (PTL) and Gate Diffusion Input (GDI) logic are discussed in
the literature. Among them, GDI is considered in this research work due to its
merits of low power consumption and implementation of any function with
low transistor count. However, the gates based on this logic suffer from
a reduced output voltage swing due to the threshold voltage drop. This has
motivated us to propose an improved set of gates that operate with full swing
without increasing the fabrication complexity and with a
low transistor count. With these gates and adders in
mind, new compressors and parallel adders shall be designed. Further, the
Vedic multiplier shall also be realized with the help of these designs.
In this Chapter, the research objectives and exhaustive literature
survey on logic styles followed by the design of full adder, 4-2 compressor,
parallel adder and multiplier are presented in Section 1.2 and Section 1.3,
respectively. The Chapter concludes with the organization of the thesis in
Section 1.4.
1.2 OBJECTIVES
The objectives of the research work are listed as follows:
- To propose the gates namely, AND, OR, XOR and XNOR with
  full swing GDI logic and to extend the designed gates for
  implementing the full adder designs
- To propose a simple 4-2 compressor architecture with reduced
  delay and area
- To improve the performance of parallel adders by implementing
  them using aforementioned full swing GDI gates and adder
- To propose multiplier architecture with less delay, low power
  consumption and small area using the concepts of Vedic
  mathematics with full swing GDI logic
- To make the multiplier design suitable for multiplying any
  inputs whose input width is power of 2 with the help of
  hierarchy principle
1.3 LITERATURE SURVEY
An extensive literature survey is carried out in order to confirm the
need for the proposed objective. Initially, the reason for selection of GDI
logic and its bottlenecks are explained, and then the full swing mechanisms
available in the literature for GDI logic are discussed. Further, the earlier
works on arithmetic circuits namely, full adder and 4-2 compressor are
explained. In addition, the existing implementations of parallel adders and the
necessary improvements on their architecture are given. Also, the existing
works relating to Vedic multiplier are discussed. Finally, the existing
hierarchy multiplier architecture and its associated drawbacks are discussed.
1.3.1 Logic Styles
The logic style used for realizing any digital design has a direct
influence on the speed, size, power consumption and wiring complexity.
Different logic styles tend to favor the accomplishment of one performance
aspect at the expense of others. These logic styles differ in the
method of computing intermediate nodes and in transistor count,
even though they implement the same function. Numerous logic styles in the
classes of static CMOS, dynamic, transmission gate, GDI logic and Pass
Transistor Logic (PTL) are discussed in the literature.
CMOS (Chandrakasan and Broderson 2003;
Goel et al 2006; Purohit and Margala 2012 and Bahadori et al 2016) is
the most common design technique, in which each logic network has pull
up and pull down devices controlled by the gate input signals. The
merit of a CMOS circuit is that its static power dissipation is very small and
it produces minimal leakage. However, the power dissipation of a CMOS
device depends on its operating frequency: as the frequency of the input
signal increases, CMOS devices dissipate more power. Moreover, because the
input capacitance of a CMOS gate is larger, its propagation delay is higher
than that of other logic styles.
PTL circuits implement a logic function as a network of MOS
transistors. They are well suited for pipelined circuits and outperform
conventional CMOS circuits in terms of silicon area, speed
and power dissipation (Mohab Anis et al 2002; Nikolaidis et al 2002;
Avci and Yildirim 2003; Shen-Fu Hsiao et al 2010; Nehru et al 2012; Jin-Fa
Lin et al 2012; Deepa and Sampath Kumar 2015 and Yazhini and Rajendiran
2015). However, this logic has the drawback of a reduced output voltage swing.
This problem can be overcome by using a swing restoration buffer at the
output; the resulting logic style is named Complementary Pass transistor Logic
(CPL).
Usually, a CPL gate (Weste 2003) consists of two NMOS logic
networks (one for each signal rail), two small pull-up PMOS transistors for
swing restoration, and two output inverters for the complementary output
signals. Because the MOS networks are connected to variable gate inputs
rather than constant power lines, only one signal path through each network
must be active at a time, in order to avoid shorting different inputs together.
The CPL gates have small input loads, good output driving capability due
to the output inverters and a fast differential stage due to the cross-coupled
PMOS pull-up transistors. This contributes to CPL’s high speed. CPL is mainly
used to implement complex functions (XOR and MUX), which it realizes with
smaller and fewer transistors.
In the absence of the pull-up PMOS transistors, the output
voltage swing of a CPL gate is lower than the input swing by the NMOS
threshold voltage, because the CPL gate is constructed from NMOS transistors
only. If the CPL output is used to drive an inverter, DC current may flow in
the output inverter because the PMOS transistor of the inverter is not
completely OFF. This is eliminated by adding the pull-up PMOS transistors.
In CPL, the Boolean function is evaluated using the CPL network and full swing
output is achieved using a static CMOS inverter. The problem with
this configuration is leakage current through the static inverters. Furthermore,
the layout of CPL cells is not as straightforward and efficient as CMOS, due to
its irregular transistor arrangements and high density wiring.
As an alternative to CPL, Swing Restored Pass transistor Logic
(SRPL) has been used (Bellaour and Elmasry 1995). It consists of two
parts, namely a complementary output PTL network and a swing restoring
circuit. The former is constructed with NMOS devices and the latter
with cross coupled CMOS inverters. The inputs in the SRPL
technique are connected to the drains and gates of the PTL network: the pass
variables are connected to the drains of the logic network transistors and the
control variables are connected to the gates of the transistors. This
arrangement overcomes the shortfalls associated with PTL and CPL.
Nevertheless, when proper device scaling is not provided in SRPL,
discharging the output for the ‘1’-‘0’ transition becomes a bottleneck and
consequently, the output degrades.
Another widely used logic style is dynamic logic, which is used in
a large number of applications such as high speed digital logic
(Anders et al 2002 and Xu-Guang Sun et al 2002), memory (Amrutur and
Horowitz 2001 and Bhavnagarwala et al 2004) and high performance
microprocessor design (Nowka and Galambos 1999). This logic family offers
a number of interesting features compared to static logic, namely reduced
transistor count (almost half compared to static CMOS) as well as reduced
load capacitance and hence improved speed. The operation of a dynamic logic
gate is controlled by a clock signal and can be implemented in either Pull-up
(PMOS) or Pull-down (NMOS) configurations. The voltage at the output of
the dynamic circuit is stored on a parasitic capacitance, which is typically
buffered before it is sent to the next stage. This temporary voltage is affected
not only by charge sharing among the internal parasitic capacitances but also by
the subsequent dynamic circuit (Fang Tang et al 2012).
GDI logic has been introduced as an alternative to CMOS logic by
Morgenshtein et al (2002). It is a low power design technique which helps to
realize a logic function with fewer transistors. Using this logic
style, designs of various arithmetic and logic circuits, namely adder (Lee 2007;
Dan Wang et al 2009; Moradi et al 2009; Shrivas et al 2012; Uma and
Dhavachelvan 2012; Archana and Durga 2014; Dhar 2014; Foroutan et al
2014; Morgenshtein et al 2014; Shinde and Nidagundi 2014 and Soundharya
and Arunkumar 2015), subtractor (Dhar et al 2014 and Singh and Kumar
2014), multiplier (Gupta et al 2013 and Reddy et al 2014), divider (Saberkari
et al 2009), comparator (Khurana et al 2013; Sharma and Sharma 2014 and
Shekhawat et al 2014), Arithmetic and Logic Unit (ALU) (Dubey and Sairam
2014), flip flops (Morgenshtein et al 2004; Fisher et al 2009; Swami et al
2011; Abiri et al 2014 and Dhar 2014), memory (Magesh Kannan and
Prathyusha 2015), clock generators (Hari and Mai 2011) etc, are discussed in
the literature.
From the operational characteristics of GDI gates, it is concluded
that they produce a reduced output voltage for certain input combinations. The
techniques presented so far achieve full swing gate outputs either by
increasing the number of transistors (by more than half relative to the
non-full swing design) or by increasing the power consumption (through the use
of buffers). So a general method is required to design full swing at the gate
level for gates such as AND, OR, XOR, etc. Hence, an attempt shall be made in
this thesis to design gates with the merits of full swing, small area and low
power-delay product.
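The dependence of the output level on the input combination can be made concrete with a first-order behavioural sketch. It assumes the commonly described GDI cell, in which the PMOS passes its diffusion input P when the gate input G is 0 and the NMOS passes N when G is 1, together with the usual pass-transistor approximation that a PMOS passes a low only down to about |Vtp| and an NMOS passes a high only up to about VDD - Vtn. The voltage values, the function names and the input assignments for AND and OR are assumptions for illustration and are not taken from the tables of this thesis.

```python
from typing import Tuple

# Illustrative values only (volts); they are not taken from the thesis.
VDD, VTN, VTP = 1.0, 0.3, 0.3


def gdi_cell(g: int, p: int, n: int) -> Tuple[int, float]:
    """Idealised GDI cell with gate input G and diffusion inputs P and N.

    Returns (logic value, approximate output voltage).
    """
    if g == 0:
        # PMOS conducts and passes P: a high passes fully,
        # a low is only pulled down to about |Vtp|.
        return p, (VDD if p == 1 else VTP)
    # NMOS conducts and passes N: a low passes fully,
    # a high only rises to about VDD - Vtn.
    return n, (VDD - VTN if n == 1 else 0.0)


def gdi_and(a: int, b: int) -> Tuple[int, float]:
    return gdi_cell(g=a, p=0, n=b)   # out = A.B (assumed input assignment)


def gdi_or(a: int, b: int) -> Tuple[int, float]:
    return gdi_cell(g=a, p=b, n=1)   # out = A + B (assumed input assignment)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            logic, volts = gdi_and(a, b)
            full = volts in (0.0, VDD)
            print(f"AND a={a} b={b} -> {logic} at {volts:.1f} V, full swing: {full}")
```

In this simplified model the logic values are always correct, but several input combinations leave the output one threshold voltage away from the rail, which is the degradation that a full swing technique has to remove.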
1.3.2 Full Adder
The full adder is a fundamental block in arithmetic and logic units
and is the nucleus of various operations such as subtraction,
multiplication, division and address computation as well as addition. Full
adders are encountered in the critical path of complex arithmetic
computations like multiplication. Obtaining high operation speed at low
power consumption is desirable, which makes the design of an adder very
challenging. Standard implementations in various logic styles
have been used in the past to design full adder circuits. These differ
in the way intermediate nodes and outputs are produced and in transistor count.
On one hand, a full adder design in static CMOS with pull up
PMOS and pull down NMOS networks is the conventional design, but it requires 28
transistors (Weste et al 2003). On the other hand, dynamic circuits can
significantly reduce the transistor count but the incurred power consumption
is high.
Building logic with transmission gates is another alternative to
reduce the complexity. A full adder design using transmission gates plus
inverters, consisting of 20 transistors, is discussed in (Weste et al 2003). To
reduce the transistor count further, PTL is used in lieu of transmission gates.
Despite the saving in transistor count, the output level is degraded for certain
input combinations.
There are various full adder designs discussed in the literature
(Shams et al 2002; Hung Ten Bui et al 2002; Jin Fa Lin et al 2012 and
Ramanamurty et al 2012). The full adder design discussed by Shams et al
(2002) uses sixteen transistors and can provide full swing output. An
improved ten transistor full adder design is discussed by Hung Ten Bui et al
(2002), but it suffers from the threshold voltage problem. To overcome this
issue, a buffering circuit based PTL full adder is introduced by Jin Fa Lin et al
(2012). A MUX based Shannon full adder using fourteen transistors is
discussed by Ramanamurty et al (2012). Though the design is superior in
energy consumption, it suffers from low driving
capability.
A power-delay comparison of various full adders using CMOS,
PTL, GDI and static energy recovery logic is presented by Saradindu Panda et al
(2012). They suggest that the GDI based full adder operates with low
power consumption. In addition, the GDI based full adders discussed by (Lee
2007; Moradi et al 2009; Dan Wang et al 2009; Uma and Dhavachelvan 2012;
Shrivas et al 2012; Archana and Durga 2014; Dhar et al 2014; Foroutan et al
2014; Morgenshtein et al 2014; Shinde and Nidagundi 2014 and Soundharya
and Arunkumar 2015) are claimed to perform better in terms
of power consumption and area requirement. These works also point out that the
additional transistors required for achieving full swing output are
a setback. This discussion motivated us to design a full adder with the merits
of low power, small area and minimum delay. Therefore, in this thesis, the
design of full adders in GDI logic with full swing output without increasing
area and delay has been considered as one of the research objectives.
1.3.3 4-2 Compressor
The digital 4-2 compressor (hereafter referred to as 42C)
was first introduced by Weinberger (1981). Since its inception, it has gained
popularity in many digital multiplication and multi-operand addition schemes
(Hsiao et al 1998; Margala and Durdle 1998; Radhakrishnan and Preethy
2000; Prasad and Parhi 2001; Chua-Chin Wang et al 2002; Ohsang Kwon
et al 2002; Yuan 2007; Subhendu Kumar Sahoo and Chandra Shekhar 2008;
Peiman Aliparast et al 2011; Davoud Bahrepour and Mohammad Javad
Sharifi 2013; Abdoreza Pishvaie et al 2014 and Jamshidi et al 2015). Also,
efficient realizations of signal processing applications with the help of 42Cs
have been recently highlighted (Paim et al 2015 and Schiavon et al 2016).
The simplest representation of a 42C consists of a pair of
cascaded full adder blocks, but this configuration is inefficient in terms of
circuitry. The power efficiency of the 42C has been improved by realizing it
using bipolar double pass transistor logic compared with the CMOS based design
(Margala and Durdle 1998). Furthermore, savings in transistor count, delay
and circuit size may be obtained by decomposing the design to the gate level.
Such designs, implemented in hybrid logic styles to attain better driving
capability without greatly increasing the transistor count, are discussed in
(Chip-Hong Chang et al 2004 and Veeramachaneni et al 2007).
Various significant works have been reported in the literature for the
better implementation of 4-2 compressors. Conventionally, a 4-2 compressor
is implemented as a cascade of two full adder cells, but it suffers
from a long delay of four XOR gates. To reduce this latency, variant 4-2
compressors have been developed with dedicated carry generation circuits
(Nagamatsu et al 1990; Oklobdzija 1995; Hussin et al 2008 and Baran et al
2010).
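The conventional structure can be sketched at the logic level as follows. This is a minimal behavioural model of a 4-2 compressor built from two cascaded full adders, written only to illustrate the arithmetic relation it must satisfy; the function and signal names (`compressor_42`, `cin`, `cout`) are assumptions, and the model says nothing about the transistor level designs compared in the thesis.

```python
from itertools import product


def full_adder(a: int, b: int, c: int):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c                          # two XOR gates in series
    carry = (a & b) | (b & c) | (a & c)    # majority function
    return s, carry


def compressor_42(x1: int, x2: int, x3: int, x4: int, cin: int):
    """4-2 compressor as two cascaded full adders.

    Returns (sum, carry, cout). The sum path crosses both full adders,
    i.e. four XOR gates in series, which is the long delay noted above.
    """
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_all() -> bool:
    """Check x1+x2+x3+x4+cin == sum + 2*(carry + cout) for all inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_42(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_all())
```

Because `cout` is produced by the first full adder alone, a chain of such compressors does not ripple a carry horizontally, which is why the structure suits partial product accumulation.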
The first dedicated carry generation circuit in a 42C design was
introduced by Nagamatsu et al (1990). While the delay is claimed to be reduced
significantly, this design uses 3 XORs, 3 ANDs, 3 NORs and 1 inverter for
carry computation. Despite the advantages, the transistor count is higher than
that of the conventional design. This gate count has been reduced to 2
XORs, 1 NAND, 1 NOR, 1 MUX and 1 inverter in the 42C design discussed
by Oklobdzija (1995). Another method of carry computation in the 42C,
performed by NAND and OR gates, is designed by Hussin et al (2008) and
requires 2 XORs, 3 NANDs, 1 OR and 1 inverter. The XOR based
intermediate output computation involved in the previously discussed 42C
designs has in turn been replaced with NOR and NAND gates in the design
discussed by Pishvaie et al (2013). The drawback of this 42C is that it not only
consumes more power due to spurious switching activities, but also demands a
higher transistor count.
The power consumption of the 42C can be minimized by adopting a fin
field effect transistor based implementation, as discussed by Farid Mosh
Gelani et al (2012), at the cost of fabrication complexity. The advantage of
partial utilization of CMOS full adder and gates while implementing the 42C
architecture is discussed by Abdoreza Pishvaie et al (2012). They also
analyzed the performance of a 54 bit multiplier using the designed compressor.
Though the design gains advantages in terms of speed and power
consumption, it suffers from increased circuit area.
There are significant works carried out on the performance study of
compressors under different logic styles. The actual performance
of a 42C depends on the underlying logic styles that host the implementation of
the basic blocks, namely XOR and MUX. As an alternative to CMOS, the
introduction of double pass transistor logic based 4-2 compressors by
(Shen-Fu Hsiao et al 1998 and Aliprasat et al 2010) reduces the internal load
capacitance, thus decreasing the compressor delay. Also, hybrid logic selection
for the realization of the 42C’s building blocks is discussed to improve its
performance (Chip-Hong Chang et al 2004). A year later, another
performance study of a 4-2 compressor using various logic styles was
carried out by Michael Horward et al (2005), who suggest that the PTL based
implementation reduces the transistor count considerably while the power
consumption is minimized in the CMOS based realisation.
From the discussion of the cited works, it is evident that the
existing compressor design requires architectural modification so as to reduce
the delay and area. This is addressed in this research work. Also, the
elimination of redundant transistors minimizes the spurious switching
activities, thus reducing the power consumption of the proposed 42C.
Further, the performance of the 42C is improved by implementing it using the
proposed GDI logic based gates.
1.3.4 Parallel Adders
The parallel adders considered in this research work are Ripple
Carry Adder (RCA), Carry Select Adder (CslA) and Carry Look Ahead adder
(CLA). Significant works have been carried out in the implementation of
RCA using various logic styles, namely CMOS (Ghobadi et al 2010; Shubin
2010; Shahzad Asif and Mark Vesterbacka 2012 and Amuthavalli and
Gunasundari 2015), PTL (Noor Ain Kamsani et al 2015), GDI (Usha et al
2015), dynamic (Arun and Kumar 2014) and Feed Through Logic (FTL)
(Sauvagya Ranjan Sahoo and Kamala Kanta Mahapatra 2012 and Sahoo et al
2012). The design of RCA in the subthreshold region has been studied by
Vatanjou et al (2015). The improvement in RCA speed has been attained
using FTL based implementation by Sahoo et al (2012) at the cost of more
power consumption. On the other hand, utilization of a GDI based full adder
in the ripple carry implementation has been able to reduce the power
consumption as discussed by Usha and Ravi (2015).
In addition, the reduction in power consumption of RCA using
adiabatic logic is addressed by Anuar et al (2009) at the expense of
a considerable increase in delay. The performance of RCA under
hybrid logic is analyzed by Archana and Durga (2014). From the literature
survey, it is understood that a standalone CMOS based RCA exhibits more
delay and area whereas dynamic logic offers better performance but more
power consumption. On the other hand, the hybrid logic style performs better
but lacks driving capability. Therefore in this thesis, a low power, high speed
design of RCA based on the proposed full adder using GDI logic will be
attempted.
Though the RCA design is simple, its speed is limited by the carry
propagation at every stage. As an alternative, an addition method based on
prior carry computation has been proposed, and this adder is named CLA. It
mainly uses propagate (performed by XOR gate) and generate (performed by
AND gate) operations in order to pre compute the carry. This increases the
gate count of the adder, which in turn raises the switching activities.
Therefore, this adder has the setbacks of increased area and higher power
consumption.
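The propagate and generate operations described above can be illustrated with a short behavioural sketch. It assumes a plain single level CLA in which every carry is expanded in terms of the input bits and the carry input only; the function names and the default width are illustrative and are not taken from the cited works.

```python
def lookahead_carry(p, g, cin, i):
    """Carry into bit i+1, computed from p, g and cin only (no earlier carries)."""
    carry = 0
    for j in range(i + 1):
        term = g[j]                      # carry generated at bit j ...
        for k in range(j + 1, i + 1):
            term &= p[k]                 # ... and propagated through bits j+1..i
        carry |= term
    chain = cin
    for k in range(i + 1):
        chain &= p[k]                    # cin propagated through bits 0..i
    return carry | chain


def cla_add(a, b, cin=0, width=4):
    """Add two width-bit numbers; return (sum, carry_out)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate: XOR
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate: AND
    carries = [cin] + [lookahead_carry(p, g, cin, i) for i in range(width)]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]
```

The nested loops make the cost visible: the number of AND terms per carry grows with the bit position, which corresponds to the gate count and switching activity penalty noted above.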
Extensive works have been carried out to reduce the area and power
consumption without degrading CLA performance (Ruiz 1996; Jeong Beom
Kim and Dong Whee Kim 2007; Stefania Perri and Pasquale Corsonello 2012;
Senthil Sivakumar et al 2013; Costas Efstathiou et al 2013; Bairu et al 2014;
Chaitanya kumar and Selva kumar 2014; Lunchao Wang and Ken Choi 2014
and Manas Chanda et al 2015). The existing works suffer from an increase in
delay while decreasing power consumption, and from increased area while
decreasing the delay. Therefore, in this thesis, these issues shall be overcome
by designing the propagate and generate gates of the CLA using the proposed
full swing gates, which in turn reduces its power consumption and area without
affecting the performance.
An adder in which the sum outputs are pre computed for the presumed
carry inputs 0 and 1, and the actual sum output is selected from them after
the arrival of the final carry, is called a CslA. It was introduced by
Bedrij in 1962. This design uses dual RCAs followed by selection circuitry,
which requires wider area and consumes more power. Various
ways of designing a CslA with minimum area have been
discussed in the literature (Tyagi 1993; Yong Surk Lee et al 1996; Chang and
Hsiao 1998; Yen-Mou Huang and Kuo 2000; Youngjoon Kim and Lee-Sup
Kim 2001; Neve et al 2004; Chen et al 2010; Ramkumar and Kittur 2012;
Grover and Grover 2013; Mohanty and Patel 2014; Pandey et al 2014; Akhter
et al 2015; Sahu and Shubin 2015 and Saxena 2015).
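The select principle described above can be sketched behaviourally as follows. The block size, the function names and the use of integer addition as a stand-in for each RCA are assumptions made for illustration; the sketch does not model any of the area optimised variants cited.

```python
def ripple_add(a, b, cin, width):
    """Behavioural stand-in for a width-bit RCA; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Add two width-bit numbers block by block using carry selection."""
    assert width % block == 0, "width must be a multiple of the block size"
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple_add(a_blk, b_blk, 0, block)   # presumed carry input 0
        s1, c1 = ripple_add(a_blk, b_blk, 1, block)   # presumed carry input 1
        s, carry = (s1, c1) if carry else (s0, c0)    # select on the actual carry
        result |= s << lo
    return result, carry
```

Each block computes two candidate results, which is the origin of the dual RCA area overhead mentioned above; only the final selection waits for the incoming carry.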
A single carry select adder exhibits wider area with a lower delay.
Although the hybrid mechanism of CslA and CLA requires less area, it
exhibits increased power consumption. The selection of logic style for the
implementation of CslA adder improves its performance metrics namely, area,
power consumption and delay as discussed by Das et al (2015). Therefore, in
this thesis, an efficient implementation of CslA adder shall be done with the
help of GDI logic based gates and full adder.
1.3.5 Vedic Multiplier
Digital multipliers are the core components of a Digital Signal
Processor (DSP), whose speed of operation is mainly determined by the speed
of its multipliers. The multiplication process consists of three stages: partial
product generation, partial product reduction and final carry propagate
addition. A considerable amount of research has so far been carried out on
different types of multipliers such as array multiplier (Muhammad et al 1999;
Chong et al 2007; Ravi et al 2011 and Sahoo and Shekhar 2011), Booth
multiplier (Senthilpari 2011; Rao and Dubey 2012; Muralidharan and Chang
2013; Choi et al 2014 and Tsoumanis et al 2016) and Wallace multiplier (Waters
and Swatzlander 2010; Gahlan et al 2012; Naveen et al 2013; Mhaidat and
Hamazah 2014; Asif and Kong 2014; Dash et al 2014 and Sudha and
Marimuthu 2014). They have aimed at offering higher speed and lower power
consumption with minimal usage of silicon area. However, achieving all these
objectives in a single design is very difficult, since the requirements of
speed, area and power conflict with one another.
Lowering the supply voltage decreases the power consumption
but also slows the circuit, and vice versa. However, some techniques found in
the literature are appropriate for designing high speed multipliers while
others are suited for reducing silicon area. In an array multiplier,
multiplication of two input bits can be achieved through one micro operation
using a combinational circuit. However, it requires a large number of gates
for the generation of partial product bits and hence it is less economical.
On the other hand, the common multiplication can be done using shift and add
operations resulting in a sequential mechanism, hence producing a large
propagation delay.
In the case of Booth multipliers, the number of partial products is
reduced through Booth’s encoding. Further, they are added with the help of
parallel adders, but the additional processing time of the encoding and
decoding techniques limits the performance of the multiplier. To minimize the
number of partial products further, modified Booth recoding has been proposed
in order to reduce the number of adders. The delay is thereby decreased, but
the large number of pre and post processing steps required for the recoding
and decoding mechanism increases the power consumption.
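As a hedged illustration of how recoding reduces the number of partial products, the sketch below applies the standard radix-4 (modified Booth) recoding to a two's complement multiplier. The function names and the default width are illustrative; the code models the arithmetic only, not the encoder and decoder hardware whose overhead is discussed above.

```python
def booth_radix4_digits(y, width):
    """Recode a width-bit two's complement multiplier into width/2 digits in {-2..2}."""
    assert width % 2 == 0, "width must be even for radix-4 recoding"
    bits = [(y >> i) & 1 for i in range(width)]
    digits, prev = [], 0                 # prev plays the role of the bit y[-1] = 0
    for i in range(0, width, 2):
        digits.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(x, y, width=8):
    """Multiply x by a signed width-bit y using one partial product per digit."""
    digits = booth_radix4_digits(y, width)
    return sum(d * x * 4 ** i for i, d in enumerate(digits))
```

An 8 bit multiplier yields four recoded digits instead of eight single bit partial products, so the adder array is halved at the price of the extra recoding logic.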
A column compression multiplier, introduced by Wallace in 1964, is
popular due to its high speed. In this method, the partial products of N rows
are reduced by grouping them into three-row and two-row sets using
(3:2) and (2:2) counters respectively. These counters are placed in the critical
path by Dadda in 1965 to reduce the delay and hence the multiplier is called
Dadda multiplier. The drawback of both Dadda and Wallace multipliers is an
increase in layout complexity due to the irregular arrangement of adders,
which leads to interconnection issues.
The performance of the multiplier can be further improved by
arranging the adders such that the sums and carries are generated in a single
step instead of waiting for the arrival of the carry from a previous stage.
Thus, carry propagation delay is reduced and the multiplier which employs this
technique is named Carry Save Array (CSA) multiplier (Zhan Yu et al
2000 and Paul et al 2001). Though the layout is regular, the delay
increases with the number of input bits, prohibiting the use of such
multipliers for high speed operation. Thus, most classical multiplication
techniques developed to enhance the performance of multipliers suffer from
the above said drawbacks. However, the design of multipliers using
Vedic mathematics can provide a solution to those issues.
Vedic mathematics is an ancient Indian system of mathematics
which is derived from Vedic sutras. It was rediscovered in the early twentieth
century from ancient Indian scriptures. The algorithms based on conventional
mathematics can be easily simplified and even optimized by the use of Vedic
mathematics (Maharaja 2001). These methods and ideas can be directly
applied to arithmetic, trigonometry, plane and spherical geometry, calculus,
hydraulics and applied mathematics of various fields. Urdhva Triyagbhyam
(UT) is one of the sutras, which literally means vertically and crosswise and
is used to perform the multiplication operation.
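The vertically and crosswise principle can be sketched for binary operands as follows. This is a behavioural illustration of the general UT scheme under the assumption of unsigned operands; it is not the multiplier architecture proposed in this thesis, and the function name is illustrative.

```python
def urdhva_multiply(a, b, width=4):
    """Multiply two unsigned width-bit numbers column by column (UT scheme)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    # Column k collects every crosswise product a[i] * b[k - i].
    columns = []
    for k in range(2 * width - 1):
        columns.append(sum(a_bits[i] & b_bits[k - i]
                           for i in range(width) if 0 <= k - i < width))

    # Column sums are then accumulated with the carry from the previous column.
    product, carry = 0, 0
    for k, column in enumerate(columns):
        total = column + carry
        product |= (total & 1) << k
        carry = total >> 1
    return product | (carry << (2 * width - 1))
```

All column sums depend only on the input bits, so they can be formed in parallel; the remaining sequential step is the accumulation of the columns, which is where the choice of adder or compressor matters.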
Various interesting methods of realizing multipliers based on the UT
method have been introduced in the last decades (Tiwari et al 2008; Mehta
and Gawali 2009; Pushpangadan et al 2009; Pradhan et al 2011; Kunchigi et
al 2012; Zulhelmi Zakaria and Abbasi 2013; Kumar and Sahoo 2015 and
Jinesh et al 2015). A way of developing a bigger modular multiplier from
smaller ones is introduced by Pushpangadan et al (2009) to increase the speed.
A high speed Vedic multiplier using the UT method is proposed and its
performance is compared to a modified Booth multiplier. The simulated
results of the aforementioned multiplier show its efficiency in speed and area
usage. The performance of the Vedic multiplier has been compared with the
conventional multiplication technique by Pradhan et al (2011), who concluded
that the Vedic multiplier has the advantage of faster computation.
The Vedic multiplier performance is mainly determined by the
accumulation of partial products. To increase its speed, various adders such
as CslA (Naaz 2014; Prasad et al 2014 and Gokhale and Bahirgonde 2015) and
parallel prefix adder (Anjana et al 2014) are incorporated in the architecture
of the Vedic multiplier. Further, performance improvements in this kind of
multiplier using higher order compressors are explained (Huddar et al 2013;
Gupta et al 2014; Abhilash et al 2015; Kaur and Prakash 2015 and Abbasi
et al 2015). A compressor based multiplier reduces the delay at the cost of
increased irregularity in layout. As an alternative, an efficient bit reduction
binary multiplication using Vedic mathematics is explained by Akhter (2007),
in which the number of input bits is reduced at the algorithmic level
to minimize the complexity of the multiplication operation.
With the growth of research on the Vedic multiplier in the last
several years, researchers have made considerable contributions to the
implementation of more complex circuits such as Multiply Accumulate Unit
(MAC) (Bhatia et al 2015 and Anitha et al 2015), ALU (Kumar and Raman
2010 and Gupta et al 2012), factorial calculation circuit (Saha et al 2011),
Fast Fourier Transform (FFT) (Thakre 2010; Prakash and Kirubaveni 2013;
Naoghare and Sakhare 2015 and Badar and Dandekar 2015), filter (Yagain
and Vijayan 2013), squarer (Sethi and Panda 2012) and cubic (Ramalatha and
Thanushkodi 2009).
The performance of an FFT processor using conventional
and Vedic algorithms will be specifically explored in this research work and
compared to that reported by Ronisha Prakash et al (2013). They claim that by
incorporating the UT Vedic multiplication principle, the delay and power
consumption can be minimized. An interesting implementation of a factorial
calculation circuit using Vedic mathematics has been described by Saha et al
(2011). The designed circuit is shown to consume less power and area. The
circuit realization has been carried out using transmission logic. It can be a
suitable candidate for low power and high speed factorial calculations.
Designs of Vedic multipliers that account for power
consumption are also addressed in the literature (Kayal et al 2014;
Gupta et al 2012 and Chanda et al 2013). A significant amount of research
work has been published recently on Vedic multiplier implementation using
various logic styles such as reversible (Gupta et al 2012; Saligram and
Rakshith 2013 and Ravali et al 2015) and adiabatic (Chanda et al 2013 and
Sing and Sasamal 2015). Further, the leakage power consumption in the
Vedic multiplier is reduced by the use of the Multi channel CMOS (McCMOS)
technique, which is discussed by Kayal et al (2014). This multiplier uses the
UT sutra for computation, and the transistor level realization is carried
out to compare the power performance of conventional and Vedic
mathematics. The results show that the Vedic multiplier using the McCMOS
technique works well in the deep submicron regime.
The above discussed proposals found in the literature motivated us to
improve the performance of the UT Vedic multiplier at both the algorithmic and
transistor levels. These approaches can simplify the computation
architecture and hence minimize the delay and area usage of the proposed
multiplier design. Further, the implementation will be carried out using GDI
logic to decrease the area and power consumption.
1.3.6 Hierarchy Multiplier
Hierarchical multipliers are considered a viable means of
achieving orders of magnitude speed up in compute intensive applications
through the use of fine grained parallelism. They are used in various fields of
numerical and scientific computations, image processing, communication,
cryptographic computation and so on (Quan et al 2005; Jarvinen and Skytta
2008; Shi et al 2011; Zakaria and Abbasi 2013 and Jhamb et al 2016).
Multipliers with large width are required in cryptography and in error
correction circuits for more reliable transmission over
highly insecure and/or noisy channels in networking and multimedia
applications. The hierarchical principle helps to realize fast large bit
multipliers, except that it requires a large width adder for performing the
addition task, which limits the performance and increases the area of the
designed multiplier (Chin-Long Wey and Jin-Fu Li 2004; Li et al 2007 and
Gurumurthy and Prahalad 2010).
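The hierarchical principle can be sketched as a recursive decomposition in which an N bit multiplier is assembled from four N/2 bit multipliers followed by a wide addition. The sketch below is a generic behavioural illustration that assumes unsigned operands and a power of two width; it does not reproduce the architectures of the cited works, and the function name is illustrative.

```python
def hierarchical_multiply(a, b, width):
    """Multiply two unsigned width-bit numbers using four half-width multipliers."""
    if width == 1:
        return a & b                              # 1-bit multiplier is an AND gate
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # The four half-width products are independent and can run in parallel.
    ll = hierarchical_multiply(a_lo, b_lo, half)
    lh = hierarchical_multiply(a_lo, b_hi, half)
    hl = hierarchical_multiply(a_hi, b_lo, half)
    hh = hierarchical_multiply(a_hi, b_hi, half)

    # The final step is the large width addition of the shifted sub-products.
    return ll + ((lh + hl) << half) + (hh << (2 * half))
```

The last line corresponds to the large width adder discussed above: the sub-products are cheap to obtain in parallel, whereas their aligned addition dominates delay and area.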
Over the last few decades, a lot of work has been dedicated, at the
algorithmic and implementation levels, to improving the performance of the
hierarchical multiplier. The delay in the addition process of the hierarchy
multiplier is reduced with the parallel execution of ripple carry adders.
However, this method requires twice the number of adders, thus resulting in
increased area. In addition, the delay is reduced with the deployment of a
carry look ahead adder for the addition process at the expense of an increase
in interconnection complexity. Besides delay and area, the power
consumption of the hierarchy multiplier also has to be reduced, because the
existing designs append more zeros to equalize the number of bits in order to
make them suitable for parallel computation. This might increase the spurious
switching activities and thus the power consumption. The above mentioned
issues in the existing hierarchy multiplier are addressed in this research
work by incorporating a binary to excess 1 converter to reduce the number of
adders at the final stage of the addition process and by performing the final
addition using CslA. Consequently, the power consumption and area of the
multiplier can be reduced by implementing it using GDI logic.
1.4 ORGANIZATION OF THE THESIS
In Chapter 1, an introduction to GDI logic and Vedic multiplier, the
objective of the research work, literature review pertaining to the design of
gates, full adder, 4-2 compressor, parallel adders, Vedic and hierarchy
multiplier and organization of the thesis are discussed.
In Chapter 2, the design of gates namely, AND, OR, XOR and
XNOR with full swing output using GDI logic is discussed. Further, the
design of three full adders in GDI logic using the
aforementioned gates is presented along with simulated results.
The design of the 4-2 compressor and its implementation with
simulation results are described in Chapter 3. The implementation of parallel
adders namely, RCA, CslA and CLA using GDI logic is explained along
with their simulation results in Chapter 4.
In Chapter 5, the novel design of the Vedic multiplier using the 4-2
compressor is detailed and its simulation results are discussed. Further, the
implementation of the hierarchy multiplier using the aforementioned Vedic
multiplier along with its simulation results is given in Chapter 6.
In Chapter 7, the thesis is concluded by emphasizing the major
findings of the study, a summary of the research contributions and the scope
for future studies.
CHAPTER 2
DESIGN OF FULL SWING GATES AND FULL ADDER
USING GDI LOGIC
2.1 INTRODUCTION
The circuit realization of low power and low area has become an
important issue due to the increasing demand for mobile electronic devices
such as cellular phones, laptops and so on. The adders and digital gates act as
building components in DSP architectures and microprocessors. Therefore,
their design with low power, smaller area and faster speed is in great
demand. Standard implementations with various logic styles have been used
in the past to design gates and full adder cells. The logic style used in the
design basically influences the speed, size, power consumption and wiring
complexity of the circuit. The GDI logic is considered in this thesis due to its
merits of low power consumption and lower transistor count
than other logic styles, subsequently resulting in smaller area. In this Chapter,
the design of gates namely, AND, OR, XOR and XNOR is described. In
addition, with the help of these gates three designs of full adder are
implemented with the merits of low power consumption, less delay and small
layout area. The organization of this Chapter is as follows: In Section 2, we
describe the implementation of gates using GDI logic and enumerate its
operational characteristics. Mainly, the proposals for full swing gates are
detailed. With the help of these gates, three full adder designs are
discussed in Section 3. The results and discussion of gates and full adders are
detailed in Section 4. A performance study of the proposed gates and full
adder under process changes is also discussed in this Section. Finally, Section
5 summarizes this chapter.
2.2 GDI LOGIC
[Schematic: GDI cell with input terminals G, P and N and output terminal OUTPUT]
Figure 2.1 Basic GDI cell
GDI logic is introduced as an alternative to CMOS logic. It is a low
power design technique which offers the implementation of logic functions
with fewer transistors. The basic GDI cell is shown in Figure 2.1.
Though it resembles a conventional CMOS inverter, the diffusion inputs of
the PMOS and NMOS transistors are connected differently. In a
conventional inverter circuit, the diffusion inputs of the PMOS and
NMOS transistors are always tied to VDD and GND potential, respectively. In
the GDI cell, on the other hand, these diffusion terminals act as external
inputs. The realization of various Boolean functions such as F1, F2, OR, AND,
MUX and NOT is listed in Table 2.1.
The main drawback of GDI gate is that it suffers from threshold
voltage drop. This drop reduces current drive and affects the performance of
the gate. The output voltage reduction can be compensated by the use of
swing restoration buffers at the output (Morgenshtein et al 2002). However,
the presence of inverters in the buffers increases the transistor count and also
increases the static power consumption when they are connected in cascade.
Table 2.1 Different logic function realization using GDI cell
        INPUT             OUTPUT      FUNCTION
N       P       G
‘0’     B       A         ĀB          F1
B       ‘1’     A         Ā + B       F2
‘1’     B       A         A + B       OR
B       ‘0’     A         AB          AND
C       B       A         ĀB + AC     MUX
‘0’     ‘1’     A         Ā           NOT
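The rows of Table 2.1 can be checked with an idealised switch level model of the cell. The sketch assumes the usual GDI behaviour, in which the PMOS conducts and passes P when G = 0 and the NMOS conducts and passes N when G = 1, and it takes F1 as (NOT A) AND B and F2 as (NOT A) OR B. Threshold voltage drops are ignored here, and the function and dictionary names are illustrative.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Ideal GDI cell: output follows P when G = 0 and N when G = 1."""
    return p if g == 0 else n


# Terminal assignments (N, P, G) for each row of Table 2.1, as functions of A, B, C.
TABLE_2_1 = {
    "F1":  (lambda a, b, c: gdi_cell(a, b, 0), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda a, b, c: gdi_cell(a, 1, b), lambda a, b, c: (1 - a) | b),
    "OR":  (lambda a, b, c: gdi_cell(a, b, 1), lambda a, b, c: a | b),
    "AND": (lambda a, b, c: gdi_cell(a, 0, b), lambda a, b, c: a & b),
    "MUX": (lambda a, b, c: gdi_cell(a, b, c), lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda a, b, c: gdi_cell(a, 1, 0), lambda a, b, c: 1 - a),
}


def table_2_1_holds():
    """Compare every cell configuration with its Boolean function."""
    return all(cell(a, b, c) == func(a, b, c)
               for cell, func in TABLE_2_1.values()
               for a, b, c in product((0, 1), repeat=3))
```

Calling `table_2_1_holds()` confirms that a single two transistor cell realises all six functions simply by changing what is wired to the N, P and G terminals.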
A multiple Vt technique has been presented in lieu of the swing
restoration buffer by Morgenshtein et al (2010). This approach utilizes low
threshold transistors in the places where a voltage drop is expected and
high threshold transistors for the inverters. Though this hybrid threshold
voltage method minimizes power consumption, it complicates the
transistor fabrication process. Also, the design of arithmetic functions with
full swing output using the F1 and F2 functions is highlighted in Morgenshtein
et al (2014). However, it requires twice the transistor count of the
conventional GDI design.
The techniques presented so far to achieve full swing output either
increase the number of transistors (by more than half compared with the non-full
swing design) or increase the power consumption (use of buffers). So, a general
method is required to design the basic gates with full swing output. Hence, an
attempt is made in this thesis to design full swing gates and subsequently
three designs of full adder using the proposed gates; a detailed explanation
of these efforts is given in the following sub sections.
2.2.1 Design of Gates using GDI Logic
The gates required for realizing any arithmetic function are AND,
OR, XOR and XNOR. These gate functions can be achieved with two
transistors (excluding the inverters for complementary input signals) and their
transistor level diagrams are shown in Figure 2.2.
[Schematics: two transistor GDI gates with inputs A and B; GND and VDD appear as diffusion inputs]
Figure 2.2 GDI based gates (a) AND (b) OR (c) XOR and (d) XNOR
The operational characteristics of these gates are given in Table 2.2.
Assuming that both inputs have full voltage swing, the table lists the output
voltage obtained for each input combination.
Table 2.2 Operational characteristics of gates using GDI logic
   INPUT                     LOGIC GATE
A       B       AND         OR          XOR         XNOR
‘0’     ‘0’     |Vtp|       |Vtp|       |Vtp|       VDD
‘0’     ‘1’     |Vtp|       VDD         VDD         |Vtp|
‘1’     ‘0’     GND         VDD-Vtn     VDD-Vtn     GND
‘1’     ‘1’     VDD-Vtn     VDD-Vtn     GND         VDD-Vtn
AND Gate:
The transistor level diagram of the AND gate using GDI logic is
shown in Figure 2.2 (a). The working mechanism of this gate is explained
below:
Logic ‘0’:
For the input combinations AB = 00 and 01, the NMOS transistor is
switched OFF and the PMOS transistor is switched ON. Therefore, a voltage
approximately equal to |Vtp| is obtained at the output, where Vtp is the
threshold voltage of the PMOS transistor. However, when AB = 10, the NMOS
transistor becomes ON and the PMOS transistor becomes OFF, passing ground
potential (GND) to the output.
Logic ‘1’:
When AB = 11, the NMOS transistor is switched ON and the PMOS
transistor is switched OFF. Since an NMOS transistor passes a high level
poorly, it delivers a poor ‘1’ signal of about VDD-Vtn at the output, where
Vtn denotes the threshold voltage of the NMOS transistor.
OR Gate:
The transistor level diagram of the OR gate using GDI logic is
shown in Figure 2.2 (b). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 00, the NMOS transistor is switched OFF and the PMOS
transistor is switched ON. Therefore, a voltage approximately equal to |Vtp|
is obtained at the output.
Logic ‘1’:
When AB = 01, the PMOS transistor is switched ON and the NMOS
transistor is switched OFF. Therefore, VDD passes through the PMOS transistor.
The opposite case occurs when AB = 10 and 11. Here the NMOS
turns ON and the PMOS turns OFF, resulting in a poor ‘1’ signal of
about VDD-Vtn at the output.
XOR Gate:
The transistor level diagram of the XOR gate using GDI logic is
shown in Figure 2.2 (c). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 00, the NMOS transistor is switched OFF and the PMOS
transistor is switched ON. Therefore, the output obtained is approximately
equal to |Vtp|. However, when AB = 11, the NMOS transistor becomes ON and
the PMOS transistor becomes OFF, passing ground potential (GND) to the
output.
Logic ‘1’:
When AB = 01, the PMOS transistor is switched ON and the NMOS
transistor is switched OFF. Therefore, VDD passes through the PMOS transistor.
The opposite case occurs when AB = 10. Here the NMOS turns ON
and the PMOS turns OFF, resulting in a poor ‘1’ signal of about
VDD-Vtn at the output.
XNOR Gate:
The transistor level diagram of the XNOR gate using GDI logic is
shown in Figure 2.2 (d). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 01, the NMOS transistor is switched OFF and the PMOS
transistor is switched ON. Therefore, the output is approximately equal to
|Vtp|. However, when AB = 10, the NMOS transistor becomes ON and the PMOS
transistor becomes OFF, passing ground potential (GND) to the output.
Logic ‘1’:
When AB = 00, the PMOS transistor is switched ON and the NMOS
transistor is switched OFF. Therefore, VDD passes through the PMOS transistor.
On the other hand, when AB = 11, the NMOS turns ON and the PMOS turns OFF,
resulting in a poor ‘1’ signal of about VDD-Vtn at the output.
From this discussion, it is concluded that the output voltages are
degraded by a threshold voltage drop for certain input combinations. The
reduction in output voltage increases significantly with an increase in the
number of stages. Therefore, the design of full swing gates is necessary and
it is discussed in the forthcoming subsections.
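The case analysis above can be condensed into a small symbolic model that reproduces Table 2.2. The model assumes full swing inputs and the usual pass transistor behaviour (a PMOS passes a strong ‘1’ but a weak ‘0’, an NMOS passes a strong ‘0’ but a weak ‘1’). The terminal assignments for XOR and XNOR are inferred from Table 2.2 rather than read from Figure 2.2, and all names are illustrative.

```python
def gdi_level(g, p, n):
    """Symbolic output level of a two-transistor GDI cell with ideal inputs."""
    if g == 0:                            # PMOS conducts and passes P
        return "VDD" if p else "|Vtp|"    # weak '0' through a PMOS
    return "VDD-Vtn" if n else "GND"      # weak '1' through an NMOS


# Gate input is A in every case; P and N assignments are assumptions (see lead-in).
GATES = {
    "AND":  lambda a, b: gdi_level(a, 0, b),
    "OR":   lambda a, b: gdi_level(a, b, 1),
    "XOR":  lambda a, b: gdi_level(a, b, 1 - b),
    "XNOR": lambda a, b: gdi_level(a, 1 - b, b),
}


def characteristics():
    """Return {(A, B): {gate: level}} for all four input combinations."""
    return {(a, b): {name: gate(a, b) for name, gate in GATES.items()}
            for a in (0, 1) for b in (0, 1)}
```

Every gate shows a degraded level for at least two of the four input combinations, which is the motivation for the full swing gates that follow.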
2.2.2 Full Swing AND, OR, XOR and XNOR Gates
An additional PMOS or NMOS transistor is placed at the
output, depending on whether VDD or GND potential, respectively, has to be
restored; this mitigates the non full swing problem that exists in the
conventional scheme. The transistor level schematics of the proposed gates are
illustrated in Figure 2.3 and their operational characteristics are summarized
in Table 2.3.
Figure 2.3 Proposed full swing gates using GDI logic (a) AND (b) OR (c)
XOR and (d) XNOR
The operation of the proposed gates is explained as follows. The
existing design lacks full swing operation for particular input combinations.
The techniques presented in the literature directly use the supply rail VDD
for a strong ‘1’ and GND for a strong ‘0’. The proposed design, however, does
not rely on the supply rails GND or VDD to obtain the full swing output.
Instead, it uses the input signals, with proper biasing of an additional
transistor, which may be either PMOS or NMOS depending on the input level, to
mitigate the threshold voltage loss that occurs in the conventional design.
[Schematics for Figure 2.3: (a) AND uses transistors P1, N1 and N2; (b) OR uses P1, P2 and N1; (c) XOR and (d) XNOR use P1, P2, N1 and N2]
Table 2.3 Operational characteristics of the proposed full swing GDI
gates
   INPUT                 LOGIC GATE
A       B       AND     OR      XOR     XNOR
‘0’     ‘0’     GND     GND     GND     VDD
‘0’     ‘1’     GND     VDD     VDD     GND
‘1’     ‘0’     GND     VDD     VDD     GND
‘1’     ‘1’     VDD     VDD     GND     VDD
AND Gate:
The transistor level diagram of the proposed full swing AND gate is
shown in Figure 2.3 (a). The working mechanism of this gate is explained
below:
Logic ‘0’:
For the input combinations AB = 00 and 01, the N1 (NMOS) transistor is
switched ON and the P1 (PMOS) and N2 (NMOS) transistors are switched OFF.
Therefore, the output node is connected to GND potential through N1.
Likewise, for the input condition AB = 10, the N1 transistor is switched
OFF and the P1 and N2 transistors are switched ON. Though both
P1 and N2 are in the ON state, N2 is responsible for delivering GND potential
to the output.
Logic ‘1’:
When AB = 11, the N1 transistor is switched OFF, whereas the
P1 and N2 transistors are switched ON. Due to the
operational characteristics of P1, it delivers VDD to the output.
OR Gate:
The transistor level diagram of the proposed full swing OR gate is
shown in Figure 2.3 (b). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 00, transistors P2 and N1 are switched ON, and
the drain terminal of N1 is connected to GND potential. Since an NMOS
transistor is good at delivering a strong ‘0’, i.e., GND, to the output, the
non full swing problem that occurs in the conventional GDI gate is eliminated.
Logic ‘1’:
For the input combination AB = 01, the transistors N1 and P2 are
switched ON and the output terminal is tied to VDD potential through the P2
transistor. Likewise, when AB = 10 and 11, the P1 transistor alone is
switched ON and the output terminal is charged to VDD
through the same transistor.
XOR Gate:
The transistor level diagram of the proposed full swing XOR gate is
shown in Figure 2.3 (c). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 00, P1 and N2 are switched ON and the other two
transistors, namely P2 and N1, are switched OFF. The output node is
connected to GND potential through the N2 transistor. On the other hand, for
the input combination AB = 11, the N1 transistor is switched ON and the
remaining transistors are switched OFF. The output node is tied to GND
potential.
Logic ‘1’:
When AB = 01, the transistors P1 and P2 are switched ON
whereas N1 and N2 are switched OFF. It is well known that a PMOS
transistor is good at delivering a strong ‘1’ potential (VDD). Likewise, for
the input combination AB = 10, the transistors P2 and N1 are switched
ON and VDD potential is delivered by the PMOS transistor
P2.
XNOR Gate:
The transistor level diagram of the proposed full swing XNOR gate
is shown in Figure 2.3 (d). The working mechanism of this gate is explained
below:
Logic ‘0’:
When AB = 01, the P1 and N2 transistors are switched ON and
GND potential is passed to the output by the N2 transistor. Likewise, when AB
= 10, the N1 and P2 transistors are switched ON; the source of N1 is connected
to the input B, i.e. GND potential. Therefore, the output node is tied to GND
potential.
Logic ‘1’:
When AB = 00, the transistor P1 is switched ON. The output
node is connected to VDD potential through the P1 transistor since its drain
terminal is tied to the inverted input B, i.e. VDD. The other input
combination, AB = 11, drives the transistors N1, N2 and P2 into the ON state.
VDD potential is delivered to the output terminal by the P2 transistor.
2.3 FULL ADDER DESIGNS
The design of a GDI full adder with full swing output is made
possible with the help of the full swing gates AND, OR, XOR and XNOR
discussed in the previous section. This design completely eliminates the swing
restoration buffers, which improves the performance. Three
possible full swing GDI full adders are designed based on the design
expressions [eqs. (2.1) - (2.6)] and their schematic diagrams are given in
Figure 2.4.
Design 1:
The transistor level schematic of full adder using design 1 is shown
in Figure 2.4 (a). The Sum and Cout expressions of this full adder are given in
eqs. (2.1) and (2.2), respectively.
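$\mathrm{Sum} = (A \oplus B)\,\overline{C_{in}} + \overline{(A \oplus B)}\,C_{in}$ (2.1)
$C_{out} = (A \oplus B)\,C_{in} + \overline{(A \oplus B)}\,A$ (2.2)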
Design 1 uses the XOR output as an intermediate result for computing
Sum and Cout. The Sum output is obtained by multiplexing the XOR output and its
inverted version, XNOR, through the Cin input. The Cout is obtained by
multiplexing the inputs A and Cin, with the XOR output of the A and B inputs
as the selection input. The presence of an inverter on the
critical path increases the delay of the whole circuit. This design is simple and
requires a total of 18 transistors for realizing the full adder function.
Design 2:
The Sum and Cout expressions of the design 2 are represented in eqs.
(2.3) and (2.4), respectively. This design can be attained by means of XOR,
AND and OR along with multiplexer modules.
$\mathrm{Sum} = A \oplus B \oplus C_{in}$ (2.3)
$C_{out} = (A \cdot B)\,\overline{C_{in}} + (A + B)\,C_{in}$ (2.4)
Multiplexing the AND and OR operations through the carry input Cin
realizes Cout. The XOR operation on the inputs A, B and Cin
realizes the Sum function. A total of 22 transistors is used for implementing
the Design 2 full adder. The schematic of this full adder is given in Figure
2.4 (b).
Design 3:
This full adder is designed with the help of XOR, AND and OR gates, and the
output expressions of Sum and Cout are given in eqs. (2.5) and (2.6).
$Sum = (A \oplus B) \oplus C_{in}$    (2.5)
$C_{out} = (A \oplus B)\,C_{in} + A \cdot B$    (2.6)
In this design, Sum output can be achieved by XORing the inputs A,
B and Cin whereas the output Cout is obtained with the help of AND and OR
followed by XOR gate. The intermediate XOR gate output is used for
computing Sum and Cout outputs. The total transistor requirement of this full
adder is 23. The schematic representation of this full adder is given in Figure
2.4 (c).
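The three carry and sum formulations described above can be checked exhaustively against ordinary binary addition. The following is a minimal behavioural sketch, not a transistor-level model: the function names `mux`, `design1`, `design2` and `design3` are illustrative, and the Design 3 carry is assumed to take the form (A XOR B)·Cin + A·B, which is one reading of "AND and OR followed by XOR gate".

```python
from itertools import product


def mux(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def design1(a, b, cin):
    """Design 1: Sum and Cout both derived from the shared XOR of A and B."""
    x = a ^ b
    s = mux(cin, x, 1 - x)      # Cin selects XOR (Cin = 0) or XNOR (Cin = 1)
    cout = mux(x, a, cin)       # XOR output selects A (x = 0) or Cin (x = 1)
    return s, cout


def design2(a, b, cin):
    """Design 2: Sum by three-input XOR, Cout by multiplexing AND and OR."""
    s = (a ^ b) ^ cin
    cout = mux(cin, a & b, a | b)   # Cin selects AND (Cin = 0) or OR (Cin = 1)
    return s, cout


def design3(a, b, cin):
    """Design 3: assumed carry form (A xor B).Cin + A.B, sharing the XOR."""
    x = a ^ b
    s = x ^ cin
    cout = (x & cin) | (a & b)
    return s, cout


def reference(a, b, cin):
    """Golden model: arithmetic sum split into (sum bit, carry bit)."""
    total = a + b + cin
    return total & 1, total >> 1


def verify(adder):
    """True if the adder matches the golden model on all 8 input patterns."""
    return all(adder(a, b, c) == reference(a, b, c)
               for a, b, c in product((0, 1), repeat=3))


if __name__ == "__main__":
    for name, adder in (("Design 1", design1), ("Design 2", design2),
                        ("Design 3", design3)):
        print(name, "matches full adder truth table:", verify(adder))
```

The check only confirms logical equivalence; delay, power and voltage swing depend on the GDI implementation and are outside the scope of such a model.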
Figure 2.4 Schematic of the proposed full adders based on (a) Design 1,
(b) Design 2 and (c) Design 3
2.4 RESULTS AND DISCUSSION
In this thesis, full swing gates are proposed and their performance is
compared with existing works. Further, three GDI full adders are designed
based on those full swing gates, and their performance is compared with other
adders found in the literature in terms of speed of operation, power
consumption and layout area. SPICE simulations are performed in 45 nm
technology with VDD = 1.1 V. Typical transistor sizes, i.e.
(W/L)p = 240 nm/45 nm and (W/L)n = 120 nm/45 nm, are used. After the
simulation of each circuit is completed, its layout is generated and
subjected to Design Rule Check (DRC) and then Layout Versus Schematic (LVS)
check before the extraction of parasitics. Subsequently, the extracted
parasitic file is back annotated to perform the post layout simulation.
2.4.1 Performance Analysis of AND, OR, XOR and XNOR Gates
The simulation results of the proposed full swing gates along with
the existing designs are shown in Table 2.4. The performance parameters of
the gates namely, delay and power consumption are calculated from the
simulation. The area is measured from the obtained layout.
Table 2.4 Performance comparison of the proposed gates with existing designs

Design                 Delay (ps)                Power Consumption (nW)    Area (µm2)
                       AND   OR    XOR   XNOR    AND  OR   XOR  XNOR       AND   OR   XOR  XNOR
Ref. [172]             13.3  11.2  23.2  20      350  295  547  514        3.53  3.7  4.6  7.3
Ref. [93]              7.8   8.8   22    25.2    309  259  403  464        2.9   3.0  4.2  4.1
Proposed (This Work)   7.4   4.8   7.5   9.4     277  227  284  339        2.2   2.3  3.2  3.4
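The percentage savings quoted in the gate-by-gate discussion can be reproduced from the power columns of Table 2.4. The sketch below assumes that Ref. [172] is the CMOS design and Ref. [93] the GDI design; this mapping is inferred from the quoted percentages and is not stated in the table itself. The helper name `saving_percent` is illustrative.

```python
# Power consumption in nW, copied from Table 2.4.
# Assumption: Ref. [172] is the CMOS design and Ref. [93] the GDI design.
POWER_NW = {
    "CMOS":     {"AND": 350, "OR": 295, "XOR": 547, "XNOR": 514},
    "GDI":      {"AND": 309, "OR": 259, "XOR": 403, "XNOR": 464},
    "Proposed": {"AND": 277, "OR": 227, "XOR": 284, "XNOR": 339},
}


def saving_percent(reference, proposed):
    """Relative saving of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    for gate in ("AND", "OR", "XOR", "XNOR"):
        vs_cmos = saving_percent(POWER_NW["CMOS"][gate], POWER_NW["Proposed"][gate])
        vs_gdi = saving_percent(POWER_NW["GDI"][gate], POWER_NW["Proposed"][gate])
        print(f"{gate}: {vs_cmos:.1f}% vs CMOS, {vs_gdi:.1f}% vs GDI")
```

The same function applies to the delay and area columns, since all three metrics are "smaller is better".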
AND Gate:
The simulation results of the CMOS, GDI and proposed AND gates are given in
Table 2.4. The proposed AND gate has the shortest delay, which is achieved
through the reduced transistor count of the design. Owing to the inherently
low power consumption of GDI logic, the proposed gate also consumes less
power: the power saving compared with the CMOS and GDI gates is 21% and 10%,
respectively. Because it uses fewer transistors, the proposed gate occupies
38% and 24% less area than the CMOS and GDI based gates, respectively. The
layout of the proposed gate is shown in Figure 2.5.
Figure 2.5 Layout of the proposed AND gate
OR Gate:
The performance of the proposed OR gate in terms of delay and power
consumption is analyzed through simulation and compared with the results of
the existing designs. The results show that the proposed design outperforms
the existing designs. The power saving achieved by the proposed design is 23%
and 12% relative to the CMOS and GDI based designs, respectively. Although
GDI logic operates with low power consumption, the use of a buffer increases
its power consumption, whereas in CMOS logic the increased switching activity
might be responsible for the higher power consumption. In terms of layout
area, the proposed design occupies 38% and 23% less area than the CMOS and
GDI based realizations, respectively. The layout of the proposed OR gate is
shown in Figure 2.6.
Figure 2.6 Layout of the proposed OR gate
XOR Gate:
Both the GDI based and the proposed XOR gates perform better than the CMOS
based design in all aspects. The proposed XOR gate improves the delay by 66%
relative to the GDI design, which results from the elimination of the buffer
in the output path. The CMOS based XOR gate, on the other hand, has a large
input capacitance, which slows down its operation. With respect to power
consumption, the proposed XOR gate consumes the least power since it has no
direct path between the power supply and ground rails, which eliminates
direct short circuit current. The power saving achieved by the proposed
design is 48% and 30% relative to the CMOS and GDI based implementations,
respectively. The transistor count is also reduced compared with the other
full swing XOR gates reported in the literature, which in turn reduces the
overall layout area. The area reduction achieved by the proposed XOR gate is
30% and 24% relative to the CMOS and GDI based designs, respectively.
Figure 2.7 Layout of the proposed XOR gate
XNOR Gate:
Among the simulated XNOR gate designs, the proposed XNOR performs best in
terms of delay, power consumption and area. According to Table 2.4, the delay
improvement of the proposed XNOR gate is 53% and 62% relative to the CMOS and
GDI based realizations, respectively. Due to the elimination of supply rails
in the circuit, the overall power consumption of the proposed XNOR gate is
lowered. The proposed XNOR gate consumes 34% less power than the CMOS based
design. Likewise, the transistor count is reduced compared with the existing
designs found in the literature [93]. In terms of layout area, the proposed
XNOR gate saves 53% and 17% relative to the CMOS and GDI based
implementations, respectively.
Figure 2.8 Layout of the proposed XNOR gate
Sensitivity to Process Variation:
Monte Carlo simulation has been carried out on the proposed and existing
gates, and the mean values of their delay and power consumption are tabulated
in Table 2.5.
Table 2.5 Performance analysis of the gates under process variation

Design                 Delay (ps)                 Power Consumption (nW)
                       AND   OR    XOR   XNOR     AND  OR   XOR  XNOR
Ref. [172]             14.1  12    25.3  21.2     378  306  567  595
Ref. [93]              8.2   9.1   24.6  28.2     356  271  431  486
Proposed (This Work)   7.5   4.84  7.57  9.46     280  230  287  343
The proposed gates exhibit less variation than the conventional GDI based
designs, which is a result of their full swing output and makes them stable
under process variation. Although CMOS is able to operate with full swing, it
is more susceptible to performance variation due to its higher transistor
count. The proposed full swing gates show about 1% performance variation and
are therefore able to sustain the same performance as the technology
advances. Hence, choosing the proposed gates as the basic modules of an
arithmetic circuit such as the full adder would bring better performance
metrics and good driving capability for the subsequent stages. The
performance of the proposed full adder designs is therefore analyzed
alongside existing full adders in the forthcoming subsections.
2.4.2 Performance Analysis of Full Adder
Full adders based on CMOS, CPL, hybrid logic and GDI are compared with the
proposed designs. The CMOS full adder consists of 28 transistors and is taken
as the reference for comparison. It has a full voltage swing with buffered
Sum and Cout signals. CPL, which is a variant of PTL, uses 32 transistors and
provides both true and complementary Sum and Cout outputs. It uses feedback
transistors to provide full swing. A design that uses CMOS and PTL to
generate Sum and Cout, respectively, is called a hybrid design. The average
power consumption and the worst case delay are measured over all possible
input combinations of the full adder. Table 2.6 summarizes the simulation
results of a single full adder. The delay is measured from 50% of the input
voltage swing to 50% of the output voltage swing on each transition. The
maximum delay is treated as the worst case delay.
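The 50%-to-50% delay definition can be illustrated on sampled waveforms. The following is a minimal sketch that assumes the waveforms are available as lists of time and voltage samples (for example, exported from a SPICE run) and uses linear interpolation between samples; the function names and the sample data are hypothetical.

```python
def crossing_time(times, volts, level):
    """First time at which the waveform crosses `level` (linear interpolation)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # A crossing lies in this segment if the two samples straddle the level.
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, vin, vout, vdd):
    """Time from the 50% point of the input to the 50% point of the output."""
    half = 0.5 * vdd
    return crossing_time(times, vout, half) - crossing_time(times, vin, half)


def worst_case_delay(delays):
    """The worst case delay is the maximum over all measured transitions."""
    return max(delays)


if __name__ == "__main__":
    # Hypothetical samples: time in ps, voltages in V, VDD = 1.1 V.
    t = [0.0, 10.0, 20.0, 30.0, 40.0]
    vin = [0.0, 1.1, 1.1, 1.1, 1.1]
    vout = [0.0, 0.0, 0.0, 1.1, 1.1]
    print("delay (ps):", propagation_delay(t, vin, vout, vdd=1.1))
```

In practice the measurement is repeated for every input transition of the full adder and `worst_case_delay` is applied to the resulting list.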
From the results in Table 2.6, it is clear that the CPL adder consumes
relatively more power because of the larger number of transistors required
for its design. The hybrid design performs on par with CMOS in terms of delay
and power consumption while requiring fewer transistors, whereas among the
three proposed GDI based full adders, Design 2 in particular outperforms all
the other adders in both delay and PDP. This is attributed to the reduced
transistor count on the paths between input and output, which also decreases
the parasitic capacitance at the Sum and Cout nodes.
Table 2.6 Performance comparison of the proposed full adders with existing designs

Design       Delay (ps)   Power Consumption (nW)   Area (µm2)   PDP (e-18 J)
Ref. [117]   46.2         975.6                    22.1         45.1
Ref. [45]    38.8         2680                     25.0         103.9
Ref. [168]   35.21        1613                     18.0         56.8
Ref. [93]    41.3         1310                     16.0         54.1
Ref. [164]   49.13        1685                     15.6         82.7
Ref. [94]    32.2         1462                     18.6         47.1
Design 1     37.86        927.9                    10.1         35.1
Design 2     26.87        1140                     13.4         30.6
Design 3     36.57        1216                     14.6         44.4
The area of each of the three proposed adders is lower than that of the
conventional CMOS, CPL and hybrid adders taken for comparison. The
performance metrics of all the simulated adders, namely delay, power
consumption, energy consumption and process variation, are discussed in
detail in the following subsections.
Delay:
The delay results of the simulated adders are given in Table 2.6. Among the
three proposed adder designs, Design 2 has the lowest delay since Cout and
Sum are computed in parallel. The improved delay of Design 2 may also result
from the better driving capability of the proposed XOR gate. Design 2
operates faster than the adders discussed in [93], [164] and [94] by 34.9%,
45.3% and 16.5%, respectively. The presence of an inverter in the critical
path gives Design 1 the highest delay among the three proposed full adders,
while the delay of Design 3 lies between those of Design 1 and Design 2.
The full adder discussed in [93] has a longer delay than the proposed
designs. The low output voltage at the internal nodes of the XOR based full
adder in [93] reduces its driving capability, resulting in a longer delay.
Although the design discussed in [164] operates at full swing, the presence
of a buffer in the critical path slows down its operation. The adder based on
F1 and F2 gates in [94] reduces the delay compared with [93] and [164] at the
cost of a higher transistor count. However, it is still slower than the
proposed Design 2.
Power Consumption:
The power consumed by the adders is computed through simulation and is also
presented in Table 2.6. The results show that the three proposed adders
consume low power. Among the proposed adders, Design 1 consumes the least
power since it adopts the proposed XOR gate and requires fewer transistors
than the other two proposed designs. Although the power consumption of Design
2 and Design 3 is slightly higher than that of Design 1, it is still lower
than that of the existing adders except the CMOS based adder. The power
saving attained with Design 1 relative to the adders explained in [93], [164]
and [94] is 29.2%, 44.9% and 36.5%, respectively.
Area:
The area of the designed and existing full adders is measured from their
corresponding layouts. The layouts of the three proposed full adders, namely
Design 1, Design 2 and Design 3, are given in Figure 2.9 (a), (b) and (c),
respectively. Among the three proposed designs, Design 1 has the smallest
area. This saving is obtained by partially sharing the architecture between
the Sum and Cout outputs. In addition, the removal of buffers at the gate
outputs reduces the transistor count and, consequently, the layout area. The
area occupied by Design 1 is 54% smaller than that of the CMOS based
implementation discussed in [117].
Power Delay Product (PDP):
From the simulation results given in Table 2.6, it is observed that the three
proposed full adders consume only a small amount of energy (power delay
product), which is made possible by the full swing gates in those designs.
These gates switch only the transistors required for the particular input and
hence consume less energy. Among the simulated designs, Design 2 has the
lowest energy consumption. The energy saving achieved with Design 2 relative
to the adders discussed in [117], [45] and [168] is 32.1%, 70.5% and 46.1%,
respectively.
The adder discussed in [164] provides full swing only at the output stage,
owing to the buffering, whereas its intermediate nodes suffer from voltage
drop like those of the adder discussed in [93]. Therefore, the energy
consumption of the adder increases significantly. The full adder based on F1
and F2 gates in [94] mitigates the threshold drop at intermediate nodes, but
its overall energy consumption is high because of the larger transistor count
required for the design. As shown in Table 2.6, the PDP of Design 2 is better
than that of all the other designs.
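The PDP column of Table 2.6 is the product of the delay and power columns, with a unit conversion. As a minimal sketch (the dictionary and function names are illustrative, and only a subset of the table rows is included), the values and the relative energy savings can be recomputed as follows.

```python
# (delay in ps, power in nW), copied from Table 2.6.
ADDERS = {
    "Ref. [117]": (46.2, 975.6),
    "Ref. [45]": (38.8, 2680.0),
    "Ref. [168]": (35.21, 1613.0),
    "Design 1": (37.86, 927.9),
    "Design 2": (26.87, 1140.0),
    "Design 3": (36.57, 1216.0),
}


def pdp_attojoule(delay_ps, power_nw):
    """Power delay product expressed in units of 1e-18 J."""
    # 1 ps * 1 nW = 1e-21 J, so divide by 1000 to express the result in 1e-18 J.
    return delay_ps * power_nw / 1000.0


def energy_saving_percent(reference, proposed):
    """Relative PDP reduction of `proposed` with respect to `reference`."""
    ref = pdp_attojoule(*ADDERS[reference])
    new = pdp_attojoule(*ADDERS[proposed])
    return 100.0 * (ref - new) / ref


if __name__ == "__main__":
    for name, (delay, power) in ADDERS.items():
        print(f"{name}: PDP = {pdp_attojoule(delay, power):.1f} e-18 J")
    for ref in ("Ref. [117]", "Ref. [45]", "Ref. [168]"):
        print(f"Design 2 vs {ref}: {energy_saving_percent(ref, 'Design 2'):.1f}% saving")
```

Small differences from the quoted percentages can arise depending on whether the rounded PDP values from the table or the unrounded products are used.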
Figure 2.9 Layouts of the proposed full adders based on (a) Design 1,
(b) Design 2 and (c) Design 3
Sensitivity to Process Variation:
As device dimensions shrink with advancing technology, process variation
analysis of the circuits becomes necessary. Therefore, Monte Carlo
simulations are carried out to validate that the proposed designs are more
robust against global and local process variations than the existing designs.
The Monte Carlo simulation results for the power and delay distributions of
the full adders are given in Table 2.7.
The Monte Carlo results for the power distribution of the proposed and
existing full adders are listed in Table 2.7. From the obtained values, it is
observed that the adder discussed in [93] has the largest variation in power
distribution, whereas the proposed Design 2 full adder has the smallest.
Ranked from least to most sensitive to process variation, the simulated
adders are Design 2, Design 3, Design 1 and the adders discussed in [94],
[117], [45], [164], [168] and [93].
Table 2.7 Performance analysis of the full adders under process variation

Design       Delay (ps)   Power Consumption (nW)
Ref. [117]   56.5         978
Ref. [45]    45.9         2721
Ref. [168]   217.2        1677
Ref. [93]    44.2         1678
Ref. [164]   50.3         1746
Ref. [94]    77.7         2412
Design 1     44.4         930.2
Design 2     27.2         1145
Design 3     41.1         1146
The Monte Carlo simulation results for the delay distribution of the proposed
and existing full adders are also given in Table 2.7. Ranked from least to
most delay variation under process changes, the simulated designs are Design
2, the adders based on [164] and [117], Design 1, the adder explained in
[94], Design 3, and the adders given in [93], [45] and [168]. The delay
distribution shows that the full adder based on F1 and F2 gates [94] is more
sensitive to process variation than the CMOS based design. The full adder
based on [168] has the largest delay variation and Design 2 has the smallest.
It can be concluded that Design 2 has the highest immunity to process
variation in both delay and power distribution.
Each of the three proposed full adder designs has advantages as well as some
limitations. Design 1 is the best candidate for applications in which minimum
transistor count and low power are important design requirements. Design 2
provides the lowest PDP and the minimum delay, so it is suitable for battery
operated and real-time applications, although it has a slightly higher
transistor count than Design 1. Design 3 lies midway between Design 1 and
Design 2 and offers a lower delay than Design 1. From the obtained results,
it can be concluded that all three designs consume less energy than the
existing adders taken for comparison. Hence, these designs are suitable
candidates for realizing energy efficient arithmetic applications.
2.5 SUMMARY
In this chapter, the digital gates AND, OR, XOR and XNOR are designed with
low delay, low power consumption and small area with the help of full swing
GDI logic. Based on these gates, three full adder designs that use as few as
twenty transistors per bit are proposed. The designs adopt the proposed full
swing gates to alleviate the threshold voltage problem and to enhance the
driving capability for cascaded operation. The enhanced driving capability
also facilitates lower voltage and faster operation, which leads to lower
energy consumption. The proposed designs and the existing adder circuits are
simulated using the SPICE simulation tool at 45 nm technology. The comparison
is done in terms of power consumption, propagation delay, area and PDP. The
three proposed designs have lower energy consumption than the other designs
presented in the literature. The process variation of the circuits is studied
through Monte Carlo simulation. The Monte Carlo results show that the
proposed adder based on Design 2 can operate reliably and has higher
tolerance to process variation than the previously reported adders. Hence,
the proposed designs may be suitable for low energy and high speed VLSI
circuit applications.
CHAPTER 3
AREA AND ENERGY EFFICIENT 4-2 COMPRESSOR
DESIGN USING GDI LOGIC
3.1 INTRODUCTION
A fast multiplier is an essential component of any high performance system.
Compressors are the building blocks of fast tree multipliers. Various
compressor designs, such as the 4-2 compressor (42C), 5-2 and 7-3, have been
introduced to improve multiplier speed. Among them, the 42C is used in the
partial products reduction phase of the multiplier due to its regular
structure. From the study of various compressors, it is understood that the
42C has a better compression ratio and can be considered as a replacement for
the carry save adder, which is traditionally used in the partial products
reduction stage. Furthermore, the regular structure of the 42C decreases the
interconnection complexity in the existing Wallace and Dadda multipliers.
A straightforward realization of the 42C uses two cascaded full adders and
has a delay of 4 gates. To address this issue, dedicated carry generation
circuits have been introduced in 42Cs, and their architectures are well
explored. However, these 42C architectures exhibit hardware redundancy.
Moreover, the power consumption of the redundant gates is not negligible and
increases the overall power consumption of the 42C. This problem can be
addressed by removing the redundant gates, which is accomplished by
simplifying the Boolean expressions of the compressor outputs without
affecting their functionality. The spurious switching activities contributed
by the redundant gates are eliminated in the proposed 42C, thus minimizing
power consumption. Further, the new design is implemented using GDI logic.
The rest of the Chapter is organized as follows: Section 2 gives an overview
of the existing 42C designs, and Section 3 presents the proposed 42C and its
implementation using GDI logic. The simulation results and discussion of the
42C are given in Section 4 and, finally, the summary is drawn in Section 5.
3.2 RELATED WORKS OF 4-2 COMPRESSOR
Owing to its regular interconnection, the 42C plays an important role in
multiplier design. It receives five input bits of the same weight, x1, x2,
x3, x4 and ci, compresses them, and generates three output bits, namely s, co
and c. The output carry co is generated from the three inputs x1, x2 and x3
only, so there is no horizontal carry propagation across the compressor. The
block diagram and the base architecture of the 42C are shown in Figures 3.1
(a) and 3.1 (b), respectively. The fundamental equation that governs the 42C
operation is as follows:
$x_1 + x_2 + x_3 + x_4 + c_i = s + 2\,(c + c_o)$    (3.1)
Figure 3.1 42C (a) Block diagram and (b) Base architecture
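The base architecture of two cascaded full adders can be modelled behaviourally to confirm that the number of ones at the inputs is preserved in the weighted outputs, i.e. x1 + x2 + x3 + x4 + ci = s + 2(c + co). The sketch below assumes that the first full adder takes x1, x2 and x3 and the second takes the intermediate sum together with x4 and ci; the function names are illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum bit, carry bit) of three input bits."""
    total = a + b + c
    return total & 1, total >> 1


def compressor_42_base(x1, x2, x3, x4, ci):
    """Base 4-2 compressor: two cascaded full adders."""
    m, co = full_adder(x1, x2, x3)   # first stage: horizontal carry co
    s, c = full_adder(m, x4, ci)     # second stage: sum s and carry c
    return s, c, co


def conserves_weight():
    """Check x1 + x2 + x3 + x4 + ci == s + 2 * (c + co) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, c, co = compressor_42_base(*bits)
        if sum(bits) != s + 2 * (c + co):
            return False
    return True


def co_independent_of_ci():
    """The horizontal carry must not depend on x4 or ci."""
    return all(
        compressor_42_base(x1, x2, x3, 0, 0)[2] == compressor_42_base(x1, x2, x3, x4, ci)[2]
        for x1, x2, x3, x4, ci in product((0, 1), repeat=5)
    )


if __name__ == "__main__":
    print("weight conserved:", conserves_weight())
    print("co independent of x4 and ci:", co_independent_of_ci())
```

The second check reflects why compressors can be chained in a row without a ripple path: co of one compressor feeds ci of the next but never depends on its own ci.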
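The 42C functionality can be described by the following eqs. (3.2) - (3.4).
$s = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus c_i$    (3.2)
$c_o = (x_1 \oplus x_2)\,x_3 + \overline{(x_1 \oplus x_2)}\,x_1$    (3.3)
$c = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\,c_i + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\,x_4$    (3.4)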
Conventionally, the 42C is implemented by cascading two full adder cells, but
this has a long delay of 4 XOR gates. To reduce this latency, 42C variants
with dedicated carry generation circuits have been developed. The first
dedicated carry generation circuit in a 42C design was introduced by
Nagamatsu et al (1990). This design uses 3 XORs, 3 ANDs, 3 NORs and 1
inverter for carry computation and claims a significant delay reduction.
Despite this advantage, its transistor count is higher than that of the
conventional design. The gate count is reduced to 2 XORs, 1 NAND, 1 NOR, 1
MUX and 1 inverter in the 42C design discussed by Oklobdzija (1999).
Another realization of carry computation in the 42C, based on NAND and OR
gates, was designed by Hussin et al (2008) and requires 2 XORs, 3 NANDs, 1 OR
and 1 inverter. The XOR based intermediate output computation of the
previously discussed 42C designs is replaced with NOR and NAND gates in the
design discussed by Pishvaie et al (2013). The drawbacks of this 42C are
higher power consumption, due to spurious switching activities, and a higher
transistor count. The advantage of partially utilizing a CMOS full adder
along with gates while implementing the 42C architecture is discussed by
Pishvaie et al (2012). They also analyzed the performance of a 54 bit
multiplier using the designed compressor. Although the design gains
advantages in terms of speed and power consumption, it suffers from increased
area.
From the discussion on the 42C, it is understood that the conventional
designs exhibit hardware redundancy due to the use of separate circuits for
computing the 42C sum and carry outputs. Moreover, this redundant hardware
increases transistor count and power consumption. This problem can be
addressed by simplifying the Boolean expressions of the compressor outputs
without affecting their functionality. From the truth table of the 42C, it is
observed that the carry output is the same as the carry input if the XOR of
ci and x4 is low; otherwise, it follows the XOR of x1, x2 and x3, where x1,
x2, x3, x4 and ci are the 42C inputs. This feature is exploited in the
proposed 42C, which uses the partial sum output for carry computation. This
eliminates hardware duplication and thus reduces the overall transistor
count. In addition, the elimination of unnecessary circuits, which might be a
source of spurious switching activities, reduces the total power consumption
of the proposed 42C.
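The truth table observation can also be reached algebraically. As a sketch, assume the carry of the base architecture is the carry of its second full adder, i.e. the majority of the partial sum of x1, x2 and x3 with x4 and ci. The symbols M and N below are shorthand for the two partial XOR results.

```latex
\begin{align*}
M &= x_1 \oplus x_2 \oplus x_3, \qquad N = x_4 \oplus c_i, \qquad
c = M x_4 + M c_i + x_4 c_i \\
N = 0 &\;\Rightarrow\; x_4 = c_i \;\Rightarrow\; c = M c_i + M c_i + c_i = c_i \\
N = 1 &\;\Rightarrow\; x_4 = \overline{c_i} \;\Rightarrow\; x_4 c_i = 0,\; x_4 + c_i = 1
      \;\Rightarrow\; c = M (x_4 + c_i) = M \\
\text{hence}\quad c &= \overline{N}\, c_i + N\, M
\end{align*}
```

The last line is a 2-to-1 multiplexer selected by N, which is why the partial sum output can be reused for the carry without a separate carry generation circuit.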
To implement the building blocks of the new 42C, GDI logic is chosen. This
logic allows the proposed architecture to be implemented with lower power
consumption and a lower transistor count than the other logic styles used in
the existing 42C designs, namely CMOS, PTL and transmission gate logic.
Moreover, the existing designs prefer PTL or transmission gate logic due to
its lower transistor count, but its weak driving ability is considered a
drawback. On the whole, the proposed compressor offers advantages at both the
architecture level (a simpler structure) and the implementation level (it
mitigates the weak driving problem encountered in the existing 42C designs).
The proposed 42C and its implementation using full swing GDI logic are
explained in the forthcoming Section.
3.3 METHODOLOGY
This Section discusses the architecture and operation of proposed
42C followed by its implementation using full swing GDI logic.
3.3.1 Proposed 4-2 Compressor
The hardware duplication found in the carry computation of conventional 42C
designs is a drawback at both the architecture and implementation levels. It
can be reduced by sharing the partial result of the sum computation with the
carry output. In general, the sum output is obtained by XORing x1, x2, x3, x4
and ci serially. In the proposed 42C, however, the sum computation is
partitioned into two stages: one stage performs the XOR of x1, x2 and x3,
whereas the other performs the XOR of x4 and ci, where x1, x2, x3, x4 and ci
are the input bits. The outputs of the first and second stages are labeled M
and N, respectively, and are given in eqs. (3.5) and (3.6). Both stage
computations are performed in parallel.
Of the two intermediate outputs M and N, N acts as the select input for carry
computation. If the select input N is zero, the carry output is the same as
the carry input ci; otherwise, the carry output follows the value of M.
Further, the XOR of M and N gives the compressor sum output. The sum and
carry outputs are denoted s and c, respectively, whereas co denotes the
horizontal carry, which is obtained by multiplexing the inputs x1 and x3
depending on the XOR of x1 and x2. The outputs of the proposed 42C are
expressed in the following eqs. (3.5) - (3.9).
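$M = x_1 \oplus x_2 \oplus x_3$    (3.5)
$N = c_i \oplus x_4$    (3.6)
$s = M \oplus N$    (3.7)
$c_o = (x_1 \oplus x_2)\,x_3 + \overline{(x_1 \oplus x_2)}\,x_1$    (3.8)
$c = N\,M + \overline{N}\,c_i$    (3.9)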
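The proposed partitioning can be checked against the arithmetic definition of a 4-2 compressor. The following behavioural sketch follows the description above: M and N are computed in parallel, N selects between ci and M for the carry, and the XOR of x1 and x2 selects between x1 and x3 for the horizontal carry. The function names are illustrative and the model says nothing about the GDI implementation.

```python
from itertools import product


def mux(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def compressor_42_proposed(x1, x2, x3, x4, ci):
    """Behavioural model of the proposed 4-2 compressor."""
    m = x1 ^ x2 ^ x3            # first partial sum
    n = x4 ^ ci                 # second partial sum, computed in parallel
    s = m ^ n                   # compressor sum output
    c = mux(n, ci, m)           # carry: ci when N = 0, M when N = 1
    co = mux(x1 ^ x2, x1, x3)   # horizontal carry: x1 or x3
    return s, c, co


def is_valid_compressor():
    """Check x1 + x2 + x3 + x4 + ci == s + 2 * (c + co) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, c, co = compressor_42_proposed(*bits)
        if sum(bits) != s + 2 * (c + co):
            return False
    return True


if __name__ == "__main__":
    print("proposed 42C is arithmetically correct:", is_valid_compressor())
```

An exhaustive check over the 32 input patterns is sufficient here because the compressor is purely combinational.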
The architecture of the proposed 42C is shown in Figure 3.2.
Figure 3.2 Proposed 42C architecture
The implementation detail of the proposed 42C is explained in the following
subsection.
3.3.2 GDI Logic
The performance of the 42C is influenced by the performance of its basic
modules, namely XOR and MUX. The implementations of XOR and MUX using various
logic styles, namely CMOS, PTL and transmission gate, are well explored in
the literature. Abidi et al (2012) compared the performance of the 42C in
various logic styles and concluded that each implementation performs well in
one aspect while compromising others. The CMOS based implementation of the
42C discussed by Srinivas et al (2007) has good driving capability, but its
higher transistor count is considered a limitation.
GDI logic makes it possible to implement a 42C that has sufficient driving
capability and a reduced transistor count without increasing interconnection
complexity. Gates with full swing output are designed in GDI logic by placing
an additional PMOS or NMOS transistor at the output terminal, depending on
the voltage degradation (i.e. VDD-Vt or Vtp). Based on this technique, a set
of full swing GDI gates and adders was designed and explored in the previous
Chapter. Their simulation results show that these components exhibit better
performance in terms of delay, power consumption and area. Therefore, they
can be used to realize the proposed 42C and improve its performance. The XOR
and MUX based on GDI logic are shown in Figures 3.3 (a) and 3.3 (b),
respectively.
Figure 3.3 GDI logic based (a) XOR and (b) MUX
3.4 RESULTS AND DISCUSSION
In this Section, the simulation results of the proposed and existing 42Cs are
presented and their performance in terms of delay, power consumption and
layout area is compared. SPICE simulations have been performed at 45 nm
technology with a supply voltage (VDD) of 1.1 V. Typical transistor sizes,
i.e. (W/L)p = 240 nm/45 nm and (W/L)n = 120 nm/45 nm, are used. After the
simulation of each 42C is completed, its layout is generated and subjected to
DRC and then LVS check before the extraction of parasitics. Subsequently, the
extracted parasitic file is back annotated to perform the post layout
simulation. The simulation results of the 42Cs are given in Table 3.1.
Table 3.1 Performance comparison of the proposed 4-2 compressor with existing designs

S. No.   Design                 Delay (ps)   Power Consumption (µW)   Area (µm2)   PDP (e-18 J)
1        Ref. [112]             126          6.7                      56           844
2        Ref. [107]             175          8.3                      55           1452
3        Ref. [65]              137          6.9                      58           945
4        Proposed (This Work)   114          4.4                      51           502
Delay:
The delay is measured from 50% of the input voltage swing to 50% of the
output voltage swing for each transition. The maximum delay is treated as the
worst case delay. The simulated delays of all the 42C structures are given in
Table 3.1. As expected, the proposed 42C has a smaller delay than the
existing implementations. This is achieved through the parallel computation
of the intermediate outputs. On the other hand, the design discussed in [107]
has the highest delay, due to the complementary signals required by this
compressor architecture. The speed improvement obtained by the proposed 42C
over the 42Cs discussed in [107], [65] and [112] is 35%, 17% and 10%,
respectively.
Power Consumption:
While designing any system, the minimization of power consumption is given
prime importance. In general, the power consumption of a circuit is
determined by its switching activity and its node and wire capacitances. The
power consumed by the 42Cs is computed through simulation and is also
presented in Table 3.1. The results indicate that the architectures discussed
in [107] and [65] consume more power than the design in [112] and the
proposed 42C. The minimum power consumption is observed in the proposed 42C
owing to its simple and regular structure, whereas the architecture in [107]
consumes more power due to its dense wiring tracks.
Area:
The layout is drawn for all the existing and proposed 42Cs. The area is
evaluated from the layouts and is given in Table 3.1. The results show that
the proposed 42C has the smallest area, whereas the largest area in Table 3.1
belongs to the 42C discussed in [65]. As stated earlier in Section 3, GDI
logic implements XOR and MUX with a reduced transistor count, which makes the
area of the proposed 42C smaller. The layout of the proposed 42C is shown in
Figure 3.4. The area reduction achieved with the proposed 42C is about 9%
relative to the recently reported compressor in [112].
Figure 3.4 Layout of the proposed 42C
PDP:
The power delay products of the proposed and existing 42C designs are given
in Table 3.1. Among the compressors discussed, the proposed 42C has the best
PDP and the design discussed in [107] has the worst. The energy saving
accomplished with the proposed design is 41% relative to the compressor
reported in [112]. The PDP results show that the proposed design implemented
with GDI logic has a small PDP with acceptable speed, and hence it is a
suitable choice for partial products accumulation in the multiplier.
Sensitivity to Process Variation:
In order to evaluate the sensitivity of the designs to local and global
process variations, Monte Carlo simulations have been carried out and the
results are tabulated in Table 3.2.
Table 3.2 Performance analysis of 4-2 compressors under process variation

S. No.   Design                 Delay (ps)   Power Consumption (nW)   PDP (e-18 J)
1        Ref. [112]             129          6855                     884
2        Ref. [107]             184          8506                     1565
3        Ref. [65]              137          6811                     933
4        Proposed (This Work)   115          4410                     507
As expected, the proposed compressor design has better immunity to process
variation. Moreover, the pass transistor based design discussed in [107] is
more sensitive because its driving current depends on the process sensitive
Vt, an effect amplified by the voltage drops at internal nodes. The PDP
variation of the proposed design is 1%, whereas the existing compressor
explained in [112] shows nearly 5%.
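The quoted PDP variation can be read as the relative shift between the nominal PDP in Table 3.1 and the Monte Carlo PDP in Table 3.2. The sketch below assumes this interpretation; the thesis does not state the exact formula used, and the names are illustrative.

```python
# PDP in units of 1e-18 J: nominal values from Table 3.1,
# Monte Carlo values from Table 3.2.
NOMINAL_PDP = {"Ref. [112]": 844, "Ref. [107]": 1452, "Ref. [65]": 945, "Proposed": 502}
MONTE_CARLO_PDP = {"Ref. [112]": 884, "Ref. [107]": 1565, "Ref. [65]": 933, "Proposed": 507}


def relative_shift_percent(nominal, varied):
    """Signed relative change of `varied` with respect to `nominal`, in percent."""
    return 100.0 * (varied - nominal) / nominal


if __name__ == "__main__":
    for design, nominal in NOMINAL_PDP.items():
        shift = relative_shift_percent(nominal, MONTE_CARLO_PDP[design])
        print(f"{design}: PDP shift under process variation = {shift:+.1f}%")
```

A fuller analysis would compare the spread (for example, the standard deviation) of the Monte Carlo samples, which is not available from the tabulated values.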
3.5 SUMMARY
In this Chapter, a new approach for the design of the 4-2 compressor is
presented. It modifies the carry output implementation of the existing
compressor without affecting its functionality. The technique reuses the
partial output generated during sum computation to produce the carry output.
The carry output is the same as the carry input if the XOR of ci and x4 is low;
otherwise, it follows the XOR of the x1, x2 and x3 inputs. To accomplish this,
the design divides the computation of the sum into two stages that are
evaluated in parallel, and one part of the sum output acts as the select
input for the carry output. This modification eliminates the hardware
redundancy exhibited in the existing designs and so minimizes the transistor
count. Moreover, the spurious transitions from the duplicate gates are
avoided, which reduces the overall power consumption of the proposed 4-2
compressor significantly. Further, the performance of the 4-2 compressor is
improved by proper implementation of its building blocks, namely XOR and MUX.
The proposed and the existing 4-2 compressor designs are simulated using a
45 nm technology model and compared in terms of delay, power consumption,
area and PDP. The proposed design shows a 41% improvement in PDP over the
existing compressor reported in the literature. Hence, this area and energy
efficient compressor can be used as one of the building modules for the
implementation of multipliers in signal processing applications.
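The carry selection rule described in this summary can be checked exhaustively with a small behavioural model. The following Python sketch is illustrative only: the function names are hypothetical, and the cout expression (majority of x1, x2 and x3) is the usual textbook form, assumed here because this Section does not state it.

```python
from itertools import product


def compressor_4_2(x1, x2, x3, x4, ci):
    """Behavioural model of a 4-2 compressor; every argument is 0 or 1."""
    s1 = x1 ^ x2 ^ x3          # first partial sum
    sel = x4 ^ ci              # second partial sum, reused as the MUX select
    total = s1 ^ sel           # sum output
    carry = s1 if sel else ci  # carry = ci when x4 == ci, else x1 ^ x2 ^ x3
    # Assumption: textbook majority form for cout (not given in the text).
    cout = (x1 & x2) | (x2 & x3) | (x1 & x3)
    return total, carry, cout


def check_all_inputs():
    """True if sum + 2 * (carry + cout) equals the number of ones for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2(*bits)
        if total + 2 * (carry + cout) != sum(bits):
            return False
    return True
```

Under these assumptions the check passes for every input combination, which is consistent with the statement that the modified carry output leaves the compressor function unchanged.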
CHAPTER 4
PERFORMANCE IMPROVEMENT OF PARALLEL
ADDERS USING GDI LOGIC
4.1 INTRODUCTION
While the growth of the electronics market has driven the VLSI
industry towards very high integration density and system on chip designs, critical
concerns have arisen over the severe increase in power consumption and
area. High power consumption raises the temperature profile of the chip and
affects the overall performance of the system. Moreover, the explosive growth in
laptops and portable personal communication systems demands long battery
life at modest performance. This necessitates intensive research in low
power and low area integrated circuit design.
Parallel adders are developed to minimize the delay involved in
binary addition and are well suited for VLSI implementation. The
performance of these adders is strongly influenced by the performance of
their basic modules. In this Chapter, an efficient implementation of parallel
adders using GDI logic is discussed. The parallel adders under consideration
are the ripple carry adder (RCA), the carry select adder (CslA) and the carry
look ahead adder (CLA). Their basic modules are the full adder (for RCA), XOR
and AND gates (for CLA), and the full adder and MUX (for CslA). Therefore, these
basic modules are realized using GDI logic. The organization of the Chapter is
as follows: Section 2 gives an overview of the parallel adders and their
implementation using GDI logic. In Section 3, the simulation results and
discussion are given, and Section 4 summarizes the Chapter.
4.2 AN OVERVIEW OF PARALLEL ADDERS
A brief description of the parallel adders is given in the following
subsections.
4.2.1 Ripple Carry Adder
The RCA is an O(n) time and O(n) area adder, where n is the width
of the operands. The general n bit RCA architecture is shown in Figure 4.1. In the
worst case, a carry propagates from the least significant bit position to the
most significant bit position. Since the RCA is a chain of identical stages, the
single full adder determines its performance. Therefore, the delay of the RCA can
be decreased by using a fast full adder, and a full adder based on GDI logic is
chosen in this research work for that purpose.
Further, the carry propagation delay can be reduced by shortening the carry
propagation path or by pre computing the carries.
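The serial carry chain that sets the RCA delay can be illustrated with a short behavioural sketch. The Python code below is not the transistor level GDI design; the function names are hypothetical and the model only shows how each stage waits for the carry of the previous one.

```python
def full_adder(a, b, cin):
    """One RCA stage: returns (sum bit, carry out) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n, cin=0):
    """Add two n bit integers stage by stage; returns (n bit sum, carry out)."""
    carry = cin
    result = 0
    for i in range(n):  # stage i cannot finish before stage i - 1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, `ripple_carry_add(0b1111, 0b0001, 4)` exercises the worst case mentioned above, in which the carry travels from the least significant to the most significant stage.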
Figure 4.1 N bit RCA architecture
(Chain of N full adders (FA) with inputs A1 to AN, B1 to BN and carry in Ci, outputs S1 to SN and carry out Co; the critical path runs along the carry chain)
4.2.2 Carry Look Ahead Adder
CLAs have become popular due to their high speed and modularity.
They are O(log n) time and O(n log n) area adders. Consider the
addition of two n bit numbers A = a_{n-1}, a_{n-2}, a_{n-3}, ..., a_0 and B = b_{n-1}, b_{n-2}, b_{n-3}, ...,
b_0, resulting in the output sum S = S_{n-1}, S_{n-2}, ..., S_0 and carry out Cout.
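The first stage in the CLA computes the bit generate (G_i) and propagate
(P_i) signals as follows:
G_i = a_i \cdot b_i    (4.1)
P_i = a_i \oplus b_i    (4.2)
These are then utilized to compute the final sum (S_i) and carry (C_{i+1}) bits:
S_i = P_i \oplus C_i    (4.3)
C_{i+1} = G_i + P_i \cdot C_i    (4.4)
where 0 \le i \le n-1.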
The overall delay of a carry look ahead adder is dominated by the
delay of passing the carry through the look ahead stages. From the CLA architecture, it
is seen that its building blocks are XOR and AND gates, and the
CLA performance is determined by the performance of these basic gates.
Therefore, the CLA can be improved by implementing
its building blocks using GDI logic.
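The generate, propagate, sum and carry relations of this subsection can be exercised with a behavioural sketch. The Python code below uses the standard forms G_i = a_i AND b_i, P_i = a_i XOR b_i, S_i = P_i XOR C_i and C_{i+1} = G_i OR (P_i AND C_i); the function name is hypothetical. Note that the loop evaluates the carry recurrence one position at a time, whereas a hardware CLA expands the same recurrence into parallel look ahead logic.

```python
def cla_add(a, b, n, c0=0):
    """Add two n bit integers from generate/propagate signals; returns (sum, carry out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate bits
    c = [c0]
    for i in range(n):
        # Carry recurrence; hardware flattens this into look ahead terms.
        c.append(g[i] | (p[i] & c[i]))
    total = sum((p[i] ^ c[i]) << i for i in range(n))
    return total, c[n]
```

Only AND, XOR and OR operations appear in the model, which matches the observation that the CLA performance follows from the performance of its basic gates.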
4.2.3 Carry Select Adder
To minimize the delay due to carry propagation in the RCA, the
CslA was developed, in which two additions are performed in parallel, one
assuming Cin as 0 and the other assuming Cin as 1. When the actual carry is known, the
correct sum is selected. It is an O(2n) area and O(√n) time adder. The CslA
has been used in many computational systems to alleviate the problem of
carry propagation delay by independently generating multiple carries and then
selecting a final carry to generate the sum. However, the CslA is not area
efficient because it uses multiple pairs of RCAs to generate the intermediate sum
and carry for Cin = 0 and Cin = 1.
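The select operation of the conventional CslA can be sketched behaviourally as follows. This Python code is a minimal illustration, with hypothetical function names and a fixed block width of 4 bits chosen only for the example; it computes both candidate results for every block and lets the incoming carry pick one.

```python
def ripple_block(a, b, width, cin):
    """Ripple carry addition of one block; returns (block sum, block carry out)."""
    carry, out = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        out |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return out, carry


def carry_select_add(a, b, n, block=4):
    """Carry select addition of two n bit integers; returns (sum, carry out)."""
    result, carry = 0, 0
    for lo in range(0, n, block):
        width = min(block, n - lo)
        mask = (1 << width) - 1
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = ripple_block(xa, xb, width, 0)  # result assuming Cin = 0
        sum1, c1 = ripple_block(xa, xb, width, 1)  # result assuming Cin = 1
        chosen, carry = (sum1, c1) if carry else (sum0, c0)  # MUX on actual carry
        result |= chosen << lo
    return result, carry
```

The duplicated `ripple_block` call per block corresponds to the pair of RCAs that makes the conventional CslA area hungry.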
Different techniques for minimizing the use of dual RCAs in the
CslA have been attempted (Ramkumar and Kittur 2012; Mohanty and
Patel 2014). An interesting approach discussed by Ramkumar and Kittur
(2012) is the use of a Binary to Excess 1 Converter (BEC) instead of the RCA for
Cin = 1. The BEC based CslA involves fewer logic resources than the
conventional CslA. Area reduction is also possible in the CslA by
sharing the common Boolean logic expression for Cin = 0 and Cin = 1
(Youngjoom Kim and Lee-Sup Kim 2001). Though this technique requires
fewer logic resources than the BEC based CslA, its carry propagation delay
is longer. Further, the CslA design is simplified through logic
reformulation and optimization of the carry generator module, as explained
by Mohanty and Patel (2014). This design has smaller area and delay
than the conventional CslA design. However, the performance of the CslA
can still be improved by proper implementation of its basic modules, such
as the MUX and full adder. Therefore, the CslA is implemented using the
proposed designs discussed in Chapter 2 of this thesis.
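The idea behind the BEC based CslA, deriving the Cin = 1 result from the Cin = 0 result by adding one, can be sketched as below. The Python code is an assumption-laden illustration rather than the cited design: the function names are hypothetical, the XOR/AND chain is the usual BEC logic, and the plain `a + b` merely stands in for the RCA that handles Cin = 0.

```python
def bec(value, width):
    """Binary to excess-1 conversion of a width bit value (result modulo 2**width)."""
    out, all_ones_below = 0, 1
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ all_ones_below) << i  # flip while all lower bits are 1
        all_ones_below &= bit
    return out


def bec_select_block(a, b, width, cin):
    """One CslA block: the Cin = 1 candidate comes from a BEC, not a second RCA."""
    v0 = a + b                 # stand-in for the Cin = 0 RCA (sum and carry bits)
    v1 = bec(v0, width + 1)    # Cin = 1 candidate, width + 1 bits wide
    v = v1 if cin else v0      # MUX controlled by the actual carry in
    return v & ((1 << width) - 1), v >> width
```

Because the BEC needs only one XOR and one AND per bit in this form, it uses less logic than a second ripple carry block, which is the saving referred to above.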
4.3 RESULTS AND DISCUSSION
In this Section, the simulation results of the parallel adders based on
CMOS, GDI and the proposed designs are presented and their performance is
compared. The performance metrics considered in the evaluation of these adders
are area, delay, power consumption and PDP.
SPICE simulations are performed at 45 nm technology with a supply voltage
(VDD) of 1.1 V. Typical transistor sizes, i.e., (W/L)n = 120 nm/45 nm and
(W/L)p = 240 nm/45 nm, are used. After the simulation of the
parallel adders, the layout is generated for each of them and is subjected to
DRC and then LVS checks before the extraction of parasitics. Subsequently, the
extracted parasitic file is back annotated to perform the post layout simulation.
Delay:
The delay is measured from 50% of the
input voltage swing to 50% of the output voltage swing on each transition,
and the maximum delay is treated as the worst case delay. The delays obtained
through simulation for all the adder structures are given in Figure 4.2 (a). As
expected, the CLA structures have smaller delay than the other
four adders due to the parallel computation of their carries. On the other hand,
the RCA has the highest delay due to its serial structure. However, the RCA
implemented with the proposed adder, discussed in Chapter 2 of this
thesis, shows 12% and 6% speed improvement over the CMOS and GDI
adders, respectively. The critical path delay of the CslA is smaller than that of
the RCA because carry propagation is skipped. The implementation of the CslAs
discussed in [172], [125] and [89] with the proposed gates achieves
delay reductions of 15%, 27% and 20%, respectively, over the CMOS based
implementation of those adders.
Power Consumption:
Power is one of the vital resources; hence, major attention is paid to
minimizing the power consumption while designing the system. It mainly depends
on the switching activity and on the node and wire capacitances. The power
consumed by the parallel adders is computed through simulation and
given in Figure 4.2 (b). The results indicate that the CLA and CslA have higher
power consumption than the RCA. The minimum power consumption is
observed in the RCA owing to its simple and regular structure, while the CLA
consumes more power due to its dense wiring tracks. However, the power
consumption of the CLA based on the proposed gates is 30% lower than that of the
CMOS based design.
Figure 4.2 Performance comparison of parallel adders (a) Delay (b)
Power Consumption (c) Area and (d) PDP
(Bar charts for the CMOS (Ref. [172]), GDI (Ref. [94]) and Proposed (This Work) implementations of RCA (Ref. [27]), Conventional CslA (Ref. [172]), BEC CslA (Ref. [125]), Modified CslA (Ref. [89]) and CLA (Ref. [94]); vertical axes: (a) Delay (ps), (b) Power Consumption (µW), (c) Area (µm2) and (d) PDP (e-15 J))
Area:
The layout is drawn for all the implemented adders. The area is
evaluated from the layouts and is depicted in Figure 4.2 (c). From the
obtained results, it is seen that the proposed RCA has the smallest area,
whereas the CMOS based CLA has the largest. The single full
adder realized using the proposed design has less area than its CMOS or GDI
counterparts (discussed in Chapter 2), which explains the smaller overall area of
the RCA. The layout of the RCA using the proposed full adder is given
in Figure 4.3. The area saving obtained with the proposed design is 53% and
33% over the CMOS and GDI based designs, respectively. Likewise, the realization
of the CslA design discussed in [89] using the proposed gates and adder saves 39%
more area than its GDI based realization. It is noted that the area saving
attained in the CslA discussed in [89] is greater than in the other designs under
consideration. Since the proposed gates eliminate the redundant transistors
present in the existing designs, the area is reduced considerably.
The area saving obtained for the conventional CslA using the proposed gates and
adder is 49% and 31% over the CMOS and GDI based implementations, respectively. The
corresponding layouts are given in Figure 4.4. Likewise, in the CLA, the
area reduction obtained with the proposed gates is 17% and
13% compared with CMOS and GDI logic, respectively. The layout of the CLA
using the proposed gates is shown in Figure 4.5.
Figure 4.4 Proposed gates based 32 bit CslA adder layout (a)
Conventional (Ref. [172]) (b) BEC based (Ref. [125]) and (c)
Modified (Ref. [89])
PDP:
The power delay product of the parallel adders using CMOS, GDI
and the proposed gates is given in Figure 4.2 (d). Among the adders discussed, the best
and the worst PDP belong to the proposed gates based modified CslA [89] and the
CMOS based conventional CslA, respectively. However, the PDP of the
conventional CslA is reduced with the help of the proposed gates by 45% and 43%
compared with CMOS and GDI, respectively. Similarly, the proposed gates and adder
based CLA and RCA operate with 40% and 21% lower PDP, respectively,
than the CMOS based realizations of the same designs. The PDP results also
show that the CslA implemented using the
proposed gates has a small PDP at an acceptable speed; hence, it is a
suitable choice for high performance and low power applications.
Sensitivity to Process Variation:
In order to evaluate the sensitivity of the designs to local and global
process variations, Monte Carlo simulations have been carried out for the parallel
adders. The variations in power consumption, delay and PDP with respect to
the process variations are depicted in Figure 4.6. As expected, the proposed
parallel adders have better immunity to process variation than the
others.
Figure 4.6 Performance analysis of parallel adders under process
variation (a) Delay (b) Power Consumption and (c) PDP
(Bar charts for the CMOS (Ref. [172]), GDI (Ref. [94]) and Proposed (This Work) implementations of RCA (Ref. [27]), Conventional CslA (Ref. [172]), BEC CslA (Ref. [125]), Modified CslA (Ref. [89]) and CLA (Ref. [94]); vertical axes: (a) Delay (ps), (b) Power Consumption (µW) and (c) PDP (e-15 J))
The performance improvement of parallel adder structures such as
RCA, CslA and CLA with the help of the proposed gates and adder is attempted.
It is observed that the proposed gates based RCA and CLA show speed
improvements of 12% and 14%, respectively, over CMOS logic. Likewise, the
area reduction achieved in the RCA, Modified CslA and CLA based on the
proposed gates is 53%, 55% and 28%, respectively, compared with CMOS logic.
Among the parallel adders, the proposed gates based Modified CslA shows
a 43% improvement in PDP over its existing CMOS
implementation. From the discussion of the performance improvement of the
various parallel adders based on the proposed gates in GDI logic, the CslA adders
show a greater improvement in PDP than the RCA and CLA adders.
Therefore, they can be used in the realization of multipliers for better
performance.
4.4 SUMMARY
The existing implementations of parallel adders fall short in terms of area
and delay because their base components, such as AND, XOR and adder,
require a large transistor count. To overcome these
drawbacks, the proposed gates and adder using GDI logic are employed in the
realization of parallel adders. Further, the performance of the parallel adders
is analyzed using SPICE simulation with 45 nm technology models. The
delay and power consumption of the parallel
adders are measured from the simulation results, and the area is
measured from the corresponding layouts. From the obtained results, it is
understood that the implemented parallel adders have smaller power delay
product and area than the other designs found in the literature.
CHAPTER 5
AREA AND ENERGY EFFICIENT VEDIC MULTIPLIER
IMPLEMENTATION
5.1 INTRODUCTION
Designing a multiplier with high speed, low power and a compact layout
is of prime importance. This thesis presents a high speed digital
multiplier that combines the Vedic multiplication algorithm with a low
power design technique called GDI logic. Vedic mathematics is an ancient
Indian system of mathematics derived from the Vedic sutras. It was
rediscovered in the early twentieth century by Maharaja (2001). UT is one of
the Vedic sutras; it literally means vertically and crosswise and is used to
perform multiplication. In this multiplication process, the partial products
are accumulated at every step, as opposed to the conventional multiplication
schemes. Therefore, the speed of this multiplier can be improved by reducing
its partial product accumulation delay. This is attempted in the proposed 8 bit
multiplier, which computes the output in two stages. In the first
stage, each group of n partial product bits is added using an n bit adder.
After each addition, the sum and carry outputs are computed and
passed to the second stage. It is noted that carry free addition is performed in this
stage. Also, the output bits from the first stage, including sum and carry,
do not exceed five bits. Therefore, they can be processed efficiently
using a 42C rather than the full adder used in the existing
scheme.
A number of interesting methods for realizing a UT multiplier have
been introduced in the last several decades. Several field programmable gate
array realizations of the Vedic multiplier are discussed in the literature by Karthik
et al (2012) and Zulhemi et al (2013). They claimed that this multiplier
minimizes the area and improves the speed compared with conventional
multipliers. Improvement of the multiplier performance in respect of
power, delay and area has also been attempted by various researchers. Pushpangadan
et al (2009) introduced a way of developing a higher order multiplier from a
smaller one using Vedic multiplication. This was followed by Saha et al
(2011), who implemented a 32x32 Vedic multiplier and compared its
performance with the Booth radix-4 multiplier. Tiwari et al (2008) discussed
delay reduction in the Vedic multiplier with the help of a carry look ahead adder.
Their results indicate that the delay and area of the Vedic multiplier are
smaller than those of array and Booth multipliers.
The introduction of the pipeline technique to increase the speed was
discussed by Kunchigi et al (2012). The Vedic multiplier designed with the
Read Only Memory (ROM) based approach, explained by Sriraman et al
(2012), offers a significant improvement in speed and power dissipation
compared with conventional multipliers. However, this performance depends
on the reading process of the ROM and the subtraction operation. Moreover, the
required area becomes large for multipliers wider than 8 bits. Deepa et al
(2013) improved the multiplier speed by a folding and retiming mechanism at
the expense of more area. Vedic multipliers based on the carry skip adder
proposed by Senapati et al (2012) offer a significant improvement in speed
compared with standard multiplier architectures. The power consumption of
Vedic multipliers is minimized by implementing them in reversible
(Chanda et al (2013)) and adiabatic logic (Gupta et al (2012)) at the expense
of an increase in latency.
The Vedic multipliers presented in the existing literature need
further improvement in terms of area, power and speed. This is addressed in
the proposed multiplier. Its delay is reduced by
deploying the 42C in the partial product addition, and its GDI logic based
realization minimizes the power consumption and area. As a whole,
the proposed multiplier improves the overall performance. The rest of the
Chapter is organized as follows: An overview of UT multiplication is
briefly given in Section 2. The proposed multiplier is explained in Section 3.
In Section 4, results and discussion are given, and Section 5 concludes the
Chapter.
5.2 AN OVERVIEW OF URDHVA TRIYAGBHYAM
MULTIPLICATION SCHEME
Urdhva Triyagbhyam (UT) is one of the Vedic sutras; it
literally means vertically and crosswise and is used to perform the
multiplication operation. This method requires AND gates, half adders and full
adders for carrying out the multiplication. It is noted that the partial
products are generated in parallel and become available prior to the actual
addition, thus saving processing time. It is well suited for the multiplication of
both decimal and binary numbers. The mathematical background of the UT
algorithm for decimal and binary number multiplication is given below:
5.2.1 UT Algorithm for Decimal Number System
Let X and Y be the two numbers to be multiplied. Mathematically, X and
Y can be represented as
(5.1)
(5.2)
(5.3)
(5.4)
where the digits of X and Y take values from 0, 1, 2, ..., 9 and k may be any integer.
In the case of 2x2 digit multiplication, the two inputs are X and Y, each
having two digits, X1, X0 and Y1, Y0, providing four output digits P3P2P1P0 as a
result of vertical and crosswise multiplication and addition. The sequential steps
involved in the multiplication procedure are as follows:
Step 1: The multiplication process starts from the lower digits, i.e. X0, Y0, and moves
towards the higher digits X1, Y1.
Step 2: The first digit of the product, i.e. P0, is obtained by vertical multiplication
of X0 and Y0.
Step 3: For computing P1, perform crosswise multiplication of X1, Y0 and X0, Y1
to get the partial products. Then add these
products; the sum digit is retained as P1. If the addition produces a carry,
it is moved to the subsequent stage.
Step 4: P2 is obtained from the vertical product of X1 and Y1, provided that
there is no carry from Step 3. Otherwise, P2 is obtained after
adding the carry to the vertical product. Any carry produced by this
addition acts as P3.
As an example, the multiplication of 23 and 22 based on the UT method
is illustrated as a line diagram in Figure 5.1.
The multiplier inputs 23 and 22 are taken as X and Y, with digits X1
X0 equal to 2 and 3, and Y1 Y0 equal to 2 and 2, respectively.
Figure 5.1 Multiplication of 2x2 decimal number using UT algorithm
The concatenation of the individual product digits P2P1P0 constitutes the
final product P. Thus, the product is 506.
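The four steps of the two digit procedure can be followed in a short Python sketch. It is illustrative only: the function name is hypothetical, and the code also forwards a carry out of the first vertical product, which the steps do not spell out but which is needed whenever X0 times Y0 exceeds 9.

```python
def ut_multiply_2digit(x, y):
    """Vertical and crosswise product of two 2 digit decimal numbers.

    Returns the product digits (P3, P2, P1, P0).
    """
    x1, x0 = divmod(x, 10)
    y1, y0 = divmod(y, 10)
    carry, p0 = divmod(x0 * y0, 10)                    # vertical: X0 * Y0
    carry, p1 = divmod(x1 * y0 + x0 * y1 + carry, 10)  # crosswise products
    p3, p2 = divmod(x1 * y1 + carry, 10)               # vertical: X1 * Y1
    return p3, p2, p1, p0
```

For the worked example, `ut_multiply_2digit(23, 22)` gives P0 = 6, a crosswise sum of 10 (so P1 = 0 with a carry of 1) and P2 = 4 + 1 = 5, that is, the digits of 506 with P3 = 0.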
5.2.2 UT Algorithm for Binary Number System
The UT technique can also be extended to the binary number system,
where it is found to work accurately. The partial product bits are generated in
a single step, which minimizes the delay associated with the multiplier. To
understand the multiplication operation, consider the multiplicand and
multiplier, represented as X and Y, each having N bits. These two inputs
can be described in mathematical form as
X = \sum_{i=0}^{N-1} x_i 2^i    (5.4)
Y = \sum_{j=0}^{N-1} y_j 2^j    (5.5)
where x_i and y_j are binary digits that take the value 0 or 1. The product
of the two N bit numbers can be expressed as P, where
P = X \cdot Y = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_i y_j 2^{i+j}    (5.6)
The multiplication procedure explained for the decimal number system
also holds for the binary number system.
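The binary form of the vertical and crosswise procedure can be sketched in Python as a column by column accumulation, where column k collects every AND term x_i y_j with i + j = k. The function name and the integer based bit handling are assumptions made for illustration; the sketch models the arithmetic, not the gate level circuit.

```python
def ut_multiply_binary(x, y, n):
    """Multiply two n bit integers column by column (vertically and crosswise)."""
    product, carry = 0, 0
    for k in range(2 * n - 1):
        column = carry
        # All partial products x_i * y_(k - i) that fall into column k.
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += ((x >> i) & 1) & ((y >> (k - i)) & 1)
        product |= (column & 1) << k  # retained product bit P_k
        carry = column >> 1           # forwarded to the next column
    product |= carry << (2 * n - 1)   # remaining carry forms the top bit
    return product
```

All AND terms of a column are independent of each other, which is why they can be generated in parallel; only the carry links one column to the next.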
The advantage of this technique is that all the individual products
are computed in parallel, which increases the speed of operation. However, the delay
increases with the number of input bits, which is a limitation
of this procedure. Hence, although the method is valid for all cases of
multiplication, it is best suited to small input widths. To mitigate this issue,
a multiplier is proposed that, without increasing wiring complexity or hardware,
uses a compressor tree to add more than three bits at
a time. Among the various compressors available in the literature,
the 42C is chosen for this multiplier as it is simple and has a regular
interconnection pattern. The proposed multiplier and its
implementation are explained in the following Section.
5.3 PROPOSED MULTIPLIER
The block diagram of the proposed Vedic multiplier is depicted in
Figure 5.2. The multiplicand and multiplier inputs are represented as X and Y,
respectively, and the output is represented as P. The products Xi.Yj are the
multiplier partial products, where i and j range from 0 to n-1 and n is the
multiplier width. The partial products are generated by n^2
AND gates for n bit input operands. The multiplier output is generated from
these partial products in two stages. In the first stage, the partial products
are added with the help of full adders. Moreover, the parallel addition of
partial products eliminates carry propagation in this stage. This is the
advantage of the proposed multiplication technique, whereas in the conventional
scheme carry propagation is allowed, which slows down the multiplication.
This is illustrated in Figure 5.2 by assigning individual blocks to the
sum and carry outputs to indicate that they are computed in parallel. These
sum and carry outputs become the inputs for the second stage computation.
Figure 5.2 Block diagram representation of the proposed Vedic multiplier
(Block diagram: the n bit Multiplicand (X) and Multiplier (Y) feed the partial products generation using AND gates, which gives P0 (X0.Y0) directly; first stage: Adder, producing Sum outputs and Carry outputs; second stage: Adder and 4-2 Compressor, producing the final product bits (P2n-1,…,P1))
Figure 5.3 Internal architecture of the proposed Vedic multiplier (a)
First stage and (b) Second stage
(a) First stage adders with their partial product inputs and their sum and carry outputs:
Adder   Inputs                                           Outputs
2 bit   x1y0, x0y1                                       P1, C1
3 bit   x2y0, x0y2, x1y1                                 S2, C2
4 bit   x3y0, x0y3, x2y1, x1y2                           S3, C3, C4
5 bit   x4y0, x0y4, x3y1, x1y3, x2y2                     S4, C5, C6
6 bit   x5y0, x0y5, x4y1, x1y4, x3y2, x2y3               S5, C7, C8
7 bit   x6y0, x0y6, x5y1, x1y5, x4y2, x2y4, x3y3         S6, C9, C10
8 bit   x7y0, x0y7, x6y1, x1y6, x5y2, x2y5, x4y3, x3y4   S7, C11, C12, C13
7 bit   x7y1, x1y7, x6y2, x2y6, x5y3, x3y5, x4y4         S8, C14, C15
6 bit   x7y2, x2y7, x6y3, x3y6, x5y4, x4y5               S9, C16, C17
5 bit   x7y3, x3y7, x6y4, x4y6, x5y5                     S10, C18, C19
4 bit   x7y4, x4y7, x6y5, x5y6                           S11, C20, C21
3 bit   x7y5, x5y7, x6y6                                 S12, C22
2 bit   x7y6, x6y7                                       S13, C23
(b) Second stage: 2 bit and 3 bit adders produce P2 to P4 and P15, and 4-2 compressors (4-2 C) produce P5 to P14 from the sum outputs S2 to S14 and the carry outputs of the first stage.
The inputs of the first stage are the partial products
generated based on Vedic multiplication. They are accumulated
directly, using an n bit adder for n partial product bits, and the sum and carry
outputs are represented as S and C, respectively. They act as the inputs for the
second stage, which comprises adders and 42Cs to perform the addition task.
The arrangement of 42Cs in the final addition process increases the
multiplier speed. In the second stage, full adders and 42Cs are deployed, from
which the final multiplier output is obtained. The full adder is used where
up to three bits are added, and the 42C where more than three bits are added. Thus,
the use of full adders for the addition of more than three bits is
eliminated with the help of the 42C. Due to the use of the 42C, carry free addition is
performed in the second stage, increasing the multiplier speed. In addition,
the regular interconnection pattern of the compressor minimizes the
multiplier interconnection complexity. The internal architecture details of the
first and second stages of the proposed multiplier are given in Figures 5.3 (a)
and 5.3 (b), respectively. The notation 4-2 C in Figure 5.3 (b) represents the
42C.
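The two stage organization can be modelled behaviourally to see why a five input compressor suffices in the second stage. The Python sketch below rests on assumptions: the function names are hypothetical, each first stage adder is modelled as a counter of the ones in its column, and the second stage is modelled as plain addition of the weighted outputs rather than as the adder and 42C network itself.

```python
def first_stage(x, y, n=8):
    """Count the ones in every partial product column (the n bit adders)."""
    counts = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            counts[i + j] += ((x >> i) & 1) & ((y >> j) & 1)
    return counts


def second_stage(counts):
    """Add the first stage outputs; bit b of column k has weight 2**(k + b)."""
    return sum(count << k for k, count in enumerate(counts))


def max_second_stage_inputs(n=8):
    """Largest number of first stage output bits that share one weight."""
    widths = [(min(k, 2 * n - 2 - k) + 1).bit_length() for k in range(2 * n - 1)]
    fan_in = [0] * (4 * n)
    for k, width in enumerate(widths):
        for b in range(width):  # sum bit (b = 0) and carry bits (b > 0)
            fan_in[k + b] += 1
    return max(fan_in)
```

In this model, `max_second_stage_inputs(8)` evaluates to 4, reached at the weight of P10. Together with one carry in from the neighbouring position, that gives five bits, which is the input count of a 42C.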
5.4 RESULTS AND DISCUSSION
In this Section, the simulation results of both the proposed and the
existing multipliers are presented. The performance parameters, namely area,
delay, power consumption and PDP, are evaluated through
SPICE simulation at 45 nm technology with a supply voltage (VDD) of
1.1 V. The simulation results of the multipliers in respect of delay, power
consumption and layout area are given in Table 5.1. After the
simulation of the multipliers, the layout is generated for each of them and
subjected to DRC and then LVS checks before the extraction of parasitics.
Subsequently, the extracted parasitic file is back annotated to perform the post
layout simulation.
Table 5.1 Performance comparison of 8 bit proposed multiplier with
existing designs
S. No.  Multiplier            Delay (ps)  Power Consumption (µW)  Area (µm2)  PDP (e-15 J)
1       Ref. [118]            552         83                      2415        45.8
2       Ref. [74]             465         78                      1678        36.2
3       Proposed (This Work)  432         68                      1164        29.3
Delay:
The delay is calculated from the 50% voltage level of the
input to the 50% voltage level of the resulting output for each transition,
and the maximum delay is taken as the worst case delay. In the proposed multiplier,
carry propagation is eliminated during the partial product addition, thus
reducing the delay significantly. The speed improvement obtained by the
proposed multiplier is 22% over the multiplier discussed in [118].
Power Consumption:
The power consumed by the multipliers is computed through
simulation and given in Table 5.1. It is observed from the results that the
proposed multiplier design has lower power consumption than the existing
designs. This is due to the implementation of its building components, namely the
AND gate, full adder and 42C, using the proposed designs, which considerably reduces the
multiplier transistor count and spurious transitions, thus reducing
the overall power consumption. The power saving accomplished in the
proposed design is 13% compared with the Vedic multiplier discussed in
[74]. Also, it is noted that the use of the multichannel technique in [74] minimizes the
power consumption compared with the multiplier discussed in [118]. However,
the variation of threshold voltage necessitated by the multichannel process
is a difficult task during fabrication.
Area:
The layouts are drawn for all the simulated multipliers and the area is
calculated from them. The values are given in Table 5.1. From the obtained
results, it is observed that the proposed multiplier has 31% less area
than the recently reported Vedic multiplier in [74]. The layout of the
proposed Vedic multiplier is given in Figure 5.4.
PDP:
The power delay product of the proposed and existing multiplier
designs is given in Table 5.1. The power consumption is reduced
considerably by implementing the proposed multiplier using GDI logic, and
the delay is also reduced. Hence, the energy (or power
delay product) saving accomplished with the proposed design is 36% compared with
the multiplier discussed in [118].
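The percentages quoted in this Section follow directly from the Table 5.1 entries. The short Python sketch below reproduces the arithmetic; the function names are hypothetical, and the only inputs are the delay, power, area and PDP values listed in the table.

```python
def pdp_fj(delay_ps, power_uw):
    """Power delay product in units of 1e-15 J.

    ps * uW = 1e-12 s * 1e-6 W = 1e-18 J, so the product is divided by 1000.
    """
    return delay_ps * power_uw / 1000.0


def saving_percent(reference, proposed):
    """Relative reduction of the proposed value with respect to the reference."""
    return 100.0 * (reference - proposed) / reference
```

With the tabulated values, `pdp_fj(552, 83)` is about 45.8, `saving_percent(552, 432)` is about 22 (delay against [118]), `saving_percent(78, 68)` about 13 (power against [74]), `saving_percent(1678, 1164)` about 31 (area against [74]) and `saving_percent(45.8, 29.3)` about 36 (PDP against [118]).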
Sensitivity to Process Variation:
A study of circuit performance under local and global process
variations is carried out through Monte Carlo simulations, and the results are
tabulated in Table 5.2. It is observed that the proposed multiplier has better
immunity to process variation. Moreover, the design based on the multichannel
technique, discussed in [74], is more sensitive because its driving current
depends on the process sensitive Vt, an effect amplified by the voltage
drops at internal nodes.
Table 5.2 Performance analysis of multipliers under process variation
S. No.  Multiplier            Delay (ps)  Power Consumption (µW)  PDP (e-15 J)
1       Ref. [118]            569         87.2                    49.6
2       Ref. [74]             484         81.5                    39.4
3       Proposed (This Work)  441         69.2                    30.5
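The spread caused by process variation can be estimated by comparing Table 5.2 with Table 5.1. The Python sketch below assumes that the two tables report the same designs under Monte Carlo and nominal conditions, respectively, so that their entries are directly comparable; the dictionary and function names are hypothetical.

```python
# (delay in ps, power in uW) taken from Table 5.1 (nominal) and Table 5.2 (Monte Carlo)
NOMINAL = {"Ref. [118]": (552, 83), "Ref. [74]": (465, 78), "Proposed": (432, 68)}
VARIED = {"Ref. [118]": (569, 87.2), "Ref. [74]": (484, 81.5), "Proposed": (441, 69.2)}


def variation_percent(nominal_value, varied_value):
    """Relative change of a metric under process variation, in percent."""
    return 100.0 * (varied_value - nominal_value) / nominal_value


def delay_and_power_variation(design):
    """Return (delay variation %, power variation %) for one design."""
    (d0, p0), (d1, p1) = NOMINAL[design], VARIED[design]
    return variation_percent(d0, d1), variation_percent(p0, p1)
```

Under this assumption, the proposed multiplier shows about 2.1% delay variation and about 1.8% power variation, while the design of [118] shows about 3.1% delay variation.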
5.5 SUMMARY
This thesis presents an approach to the design of a Vedic multiplier that
improves its speed by deploying the 42C in its architecture
without increasing area. Due to the presence of the 42C in this multiplier
architecture, the number of full adders is reduced compared with the existing
design. Though the existing and proposed Vedic multipliers have the same
number of stages to perform the multiplication operation, the delay of the
proposed multiplier is reduced by generating intermediate carries that are
independent of the carry inputs. Also, the basic components of the multiplier, such
as the AND gate, adder and 42C, have been implemented using the proposed designs.
The proposed and the existing multiplier designs are simulated using a 45 nm
technology model and compared in terms of delay, power
consumption, area and PDP. The proposed design shows a 36%
improvement in power delay product over the existing multiplier
reported in the literature. The effect of process variation on multiplier
performance has been analysed through Monte Carlo simulations. From the
obtained results, it is concluded that the proposed multiplier shows about 2%
performance variation. Hence, this fast, energy efficient multiplier can be used as
one of the building modules for the realization of real time signal processing
applications.
CHAPTER 6
HIERARCHY MULTIPLIER ARCHITECTURE BASED ON
VEDIC MATHEMATICS AND GDI LOGIC
6.1 INTRODUCTION
The hierarchy multiplier is attractive because of its ability to carry out the
multiplication operation within one clock cycle. The existing hierarchical
multipliers occupy more area and also result in more delay. Therefore, in this
Chapter, a method to reduce the computation delay of the hierarchy multiplier
by employing CslA and BEC is proposed. The use of BEC eliminates the
number of adders, existing in the conventional addition scheme, where n
denotes the multiplier input width. As the area of the hierarchy multiplier is
determined by its base multiplier, the base multiplier is realized with the
proposed Vedic multiplier, which has small area and operates with less delay
than the conventional multipliers. In addition, the power
consumption of the hierarchy multiplier is reduced by implementing the
designed multiplier using GDI logic.
In general, to design an n bit hierarchical multiplier, four n/2 bit base
multipliers are necessary, which together generate a 2n bit output, where n
represents the hierarchical multiplier input width. All the base
multipliers perform their task in parallel. Consequently, the performance
of the hierarchy multiplier is determined by the delay of accumulating the
output bits of its base multipliers. This accumulation is time consuming,
as it requires a large number of additions, and it is considered a
bottleneck for hierarchy multiplier performance. In this work, an approach
to perform this accumulation with fewer additions is proposed. The
following are the contributions discussed in the chapter:
(i) For the area and delay efficient implementation of the base
multiplier, the Vedic multiplier (discussed in the previous chapter) is
considered
(ii) To reduce the accumulation delay of base multiplier output
bits, CslA and BEC are introduced
(iii) To realize the hierarchy multiplier with small area, GDI logic
is chosen
The rest of the chapter is organized as follows: An overview of the
hierarchy multiplier is given in Section 6.2. Section 6.3 explains the
proposed hierarchy multiplier and the implementation of its building
components, namely the base multiplier, CslA and BEC. The simulation
results and discussion are given in Section 6.4 and, finally, Section 6.5
summarizes the chapter.
6.2 AN OVERVIEW OF HIERARCHY MULTIPLIER
A hierarchy multiplier is significant because of its ability to carry out
the multiply operation within one clock cycle. The major concern in
designing such a multiplier is to minimize the overhead, in terms of
circuit footprint, power consumption and computational delay, that is
required to achieve reconfigurability. The basic hierarchical topology for
large width multiplication is given in Figure 6.1. After hierarchical
decomposition, this scheme needs a set of base multipliers. For this, the
high performance and resource efficiency of hardware multipliers based on
Vedic mathematics are exploited in the proposed hierarchy multiplier
design.
Given two n bit unsigned binary numbers X and Y, the conventional
principle for calculating X * Y with n/2 * n/2 multipliers can be
expressed as
P = X * Y = (XH.XL) * (YH.YL) (6.1)
where XH, YH and XL, YL represent the higher and lower order input bits of
X and Y, respectively. Eq. (6.1) suggests that an n * n multiplication can
be carried out in two steps, as depicted in Figure 6.1.
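For reference, Eq. (6.1) can be expanded as below. This is the standard positional identity, written under the assumption that XH and XL are the upper and lower n/2 bits of X (and likewise for Y); it is not a formula quoted from the thesis. It shows the four n/2 * n/2 products that the base multipliers compute.

```latex
\begin{aligned}
X &= X_H\,2^{n/2} + X_L, \qquad Y = Y_H\,2^{n/2} + Y_L,\\
P &= X_H Y_H\,2^{n} + \left(X_H Y_L + X_L Y_H\right)2^{n/2} + X_L Y_L .
\end{aligned}
```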
Figure 6.1 Representation of hierarchy multiplier
[Figure 6.1: the inputs X and Y are split into XH, XL and YH, YL; the four products XHYH, XHYL, XLYH and XLYL feed a carry save adder followed by a carry select adder, which produce the product bits P2n-1:n and Pn-1:0.]
First, four n bit partial products are produced by four n/2 * n/2
multipliers, i.e. by executing four n/2 * n/2 multiplications in parallel.
Second, the partial products are summed using a one stage carry save array
adder and a fast carry propagate adder to obtain the final 2n bit product.
In this way, a large width multiplier is implemented with the help of
smaller modules. Because the four multiplier outputs are computed in
parallel, four n/2 bit multipliers are required. The scheme achieves high
computation performance by exploiting parallelism in computing the partial
products. The hierarchical principle helps to realize a fast large bit
multiplier, except that it requires a large adder for the addition
process. This large adder limits the performance and increases the area of
the designed multiplier.
The above mentioned issues in the existing hierarchy multiplier can
be addressed by
(i) Incorporating BEC to eliminate n/4 number of adders at the
final stage of addition process
(ii) Performing the final addition using CslA
(iii) Implementing the proposed hierarchy multiplier using GDI
logic
6.3 METHODOLOGY
In this Section, an approach for efficient implementation of n bit
hierarchy multiplier with minimum delay will be presented and discussed. As
an example, the architecture for 16 bit multiplier design is explained. Further,
a new design is suggested for the hierarchy multiplier building block namely,
base multiplier based on Vedic mathematics. Following that, the discussion of
CslA, binary to excess 1 converter and GDI logic is carried out in this Section.
6.3.1 Proposed Hierarchy Multiplier
In general, the speed of the hierarchy multiplier is determined by the
delay of adding the base multiplier output bits. This delay can be
decreased by minimizing the number of additions without affecting the
functionality. The following procedure is used in the proposed n bit
hierarchy multiplier to reduce the delay:
Step 1: The multiplier inputs and output are represented as X, Y and Z,
respectively.
Step 2: Divide the n bit multiplier inputs X and Y into two equal halves.
The input X is divided into (Xn-1,…, Xn/2) and (Xn/2-1,…, X0), which
are assigned as XH and XL, respectively. The same procedure is
adopted for the other multiplier input Y.
Step 3: After division, the inputs are formed into four groups:
(XL, YL), (XH, YL), (XL, YH) and (XH, YH).
Step 4: The multiplication is accomplished using four n/2 bit base
multipliers, namely a0, a1, a2 and a3.
Step 5: The multiplier product bits Zn/2-1,…, Z0 are obtained from output
bits 0 to n/2-1 of a0.
Step 6: The output bits of a1 and a2, together with the concatenation of
a0 (bits n/2 to n-1) and a3 (bits 0 to n/2-1), form an array in
carry save format, which is processed by the carry save adder.
Step 7: The resultant sum and carry from the carry save adder become the
inputs of the n bit CslA. The sum output of the CslA is assigned to
the multiplier result bits Zn+n/2-1,…, Zn/2.
Step 8: The BEC takes its input from a3 (bits n/2 to n-1); its output bits
are available before the CslA output and are passed to a multiplexer.
Step 9: The multiplier output bits Z2n-1,…, Zn+n/2 are obtained from the
multiplexer based on the carry output of the CslA: if the carry is
one, the BEC output is selected; otherwise, the product bits of a3
(bits n/2 to n-1) are selected.
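The steps above can be sketched behaviourally in Python. This is a minimal illustration under stated assumptions, not the thesis's circuit: the function name `hierarchy_multiply` is hypothetical, Python's `*` stands in for the base multipliers, and `+` stands in for the CslA. Steps 8 and 9 are written arithmetically (upper half of a3 plus the carry out of the middle sum). When that carry is 0 or 1, this coincides with the multiplexer choosing between the upper half of a3 and the BEC output; the arithmetic form is used so that the sketch stays exact even if the carry exceeds one.

```python
def hierarchy_multiply(x, y, n=16):
    """Behavioural sketch of Steps 1 to 9 for unsigned n bit inputs (n even)."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("x and y must be unsigned n bit values")

    h = n // 2
    half_mask = (1 << h) - 1
    full_mask = (1 << n) - 1

    # Step 2: split each input into upper and lower halves.
    xh, xl = x >> h, x & half_mask
    yh, yl = y >> h, y & half_mask

    # Steps 3 and 4: four base products (hardware would compute them in parallel).
    a0, a1, a2, a3 = xl * yl, xh * yl, xl * yh, xh * yh

    # Step 5: the lower half of a0 is already final.
    z_low = a0 & half_mask

    # Step 6: third row = {lower half of a3, upper half of a0}, followed by a
    # 3:2 carry save reduction of the three n bit rows.
    row = ((a3 & half_mask) << h) | (a0 >> h)
    csa_sum = a1 ^ a2 ^ row
    csa_carry = ((a1 & a2) | (a1 & row) | (a2 & row)) << 1

    # Step 7: final addition (the role played by the CslA).
    total = csa_sum + csa_carry
    z_mid = total & full_mask
    carry_out = total >> n

    # Steps 8 and 9, written arithmetically: upper half of a3 plus the carry.
    z_high = (a3 >> h) + carry_out

    return (z_high << (n + h)) | (z_mid << h) | z_low
```

For example, `hierarchy_multiply(0xFFFF, 0xFFFF)` returns the same value as `0xFFFF * 0xFFFF`, and the function can be checked exhaustively for n = 8 against the built-in product.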
Based on this approach, a 16 bit (n = 16) hierarchy multiplier
architecture is designed as shown in Figure 6.2. The multiplier inputs X
and Y are 16 bits wide and the output Z is 32 bits wide. First, the inputs
X and Y are divided into two equal halves, namely XH and XL, YH and YL,
which are multiplied by 8 bit base multipliers. As seen in Figure 6.2, the
symbols a0, a1, a2 and a3 denote the base multipliers for the
multiplication of (XL and YL), (XH and YL), (XL and YH) and (XH and YH),
respectively. Once these multiplications are over, their output bits form
a carry save array as per Step 6, which is processed by the carry save
adder, resulting in two rows of 16 bit output. These rows are then added by
the 16 bit CslA to produce the multiplier output bits Z23,…, Z8.
Meanwhile, the BEC computes its output and feeds it to the multiplexer as
one of the inputs. The other input of the multiplexer comes from the a3
output (the upper half of its output bits, i.e., n/2 to n-1). Finally, the
multiplexer selects either the BEC output or the a3 output bits as Z24 to
Z31, based on the carry of the CslA.
Figure 6.2 Proposed 16 bit hierarchy multiplier
[Figure 6.2: the base multipliers a0, a1, a2 and a3 take the operand pairs formed from XL, XH, YL and YH; their output bit groups (0 to n/2-1, n/2 to n-1 and 0 to n-1) feed the CSA adder, the CslA adder, the BEC and the MUX, which produce the product bits Z7 to Z0, Z23 to Z8 and Z31 to Z24.]
As a result of introducing the BEC in the hierarchy multiplier, n/4
adders are eliminated. Because the BEC and CslA outputs are computed in
parallel, the processing delay for the multiplier output bits Z24 to Z31
is reduced significantly. As seen from the architecture given in
Figure 6.2, the critical path of the proposed architecture consists of
only one base multiplier, one bit adder, one CslA adder and a multiplexer.
The implementation details of the building components of the hierarchy
multiplier, namely the base multiplier, CslA and BEC, are described in the
following subsections.
6.3.2 Base Multiplier
As discussed in the earlier section, the performance of the hierarchy
multiplier is determined by its base multiplier. In conventional
multiplication techniques, the intermediate computation involved in the
multiplier operation reduces the speed exponentially with the number of
bits in the multiplier input. This becomes a critical issue for a large
number of input bits. The issue can be mitigated by the parallel addition
of partial products, which is an inherent principle of the Vedic
multiplication method. Though partial product reduction is possible in
Booth multiplication, the encoding and decoding mechanism involved in this
method increases the circuit complexity and thereby the power consumption.
On the other hand, Wallace multiplication uses random placement of
counters for efficient partial product accumulation, which makes the
design more complex than the conventional scheme. Therefore, Vedic
multiplication is considered as an alternative way of performing the
multiplication operation without increasing the circuit complexity and
power consumption. In this multiplication process, the partial products
are accumulated at every step, as opposed to the conventional
multiplication schemes. Therefore, the speed of this multiplier can be
improved by reducing its partial product accumulation delay. This is
attempted in the proposed 8 bit multiplier, whose representation is shown
in Figure 6.3.
The multiplier inputs and outputs are represented as Xi, Yi and P2i,
where i is 0 to n-1 and n denotes the input bit width (for the 8 bit
multiplier, n=8). The multiplier partial products (X.Y) are generated
using AND gates. Of these, the partial product X0.Y0 is directly the
output bit P0, whereas the remaining output bits are obtained after a two
stage computation. In the first stage, the partial products generated by
the AND gates are added using adders. After each addition, the sum and
carry are computed and passed to the second stage. It is noted that carry
free addition is performed in this stage. Also, the bits to be added in the
second stage, including the sum and carry from the first stage, do not
exceed five in number. Therefore, 42C is chosen for adding these bits
rather than the full adder used in the existing scheme. Due to the use of
42C, carry free addition is ensured in the second stage too.
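To illustrate why a 4-2 compressor suits a group of up to five bits, the sketch below models the textbook formulation built from two full adders. It is an assumption made for illustration, not the simplified Boolean design proposed in the thesis, and the function names are hypothetical. The intermediate carry `cout` depends only on the first three inputs, so it does not wait for the carry in.

```python
from itertools import product


def full_adder(a, b, c):
    """One bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor: five input bits in, (sum, carry, cout) out.

    The outputs satisfy x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)   # cout is independent of cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_all_inputs():
    """Exhaustively confirm the counting identity for all 32 input patterns."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Calling `check_all_inputs()` confirms that the three outputs always encode the number of ones among the five inputs.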
[Figure 6.3: the multiplicand (X) and multiplier (Y), each n bits wide, feed partial product generation using AND gates; X0.Y0 gives P0 directly; the first stage (adder) produces sum and carry outputs, and the second stage (adder and 4-2 compressor) produces the final product bits P2n-1,…,P1.]
Figure 6.3 Block diagrammatic representation of base multiplier
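The flow in Figure 6.3 can be approximated by a column-wise model in which every partial product bit is an AND of one bit of X and one bit of Y, and all bits of equal weight are accumulated together. The sketch below is a generic illustration of that idea only: the function name is hypothetical, and the thesis's specific two stage adder and 42C arrangement is not modelled (a plain integer column sum with a running carry is used instead).

```python
def columnwise_multiply(x, y, n=8):
    """Multiply two unsigned n bit values by summing AND partial products per column."""
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("x and y must be unsigned n bit values")

    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]

    result = 0
    carry = 0
    for k in range(2 * n - 1):
        # Column k collects every partial product X_i . Y_j with i + j == k.
        lo = max(0, k - n + 1)
        hi = min(k, n - 1)
        column = sum(xb[i] & yb[k - i] for i in range(lo, hi + 1)) + carry
        result |= (column & 1) << k     # product bit P_k
        carry = column >> 1             # carried into the next column
    result |= carry << (2 * n - 1)      # most significant product bit
    return result
```

For k = 0 the column holds only X0.Y0, which matches the observation that P0 is produced directly by an AND gate.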
6.3.3 Carry Select Adder
Various adders can be employed for the addition of the base multiplier
product bits, namely ripple carry, carry look ahead, carry select and
prefix adders. It is well known from performance studies of these adders
that CslA offers a reasonable compromise between area and delay
[Ramkumar and Kittur 2012 and Mohanty and Patel 2014]. Also, the CslA
based on the proposed gates has shown improved performance, as discussed
in detail in Chapter 5. Therefore, CslA is chosen as the parallel adder
for implementing the proposed hierarchy multiplier architecture.
6.3.4 Binary to Excess 1 Converter
To reduce the delay of partial product addition in the hierarchy
multiplier, this work uses a BEC instead of an adder for the output bits
Z2n-1,…, Zn+n/2. For n bit input width, n+1 bit BECs are required. The
structure of a 4 bit BEC is shown in Figure 6.4.
[Figure 6.4: inputs B3, B2, B1 and B0; outputs X3, X2, X1 and X0.]
Figure 6.4 4 bit BEC circuit
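A binary to excess 1 converter outputs its input plus one. The sketch below uses the commonly cited gate level form (X0 = NOT B0, and each higher output is the input bit XORed with the AND of all lower input bits); this form is an assumption for illustration, since the exact gate arrangement of Figure 6.4 is not recoverable here. All function names are hypothetical, and bit lists are least significant bit first.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def bec(bits):
    """Binary to excess 1 conversion: returns bits of (value + 1) mod 2**len(bits)."""
    out = []
    lower_all_ones = 1                    # AND of all lower input bits (1 for bit 0)
    for b in bits:
        out.append(b ^ lower_all_ones)    # X0 = NOT B0, X1 = B1 XOR B0, ...
        lower_all_ones &= b
    return out


def select_upper_bits(a3_upper, carry):
    """2:1 multiplexer: BEC output when carry is 1, otherwise the bits unchanged."""
    return bec(a3_upper) if carry else list(a3_upper)
```

Because `bec` does not depend on the carry, its result can be prepared while the adder is still working, and only the final selection waits for the carry.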
6.4 RESULTS AND DISCUSSION
In this section, the simulation results of the 16 bit hierarchy
multiplier and 8 bit binary to excess 1 converter are presented. The
performance parameters such as area, delay, power consumption and PDP of
the simulated designs are evaluated through the SPICE simulation at 45 nm
technology with a supply voltage (VDD) of 1.1 V. Typical transistor sizes, i.e.,
(W/L)p=240 nm/45 nm and (W/L)n=120 nm/45 nm are considered. The delay
and power consumption are calculated as follows. The delay is measured as
the time from 50% of the input voltage swing to 50% of the output voltage
swing for each transition, and the maximum delay is taken as the worst
case delay. Likewise, the power consumption is determined from the
switching activities and the capacitances of the circuit. These
procedures are applied to the delay and power consumption calculation of
all the simulated modules, namely the proposed hierarchy multiplier and
the binary to excess 1 converter.
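The 50% to 50% delay definition can be illustrated with a short post-processing sketch. It assumes the simulator output is available as sampled lists of time and voltage, which is an assumption about the data format rather than something stated in the thesis; the function names are hypothetical and linear interpolation between samples is an illustrative choice.

```python
def crossing_time(t, v, level):
    """First time at which waveform v crosses `level`, by linear interpolation."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, vin, vout, vdd=1.1):
    """Time from the input 50% point to the output 50% point of one transition."""
    half = vdd / 2
    return crossing_time(t, vout, half) - crossing_time(t, vin, half)


def worst_case_delay(delays):
    """The worst case delay is the maximum over all measured transitions."""
    return max(delays)
```

With times in picoseconds, an input that rises linearly from 0 V to 1.1 V between 0 and 10 and an output that falls linearly between 20 and 30 give a delay of about 20 ps.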
6.4.1 Proposed Hierarchy Multiplier
The simulation results of the proposed and existing multipliers are
given in Table 6.1.
Delay:
The delay computed through simulation for all the structures is given in
Table 6.1, and it is observed that the proposed multiplier has a smaller
delay than the existing implementations. Due to the deployment of the BEC
in the accumulation of the base multiplier output bits, the number of
adders is reduced, thus decreasing the delay significantly. Moreover, the
delay of the binary to excess 1 converter does not lie on the critical
path, so the speed is improved. The proposed design is 27% and 11% faster
than the multipliers discussed in [70] and [1], respectively.
Table 6.1 Performance comparison of the proposed 16 bit hierarchy multiplier with other multipliers
S. No.  Multiplier                                  Delay (ps)  Power Consumption (µW)  Area (µm2)  PDP (e-15 J)
1       Ref. [70]                                   727         658                     14510       478
2       Ref. [16]                                   657         563                     14978       369
3       Ref. [1]                                    594         608                     15210       361
4       Proposed Hierarchy Multiplier (This Work)   528         424                     12420       223
Power Consumption:
The power consumed by the simulated hierarchy multipliers is presented in
Table 6.1. The minimum power consumption is witnessed in the proposed
design, because eliminating the redundant hardware present in the existing
designs minimizes the spurious activities. The proposed design consumes
30% less power than the multiplier discussed in [1].
Area:
The area is computed from the layout of the simulated multipliers and is
given in Table 6.1, whereas the layout of the proposed multiplier is given
in Figure 6.5. From the obtained results, it is seen that the proposed
multiplier has the smallest area. As stated earlier, the proposed gates
and adder are used to implement the basic components of the hierarchical
multiplier, namely the base multiplier, CslA and BEC, with reduced
transistor count. Therefore, the area of the proposed hierarchical
multiplier is small. The area of the proposed design is about 18% smaller
than that of the recently reported multiplier in [1].
PDP:
The power delay product of all the simulated designs is given in
Table 6.1. Among the multipliers discussed, the best and the worst PDP
correspond to the proposed design and the design discussed in [70],
respectively. Also, the energy saving accomplished with the proposed
design is 38% relative to the multiplier reported in [1].
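The percentages quoted above can be reproduced from the values in Table 6.1. The sketch below copies the delay, power and area figures from the table; the helper names are hypothetical, and PDP is taken as delay multiplied by power, which is consistent with the tabulated PDP column.

```python
# Delay (ps), power (µW) and area (µm2) copied from Table 6.1.
TABLE_6_1 = {
    "Ref. [70]": (727, 658, 14510),
    "Ref. [16]": (657, 563, 14978),
    "Ref. [1]": (594, 608, 15210),
    "Proposed": (528, 424, 12420),
}


def pdp_femtojoule(delay_ps, power_uw):
    """Power delay product; ps * µW = 1e-18 J, so divide by 1000 for 1e-15 J."""
    return delay_ps * power_uw / 1000


def reduction_percent(reference, proposed):
    """Relative reduction of the proposed value with respect to the reference."""
    return 100 * (reference - proposed) / reference


def compare(reference_name, proposed_name="Proposed"):
    """Percentage reductions in delay, power, area and PDP versus one reference."""
    ref_d, ref_p, ref_a = TABLE_6_1[reference_name]
    pro_d, pro_p, pro_a = TABLE_6_1[proposed_name]
    return {
        "delay": reduction_percent(ref_d, pro_d),
        "power": reduction_percent(ref_p, pro_p),
        "area": reduction_percent(ref_a, pro_a),
        "pdp": reduction_percent(
            pdp_femtojoule(ref_d, ref_p), pdp_femtojoule(pro_d, pro_p)
        ),
    }
```

With these inputs, `compare("Ref. [1]")` gives reductions that round to 11% in delay, 30% in power, 18% in area and 38% in PDP, and `compare("Ref. [70]")` gives a delay reduction that rounds to 27%.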
Figure 6.5 Layout of the proposed 16 bit hierarchy multiplier
Sensitivity to Process Variation:
The sensitivity of the circuit's performance, namely delay and power
consumption, under process variations is studied through Monte Carlo
simulations, and the results are given in Table 6.2. The performance
variation is noted as 3%, which is less than that of the existing
hierarchy multipliers.
Table 6.2 Performance analysis of 16 bit hierarchy multiplier under process variation
S. No.  Multiplier                                  Delay (ps)  Power Consumption (µW)  PDP (e-15 J)
1       Ref. [70]                                   769         698                     536
2       Ref. [16]                                   692         607                     420
3       Ref. [1]                                    634         638                     404
4       Proposed Hierarchy Multiplier (This Work)   541         441                     238
6.4.2 Binary to Excess 1 Converter
The gates of the BEC are designed in CMOS, CPL, GDI and the proposed
logic style. The delay and power consumption are calculated from the
simulation results and tabulated in Table 6.3. As seen from the values,
realizing the BEC using the proposed gates improves its performance
compared with CMOS and CPL.
Table 6.3 Performance comparison of 8 bit BEC
S. No.  Design                Delay (ps)  Power Consumption (µW)  Area (µm2)  PDP (e-18 J)
1       Ref. [172]            203         15                      537         3045
2       Ref. [45]             188         21                      583         3948
3       Ref. [94]             245         11                      501         2695
4       Proposed (This Work)  173         9                       445         1557
The delay and power consumption of the BEC based on the proposed gates
are reduced by 15% and 40%, respectively, compared to the conventional
CMOS realization. The area is calculated from the layout and is given in
Table 6.3. It is observed that the proposed BEC design saves 17% area
relative to the CMOS based implementation. The layout of the proposed BEC
is shown in Figure 6.6. Further, Monte Carlo simulation is performed to
study the circuit robustness under process variation. From the results,
it is noted that the proposed BEC circuit shows 1% performance variation
with respect to process changes.
Figure 6.6 Layout of the proposed 8 bit BEC
6.5 SUMMARY
A BEC based hierarchy multiplier architecture is proposed here. It
operates with shorter delay due to the removal of n/4 adders present in
the existing hierarchy multiplier. Moreover, the delay incurred by the
BEC does not affect the hierarchical multiplier because it is not on the
critical path of the multiplier. In addition, a new design for the base
multiplier is proposed, based on Vedic mathematics. It has less delay and
smaller area than other multipliers found in the literature. The major
outcome of the proposed design is that the number of adders is reduced,
whereas it remains high in other reported works. Also, realizing the
proposed multiplier using the proposed gates and adder reduces its power
consumption and area. Thus, an area, power and delay efficient hierarchy
multiplier is designed. The delay and power consumption of the existing
and the proposed hierarchy multipliers are calculated through SPICE
simulation using a 45 nm technology model. From the simulation results,
it is calculated that the energy saving achieved by the proposed
multiplier design is 38% relative to the recently reported multiplier.
Further, the performance of the multipliers with respect to process
variations is studied, and it is found that the proposed multiplier shows
3% performance variation, which is less than that of its counterparts.
Therefore, the proposed multiplier can be used in media processing
applications in which a large width multiplier with low energy
consumption is of prime importance.
CHAPTER 7
CONCLUSION AND FUTURE WORK
7.1 CONCLUSION
This dissertation is mainly focused on the design of arithmetic circuits,
namely the full adder, 4-2 compressor, parallel adders and multiplier,
with the help of full swing gates in GDI logic. A low power, high speed
multiplier with small area is made possible by adopting the Vedic
mathematics based multiplication technique, followed by a transistor
level implementation using GDI logic. The merit of GDI logic is that the
basic modules of the multiplier, namely the AND gate, adder and 4-2
compressor, can be implemented with low power consumption and a low
transistor count. A new method for partial product accumulation in Vedic
multiplication has been discussed and implemented using GDI logic.
Moreover, the scalability of the designed multiplier is analyzed through
the hierarchy multiplication principle. In addition, the performance of
all the designed circuits with respect to process variation is studied
through Monte Carlo simulation, and it is observed that the proposed
designs show smaller performance parameter changes than their
counterparts. The novelty and significance of these mechanisms are listed
below:
From the operational characteristics of GDI gates, it is concluded that
they produce a reduced output voltage swing, i.e. the output high (or
low) voltage deviates from VDD (or GND) by the threshold voltage Vt for
certain input combinations. Placing and properly biasing a PMOS or NMOS
transistor at the output terminal, depending on whether the voltage
deviates from VDD or from GND, provides full swing output. Based on this
technique, AND, OR, XOR and XNOR gates are designed. The simulation
results show that the proposed gates using GDI logic have improved
performance compared to conventional GDI designs. The proposed AND, OR,
XOR and XNOR gates operate with 5%, 45%, 66% and 62% less delay,
respectively, than the existing GDI based gates. Likewise, the power
savings of the proposed AND, OR, XOR and XNOR gates are 10%, 12%, 30% and
27%, respectively, relative to the available GDI gates. The area
reductions attained in the AND, OR, XOR and XNOR gates are 24%, 23%, 24%
and 17%, respectively, relative to the existing GDI based designs.
Further, the performance variation of the proposed gates with respect to
process changes is calculated from Monte Carlo simulation, and 1%
variation is observed.
With the help of the proposed gates, three full adder designs are
developed. It is observed from the computed delay values that, among the
three proposed designs, Design 2 has the lowest delay since Cout and Sum
are computed in parallel. The full adder based on Design 2 operates 41%
faster than the CMOS full adder. Also, the power consumption results
reveal that the three proposed adders consume low power. Among the
proposed adders, Design 1 consumes the least power since it adopts the
proposed XOR gate and requires fewer transistors than the other two
proposed designs. The power saving attained with Design 1 over the
conventional GDI adder is 30%. In addition to power and delay, it is
observed that the three proposed full adders consume a small amount of
energy. This is due to the presence of full swing gates in the proposed
full adders. These full swing gates switch only the transistors required
for the particular input. In addition, all three designs require fewer
transistors, which reduces the gate capacitance. Hence, they consume less
energy. The energy saving possible with Design 2 is 32% relative to CMOS.
It can be concluded that the proposed adder Design 2 has higher immunity
to process variation in both delay and power distribution.
A new design for the 4-2 compressor is proposed based on simplification
of its Boolean output expressions. Due to the simple and regular
architecture, the power consumption of the proposed 4-2 compressor is
low. Moreover, this design is implemented using the proposed gates in GDI
logic, which results in small area. The area of the proposed 42C is about
9% smaller than that of a recently reported compressor. Moreover, the
energy saving accomplished with the proposed design is 41% relative to
the existing compressor. The sensitivity of the designed compressor under
global and local variations is computed from Monte Carlo simulation, and
the results reveal that the performance deviation of the proposed
compressor is about 1%.
The performance of the parallel adders is improved with the help of the
proposed gates and adder using GDI logic. Simulation results reveal that
the delay and PDP of the RCA are reduced by 12% and 16%, respectively,
relative to the CMOS based design. Likewise, the modified CslA design
implemented using the proposed gates reduces delay and power consumption
by 14% and 15%, respectively, compared to the existing GDI based
implementation. Similarly, the proposed gates based CLA improves the
speed by 44% and decreases the power consumption by 19%. Along with these
attributes, the reduction in energy consumption achieved by the proposed
gates based RCA, modified CslA and CLA is 16%, 43% and 40%, respectively,
relative to the CMOS based implementation. In addition, the behaviour of
the implemented adders under process changes is studied through Monte
Carlo simulation, and it is observed that they show a small variation of
about 2%.
A new architecture for performing multiplication with less computational
delay using Vedic mathematics is proposed. This multiplier uses 4-2
compressors in place of the adders used in the existing scheme. The
proposed multiplier is 22% faster than the conventional multiplier. The
proposed multiplier design has lower power consumption because its
building components, namely the AND gate, full adder and 4-2 compressor,
are implemented using the proposed designs, which considerably reduces
the transistor count and thereby the spurious transitions and the overall
power consumption. Also, the energy saving accomplished with the proposed
design is 35% relative to the conventional multiplier. The proposed
multiplier has 31% less area than the recently reported Vedic multiplier.
A study of circuit performance under local and global process variations
is carried out through Monte Carlo simulations, and the results confirm
that the proposed multiplier shows 2% performance variation.
The hierarchy multiplier architecture is modified by incorporating a BEC
in place of an adder to reduce the processing delay. The proposed design
is 27% faster than the existing multiplier. Also, minimum power
consumption is witnessed in the proposed design due to the elimination of
redundant hardware present in the existing designs, which minimizes the
spurious activities. The proposed design consumes 30% less power than the
existing multiplier. Also, the energy saving accomplished with the
proposed design is 38% relative to the existing hierarchy multiplier. The
area of the proposed design is about 18% smaller than that of a recently
reported hierarchy multiplier. The sensitivity of the circuit's
performance, namely delay and power consumption, under process variations
is studied through Monte Carlo simulations. It is found that the proposed
hierarchy multiplier has 3% performance variation.
7.2 SCOPE FOR FUTURE WORK
There are many directions to extend the experiments presented in
this thesis. The following is a brief list of suggestions for possible future work
in this research domain.
(i) The performance of the multiplier can be investigated in signal
processing applications such as filtering, transformation and so on
(ii) The implementation of squaring and cubing operations using Vedic
mathematics can be done
REFERENCES
1. Abbasi S A, Zulhelmi A R M and Alamoud A (201 , “FPGA design,
simulation and protyping of 32 bit pipeline multiplier based on Vedic
mathematics”, IEICE Electronics Express, vol. 12, no. 1 , Jul.,
pp. 1-12.
2. Abdoreza Pishvaie, Ghassem Jaberipur and Ali Jahanian (2014),
“High-performance CMOS (4:2) compressors”, International Journal
of Electronics, vol. 101, no. 11, Jan., pp.1511–1525.
3. Abdoreza Pishvaie, Ghassem Jaberipur, Ali Jahanian (2012),
“Improved CMOS (4;2) compressor designs for parallel multipliers”,
Computers & Electrical Engineering, vol. 38, no. 6, Nov., pp. 1703-1716.
4. Abhilash R, Raju I B K, Chary G and Dubey S (201 , “Area-power
efficient Vedic multiplier using compressors”, In Proc. of International
Conference on Electrical, Electronics, Signals, Communication and
Optimization, pp. 1-5.
5. Abiri E, Salehi M R and Darabi A (2014 , “Design and simulation of
low-power and high speed T-Flip Flap with the modified gate diffusion
input technique in nano process”, In Proc. of Iranian Conference on
Electrical Engineering, pp. 82-87.
6. Akhter S (200 , “VHDL implementation of fast NxN multiplier based
on Vedic mathematics”, In Proc. of European Conference on Circuit
Theory and Design, pp. 472-475.
7. Akhter S, Chaturvedi S and Pardhasardi K (201 , “CMOS
implementation of efficient 16-Bit square root carry-select adder”,
International Conference on Signal Processing and Integrated
Networks, pp. 891-896.
8. Amrutur B and Horowitz M (2001 , “Fast low-power decoders for
RAMs”, IEEE Journal of Solid-State Circuits, vol. 36, no. 10, Oct.,
pp. 1506–1515.
9. Amuthavalli G and Gunasundari R (201 , “Analysis and design of
subthreshold leakage power-aware ripple carry adder at circuit-level
using 0nm technology”, In Proc. of Procedia Computer Science,
vol. 48, pp. 660-665
10. Anders M, Mathew S, Bloechel, B, Thompson S, Krishnamurthy R,
Soumyanath K and Borkar S (2002 , “A . GHz 1 0 nm single-ended
dynamic ALU and instruction-scheduler loop”, In Proc. of IEEE
International Solid States Circuits Conference, pp. 410–411.
11. Anitha R, Deshmukh N, Agarwal P, Sahoo S K, Karthikeyan S P and
Reglend I J (201 , “A 2 bit MAC unit design using Vedic multiplier
and reversible logic gate”, In Proc. of International Conference on
Circuit, Power and Computing Technologies, pp. 1-6.
12. Anjana R, Abishna B, Harshitha M S, Abhishek E, Ravichandra V and
Suma M S (2014 , “Implementation of Vedic multiplier using Kogge-
stone adder”, In Proc. of International Conference on Embedded
Systems, pp. 28-31.
13. Anuar N, Takahashi Y and Sekine T (200 , “4-bit Ripple carry adder
using two phase clocked adiabatic static CMOS logic”, In Proc. of
IEEE Region 10 Conference, pp. 1-6.
14. Archana S and Durga G (2014 , “Design of low power and high speed
ripple carry adder”, In Proc. of IEEE International Conference on
Communications and Signal Processing, pp. 939-943.
15. Arun and Kumar M (2014 , “Design of low power split path Data
Driven Dynamic ripple carry adders”, In Proc. of International
Conference on Computing for Sustainable Global Development,
pp. 37-41.
16. Asif S and Kong Y (2014 , “Low-area Wallace multiplier”, VLSI
Design, vol. 2014, May, pp. 1–6.
17. Avci M and Yildirim T (200 , “General design method for
complementary pass transistor logic circuits”, Electronics Letters,
vol. 39, no. 1, Jan., pp. 46-48.
18. Badar S and Dandekar D R (201 , “High speed FFT processor design
using radix pipelined architecture”, In Proc. of International
Conference on Industrial Instrumentation and Control, pp. 1050-1055.
19. Bahadori Milad, Kamal Mehdi, Afzali-Kusha Ali and Pedram
Massoud (201 , “A comparative study on performance and reliability
of 32-bit binary adders”, Integration, the VLSI Journal, vol. 53, no.1,
Mar., pp. 54-67.
20. Bairu K. Saptalakar Shrinivas, Saptalakar K Navalagund S S and
Mrityunjaya Latte (2014 , “VLSI Implementation of reduced resource
allocation for modified carry look-ahead adder”, In Proc. of
International Conference on Advanced Communication Control and
Computing Technologies, pp. 559-564.
21. Baran D, Aktan M and Oklobdzija V G (2010 , “Energy efficient
implementation of parallel CMOS multipliers with improved
compressors”, In Proc. of International Symposium on Low-Power
Electronics and Design, pp. 147-152.
22. Bellaour A and Elmasry M I, Low-Power Digital VLSI Design
Circuits and Systems, Kluwer Academic Publishers, 1995.
23. Bhatia G, Bhatia K S, Chauhan O, Chourasia S and Kumar P (2015),
“An efficient MAC unit with low area consumption”, In Proc. of IEEE
India Conference, pp. 1-5.
24. Bhavnagarwala A, Kosonocky S V, Kowalczyk S P and Joshi R V
(2004 , “A trans regional CMOS SRAM with single logic VDD and
dynamic power rails”, In Proc. of IEEE Symposium on VLSI Circuits,
pp. 291–293.
25. Chaitanya kumar M V S and Selva kumar J (2014 , “Dual mode logic
carry look ahead adder”, In Proc. of International Conference on
Advanced Communication Control and Computing Technologies,
pp. 537-540.
26. Chanda M, Banerjee S, Saha D and Jain S (201 , “Novel transistor
level realization of ultra low power high-speed adiabatic Vedic
multiplier”, In Proc. of International Multi-Conference on Automation,
Computing, Communication, Control and Compressed Sensing,
pp. 801-806.
27. Chandrakasan M A and Broderson R W, Low power digital CMOS
Design, 4th ed., Kluwer Academic Publishers, 2003.
28. Chang T Y and Hsiao M J (1998), “Carry-select adder using single
ripple-carry adder”, Electronics Letters, vol. 34, no. 22, Oct.,
pp. 2101-2103.
29. Chen Y, Li H, Koh C K, Sun G, Li J, Xie Y and Roy K (2010),
“Variable-Latency Adder (VL-Adder) designs for low power and
NBTI tolerance”, IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 18, no. 11, Nov., pp. 1621-1624.
30. Chin-Long Wey and Jin-Fu Li (2004), “Design of reconfigurable array
multipliers and multiplier-accumulators”, In Proc. of IEEE Asia-
Pacific Conference on Circuits and Systems, pp. 37-40.
31. Chip-Hong Chang, Jiangmin Gu and Mingyan Zhang (2004), “Ultra
low-voltage low-power CMOS 4-2 and 5-2 Compressors for fast
arithmetic circuits”, IEEE Transactions on Circuits and Systems—I:
Regular Papers, vol. 51, no. 10, Oct., pp. 1985-1997.
32. Choi S, Kim G, Yoo H J and Nam B G (2014), “Hybrid radix-4/-8
truncated multiplier for mobile GPU applications”, Electronics Letters,
vol. 50, no. 23, Jun., pp. 1680-1682.
33. Chong K S, Gwee B H and Chang J S (2007), “Low energy 16-bit
Booth leapfrog array multiplier using dynamic adders”, IET Circuits,
Devices & Systems, vol. 1, no. 2, Apr., pp. 170-174.
34. Chua-Chin Wang, Po-Ming Lee and Chenn-Jung Huang (2002),
“Improved design of C2PL 3-2 compressors for inner product
Processing”, VLSI Design, vol. 14, no. 4, Jan., pp. 383–388.
35. Costas Efstathiou, Zaher Owda and Yiorgos Tsiatouhas (2013), “New
high-speed multi output carry look-ahead adders”, IEEE Transactions
on Circuits and Systems-II: Express Briefs, vol. 60, no. 10, Oct.,
pp. 667-671.
36. Dadda L (1965), “Some schemes for parallel multipliers”, Alta
Frequenza, vol. 34, no. 5, Aug., pp. 349–356.
37. Dan Wang, Maofeng Yang, Wu Cheng, Xuguang Guan, Zhangming
Zhu and Yintang Yang (200 , “Novel low power full adder cells in
1 0nm CMOS technology”, In Proc. of IEEE Conference on Industrial
Electronics and Applications, pp. 430-433.
38. Das A, Mandal S K and Das J K (201 , “High speed square root carry
select adder using MTCMOS D-latch in 4 nm technology”, In Proc. of
International Conference on Electrical, Electronics, Signals,
Communication and Optimization, pp. 1-4.
39. Dash A, Dash S and Mandal S K (2014), “Design of optimized
Wallace tree multiplier in Cadence”, In Proc. of International
Conference on Microelectronics, Circuits and Systems, pp. 34-38.
40. Davoud Bahrepour and Mohammad Javad Sharifi (201 , “A novel
high speed full adder based on linear threshold gate and its application
to a 4-2 compressor”, Arab J. Sci. Eng., vol. , no. 11, Apr.,
pp. 3041–3050.
41. Deepa and Sampath Kumar V (201 , “Analysis of energy efficient
PTL based full Adders using different nano-meter technologies”, In
Proc. of IEEE International Conference on Electronics and
Communication System, pp. 310-315.
42. Dhar K (2014), “Design of a high speed, low power synchronously
clocked NOR-based JK flip-flop using modified GDI technique in
4 nm technology”, In Proc. of International Conference on Advances
in Computing, Communications and Informatics, pp. 600-606.
43. Dhar K (2014), “Design of a low power, high speed, energy efficient
full adder using modified GDI and MVT scheme in 45nm
technology”, In Proc. of International Conference on Control,
Instrumentation, Communication and Computational Technologies,
pp. 36-41.
44. Dhar K, Chatterjee A and Chatterjee S (2014), “Design of an energy
efficient, high speed, low power full subtractor using GDI
technique”, In Proc. of IEEE Students Technology Symposium,
pp. 199-204.
45. Dubey V and Sairam R (2014), “An Arithmetic and Logic Unit (ALU)
optimized for area and power”, In Proc. of IEEE International
Conference on Advanced Computing and Communication
Technologies, pp. 330-334.
46. Fang Tang, Amine Bermak and Zhouye Gu (2012), “Low power
dynamic logic circuit design using a pseudo dynamic buffer”,
Integration, the VLSI Journal, vol. 45, no. 4, Sep., pp. 395-404.
47. Farid Mosh Gelani, Dhamin Al-khalili, and Come Rozon (2012),
“Ultra-low leakage structures for arithmetic circuits using symmetric
and Asymmetric FinFETs”, In Proc. of New Circuits and Systems
Conference, pp. 385-388.
48. Fathi A, Azizian S, Hadidi K, Khoei A and Chegani A (2012), “CMOS
implementation of a fast 4-2 compressor for parallel accumulations”,
In Proc. of the International Symposium on Circuits and Systems,
pp. 1476-1479.
49. Fisher S, Teman A, Vaysman D, Gertsman A, Yadid-Pecht O and Fish
A (200 , “Ultra-low power subthreshold flip-flop design”, In Proc. of
International Symposium on Circuits and Systems, pp. 1573-1576.
50. Foroutan V, Teheri M, Navi K and Mazreah A (2014), “Design of two
low power full adder using GDI structure and hybrid CMOS logic
style”, Integration, the VLSI Journal, vol. 47, no. 1, Jan., pp. 48-61.
51. Gahlan N K, Shukla P and Kaur J (2012), “Implementation of Wallace
tree multiplier using compressor”, International Journal of Computer
Technology and Applications, vol. 3, no. 3, May-June, pp. 1194–1199.
52. Ghobadi N, Majidi R, Mehran M and Afzali-Kusha A (2010), “Low
power 4-bit full adder cells in subthreshold regime”, In Proc. of Iranian
Conference on Electrical Engineering, pp. 362-367.
53. Goel S, Kumar A and Bayoumi M A (2006), “Design of robust, energy-
efficient full adders for deep-submicrometer design using hybrid-
CMOS logic style”, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 14, no. 12, Dec., pp. 82-94.
54. Gokhale G R and Bahirgonde P D (201 , “Design of Vedic-multiplier
using area-efficient carry select adder”, In Proc. of International
Conference on Advances in Computing, Communications and
Informatics, pp. 576-581.
55. Grover A (201 , “Design of power reversible comparators with
different technologies”, In Proc. of International Conference on
Computational Intelligence, Modeling and Simulation, pp. 193-196.
56. Grover A and Grover N (201 “Comparative Analysis: Area-Efficient
carry select adders 1 0 nm Technology”, In Proc. of Asia Modelling
Symposium, pp. 99-102.
57. Gupta J, Grover A, Wadhwa G K and Grover N (201 , “Multipliers
using low power adder cells using 1 0nm technology”, In Proc. of
International Symposium on Computational and Business Intelligence,
pp. 3-6.
58. Gupta A, Malviya U and Kapse V (2012), “Design of speed, energy
and power efficient reversible logic based Vedic ALU for digital
processors”, In Proc. of International Conference on Engineering,
pp. 1-6.
59. Gupta R, Dhar R, Baishnab K L and Mehedi J (2014), “Design of
high performance bit Vedic multiplier using compressor”, In Proc. of
International Conference on Advances in Engineering and Technology,
pp. 1-5.
60. Gurumurthy K S and Prahalad M S (2010), “Fast and power efficient
1 ×1 Array of Array multiplier using Vedic Multiplication”, In Proc.
of International Conference on Microsystems Packaging Assembly and
Circuits Technology, pp. 1-4.
61. Hari O P and Mai A K (2011), “Low power and area efficient
implementation of N-phase non overlapping clock generator using GDI
technique”, In Proc. of International Conference on Electronics
Computer Technology, pp. 123-127.
62. Howard G M, Mokrian P, Ahmadi M and Miller W C (200 , “Power
and delay analysis of 4:2 compressor cells”, In Proc. of IEEE
International Symposium on Circuits and Systems, pp. 3559-3562.
63. Huddar S R, Rupanagudi S R, Kalpana M and Mohan S (201 , “Novel
high speed Vedic mathematics multiplier using compressors”, In
Proc. of International Multi-Conference on Automation, Computing,
Communication, Control and Compressed Sensing, pp. 465-469.
64. Hung Tien Bui, Yuke Wang and Yingtao Jiang (2002), “Design and
analysis of low-power 10-transistor full adders using novel XOR-
XNOR gates”, IEEE Transactions on Circuits and Systems II: Analog
and Digital Signal Processing, vol. 49, no. 1, Jan., pp. 25-30.
65. Hussin R, Shakaff A Y M, Idris N S Z, Ismail R C and Kamarudin A
(200 , “An efficient modified booth multiplier architecture”, In Proc.
of the International Conference on Electronic Design, pp. 1-4.
66. Jaina D, Sethi K and Panda R (2011), “Vedic mathematics based
multiply accumulate unit”, In Proc. of the International Conference on
Computational Intelligence and Communication Networks, pp. 754-757.
67. Jamshidi V, Fazeli M and Patooghy A (201 , “A low power hybrid
MTJ/CMOS (4-2) compressor for fast arithmetic circuits”, In Proc. of
International Symposium on Computer Architecture and Digital
Systems, pp. 1-6.
68. Jarvinen K and Skytta J (2008), “On parallelization of high-speed
processors for elliptic curve cryptography”, IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 16, no. 9, Sep.,
pp. 1162-1175.
69. Jeong Beom Kim and Dong Whee Kim (200 , “Low-power carry
look-ahead adder with multi threshold voltage CMOS technology”, In
Proc. of IEEE International Conference, pp. 537-540.
70. Jhamb M, Garima and Lohani H (2016), “Design, implementation and
performance comparison of multiplier topologies in power-delay
space”, Engineering Science and Technology, an International Journal,
vol. 19, no. 1, Mar., pp. 355-363.
71. Jinesh S, Ramesh P and Thomas J (201 , “Implementation of 4 bit
high speed multiplier for DSP application-based on Vedic
mathematics”, In Proc. of IEEE Region 10 Conference, pp. 1-5.
72. Jin-Fa Lin, Yin-Tsung Hwang and Ming-Hwa Sheu (2012), “Low Power
10-transistor full adder design based on degenerate pass transistor
logic”, In Proc. of IEEE International Symposium on Circuits and
Systems, pp. 496-499.
73. Kaur H and Prakash N R (201 , “Area-efficient low PDP 8-bit Vedic
multiplier design using compressors”, In Proc. of International
Conference on Recent Advances in Engineering and Computational
Sciences, pp. 1-4.
74. Kayal D, Mostafa P, Dandapat A and Sarkar C K (2014), “Design of
high performance 8 bit multiplier using Vedic algorithm with
McCMOS technique”, Journal of Signal Processing Systems, vol. ,
no. 1, Jul., pp. 1-9.
75. Khurana S, Grover A and Grover N (201 , “Comparative analysis:
power reversible comparator circuits in 0 nm technology”, In Proc. of
Asia Modeling Symposium, pp. 103-107.
76. Kumar A and Raman A (2010), “Low power ALU design by ancient
mathematics”, In Proc. of International Conference on Computer and
Automation Engineering, pp. 862-865.
77. Kumar G and Sahoo S K (201 , “Implementation of a high speed
multiplier for high-performance and low power
applications”, International Symposium on VLSI Design and Test,
pp. 1-4.
78. Kunchigi V, Kulkarni L and Kulkarni S (2012), “High speed and area
efficient Vedic multiplier”, International Conference on Devices,
Circuits and Systems, pp. 360-364.
79. Lee P M, Hsu C H and Hung Y H (200 , “Novel 10-T full adders
realized by GDI structure”, In Proc. of International Symposium on
Integrated Circuits, Singapore, pp. 115-118.
80. Li W, Dai Z B, Meng T and Ren Q (200 , “Design and
implementation of a high-speed reconfigurable multiplier”, In Proc. of
International Conference on ASIC, pp. 177-180.
81. Lunchao Wang and Ken Choi (2014), “A carry look-ahead adder
designed by reversible logic”, In Proc. of ISOCC, pp. 216-217.
82. Magesh Kannan P and Prathyusha K (2011), “Implementation of low
power RAM in GDI technique with full swing”, In Proc. of
International Conference on Signal Processing, Communication,
Computing and Networking Technologies, pp. 592-597.
83. Maharaja J. S. S. B. K. T, Vedic Mathematics, 1st ed., Motilal
Banarsidass Press, 2001.
84. Manash Chanda, Sankalp Jain, Swapnadip De and Chandan Kumar
Sarkar (2015), “Implementation of sub threshold adiabatic logic for
ultralow-power application”, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 23, no.12, Dec., pp. 2782-2790.
85. Martin Margala and Nelson G Durdle (1998), “Low-power 4-2
compressor circuits”, International Journal of Electronics, vol. 85,
no. 2, pp. 165-176.
86. Mehta P and Gawali D (200 , “Conventional versus Vedic
mathematical method for hardware implementation of a multiplier”, In
Proc. of International Conference on Advances in Computing, Control,
& Telecommunication Technologies, pp. 640-642.
87. Mhaidat K M and Hamzah A Y (2014), “A new efficient reduction
scheme to implement tree multiplier on FPGAs”, In Proc. of
International Design and Test and Symposium, pp. 180-184.
88. Mohab Anis, Mohamed Allam and Mohamed Elmasry (2002), “Impact
of technology scaling on CMOS logic styles”, IEEE Transactions on
Circuits and Systems—II: Analog and Digital Signal Processing,
vol. 49, no. 8, Aug., pp. 577-588.
89. Mohanty B K and Patel S K (2014), “Area-delay-power efficient carry-
select adder”, IEEE Transactions on Circuits and System-I: Regular
Paper, vol. 61, no. 6, Jun., pp. 418-422.
90. Moradi F, Wisland D T, Mahmoodi H, Aunet S, Cao T V and Peiravi
A (200 , “Ultra low power full adder topologies”, In Proc. of IEEE
International Symposium on Circuits and Systems, pp. 3158-3161.
91. Morgenshtein A, Fish A and Wagner I A (2002), “Gate-Diffusion
Input (GDI) – A power-efficient method for digital combinatorial
circuits”, IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 10, no. 5, Oct., pp. 566-581.
92. Morgenshtein A, Fish A and Wagner I A (2004), “An efficient
implementation of D-flip-flop using the GDI technique”, In Proc. of
International Symposium on Circuits and Systems, pp. 673-676.
93. Morgenshtein A, Shwartz I and Fish A (2010), “Gate Diffusion Input
(GDI) Logic in standard CMOS nanoscale process”, In Proc. of IEEE
Convention of Electrical and Electronics Engineers, pp. 776-780.
94. Morgenshtein A, Shwartz I and Fish A (2014), “Full swing Gate
Diffusion Input (GDI) logic – case study for low power CLA adder
design”, Integration, the VLSI Journal, vol. 47, no. 1, Jan., pp. 62-70.
95. Muhammad K, Somasekhar D and Roy K (1 , “Switching
characteristics of generalized array multiplier architectures and their
applications to low power design”, In Proc. of International Conference
on Computer Design, pp. 230-235.
96. Muralidharan R and Chang C H (2013), “Radix-4 and radix-8 booth
encoded multi-modulus multipliers”, IEEE Transactions on Circuits
and Systems I: Regular Papers, vol. 60, no. 11, Nov., pp. 2940-2952.
97. Naaz S A, Pradeep M N, Bhairannawar S and Halvi S (2014), “FPGA
implementation of high speed Vedic multiplier using CSLA for parallel
FIR architecture”, International Conference on Devices, Circuits and
Systems, pp. 1-5.
98. Nagamatsu N, Tanaka S, Mori J, Noguchi T and Hatanaka H (1990),
“A 15 ns 32x32-bit CMOS multiplier with an improved parallel
structure”, IEEE Journal of Solid-State Circuits, vol. 25, no. 2, Apr.,
pp. 494-497.
99. Naoghare A A and Sakhare A V (201 , “Review on FFT architecture
for real valued signals using Radix 25 algorithm”, In Proc. of
International Conference on Pervasive Computing, pp. 1-3.
100. Naveen R, Thanushkodi K and Saranya C (201 , “Low power
Wallace multiplier using gate diffusion input based full adder”,
International Journal of Electronics and Communication Engineering
Research, vol. 1, no. 3, Aug., pp. 17-22.
101. Nehru K, Shanmugam A and Darmila Thenmozhi G (2012), “Design
of low power ALU using T FA and PTL based MUX circuits”, In
Proc. of IEEE-International Conference on Advances In Engineering,
Science And Management, pp. 145-149.
102. Neve A, Schettler H, Ludwig T and Flandre D (2004), “Power-delay
product minimization in high-performance 64-bit carry-select adders”,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 12, no. 3, Mar., pp. 235-244.
103. Nikolaidis S, Pournara H and Chatzigeorgiou A (2002), “Output
waveform evaluation of basic pass transistor structure”, Lecture Notes
in Computer Science, pp. 229–238.
104. Nowka K J and Galambos T (1998), “Circuit design techniques for a
Giga Hertz integer microprocessor”, In Proc. of IEEE International
Conference on Computer Design, pp. 11–16.
105. Okhalama Bedrij (1962), “Carry-Select Adder”, IRE Transactions on
Electronic Computers, vol. EC-11, no. 3, Jun., pp. 340-346.
106. Ohsang Kwon and Swartzlander E E (2002), “A 16-bit by 16-bit MAC
design using fast 5:3 compressor cells”, Journal of VLSI Signal
Processing, vol. 31, no. 2, Jun., pp. 77–89.
107. Oklobdzija V J (1995), “Improving multiplier design by using
improved column tree and optimized final adder in CMOS technology”,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 3, no. 2, Jun., pp. 292-301.
108. Paim G, Fonseca M, Costa E and Almeida S (201 , “Power efficient
2-D rounded cosine transform with adder compressors for image
compression”, In Proc. of International Conference on Electronics,
Circuits and Systems, pp. 348-351.
109. Pandey S, Khan A and Sarma R (2014), “Comparative analysis of
carry select adder using 8T and 10T full adder cells”, In Proc. of
International Conference on Communications and Signal Processing,
pp. 985-989.
110. Paul B C, Soeleman H and Roy K (2001), “An 8×8 sub-threshold
digital CMOS carry save array multiplier”, In Proc. of Solid-State
Circuits Conference, pp. 377-380.
111. Peiman Aliparast, Ziaddin Daie Koozehkanani, Abdolhamid Moallemi
Khiavi, Ghader Karimian and Hossein Balazadeh Bahar (2011), “A
very high-speed CMOS 4-2 compressor using fully differential current-
mode circuit techniques”, Analog Integr. Circ. Sig. Process., vol. 66,
no. 2, Feb., pp. 235–243.
112. Pishvaie A, Jaberipur G and Jahanian A (201 , “Redesigned CMOS
4;2 compressor for fast binary multipliers”, Canadian Journal of
Electrical and Computer Engineering, vol. 36, no. 3, pp. 111-115.
113. Pradhan M, Panda R and Kumar Sahu S (2011), “Speed Comparison of
16x16 Vedic Multipliers”, International Journal of Computer
Applications, vol. 21, no. 6, May, pp. 16–19.
114. Prakash R and Kirubaveni S (201 , “Performance evaluation of FFT
processor using conventional and Vedic algorithm”, In Proc. of
International Conference on Emerging Trends in Computing,
Communication and Nanotechnology, pp. 89-94.
115. Prasad K and Parhi K K (2001), “Low-power 4-2 and 5-2
compressors”, In Proc. of Asilomar Conference on Signals, Systems
and Computers, pp. 129-133.
116. Prasad Y B, Chokkakula G, Reddy P S and Samhitha N R (2014),
“Design of low power and high speed modified carry select adder for
1 bit Vedic Multiplier”, In Proc. of International Conference on
Information Communication and Embedded Systems, pp. 1-6.
117. Purohit S and Margala M (2012), “Investigating the impact of logic
and circuit implementation for full adder performance”, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20,
no. 7, Jul., pp. 1327-1331.
118. Pushpangadan R, Sukumaran V, Innocent R, Sasikumar D and Sundar
V (2009), “High speed Vedic multiplier for digital signal processors”,
IETE Journal of Research, vol. 55, no. 6, Nov.-Dec., pp. 282-286.
119. Quan G, Davis J P, Devarkal S and Buell D A (200 , “High level
synthesis for large bit width multipliers on FPGAs: A case study”, In
Proc. of the International Conference on Hardware/Software Co
Design and System Synthesis, pp. 213-218.
120. Quan S, Qiang Q and Wey C L (200 , “A novel reconfigurable
architecture of low power unsigned multiplier for digital signal
processing”, In Proc. of the International Symposium on Circuits and
Systems, pp. 3327-3330.
121. Radhakrishnan D and Preethy A P (2000), “Low power CMOS pass
logic 4-2 compressor for high-speed multiplication”, In Proc. of IEEE
Midwest Symposium on Circuits and Systems, pp. 1296-1298.
122. Rakshith T R and Saligram R (201 , “Design of high speed low power
multiplier using reversible logic: A Vedic mathematical approach”, In
Proc. of International Conference on Circuits, Power and Computing
Technologies, pp. 775-781.
123. Ramalatha M and Thanushkodi K (200 , “A novel time and energy
efficient cubing circuit using Vedic mathematics for finite field
arithmetic”, In Proc. of the International Conference on Advances in
Recent Technologies in Communication and Computing, pp. 873-875.
124. Ramana Murthy G, Senthil Pari C, Velraj Kumar P and Lim T S
(2012), “A new -T multiplexer based full adder for low power and
leakage current optimisation”, IEICE Electronics Express, vol. 9,
no. 17, Sep., pp. 1434-1441.
125. Ramkumar B and Kittur H M (2012), “Low-power and area-efficient
carry select adder”, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 20, no. 2, Feb., pp. 371-375.
126. Rao M J and Dubey S (2012), “A high speed and area efficient Booth
recoded Wallace tree multiplier for fast arithmetic circuits”, In Proc. of
Asia Pacific Conference on Post Graduate Research in
Microelectronics and Electronics, pp. 220-223.
127. Ravali B, Micheal Priyanka M and Ravi T (201 , “Optimized
reversible logic design for Vedic multiplier”, In Proc. of International
Conference on Control, Instrumentation, Communication and
Computational Technologies, pp. 127-133.
128. Ravi N, Subbaiah Y, Prasad T J and Rao T S (2011), “A novel low
power, low area array multiplier design for DSP applications”, In Proc.
of International Conference on Signal Processing, Communication,
Computing and Networking Technologies, pp. 254-257.
129. Reddy B N M, Sheshagiri H N, Vijayakumar B R and Santhala S
(2014), “Implementation of low Power -Bit multiplier using gate
diffusion input logic”, In Proc. of IEEE International Conference on
Computational Science and Engineering, pp. 1868-1871.
130. Ruiz G A (1 , “Compact four bit carry look CMOS adder in multi
output DCVS logic”, Electronics Letters, vol. 2, no. 1 , Aug.,
pp. 1556-1557.
131. Saberkari A, Shokouhi S B, Kiani A and Poorahangaryan F (200 , “A
novel low power static frequency divider based on the GDI
technique”, In Proc. of Canadian Conference on Electrical and
Computer Engineering, pp. 67-70.
132. Saha P, Banerjee A, Bhattacharyya P and Dandapat A (2011), “High
speed ASIC design of complex multiplier using Vedic
mathematics”, In Proc. of IEEE Students Technology Symposium,
pp. 237-241.
133. Sahoo S K and Shekhar C (2011), “Delay optimized array multiplier
for signal and image processing”, In Proc. of International Conference
on Image Information Processing, pp. 1-4.
134. Sahoo S R and Mahapatra K K (2012), “Design of low power and high
speed ripple carry adder using modified feed through logic”, In Proc.
of International Conference on Communications, Devices and
Intelligent Systems, pp. 377-380.
135. Sahu R and Subudhi A K (201 , “An area optimized carry select
adder”, In Proc. of International Conference on Power,
Communication and Information Technology, pp. 589-594.
136. Saligram R and Rakshith T R (201 , “Optimized reversible Vedic
multipliers for high speed low power operations”, In Proc. of IEEE
Conference on Information and Communication Technologies,
pp. 809-814.
137. Saradindu Panda, Banerjee A, Maji B and Mukhopadhyay A K (2012),
“Power and delay comparison in between different types of full adder
circuits”, International Journal of Advanced Research in Electrical,
Electronics and Instrumentation Engineering, vol. 1, no. 3, Sep.,
pp. 168-172.
138. Saxena P (201 , “Design of low power and high speed carry select
adder using Brent Kung adder”, In Proc. of International Conference
on VLSI Systems, Architecture, Technology and Applications, pp. 1-6.
139. Schiavon T, Paim G, Fonseca M, Costa E and Almeida S (2016),
“Exploiting adder compressors for power-efficient 2-D approximate
DCT realization”, In Proc. of International Symposium on Circuits and
Systems, pp. 383-386.
140. Senthil Sivakumar M, Arockia Jayadhas S, Arputharaj T and
Banupriya M (201 , “4-bit Manchester carry look-ahead adder design
using MT-CMOS domino logic”, In Proc. of International Conference
on Information Science, Computing and Telecommunications,
pp. 15-18.
141. Senthilpari C (2011), “A low power and high performance radix-4
multiplier design using pass transistor logic technique”, IETE Journal
of Research, vol. 57, no. 2, pp. 149-155.
142. Sethi K and Panda R (2015), “Multiplier less high speed squaring
circuit for binary numbers”, International Journal of Electronics,
vol. 102, no. 3, Mar., pp. 433-443.
143. Shahzad Asif and Mark Vesterbacka (2012), “Performance analysis of
radix-4 adders”, Integration, the VLSI Journal, vol. 45, no. 2, Mar.,
pp. 111-120.
144. Shams A M, Darwish D K and Bayoumi M A (2002), “Performance
analysis of low power 1-bit CMOS full adder cells”, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10,
no. 1, Feb., pp. 20-29.
145. Sharma A and Sharma P (2014), “Area and power efficient 4-bit
comparator design by using 1-bit full adder module”, In Proc. of
International Conference on Parallel, Distributed and Grid Computing,
pp. 1-6.
146. Shekhawat V, Sharma T and Sharma K G (2014), “2-Bit magnitude
comparator using GDI technique”, In Proc. of International Conference
on Recent Advances and Innovations in Engineering, pp. 1-5.
147. Shen-Fu Hsiao, Ming-Roun Jiang and Jia-Sien Yeh (1998), “Design of
high-speed low-power 3-2 counter and 4-2 compressor for fast
multipliers”, Electronics Letters, vol. 34, no. 4, Feb., pp. 341-343.
148. Shen-Fu Hsiao, Ming-Yu Tsai and Chia-Sheng Wen (2010), “Low
area/power synthesis using hybrid pass transistor/CMOS logic cells in
standard cell-based design environment”, IEEE Transactions on
Circuits And Systems—II: Express Briefs, vol. 57, no. 1, Jan., pp. 21-25.
149. Shi J, Jing G, Di Z and Yang S (2011), “The design and
implementation of reconfigurable multiplier with high flexibility”, In
Proc. of the International Conference on Electronics, Communications
and Control, pp. 1095-1098.
150. Shinde K D and Nidagundi J C (2014), “Design of fast and efficient 1-
bit full adder and its performance analysis”, In Proc. of International
Conference on Control, Instrumentation, Communication and
Computational Technologies, pp. 1275-1279.
151. Shrivas J, Akashe S and Tiwari N (2012), “Design and performance
analysis of 1 bit full adder using GDI technique in nanometer era”, In
Proc. of World Congress on Information and Communication
Technologies, pp. 822-825.
152. Shubin V V (2010), “Analysis and comparison of ripple carry full
adders by speed”, In Proc. of International Conference and Seminar on
Micro/Nanotechnologies and Electron Devices, pp. 132-135.
153. Singh H and Kumar R (2014), “10-T Full subtraction Logic Using GDI
Technique”, In Proc. of International Conference on Computational
Intelligence and Communication Networks, pp. 956-960.
154. Singh S and Sasamal T N (201 , “Design of Vedic multiplier using
adiabatic logic”, In Proc. of International Conference on Futuristic
Trends on Computational Analysis and Knowledge Management,
pp. 438-441.
155. Soundharya M and Arunkumar R (201 , “GDI based area delay power
efficient carry select adder”, In Proc. of International Conference on
Green Engineering and Technologies, pp. 1-5.
156. Stefania Perri and Pasquale Corsonello (2012), “New methodology for
the design of efficient binary addition circuits in QCA”, IEEE
Transactions on Nanotechnology, vol. 11, no. 6, Nov., pp. 1192-1200.
157. Subhendu Kumar Sahoo and Chandra Shekhar (2008), “Design and
analysis of a compact fast parallel multiplier for high speed DSP
applications using novel partial product generator and 4:2 compressor”,
International Journal of Electronics, vol. 95, no. 2, Feb., pp. 139–157.
158. Sudha S and Marimuthu C N (2014), “Design of area delay-power
efficient adaptive filter using Wallace tree multiplier”, International
Journal of Scientific Engineering and Research, vol. 2, no. 4, Apr.,
pp. 121–125.
159. Swami N, Arora N and Singh B P (2011), “Low Power subthreshold D
flip flop”, In Proc. of International Conference on Devices and
Communications, pp. 1-4.
160. Thakre L P, Balpande S, Akare U and Lande S (2010), “Performance
evaluation and synthesis of multiplier used in FFT operation using
conventional and Vedic algorithms”, In Proc. of International
Conference on Emerging Trends in Engineering and Technology,
pp. 614-619.
161. Tiwari H D, Gankhuyag G, Chan Mo Kim and Yong Beom Cho
(200 , “Multiplier design based on ancient Indian Vedic
mathematics”, In Proc. of International SoC Design Conference,
pp. 65-68.
162. Tsoumanis K, Axelos N, Moschopoulos N, Zervakis G and Pekmestzi
K (2016), “Pre-Encoded Multipliers Based on Non-Redundant Radix-4
Signed-Digit Encoding”, IEEE Transactions on Computers, vol. 65,
no. 2, Feb., pp. 670-676.
163. Tyagi A (1993), “A reduced-area scheme for carry-select adders”,
IEEE Transactions on Computers, vol. 42, no. 10, Oct., pp. 1163-1170.
164. Uma R and Dhavachelvan P (2012), “Modified gate diffusion input
technique: a new technique for enhancing performance in full adder
circuits”, In Proc. of International Conference on Communication,
Computing and Security, pp. 74-81.
165. Usha S and Ravi T (201 , “Design of 4-bit ripple carry adder using
hybrid T full adder”, In Proc. of International Conference on Circuit,
Power and Computing Technologies, pp. 1-8.
166. Vatanjou A A, Ytterdal T and Aunet S (201 , “Energy efficient
sub/near-threshold ripple-carry adder in standard nm CMOS”, In
Proc. of Asia Symposium on Quality Electronic Design, pp. 7-12.
167. Veeramachaneni S, Krishna K M, Avinash L, Puppala S R and
Srinivas M B (200 , “Novel Architectures for high-speed and low-
power 3-2, 4-2 and 5-2 compressors”, In Proc. of International
Conference on VLSI and Embedded Systems, pp. 324-329.
168. Wariya S, Nagaria R and Tiwari S (2012), “Performance analysis of
high speed hybrid CMOS full adder circuits for low voltage VLSI
design”, VLSI Design, vol. 2012, Jan., pp. 1–18.
169. Wallace C (1964), “A suggestion for a fast multiplier”, IEEE
Transactions on Electronic Computers, vol. EC-13, pp. 14–17.
170. Waters R S and Swartzlander E E (2010), “A Reduced Complexity
Wallace Multiplier Reduction”, IEEE Transactions on Computers,
vol. 59, no. 8, Aug., pp. 1134-1137.
171. Weinberger A (1981), “4:2 carry-save adder module”, IBM Technical
Disclosure Bulletin, vol. 23, pp. 1-4.
172. Weste N H E and Harris D, CMOS VLSI Design, 2nd ed., Pearson
Education, 2005.
173. Xu-guang Sun, Zhi-gang Mao and Feng-chang Lai (2002), “A 4 bit
parallel CMOS adder for high performance processors”, In Proc. of the
IEEE Asia-Pacific Conference on ASIC, pp. 205–208.
174. Yagain D and Vijayan K A (201 , “FIR filter design based on
retiming automation using VLSI design metrics”, In Proc. of
International Conference on Technology, Informatics, Management,
Engineering and Environment, pp. 17-22.
175. Yazhini G and Rajendiran M (201 , “Low power-area efficient design
of 1 bit full adder”, In Proc. of International Conference on Computing
for Sustainable Global Development, pp. 1679-1683.
176. Yen-Mou Huang and Kuo J B (2000), “A high-speed conditional carry
select adder circuit with a successively incremented carry number
block structure for low-voltage VLSI implementation”, IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 47, no. 10, Oct., pp. 1074-1079.
177. Yong Surk Lee, Joh P, Jae Hee You and Kyu Tae Park (1996), “Fast
and gate-count efficient arithmetic logic unit”, Electronics Letters,
vol. 32, no. 23, Nov., pp. 2126-2127.
178. Youngjoon Kim and Lee-Sup Kim (2001), “64-bit carry-select adder
with reduced area”, Electronics Letters, vol. 37, no. 10, May,
pp. 614-615.
179. Yuan S C (2007), “4-2 compressor of fast booth multiplier for high-
speed RISC processor”, International Journal of Electronics, vol. 94,
no. 9, Sep., pp. 869–875.
180. Zakaria Z and Abbasi S A (201 , “Optimized multiplier based upon
input LUTs and Vedic mathematics”, World Academy of Science,
Engineering and Technology, vol. 7, no. 1, Jan., pp. 26-30.
181. Zhan Yu, Wasserman L and Willson A N (2000), “A painless way to
reduce power dissipation by over 18% in Booth-encoded carry-save
array multipliers for DSP”, In Proc. of IEEE workshop on Signal
Processing Systems, pp. 571-580.
LIST OF PUBLICATIONS
Journals:
1. Shoba Mohan and Nakkeeran Rangaswamy, “An improved
implementation of hierarchy array multiplier using CslA adder and full
swing GDI logic”, International Journal of Computer Aided
Engineering and Technology (Inderscience), Accepted.
2. Shoba Mohan and Nakkeeran Rangaswamy, “Energy and area efficient
hierarchy multiplier architecture based on Vedic mathematics and GDI
logic”, Engineering Science and Technology, an International
Journal (Elsevier), in press.
3. Shoba Mohan and Nakkeeran Rangaswamy, “GDI based full adders
for energy efficient arithmetic applications”, Engineering Science and
Technology, an International Journal (Elsevier), vol. 19, no.1,
pp. 485-496, March 2016.
4. Shoba Mohan and Nakkeeran Rangaswamy, “Implementation of Vedic
multiplier using GDI logic”, International Journal of Applied
Engineering Research (Scopus Indexed), vol. 10, no. 1, pp. 244-247,
March 2015.
5. Shoba Mohan and Nakkeeran Rangaswamy, “Design of high speed
multiplier using Vedic mathematics”, European Journal of Scientific
Research (Scopus Indexed), vol. 129, no. 1, pp. 6-15, February 2015.
Conferences:
1. Shoba Mohan and Nakkeeran Rangaswamy, “An improved
implementation of array multiplier using full swing GDI logic gates”,
IEEE International Conference on Innovations in Information
Embedded and Communication Systems, Tamilnadu, India, March
16-18, 2016.
2. Shoba Mohan and Nakkeeran Rangaswamy, “An implementation of
CLA adder with minimum area and lesser PDP using full swing GDI
logic gates”, IEEE International Conference on Electronics and
Communication Systems, Tamilnadu, India, February 25-26, 2016.
3. Shoba Mohan and Nakkeeran Rangaswamy, “Design of ripple carry
adder using GDI logic”, Springer International Conference on Soft
Computing Systems, Tamilnadu, India, April 21-22, 2015.
4. Shoba Mohan and Nakkeeran Rangaswamy, “Performance analysis of 1
bit full adder using GDI Logic”, IEEE International Conference on
Information, Communication and Embedded Systems, Tamilnadu,
India, February 27-28, 2014.
5. Shoba Mohan and Nakkeeran Rangaswamy, “Gate diffusion input
based primitive cells for full swing logic”, International Conference
on Green Technology Concepts for bridging the digital divide
using ICT, Puducherry, India, July 5-6, 2013.
List of Papers Communicated to Journal:
1. Shoba Mohan and Nakkeeran Rangaswamy, “Energy and area efficient
Vedic multiplier using full swing GDI logic”, International Journal of
Electronics (Taylor and Francis).
2. Shoba Mohan and Nakkeeran Rangaswamy, “Area and energy efficient
4-2 compressor design for tree multiplier implementation”,
Proceedings of the National Academy of Sciences, India Section A:
Physical Sciences (Springer).
VITAE
Mrs. M. Shoba was born in Tamilnadu, India in 1986. She received
B.E degree in Electronics and Communication Engineering and M.E degree in
VLSI design from Anna University, Tamilnadu, India in 2007 and 2009,
respectively.
She worked as a Lecturer at Dr. Mahalingam College of
Engineering and Technology from 2009 to 2010, and as an Assistant Professor at
Dhanalakshmi Srinivasan Engineering College from 2010 to 2012. She has
been awarded Junior Research Fellowship under National Eligibility Test
(NET) from the University Grants Commission (UGC), Government of India.
She has published around 10 papers in International Journals and
International Conferences. She is a life member of ISTE and student member
of IEEE and IEICE. Her current research interests are in the areas of design
and implementation of energy efficient digital hardware architecture for low
battery operated devices.