
Full Custom Layout Design and FPGA

Implementation of an XOR Based 16-Bit

Carry Select Adder for Area, Delay and

Power Minimization

A. N. M. HOSSAIN

Department of Electrical and Electronic Engineering

Dhaka University of Engineering & Technology, Gazipur

May 2019


Submitted to the Department of Electrical and Electronic Engineering,

Dhaka University of Engineering & Technology, Gazipur in partial

fulfillment of the requirements for the award of degree of

Master of Science in

Electrical and Electronic Engineering

by

A. N. M. HOSSAIN

Student No. 112229-P

under the supervision of

Prof. Dr. Md. Anwarul Abedin

Professor, Dept. of EEE



May 2019


The thesis titled “Full Custom Layout Design and FPGA Implementation of an XOR Based

16-Bit Carry Select Adder for Area, Delay and Power Minimization” submitted by

A. N. M. HOSSAIN, Student ID: 112229-P, Session: 2011-2012 has been accepted as

satisfactory in partial fulfillment of the requirement for the degree of Master of Science in Electrical and Electronic Engineering on May 13, 2019.

Board of Examiners

Dr. Md. Anwarul Abedin, Professor (Supervisor): Chairman

Dr. Md. Sharafat Hossain, Professor and Head: Member (Ex-officio)

Dr. Md. Saifuddin Faruk, Professor: Member

Dr. Md. Arifur Rahman, Assistant Professor: Member

Dr. Syed Iftekhar Ali, Professor, Department of Electrical & Electronic Engineering, Islamic University of Technology, Gazipur: Member (External)


Declaration

I declare that this thesis is my own work and has not been submitted in any form for another

degree or diploma at any university or other institute of tertiary education. Information

derived from the published and unpublished work of others has been acknowledged in the

text and a list of references is given.

Signature of the candidate

A. N. M. Hossain Date: 13/05/2019

(Student ID: 112229-P)


Dedication

To my family


Acknowledgements

First of all, I thank the Almighty, who gave me the opportunity and strength to carry out this research work.

I would like to express my sincere gratitude and profound indebtedness to my supervisor, Prof. Dr. Md. Anwarul Abedin, for his constant guidance, insightful advice, helpful criticism, valuable suggestions, commendable support and endless patience towards the completion of this thesis. I feel very proud to have worked with him. Without his inspiring enthusiasm and encouragement this work could not have been completed.

I am deeply indebted and grateful to Dr. Md. Sharafat Hossain, Professor and Head, Department of EEE, DUET, Gazipur, for his support throughout the thesis work.

I would like to thank the members of my thesis examination committee, Dr. Md. Saifuddin Faruk, Dr. Md. Arifur Rahman and Dr. Syed Iftekhar Ali, for their constructive suggestions on future improvement of my work.

I thank all my teachers and the staff of the Department of EEE, DUET, Gazipur for their support and encouragement.

I wish to express my gratitude to DUET, Gazipur for providing an excellent environment for research. The support I have received from DUET, Gazipur is gratefully acknowledged.

I would like to express my most sincere gratitude to my family, friends and well-wishers, who have taken great pains for my progress in life, for their sacrifices, blessings and constant prayers for my advancement.

Finally, I am also thankful to all those who have directly or indirectly helped and encouraged me to complete my thesis. I am sorry that I cannot express my appreciation to each of my well-wishers individually, and I ask forgiveness for any improper behavior towards anyone who intended to help me.




Abstract

Adders are widely used in processors and other digital circuits. Low-power, area-efficient, high-speed circuits are one of the most important research areas in VLSI design. The carry select adder is one of the fast adders, with comparatively small area and low power consumption. In this thesis, a 16-bit carry select adder is presented that uses a modified XOR based full adder to reduce circuit complexity, area and delay. The modified full adder requires only two XOR gates and one multiplexer. The modified 16-bit carry select adder performs better than the conventional carry select adder with respect to area, power consumption and delay.

A pass transistor based 2×1 MUX is used in the semi custom and full custom designs. It needs only 6 transistors in place of the 20 used in the conventional 2×1 MUX. The full custom 2×1 MUX design achieves reductions of 85.12% in area, 94.07% in power and 70% in MOSFET count over the conventional design, and reductions of 92% in area and 84.89% in power over the semi custom design.

The XOR based full custom 1-bit full adder achieves reductions of 60.4% in area, 28.4% in power, 54.5% in delay and 40% in MOSFET count over the conventional design, and reductions of 78.3% in area, 50.3% in power and 50.8% in delay over the semi custom design. To design the 16-bit Carry Select Adder (CSLA), 4-bit and 8-bit CSLAs were designed first. For the XOR based 16-bit CSLA, the full custom design achieves reductions of 84.3% in area, 70.6% in power and 56.5% in delay over the conventional design, and reductions of 87.0% in area, 20.1% in power and 55.2% in delay over the semi custom design.

To implement the CSLA in an FPGA, Verilog code was written for the 1-bit adder and the 4-bit, 8-bit and 16-bit CSLAs. The code was simulated in ModelSim to check the functionality of the design. Once the simulation results were correct, synthesis and power analysis were carried out in Quartus II. Finally, the 1-bit adder, the 4-bit CSLA and the 8-bit CSLA were implemented on an Altera DE2-115 FPGA board. Due to pin constraints, only designs up to the 8-bit CSLA were implemented on the board.

Table of Contents

Declaration
Dedication
Acknowledgements
Abstract
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Introduction
1.2 Literature Review
1.3 Objective of Thesis
1.4 Organization of Thesis

Chapter 2 Adder Topologies
2.1 Ripple Carry Adder
2.2 Carry Look-ahead Adder
2.3 Carry Save Adder
2.4 Carry Skip Adder
2.5 Carry Select Adder
2.5.1 Uniform sized CSLA
2.5.2 Variable sized CSLA

Chapter 3 Semi Custom Design
3.1 Conventional 2×1 Multiplexer
3.2 Pass Transistor Based 2×1 Multiplexer
3.3 Conventional Full Adder
3.4 XOR Based Full Adder
3.5 Carry Select Adder (CSLA)
3.5.1 XOR Based 4-Bit CSLA
3.5.2 XOR Based 8-Bit CSLA
3.5.3 XOR Based 16-Bit CSLA

Chapter 4 Full Custom Design
4.1 Inverter or NOT Gate
4.2 Two-input AND Gate
4.3 Two-input XOR Gate
4.4 2×1 MUX
4.5 XOR Based Full Adder
4.6 XOR Based 4-bit CSLA
4.7 XOR Based 8-bit CSLA
4.8 XOR Based 16-bit CSLA

Chapter 5 Performance Analysis
5.1 Comparison of Different 2×1 MUX
5.2 Comparison of Different Full Adder
5.3 Comparison of 4-bit CSLA
5.4 Comparison of 8-bit CSLA
5.5 Comparison of 16-bit CSLA
5.6 Comparison with Other Work

Chapter 6 FPGA Implementation of CSLA
6.1 Introduction to FPGA
6.2 Altera DE2-115 FPGA Board
6.3 FPGA Implementation of Conventional 1-bit Full Adder
6.4 FPGA Implementation of 4-bit CSLA
6.5 FPGA Implementation of 8-bit CSLA
6.6 FPGA Implementation of 16-bit CSLA

Chapter 7 Conclusion and Future Recommendation
7.1 Conclusion
7.2 Future Recommendation

References

List of Figures

Fig. 2.1 Ripple carry adder
Fig. 2.2 Carry look-ahead adder
Fig. 2.3 Carry save adder
Fig. 2.4 Carry skip adder
Fig. 2.5 Carry select adder
Fig. 2.6 Uniform sized 16-bit carry select adder
Fig. 2.7 Variable sized 16-bit carry select adder
Fig. 3.1 Schematic circuit of 2×1 Multiplexer
Fig. 3.2 Layout of Conventional 2×1 Multiplexer
Fig. 3.3 Input-Output wave shapes of conventional 2×1 MUX
Fig. 3.4 Pass transistor based 2×1 MUX
Fig. 3.5 Layout of pass transistor based 2×1 MUX
Fig. 3.6 Input-Output wave shapes of pass transistor based 2×1 MUX
Fig. 3.7 Conventional 1-bit full adder
Fig. 3.8 Layout of the conventional 1-bit full adder
Fig. 3.9 Input-Output wave shapes of conventional 1-bit full adder
Fig. 3.10 XOR based 1-bit full adder
Fig. 3.11 Layout of XOR based 1-bit full adder
Fig. 3.12 Input-Output wave shapes of XOR based 1-bit full adder
Fig. 3.13 XOR based 4-bit CSLA
Fig. 3.14 Layout of the XOR based 4-bit CSLA
Fig. 3.15 Input-Output wave shapes of XOR based 4-bit CSLA
Fig. 3.16 XOR based 8-bit CSLA
Fig. 3.17 Layout of XOR based 8-bit CSLA
Fig. 3.18 Input-Output wave shapes of XOR based 8-bit CSLA
Fig. 3.19 XOR based 16-bit CSLA
Fig. 3.20 Layout of XOR based 16-bit CSLA
Fig. 3.21 Input-Output wave shapes of XOR based 16-bit CSLA
Fig. 4.1 VLSI design flow
Fig. 4.2 Full custom layout of NOT gate
Fig. 4.3 Input-Output wave shapes of NOT gate
Fig. 4.4 2-input AND gate symbol
Fig. 4.5 Full custom layout of 2-input AND gate
Fig. 4.6 Input-Output wave shapes of 2-input AND gate
Fig. 4.7 Schematic design of 2-input XOR gate
Fig. 4.8 Full custom layout of 2-input XOR gate
Fig. 4.9 Input-Output wave shapes of 2-input XOR gate
Fig. 4.10 Schematic circuit of 2×1 MUX
Fig. 4.11 Full custom layout of 2×1 MUX
Fig. 4.12 Input-Output wave shapes of the 2×1 MUX
Fig. 4.13 XOR based 1-bit full adder
Fig. 4.14 Full custom layout of an XOR based 1-bit full adder
Fig. 4.15 Input-Output wave shapes of XOR based 1-bit full adder
Fig. 4.16 Full custom layout of XOR based 4-bit CSLA
Fig. 4.18 Full custom layout of XOR based 8-bit CSLA
Fig. 4.20 Full custom layout of XOR based 16-bit CSLA
Fig. 4.21 Input-Output wave shapes of XOR based 16-bit CSLA
Fig. 5.1 Area, Power and Delay comparison of 2×1 MUX
Fig. 5.2 Area, Power and Delay comparison of 1-bit Full Adder
Fig. 5.3 Area, Power and Delay comparison of 4-bit CSLA
Fig. 5.4 Area, Power and Delay comparison of 8-bit CSLA
Fig. 5.5 Area, Power and Delay comparison of 16-bit CSLA
Fig. 6.1 FPGA Architecture
Fig. 6.2 Altera DE2-115 FPGA Board
Fig. 6.3 Block diagram of the Altera DE2-115 FPGA Board
Fig. 6.4 Verilog code of 1-bit full adder
Fig. 6.5 Simulation waveform results of conventional 1-bit full adder
Fig. 6.6 Synthesis summary of the conventional 1-bit full adder
Fig. 6.7 Power analysis of the conventional 1-bit full adder
Fig. 6.8 Implementation of the conventional 1-bit full adder in FPGA
Fig. 6.9 Verilog code of 4-bit CSLA
Fig. 6.10 Simulation waveform results of 4-bit CSLA (Binary)
Fig. 6.11 Simulation waveform results of 4-bit CSLA (Decimal)
Fig. 6.12 Synthesis summary of 4-bit CSLA
Fig. 6.13 Power analysis of 4-bit CSLA
Fig. 6.14 FPGA implementation of 4-bit CSLA
Fig. 6.15 Verilog code of 8-bit CSLA
Fig. 6.16 Simulation waveform results of 8-bit CSLA (Binary)
Fig. 6.17 Simulation waveform results of 8-bit CSLA (Decimal)
Fig. 6.18 Synthesis summary of 8-bit CSLA
Fig. 6.19 Power analysis of 8-bit CSLA
Fig. 6.20 FPGA implementation result of 8-bit CSLA
Fig. 6.21 Verilog code of 16-bit CSLA
Fig. 6.22 Simulation waveform results of 16-bit CSLA (Binary)
Fig. 6.23 Simulation waveform results of 16-bit CSLA (Decimal)
Fig. 6.24 Synthesis summary of 16-bit CSLA
Fig. 6.25 Power analysis of 16-bit CSLA

List of Tables

Table 3.1 Area, Delay and Power Dissipation of the conventional 2×1 MUX
Table 3.2 Area, Delay and Power Dissipation of the pass transistor based 2×1 MUX
Table 3.3 Area, Delay and Power Dissipation of conventional 1-bit full adder
Table 3.4 Area, Delay and Power Dissipation of XOR based 1-bit full adder
Table 3.5 Area, Delay and Power Dissipation of XOR based 4-bit CSLA
Table 3.6 Area, Delay and Power Dissipation of XOR based 8-bit CSLA
Table 3.7 Area, Delay and Power Dissipation of XOR based 16-bit CSLA
Table 4.1 Area, Delay and Power Dissipation of NOT gate
Table 4.2 Area, Delay and Power Dissipation of 2-input AND gate
Table 4.3 Area, Delay and Power Dissipation of 2-input XOR gate
Table 4.4 Area, Delay and Power Dissipation of the 2×1 MUX
Table 4.5 Area, Delay and Power Dissipation of the 1-bit full adder
Table 4.6 Area, Delay and Power Dissipation of the XOR based 4-bit CSLA
Table 4.7 Area, Delay and Power Dissipation of the XOR based 8-bit CSLA
Table 4.8 Area, Delay and Power Dissipation of the XOR based 16-bit CSLA
Table 5.1 Performance analysis of Conventional, Semi and Full Custom 2×1 MUX
Table 5.2 Performance analysis of Conventional, Semi and Full Custom 1-bit Full Adder
Table 5.3 Performance analysis of Conventional, Semi and Full Custom 4-bit CSLA
Table 5.4 Performance analysis of Conventional, Semi and Full Custom 8-bit CSLA
Table 5.5 Performance analysis of Conventional, Semi and Full Custom 16-bit CSLA
Table 5.6 Performance analysis of Comparison with Other Work
Table 6.1 Summary of all the results obtained from the synthesis and power analysis using Quartus II software

Chapter 1

Introduction

1.1 Introduction

Adders are among the most widely used components in digital integrated circuit design. They have special significance in VLSI design and are used in computers and many other processors. In the rapidly growing mobile industry, speed is not the only concern; smaller area and lower power have also become major design goals for digital circuits [1]. The design of low-power, area-efficient, high-speed data path logic is therefore one of the most important fields of VLSI research. In mobile electronics, reducing area and power consumption is a key factor in increasing portability and battery life. High-speed addition and multiplication have always been a fundamental requirement of high-performance processors and systems [2]. Addition is the heart of computer arithmetic, and the arithmetic unit is often the workhorse of a computational circuit, so designing a power-efficient, high-performance adder is a major concern in any VLSI subsystem. Adders are a necessary component of a data path, e.g. in a microprocessor or a signal processor. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder: the sum for each bit position in an elementary adder is generated only after the previous bit position has been summed and a carry propagated into the next position.

Many adder designs are available in the literature, such as the Ripple Carry Adder (RCA), Carry Look-ahead Adder (CLA), Carry Select Adder (CSLA) and Carry Skip Adder (CSkA), each with its own advantages and disadvantages. The CSLA is one of the fastest adders and has comparatively low area and power consumption. It consists of two multiplexed RCAs that operate in parallel, one with carry-in Cin = 0 and the other with Cin = 1; the final sum is then selected through a multiplexer. In a conventional CSLA, full adders based on XOR, AND and OR gates are used. These adders occupy more chip area because of the large number of transistors in the gates, have a higher delay and consume more power [3]. In this work, a modified XOR based full adder is used as the building block of the modified CSLA to reduce area, delay and power consumption. The layout of the 16-bit CSLA is designed in Microwind software and the adder is then implemented in hardware on an Altera DE2-115 FPGA board. The results obtained from the layout and the hardware are compared with the conventional RCA and CSLA.

1.2 Literature Review

Efficient, high-performance VLSI systems are increasingly used in portable and mobile devices, multi-standard wireless receivers and biomedical instrumentation [4], [5]. An adder is the main component of an arithmetic unit, and a complex digital signal processing (DSP) system involves several adders, so an efficient adder design substantially improves the performance of a complex DSP system. An RCA has a simple design, but its carry propagation delay (CPD) is the main concern. Carry look-ahead and carry select (CS) methods have been suggested to reduce the CPD of adders. A conventional CSLA is an RCA-RCA configuration that generates a pair of sum words and output-carry bits corresponding to the anticipated input carry (Cin = 0 and Cin = 1) and selects one out of each pair as the final sum and final output carry [6].

A conventional CSLA has less CPD than an RCA, but the design is not attractive since it uses a dual RCA. A few attempts have been made to avoid the dual use of RCAs in CSLA design. Kim and Kim [7] used one RCA and one add-one circuit instead of two RCAs, where the add-one circuit is implemented using a multiplexer (MUX). He et al. [8] proposed a square-root (SQRT)-CSLA to implement large bit-width adders with less delay. In a SQRT-CSLA, CSLAs of increasing size are connected in a cascading structure. The main objective of the SQRT-CSLA design is to provide a parallel path for carry propagation that helps to reduce the overall adder delay. Ramkumar and Kittur [9] suggested a binary to excess-1 converter (BEC)-based CSLA. The BEC-based CSLA involves fewer logic resources than the conventional CSLA, but it has marginally higher delay. A CSLA based on common Boolean logic (CBL) is also proposed in [10] and [11]. The CBL-based CSLA of [10] involves significantly fewer logic resources than the conventional CSLA, but it has a longer CPD, almost equal to that of the RCA.

To overcome this problem, a SQRT-CSLA based on CBL was proposed in [11]. However, the CBL-based SQRT-CSLA design of [11] requires more logic resources and delay than the BEC-based SQRT-CSLA of [9]. These adders need a large chip area because of the large number of transistors used in the gates, have a higher delay and consume more power. Therefore, there is a need for an efficient CSLA design that is less complex, occupies less area, consumes less power and has a minimal delay.

1.3 Objective of Thesis

a. To design full custom layout of the XOR based 16-bit CSLA.

b. To implement the CSLA in FPGA.

c. To compare area, delay and power consumption with conventional design.

1.4 Organization of Thesis

The introduction, literature review and objectives of the thesis are given in Chapter 1. Chapter 2 describes different adder topologies, namely the RCA, CLA, CSA, CSkA and CSLA. Chapter 3 describes the semi custom design of the conventional full adder, the 2×1 multiplexer, the XOR based full adder, the basic concept of the carry select adder, and the XOR based 4-bit, 8-bit and 16-bit CSLAs, with schematic diagrams, Verilog code and layout diagrams. Chapter 4 describes the full custom layout design of the NOT gate, 2-input AND gate, 2-input XOR gate, 2×1 MUX, and the 4-bit, 8-bit and 16-bit XOR based CSLAs, with input-output wave shapes. Chapter 5 compares the performance of the conventional, semi custom and full custom 2×1 MUX, full adder, and 4-bit, 8-bit and 16-bit CSLAs in terms of area, delay, power dissipation, gate count, IDD(Max) and IDD(Avg). Chapter 6 describes the FPGA implementation of the full adder and the 4-bit, 8-bit and 16-bit CSLAs, with simulation results from ModelSim. The conclusion and future recommendations are given in Chapter 7.


Chapter 2

Adder Topologies

The design of various adders, namely the RCA, CLA, CSA, CSkA and CSLA, is discussed in this chapter. Each adder is named after the way the carry is propagated between its stages. The advantages and characteristic features of each adder in terms of area, delay and power are also discussed.

2.1 Ripple Carry Adder

The Ripple Carry Adder (RCA) is the basic adder and works on the basic principle of addition. It is a cascade of full adders (FA) connected in series. Each full adder block takes three inputs (two operand bits and a carry-in bit) and produces two outputs, a Sum bit and a Carry-out bit; the carry-out of one full adder acts as the carry-in of the next. Hence, the carry is propagated serially through the adder [12]. An n-bit RCA requires n full adders, and its delay grows as the number of bits increases.

The block diagram of an n-bit ripple carry adder is shown in Fig. 2.1. The RCA is slow when the word length is large, because in the worst case the propagation time from C0 to Cn includes all the carry bits. The worst-case delay of the RCA occurs when a carry transition ripples through all stages of the adder chain from the least significant bit to the most significant bit, and is approximated by:

$$t_{adder} = (n-1)\,t_{carry} + t_{sum} \tag{2.1}$$

where $t_{carry}$ is the delay through the carry stage of a full adder and $t_{sum}$ is the delay to compute the sum of the last stage. The delay of the ripple carry adder is linearly proportional to n, the number of bits, so the performance of the RCA is limited as n grows.

The disadvantage of the RCA is the long delay due to the propagation of the carry from the low-order to the high-order stages.

The advantages of the RCA are lower power consumption and a compact layout, which gives a smaller chip area and a lower gate count.

Fig. 2.1: Ripple carry adder
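The serial carry chain described in this section can be illustrated with a short behavioural sketch in Python. It is only an illustration, not the thesis implementation: the function names and the delay parameters `t_carry` and `t_sum` are hypothetical, and the delay function simply evaluates the linear model (n - 1) carry-stage delays plus one sum delay.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit unsigned integers one bit at a time.

    The carry-out of stage i is the carry-in of stage i + 1,
    which is exactly the serial dependency that limits RCA speed.
    Returns (n-bit sum, final carry-out).
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def rca_worst_case_delay(n, t_carry, t_sum):
    """Linear delay model: (n - 1) carry-stage delays plus one sum delay."""
    return (n - 1) * t_carry + t_sum
```

For example, `ripple_carry_add(0xFFFF, 1, 16)` forces the carry to ripple through all 16 stages, which is the worst case the delay model describes.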

2.2 Carry Look-ahead Adder

A Carry Look-ahead Adder abbreviated as CLA is a type of adder used in digital logic. The

disadvantage of the RCA is that it is very slow when many bits are added. CLA solves this

problem by pre-calculation of the carry signals, based on the input signals. It is based on the

fact that a carry signal will be generated in two cases:

Case 1: When both bits Ai and Bi are 1

Case 2: When one of the two bits is 1 and the carry from the previous stage is 1.

Thus, it can be written as

$$C_{i+1} = A_i B_i + (A_i \oplus B_i)\,C_i \tag{2.2}$$

Equation (2.2) can also be written as

$$C_{i+1} = G_i + P_i C_i \tag{2.3}$$

where $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$ are called the Generate and Propagate terms, respectively.

Assume that the delay through an AND gate is one gate delay and the delay through an XOR gate is two gate delays. The Propagate and Generate terms depend only on the input bits and will thus be valid after two and one gate delays, respectively. If eqn. (2.3) is expanded to calculate the carry signals, one does not need to wait for the carry to ripple through all the previous stages to find its proper value. Hence, the carry for each bit is computed independently. As an example, for a 4-bit adder the carry bits are given by eqns. (2.4) to (2.7).

$$C_1 = G_0 + P_0 C_0 \tag{2.4}$$

$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0 \tag{2.5}$$

$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \tag{2.6}$$

$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \tag{2.7}$$

The carry-out bit of the last stage, $C_{i+1}$, will be available after four gate delays (two gate delays to calculate the Propagate signal and two delays for the AND and OR gates). In this way, the carry of an n-bit carry look-ahead adder can be written as

$$C_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 C_0 \tag{2.8}$$

The Sum signal can be calculated as in eqn. (2.9):

$$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i \tag{2.9}$$

The Sum bit will thus be available after two additional gate delays (due to the XOR gate) or a

total of six gate delays after the input signals Ai and Bi have been applied. The advantage is

that these delays will be the same independent of the number of bits one needs to add, in

contrast to the ripple adder. A 4-bit carry look-ahead adder is shown in Fig. 2.2.


Fig. 2.2: Carry look-ahead adder
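As a hedged illustration of the look-ahead idea, the sketch below computes every carry directly from the Generate and Propagate terms and the initial carry, without using the carry of the previous stage. The function names are hypothetical, and the propagate term is taken as the XOR of the operand bits, following the two carry-generation cases listed in this section.

```python
def cla_carries(a, b, n, c0=0):
    """Return ([C0, C1, ..., Cn], propagate_bits) for two n-bit operands.

    Each carry is evaluated from the expanded sum-of-products form
    C_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 C_0,
    so no carry depends on a previously computed carry.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate
    carries = [c0]
    for i in range(n):
        term = 0
        prefix = 1  # running product P_i * P_{i-1} * ... of propagate bits
        for j in range(i, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        term |= prefix & c0
        carries.append(term)
    return carries, p


def cla_add(a, b, n, c0=0):
    """n-bit addition using look-ahead carries; returns (sum, carry_out)."""
    carries, p = cla_carries(a, b, n, c0)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # sum bit = P_i XOR C_i
    return total, carries[n]
```

In hardware all the product terms of one carry are evaluated in parallel; the inner Python loop only enumerates them.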

2.3 Carry Save Adder

A carry save adder (CSA) is used to compute the sum of three or more n-bit binary numbers. It is widely used in the final stages of fast multipliers for summing the partial products to give the final value [13]-[14]. A carry save adder is shown in Fig. 2.3. It is an n-bit adder that does not connect up the carries: it is simply a parallel ensemble of n full adders without any horizontal connection, in which the carry of each stage is saved as an output and not propagated to the next higher-order adder. The latency of a carry save adder is the same as that of a single full adder. The propagation delay is independent of the number of bits, and the amount of circuitry is less than that of a carry select adder.

The advantage of the carry save adder is that the sum is computed faster than in the conventional RCA. The carry save adder is better than the conventional carry select adder in terms of area and power consumption, but it is slower than the carry select adder.

However, the CSA has a disadvantage. It does not actually solve the problem of adding two integers and producing a single output. Instead, it adds three integers and produces two outputs such that the sum of these two is equal to the sum of the three inputs.


Fig. 2.3: Carry save adder
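The three-inputs-to-two-outputs behaviour of the carry save adder can be sketched as follows. This is a minimal behavioural model with a hypothetical function name; each bit position acts as an independent full adder, and the saved carries are shifted one position to the left because they carry the weight of the next higher bit.

```python
def carry_save_add(x, y, z):
    """Reduce three integers to two (partial_sum, saved_carry).

    No carry propagates between bit positions, yet
    x + y + z == partial_sum + saved_carry always holds.
    """
    partial_sum = x ^ y ^ z                          # per-bit sum outputs
    saved_carry = ((x & y) | (y & z) | (x & z)) << 1  # per-bit carry outputs
    return partial_sum, saved_carry
```

A conventional carry-propagating adder is still needed to add the two outputs into a single result, which is the limitation noted above.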

2.4 Carry Skip Adder

A carry skip adder (CSkA) consists of a simple ripple carry adder with a special speed-up carry chain called a skip chain. To speed up the operation, the carry is skipped ahead to position i without waiting for it to ripple through the intervening stages. A carry skip adder thus reduces the carry propagation time by skipping over groups of consecutive adder stages. The carry skip adder is usually comparable in speed to the carry look-ahead technique, but it requires less chip area and consumes less power. To implement a carry skip adder, the stages are divided into r-bit blocks, each using a simple ripple carry scheme. Carry skip logic is added to each block to detect when the carry-in to the block can be passed directly to the next block. In each block, a ripple carry adder produces the sum bits and the carry-out bit of the block. Every block generates a block propagate and a block generate signal. For a given bit position, the CSkA uses the carry-out equation in terms of the carry-in signal:

$$C_{i+1} = G_i + P_i C_i \tag{2.10}$$

The Generate and Propagate signals used by the carry skip adder are:

$$G_i = A_i B_i \tag{2.11}$$

$$P_i = A_i \oplus B_i \tag{2.12}$$

From eqn. (2.10), it can be seen that setting the carry-in signal of a block to zero causes the carry-out to serve as the block generate signal. An r-input AND gate is used to form the block propagate signal from the propagate signals of the individual bits. The block generate and block propagate signals together produce the input carry to the next block [15]. Fig. 2.4 shows an 8-bit carry skip adder using 2-bit RCA blocks.

Fig. 2.4: Carry skip adder
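The block-level skip logic can be sketched behaviourally as below. This is an illustrative model under stated assumptions rather than the circuit of Fig. 2.4: the function name and the parameter `r` (block width) are hypothetical, the per-bit propagate is taken as the XOR of the operand bits, and the block generate is obtained as the block's ripple carry-out with its carry-in forced to zero.

```python
def carry_skip_add(a, b, n, r, cin=0):
    """n-bit carry skip adder built from r-bit ripple blocks.

    Returns (n-bit sum, final carry-out).
    """
    total, carry = 0, cin
    for start in range(0, n, r):
        width = min(r, n - start)
        block_propagate = 1  # AND of all per-bit propagate signals
        c_real = carry       # ripple chain used for the sum bits
        c_zero = 0           # same chain with block carry-in forced to 0
        for i in range(start, start + width):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            g, p = ai & bi, ai ^ bi
            total |= (p ^ c_real) << i
            c_real = g | (p & c_real)
            c_zero = g | (p & c_zero)
            block_propagate &= p
        # Skip logic: the next block's carry-in is the block generate,
        # or the incoming carry bypassing the block when all bits propagate.
        carry = c_zero | (block_propagate & carry)
    return total, carry
```

Calling `carry_skip_add(a, b, 8, 2)` corresponds to an 8-bit adder split into four 2-bit blocks.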

2.5 Carry Select Adder

A carry select adder (CSLA) is divided into sectors, each of which (except for the least significant) performs two additions in parallel, one assuming a carry-in of zero and the other a carry-in of one. A 4-bit CSLA block generally consists of two ripple carry adders and a multiplexer. The carry select adder is simple but rather fast, having a gate-level depth of $O(\sqrt{n})$. Adding two n-bit numbers with a CSLA is done with two ripple carry adders, so that the calculation is performed twice: once assuming a carry-in of zero and once assuming a carry-in of one. After the two results are calculated, the correct sum and the correct carry-out are selected by the multiplexer once the actual carry-in is known. The design schematic of the CSLA is shown in Fig. 2.5. Here, two 4-bit ripple carry adders are multiplexed together, and the resulting carry and sum bits are selected by the carry-in. Since one ripple carry adder assumes a carry-in of 0 and the other assumes a carry-in of 1, selecting the adder whose assumption was correct by means of the actual carry-in yields the desired result. A carry select adder is 40% to 90% faster than an RCA because it performs the additions in parallel and shortens the maximum carry path [3].


Fig. 2.5: Carry select adder
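The select mechanism described above can be sketched behaviourally in Python. This is a minimal model, not the thesis design: the function names and the `block_sizes` parameter are hypothetical. Every block after the first computes its result twice, once for each assumed carry-in, and the incoming carry then picks one of the two results, which plays the role of the 2×1 multiplexer.

```python
def ripple_block(a, b, width, cin):
    """Ripple carry addition of two width-bit values; returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_select_add(a, b, block_sizes, cin=0):
    """Carry select addition; block_sizes lists block widths from the LSB upward."""
    total, carry, offset = 0, cin, 0
    for k, width in enumerate(block_sizes):
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> offset) & mask, (b >> offset) & mask
        if k == 0:
            # The carry-in of the first block is known, so one adder suffices.
            s, carry = ripple_block(a_blk, b_blk, width, carry)
        else:
            s0, c0 = ripple_block(a_blk, b_blk, width, 0)  # assumes carry-in 0
            s1, c1 = ripple_block(a_blk, b_blk, width, 1)  # assumes carry-in 1
            s, carry = (s1, c1) if carry else (s0, c0)      # multiplexer
        total |= s << offset
        offset += width
    return total, carry
```

For instance, `carry_select_add(a, b, [4, 4, 4, 4])` models a 16-bit adder with uniform 4-bit blocks.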

CSLAs are divided into two types: uniform sized adders and variable sized adders. If the bit length is divided equally, the adder is called a uniform sized adder, also known as a linear CSLA. In variable sized adders the bit length is divided unequally.

2.5.1 Uniform sized CSLA

A 16-bit carry select adder with a uniform block size of 4 can be created with three of these carry select blocks and a 4-bit ripple carry adder. Since the carry-in is known at the beginning of the computation, a carry select block is not needed for the first four bits. The delay of this adder is four full adder delays plus three MUX delays. The uniform sized 16-bit carry select adder is shown in Fig. 2.6.

Fig. 2.6: Uniform sized 16-bit carry select adder


2.5.2 Variable sized CSLA

A 16-bit carry select adder with variable block sizes can be created similarly. Here we show an adder with block sizes of 2-2-3-4-5. This break-up is ideal when the full adder delay is equal to the MUX delay, which is unlikely in practice. The total delay is two full adder delays plus four MUX delays. The block sizes are chosen so that the delay through the two carry chains of a block matches the arrival time of the carry from the previous stage. The variable sized 16-bit carry select adder is shown in Fig. 2.7.
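The two delay figures quoted for the uniform and variable sized adders can be reproduced with a simple timing model. The sketch below is an illustration under stated assumptions: every full adder contributes a delay `t_fa`, every multiplexer a delay `t_mux`, the first block is a plain ripple carry adder, and the function name `csla_delay` is hypothetical.

```python
def csla_delay(block_sizes, t_fa=1.0, t_mux=1.0):
    """Worst-case delay of a carry select adder whose first block is a plain RCA.

    block_sizes lists the block widths from least to most significant.
    """
    # The first block ripples from the real carry-in.
    carry_ready = block_sizes[0] * t_fa
    for size in block_sizes[1:]:
        # Both speculative chains of a block start at t = 0 and run in parallel.
        chains_ready = size * t_fa
        # The block output is valid one MUX delay after both the chains
        # and the incoming carry are available.
        carry_ready = max(carry_ready, chains_ready) + t_mux
    return carry_ready


uniform = csla_delay([4, 4, 4, 4])        # four FA delays + three MUX delays
variable = csla_delay([2, 2, 3, 4, 5])    # two FA delays + four MUX delays
unbalanced = csla_delay([2, 2, 3, 4, 5], t_fa=2.0, t_mux=1.0)
```

With equal unit delays the model gives 7 units for the uniform adder and 6 units for the 2-2-3-4-5 adder. When the full adder is assumed twice as slow as the MUX, the same 2-2-3-4-5 partition is no longer balanced, which is why the sizing is described as ideal only for equal delays.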

Fig. 2.7: Variable sized 16-bit carry select adder


Chapter 3

Semi Custom Design

Design of logic networks with the highest performance requires deliberate design of the logic networks and of the transistor circuits, the most compact layout of these transistor circuits, and their manufacture. Such logic networks are realized by full-custom design. In contrast to full-custom design, semi-custom design simplifies the design and layout of transistor circuits to save expense and design time. Semi-custom design takes different forms depending on how the design and layout of transistor circuits are simplified (e.g., repetition of small transistor sub-circuits or a less compact layout) and even on how the logic design is simplified. In semi-custom design the designer has little control over the specification and functionality of the individual cells, but the required design time is shorter. It uses pre-designed, pre-tested and pre-characterized logic cells (AND gates, OR gates, multiplexers) known as standard cells. In this chapter the semi-custom design of the 16-bit CSLA is discussed. Verilog code is generated from schematic circuits designed in DSCH software, and the layout is then constructed from this Verilog code in Microwind software.

3.1 Conventional 2×1 Multiplexer

Multiplexing is the generic term used to describe the operation of sending one or more

analogue or digital signals over a common transmission line at different times or speeds.

The multiplexer, shortened to “MUX” is a combinational logic circuit designed to switch one

of several input lines through to a single common output line by the application of a control

signal.

The Boolean expression of a 2×1 MUX is

Z = S̄·A + S·B ……………….…………………. (3.1)

where A and B are the data inputs, S is the select signal and Z is the output.


The schematic circuit, extracted layout from the Verilog code and the input-output wave

shapes of the conventional MUX are given in Fig. 3.1, Fig. 3.2 and Fig. 3.3 respectively.

Fig. 3.1: Schematic circuit of 2×1 Multiplexer

Fig. 3.2: Layout of Conventional 2×1 Multiplexer

Fig. 3.3: Input-Output wave shapes of conventional 2×1 MUX

The schematic of Fig. 3.1 is converted to Verilog code using DSCH 3.1 software. The Verilog code is then compiled in Microwind 3.1 software to generate the layout in a 90 nm CMOS process. In this way the schematic of the logic design is converted into a physical layout. Using this physical layout, parameters such as area, delay, power dissipation, AT, AT², resistance, capacitance, node voltage and current can be estimated. The parameters of the conventional 2×1 MUX are given in Table 3.1.

Table 3.1: Area, Delay and Power Dissipation of the Conventional 2×1 MUX

Parameter            Value        Parameter      Value
Area                 47.7 µm²     IDD (Max)      0.690 mA
Delay                15 ps        No. of NMOS    10
Power Dissipation    0.793 µW     No. of PMOS    10

3.2 Pass Transistor Based 2×1 Multiplexer

The 2×1 MUX of Fig. 3.1 can also be constructed using the pass transistor concept, which minimizes the number of transistors: only 6 transistors are needed in place of the 20 used in conventional logic. The schematic circuit, constructed layout and input-output wave shapes of the pass transistor based 2×1 MUX are given in Fig. 3.4, Fig. 3.5 and Fig. 3.6 respectively.

Fig. 3.4: Pass transistor based 2×1 MUX

Fig. 3.5: Layout of pass transistor based 2×1 MUX


Fig. 3.6: Input-Output wave shapes of pass transistor based 2×1 MUX

The parameters of the pass transistor based 2×1 MUX are given in Table 3.2. Compared to the conventional 2×1 MUX, the pass transistor based 2×1 MUX occupies more area (88.7 µm² against 47.7 µm²) and has a slightly higher delay (22 ps against 15 ps); however, the number of transistors (6 against 20) and the power dissipation (0.311 µW against 0.793 µW) are drastically reduced.

Table 3.2: Area, Delay and Power Dissipation of the pass transistor based 2×1 MUX


Parameter            Value        Parameter      Value
Area                 88.7 µm²     IDD (Max)      1.189 mA
Delay                22 ps        No. of NMOS    03
Power Dissipation    0.311 µW     No. of PMOS    03

3.3 Conventional Full Adder

The conventional 1-bit Full Adder consists of two XOR gates, two AND gates and an OR

gate as shown in Fig. 3.7. The constructed layout in Microwind and the input-output wave

shapes are shown in Fig. 3.8 and Fig. 3.9 respectively.

Fig. 3.7: Conventional 1-bit full adder

Fig. 3.8: Layout of the conventional 1-bit full adder


Fig. 3.9: Input-Output wave shapes of conventional 1-bit full adder

Different parameters of the conventional 1-bit Full Adder are given in Table 3.3.

Table 3.3: Area, Delay and Power Dissipation of conventional 1-bit full adder


Parameter            Value        Parameter      Value
Area                 72.5 µm²     No. of NMOS    15
Delay                66 ps        No. of PMOS    15
Power Dissipation    39.543 µW    IDD (Max)      0.781 mA
No. of Transistor    30           IDD (Avg)      0.040 mA

3.4 XOR Based Full Adder

A 1-bit full adder realized with two XOR gates and one 2×1 MUX is shown in Fig. 3.10. The main difference between the conventional and the XOR based adder lies in the carry logic. Besides the two XOR gates, the XOR based adder uses only one 2×1 MUX, which needs only 6 MOSFETs, whereas the conventional adder needs two AND gates and one OR gate, which require at least 18 MOSFETs. The XOR based 1-bit adder therefore saves at least 12 MOSFETs compared with the conventional one, which in turn is expected to reduce area and power consumption while also giving better delay performance. The layout and the input-output wave shapes of the XOR based 1-bit full adder are shown in Fig. 3.11 and Fig. 3.12 respectively.
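That a full adder built from two XOR gates and one 2×1 MUX computes the same function as the conventional gate-level adder can be checked exhaustively. The sketch below is a behavioural illustration only. The MUX wiring is an assumption, since the schematic of Fig. 3.10 is not reproduced here: the first XOR output selects the carry-in when the operands differ and the operand `a` otherwise. All function names are hypothetical.

```python
from itertools import product


def mux2(select, in0, in1):
    """2x1 multiplexer: returns in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def conventional_full_adder(a, b, cin):
    """Two XOR gates, two AND gates and one OR gate."""
    p = a ^ b
    return p ^ cin, (a & b) | (cin & p)


def xor_mux_full_adder(a, b, cin):
    """Two XOR gates and one 2x1 MUX (assumed wiring, see the text above)."""
    p = a ^ b
    # If a and b differ, the carry-out equals cin; if they are equal,
    # the carry-out equals a (which is then also equal to b).
    return p ^ cin, mux2(p, a, cin)


# Compare both realizations over all eight input combinations.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if conventional_full_adder(*bits) != xor_mux_full_adder(*bits)
]
```

An empty `mismatches` list means the two realizations agree on every input, so replacing the AND-OR carry logic by a multiplexer does not change the logical behaviour.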

Fig. 3.10: XOR based 1-bit full adder

Fig. 3.11: Layout of XOR based 1-bit full adder


Fig. 3.12: Input-Output wave shapes of XOR based 1-bit full adder

Different parameters of the XOR based 1-bit Full Adder are given in Table 3.4.

Table 3.4: Area, Delay and Power Dissipation of XOR based 1-bit full adder




Parameter            Value        Parameter      Value
Area                 132.0 µm²    No. of NMOS    09
Delay                61 ps        No. of PMOS    09
Power Dissipation    56.959 µW    IDD (Max)      0.732 mA
No. of Transistor    18           IDD (Avg)      0.050 mA

3.5 Carry Select Adder

The CSLA is constructed from two RCAs and a multiplexer. To add two n-bit numbers, one RCA performs the addition with the input carry taken as zero while the other performs it with the input carry taken as one. Once both results are available, the multiplexer connected at the output selects the correct sum and the correct carry-out according to the actual carry-in.

Fig. 3.13: XOR based 4-bit CSLA

3.5.1 XOR Based 4-bit CSLA

The 4-bit CSLA consists of two 4-bit RCAs. In one RCA the Cin bit is taken as zero and in the other the Cin bit is taken as one. When the addition is completed, the MUX takes the correct sum and Cout from one of the RCAs depending on the actual Cin. The schematic circuit, semi custom layout and input-output wave shapes of the 4-bit CSLA are shown in Fig. 3.13, Fig. 3.14 and Fig. 3.15 respectively, and parameters such as area, delay and power dissipation of the XOR based 4-bit CSLA are given in Table 3.5.


Fig. 3.14: Layout of the XOR based 4-bit CSLA

Fig. 3.15: Input-Output wave shapes of XOR based 4-bit CSLA


Table 3.5: Area, Delay and Power Dissipation of XOR based 4-bit CSLA




Parameter            Value        Parameter      Value
Area                 2177.3 µm²   No. of NMOS    87
Delay                91 ps        No. of PMOS    87
Power Dissipation    0.291 mW     IDD (Max)      3.090 mA
No. of Transistor    174          IDD (Avg)      0.240 mA


3.5.2 XOR Based 8-bit CSLA

The 8-bit CSLA consists of three 4-bit RCAs and five 2×1 MUXes. As shown in Fig. 3.16, the 8-bit CSLA is divided into two groups: a 4-bit RCA is used in the first group and a 4-bit CSLA in the second. The semi custom layout and the input-output wave shapes of the XOR based 8-bit CSLA are shown in Fig. 3.17 and Fig. 3.18 respectively, and parameters such as area, delay and power dissipation of the 8-bit CSLA are given in Table 3.6.



Fig. 3.17: Layout of XOR based 8-bit CSLA





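Table 3.6: Area, Delay and Power Dissipation of XOR based 8-bit CSLA

Parameter            Value        Parameter      Value
Area                 2984 µm²     No. of NMOS    123
Delay                136 ps       No. of PMOS    123
Power Dissipation    0.438 mW     IDD (Max)      2.968 mA
No. of Transistor    246          IDD (Avg)      0.360 mA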



3.5.3 XOR Based 16-bit CSLA

The 16-bit CSLA consists of seven 4-bit RCAs and fifteen 2×1 MUXes. As shown in Fig. 3.19, the 16-bit CSLA has one 4-bit RCA and three 4-bit CSLAs. The semi custom layout and the input-output wave shapes of the XOR based 16-bit CSLA are shown in Fig. 3.20 and Fig. 3.21 respectively, and parameters such as area, delay and power dissipation of the 16-bit CSLA are given in Table 3.7.



Fig. 3.20: Layout of XOR based 16-bit CSLA







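Table 3.7: Area, Delay and Power Dissipation of XOR based 16-bit CSLA

Parameter            Value        Parameter      Value
Area                 9732.4 µm²   No. of NMOS    297
Delay                136 ps       No. of PMOS    297
Power Dissipation    1.029 mW     IDD (Max)      7.044 mA
No. of Transistor    594          IDD (Avg)      0.994 mA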


[Diagram: classification of ASICs into Full-Custom ASICs, Semi-Custom ASICs (Standard-Cell Based ASICs, Gate-Array Based ASICs) and Programmable ASICs (PLDs, FPGA)]

Chapter 4

Full Custom Design

Full-custom design is a methodology for designing integrated circuits by specifying the

layout of each individual transistor and the interconnections between them. Full-custom

design is limited to ICs that are to be fabricated in extremely high volumes, notably certain

microprocessors and a small number of ASICs. The VLSI design flow is given in Fig. 4.1.

Fig. 4.1: VLSI design flow

In full custom design, the entire mask design is done anew without the use of any library. Full-custom design is logic design that aims at the highest performance or the smallest size, utilizing the most advanced technology. Designers usually try to improve the economic aspect, that is, performance per cost, at the same time. Every design stage is carried out carefully for maximum performance, and the transistor circuits are deliberately laid out on the chip as compactly as possible, which can take many draftspeople and engineers several months. In this chapter, the full custom design of the 16-bit CSLA is presented, starting from the design of an inverter.


4.1 Full Custom Design of Inverter or NOT Gate

A NOT gate, often called an inverter, is a convenient digital logic gate to start with because it has only a single input and simple behavior. A NOT gate performs logical negation on its input: if the input is true, the output is false, and vice versa. The NOT gate is designed and simulated using Microwind 3.1 software. The designed layout and simulation results of the NOT gate are shown in Fig. 4.2 and Fig. 4.3 respectively. The technology library used in this work is a 6-metal 90 nm CMOS technology. The simulated parameters of the NOT gate are given in Table 4.1.

Fig. 4.2: Full custom layout of NOT gate

Fig. 4.3: Input-Output wave shapes of NOT gate


Table 4.1: Area, Delay and Power Dissipation of NOT gate



Delay 1ps No. of PMOS 01



4.2 Full Custom Design of 2-input AND Gate

The 2-input AND gate is logically represented as shown in Fig. 4.4 with two inputs and one

output.

Fig. 4.4: 2-input AND gate symbol

Fig. 4.5: Full custom layout of 2-input AND gate


The 2-input AND gate is designed and simulated using Microwind 3.1 software. The designed layout and simulation results of the 2-input AND gate are shown in Fig. 4.5 and Fig. 4.6 respectively. The simulated parameters of the 2-input AND gate are given in Table 4.2.

Fig. 4.6: Input-Output wave shapes of 2-input AND gate

Table 4.2: Area, Delay and Power Dissipation of 2-input AND gate







4.3 Full Custom Design of 2-input XOR Gate

An XOR gate or exclusive OR gate is a digital logic gate with two or more inputs and one output that performs exclusive disjunction. The output of a 2-input XOR gate is true only when exactly one of its inputs is true.

The Boolean expression of the XOR gate is A ⊕ B = A·B̄ + Ā·B. The pass transistor based schematic circuit, full custom layout and input-output wave shapes of the 2-input XOR gate are shown in Fig. 4.7, Fig. 4.8 and Fig. 4.9 respectively.

Fig. 4.7: Schematic design of 2-input XOR gate

Fig. 4.8: Full custom layout of 2-input XOR gate


Fig. 4.9: Input-Output wave shapes of 2-input XOR gate

Table 4.3 shows the different simulated parameters of 2-input XOR gate.

Table 4.3: Area, Delay and Power Dissipation of 2-input XOR gate







4.4 Full Custom Design of 2×1 MUX

A two-to-one multiplexer (or 2×1 MUX) is a common digital circuit that routes one of two input signals to a single output. The pass transistor based schematic circuit of a 2×1 MUX is shown in Fig. 4.10. Here, A and B are the two inputs, S is the select signal and Z is the output.

Fig. 4.10: Schematic circuit of 2×1 MUX

Fig. 4.11: Full custom layout of 2×1 MUX


The full custom layout and input-output wave shapes of the 2×1 MUX are shown in Fig. 4.11 and Fig. 4.12 respectively. Table 4.4 shows the simulated parameters of the 2×1 MUX.

Fig. 4.12: Input-Output wave shapes of the 2×1 MUX

Table 4.4: Area, Delay and Power Dissipation of the 2×1 MUX







4.5 Full Custom Design of XOR Based Full Adder

The schematic circuit of a 1-bit full adder realization employing two XOR gates and one 2×1

MUX is shown in Fig. 4.13. Full custom layout and the input-output wave shapes of the XOR

based full adder are shown in Fig. 4.14 and Fig. 4.15 respectively.

Fig. 4.13: XOR based 1-bit full adder

Fig. 4.14: Full Custom Layout of an XOR Based 1-bit Full Adder

Fig. 4.15: Input-Output wave shapes of XOR based 1-bit full adder

The simulated parameters of the XOR based 1-bit full adder are given in Table 4.5.

Table 4.5: Area, Delay and Power Dissipation of the 1-bit full adder






4.6 Full Custom Design of XOR Based 4-bit CSLA

The full custom layout of the 4-bit CSLA is shown in Fig. 4.16.

Fig. 4.16: Full custom layout of XOR based 4-bit CSLA


The input-output wave shapes of the 4-bit CSLA are shown in Fig. 4.17 and the simulated parameters are given in Table 4.6.


Table 4.6: Area, Delay and Power Dissipation of the XOR based 4-bit CSLA




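Parameter            Value        Parameter      Value
Area                 345.8 µm²    No. of NMOS    84
Delay                14 ps        No. of PMOS    84
Power Dissipation    196 µW       IDD (Max)      1.298 mA
No. of Transistor    168          IDD (Avg)      0.202 mA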




4.7 Full Custom Design of XOR Based 8-bit CSLA

The 8-bit CSLA consists of three 4-bit RCAs. The full custom layout of the 8-bit CSLA is shown in Fig. 4.18. The input-output wave shapes of the 8-bit CSLA are shown in Fig. 4.19 and the simulated parameters are given in Table 4.7.











4.8 Full Custom Design of XOR Based 16-bit CSLA

The full custom XOR based 16-bit CSLA is constructed with thirty XOR based full adders and eighteen 2×1 MUXes. The layout and the input-output wave shapes of the XOR based 16-bit CSLA are shown in Fig. 4.20 and Fig. 4.21 respectively. The simulated parameters are given in Table 4.8.











Chapter 5

Performance Analysis

In this chapter the performance of the designed CSLAs, implemented in Microwind 3.1 using a 90 nm CMOS process, is analyzed. Power dissipation (the power consumed by a device during the execution of its logical operation, which is dissipated as heat), delay and power delay product (PDP) are measured for the different designs. All simulations are done for a supply voltage of VDD = 1.2 V and clock frequencies of 10 MHz to 500 MHz.
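The power delay product follows directly from the reported power and delay values. The sketch below is illustrative: it uses the 16-bit power (mW) and delay (ps) figures reported in the comparison tables of this chapter, and the function name `power_delay_product_fj` and the dictionary layout are hypothetical.

```python
def power_delay_product_fj(power_mw, delay_ps):
    """PDP in femtojoules: 1 mW x 1 ps = 1e-3 W x 1e-12 s = 1e-15 J."""
    return power_mw * delay_ps


# (power in mW, delay in ps) of the three 16-bit CSLA designs
designs = {
    "conventional": (2.803, 140),
    "semi custom": (1.029, 136),
    "full custom": (0.823, 61),
}

pdp = {name: power_delay_product_fj(p, d) for name, (p, d) in designs.items()}
```

Under these figures the PDP comes out at roughly 392 fJ for the conventional design, 140 fJ for the semi custom design and 50 fJ for the full custom design.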

5.1 Comparison of Different 2×1 MUX

Table 5.1 shows the performance analysis of the conventional, semi custom and full custom 2×1 MUX. Graphical representations of area, delay and power are given in Fig. 5.1.

Table 5.1: Performance Analysis of Conventional, Semi and Full Custom 2×1 MUX

Parameter            Conv.    Semi Custom   Full Custom   % reduction of full custom   % reduction of full custom
                                                          compared to conventional     compared to semi custom
Area (µm²)           47.7     88.7          7.1           85.12%                       92.00%
Power (µW)           0.793    0.311         0.047         94.07%                       84.89%
Delay (ps)           15       22            22            -46.67%                      0.00%
Gate Count (NMOS)    10       03            03            70.00%                       0.00%
Gate Count (PMOS)    10       03            03            70.00%                       0.00%
Total Gate Count     20       06            06            70.00%                       0.00%
IDD (Max) (mA)       0.690    1.189         0.125         81.88%                       89.49%
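The percentage columns of Table 5.1 are relative reductions with respect to the reference design. A minimal sketch of the calculation is given below; the function name `percent_reduction` is hypothetical and the input values are copied from the table.

```python
def percent_reduction(reference, value):
    """Reduction of `value` relative to `reference`, in percent.

    A negative result means that `value` is larger than `reference`.
    """
    return 100.0 * (reference - value) / reference


area_vs_conventional = round(percent_reduction(47.7, 7.1), 2)
area_vs_semi_custom = round(percent_reduction(88.7, 7.1), 2)
power_vs_conventional = round(percent_reduction(0.793, 0.047), 2)
delay_vs_conventional = round(percent_reduction(15, 22), 2)
```

The negative delay figure expresses that the full custom pass transistor MUX (22 ps) is slower than the conventional MUX (15 ps), while area and power are reduced.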


Fig. 5.1: Area, Power and Delay comparison of 2×1 MUX

[Bar charts: Area (µm²), Power (µW) and Delay (ps) against types of design (Conventional, Semi Custom, Full Custom)]

5.2 Comparison of Different Full Adder

Performance analysis of conventional, semi custom and full custom 1-bit full adder is given in

Table 5.2 and the graphical representation of area, delay and power are given in Fig. 5.2.

Table 5.2: Performance analysis of Conventional, Semi and Full Custom 1-bit Full Adder

Parameter            Conv.    Semi Custom   Full Custom   % reduction of full custom   % reduction of full custom
                                                          compared to conventional     compared to semi custom
Area (µm²)           72.5     132.0         28.7          60.4%                        78.3%
Power (µW)           39.543   56.959        28.328        28.4%                        50.3%
Delay (ps)           66       61            30            54.5%                        50.8%
Gate Count (NMOS)    15       09            09            40.0%                        0%
Gate Count (PMOS)    15       09            09            40.0%                        0%
Total Gate Count     30       18            18            40.0%                        0%
IDD (Max) (mA)       0.781    0.732         0.223         71.5%                        69.6%
IDD (Avg) (mA)       0.040    0.050         0.029         27.5%                        42.0%

Fig. 5.2: Area, Power and Delay comparison of 1-bit Full Adder

[Bar charts: (a) Area (µm²), (b) Power (µW) and (c) Delay (ps) against types of design]

5.3 Comparison of 4-bit CSLA

Performance analysis of conventional, semi custom and full custom 4-bit CSLA is given in Table 5.3 and the graphical representation of area, delay and power is given in Fig. 5.3.


Table 5.3: Performance analysis of Conventional, Semi and Full Custom 4-bit CSLA

Parameter            Conv.    Semi Custom   Full Custom   % reduction of full custom   % reduction of full custom
                                                          compared to conventional     compared to semi custom
Area (µm²)           2490     2177.3        345.8         86.5%                        84.2%
Power (mW)           0.299    0.291         0.196         34.5%                        32.7%
Delay (ps)           96       91            14            85.5%                        84.6%
Gate Count (NMOS)    170      87            84            50.6%                        3.45%
Gate Count (PMOS)    170      87            84            50.6%                        3.45%
Total Gate Count     340      174           168           50.6%                        3.45%
IDD (Max) (mA)       2.941    3.090         1.298         55.8%                        58.0%
IDD (Avg) (mA)       0.249    0.240         0.202         18.8%                        15.8%

Fig. 5.3: Area, Power and Delay comparison of 4-bit CSLA

[Bar charts: (a) Area (µm²), (b) Power (mW) and (c) Delay (ps) against types of design]

For the XOR based 4-bit CSLA, the full custom design has 86.5% area, 34.5% power and 85.5% delay reduction over the conventional design, whereas it has 84.2% area, 32.7% power and 84.6% delay reduction over the semi custom design.






5.4 Comparison of 8-bit CSLA

Table 5.4: Performance analysis of Conventional, Semi and Full Custom 8-bit CSLA

Parameter            Conv.    Semi Custom   Full Custom   % reduction of full custom   % reduction of full custom
                                                          compared to conventional     compared to semi custom
Area (µm²)           3013.5   2984          479.5         84.1%                        84.0%
Power (mW)           0.936    0.438         0.289         69.2%                        34.02%
Delay (ps)           138      136           46            66.7%                        66.2%
Gate Count (NMOS)    195      123           117           40.00%                       4.88%
Gate Count (PMOS)    195      123           117           40.00%                       4.88%
Total Gate Count     390      246           234           40.00%                       4.88%
IDD (Max) (mA)       2.231    2.968         1.058         52.6%                        64.4%
IDD (Avg) (mA)       0.550    0.360         0.304         44.7%                        15.6%

Fig. 5.4: Area, Power and Delay comparison of 8-bit CSLA

[Bar charts: (a) Area (µm²), (b) Power (mW) and (c) Delay (ps) against types of design]





5.5 Comparison of 16-bit CSLA

Table 5.5: Performance analysis of Conventional, Semi and Full Custom 16-bit CSLA

Parameter            Conv.    Semi Custom   Full Custom   % reduction of full custom   % reduction of full custom
                                                          compared to conventional     compared to semi custom
Area (µm²)           8046.8   9732.4        1263.5        84.3%                        87.0%
Power (mW)           2.803    1.029         0.823         70.6%                        20.1%
Delay (ps)           140      136           61            56.5%                        55.2%
Gate Count (NMOS)    465      297           285           38.71%                       4.04%
Gate Count (PMOS)    465      297           285           38.71%                       4.04%
Total Gate Count     930      594           570           38.71%                       4.04%
IDD (Max) (mA)       6.626    7.044         3.88          41.5%                        45.0%
IDD (Avg) (mA)       2.336    0.994         0.966         58.7%                        2.8%

Fig. 5.5: Area, Power and Delay comparison of 16-bit CSLA

[Bar charts: (a) Area (µm²), (b) Power (mW) and (c) Delay (ps) against types of design]

5.6 Comparison with Other Work

The semi custom and full custom 8-bit and 16-bit CSLAs of this work are compared with other published work in Table 5.6.

Table 5.6: Performance comparison with other work

Bit size   Parameter     Ref. [18]   Ref. [19]   Ref. [3]   This work       This work
                                                            (Semi Custom)   (Full Custom)
8-Bit      Power (mW)    13.598      0.659       1.109      0.438           0.289
8-Bit      Delay (ns)    2.094       2.79        2.75       0.136           0.046
8-Bit      Area (µm²)    952.343     1035        6201       2984            479.5
16-Bit     Power (mW)    29.311      1.316       -          1.029           0.823
16-Bit     Delay (ns)    2.450       3.79        -          0.136           0.061
16-Bit     Area (µm²)    1901.09     2325        -          9732.4          1263.5
Technology               180 nm      180 nm      120 nm     90 nm           90 nm

Chapter 6

FPGA Implementation of CSLA

6.1 Introduction to FPGA

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured

by a customer or a designer after manufacturing – hence the term “field-programmable”. The

FPGA configuration is generally specified using a hardware description language (HDL),

similar to that used for an application-specific integrated circuit (ASIC). Circuit

diagrams were previously used to specify the configuration, but this is increasingly rare due

to the advent of electronic design automation tools.

Fig. 6.1: FPGA Architecture

FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable

interconnects that allow the blocks to be “wired together”, like many logic gates that can be

inter-wired in different configurations as shown in Fig. 6.1. Logic blocks can be configured

to perform complex combinational functions or merely simple logic gates like AND and

XOR. In most FPGAs, logic blocks also include memory elements, which may be


simple flip-flops or more complete blocks of memory. Many FPGAs can be reprogrammed to implement different logic functions, allowing flexible reconfigurable computing as performed in computer software. Here, we have used the Altera DE2-115 FPGA board to implement the CSLA in hardware.

6.2 Altera DE2-115 FPGA Board

The Altera DE2-115 has become one of the most widely used FPGA development boards for FPGA design and implementation [16]. The purpose of the Altera DE2-115 Development and Education board is to provide the ideal vehicle for learning about digital logic, computer organization and FPGAs. It uses state-of-the-art technology in

hardware tools to expose students and professionals to a wide range of topics [17]. The board

offers a rich set of features that make it suitable for use in a laboratory environment for

university and college courses, for a variety of design projects, as well as for the development

of sophisticated digital systems. The Altera DE2-115 FPGA Board is shown in Fig. 6.2.

Fig. 6.2: Altera DE2-115 FPGA Board

Fig. 6.3: Block diagram of the Altera DE2-115 FPGA Board

6.3 FPGA Implementation of Conventional 1-bit Full Adder

To implement the conventional 1-bit full adder a Verilog code has been written as shown in

Fig. 6.4.

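// Conventional 1-bit full adder built from gate primitives
module fulladder (a, b, cin, sum, cout);
  input a, b, cin;
  output sum, cout;
  wire [2:0] w;

  xor xor1(w[0], a, b);        // w[0] = a ^ b
  xor xor2(sum, w[0], cin);    // sum  = a ^ b ^ cin
  and and1(w[1], a, b);        // w[1] = a & b
  and and2(w[2], cin, w[0]);   // w[2] = cin & (a ^ b)
  or  or1(cout, w[1], w[2]);   // cout = w[1] | w[2]
endmodule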

Fig. 6.4: Verilog code of 1-bit full adder


This code has then been simulated in Modelsim software, and the simulation result is shown in Fig. 6.5. Once the simulation results were verified, the synthesis and power analysis were done in Quartus II software. The synthesis and power analysis results are shown in Fig. 6.6 and Fig. 6.7 respectively. The implementation of the 1-bit full adder in FPGA is shown in Fig. 6.8.

Fig. 6.5: Simulation waveform results of conventional 1-bit full adder

Fig. 6.6: Synthesis summary of the conventional 1-bit full adder


Fig. 6.7: Power analysis of the conventional 1-bit full adder

Fig. 6.8: Implementation of the conventional 1-bit full adder in FPGA


6.4 FPGA Implementation of 4-bit CSLA

The Verilog code for the 4-bit CSLA is given in Fig. 6.9. This code has been simulated in

Modelsim software. The simulation result is shown in Fig. 6.10 for binary and in Fig. 6.11 for

decimal. The synthesis and power analysis done in Quartus II software are shown in Fig. 6.12

and Fig. 6.13 respectively. The implementation of the 4-bit CSLA in FPGA is shown in

Fig. 6.14.

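// Four full adders chained through the internal carries w[2:0]
module fourbitcsla (a, b, cin, sum, cout);
  input [3:0] a, b;
  input cin;
  output [3:0] sum;
  output cout;
  wire [2:0] w;

  fulladder fulladd1(a[0], b[0], cin, sum[0], w[0]);
  fulladder fulladd2(a[1], b[1], w[0], sum[1], w[1]);
  fulladder fulladd3(a[2], b[2], w[1], sum[2], w[2]);
  fulladder fulladd4(a[3], b[3], w[2], sum[3], cout);
endmodule

// 1-bit full adder of Fig. 6.4
module fulladder (a, b, cin, sum, cout);
  input a, b, cin;
  output sum, cout;
  wire [2:0] w;

  xor xor1(w[0], a, b);
  xor xor2(sum, w[0], cin);
  and and1(w[1], a, b);
  and and2(w[2], cin, w[0]);
  or  or1(cout, w[1], w[2]);
endmodule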

Fig. 6.9: Verilog code of 4-bit CSLA


Fig. 6.10: Simulation waveform results of 4-bit CSLA (binary)

Fig. 6.11: Simulation waveform results of 4-bit CSLA (decimal)


Fig. 6.12: Synthesis summary of 4-bit CSLA

Fig. 6.13: Power analysis of 4-bit CSLA


Fig. 6.14: FPGA implementation of 4-bit CSLA

B3 B2 B1 B0 A3 A2 A1 A0

1 1 1 1 1 0 1 0

Cout S3 S2 S1 S0

1 1 0 0 1


6.5 FPGA Implementation of 8-bit CSLA

The Verilog code for the 8-bit CSLA is given in Fig. 6.15. This Verilog code is then simulated in Modelsim software and the simulation result is shown in Fig. 6.16 for binary and in Fig. 6.17 for decimal. The synthesis and power analysis are done in Quartus II software and are shown in Fig. 6.18 and Fig. 6.19 respectively. The implementation of the 8-bit CSLA in FPGA is shown in Fig. 6.20.

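module eightbitcsla(a, b, cin, c0, c1, sum, cout);
  input [7:0] a, b;
  input cin, c0, c1;
  output [7:0] sum;
  output cout;
  wire [19:0] w;

  // Lower four bits: ripple carry chain driven by cin, carry-out on w[3]
  fulladder fulladd1(a[0], b[0], cin, sum[0], w[0]);
  fulladder fulladd2(a[1], b[1], w[0], sum[1], w[1]);
  fulladder fulladd3(a[2], b[2], w[1], sum[2], w[2]);
  fulladder fulladd4(a[3], b[3], w[2], sum[3], w[3]);

  // Upper four bits: one chain seeded by c0, one chain seeded by c1
  fulladder fulladd5(a[4], b[4], c0, w[4], w[5]);
  fulladder fulladd6(a[5], b[5], w[5], w[6], w[7]);
  fulladder fulladd7(a[6], b[6], w[7], w[8], w[9]);
  fulladder fulladd8(a[7], b[7], w[9], w[10], w[11]);
  fulladder fulladd9(a[4], b[4], c1, w[12], w[13]);
  fulladder fulladd10(a[5], b[5], w[13], w[14], w[15]);
  fulladder fulladd11(a[6], b[6], w[15], w[16], w[17]);
  fulladder fulladd12(a[7], b[7], w[17], w[18], w[19]);

  // w[3] selects between the chains; mux passes its B input (c0 chain) when S = 1
  mux mux1(sum[4], w[3], w[4], w[12]);
  mux mux2(sum[5], w[3], w[6], w[14]);
  mux mux3(sum[6], w[3], w[8], w[16]);
  mux mux4(sum[7], w[3], w[10], w[18]);
  mux mux5(cout, w[3], w[11], w[19]);
endmodule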


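module fulladder(a, b, cin, sum, cout1);
  input a, b, cin;
  output sum, cout1;
  wire [2:0] w;

  xor xor1(w[0], a, b);
  xor xor2(sum, w[0], cin);
  and and1(w[1], a, b);
  and and2(w[2], cin, w[0]);
  or  or1(cout1, w[1], w[2]);
endmodule

// 2-to-1 multiplexer: out = B when S = 1, out = A when S = 0
module mux(out, S, B, A);
  input B, A, S;
  output out;
  wire [2:0] w;

  not inv1(w[0], S);
  and and3(w[1], w[0], A);
  and and4(w[2], B, S);
  or  or2(out, w[1], w[2]);
endmodule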

Fig. 6.15: Verilog code of 8-bit CSLA
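The structure of the listing in Fig. 6.15 can be cross-checked with a short behavioural model. The sketch below is written in Python purely for illustration; the function names are hypothetical and mirror the Verilog module names. It reproduces the multiplexer exactly as listed (out = B when S = 1, out = A when S = 0). With that polarity the chain seeded by c0 is selected when the carry out of the lower four bits is 1, so the model adds correctly when c0 is driven to 1 and c1 to 0. This tie-off is read from the listing and is not stated in the text.

```python
def full_adder(a, b, cin):
    """Gate-level full adder: returns (sum, carry-out)."""
    p = a ^ b
    return p ^ cin, (a & b) | (cin & p)


def mux(s, b, a):
    """Mirrors the listed mux module: out = (~S & A) | (B & S)."""
    return b if s else a


def ripple4(a, b, cin, base):
    """Four chained full adders on bits base..base+3 of a and b."""
    sums, carry = [], cin
    for i in range(base, base + 4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        sums.append(s)
    return sums, carry


def eight_bit_csla(a, b, cin, c0, c1):
    """Returns the 9-bit result {cout, sum[7:0]} as an integer."""
    low, carry_low = ripple4(a, b, cin, 0)     # sum[3:0] and w[3]
    hi_c0, cout_c0 = ripple4(a, b, c0, 4)      # chain seeded by c0
    hi_c1, cout_c1 = ripple4(a, b, c1, 4)      # chain seeded by c1
    high = [mux(carry_low, x, y) for x, y in zip(hi_c0, hi_c1)]
    cout = mux(carry_low, cout_c0, cout_c1)
    total = sum(bit << i for i, bit in enumerate(low + high))
    return total | (cout << 8)
```

For instance, `eight_bit_csla(200, 100, 0, 1, 0)` evaluates to 300 under this tie-off, whereas swapping the two constants selects the wrong chain.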


Fig. 6.16: Simulation waveform results of 8-bit CSLA (Binary)

Fig. 6.17: Simulation waveform results of 8-bit CSLA (Decimal)





Fig. 6.20: FPGA implementation result of 8-bit CSLA.

B7 B6 B5 B4 B3 B2 B1 B0

0 0 0 0 0 0 0 0

A7 A6 A5 A4 A3 A2 A1 A0

1 1 1 1 1 1 1 1

Cout S7 S6 S5 S4 S3 S2 S1 S0

0 1 1 1 1 1 1 1 1
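The switch and LED patterns read from the board can be checked arithmetically. The sketch below decodes the bit patterns reported for the 4-bit and 8-bit implementations; it assumes that the patterns are listed most significant bit first and that the carry-in was 0, which the figures do not state. The helper name `bits_to_int` is hypothetical.

```python
def bits_to_int(bit_string):
    """Interpret a space-separated bit pattern, most significant bit first."""
    return int(bit_string.replace(" ", ""), 2)


# 4-bit CSLA readout: B3..B0, A3..A0 and Cout S3..S0
b4 = bits_to_int("1 1 1 1")
a4 = bits_to_int("1 0 1 0")
result4 = bits_to_int("1 1 0 0 1")

# 8-bit CSLA readout: B7..B0, A7..A0 and Cout S7..S0
b8 = bits_to_int("0 0 0 0 0 0 0 0")
a8 = bits_to_int("1 1 1 1 1 1 1 1")
result8 = bits_to_int("0 1 1 1 1 1 1 1 1")
```

Both readouts are consistent with a correct addition: 10 + 15 = 25 for the 4-bit adder and 255 + 0 = 255 for the 8-bit adder.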



6.6 FPGA Implementation of 16-bit CSLA

The Verilog code for the 16-bit CSLA is given in Fig. 6.21. This Verilog code is then simulated in Modelsim software and the simulation result is shown in Fig. 6.22 for binary and in Fig. 6.23 for decimal. The synthesis and power analysis are done in Quartus II software and are shown in Fig. 6.24 and Fig. 6.25 respectively. Owing to the shortage of pins on the FPGA board (the design needs 56 pins, see Table 6.1), the 16-bit CSLA could not be implemented in FPGA.

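module sixteenbitcsla(a, b, cin, c0, c1, sum, cout);
  input [15:0] a, b;
  input cin;
  input [3:1] c0, c1;
  output [15:0] sum;
  output cout;
  wire [3:1] w;   // carries between the 4-bit groups

  fourbitcsla fourbitcsla11(a[3:0], b[3:0], cin, sum[3:0], w[1]);
  // Connections follow the port order of fourbitcslatwo: (a, b, cin, c0, c1, sum, cout)
  fourbitcslatwo fourbitcslatwo1(a[7:4], b[7:4], w[1], c0[1], c1[1], sum[7:4], w[2]);
  fourbitcslatwo fourbitcslatwo2(a[11:8], b[11:8], w[2], c0[2], c1[2], sum[11:8], w[3]);
  fourbitcslatwo fourbitcslatwo3(a[15:12], b[15:12], w[3], c0[3], c1[3], sum[15:12], cout);
endmodule

module fourbitcslatwo(a, b, cin, c0, c1, sum, cout);
  input [7:4] a, b;
  input cin, c0, c1;
  output [7:4] sum;
  output cout;
  wire [19:0] w;

  // Chain seeded by c0
  fulladder fulladd1(a[4],b[4],c0,w[4],w[5]);
  fulladder fulladd2(a[5],b[5],w[5],w[6],w[7]);
  fulladder fulladd3(a[6],b[6],w[7],w[8],w[9]);
  fulladder fulladd4(a[7],b[7],w[9],w[10],w[11]);
  // Chain seeded by c1
  fulladder fulladd5(a[4],b[4],c1,w[12],w[13]);
  fulladder fulladd6(a[5],b[5],w[13],w[14],w[15]);
  fulladder fulladd7(a[6],b[6],w[15],w[16],w[17]);
  fulladder fulladd8(a[7],b[7],w[17],w[18],w[19]);
  // cin selects between the chains; mux passes its B input (c0 chain) when S = 1
  mux mux1(sum[4],cin,w[4],w[12]);
  mux mux2(sum[5],cin,w[6],w[14]);
  mux mux3(sum[6],cin,w[8],w[16]);
  mux mux4(sum[7],cin,w[10],w[18]);
  mux mux5(cout,cin,w[11],w[19]);
endmodule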


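// Four full adders chained through the internal carries w[2:0]
module fourbitcsla (a, b, cin, sum, cout);
  input [3:0] a, b;
  input cin;
  output [3:0] sum;
  output cout;
  wire [2:0] w;

  fulladder fulladd1(a[0], b[0], cin, sum[0], w[0]);
  fulladder fulladd2(a[1], b[1], w[0], sum[1], w[1]);
  fulladder fulladd3(a[2], b[2], w[1], sum[2], w[2]);
  fulladder fulladd4(a[3], b[3], w[2], sum[3], cout);
endmodule

module fulladder (a, b, cin, sum, cout);
  input a, b, cin;
  output sum, cout;
  wire [2:0] w;

  xor xor1(w[0], a, b);
  xor xor2(sum, w[0], cin);
  and and1(w[1], a, b);
  and and2(w[2], cin, w[0]);
  or  or1(cout, w[1], w[2]);
endmodule

// 2-to-1 multiplexer: out = B when S = 1, out = A when S = 0
module mux(out, S, B, A);
  input B, A, S;
  output out;
  wire [2:0] w;

  not inv1(w[0], S);
  and and3(w[1], w[0], A);
  and and4(w[2], B, S);
  or  or2(out, w[1], w[2]);
endmodule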

Fig. 6.21: Verilog code of 16-bit CSLA

Fig. 6.22: Simulation waveform results of 16-bit CSLA (Binary)

Fig. 6.23: Simulation waveform results of 16-bit CSLA (Decimal)





A summary of all the results obtained from the synthesis and power analysis using Quartus II software is given in Table 6.1.

Table 6.1: Summary of all the results obtained from the synthesis and power analysis using Quartus II software

Parameter                                     Full Adder   4-bit CSLA   8-bit CSLA   16-bit CSLA
Total pins                                    5            14           28           56
Total Logic Elements                          2            8            24           56
Total Thermal Power Dissipation (mW)          115.88       116.72       118.04       120.67
Core Static Thermal Power Dissipation (mW)    99.09        99.09        99.10        99.10
I/O Thermal Power Dissipation (mW)            16.79        17.63        18.94        21.57
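Two simple consistency checks can be made on the figures of Table 6.1. The sketch below is illustrative only: the pin formula is derived from the port lists of the Verilog modules (operands, carry-in, one c0/c1 pair per carry select group, sum and carry-out), and the names `pin_count`, `power_rows` and `residual` are hypothetical.

```python
def pin_count(n_bits, n_select_groups):
    """Top-level pins: a, b, cin, one (c0, c1) pair per select group, sum, cout."""
    return 2 * n_bits + 1 + 2 * n_select_groups + n_bits + 1


# (total, core static, I/O) thermal power in mW, copied from Table 6.1
power_rows = {
    "full adder": (115.88, 99.09, 16.79),
    "4-bit": (116.72, 99.09, 17.63),
    "8-bit": (118.04, 99.10, 18.94),
    "16-bit": (120.67, 99.10, 21.57),
}

# Part of the total that is not explained by static and I/O power, in mW
residual = {
    name: round(total - static - io, 2)
    for name, (total, static, io) in power_rows.items()
}
```

The formula gives 14, 28 and 56 pins for the 4-bit, 8-bit and 16-bit designs, in agreement with the table. The residual is zero at the two-decimal precision of the table, which suggests that the reported total is dominated by core static and I/O power and that the dynamic core contribution is negligible at this precision.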


Chapter 7

Conclusion and Future Recommendation

7.1 Conclusion

A 16-bit CSLA is implemented here using the XOR based 1-bit full adder as a building block. The schematic has been designed in DSCH software and synthesized using 90 nm CMOS technology. The layout has been created and simulated in Microwind software. The designs have been compared in terms of area, delay and power dissipation, and the performance analysis, simulation results and comparison are reported. From the simulation results of the 2×1 MUX, the full custom design achieves a 94.07% reduction in power consumption over the conventional design and 84.89% over the semi custom design. The area of the full custom design is 85.12% less than that of the conventional design and 92% less than that of the semi custom design.

For the full custom XOR based CSLA design, the area, power, delay and total number of MOSFETs are 86.5%, 34.5%, 85.5% and 50.6% less than those of the conventional CSLA for 4-bit, 84.1%, 69.2%, 66.7% and 40% less for 8-bit, and 84.3%, 70.6%, 56.5% and 38.71% less for 16-bit respectively. Compared with the semi custom design, the full custom CSLA achieves area, power, delay and MOSFET count reductions of 84.2%, 32.7%, 84.6% and 3.45% for 4-bit, 84%, 34.02%, 66.2% and 4.88% for 8-bit, and 87.0%, 20.1%, 55.2% and 4.04% for 16-bit.

We have also implemented the CSLA on the Altera DE2-115 FPGA board and performed the synthesis and power analysis. The Verilog code was first simulated in Modelsim software; once the simulation results were verified, the synthesis and power analysis were done in Quartus II software and the design was implemented on the Altera DE2-115 FPGA board. By applying some arbitrary inputs we have checked that the implemented hardware performs correctly.

7.2 Future Recommendation

In future work, a CSLA that provides both low area and low delay needs to be designed in order to meet the needs of the current VLSI industry. Further, this work can be extended by designing and simulating adders with a larger number of bits, such as 32-bit, 64-bit and 128-bit.


References:

[1] K. Tejasvi and G. S. Kishore, “Low-Power and Area-Efficient N-Bit Carry-Select Adder,”

International Advanced Research Journal in Science, Engineering and Technology, Vol. 3,

Issue 7, pp. 186-189, July 2016.

[2] P. Balasubramanian, N. E. Mastorakis, “High Speed Gate Level Synchronous Full Adder

Designs,” WSEAS Transactions on Circuits and Systems, Volume 8, Issue 2, pp. 290-300,

February 2009.

[3] R. Uma, V. Vijayan, M. Mohanapriya and S. Paul, “Area, Delay and Power Comparison of

Adder Topologies,” International Journal of VLSI design & Communication Systems

(VLSICS), Vol. 3, No. 1, pp. 153-168, February 2012.

[4] K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA: Wiley, 1998.

[5] A. P. Chandrakasan, N. Verma, and D. C. Daly, “Ultralow-power electronics for biomedical

applications,” Annual Review of Biomedical Engineering, Vol. 10, pp. 247–274, August

2008.

[6] O. J. Bedrij, “Carry-select adder,” IRE Transactions on Electronic Computers, Vol. EC-11,

No. 3, pp. 340–344, June 1962.

[7] Y. Kim and L. S. Kim, “64-bit carry-select adder with reduced area,” Electron. Lett., Vol. 37,

No. 10, pp. 614–615, May 2001.

[8] Y. He, C. H. Chang, and J. Gu, “An area-efficient 64-bit square root carry select adder for

low power application,” IEEE International Symposium on Circuits and Systems, Vol. 4, pp.

4082–4085, 2005.

[9] B. Ramkumar and H. M. Kittur, “Low-power and area-efficient carry-select adder,” IEEE

Transactions on Very Large-Scale Integration (VLSI) Systems, Vol. 20, No. 2, pp. 371–375,

February 2012.

[10] I. C. Wey, C. C. Ho, Y. S. Lin, and C. C. Peng, “An area-efficient carry select adder design

by sharing the common Boolean logic term,” International Multi Conference of Engineers

and Computer Scientists (IMECS), March 2012.

[11] S. Manju and V. Sornagopal, “An efficient SQRT architecture of carry select adder design

by common Boolean logic,” International Conference on Emerging Trends in VLSI,

Embedded System, Nano Electronics and Telecommunication System (ICEVENT), January

2013.

[12] P. Devi, A. Girdher, and B. Singh, “Improved Carry Select Adder with Reduced Area and

Low Power Consumption,” International Journal of Computer Applications, Vol. 3, No. 4,

pp. 14-18, June 2010.

[13] B. Ramkumar, H. M. Kittur and P. M. Kannan, “ASIC Implementation of Modified Faster

Carry Save Adder,” European Journal of Scientific Research, Vol. 42, No. 1, pp. 53-58,

2010.

[14] V. G. Oklobdzija, “High-Speed VLSI Arithmetic Units: Adders and Multipliers,” in Design of

High-Performance Microprocessor Circuits, IEEE Press, 2000.

[15] J. E. Stine, “Digital Computer Arithmetic Data Path Design Using Verilog HDL,” Kluwer

Academic Publishers, 2004.

[16] Altera, “DE2 Development and Education Board User Manual,” Version 1.4, 2006.

[17] Handouts, “Introduction to FPGAs,” June 2009.

[18] L. Shanigarapu and B. P. Shrivastava, “Low-Power and High Speed Carry Select Adder,”

International Journal of Scientific and Research Publications, Volume 3, Issue 8, pp. 01-09,

August 2013.

[19] A. Mitra, A. Bakshi, B. Sharma and N. Didwania, “Design of a High Speed Adder,”

International Journal of Scientific & Engineering Research, Volume 6, Issue 4,

April 2015.

[20] A. N. M. Hossain and M. A. Abedin, “Implementation of an XOR Based 16-bit Carry Select

Adder for Area, Delay and Power Minimization,” International Conference on Electrical,

Computer and Communication Engineering (ECCE), CUET, 2019.

[21] T. A. Jismi and K. Nithin Jose, “An Area-Efficient Carry Select Adder Designed by Using

Transmission Gate,” International Journal of VLSI and Embedded Systems (IJVES), Volume

06, pp. 1519-1522, May 2015.

[22] K. Sanjay and V. V. Teresa, “Modified Full Adder Architecture for Area Efficient Carry Select

Adder,” International Journal of Engineering Research & Technology (IJERT), Vol. 2, pp.

968-972, May 2013.

[23] B. Ramkumar, H. M. Kittur and P. M. Kannan, “ASIC implementation of modified faster

carry save adder,” European Journal on Scientific Research, vol. 42, no. 1, pp. 53–58, 2010.

[24] Padma Devi, Ashima Girdher and Balwinder Singh, “Improved Carry Select Adder with

Reduced Area and Low Power Consumption,” International Journal of Computer

Applications (0975-8887), Volume 3, No. 4, June 2010.

[25] Mariano Aguirre-Hernandez and Monico Linares-Aranda, “CMOS Full-Adders for Energy-

Efficient Arithmetic Applications,” IEEE Transactions on Very Large Scale Integration

(VLSI) Systems, Vol. 19, No. 4, April 2011.

[26] Guguloth Sreekanth, V. Harish and D. Mohammad Elias, “Design of Low Power and Area

Efficient Carry Select Adder (CSLA) Using Verilog Language,” International Journal of

Engineering and Science, Vol. 6, pp. 61-66, May 2016.

[27] B. Vijaya Lakshmi and B. Praveen Kumar, “Design of Modified Carry Select Adder with Low

Power and Efficient Area Using D-Latch,” International Journal of Innovative Research in

Science, Engineering and Technology, Vol. 7, February 2018.

[28] B. Ramkumar, H. M. Kittur and P. M. Kannan, “ASIC Implementation of Modified Faster

Carry Save Adder,” European Journal on Scientific Research, vol. 42, no. 1, pp. 53–58,

2010.

[29] B. Ramkumar and Harish M. Kittur, “Low-Power and Area-Efficient Carry Select

Adder,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 371–375,

vol. 20, no. 2, February 2012.

[30] Basant Kumar Mohanty and Sujit Kumar Patel, “Area, Delay and Power Efficient Carry

Select Adder,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no.

6, June 2014.

[31] U. Sajesh Kumar, K. Mohamed Salih and K. Sajith, “Design and Implementation of Carry

Select Adder without using Multiplexers,” 1st International Conference on Emerging

Technology in Electronics, Communication and Networking, 2012.

[32] K. Bala Sindhuri, K. Padma Vasavi, I. Santi Prabha and N. Udaya Kumar, “VLSI

Architecture for Linear Carry Select Adder with Zero Finding Logic,” 6th International

Advanced Computing Conference, pp. 31, February 2016.

[33] B. Tapasvi, K. Bala Sindhuri, I. Chaitanya Varma and N. Udaya Kumar, “Implementation

of 64 Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area,” IEEE International

Conference on Electrical, Computer & Communication Technology, SVS College of

Engineering, Coimbatore, 5-7 March 2015.
