

Low power and fast adder implementation with Double Gate MOSFETs

B. Vignesh1, P. Sujith2 [email protected], [email protected] 1,2UG Scholar, Department of ECE, Sri Ramakrishna Institute of Technology, Coimbatore.

Abstract: In this paper we present the implementation of a 32-bit adder using the Quad Carry Look Ahead (QCLA) algorithm in compound domino logic with a Merged Pre-charge Keeper transistor and a Statistically Skewed Inverter, built with Double Gate MOSFETs (DGMOSFETs). The worst case propagation delay of the adder is 220 ps. The average operating power is 186 µW.

Index Terms: DGMOS, carry-look-ahead, QCLA, domino logic, propagation delay, power consumption.

[email protected] 3Assistant Professor, Department of ECE, Sri Ramakrishna Institute of Technology, Coimbatore.

I. INTRODUCTION

For the past several years the semiconductor industry has been following Moore's law of scaling. According to Moore's law, performance improves by 30% and the number of transistors on a chip doubles roughly every 18 to 24 months [1]. However, the semiconductor industry is now facing many challenges, such as high power density and device reliability, as it scales down to smaller nanometre nodes (below the 45 nm node). Hence, innovative device structures are needed to cope with some of these challenges and to support high performance and low power applications. The double gate MOSFET (DG-MOSFET) is one of the most promising candidates for replacing the conventional MOSFET in today's state-of-the-art chips in the near future [2]. Due to their smaller size, controllable threshold voltage and reduced power consumption, DG-MOSFETs are very effective devices for high performance and low power digital circuits. Fast and efficient adders are essential for high performance microprocessors. Such adder architectures typically utilize parallelism and dynamic logic to achieve fast computation. Here we design our adder by utilizing the concept of independent gate control of DGMOS, applied to domino circuits. The paper is organized as follows: the fully depleted (FD) double gate (DG) silicon-on-insulator (SOI) technology with planar independent self-aligned gates is briefly explained in Sec. II. The Quad Carry Look Ahead (QCLA) adder algorithm and its implementation are given in Sec. III and Sec. IV respectively. The DCVSL based adder cell is discussed in Sec. V and the results are presented in Sec. VI.

II. DOUBLE GATE FDSOI TECHNOLOGY

We note that a few different double-gate structures have been reported in the literature. The present work is based on the DG-MOSFET designed and fabricated at LETI, France [3]; Fig. 1 shows a TEM microphotograph of a planar DG transistor fabricated in this process.


Fig. 1 TEM cross section of a double-metal-gate transistor [3] and the corresponding DG NMOS symbol.

The essential features of an FD SOI DGMOS are a uniform and thin silicon channel, thick source and drain regions, and aligned top and bottom gates. There are two main types of DG MOSFETs: the symmetric double gate (SDG) device, in which both gates have identical work functions and gate oxide thickness, and the asymmetric double gate (ADG) device, which has different gate work functions and different gate oxide thicknesses. Lin et al. investigated the circuit performance of these two devices [4] and reported that the driving current in the SDG is higher than in the ADG, due to the inversion charge difference for the same threshold (leading to better conductivity). Furthermore, since the electric field is lower in the SDG, the mobility of the carriers is higher, which directly impacts the current. The higher mobility is also responsible for the SDG having a lower delay, and hence the SDG is preferred over the ADG for the design of logic circuits. We have used the device given in [5], and the supply voltage is maintained at 1.2 V for our adder implementation.

III. QCLA ALGORITHM

Fast adders are usually implemented using a carry look-ahead algorithm, which uses the traditional generate and propagate terms [6]. If $a_i$ and $b_i$ are the input operands and $p_i$ and $g_i$ are the propagate and generate signals respectively, then the sum bits $S_i$ can be described by the following equations:

$$I(0,i) = p_i\,p_{i-1}\,p_{i-2} \cdots p_0,$$
$$G(0,i) = g_i + p_i\,G(0,i-1),$$
$$S_i = a_i \oplus b_i \oplus c_i,$$

where $G(0,i)$ and $I(0,i)$ denote the group generate and group propagate signals, respectively, for the group of bits from position 0 to $i$. The quantity propagated to the next stage is the carry-out at bit $i$. The block diagram of the 16-bit QCLA is given in Fig. 2; it uses binary, ternary and quad convergences to provide the best compromise between delay and power consumption. We therefore need three types of cells, whose logic equations are given in [7]. The equations for a 4-bit adder are:

PG2: $I(0,1) = p_1 p_0$ and $G(0,1) = g_1 + g_0 p_1$
PG3: $I(0,2) = p_2 p_1 p_0$ and $G(0,2) = g_2 + g_1 p_2 + g_0 p_2 p_1$
PG4: $I(0,3) = p_3 p_2 p_1 p_0$ and $G(0,3) = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1$
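The group equations above can be checked against ordinary binary addition. The sketch below is only an illustration: it assumes $g_i = a_i b_i$ and $p_i = a_i + b_i$ (the text does not define them; the OR form is assumed because the identity $p_i g_i = g_i$ used for Ling's equations requires it) and a carry-in of zero. All function names are hypothetical.

```python
from itertools import product

def bits(x, n):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]

def group_signals(a, b, n=4):
    """Return (g, p, G, I) with G[i] = G(0,i) and I[i] = I(0,i).

    Assumptions: g_i = a_i AND b_i, p_i = a_i OR b_i, carry-in = 0.
    """
    A, B = bits(a, n), bits(b, n)
    g = [x & y for x, y in zip(A, B)]
    p = [x | y for x, y in zip(A, B)]
    G, I = [g[0]], [p[0]]
    for i in range(1, n):
        G.append(g[i] | (p[i] & G[i - 1]))   # G(0,i) = g_i + p_i G(0,i-1)
        I.append(p[i] & I[i - 1])            # I(0,i) = p_i ... p_0
    return g, p, G, I

def flattened_generates(g, p):
    """Flattened PG2/PG3/PG4 generate terms of a 4-bit group."""
    G01 = g[1] | (g[0] & p[1])
    G02 = g[2] | (g[1] & p[2]) | (g[0] & p[2] & p[1])
    G03 = (g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
           | (g[0] & p[3] & p[2] & p[1]))
    return G01, G02, G03

def check_all():
    """Exhaustively compare the equations with 4-bit integer addition."""
    for a, b in product(range(16), repeat=2):
        g, p, G, I = group_signals(a, b)
        assert flattened_generates(g, p) == (G[1], G[2], G[3])
        assert G[3] == (a + b) >> 4          # G(0,3) is the carry out of bit 3
        c = [0] + G[:3]                      # c_0 = 0, c_i = G(0,i-1)
        s = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(4))
        assert s == (a + b) & 0xF
    return True
```

Calling `check_all()` runs all 256 operand pairs; any mismatch between the flattened terms and the recursive definition would raise an `AssertionError`.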

Ling's equations [8] are an alternative to the classical CLA. Using the identity $p_i g_i = g_i$, the generate term $G(0,i)$ can be reformulated as

$$G(0,i) = p_i\,\bigl(g_i + G(0,i-1)\bigr) = p_i\,H(0,i).$$

In Ling's adder, the pseudo-carry $H(0,i)$ is propagated and combined with the remaining terms in the final sum:

$$H(0,i) = g_i + p_{i-1}\,H(0,i-1),$$
$$S_i = \bigl(p_i \oplus H(0,i)\bigr) + g_i\,p_{i-1}\,H(0,i-1).$$
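As a sanity check of the pseudo-carry recurrence and the Ling sum expression, the following sketch builds an n-bit adder from them and can be compared with integer addition. It assumes $g_i = a_i b_i$, $p_i = a_i + b_i$, and a zero carry-in (so that $p_{-1} H(0,-1) = 0$); none of these are stated explicitly in the text, and the function name `ling_add` is hypothetical.

```python
def ling_add(a, b, n=8):
    """Add two n-bit integers using Ling pseudo-carries (carry-in = 0).

    Assumptions (not stated in the text): g_i = a_i AND b_i,
    p_i = a_i OR b_i, and p_{-1} H(0,-1) = 0.
    Returns (sum mod 2**n, carry_out).
    """
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(A, B)]
    p = [x | y for x, y in zip(A, B)]

    total = 0
    carry_in = 0                     # p_{i-1} H(0,i-1), i.e. G(0,i-1)
    for i in range(n):
        H = g[i] | carry_in                    # H(0,i) = g_i + p_{i-1} H(0,i-1)
        s = (p[i] ^ H) | (g[i] & carry_in)     # S_i = (p_i xor H(0,i)) + g_i p_{i-1} H(0,i-1)
        total |= s << i
        G = p[i] & H                           # G(0,i) = p_i H(0,i)
        assert G == g[i] | (p[i] & carry_in)   # agrees with the classical recurrence
        carry_in = G
    return total, carry_in
```

For example, `ling_add(200, 100)` can be compared with `((200 + 100) & 0xFF, (200 + 100) >> 8)`; the internal assertion also confirms, bit by bit, that $p_i H(0,i)$ reproduces the classical carry.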

The advantage of using Ling's equations becomes apparent after expanding the recursions [9]. For instance, expanding the recursion of $H(0,i)$ for a group of 4 bits results in

$$H(0,3) = g_3 + g_2 + g_1 p_2 + g_0 p_1 p_2.$$

The $H(0,3)$ term has fewer factors than $G(0,3)$, which in CMOS requires fewer transistors in the stack of the first gate. However, the sum computation when using Ling's pseudo-carry equations is more complex. Ling's equations therefore effectively move complexity from the carry tree into the sum pre-compute block [10], which is not in the critical path.
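For reference, the 4-bit expansion can be reproduced step by step from the pseudo-carry recurrence $H(0,i) = g_i + p_{i-1} H(0,i-1)$, using only the identity $p_i g_i = g_i$. The derivation below assumes $H(0,0) = g_0$, i.e. a zero carry-in, which the text does not state explicitly.

```latex
\begin{aligned}
H(0,1) &= g_1 + p_0 H(0,0) = g_1 + p_0 g_0 = g_1 + g_0,\\
H(0,2) &= g_2 + p_1 H(0,1) = g_2 + p_1 g_1 + p_1 g_0 = g_2 + g_1 + g_0 p_1,\\
H(0,3) &= g_3 + p_2 H(0,2) = g_3 + p_2 g_2 + p_2 g_1 + p_2 p_1 g_0
        = g_3 + g_2 + g_1 p_2 + g_0 p_1 p_2,\\
G(0,3) &= p_3 H(0,3) = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1.
\end{aligned}
```

The longest product term of $H(0,3)$ has three literals ($g_0 p_1 p_2$), against four for $G(0,3)$ ($g_0 p_3 p_2 p_1$), which is where the shorter transistor stack comes from.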

IV. IMPLEMENTATION OF QCLA WITH DGMOS

Meng et al. [11] proposed novel DG circuit techniques for NAND, NOR, etc., which reduced the area as well as the power, resulting in improved performance. These are the most basic techniques for DGMOS and are widely used. In [12], NAND gate circuits with reduced stack height were proposed. These circuits achieve higher density through the application of different threshold voltages to the NMOS and PMOS devices. We have implemented our basic cells with domino logic, compound domino, and compound domino with stack height reduction wherever necessary. In [13], domino logic circuits were developed using DGMOS with a Merged Pre-charge Keeper (MPK) with Statistically Skewed Inverter (SSI) and an MPK with Dynamically Skewed Inverter (DSI). As shown in Fig. 3 and Fig. 4, I(0,3) and H(0,3) are implemented in the compound domino configuration, whereas H(0,15) is implemented in compound domino with stack height reduction for better performance, since these blocks are in the critical path. Our conventional cells are domino gates with a keeper transistor.

Fig. 2 Block diagram of 16-bit QCL Adder.

The advantage of these topologies is that from a single cell we get G01, G02 and H03, which are needed to implement PG1, PG2 and PG3. As given in [14], an implementation using a compound domino gate gives a speed improvement over a dynamic gate. For implementing I(0,3) we effectively utilize the property of DGMOS given in [11]: when two MOSFETs are in parallel, we can group them into one and replace them with a single DGMOS. While generating the terms I01, I02 and I03, four transistors can be saved compared to a CMOS implementation of the same circuit, and we carefully pre-charge the intermediate nodes to get the output, which prevents the charge sharing problem of dynamic gates. We implemented the term H(0,15) using compound domino with stack height reduction to improve the speed, as given in [15].

V. DCVSL ADDER CELL

The Differential Cascode Voltage Switch Logic (DCVSL) family is similar to pseudo-NMOS in that all the logic is implemented only in the pull-down network (PDN), and the PMOS devices are present as load transistors [16] (which are here in a latch type configuration). The speed is high because the switching is done through NMOS devices, and the logic can be condensed when there are common terms in both trees. Adding a PMOS sleep transistor from VDD on top of the circuit, we have developed a 1-bit full adder in DGMOS whose transistor level schematic is given in Fig. 5. For this adder cell, we assume that each input and its complement are present. In most practical implementations the complement is made available by a chain of buffers. We have compared this adder design with a standard 28-gate full adder (without XOR configuration) [17] designed in DGMOS technology with a channel length of 25 nm and with the double gate optimization. The results are shown as a chart in Fig. 6. From these we see that the DCVSL based adder is about 40% faster than the conventional adder and that, due to the use of the sleep transistor, its leakage current is drastically lower than that of its counterpart.

Fig. 3 Transistor level schematic of H(0,3): (a) conventional and (b) MPK/SSI, MPK/DSI realized in compound domino logic.

Fig. 4 Transistor level schematic of (a) I(0,3) and (b) H(0,15) implemented with compound domino with stack height reduction.

Fig. 5 DCVSL based one bit full adder (DCVSL_SUM) cell with PMOS as sleep transistor.

Fig. 6 Comparison of two adder cells in terms of (i) total number of transistors, (ii) average propagation delay (ns), and (iii) leakage current (nA).

The sum is generated from this block using the equations below, with $a_i$, $b_i$ and the carry-in as inputs:

$$S_i = a_i \oplus b_i \oplus G(0,i-1) \quad \text{if } C_{in} = G(0,i-1),$$
$$S_i = a_i \oplus b_i \oplus \bigl(p_{i-1} \cdot H(0,i-1)\bigr) \quad \text{if } C_{in} = H(0,i-1).$$
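The two sum expressions can be exercised with a small behavioural model. The sketch below is not a transistor-level description of the DCVSL cell: it only models the logic function, returning complementary outputs as a differential gate would. The names `dcvsl_sum`, `add_with_cell` and the `pseudo` flag are hypothetical, and $g_i = a_i b_i$, $p_i = a_i + b_i$ with zero carry-in are assumptions.

```python
def dcvsl_sum(a_i, b_i, carry, p_prev=1, pseudo=False):
    """Behavioural model of a one-bit sum cell with differential outputs.

    carry  : G(0,i-1) if pseudo is False, H(0,i-1) if pseudo is True
    p_prev : p_{i-1}, used only to turn a pseudo-carry into a real carry
    Returns (S_i, complement of S_i).
    """
    c_in = (p_prev & carry) if pseudo else carry
    s = a_i ^ b_i ^ c_in
    return s, s ^ 1

def add_with_cell(a, b, n=8, pseudo=False):
    """n-bit addition (carry-in = 0) using dcvsl_sum for every sum bit."""
    total, G_prev, H_prev, p_prev = 0, 0, 0, 0
    for i in range(n):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        g, p = a_i & b_i, a_i | b_i
        carry = H_prev if pseudo else G_prev
        s, s_bar = dcvsl_sum(a_i, b_i, carry, p_prev, pseudo)
        assert s_bar == 1 - s            # outputs are always complementary
        total |= s << i
        H = g | (p_prev & H_prev)        # Ling pseudo-carry H(0,i)
        G = p & H                        # real carry G(0,i) = p_i H(0,i)
        G_prev, H_prev, p_prev = G, H, p
    return total
```

Running `add_with_cell(a, b)` and `add_with_cell(a, b, pseudo=True)` on the same operands should give the same result, which illustrates why a mix of real carries and pseudo-carries can feed the same sum block.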

VI. RESULTS AND DISCUSSION

We have implemented H(0,3) with three different domino configurations (conventional, MPK/SSI and MPK/DSI) and compared them in terms of propagation delay (Tprop) for the worst case delay input vector, power in the evaluation phase (Pe) after disconnecting the evaluation network from the supply, and power in the pre-charge phase (Pp) after disconnecting the evaluation network from the output, as shown in Table I. From the results, it is clear that MPK/SSI shows good power delay performance. MPK/DSI performs better in terms of performance and static power consumption, but it consumes more active power due to clock switching. We also implemented a 16-bit QCLA using the three configurations; the results are given in Table II. For the 16-bit QCLA, the power consumption during the pre-charge phase is nearly the same for the three configurations, but there is a significant difference in power consumption during the evaluation phase, which is of interest. We have implemented a 32-bit adder using the cells given above, H(0,3), I(0,3) and H(0,15), with the MPK/SSI configuration and the DCVSL based sum block (Fig. 5) at the output to generate the sum, with CLK as the sleep signal. Since we generate a mix of carries and pseudo-carries using the H(0,3) and H(0,15) cells, we are able to minimize the complexity of generating the sum with our DCVSL_SUM cell, which gives a better tradeoff between power and delay. For the 32-bit QCLA with the MPK/SSI configuration, the worst case delay is 220 ps and the power consumed in the evaluation phase is 186 µW. In our implementation, the blocks which receive external inputs are footed so as to take care of the non-monotonicity of external inputs. We assume the least significant bits to be more active than the most significant bits, so the LSBs are always kept away from the output in order to decrease unnecessary discharges of the internal nodes.

TABLE I
RESULTS OF H(0,3) CELL FOR DIFFERENT ARCHITECTURES

| Architecture | Tprop (ps) | Pe, evaluation (µW) | Pp, pre-charge (nW) |
|---|---|---|---|
| Conventional | 72 | 21 | 10 |
| MPK/SSI | 65 | 11.6 | 11.7 |
| MPK/DSI | 63 | 6.4 | 6.3 |

TABLE II
RESULTS OF 16-BIT QCLA FOR DIFFERENT ARCHITECTURES

| Architecture | Tprop (ps) | Pe, evaluation (µW) | Pp, pre-charge (nW) |
|---|---|---|---|
| Conventional | 164 | 65 | 42 |
| MPK/SSI | 137 | 50 | 41 |
| MPK/DSI | 135 | 43 | 41 |

VII. CONCLUSION

We implemented a 16-bit QCLA with conventional, MPK/SSI and MPK/DSI configurations and observed that MPK/SSI is better in terms of power dissipation and propagation delay. Hence we implemented a 32-bit QCLA adder with MPK/SSI compound domino logic with stack height reduction, and a novel DCVSL based cell was designed for generating the sum at the output. We minimized the complexity by generating a mix of carry and pseudo-carry terms using the H(0,3) and H(0,15) cells and by computing the sum with the DCVSL_SUM block, which resulted in minimized power and delay.

ACKNOWLEDGMENT

This work was funded by the Indo-French Centre for the Promotion of Advanced Research. The authors thank the Laboratoire d'Electronique et de Technologie de l'Information (LETI) of the Commissariat à l'Energie Atomique (CEA), Grenoble, France, for generously providing their circuit models for double-gate MOSFETs.

REFERENCES

[1] S. Borkar, "Design perspectives on 22nm CMOS and beyond," IEEE/ACM 46th Annual Design Automation Conference, SFO - California, pp. 93-94, July 2009.
[2] A. Amara and O. Rozeau (eds.), Planar double-gate transistor: From technology to circuit. Dordrecht: Springer, 2009.
[3] M. Vinet et al., "Bonded Planar Double-Metal-Gate NMOS transistors down to 10 nm," IEEE Electron Device Lett., vol. 26, no. 5, pp. 317-319, May 2005.
[4] C.-H. Lin, P. Su, Y. Taur, X. Xi, I. He, A. M. Niknejad, M. Chan, and C. Hu, "Circuit Performance of Double-Gate SOI CMOS," Semiconductor Device Research Symposium, pp. 266-267, Dec 2003.
[5] B. Giraud, A. Amara and A. Vladimirescu, "A Comparative Study of 6T and 4T SRAM Cells in Double-Gate CMOS with Statistical Variation," IEEE International Symposium on Circuits and Systems, pp. 3022-3025, May 2007.
[6] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
[7] P. Royannez and A. Amara, "A 1.0ns 64-bits GaAs Adder using Quad tree algorithm," 6th Great Lakes Symposium on VLSI, pp. 24-28, Mar 1996.
[8] H. Ling, "High Speed Binary Adder," IBM J. Research. Dev., vol. 25, no. 3, p. 156, May 1981.
[9] R. W. Doran, "Variants of an improved carry-lookahead adder," IEEE Trans. Computers, vol. 37, pp. 1110-1113, Sep 1988.
[10] R. Zlatanovici, S. Kao, and B. Nikolic, "Energy-Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example," IEEE J. Solid-State Circuits, vol. 44, pp. 569-583, Feb 2009.
[11] M. H. Chiang, K. Kim, C. Tretz and C. T. Chuang, "Novel high-density low-power high-performance double-gate logic technique," Proc. IEEE Int. SOI Conf., pp. 122, Oct 2004.
[12] M.-H. Chiang, "High-density reduced-stack logic circuit techniques using independent-gate controlled double-gate devices," IEEE Trans. Electron Devices, vol. 53, pp. 2370-2377, Aug 2006.
[13] H. Mahmoodi et al., "High-performance and low-power domino logic using independent gate control in double-gate SOI MOSFETs," Proc. IEEE Int. SOI Conf., pp. 67-68, Oct 2004.
[14] S. Naffziger, "A sub-nanosecond 0.5µm 64b adder design," International Solid-State Circuits Conference, pp. 210-211, Feb 1996.
[15] J. Park, H. C. Ngo, J. A. Silberman, and S. H. Dong, "470 ps 64 bit parallel binary adder," in Symp. VLSI Circuits, pp. 192-193, Jun 2000.
[16] Kan M. Chu et al., "A comparison of CMOS circuit techniques: Differential Cascode Voltage Switch Logic versus Conventional Logic," IEEE Journal of Solid State Circuits, vol. 22, pp. 528-532, Aug. 1987.
[17] Yeo, Kiat Seng and Roy, Kaushik, Low Voltage, Low Power VLSI Subsystems, New York, McGraw-Hill, 2005.