[IEEE 2010 International Conference on Microelectronics (ICM) - Cairo, Egypt (2010.12.19-2010.12.22)] 2010 International Conference on Microelectronics - Comparison of two SRAM matrix

Abstract— As a consequence of technology shrinking, leakage

current has become a significant contributor to the overall power

dissipation of embedded memories. In this paper, we compare

design trade-offs of two leakage reduction techniques, namely the

diode clamp scheme and the replica cell biasing scheme. We show

how the two techniques compare using a 1V, 900MHz, 1kx32b

reference SRAM in 45nm technology with a data retention

voltage of 0.5V, which employs no leakage reduction scheme. The

performance comparison is presented over an operating

temperature range of -40°C to +130°C. We show that the replica

cell biasing scheme can achieve 85.9% reduction in leakage

current with an estimated gate area overhead of 2.3% plus area

of polysilicon resistors (per memory instance) together with a

speed reduction of 34.5% under most leaking conditions. The

figures are 84.8%, 2.2% and 23.7% respectively for the diode

clamp scheme.

Index Terms— SRAM, leakage, diode clamp, replica cell

biasing, low-power.

I. INTRODUCTION

The challenges associated with SRAM design have

increased manifold with technology scaling. Memories

have become leakier with reduction in device threshold

voltage and feature size, which has led to higher levels of

leakage power consumption. Numerous matrix leakage

reduction techniques have appeared in literature. Source

biasing, one of the most prominent, has been thoroughly

explored in [1]-[4]. Reverse body biasing has been

investigated in [3] and [5]. Dynamic voltage scaling has been

surveyed in [3] and [6]. The idea of floating bitlines has been

discussed in [7]. The benefits and trade-offs of using negative

wordline have been brought to light in [8]. The incorporation

of high-VT transistors has also been explored and analyzed in

[9].

This work presents a complete picture of the area-speed-

power trade-off associated with two leakage-reducing

techniques, namely the diode clamp (DC) scheme in [4] and

[10]-[13] and the replica cell biasing (RCB) scheme in [4].

The comparison is presented in 45nm predictive technology

[14] over an extended temperature range: -40°C to +130°C.

All simulations were carried out using HSPICE circuit

simulation software.

Manuscript received Jun 14, 2010, revised Oct 6, 2010. Khawar Sarfraz is

with the Electrical Engineering Department, Lahore University of Management

Sciences, Lahore 54792, Pakistan (e-mail: [email protected]).

The layout of this paper is as follows. Section II provides an

architectural overview of the two leakage-reducing schemes.

In Section III, we discuss the sizing and area overhead of the

switch while Section IV deals with the same for the two data

retention voltage (DRV) maintaining circuits. Section V is

related to leakage reduction and layout considerations. Section

VI presents the power-speed trade-off. Section VII reveals

some of the top-level design issues and presents a comparison

of results. The paper ends with a conclusion in Section VIII.

II. ARCHITECTURAL OVERVIEW

The DC scheme in Fig. 1 can be used to clamp VSSC to a

specific voltage level during standby (no read/write access). It

consumes no additional power, nor does it require on-chip

voltage generation or an additional supply voltage pin. Also,

the proposed modification helps reduce the total area

overhead. The downside of using a diode connected transistor

(DCT) is that its operation is not immune to Process-Voltage-

Temperature (PVT) variation.

Fig. 1 Original DC scheme in [4] (left) and modified DC scheme in standby mode (right). Labelled elements: cell array, VDD, VSSC, DRV, M1, M1X, M2, M3, INV1, N1, N2, /SLP, blk_sel = 0.

In the RCB scheme shown in Fig. 2, the voltage at node A1

is two PMOS thresholds below the supply and is transferred to

VSSC via P1’. Depending on PVT variation, either P1 or P2

sets the potential at VSSC during standby. The positive feature

of this approach is that the cell bias is maintained closer to

DRV level under PVT variation.

Comparison of two SRAM matrix leakage reduction techniques in 45nm technology

Khawar Sarfraz, Department of Electrical Engineering, Lahore University of Management Sciences (LUMS), Opp. Sector U, DHA, Lahore 54792, Pakistan

Email: [email protected]

22nd International Conference on Microelectronics (ICM 2010)

978-1-4244-5816-5/09/$26.00 ©2009 IEEE

The two leakage reduction techniques are employed on a

1kx32b instance, split into eight 4k-bit blocks (64 rows by 64

columns). Each row in the block is driven by a common

wordline and all 64 cells within a column share the same

bitline pair. All cells in the block share a common NMOS

switch and a DRV-maintaining circuit, connected between the

source terminal of cell driver transistors (VSSC) and the real

ground. The NMOS switch is controlled by a block select

signal that is generated by a block decoder, allowing only one

block to be in the active (read/write) mode at any point in time.
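
The block organization described above can be sketched in a few lines of Python. The address mapping used here (block index in the upper address bits, two 32-bit words per 64-column row) is an assumption made for illustration, as are the names `decode` and `block_select`; the paper only states that a block decoder enables one block at a time.

```python
# Assumed organization: 1k words x 32 bits, split into eight 64x64 blocks.
WORD_BITS = 32
ROWS_PER_BLOCK = 64
COLS_PER_BLOCK = 64
NUM_BLOCKS = 8
WORDS_PER_ROW = COLS_PER_BLOCK // WORD_BITS        # 2 words per row
WORDS_PER_BLOCK = ROWS_PER_BLOCK * WORDS_PER_ROW   # 128 words per block


def decode(word_address):
    """Split a word address into (block, row, word-in-row).

    The bit assignment is hypothetical; any mapping that selects exactly
    one block per access would serve the same purpose.
    """
    if not 0 <= word_address < NUM_BLOCKS * WORDS_PER_BLOCK:
        raise ValueError("word address out of range")
    block, offset = divmod(word_address, WORDS_PER_BLOCK)
    row, word_in_row = divmod(offset, WORDS_PER_ROW)
    return block, row, word_in_row


def block_select(word_address):
    """One-hot blk_sel vector: 1 = active (NMOS switch on), 0 = standby."""
    block, _, _ = decode(word_address)
    return [1 if b == block else 0 for b in range(NUM_BLOCKS)]


if __name__ == "__main__":
    print(decode(300))
    print(block_select(300))
```

With this mapping, every access leaves seven of the eight blocks in standby, which is where the leakage reduction is obtained.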

Fig. 2 RCB scheme [4] in standby mode. Labelled elements: cell array, VDD, VSSC, DRV, M1 (blk_sel = 0), P1, P2, P1', P2', VP1, VP2, nodes A1 and A2, replica (load), replica (driver), bias generator.

III. THE NMOS SWITCH

The NMOS switch (M1 and M1X in Fig. 1 and M1 in Fig.

2) must satisfy two requirements. First, it must be wide enough

to ensure near-ground potential (20mV for this work) at VSSC

during read in order to maintain sufficiently high static noise

margin. And second, it must swiftly discharge VSSC to ground

when the memory is being brought out of standby mode [15]

since that time contributes to the access time of the memory

and hence determines the speed penalty associated with both

architectures. The switch size is determined by the more

important former requirement. For the selected size, Fig. 3

shows the voltage at VSSC during read for Fast (FNFP),

Nominal (NOM) and Slow (SNSP) processes at increased

supply voltage. The required size, together with the switching

logic, translates to 2.2% of total memory block gate area.
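
The first sizing requirement can be illustrated with a rough first-order estimate. The sketch below treats the switch as a linear resistor carrying the read current of the accessed row, which is a simplification and not the HSPICE-based sizing used in this work. The per-cell read current and the unit-device on-resistance are hypothetical numbers chosen only to make the example run.

```python
import math

VSSC_TARGET_V = 0.020  # 20 mV read-mode target stated in the text


def max_switch_resistance_ohm(row_read_current_a, vssc_target_v=VSSC_TARGET_V):
    """Largest on-resistance that keeps VSSC at or below the target."""
    if row_read_current_a <= 0:
        raise ValueError("read current must be positive")
    return vssc_target_v / row_read_current_a


def unit_devices_needed(row_read_current_a, unit_on_resistance_ohm,
                        vssc_target_v=VSSC_TARGET_V):
    """Number of identical unit transistors to place in parallel."""
    r_max = max_switch_resistance_ohm(row_read_current_a, vssc_target_v)
    return math.ceil(unit_on_resistance_ohm / r_max)


if __name__ == "__main__":
    # Hypothetical values: 64 cells per row, 20 uA per cell, 5.1 kOhm per unit.
    row_current = 64 * 20e-6
    print(max_switch_resistance_ohm(row_current))
    print(unit_devices_needed(row_current, 5100.0))
```

Expressing the switch as parallel unit devices matches the layout recommendation of building it from copies of the cell driver transistor.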

Fig. 3 VSSC node voltage during read (1.1VDD). Axes: V / mV versus Temp / deg C; curves for FNFP, SNSP and NOM processes, with the Target level marked.

IV. DATA RETENTION VOLTAGE MAINTAINING CIRCUITS

As soon as the block select signal is switched off, VSSC

node assumes a floating state and (primarily) sub-threshold

leakage of cells in the block begins to charge it. Consequently,

the voltage at VSSC begins to rise. It is imperative that this

voltage level not be allowed to rise beyond VDD-DRV under

PVT variation or else the cells would lose their data.

Threshold voltage is a function of VDS (DIBL effect) and

temperature since the number of carriers in the channel area, in

sub- and above-threshold regions, is also temperature

dependent. Hence the DCT in the DC scheme as well as the

PMOS clamp devices (P1 and P2 in Fig. 2) in the RCB scheme

must be sized at 0.9VDD so that a DRV of 0.5V (VSSC ≤ 0.4V)

is ensured first at the reduced supply voltage. Then, as

the supply voltage increases, the voltage at VSSC is expected

to rise as well, to at most 0.5V with a 1V

supply and at most 0.6V with a 1.1V supply (ensuring 0.5V

DRV in each case). Fig. 4 shows the voltage at VSSC plotted

as a function of temperature in standby mode, with the DCT

and PMOS clamp devices sized as per the theory above.

Curves are shown for Fast, Nominal, Slow, Fast NMOS Slow

PMOS (FNSP), and Slow NMOS Fast PMOS (SNFP)

processes at 1.1VDD. In four out of five processes (including

the FAST process), the RCB scheme is able to maintain a

lower cell bias. The results shown are also consistent with the

DCT theory presented in Section II.
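
The retention constraint used for sizing, VSSC ≤ VDD - DRV, can be checked numerically. The sketch below works in integer millivolts to avoid floating-point rounding; the function names are illustrative and the 440 mV test value is simply the top of the voltage axis in Fig. 4, not a measured result.

```python
DRV_MV = 500  # data retention voltage of the reference SRAM, in millivolts


def max_vssc_mv(vdd_mv, drv_mv=DRV_MV):
    """Highest VSSC level that still leaves DRV across the cell."""
    return vdd_mv - drv_mv


def retains_data(vdd_mv, vssc_mv, drv_mv=DRV_MV):
    """True if the cell bias (VDD - VSSC) is at least the DRV."""
    return vdd_mv - vssc_mv >= drv_mv


if __name__ == "__main__":
    for vdd_mv in (900, 1000, 1100):
        print(vdd_mv, "mV supply ->", max_vssc_mv(vdd_mv), "mV maximum VSSC")
```

The three supply levels give limits of 400 mV, 500 mV and 600 mV, which is why the clamp devices are sized at 0.9VDD, where the limit is tightest.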

Fig. 4 VSSC node voltage during standby (1.1VDD). Axes: V / mV versus Temp / deg C; curves for SNFP, SNSP, NOM, FNSP and FNFP processes, for the diode connected and replica cell biasing schemes.

V. LAYOUT CONSIDERATIONS AND LEAKAGE REDUCTION

In Fig. 1, the proposed modification suggests that the DCT

must be implemented in layout as part of the larger NMOS

switch. The switching logic would be the only overhead then.

In Fig. 2, the bias generator together with the PMOS clamp

devices take up 0.9% of total memory block gate area. An

important layout consideration is to use copies of SRAM cell

driver transistor to realize the total width of the NMOS switch

and the DCT. Similarly, copies of the cell load transistor

should be used to realize the total width of the PMOS clamp

devices. That would ensure equal PVT variation amongst the

memory block, the switch and the DRV-maintaining circuit, as

suggested in [4] and [10]. It would also help prevent current

crowding, since the read current of the accessed row would

have multiple paths to flow into the NMOS switch thus

preventing localized overheating. Non-uniform flow of read

current from the memory block into the NMOS switch can

speed up the creation of voids.

It can be concluded from the results shown in Fig. 4 that the

RCB scheme is capable of maintaining a lower cell bias in

most leaking (ML: FNFP, T=130°C, 1.1VDD), Nominal

(NOM, T=25°C, VDD), and least leaking (LL: SNSP, T=-40°C,

0.9VDD) corners. Hence the RCB scheme achieves greater

reduction in block leakage current, as shown in Table I.

TABLE I
PERCENTAGE LEAKAGE REDUCTION

Corner                        RCB scheme   DC scheme
ML: T=130°C, 1.1VDD, FNFP     85.9 %       84.8 %
NOM: T=25°C, VDD, NOM         74.7 %       73.5 %
LL: T=-40°C, 0.9VDD, SNSP     73.6 %       72.4 %
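
The figures in Table I can also be read as residual leakage, which makes the gap between the two schemes easier to judge. The following sketch only rearranges the published percentages; the dictionary layout and function names are illustrative.

```python
# Percentage leakage reduction from Table I: corner -> (RCB, DC)
TABLE_I = {
    "ML": (85.9, 84.8),
    "NOM": (74.7, 73.5),
    "LL": (73.6, 72.4),
}


def residual_percent(reduction_percent):
    """Leakage remaining in standby, as a percentage of the reference."""
    return 100.0 - reduction_percent


def residual_ratio(corner):
    """Residual leakage of the RCB scheme divided by that of the DC scheme."""
    rcb, dc = TABLE_I[corner]
    return residual_percent(rcb) / residual_percent(dc)


if __name__ == "__main__":
    for corner in TABLE_I:
        print(corner, round(residual_ratio(corner), 3))
```

In every corner the ratio is below one, so the RCB scheme leaves slightly less standby leakage than the DC scheme.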

VI. POWER-FREQUENCY TRADE-OFF

In both architectures, the power savings come from the

memory block due to reduction in block leakage current during

standby. In the DC scheme, power is consumed in driving the

NMOS switch (every time a memory block is accessed) and in

INV1 (Fig. 1) every time a block transitions from active to

standby mode. In the RCB scheme, power is consumed by the

bias generator circuit in addition to the NMOS switch. Fig. 5

shows the power savings of a 32kb instance together with the

power consumption of the switch and inverter INV1, plotted as

a function of memory access frequency for the DC scheme.

Fig. 5 Power-frequency trade-off for the DC scheme (32kb instance). Axes: Power / W versus Frequency / Hz; curves for ML (T=130°C, 1.1VDD, FNFP), NOM (T=25°C, VDD, NOM) and LL (T=-40°C, 0.9VDD, SNSP) showing memory instance power savings and switch and INV1 power consumption, with break-even points numbered 1 to 3.

The power-frequency model used in the analysis here was

developed in [16]-[17]. The analysis is presented for an

architecture where consecutive reads/writes take place in

sequentially accessed blocks. The solid and the dotted lines

correspond to power savings and power consumption

respectively. For the memory instance, power savings are

lowest at high frequencies because sufficient time is not

available to VSSC node to float up to the eventual standby

potential. With decreasing memory access frequency, more

standby time is available to the memory blocks, which allows

leakage currents to reduce by a greater amount, leading to

greater power savings [17]. Below a certain critical frequency,

the plots become flat since maximum reduction in leakage

currents has been achieved. The graphs for the NMOS switch

and inverter INV1 are straight lines since power consumed

increases the faster the memory block is switched.

The point of intersection of the two corresponding curves

indicates the break-even frequency, at which the power savings

from the memory instance equal the power consumption in the

switch and the inverter (Fig. 1). Operating the memory at a

frequency beyond the point of intersection is not practical.
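
The break-even condition can be sketched with a simplified model. The sketch assumes that the overhead power grows linearly with access frequency (a fixed energy per block wake-up) and that the savings equal their flat low-frequency value. Since the actual savings fall off at high frequency, this estimate is an upper bound on the break-even frequency rather than the model of [16]-[17]. The numerical values are hypothetical.

```python
def switching_power_w(energy_per_wakeup_j, access_frequency_hz):
    """Overhead power when every access wakes a block (straight line in f)."""
    return energy_per_wakeup_j * access_frequency_hz


def break_even_frequency_hz(flat_savings_w, energy_per_wakeup_j):
    """Frequency at which the linear overhead equals the flat savings."""
    if energy_per_wakeup_j <= 0:
        raise ValueError("energy per wake-up must be positive")
    return flat_savings_w / energy_per_wakeup_j


if __name__ == "__main__":
    # Hypothetical values: 100 uW of savings, 0.2 pJ per wake-up.
    f_be = break_even_frequency_hz(1e-4, 2e-13)
    print(f_be, switching_power_w(2e-13, f_be))
```

Below the returned frequency the scheme saves power overall; above it the switch and inverter consume more than the memory blocks save.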

Fig. 6 shows the power-frequency trade-off for the RCB

scheme. At low frequencies, the continuously on bias

generator circuits (one per memory block) dominate the power

expenditure, whereas at high frequencies the energy consumed

in switching the NMOS switch dominates.

Fig. 6 Power-frequency trade-off for the RCB scheme (32kb instance). Axes: Power / W versus Frequency / Hz; curves for ML (T=130°C, 1.1VDD, FNFP), NOM (T=25°C, VDD, NOM) and LL (T=-40°C, 0.9VDD, SNSP) showing memory instance power savings and switch and bias generator power consumption, with break-even points numbered 1 to 3.

Table II lists the break-even frequency (numbered dots in

Fig. 5 and Fig. 6) as well as the reduction in maximum operating

frequency (compared to the 900 MHz reference SRAM). Because

the PMOS clamp devices in the RCB scheme add source

diffusion capacitance at VSSC, the NMOS

switch takes a little longer to discharge VSSC to ground when

switching a block from standby to active mode. The increase

in access time therefore results in lower maximum operating

frequency for the RCB scheme. It is clear from the data in

Table II that for a 32kb instance, both memory architectures

become slow, particularly in the Nominal and LL corners.

TABLE II
BREAKEVEN FREQUENCY AND PERCENTAGE REDUCTION IN MAXIMUM OPERATING FREQUENCY

              ML (1)               NOM (2)              LL (3)
RCB scheme    527 MHz (34.5% ↓)    123 MHz (17.3% ↓)    60 MHz (18.2% ↓)
DC scheme     511 MHz (23.7% ↓)    120 MHz (16.0% ↓)    58 MHz (17.2% ↓)
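
The entries of Table II can be combined into a single practical frequency limit per corner. The sketch below converts each percentage reduction into an absolute maximum frequency relative to the 900 MHz reference and then takes the lower of that value and the break-even frequency. Treating the minimum of the two as the practical limit is an interpretation of the text, not a figure reported in this work.

```python
REFERENCE_MHZ = 900.0

# Table II: scheme -> corner -> (break-even MHz, % reduction in max frequency)
TABLE_II = {
    "RCB": {"ML": (527, 34.5), "NOM": (123, 17.3), "LL": (60, 18.2)},
    "DC": {"ML": (511, 23.7), "NOM": (120, 16.0), "LL": (58, 17.2)},
}


def reduced_max_frequency_mhz(reduction_percent):
    """Maximum operating frequency after the access-time penalty."""
    return REFERENCE_MHZ * (1.0 - reduction_percent / 100.0)


def practical_limit_mhz(scheme, corner):
    """Lower of the break-even frequency and the reduced maximum frequency."""
    break_even, reduction = TABLE_II[scheme][corner]
    return min(break_even, reduced_max_frequency_mhz(reduction))


if __name__ == "__main__":
    for scheme, corners in TABLE_II.items():
        for corner in corners:
            print(scheme, corner, practical_limit_mhz(scheme, corner))
```

In all six cases the break-even frequency is the tighter bound, which is consistent with the observation that the 32kb instance becomes slow in the Nominal and LL corners.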

In order to improve on this scenario, we can increase the

size of the memory instance and keep the NMOS switch on

over multiple block accesses (representing cache block

transfer instead of word transfer). Fig. 7 illustrates the power-

frequency curves for the RCB scheme for a 64kb instance in

which the NMOS switch is kept on for 64 consecutive accesses

to a block. The power expenditure at lower frequencies is now

higher due to an increased number of bias generator circuits.

However the power consumption at high frequencies has now

been reduced, since the block is switched once every 64

memory accesses. It is clear that with these parameters, the

memory can be operated at maximum frequency in all corners

and power savings can still be achieved in comparison to the

reference architecture. Since leakage current reduces the most

in the ML corner, maximum power savings are also possible in

that corner. It can therefore be concluded that both memory

architectures are basically suited to high-temperature

applications.
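
The two effects of the larger instance can be expressed as simple scaling factors. The sketch assumes that the block size stays at 4k bits, so that the number of bias generators scales with instance size, and that the switch is toggled once per group of consecutive accesses. Both function names are illustrative.

```python
BLOCK_KBIT = 4  # 64 rows x 64 columns per block, as in the 32kb instance


def num_bias_generators(instance_kbit, block_kbit=BLOCK_KBIT):
    """One bias generator per block (assumes the block size is unchanged)."""
    return instance_kbit // block_kbit


def wakeups_per_second(access_frequency_hz, accesses_per_wakeup=1):
    """Rate at which the NMOS switch is toggled for a given access rate."""
    if accesses_per_wakeup < 1:
        raise ValueError("accesses_per_wakeup must be at least 1")
    return access_frequency_hz / accesses_per_wakeup


if __name__ == "__main__":
    print(num_bias_generators(32), num_bias_generators(64))
    print(wakeups_per_second(900e6, 1), wakeups_per_second(900e6, 64))
```

Doubling the instance doubles the static bias generator count, while holding the switch on for 64 accesses cuts the switching rate by a factor of 64.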

Fig. 7 Power-frequency trade-off for the RCB scheme (64kb instance). Axes: Power / W versus Frequency / Hz; curves for ML (T=130°C, 1.1VDD, FNFP), NOM (T=25°C, VDD, NOM) and LL (T=-40°C, 0.9VDD, SNSP) showing power savings and switch and bias generator power consumption.

VII. HIGH-LEVEL DESIGN ISSUES AND RESULTS COMPARISON

Dividing the memory instance into blocks helps reduce

power consumption both during the active and standby modes.

The relevance of the two schemes is shown to be limited to

large memory instances, with a small change to the software

such that multiple accesses are made to the same block before

moving on to the next. This way, memory access time is

higher only for the first access, not for subsequent accesses to

the same block. Also, scheduling algorithms that rely on the

percentage of time that a cache is placed in standby would

need to be tailored to work optimally on the two schemes

compared in this work, since maximum power savings are not

achieved immediately upon entering standby mode.
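
The benefit of grouping accesses by block can be illustrated by counting how often a block has to be brought out of standby. The sketch below assumes that only one block is active at a time and that an access to a different block always incurs a wake-up; the function name and access sequences are made up for illustration.

```python
def count_wakeups(block_sequence):
    """Count standby-to-active transitions for a sequence of block indices."""
    wakeups = 0
    active_block = None
    for block in block_sequence:
        if block != active_block:
            # Only the first access to a newly selected block pays the penalty.
            wakeups += 1
            active_block = block
    return wakeups


if __name__ == "__main__":
    interleaved = [0, 1] * 64          # alternate between two blocks
    grouped = [0] * 64 + [1] * 64      # finish one block before the next
    print(count_wakeups(interleaved), count_wakeups(grouped))
```

Both sequences contain 128 accesses, but the grouped order pays the access-time and switching penalty twice instead of 128 times.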

A comparison of key results from this work is presented

with those in related work in Table III.

TABLE III
RESULTS COMPARISON (NOMINAL CORNER)

                                RCB scheme               DC scheme    RCB scheme  Gated-GND   Diode footed  DC scheme with
                                (this work)              (this work)  [4]         cache [11]  cache [13]    resistor [12]
Technology                      45nm                     45nm         90nm        70nm        70nm          130nm
Percentage area overhead        2.3 + area of resistors  2.2          3.0         4.0         7.0           ~ 0
Percentage access time penalty  17.0                     16.0         7.0         5.0         2.5           -
Percentage leakage reduction    74.7                     73.5         88.0        51.0        65.8          95.0

VIII. CONCLUSION

We have revealed the area-speed-power trade-off associated

with two prominent leakage-reducing schemes in 45nm

predictive technology. Results show that the RCB scheme

achieves more leakage reduction (85.9%) at the cost of greater

chip area (2.3% plus area of polysilicon resistors per memory

instance) with 34.5% reduction in speed under ML conditions.

In essence, both schemes can achieve significant reduction in

block leakage current and their suitability to high-temperature

applications is demonstrated.

REFERENCES

[1] Wang, Y et al., “A 1.1GHz 12µA/Mb-Leakage SRAM Design in 65nm

Ultra-Low-Power CMOS with Integrated Leakage Reduction for

Mobile Applications,” Solid-State Circuits Conference ISSCC 2007, pp.

323-325.

[2] Kyeong-Sik Min, Kanda, K., Sakurai, T., “Row-by-row dynamic source-

line voltage control (RRDSV) scheme for two orders of magnitude

leakage current reduction of sub-1-V-VDD SRAM's,” Proceedings of the

2003 International Symposium on Low Power Electronics and Design,

pp. 66-71.

[3] Chung-Hsien Hua, Tung-Shuan Cheng, Wei Hwang, “Distributed data-

retention power gating techniques for column and row co-controlled

embedded SRAM,” IEEE International Workshop on Memory

Technology, Design, and Testing 2005, pp. 129-134.

[4] Takeyama, Y et al., “A low leakage SRAM macro with replica cell

biasing scheme,” IEEE Journal of Solid-State Circuits 2006, Vol. 41,

Issue 4, pp. 815 – 822.

[5] Ya-Chun Lai, Shi-Yu Huang, “X-Calibration: A Technique for

Combating Excessive Bitline Leakage Current in Nanometer SRAM

Designs,” IEEE Journal of Solid-State Circuits 2008, Vol. 43, Issue 9,

pp. 1964-1971.

[6] M. Khellah et al., “A 256-kb dual-Vcc SRAM building block in 65-nm

CMOS process with actively clamped sleep transistor,” IEEE Journal

of Solid State Circuits 2007, Vol. 42, Issue 1, pp. 233-242.

[7] Fujita, K et al., “Array architecture of floating body cell (FBC) with

quasi-shielded open bit line scheme for sub-40nm node,” IEEE

International SOI Conference 2008, pp. 31-32.

[8] Chua-Chin Wang, Ching-Li Lee, Wun-Ji Lin, “A 4-kb Low-Power

SRAM Design With Negative Word-Line Scheme,” IEEE Transactions

on Circuits and Systems 2007, Regular Papers, Vol 54, Issue 5, pp.

1069-1076.

[9] Mukhopadhyay, S., Keejong Kim, Mahmoodi, H., Roy, K., “Design of a

Process Variation Tolerant Self-Repairing SRAM for Yield

Enhancement in Nanoscaled CMOS,” IEEE Journal of Solid-State

Circuits 2007, Vol. 42, Issue 6, pp. 1370-1382.

[10] Ding-Ming Kwai, “Standby Current Reduction of Compilable SRAM

Using Sleep Transistor and Source Line Self Bias,” IEEE Asian Solid-

State Circuits Conference 2006, pp. 23-26.

[11] Agarwal, A., Li, H., Roy, K., “A Single-Vt Low-Leakage Gated-Ground

Cache for Deep Submicron,” IEEE Journal of Solid State Circuits

2003, Vol. 38, Issue 2, pp. 319-328.

[12] Masanao Yamaoka et al., “A 300-MHz 25-µA/Mb-Leakage On-Chip

SRAM module featuring process-variation immunity and low-leakage-

active mode for mobile-phone application processor,” IEEE Journal of

Solid State Circuits 2005, Vol. 40, Issue 1, pp. 186-194.

[13] Agarwal, A., Roy, K., “A Noise Tolerant Cache Design to Reduce Gate

and Sub-threshold Leakage in the Nanometer Regime,” Proceedings of

the 2003 International Symposium on Low Power Electronics and

Design, pp. 18-21.

[14] Predictive Technology Model (PTM). Available: http://ptm.asu.edu/

[15] Huang, S. et al., “A Novel SRAM Structure for Leakage Power

Suppression in 45nm Technology,” International Conference on

Communications, Circuits and Systems 2008, pp. 1070-1074.

[16] Jiang, H. et al., “Benefits and Costs of Power-Gating Technique,”

Proceedings of the 2005 International Conference on Computer

Design, pp. 559-566.

[17] Sarfraz, K., van der Meijs, N.P., Doorn, T.S., Salters, R.W., “SRAM

power reduction: An ultra-low power SRAM architecture in 45nm

technology,” Master Thesis 2009, TU Delft Institutional Repository.

Available: http://repository.tudelft.nl/search/ir/?q=khawar+sarfraz&w=Publications&faculty=&department=&type=&year=
