Abstract— As a consequence of technology shrinking, leakage
current has become a significant contributor to the overall power
dissipation of embedded memories. In this paper, we compare
design trade-offs of two leakage reduction techniques, namely the
diode clamp scheme and the replica cell biasing scheme. We show
how the two techniques compare using a 1V, 900MHz, 1kx32b
reference SRAM in 45nm technology with a data retention
voltage of 0.5V, which employs no leakage reduction scheme. The
performance comparison is presented over an operating
temperature range of -40°C to +130°C. We show that the replica
cell biasing scheme can achieve 85.9% reduction in leakage
current with an estimated gate area overhead of 2.3% plus area
of polysilicon resistors (per memory instance) together with a
speed reduction of 34.5% under most leaking conditions. The
figures are 84.8%, 2.2% and 23.7% respectively for the diode
clamp scheme.
Index Terms— SRAM, leakage, diode clamp, replica cell
biasing, low-power.
I. INTRODUCTION
THE challenges associated with SRAM design have
increased manifold with technology scaling. Memories
have become leakier with reduction in device threshold
voltage and feature size, which has led to higher levels of
leakage power consumption. Numerous matrix leakage
reduction techniques have appeared in the literature. Source
biasing, one of the most prominent, has been thoroughly
explored in [1]-[4]. Reverse body biasing has been
investigated in [3] and [5]. Dynamic voltage scaling has been
surveyed in [3] and [6]. The idea of floating bitlines has been
discussed in [7]. The benefits and trade-offs of using negative
wordline have been brought to light in [8]. The incorporation
of high-VT transistors has also been explored and analyzed in
[9].
This work presents a complete picture of the area-speed-
power trade-off associated with two leakage-reducing
techniques, namely the diode clamp (DC) scheme in [4] and
[10]-[13] and the replica cell biasing (RCB) scheme in [4].
The comparison is presented in 45nm predictive technology
[14] over an extended temperature range: -40°C to +130°C.
All simulations were carried out using HSPICE circuit
simulation software.
Manuscript received Jun 14, 2010, revised Oct 6, 2010. Khawar Sarfraz is
with Electrical Engineering Department, Lahore University of Management
Sciences, Lahore 54792, Pakistan (e-mail: [email protected]).
The layout of this paper is as follows. Section II provides an
architectural overview of the two leakage-reducing schemes.
In Section III, we discuss the sizing and area overhead of the
switch while Section IV deals with the same for the two data
retention voltage (DRV) maintaining circuits. Section V is
related to leakage reduction and layout considerations. Section
VI presents the power-speed trade-off. Section VII reveals
some of the top-level design issues and presents a comparison
of results. The paper ends with a conclusion in Section VIII.
II. ARCHITECTURAL OVERVIEW
The DC scheme in Fig. 1 can be used to clamp VSSC to a
specific voltage level during standby (no read/write access). It
consumes no additional power, nor does it require on-chip
voltage generation or an additional supply voltage pin. Also,
the proposed modification helps reduce the total area
overhead. The downside of using a diode connected transistor
(DCT) is that its operation is not immune to Process-Voltage-
Temperature (PVT) variation.
[Figure: circuit schematics of the two DC variants. Labels: cell array, VDD, VSSC, DRV, blk_sel = 0, /SLP, VT,M1X, transistors M1, M1X, M2, M3, inverter INV1, nodes N1, N2.]
Fig. 1 Original DC scheme in [4] (left) and modified DC scheme in standby
mode (right)
In the RCB scheme shown in Fig. 2, the voltage at node A1
is two PMOS thresholds below the supply and is transferred to
VSSC via P1’. Depending on PVT variation, either P1 or P2
sets the potential at VSSC during standby. The positive feature
of this approach is that the cell bias is maintained closer to
DRV level under PVT variation.
The two leakage reduction techniques are employed on a
1kx32b instance, split into eight 4k-bit blocks (64 rows by 64
columns). Each row in the block is driven by a common
wordline and all 64 cells within a column share the same
bitline pair. All cells in the block share a common NMOS
switch and a DRV-maintaining circuit, connected between the
source terminal of cell driver transistors (VSSC) and the real
ground. The NMOS switch is controlled by a block select
signal that is generated by a block decoder, allowing only one
block to be in the active (read/write) mode at any point in time.
Comparison of two SRAM matrix leakage
reduction techniques in 45nm technology
Khawar Sarfraz, Department of Electrical Engineering, Lahore University of Management Sciences (LUMS),
Opp. Sector U, DHA, Lahore 54792, Pakistan
Email: [email protected]
22nd International Conference on Microelectronics (ICM 2010)
978-1-4244-5816-5/09/$26.00 ©2009 IEEE
[Figure: circuit schematic. Labels: cell array, VDD, VSSC, DRV, blk_sel = 0, NMOS switch M1, PMOS devices P1, P2, P1', P2', nodes A1, A2, voltages VP1, VP2, replica (load), replica (driver), bias generator.]
Fig. 2 RCB scheme [4] in standby mode
III. THE NMOS SWITCH
The NMOS switch (M1 and M1X in Fig. 1 and M1 in Fig.
2) must satisfy two requirements. First, it must be wide enough
to ensure near-ground potential (20mV for this work) at VSSC
during read in order to maintain sufficiently high static noise
margin. And second, it must swiftly discharge VSSC to ground
when the memory is being brought out of standby mode [15]
since that time contributes to the access time of the memory
and hence determines the speed penalty associated with both
architectures. The switch size is determined by the more
important former requirement. For the selected size, Fig. 3
shows the voltage at VSSC during read for Fast (FNFP),
Nominal (NOM) and Slow (SNSP) processes at increased
supply voltage. The required size, together with the switching
logic, translates to 2.2% of total memory block gate area.
[Plot: VSSC voltage (V / mV, axis 14 to 20) versus temperature (Temp / deg C, axis -40 to 120). Curves: FNFP, SNSP, NOM, Target.]
Fig. 3 VSSC node voltage during read (1.1VDD)
IV. DATA RETENTION VOLTAGE MAINTAINING CIRCUITS
As soon as the block select signal is switched off, VSSC
node assumes a floating state and (primarily) sub-threshold
leakage of cells in the block begins to charge it. Consequently,
the voltage at VSSC begins to rise. It is imperative that this
voltage level not be allowed to rise beyond VDD-DRV under
PVT variation or else the cells would lose their data.
Threshold voltage is a function of VDS (DIBL effect) and
temperature, since the number of carriers in the channel area, in
sub- and above-threshold regions, is also temperature
dependent. Hence the DCT in the DC scheme as well as the
PMOS clamp devices (P1 and P2 in Fig. 2) in the RCB scheme
must be sized at 0.9VDD so that a DRV of 0.5V (VSSC ≤ 0.4V)
is ensured first and foremost at reduced supply voltage. Then, as
the supply voltage increases, the voltage at VSSC is expected
to rise as well, to at most 0.5V with a 1V supply and to at most
0.6V with a 1.1V supply (ensuring 0.5V DRV in each case).
Fig. 4 shows the voltage at VSSC plotted
as a function of temperature in standby mode, with the DCT
and PMOS clamp devices sized as per the theory above.
Curves are shown for Fast, Nominal, Slow, Fast NMOS Slow
PMOS (FNSP), and Slow NMOS Fast PMOS (SNFP)
processes at 1.1VDD. In four out of five processes (including
the Fast process), the RCB scheme is able to maintain a
lower cell bias. The results shown are also consistent with the
DCT theory presented in Section II.
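The sizing argument above rests on one piece of arithmetic: a cell retains its data only while VDD minus VSSC stays at or above DRV, so VSSC must not exceed VDD minus DRV. The short sketch below works this out for the three supply levels named in the text. The function names are illustrative and not part of the original work.

```python
# Worked check of the standby ceiling on VSSC: a cell keeps its data only
# while VDD - VSSC >= DRV, so VSSC must stay at or below VDD - DRV.

DRV = 0.5          # data retention voltage in volts (from the text)
VDD_NOMINAL = 1.0  # nominal supply in volts (from the text)


def vssc_ceiling(vdd: float, drv: float = DRV) -> float:
    """Return the highest VSSC (in volts) that still leaves DRV across the cell."""
    return round(vdd - drv, 3)


def retains_data(vdd: float, vssc: float, drv: float = DRV) -> bool:
    """True if the cell bias VDD - VSSC is at least the retention voltage."""
    return vssc <= vssc_ceiling(vdd, drv)


if __name__ == "__main__":
    for scale in (0.9, 1.0, 1.1):
        vdd = scale * VDD_NOMINAL
        print(f"VDD = {vdd:.1f} V -> VSSC ceiling = {vssc_ceiling(vdd):.1f} V")
```

Sizing the clamp at 0.9VDD targets the tightest of the three ceilings, which is why the text treats the reduced supply as the governing case.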
[Plot: VSSC voltage (V / mV, axis 320 to 440) versus temperature (Temp / deg C, axis -40 to 120). Legend: SNFP, SNSP, NOM, FNSP, FNFP; diode connected, replica cell biasing.]
Fig. 4 VSSC node voltage during standby (1.1VDD)
V. LAYOUT CONSIDERATIONS AND LEAKAGE REDUCTION
In Fig. 1, the proposed modification suggests that the DCT
must be implemented in layout as part of the larger NMOS
switch. The switching logic would be the only overhead then.
In Fig. 2, the bias generator together with the PMOS clamp
devices take up 0.9% of total memory block gate area. An
important layout consideration is to use copies of SRAM cell
driver transistor to realize the total width of the NMOS switch
and the DCT. Similarly, copies of the cell load transistor
should be used to realize the total width of the PMOS clamp
devices. That would ensure equal PVT variation amongst the
memory block, the switch and the DRV-maintaining circuit, as
suggested in [4] and [10]. It would also help prevent current
crowding, since the read current of the accessed row would
have multiple paths to flow into the NMOS switch thus
preventing localized overheating. Non-uniform flow of read
current from the memory block into the NMOS switch can
speed up the creation of voids.
It can be concluded from the results shown in Fig. 4 that the
RCB scheme is capable of maintaining a lower cell bias in
most leaking (ML: FNFP, T=130°C, 1.1VDD), Nominal
(NOM, T=25°C, VDD), and least leaking (LL: SNSP, T=-40°C,
0.9VDD) corners. Hence the RCB scheme achieves greater
reduction in block leakage current, as shown in Table I.
TABLE I
PERCENTAGE LEAKAGE REDUCTION
Corner                          RCB scheme   DC scheme
ML: T=130°C, 1.1VDD, FNFP       85.9 %       84.8 %
NOM: T=25°C, VDD, NOM           74.7 %       73.5 %
LL: T=-40°C, 0.9VDD, SNSP       73.6 %       72.4 %
VI. POWER-FREQUENCY TRADE-OFF
In both architectures, the power savings come from the
memory block due to reduction in block leakage current during
standby. In the DC scheme, power is consumed in driving the
NMOS switch (every time a memory block is accessed) and in
INV1 (Fig. 1) every time a block transitions from active to
standby mode. In the RCB scheme, power is consumed by the
bias generator circuit in addition to the NMOS switch. Fig. 5
shows the power savings of a 32kb instance together with the
power consumption of the switch and inverter INV1, plotted as
a function of memory access frequency for the DC scheme.
[Plot: power (Power / W, log axis 1.E-07 to 1.E-02) versus frequency (Frequency / Hz, log axis 1.E-01 to 1.E+09). Curve groups: memory instance power savings; switch and INV1 power consumption. Corners: ML: T=130°C, 1.1VDD, FNFP; NOM: T=25°C, VDD, NOM; LL: T=-40°C, 0.9VDD, SNSP. Points numbered 1, 2, 3.]
Fig. 5 Power-frequency trade-off for the DC scheme (32kb instance)
The power-frequency model used in the analysis here was
developed in [16]-[17]. The analysis is presented for an
architecture where consecutive reads/writes take place in
sequentially accessed blocks. The solid and the dotted lines
correspond to power savings and power consumption
respectively. For the memory instance, power savings are
lowest at high frequencies because sufficient time is not
available to VSSC node to float up to the eventual standby
potential. With decreasing memory access frequency, more
standby time is available to the memory blocks, which allows
leakage currents to reduce by a greater amount, leading to
greater power savings [17]. Below a certain critical frequency,
the plots become flat since maximum reduction in leakage
currents has been achieved. The graphs for the NMOS switch
and inverter INV1 are straight lines since the power consumed
increases with the rate at which the memory block is switched.
The point of intersection of the two corresponding curves
indicates the break-even frequency, at which the power savings
from the memory instance equal the power consumption in the
switch and the inverter (Fig. 1). Operating the memory at a
frequency beyond the point of intersection is not practical.
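The break-even frequency can be located numerically once a savings curve and a consumption curve are available. The sketch below is a toy illustration only: the curve shapes and every numeric constant are hypothetical placeholders chosen to mimic the qualitative behaviour described above (a savings plateau that falls off at high frequency, and switching power that grows with frequency). It is not the power-frequency model of [16]-[17].

```python
# Toy break-even search. The savings and consumption curves below are
# illustrative stand-ins, NOT the power-frequency model of [16]-[17].
import math
from typing import Callable


def break_even_frequency(
    savings: Callable[[float], float],
    consumption: Callable[[float], float],
    f_lo: float = 1e-1,
    f_hi: float = 1e9,
    iterations: int = 200,
) -> float:
    """Bisect (on a log-frequency axis) for the f where savings == consumption.

    Assumes savings exceed consumption at f_lo and fall below it at f_hi,
    with a single crossing in between.
    """
    if savings(f_lo) <= consumption(f_lo) or savings(f_hi) >= consumption(f_hi):
        raise ValueError("no single crossing bracketed by [f_lo, f_hi]")
    lo, hi = math.log10(f_lo), math.log10(f_hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f = 10.0 ** mid
        if savings(f) > consumption(f):
            lo = mid  # still a net saving: the crossing lies at higher f
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


# Hypothetical numbers, chosen only to make the example concrete.
P_SAVE_MAX = 1e-4  # plateau of leakage power savings, in watts
F_KNEE = 1e6       # frequency above which savings start to fall, in hertz
E_SWITCH = 1e-12   # energy per block switching event, in joules


def savings(f: float) -> float:
    # Flat at low frequency, shrinking once VSSC has too little time to float up.
    return P_SAVE_MAX / (1.0 + f / F_KNEE)


def consumption(f: float) -> float:
    # Switching power grows linearly with access frequency.
    return E_SWITCH * f


if __name__ == "__main__":
    f_be = break_even_frequency(savings, consumption)
    print(f"break-even frequency: {f_be:.3e} Hz")
```

Bisection on the logarithm of frequency is used because the curves span ten decades, matching the logarithmic axes of the power-frequency plots.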
Fig. 6 shows the power-frequency trade-off for the RCB
scheme. At low frequencies, the continuously on bias
generator circuits (one per memory block) dominate the power
expenditure, whereas at high frequencies the energy consumed
in switching the NMOS switch dominates.
[Plot: power (Power / W, log axis 1.E-07 to 1.E-02) versus frequency (Frequency / Hz, log axis 1.E-01 to 1.E+09). Curve groups: memory instance power savings; switch and bias generator power consumption. Corners: ML: T=130°C, 1.1VDD, FNFP; NOM: T=25°C, VDD, NOM; LL: T=-40°C, 0.9VDD, SNSP. Points numbered 1, 2, 3.]
Fig. 6 Power-frequency trade-off for the RCB scheme (32kb instance)
Table II lists the breakeven frequency (numbered dots in
Fig. 5 and Fig. 6) as well as reduction in maximum operating
frequency (compared to the 900 MHz reference SRAM). Because
the PMOS clamp devices in the RCB scheme add source
diffusion capacitance at VSSC, the NMOS
switch takes a little longer to discharge VSSC to ground when
switching a block from standby to active mode. The increase
in access time therefore results in lower maximum operating
frequency for the RCB scheme. It is clear from the data in
Table II that for a 32kb instance, both memory architectures
become slow, particularly in the Nominal and LL corners.
TABLE II
BREAKEVEN FREQUENCY AND PERCENTAGE REDUCTION IN MAXIMUM
OPERATING FREQUENCY
Scheme        ML (1)               NOM (2)              LL (3)
RCB scheme    527 MHz (34.5% ↓)    123 MHz (17.3% ↓)    60 MHz (18.2% ↓)
DC scheme     511 MHz (23.7% ↓)    120 MHz (16.0% ↓)    58 MHz (17.2% ↓)
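As a quick arithmetic check on Table II, the sketch below converts each percentage reduction into an absolute maximum operating frequency, taking the 900 MHz reference as the baseline, and places it next to the break-even frequency of the same corner. All numbers are taken from the text; the dictionary layout and function name are illustrative.

```python
# Reduced maximum operating frequency implied by Table II, set against the
# break-even frequency of the same corner. All figures come from the text.

F_REF_MHZ = 900.0  # maximum operating frequency of the reference SRAM

# (scheme, corner) -> (break-even frequency in MHz, % reduction in max frequency)
TABLE_II = {
    ("RCB", "ML"): (527.0, 34.5),
    ("RCB", "NOM"): (123.0, 17.3),
    ("RCB", "LL"): (60.0, 18.2),
    ("DC", "ML"): (511.0, 23.7),
    ("DC", "NOM"): (120.0, 16.0),
    ("DC", "LL"): (58.0, 17.2),
}


def reduced_f_max(reduction_percent: float, f_ref: float = F_REF_MHZ) -> float:
    """Maximum operating frequency (MHz) after the stated percentage reduction."""
    return round(f_ref * (1.0 - reduction_percent / 100.0), 1)


if __name__ == "__main__":
    for (scheme, corner), (f_break_even, reduction) in TABLE_II.items():
        f_max = reduced_f_max(reduction)
        print(f"{scheme:3s} {corner:3s}: f_max = {f_max:6.1f} MHz, "
              f"break-even = {f_break_even:5.1f} MHz")
```

Under this reading, every break-even frequency lies below the corresponding reduced maximum frequency, so a 32kb instance running at full speed would spend more power on switching than it saves in leakage.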
In order to improve on this scenario, we can increase the
size of the memory instance and keep the NMOS switch on
over multiple block accesses (representing cache block
transfer instead of word transfer). Fig. 7 illustrates the power-
frequency curves for the RCB scheme for a 64kb instance in
which the NMOS switch is kept on for 64 consecutive accesses
to a block. The power expenditure at lower frequencies is now
higher due to an increased number of bias generator circuits.
However the power consumption at high frequencies has now
been reduced, since the block is switched once every 64
memory accesses. It is clear that with these parameters, the
memory can be operated at maximum frequency in all corners
and power savings can still be achieved in comparison to the
reference architecture. Since leakage current reduces the most
in the ML corner, maximum power savings are also possible in
that corner. It can therefore be concluded that both memory
architectures are basically suited to high-temperature
applications.
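The two opposing effects described above can be sketched with a simple overhead model: switching power is divided by the number of consecutive accesses per block, while bias generator power scales with the number of blocks. The energy and bias-power constants are hypothetical placeholders, and the block count of 16 for the 64kb instance assumes the same 4k-bit block size as the 32kb instance, which the text does not state.

```python
# Sketch of how keeping the NMOS switch on for several consecutive accesses
# trades switching power against bias generator power (RCB scheme).
# Energy and bias-power values are hypothetical placeholders.

E_SWITCH_J = 1e-12          # assumed energy per block switching event, joules
P_BIAS_PER_BLOCK_W = 1e-7   # assumed static power of one bias generator, watts


def overhead_power(f_access_hz: float, accesses_per_switch: int, n_blocks: int) -> float:
    """Total overhead power in watts: amortized switching plus bias generators."""
    switching = E_SWITCH_J * f_access_hz / accesses_per_switch
    bias = n_blocks * P_BIAS_PER_BLOCK_W
    return switching + bias


if __name__ == "__main__":
    for f in (1.0, 1e6, 9e8):
        word = overhead_power(f, accesses_per_switch=1, n_blocks=8)     # 32kb case
        block = overhead_power(f, accesses_per_switch=64, n_blocks=16)  # 64kb case (assumed 16 blocks)
        print(f"f = {f:.0e} Hz: word transfer {word:.3e} W, block transfer {block:.3e} W")
```

With these placeholder values the block-transfer configuration costs more at low frequency, where the bias generators dominate, and less at high frequency, where switching dominates, which mirrors the qualitative behaviour reported for the 64kb instance.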
[Plot: power (Power / W, log axis 1.E-07 to 1.E-02) versus frequency (Frequency / Hz, log axis 1.E-01 to 1.E+09). Curve groups: memory instance power savings; switch and bias generator power consumption. Corners: ML: T=130°C, 1.1VDD, FNFP; NOM: T=25°C, VDD, NOM; LL: T=-40°C, 0.9VDD, SNSP.]
Fig. 7 Power-frequency trade-off for the RCB scheme (64kb instance)
VII. HIGH-LEVEL DESIGN ISSUES AND RESULTS COMPARISON
Dividing the memory instance into blocks helps reduce
power consumption both during the active and standby modes.
The relevance of the two schemes is shown to be limited to
large memory instances, with a minor change to the software
such that multiple accesses are made to the same block before
moving on to the next. This way, memory access time is
higher only for the first access, not for subsequent accesses to
the same block. Also, scheduling algorithms that rely on the
percentage of time that a cache is placed in standby would
need to be tailored to work optimally on the two schemes
compared in this work, since maximum power savings are not
achieved immediately upon entering standby mode.
A comparison of key results from this work is presented
with those in related work in Table III.
TABLE III
RESULTS COMPARISON (NOMINAL CORNER)
Scheme                          Technology   Area overhead (%)         Access time penalty (%)   Leakage reduction (%)
RCB scheme (this work)          45nm         2.3 + area of resistors   17.0                      74.7
DC scheme (this work)           45nm         2.2                       16.0                      73.5
RCB scheme [4]                  90nm         3.0                       7.0                       88.0
Gated-GND cache [11]            70nm         4.0                       5.0                       51.0
Diode footed cache [13]         70nm         7.0                       2.5                       65.8
DC scheme with resistor [12]    130nm        ~ 0                       -                         95.0
VIII. CONCLUSION
We have revealed the area-speed-power trade-off associated
with two prominent leakage-reducing schemes in 45nm
predictive technology. Results show that the RCB scheme
achieves more leakage reduction (85.9%) at the cost of greater
chip area (2.3% plus area of polysilicon resistors per memory
instance) with 34.5% reduction in speed under ML conditions.
In essence, both schemes can achieve significant reduction in
block leakage current and their suitability to high-temperature
applications is demonstrated.
REFERENCES
[1] Wang, Y. et al., “A 1.1GHz 12µA/Mb-Leakage SRAM Design in 65nm
Ultra-Low-Power CMOS with Integrated Leakage Reduction for
Mobile Applications,” Solid-State Circuits Conference ISSCC 2007, pp.
323-325.
[2] Kyeong-Sik Min, Kanda, K., Sakurai, T., “Row-by-row dynamic source-
line voltage control (RRDSV) scheme for two orders of magnitude
leakage current reduction of sub-1-V-VDD SRAM's,” Proceedings of the
2003 International Symposium on Low Power Electronics and Design,
pp. 66-71.
[3] Chung-Hsien Hua, Tung-Shuan Cheng, Wei Hwang, “Distributed data-
retention power gating techniques for column and row co-controlled
embedded SRAM,” IEEE International Workshop on Memory
Technology, Design, and Testing 2005, pp. 129-134.
[4] Takeyama, Y. et al., “A low leakage SRAM macro with replica cell
biasing scheme,” IEEE Journal of Solid-State Circuits 2006, Vol. 41,
Issue 4, pp. 815 – 822.
[5] Ya-Chun Lai, Shi-Yu Huang, “X-Calibration: A Technique for
Combating Excessive Bitline Leakage Current in Nanometer SRAM
Designs,” IEEE Journal of Solid-State Circuits 2008, Vol. 43, Issue 9,
pp. 1964-1971.
[6] M. Khellah et al., “A 256-kb dual-Vcc SRAM building block in 65-nm
CMOS process with actively clamped sleep transistor,” IEEE Journal
of Solid State Circuits 2007, Vol. 42, Issue 1, pp. 233-242.
[7] Fujita, K. et al., “Array architecture of floating body cell (FBC) with
quasi-shielded open bit line scheme for sub-40nm node,” IEEE
International SOI Conference 2008, pp. 31-32.
[8] Chua-Chin Wang, Ching-Li Lee, Wun-Ji Lin, “A 4-kb Low-Power
SRAM Design With Negative Word-Line Scheme,” IEEE Transactions
on Circuits and Systems 2007, Regular Papers, Vol 54, Issue 5, pp.
1069-1076.
[9] Mukhopadhyay, S., Keejong Kim, Mahmoodi, H., Roy, K., “Design of a
Process Variation Tolerant Self-Repairing SRAM for Yield
Enhancement in Nanoscaled CMOS,” IEEE Journal of Solid-State
Circuits 2007, Vol. 42, Issue 6, pp. 1370-1382.
[10] Ding-Ming Kwai, “Standby Current Reduction of Compilable SRAM
Using Sleep Transistor and Source Line Self Bias,” IEEE Asian Solid-
State Circuits Conference 2006, pp. 23-26.
[11] Agarwal, A., Li, H., Roy, K., “A Single-Vt Low-Leakage Gated-Ground
Cache for Deep Submicron,” IEEE Journal of Solid State Circuits
2003, Vol. 38, Issue 2, pp. 319-328.
[12] Masanao Yamaoka et al., “A 300-MHz 25-µA/Mb-Leakage On-Chip
SRAM module featuring process-variation immunity and low-leakage-
active mode for mobile-phone application processor,” IEEE Journal of
Solid State Circuits 2005, Vol. 40, Issue 1, pp. 186-194.
[13] Agarwal, A., Roy, K., “A Noise Tolerant Cache Design to Reduce Gate
and Sub-threshold Leakage in the Nanometer Regime,” Proceedings of
the 2003 International Symposium on Low Power Electronics and
Design, pp. 18-21.
[14] Predictive Technology Model (PTM). Available: http://ptm.asu.edu/
[15] Huang, S. et al., “A Novel SRAM Structure for Leakage Power
Suppression in 45nm Technology,” International Conference on
Communications, Circuits and Systems 2008, pp. 1070-1074.
[16] Jiang, H. et al., “Benefits and Costs of Power-Gating Technique,”
Proceedings of the 2005 International Conference on Computer
Design, pp. 559-566.
[17] Sarfraz, K., van der Meijs, N.P., Doorn, T.S., Salters, R.W., “SRAM
power reduction: An ultra-low power SRAM architecture in 45nm
technology,” Master Thesis 2009, TU Delft Institutional Repository.
Available:
http://repository.tudelft.nl/search/ir/?q=khawar+sarfraz&w=Publications&faculty=&department=&type=&year=