8
High-Performance SRAM in Nanoscale CMOS: Design Challenges and Techniques Ching-Te Chuang, Saibal Mukhopadhyay, Jae-Joon Kim, Keunwoo Kim, and Rahul Rao IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U. S. A. Phone: +1-914-945-3596, Fax: +1-914-945-1358, e-mail: [email protected] Abstract This paper reviews the design challenges and techniques of high-performance SRAM in the “End of Scaling” nanoscale CMOS technologies. The impacts of technology scaling, such as signal loss due to leakage, degradation of noise margin due to V T scatter caused by process variations and random dopant fluctuation, and long term reliability degradation such as NBTI, are addressed. Design directions and leakage/variation/degradation tolerant SRAM circuit techniques to mitigate various performance and reliability constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell isolation and strength preservation, thin cell layout, bit-line and word-line leakage mitigation, migration to large signal read-out, unclamped bit-line, dual-supply, dynamic Read/Write supply, floating power-line, header/footer power-gating structures, Read- and Write-assist circuits, leakage/variation detection and compensation techniques, word-line and bit-line pulsing schemes, gate leakage tolerant design, and NBTI tolerant design. Alternative cell structures, such as asymmetrical SRAM, 7T, and 8T SRAMs, which decouple the cell storage node from the Read-disturb and half-select disturb to improve the SNM are discussed. Finally, some design issues and opportunities in emerging technologies such as FD/SOI and multi-gate FinFET are illustrated. 1. Technology Scaling and SRAM Design Challenges High-performance SRAMs are crucial elements for high- performance microprocessor cache memory and SoC applications. Due to large number of transistor count, the use of the smallest transistors in the cells, and leakage and variation in sub-100 nm CMOS technologies, SRAMs have become a major yield limiter. SRAM is more vulnerable to process variations and V T mismatch than logic circuits since minimum size (or sub-groundrule, or groundrule “waivered”) devices are used in the cell, and there is no “averaging” effect as in logic data paths. As the cell size for 6T SRAM is scaled down to between 0.5 μm 2 to 0.75 μm 2 at 65 nm node, and below 0.5 μm 2 at 45 nm node (Fig. 1) [1], the local random device V T mismatch (V T scattering) (Fig. 2) due to discrete dopant effect (Fig. 3(a)) [2, 3] further exasperates the problem to impose a fundamental limit on the attainable cell size. At 50 nm L eff (90/65 nm node), there are only about 200 dopant atoms in the device channel. At 12 nm L eff (32/28 nm node), the number of dopant atoms in the device channel is expected to reduce to merely 2.6 [2]. The “Sigma” () of V T variation has remained relatively flat for technology scaling down to around 130 nm node, yet it is rising quickly starting at 90 nm and is expected to increase by a factor of 4 to 30 nm node (Fig. 3(b)) [3]. This rapid increase in V T variation, against the backdrop of lower supply voltage, has rendered the redundancy needed for repair of the projected array size at 45 nm impractical, and even prohibitive for designs below “5-Sigma”, as a “4-Sigma” design requires 33 fixes per Mb of cells (0.57 fixes per Mb of cell for “5-Sigma” design). While it is possible and circuit techniques have been developed to monitor, detect, and compensate for the global systematic die-to-die variations, it is much more difficult to develop similar schemes for local random variations. In addition to leakage and variation, NBTI (Negative Bias Temperature Instability) [4, 5] has recently emerged as a limiting factor in the scaling of PMOS. At high negative bias and elevated temperature, the PMOS V T gradually drifts to become more negative, thus reducing PMOS current drive and affecting cell stability, margin, and V min (minimum operating voltage) of SRAM. This long term V T drift and other gate oxide related V T degradation mechanisms must be accounted for over the life span of usage. Furthermore, with high- gate materials, PBTI (Positive Bias Temperature Instability) in NMOS must be considered, as high- gates exhibit significant charge trapping and thus V T shift in NMOS [6, 7]. 2. Cell Stability and Disturb The cell stability, Static Noise Margin (SNM), and V min of 6T SRAM are limited by leakage, variation, and supply voltage in the physical domain. In the design domain, the major limiting factors are the conflicting Read/Write requirements and cell disturbs. Read/Write Requirements: To facilitate Read and minimize Read-disturb (Fig. 4), it is desirable to have a strong pull-down NMOS and a small ratio (strength ratio of access pass-transistor NMOS to pull-down NMOS). However, in order to improve Writability (Write margin) and Write performance, it is desirable to have a strong access pass-transistor NMOS. Read Disturb: Conventional 6T SRAM is most unstable (worst SNM) in Read operation (Fig. 4), since the access pass-transistor NMOS and the pull-down NMOS form a voltage divider, and the cell “0” storage node rises during Read (Read-disturb voltage). If the Read-disturb voltage is larger than the “Trip” voltage of the other half-cell inverter, the cell can flip. As shown in Fig. 4, with technology scaling, the variations in the Read-disturb voltage (“Cell down-level”) and inverter Trip voltage (“Cell switch-point”) increase, causing overlap and thus failure starting at 90 nm node [8]. Half-Select Disturb: During a Read or Write operation, the half-selected cells on the same word-line are actually experiencing a Read operation, and thus disturb similar to Read-disturb described above (Fig. 5). 978-1-4244-1656-1/07/$25.00 ©2007 IEEE 4

High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

High-Performance SRAM in Nanoscale CMOS: Design Challenges and Techniques

Ching-Te Chuang, Saibal Mukhopadhyay, Jae-Joon Kim, Keunwoo Kim, and Rahul Rao IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U. S. A.

Phone: +1-914-945-3596, Fax: +1-914-945-1358, e-mail: [email protected]

Abstract This paper reviews the design challenges and techniques of high-performance SRAM in the “End of Scaling” nanoscale CMOS technologies. The impacts of technology scaling, such as signal loss due to leakage, degradation of noise margin due to VT scatter caused by process variations and random dopant fluctuation, and long term reliability degradation such as NBTI, are addressed. Design directions and leakage/variation/degradation tolerant SRAM circuit techniques to mitigate various performance and reliability constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell isolation and strength preservation, thin cell layout, bit-line and word-line leakage mitigation, migration to large signal read-out, unclamped bit-line, dual-supply, dynamic Read/Write supply, floating power-line, header/footer power-gating structures, Read- and Write-assist circuits, leakage/variation detection and compensation techniques, word-line and bit-line pulsing schemes, gate leakage tolerant design, and NBTI tolerant design. Alternative cell structures, such as asymmetrical SRAM, 7T, and 8T SRAMs, which decouple the cell storage node from the Read-disturb and half-select disturb to improve the SNM are discussed. Finally, some design issues and opportunities in emerging technologies such as FD/SOI and multi-gate FinFET are illustrated.

1. Technology Scaling and SRAM Design Challenges High-performance SRAMs are crucial elements for high-performance microprocessor cache memory and SoC applications. Due to large number of transistor count, the use of the smallest transistors in the cells, and leakage and variation in sub-100 nm CMOS technologies, SRAMs have become a major yield limiter. SRAM is more vulnerable to process variations and VT mismatch than logic circuits since minimum size (or sub-groundrule, or groundrule “waivered”) devices are used in the cell, and there is no “averaging” effect as in logic data paths. As the cell size for 6T SRAM is scaled down to between 0.5 μm2 to 0.75 μm2 at 65 nm node, and below 0.5 μm2 at 45 nm node (Fig. 1) [1], the local random device VT mismatch (VT scattering) (Fig. 2) due to discrete dopant effect (Fig. 3(a)) [2, 3] further exasperates the problem to impose a fundamental limit on the attainable cell size. At 50 nm Leff (90/65 nm node), there are only about 200 dopant atoms in the device channel. At 12 nm Leff (32/28 nm node), the number of dopant atoms in the device channel is expected to reduce to merely 2.6 [2]. The “Sigma” (�) of VT variation has remained relatively flat for technology scaling down to around 130 nm node, yet it is rising quickly starting at 90 nm and is expected to increase by a factor of 4 to 30 nm node (Fig. 3(b)) [3]. This rapid increase in VT variation, against the backdrop of lower supply voltage, has rendered

the redundancy needed for repair of the projected array size at 45 nm impractical, and even prohibitive for designs below “5-Sigma”, as a “4-Sigma” design requires 33 fixes per Mb of cells (0.57 fixes per Mb of cell for “5-Sigma” design).

While it is possible and circuit techniques have been developed to monitor, detect, and compensate for the global systematic die-to-die variations, it is much more difficult to develop similar schemes for local random variations.

In addition to leakage and variation, NBTI (Negative Bias Temperature Instability) [4, 5] has recently emerged as a limiting factor in the scaling of PMOS. At high negative bias and elevated temperature, the PMOS VT gradually drifts to become more negative, thus reducing PMOS current drive and affecting cell stability, margin, and Vmin (minimum operating voltage) of SRAM. This long term VT drift and other gate oxide related VT degradation mechanisms must be accounted for over the life span of usage. Furthermore, with high-� gate materials, PBTI (Positive Bias Temperature Instability) in NMOS must be considered, as high-� gates exhibit significant charge trapping and thus VT shift in NMOS [6, 7].

2. Cell Stability and Disturb

The cell stability, Static Noise Margin (SNM), and Vmin of 6T SRAM are limited by leakage, variation, and supply voltage in the physical domain. In the design domain, the major limiting factors are the conflicting Read/Write requirements and cell disturbs.

Read/Write Requirements: To facilitate Read and minimize Read-disturb (Fig. 4), it is desirable to have a strong pull-down NMOS and a small � ratio (strength ratio of access pass-transistor NMOS to pull-down NMOS). However, in order to improve Writability (Write margin) and Write performance, it is desirable to have a strong access pass-transistor NMOS.

Read Disturb: Conventional 6T SRAM is most unstable (worst SNM) in Read operation (Fig. 4), since the access pass-transistor NMOS and the pull-down NMOS form a voltage divider, and the cell “0” storage node rises during Read (Read-disturb voltage). If the Read-disturb voltage is larger than the “Trip” voltage of the other half-cell inverter, the cell can flip. As shown in Fig. 4, with technology scaling, the variations in the Read-disturb voltage (“Cell down-level”) and inverter Trip voltage (“Cell switch-point”) increase, causing overlap and thus failure starting at 90 nm node [8].

Half-Select Disturb: During a Read or Write operation, the half-selected cells on the same word-line are actually experiencing a Read operation, and thus disturb similar to Read-disturb described above (Fig. 5).

978-1-4244-1656-1/07/$25.00 ©2007 IEEE 4

Page 2: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

3. Circuit and Design Techniques Many circuit and design techniques have been developed to mitigate the leakage and variation, to improve the cell stability, noise margin, Vmin, and to extend the scalability of 6T SRAM. The most notable ones are:

Thin-Cell Layout: Starting around 90 nm node, thin-cell layout with uni-directional poly (Fig. 6) have become prevalent across the entire industry [9-14]. The “thin” vertical height reduces bit-line loads for performance and noise immunity, and the uni-directional poly improves the manufacturability and yield.

Hierarchical Bit-Lines with Short Local Bit-Lines: The short local bit-lines reduce the bit-line load, charge injection into the cell, leakage, and noise coupling, thus improving the performance, power, and noise margin. The global bit- lines are typically reset to “Low”, thus further reducing the power [15-17].

Large Signal Domino-Like Sensing: Large signal read-out schemes with full swings at local and global bit-lines have replaced the traditional small signal differential sensing schemes to ensure stability, combat variability, improve margin, and address reduced cell current problem. In most cases, single-ended large signal Read and differential Write are employed [16, 17].

Unclamped (Floating) or Weakly Clamped Local Bit-Lines: The local bit-lines are left unclamped (floating) or weakly clamped to suppress or reduce the leakage from the reset (precharge) devices during the standby mode or for any unselected sub-arrays during active mode. This is achieved by turning off the precharge devices in the case of unclamped bit-lines [16, 17]. In the case of weakly clamped bit-lines, the bit-line precharged voltage level is lowered by, for example, one VT drop by using NMOS precharge devices in source-follower configuration (as opposed to full rail precharge using PMOS in the clamped bit-line case). Unclamping of bit-lines also reduces the half-select disturb since the voltage level at these bit-lines drift lower due to the current and leakage.

Dual Supplies: The supply voltage for cells and performance critical peripheral circuits (VCS) is set at a level 150mV – 200mV higher than the logic supply level (VDD) to ensure cell stability and improve performance, while less-critical peripheral circuits stay at VDD to reduce power consumption [15-17]. Fig. 7 illustrates the voltage domains of the 6.0 GHz SRAM array for CELL Broadband EngineTM [17] in 65 nm PD/SOI technology. The cell, word-line, bit-line precharge signals are at VCS (1.0 V) to improve cell stability, Read performance , and allow writeability to track cell stability. Local bit-lines, evaluation, and global write lines are at VDD (0.8V) to reduce cell stress and lower power. This dual supply scheme has helped to improve Vmin from 0.875 V to 0.80 V.

Adaptive Read/Write Supply: The cell supply voltage can be adaptively/dynamically switched based on Read, Write, or Standby. Fig. 8 depicts a dynamic supply scheme using column based header switches [18]. High cell supply voltage is used for Read (VCC,Cell > VWL), while lower cell voltage (200 mV lower) is used for Write (VCC,Cell < VWL). The scheme improves noise margin for both Read and Write while minimizing leakage, and significantly reduces the

random single bit failure in a 3.0 GHz, 70 Mb SRAM in 65 nm technology.

Proper combination of adaptive Read/Write supply and cell body bias improves the Read and Write margin, and has been shown to enable 30% power reduction and 0.3 V Vmin for a 64 Kb low-power SoC SRAM in 90 nm CMOS technology [19].

One variation of adaptive Read/Write supply scheme is the “Floating Power-line Write” where the cell supply voltage for the selected column is switched off by column power switch, thus putting the cell supply node in a floating state. In this scheme, the Write current lowers the cell supply node for the selected column, thus facilitating Write, reducing Write power, and improving Vmin by 100 mV in 32Kb/512Kb SRAMs in 90 nm CMOS [20].

Capacitive coupling technique has also been used to bootstrap cell supply voltage to above VDD during access to achieve higher cell current and improved stability in a pico-Joule class, 1 GHz, 32 KByte x 64b DSP SRAM [21].

Power-Gating with Header/Footer: Power-gating structures are used to reduce the leakage in Sleep/Standby mode or Shut-off mode [21, 22]. Fig. 9 illustrates the scheme used in the 3.4 GHz, 16MB L3 cache in dual-core Xeon processor in 65 nm CMOS technology [22], where NMOS sleep transistor is used in the sub-arrays and PMOS power-gating is used in the periphery.

Read-Assist and Write-Assist Circuits: The variation detection and compensation technique is best illustrated in the scheme depicted in Fig. 10 [23]. In this scheme, a process and temperature variation tolerant Read-Assist Circuit (RAC) utilizes Replica Access Transistors (RATs) as variation sensing devices. The RATs have topologically the same layout structure as SRAM access transistors. The selected WL level inversely tracks strength of access transistors, thus compensating the access transistor strength variation.

Capacitive coupling technique has also been employed to facilitate Write. Fig. 11 shows a Write-Assist Circuit (W-AC) with divided array supply scheme [23]. In this scheme, the array supply and a dummy supply (at GND level in Standby or Read) in the selected column are put into floating state during Write. The array supply is then shorted to the dummy supply by an NMOS switch, causing the array supply to be capacitively coupled down to facilitate Write. The capacitive coupling effect can be enhanced by dividing the array supply line into short segments. The capacitive coupling effect provides a fast, large response, and there is no need for additional supply. The scheme enhances Write margin against increasing variation due to scaling, resulting in significantly higher yield even with a much smaller cell in a 4 x 256 Kb SRAM macros in 45 nm LSTP CMOS technology

Word-Line and Bit-Line Pulsing: Proper control of word-line pulse width and temporarily lower bit-line voltage are also effective techniques to improve cell stability [24]. The word-line pulse width must be short enough to limit the Read-disturb; while long enough to ensure adequate �VBL for sensing during Read, and long enough to maintain enough Write margin. The bit-line voltage is pulled-down by 100 – 300 mV just before word-line turns on to reduce

5

Page 3: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

Read-disturb during Read, and half-select disturb during Write.

Gate Leakage Tolerant Design: The gate leakage of large devices in word-line driver can be reduced by collapsing the voltage across the gate oxide. In [25], the voltage level at the gate of the word-line driver NMOS device is left floating, and discharged by junction leakage in standby, thus reducing gate leakage.

NBTI Tolerant Design: The bias dependence of NBTI can be exploited to minimize the impact on long term reliability. Fig. 12 shows a NBTI tolerant sense amplifier design for L1 cache and TLB in a 64b microprocessor in 130 nm technology [5]. In the original circuit (Fig. 12(a)), The VT shift over silicon lifetime due to NBTI causes VT mismatch of as high as 50 mV between pMOS M3 and M4, resulting in sense delay degradation of 42%. Noting that the NBTI induced VT shift depends strongly on the gate-source bias and temperature but barely on the drain voltage, the new circuit (Fig. 12(b)) replaces the cross-coupled M3/M4 by T3/T4, for which gates are commonly biased at about 40% VDD during sensing. As the gate bias is identical for T3 and T4, the VT mismatch is minimized. pMOS T1/T2 are added to speed up the output low-to-high transition. Their VT mismatch is not critical since they get activated after the critical part of the amplification. The new circuit improves the deteriorated sense delay by 22%.

4. 8T SRAMs 8T SRAM has recently emerged as a promising candidate to replace 6T SRAM in sub-32 nm technologies [26-29]. The 8T SRAM utilizes a cell schematically similar to cells commonly used in register files, by separating the Read word-line and Write word-line and adding 2 stacked NMOS for single-ended Read (Fig. 13). If offers the following advantages:

Read Disturb: By separating the Read word-line and Write word-line and isolating the cell storage node from Read current path, the Read disturb is completely eliminated.

Half-Select Disturb: Half-select disturb still exists during Write operation for 8T SRAM. The half-select disturb can be eliminated by:

(1) Array architecture approach: The array can be floor-planed such that all bits in a word are spatially adjacent, and thus requiring no column select. The constraint for this approach is that bits from different words can not be physically interleaved. This approach has been used in a 5.3 GHz, 0.41 V Vmin, 32Kb sub-array in 65 nm PD/SOI technology [28].

(2) Gated Write word-line signal (Byte Write): The local Write word-line select signal is gated, and “on” only for the selected block. This scheme has been implemented in a 6.6+ GHz, 0.4 V Vmin, dual-supply, 1.2 Mb SRAM in 65 nm PD/SOI technology [29].

(3) Write-back scheme: The Read word-line is activated even during Write, and all cell data in selected word-line are read out to D-latches. The Data-out is then written back to half-selected cells. The scheme has been used in a 0.42 V Vmin, 64 Mb SRAM in 90 nm CMOS technology [27].

Cell Size Scaling: At 90 nm node, the cell size of 8T SRAM is bout 20% larger than 6T SRAM cell. However, in 6T cell, the device width of cell pull-down NMOS cannot

be scaled down without degrading the Read performance and Read disturb, while that in the 8T cell can. At 45 nm node, the cell areas cross if the operating voltage is 1.0 V, and the area of the 8T cell becomes smaller at the 32 nm node. If the operating voltage is 0.8 V, the 6T cell area is saturated and cannot be shrunk beyond 45 nm due to the large width of the cell pull-down NMOS needed for Read performance and margin. Furthermore, in 8T cell, high voltage can be used for Read word-line to enhance Read performance without causing Read disturb; and high voltage can be used for Write word-line to facilitate Write and expand Write margin. These voltage control schemes further enable the scalability of 8T SRAM cell as shown in Fig. 13 [27]. 5. Asymmetrical SRAMs

The Read-disturb in 6T SRAM can be mitigated/eliminated by either cutting off the inverter feedback path or skewing one of the cell inverter to have a higher Trip voltage, while employing single-ended Read with the other cell inverter.

7T Asymmetrical SRAM: Fig. 14(a) shows the schematics of a 7T asymmetrical SRAM cell [30]. In this scheme, the Read word-line is separated from the Write word-line, and single-ended Read is employed. An extra NMOS is inserted in the left-side cell inverter, which cuts off the feedback loop during Read, thus eliminating the failure due to Read-disturb. The scheme has been implemented in a 64 Kb SRAM in 90 nm CMOS technology and demonstrated 20 ns access at 0.5V. A Vmin of 0.44 V (determined by Write SNM) has been achieved with 9% area overhead.

6T Asymmetrical SRAM: Fig. 14(b) shows the schematics of a 6T asymmetrical SRAM cell [31]. Again, the Read word-line is separated from the Write word-line, and single-ended Read is employed. The left-side cell pull-down NMOS is weakened, thus increasing the Trip voltage of the left-side half-cell inverter to improve the Read margin. Weakening the left-side pull-down NMOS by sizing down the device width alone is limited by the minimum width in a given technology, and normally does not provide sufficient improvement in Read SNM. High-VT device, thick-oxide device, or mid-gap gate material (which increases VT) can be used in combination with device down-sizing to further improve the Read margin [31]. 6. Design Considerations for Emerging Devices

In this section, we discuss the design issues/considerations and some novel schemes exploiting the unique features of emerging devices such as fully-depleted SOI (FD/SOI) devices and multi-gate/FinFET devices.

6.1. Fully-Depleted SOI (FD/SOI) Devices

Due to better short-channel control, very low junction capacitance, and the elimination of hysteric VT variation, Ultra-Thin Body Fully-Depleted SOI (UTB FD/SOI) device has emerged as a possible candidate for sub-65 nm technologies.

Random Dopant Fluctuation (RDF) Effect: The local VT variation due to RDF is expected to be more serious in FD/SOI devices compared with bulk-CMOS or PD/SOI devices. This is primarily due to the stronger dependence of bulk change (QB) on the body doping density (Nb) in FD/SOI device (QB � Nb) compared with bulk-CMOS or PD/SOI device (QB � �Nb). The RDF effect can be reduced

6

Page 4: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

by lowering the body-doping. However, reduction of body-doping increases the leakage current. Proper tuning of back-gate bias or gate workfunction, combined with body-doping reduction, has been shown to be very effective in achieving a given leakage target for robust SRAM applications [32, 33].

Asymmetrical VT Shift: Fig. 15 shows the distribution of I-V characteristics due to RDF effect for PD/SOI and FD/SOI devices. For FD/SOI device, the mean VT is lower than the nominal VT expected from continuous doping case. This asymmetry in VT shift causes a large spread in leakage, and degrades the performance and margin as the silicon film thickness is reduced (Fig. 16). This asymmetrical VT shift originates from the penetration of drain fringing electric field underneath the channel, which affects the barrier at the source for carrier injection and degrades the Short Channel Effect (SCE). The source carrier injection (hence channel current) depends exponentially on the barrier height, and thus highly non-linear and highly asymmetrical. This effect can be mitigated by using thin Buried Oxide (BOX) structure, which reduces the penetration of drain fringing electric field, thus reducing the asymmetry and leakage spread, at small expense of performance and Write margin [32, 33].

6.2. Double-Gate/FinFET Devices

Quasi-planar multi-gate devices, such as FinFET, hold the promise to extend scaling beyond conventional planar CMOS technologies. The device structure imposes some constraints while, at the same time, offers unique features that can be exploited for SRAM applications [34-37].

Fin Height (Device Width) Quantization: The quantization of device width to integer multiple of fin height constrains the opportunity of cell transistor sizing for stability optimization in FinFET SRAM.

Surface Orientation and Mobility: In FinFET devices, different surface orientation can be realized by rotating the fin in layout (Fig. 17 and 18(b)). The orientation dependence of mobility can be exploited to optimize the �-ratio of cell transistors to improve the performance and margin. As shown in Fig. 18(a), with optimized orientation, the Read SNM can be improved by 23-35%, and access time by ~22-33%, at expense of containable Write margin loss and area [34].

Independent Gate Feedback Schemes: The independent-gate capability in double-gate devices can be exploited to improve performance and margin. Fig. 19(a) shows a 6T SRAM cell with “Yin-Yang” feedback with double-gate devices [35]. The front-gate and back-gate of the cell pull-down NMOS devices are tied together. As such, the “On” NMOS becomes stronger, and the “Off” NMOS becomes weaker, thus reducing the Read-disturb to improve the SNM. An improved version of the “Yin-Yang” feedback cell is shown in Fig. 19(b) [36], where the back-gates of the access pass-transistor NMOS’s are connected to the storage nodes. This scheme has the same Read characteristic as Yin-Yang feedback cell, yet improved Write performance and margin as the cell “High” side access pass-transistor becomes stronger.

Back-Gated Asymmetrical SRAM: The back-gating capability of asymmetrical double-gate device can be exploited for asymmetrical SRAM applications. Fig. 20(a)

illustrates an example where a down-sized front-gate is used for the left-side pull-down NMOS while the back-gate is biased to GND. The scheme offers 1.9X SNM improvement over symmetrical cell and 1.5X improvement over asymmetrical cell with device sizing only (Fig. 20(b)) [37].

7. Conclusions

We have discussed CMOS technology scaling and resulting leakage/variation/reliability design challenges for high-performance SRAMs. Cell stability, Read/Write requirements, Read-disturb, and Half-select disturb are shown to impose constraints on performance, margin and cell scalability. Examples of circuit techniques to mitigate various power, performance, and reliability constraints are given. Alternative cell structures, such as 8T SRAM and asymmetrical SRAM, and their merits are discussed. Design considerations for emerging devices, such as FD/SOI and double-gate/FinFET are addressed, and novel circuit schemes exploiting unique features of emerging devices are discussed.

Acknowledgement

The authors were partially supported by DARPA Contract HR0011-07-9-0002.

References [1] http://www.itrs.net/ [2] D. J. Frank et al., “Monte Carlo Modeling of Threshold Variation due to Dopant Fluctuations,” Digest of Tech. Papers, Symp. VLSI Technology, 1999, pp. 169-170. [3] S. Samaan, “The Impact of Device Parameter Variations on the Frequency and Performance of Microprocessor Circuits,” ISSCC Microprocessor Circuit Design Forum on Managing Variability on sub-100nm Designs, Feb. 19, 2004. [4] J.C. Lin, A.S. Oates, H.C. Tseng, Y.P. Liao, T.H. Chung, K.C. Huang, P. Y. Tong, S.H. Yau, and Y.F. Wang, “Prediction and Control of NBTI – Induced SRAM Vccmin Drift,” Tech. Digest, IEDM, pp. 345-348. [5] T. Takayanagi et al., “A Dual-Core 64b UltraSPARC Microprocessor for Dense Server Applications,” Digest of Tech. Papers, ISSCC, 2004, pp. 58-59. [6] S. Zafar, et. al, "A Comparative Study of NBTI and PBTI (Charge Trapping) in SiO2/HfO2 Stacks with FUSI, TiN, Re Gates," Dig. Tech. Papers, Symp. VLSI Technology, 2006, pp. 30-31. [7] J. C. Lin, A.S. Oates, and C.H. Yu, “Time Dependent Vccmin Degradation of SRAM Fabricated with High-k Gate Dielectrics,” Proc. International Reliability Physics Symposium, 2007, pp. 439-444. [8] H. Pilo, “SRAM Design in the Nanoscale Era,” Digest of Tech. Papers, ISSCC, SE5, 2005, pp. 366-367. [9] M. Iwai et al., “45nm CMOS Platform Technology (CMOS6) with High Density Embedded Memories,” Digest of Tech. Papers, Symp. VLSI Technology, 2004, pp. 12-13. [10] F. L. Yang et al., “45nm Node Planar-SOI Technology with 0.296μm2 6T-SRAM Cell,” Digest of Tech. Papers, Symp. VLSI Technology, 2004, pp. 8-9. [11] F. Arnaud et al., “Low Cost 65nm CMOS Platform for Low Power & General Purpose Applications,” Digest of Tech. Papers, Symp. VLSI Technology, 2004, pp. 10-11. [12] S. Zhao, et al., Transistor Optimization for Leakage Power Management in a 65nm CMOS Technology for Wireless and Mobile Applications, Digest of Tech. Papers, Symp. VLSI Technology, 2004, pp. 14-15. [13] K. Zhang et al., “A 3-GHz 70Mb SRAM in 65nm CMOS Technology with Integrated Column-Based Dynamic Power Supply,” Digest of Tech. Papers, ISSCC, 2005, pp. 474-475.

7

Page 5: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

[14] E. Leobandung et al., “High Performance 65 nm SOI Technology with Dual Stress Liner and low capacitance SRAM cell,” Digest of Tech. Papers, Symp. VLSI Technology, 2005, pp. 126-127. [15] M. Khellah et al., “A 4.2GHz 0.3mm2 256kb Dual-Vcc SRAM Building Block in 65nm CMOS,” Digest of Tech. Papers, ISSCC, 2006, pp. 624-625. [16] J. Davis et al., “A 5.6GHz 64kB Dual-Read Data Cache for the POWER6TM Processor,” Digest of Tech. Papers, ISSCC, 2006, pp. 622-623. [17] J. Pille et al., “Implementation of the CELL Broadband EngineTM in a 65nm SOI Technology Featuring Dual-Supply SRAM Arrays Supporting 6GHz at 1.3V,” Digest of Tech. Papers, ISSCC, 2007, pp. 322-323. [18] K. Zhang et al., “A 3-GHz 70Mb SRAM in 65nm CMOS Technology with Integrated Column-Based Dynamic Power Supply,” Digest of Tech. Papers, ISSCC, 2005, pp. 474-475. [19] Y. Morita et al., “A Vth-Variation-Tolerant SRAM with 0.3-V Minimum Operation Voltage for Memory-Rich SoC under DVS Environment,” Dig. Tech. Papers, Symp. VLSI Circuits, 2006, pp. 16-17. [20] M. Yamaoka et al., “Low-Power Embedded SRAM Modules with Expanded Margins for Writing,” Digest of Tech. Papers, ISSCC, 2005, pp. 480-481. [21] Azeeze et al., “A Pico-Joule Class, 1 GHz, 32 KByte x 64b DSP SRAM with Self Reverse Bias,” Digest of Tech. Papers, Symp. VLSI Circuits, 2003, pp. 251-252. [22] S. Rusu et al., “A Dual-Core Multi-Threaded Xeon® Processor with 16MB L3 Cache,” Digest of Tech. Papers, ISSCC, 2006, pp. 102-103. [23] M. Yabuuch et al., “A 45nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations,” Digest of Tech. Papers, ISSCC, 2007, pp. 326-327. [24] M. Khellah, Y. Ye, N. S. Kim, D. Somasekhar, G. Pandya, A. Farhang, K. Zhang, C. Webb, V. De, “Wordline and Bitline Pulsing Schemes for Improving SRAM Cell Stability in Low-Vcc 65nm CMOS Designs,” Dig. Tech. Papers, Symp. VLSI Circuits, 2006, pp. 12-13. [25] K. Nii et al., “A 90 nm Low Power 32K-Byte Embedded SRAM with Gate Leakage Suppression Circuit for Mobile Applications,” Digest of Tech. Papers, Symp. VLSI Circuits, 2003, pp. 247-250. [26] L. Chang et al., “Stable SRAM Cell Design for the 32 nm Node and Beyond,” Digest of Tech. Papers, Symp VLSI Technology, 2005, pp. 128-129. [27] Y. Morita et al., “An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment,” Digest of Tech. Papers, Symp. VLSI Circuits, 2007, pp. 256-257. [28] L. Chang et al., “A 5.3GHz 8T-SRAM with Operation Down to 0.41V in 65nm CMOS,” Digest of Tech. Papers, Symp. VLSI Circuits, 2007, pp. 252-253. [29] R. Joshi et al., “6.6+ GHz Low Vmin, read and half select disturb-free 1.2 Mb SRAM,” Digest of Tech. Papers, Symp. VLSI Circuits, 2007, pp. 250-251. [30] K. Takeda et al., “A Read-Static-Noise-Margin-Free SRAM Cell for Low-Vdd and High-Speed Applications,” Digest of Tech. Papers, ISSCC, 2005, pp. 478-479. [31] Keunwoo Kim, Jae-Joon Kim, and Ching-Te Chuang, “Asymmetrical SRAM Cells with Enhanced Read and Write Margins,” Proc. 2007 International Symp. on VLSI Technology, Systems, and Applications (VLSI-TSA), Hsinchu, Taiwan, April 23-25, 2007, pp. 162-163. [32] S. Mukhopadhyay, K. Kim, X. Wang, D. J. Frank, P. Oldiges, C. T. Chuang., and K.. Roy, “Optimal UTB FD/SOI Device Structure using Thin-BOX for sub-50-nm SRAM Design,” IEEE Electron Device Letters, vol. 27, no. 4, April 2006, pp. 284-287. [33] Saibal Mukhopadhyay, Keunwoo Kim, and Ching-Te Chuang, “Analysis and Optimization of Thin-BOX FD/SOI Devices for Low-Power and Stable SRAM in sub-50nm Technologies," Proc. 2007 International Symposium on Low

Power Electronics and Design (ISLPED2007), Portland, Oregon , August 27-29, 2007, pp. 20-25. [34] S. Gangwal et al, “Optimization of Surface Orientation for High-Performance, Low-Power and Robust FinFET SRAM,” Proc. IEEE Custom Integrated Circuits Conf. (CICC), 2006, pp. 433-436. [35] M. Yamaoka et al., “Low Power SRAM Menu for SOC Application Using Yin-Yang-Feedback Memory Cell Technology,” Digest of Tech. Papers, Symp. VLSI Circuits, 2004, pp. 288-291. [36] Z. Guo et al., “FinFET-Based SRAM Design”, Proc. International Symp. on Low Power Electronics and Design (ISLPED), 2005, pp.2-7. [37] J. J. Kim, K. Kim, and C. T. Chuang, “Independent-Gate Controlled Asymmetrical SRAM Cells in Double-Gate MOSFET Technology for Improved READ Stability,” Proc. European Solid-State Circuits Conference (ESSCIRC), 2006, pp. 74-77.

Fig. 1. 2005 ITRS Product Function Size Trends [1].

Fig. 2. Global systematic and local random variation in device threshold voltage degrade yield of SRAM design.

(a) (b)

Fig. 3. (a) Number of dopant atoms in the channel as function of effect channel length [2], and (b) relative “Sigma” of VT variation as function of technology node [3].

p0 p1 p2

pn

pi

p0 p1 p2

pn

pi

p0

p1pn

p2

pi

p0

p1pn

p2

pi

VDD

BL BR

WL

δvtj

δvti

VDD

BL BR

WL

VDD

BL BR

WL

δvtj

δvti

δδδδVti δδδδVtjδδδδVti δδδδVtj

Global systematic variation, die-to-die Local device mismatch

8

Page 6: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

Fig. 4. Scaling and cell stability margin of 6T SRAM [8]. Fig. 5. Half-select disturb (illustration for Write operation).

Fig. 6. “Thin Cell” layout to improve performance, noise immunity, manufacturability, and yield [9-14].

Fig. 7. Example of SRAM with dual-supply, hierarchical short local bitline, large signal single-ended “ripple-domino” Read, and differential Write [17].

Fig. 8. Column based header switch dynamic supply scheme to optimize Read/Write performance [18].

Fig. 9. Power-gating with footer switch [22].

Iread

BL BLBR BR

WL0

WL1

COL0 COL1

‘1’ ‘1’ ‘1’ ‘0’

‘0’

‘1’

Selected for WRITE

Disturb failure can occur

BL BLBR BR

WL0

WL1

COL0 COL1

‘1’ ‘1’ ‘1’ ‘0’

‘0’

‘1’

Selected for WRITE

Disturb failure can occur

9

Page 7: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

Fig. 10. Process and temperature variation tolerant Read-Assist Circuits (RAC) [23].

Fig. 11. Capacitive Write-Assist Circuit (WAC) with divided array supply scheme [23].

Fig. 12. NBTI tolerant TLB sense amplifier circuit [5].

Fig. 13. Cell size scaling of 6T and 8T SRAM [27].

(a) (b) Fig. 14. (a) Asymmetrical 7T SRAM cell [30], and (b) asymmetrical 6T SRAM cell [31].

(a) (b) Fig. 15. Random Dopant Fluctuation (RDF) effect in (a) PD/SOI, and (b) FD/SOI device [32, 33].

R/WWLWWL

WBL R/WBL

ARALPL PR

NRQ

NL

Qb

Make the two sides unbalanced

Trip voltage of PL-NL pair goes up

R/WWLWWL

WBL R/WBL

ARALPL PR

NRQ

NL

Qb

Make the two sides unbalanced

R/WWLWWL

WBL R/WBL

ARALPL PR

NRQ

NL

Qb

Make the two sides unbalanced

Trip voltage of PL-NL pair goes up

bulk BODY dm BODYQ N W Nα αα αα αα α bulk BODY si BODYQ N T Nα αα αα αα α

10

Page 8: High-Performance SRAM in Nanoscale CMOS: Design Challenges ... · constraints in conventional planar CMOS technology are discussed. Examples are given and merits discussed for cell

(a) (b)

Fig. 16. SRAM in FD/SOI: (a) leakage vs. Si film thickness, and (b) statistical distribution of Read-disturb voltage and cell inverter Trip voltage due to RDF effect [32, 33].

Fig. 17. Modification of surface orientation in FinFET [34]. (a) (b) Fig. 18. Optimized multi-oriented FinFET 6T SRAM cells: (a) Access Time vs. Read SNM, and (b) (100) PUP, (110) AX, (100) PD on (a) (100) wafer [34].

(a) (b) Fig. 19. Double-gate SRAM cells: (a) Yin-Yang feedback cell [35], and (b) improved Yin-Yang feedback cell [36]. (a) (b) Fig. 20. (a) Back-gated asymmetrical 6T SRAM cell in asymmetrical double-gate technology, and (b) Read SNM compared with symmetric 6T SRAM cell [37].

Qb=VDD Q=GND

XRXLPL PR

NL NR

Qb=VDD Q=GND

XRXLPL PR

NL NR

VDD Q=GND

XRXLPL PR

NL NR

front feedback(yang)

back feedback(yin)

VDD Q=GND

XRXLPL PR

NL NR

front feedback(yang)

back feedback(yin)

R/WWLWWL

WBL R/WBL

ARALPL PR

NRQ

NL (FG)

Qb

R/WWLWWL

WBL R/WBL

ARALPL PR

NRQ

NL (FG)

Qb

0.0 0.2 0.4 0.6 0.8 1.0Q (V)

0.0

0.2

0.4

0.6

0.8

1.0

Qb

(V)

300 mV270 mV

320 mV

300 mV

160 mV

0.0 0.2 0.4 0.6 0.8 1.0Q (V)

0.0

0.2

0.4

0.6

0.8

1.0

Qb

(V) Asym. w/ NL=0.25um

Sym. 6T

Asym. w/ NL=0.1um

0.0 0.2 0.4 0.6 0.8 1.0Q (V)

0.0

0.2

0.4

0.6

0.8

1.0

Qb

(V)

300 mV270 mV

320 mV

300 mV

160 mV

0.0 0.2 0.4 0.6 0.8 1.0Q (V)

0.0

0.2

0.4

0.6

0.8

1.0

Qb

(V) Asym. w/ NL=0.25um

Sym. 6T

Asym. w/ NL=0.1um

11