[IEEE TENCON 2009 - 2009 IEEE Region 10 Conference - Singapore (2009.01.23-2009.01.26)] TENCON 2009 - 2009 IEEE Region 10 Conference - Rhythmic codebook of 300mV precharge, 1ns, low

978-1-4244-4547-9/09/$26.00 ©2009 IEEE TENCON 2009

le

Rhythmic Codebook of 300mV Precharge, 1ns, Low Power SRAM in Vector Quantizers

Mukul Sutaone Electronics and Telecommunication

College of Engineering, Pune Pune, India

[email protected]

Madan Mali Electronics and Telecommunication

College of Engineering, Pune Pune, India

[email protected]

Abstract—The effective design of semiconductor memory pertaining to the power consumption, speed and area penalty has always been the crucial task in embedded computing applications. The work presented in this paper is exact and innovative mathematical model based implementation of 32kb SRAM optimized for power and speed. The model has been developed for a cell, array, and pre-charge, I/Os and periphery devices for their exact behavior and then effective design is obtained by running the model through computing engine. The supply and pre-charge to an array of SRAM are swept and optimized combination is found out for minimum power dissipation and highest achievable access time. The SRAM array rows are controlled by the Gating Transistor Power Saving Technique (GTPST). Redundant columns have been found to make the memory fault tolerant. Similarly the the bitline passive leakage sensing and compensation scheme also has been presented. The experimental result shows 0.25μW dissipation at VDD of 620mV and pre-charge of 300mV. The minimum attainable bit line swing is 200μV/ns at VDD of 620mV and pre-charge of 500mV, both of which are state-of-art of its kind. The power saving of 13% is reported. The design by mathematical model, schematic and layout of 32Kb memory chip and simulation are carried out for development of codebook memory that finds application in embedded signal processing.

I. INTRODUCTION HE effective and optimized modeling and implementation of SRAM is always been the challenging task when power dissipation constraints are crucial with low access time.

The need is becoming vital specifically in embedded signal and image processing applications where the size of memory required is large with minimum signal processing time. Development of code book memory in vector quantizers is one of the intense applications. According to Semiconductor Industry Association and ITRS 2001, the relative silicon area occupied by embedded memories will approach 94% by 2014 [1]. Such memories should be immune by fault tolerant mechanism and reliable design [2]. The codebooks implemented by SRAM will be power efficient, robust and have capability to integrate in logic process [3, 4].

Figure 1. Structure of SRAM cell with GTPST Scheme Fig.1. shows the proposed SRAM cell with GTPST structure which consists of cross coupled inverters along with the access transistors. The dominant power dissipation in SRAM takes place due to charging and discharging of huge capacitive loads along the long bit lines. Sense amplifying memory cell giving 90% power saving has been developed [3]. Similarly an attempt is made to develop dual VCC SRAM building block with actively clamped sleep transistors [5]. Low voltage operation of SRAM is vital in embedded memory applications. Low voltage caches using 8T-SRAM cell have been developed [6]. Making SRAM fault tolerant is important to improve reliability. A model for power consumption of fault tolerant busses is presented in [7]. The innovative efforts are taken to minimize the bitline leakages in the bitline leakage compensation scheme for low voltage SRAMs [14]. Since the fast operation needs low threshold and low power requires high high threshold, the technique developed by these authors helps to reduce power drastically. Similarly low power based charge recycling has been tried at 1.5V [15]. The ultra low voltage computing 480 Kb SRAM at 0.2 V has been experimented [16]. The generalized techniques to reduce the power consumption have been surveyed [17]. The innovative math model based design of SRAM cell, array, I/Os, GTPST power saving scheme with redundancy based fault tolerant mechanism is implemented and proposed in this paper. Bit line

T

1

voltage swing is the major source of dissipation in cell. This swing is minimized by effective design of sense amplifier and optimized pre-charge. The distributed capacitive loading on bit lines is tackled by synchronized switching of the logical elements in an array. GTPST technique is introduced for least dissipation in each row. The reduction in dissipation and access time has become possible due to the finite and incremental model accuracy.

The organization of the paper is as follows. Section II describes the math model comprising parasitics of active and passive devices. The efficient power saving technique is explored in Section III. The redundancy based fault tolerant mechanism is explained in Section IV. Section V and VI are results and conclusion respectively.

II. MATHEMATICAL MODEL OF AN ARRAY

The model has been prepared for a cell to result in the capabilities of write, read and pre-charge. The technological MOS parameters such as device length, widths, parasitics, timings, loads etc are the inputs to the model. The individual model is simulated and meets the parameter values. The brief generalized expressions are stated here. The overall bit line capacitive load is given by:

( )( )

wbl

l

ioniop

oxtntp

ox

M

apcox

blw CdC

WW

CWW

LCCLWC

C +

⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

+

++

++

= ∫

∫

0

0

32

32

(1)

where l is Length of the bit line of an array, Cox is Silicon

oxide capacitance per unit area, W is Widths of the pre-charge, transmission and I/O devices, Cwbl is overall bit line wire capacitance, M is Number of cells in an array and L is the Channel length of device. The write access time is given by

( )[ ] blwblnaw CdRRRRT ∫ ++= (2)

where R is the Access, pull down and bit line device

resistance. Similarly energy consumed per cycle is

20

232

dd

M

aox

wc VdCLWC

E∫ ⎥⎦

⎤⎢⎣⎡

= (3)

And the area consumed on chip is

( )[ ]LdWWWWA apn∫ ++= (4)

The transistor dimensions, bit line, other wire pitch and

lengths, loads on the nodes are computed by the detail mathematical model and the corresponding technology model

Figure 2. Single column of memory array files. The resulting area, power and timings for each cell as well as array meet the estimated values. An individual one bit cell and an overall array are shown in fig. 2. The transmission logic controls direction of data flow while writing, reading, pre-charging and amplifying controlled by sense amplifier. The mathematical model developed is unique in its sense and has not been reported in standard publications.

In-Out Bidirectional Data Logic

Data

Transmission logic

2

Additionally, this model works precisely when the read operations are repetitive and pertaining to the specified group of locations as in case of systematic code book search in image compression done by vector quantizers.

III. EFFICIENT POWER SAVING TECHNIQUE Since the major source of the dissipation in the SRAM array

is charge and discharge of the huge capacitive loads along the long bit lines, the bitline leakages are taken care by the compensator. The detail schematic is shown in fig.3. The MOSFET M6, M7 pump the charge in the bit lines which is lossed while read operation. The amount of charge lossed while read is sensed first and then pumped as the positive feedback. It is found in the simulation that the charge recovery is far better than that of the array without compensation. Similarly The gating transistors along with the cells reduce the power consumption too. This compensator has been experimented with it too. As mentioned in the math model, these capacitive loads are contributed by active devices along the entire array, vertically as well as horizontally routed bit and word lines. Since the code book search in vector quantizers is rhythmic and systematic, the grouping of the cells and predicting the dissipation during search becomes vital. A Gating NMOS transistor is attached to a group of 8 cells named as Gating Transistor Power Saving Technique (GTPST). This transistor is enabled during read cycle but not while write. The dominant energy saving, hence, takes place during write cycle. The device geometry of this transistor is chosen so that the speed penalty is minimum. The logic recovery takes place in individual cell when the access transistors are ON. Since the read operation iterations happen in large number in code book memory, more focus is given on pre-charge levels of bit lines and voltage swing along the bit lines. Gating transistor is not attached to every cell to avoid large area penalty.

Vertically routed bit lines are always hungry between the successive read cycles due to their huge capacitive loads. These capacitors are pre-charged at finite voltage levels of Vddpc. This pre-charge voltage is swept from 300mV to 2.5V in order to find minimum bit line swing and its effect on the access time is recorded.

IV. REDUNDANCY BASED FAULT TOLERANT MECHANISM

Since the embedded memories occupy larger area on the chip; it is becoming very essential to make the memory fault tolerant. To achieve fault tolerance, redundant elements are incorporated into the memory. These elements are used to replace the faulty rows and columns in the memory. As compared to 1D redundancy, 2D is efficient. The redundancy technique can be made more efficient by using the external repair most algorithm [2]. The redundant rows and columns are incorporated into the memory array. When a faulty row or column is detected, it is recorded and a redundant row or a redundant column is used to replace it using the reconfiguration mechanism. In order to improve the efficiency of repairing embedded memories all of the memory rows and columns are divided into number of blocks. To each block,

redundant elements are assigned. In this technique, the redundant elements do not increase area overhead more than 0.10%. This redundancy structure results in higher repair rates.

V. RESULTS Experimentation is carried out using tools such as SPICE

for Schematic and Simulation, whereas MAGIC is used for Lay-out. Fig 2 shows single column of the array. It consists of the pre-charge, SRAM cell, transmission gate, sense amplifier and the IOs. Pre-charge circuit is used to pre-charge the bit lines for read operation. The sense amplifier is used to sense the difference on the bit lines and to amplify it.

On the Input Output circuits, the data to be written can be applied as well as the data after the read operation can be made available. Fig. 4 shows the simulation results of the complete column structure. In the first cycle the data is written on to the cells and then for consecutive 4 cycles the data is read from the cells. On the X-axis, time in ns and on Y-axis, voltage is plotted. Fig 4 shows the layout of a cell. It consists of the access transistors to which the word line signal is applied. On the upper side load transistors are present and driver transistors are available with bit lines.

Figure 3. Compensator for the Bitlines

3

Figure 4. Simulation waveforms of the column structure

Figure 5. The simulation results of memory array

Figure 6. Graph of Power Dissipation in uW related to Vdd and Vddpc

Fig 5 shows the simulation waveform of the memory array. The minimum access time observed in simulation is 1ns which gives rise to the maximum speed of 1GHz.

Fig. 6 shows the graph of the power dissipation related to the supply voltage Vdd and the pre-charge voltage VddPC. When the pre-charge voltage is maximum equal to the supply voltage, the power dissipated is 1065μW. As the pre-charge

voltage is decreased further the power reduces to 257μW. For further reduction in the power, when the supply voltage is also decreased along with the pre-charge voltage, the power consumption reduces to 0.25μW.

The bit line swing is very important factor for the speed. Bit line swing depends on the overall capacitance of the array. It is the major factor for power dissipation as well as speed of the SRAM. When the pre-charge voltage applied is maximum equal to the supply voltage, the bit line swing observed is 150 mV/ns. As the pre-charge voltage is decreased further, it reduces to 43mV/ns. When the supply voltage is also decreased with the pre-charge voltage, the bit line swing get reduces to 200 μV/ns at the pre-charge voltage of 0.5V. If it is decreased further, again the bit line swing starts to increase. The summarized results are shown in table 1.

A 32kb memory array is used for the simulation using 0.25μm technology. The minimum power dissipation is observed to be 0.25μm. The bit line swing is reduced up to 200μV/ns.

Figure 7. Layout of a cell

TABLE I SIMULATION RESULTS

Parameter Value SRAM 256 x 128 bits Supply Voltage 600mVminimum Operating Speed 1GHz read Power Dissipation 0.25 μW minimum Bit line Swing 200 μV/ns minimum Precharge Voltage 300mV minimum Power Saving 13% max Number of metal layers 3 Area penalty due to redundant elements 10% Technology 0.25μm CMOS

4

VI. CONCLUSION The power loss sense and compensated, optimized for

power, high speed, fault tolerant mechanism for the memory pages is presented in this paper for the codebook SRAM. This kind of SRAM is applicable in codebook memories as part of many image compressions and processing applications such as vector quantizers. The schematic and the layout for 256X8 SRAM which is a page of 32kb SRAM is designed and layed out using 0.25μm technology is presented. The read and write operations are verified. The highest attainable power reduction approaches to 0.25μW with bit line swing to 200μV/ns which are the major outcome in experimental results.

REFERENCES [1] A. Allan et al.,”2001 technology roadmap for Semiconductors,” IEEE J.

Computer, vol. 35, No. 1, pp. 42-53, Jan – 2002. [2] Y. S. Lu, C.H.Hsu, K Wang, C. Wu, “Efficient Built-In Redundancy Analysis for Embedded Memories With 2-D Redundancy”, IEEE Trans.

On VLSI Systems, vol 14, No. 1, pp. 34-42, Jan 2006. [3] Kouichi Kanda, Hattori Sadaaki, T Sakurai, “90% Write Power Saving

SRAM Using Sense Amplifying Memory Cell”, IEEE J. Solid State Circuits, vol 39, No. 6, pp. 927-933, June 2004.

[4] K. M. Sharifkhani, M. Sachdev, “Segmented Virtual Ground Architecture for Low Power Embedded SRAM”, IEEE Trans. On VLSI Systems, vol 15, No. 2, pp. 196-205, Feb 2007.

[5] Khellah M, Somasekhar D. et al.,” A 256 kb Dual Vcc SRAM Building Block in 65nm CMOS Process With Actively Clamped Sleep Transistor”, IEEE J. Solid State Circuits, vol. 42, No. 1, pp. 233-242, Jan 2007.

[6] Chang L, Montoye R.K. et al,“An 8T-SRAM for Variability Tolerance and Low – Voltage Operation in High Performance Caches”, IEEE J. Solid State Circuits, vol. 43, No. 4, pp. 956-963, April 2008.

[7] Rossi D, Nieuwland A.K. et al.,” Power Consumption of Fault Tolerant Busses ”, IEEE J. Solid State Circuits, vol. 16, No. 5, pp. 542-553, May 2008.

[8] Kojii Nii, Y. Tsukamato, H.Makino,” A 90nm Low Power 32-kB Embedded SRAM with Gate Leakage Suppression Circuit for Mobile Applications”, IEEE J. Solid State Circuits, vol. 39, No. 4, pp. 684-692, April 2004.

[9] F. Hamzaoglu et al. “Analysis of Dual- SRAM Cells With Full-Swing Single-Ended Bit Line Sensing for On-Chip Cache”, Trans. VLSI, vol. 10, no. 2, Apr 2002.

[10] U.Cho, T. Kim, Y. Yoon, J. Lee et al.,” A 1.2V 1.5 - Gb/s 72 - Mb DDR3 SRAM ”, IEEE J. Solid State Circuits, vol. 38, No. 11, pp. 1943-1951, Nov 2003.

[11] K. Ohatta, F. Arakawa, T. Kusunoki et al., “Power Reduction Techniques for a 1 Mb ECL CMOS SRAM With an Access Time of 550 ps And an Operating Frequency of 900MHz”, IEEE J. Solid State Circuits, vol. 35, No. 4, pp. 564-571, April 2000.

[12] K Koh, B.J. Hwang., G.H.Han, Y.S.Son et al.,” Ultra Low Power High Speed SRAM for Mobile Applications Using Poly-Si Gate 90nm CMOS Technology”, Symposium on VLSI Technology, Digest of Technical Papers 2003, pp. 67-68, 2003.

[13] Stefan Cserveny, L. Sumanen, J. Masgonty, C. Piguet, “Locally Switched and Limited Source Body Bias and Other Leakage Reduction Techniques for a Low-Power Embedded SRAM”, Trans. On Circuits and Systems, vol. 52, no. 10, pp. 636-640 Oct 2005.

[14] Ken ‘ichi Agawa, Hiroyuki Hara, “A Bitline Leakage Compensation Scheme for Low Voltage SRAMs”, IEEE J. Solid State Circuits, vol. 36, No. 5, pp. 726-734, May 2001.

[15] Keejong Kim, Hamid Mahmoodi, “A Low Power SRAM Using Bit-Line Charge Recycling”, IEEE J. Solid State Circuits, vol. 43, No. 2, pp. 446-458, Feb 2008.

[16] Tae-Hyoung Kim, et. al., “A 0.2 V, 480 Kb Subthreshold SRAM With 1

K Cells Per Bitline for Ultra-Low-Voltage Computing”, IEEE J. Solid State Circuits, vol. 43, No. 2, pp. 564-571, Feb 2008.

[17] Fabino Frustaci, et. al., “Techniques for Leakage Energy Reduction in Deep Submicrometer Cache Memories”, IEEE Trans. On VLSI Systems, vol 14, No. 11, pp. 1238-1248, Nov 2006.

5

Documents

[IEEE TENCON 2009 - 2009 IEEE Region 10 Conference - Singapore (2009.01.23-2009.01.26)] TENCON 2009 - 2009 IEEE Region 10 Conference - Rhythmic codebook of 300mV precharge, 1ns, low