

  • EE241 Final Project Report, Spring 2013 1

    Abstract: The leakage power consumption of memories is becoming dominant in a system as technology scales down. Fast nonvolatile memories (NVMs) offer a tremendous opportunity to eliminate memory leakage current during standby mode. Resistive random access memory (RRAM) with a crosspoint structure is considered one of the most promising emerging NVMs. However, the absence of access transistors poses significant challenges for write and read operations. In this paper, we propose a differential 2R crosspoint structure to increase the read margin. A 64 KB RRAM is built with 28/32 nm Predictive Technology Models (PTM) and simulated with Eldo. Circuit techniques, such as divided WL and Sense-before-Write, are employed to reduce write leakage and extend endurance. Finally, a comparison between SRAM and RRAM allows us to investigate the possibility of using RRAM as cache to increase energy efficiency in mobile electronics.

    Keywords: RRAM, memristor, nonvolatile memory, crosspoint, cache, zero standby current, mobile memory system

    I. INTRODUCTION

    Memories have been the largest portion of the integrated circuits in consumer electronics in terms of area and energy consumption. As technology scaling continues [1], the leakage current issue in SRAM becomes more and more severe. Nonvolatile memories are good candidates for eliminating standby current entirely, because they can be completely shut down without losing data. Among nonvolatile memories, flash memory [2] is the most popular in the market due to its small cell size. However, it could never replace SRAM as a cache because of its slow write speed. Beyond its high-voltage program/erase (P/E) operation, low endurance (10^6 cycles), and slow speed, the physical limit on oxide thickness also prevents flash memory from dominating the nonvolatile memory market in the coming decades. Therefore, a new nonvolatile memory needs to be developed to replace flash memory with comparable yield and high read/write speed.

    Several emerging nonvolatile memories have been developed in recent years, such as ferroelectric memory (FeRAM) [3], magnetoresistive memory (MRAM) [4], spin-transfer torque memory (STT-RAM) [5], phase-change memory (PRAM) [6], resistive memory (RRAM) [7], and conductive-bridging memory (CBRAM). Among these technologies, RRAM is one of the most promising candidates to replace flash memory. Ongoing research explores many different recipes for resistive storage materials. Typically, the cell is structured like a tiny sandwich, with metal electrodes on top and bottom and a metal oxide in the middle. For example, in Fig. 1 [8], the middle material is titanium dioxide (TiO2) in two layers: the lower TiO2 layer is electrically insulating, while the upper TiO2-x layer is conductive because its oxygen vacancies are positively charged. An RRAM cell is a device whose resistance depends on the magnitude and polarity of the voltage applied to it and on how long that voltage has been applied. Fig. 1 shows the switching mechanism of a TiO2-based RRAM cell. A positive voltage on the cell repels the positively charged oxygen vacancies in the upper TiO2-x layer, driving them into the insulating TiO2 layer below. This procedure, called SET, increases the fraction of conducting TiO2-x and thus decreases the resistance of the cell. Conversely, in a RESET operation, a negative voltage on the cell attracts the positively charged oxygen vacancies, pulling them out of the TiO2. The amount of TiO2 increases, thereby increasing the resistance of the cell. The amounts of TiO2-x and TiO2 stay the same when the voltage is turned off.

    Fig. 1. SET and RESET mechanism of TiO2-based RRAM, showing the TiO2-x layer (with oxygen vacancies) and the TiO2 layer (perfect titanium oxide). [8]

    The features of RRAM include low switching voltage, small cell area, and fast switching time. The cells are fabricated in the back end of line (BEOL), which enables a crosspoint structure to hide the peripheral circuits under the cell array. Furthermore, a multilayer array can be realized to maximize array efficiency [9]. However, the absence of access transistors in a crosspoint array poses more challenges for the peripheral circuit design.

    The memory hierarchy, shown in Fig. 2, reflects the tradeoff between memory density and operating speed. High-speed memory usually needs a larger area, which limits the capacity of a cache. The memory hierarchy is therefore built to create the illusion of a fast and large memory. If there were a single nonvolatile memory with high speed, small area, and high endurance, it would collapse the existing memory hierarchy into a one-memory system. Moreover, the power and time spent moving data around would be saved, and an instant power-on procedure could be achieved. Unfortunately, no such perfect memory exists so far. Although RRAM is not as fast as L1 cache, we can still expect RRAM to substitute for L2/L3 cache memory with comparable speed.


    Differential 2R Crosspoint RRAM for Memory System in Mobile Electronics with Zero Standby Current

    Pi-Feng Chiu, Pengpeng Lu, and Zeying Xin
    Electrical Engineering and Computer Science Department, University of California, Berkeley, CA

    {pfchiu, penpenglu, xinzeying}@berkeley.edu


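    Fig. 2. Memory hierarchy: CPU registers, cache (L1, L2), main memory (DRAM), and permanent storage (hard disk drive, solid state drive).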

    The remainder of this paper is organized as follows. Section II introduces the crosspoint architecture and its inherent issues. Section III describes the cell analysis and the proposed differential 2R crosspoint array. Section IV presents the circuit implementation of a 64 KB crosspoint RRAM circuit and other techniques, such as divided WL and the Sense-before-Write approach. Section V presents the simulation results of the differential 2R crosspoint array. Section VI compares SRAM and RRAM to investigate the possibility of using RRAM as cache to increase energy efficiency in mobile applications. Conclusions are drawn in Section VII.

    II. CROSSPOINT ARCHITECTURE

    Conventionally, an RRAM cell is built from one transistor and one programmable resistive device (1T1R). The transistor not only works as a switch for accessing the selected cell and isolating unselected ones, but also constrains the write current and limits the cell distribution. However, to provide sufficient write current, the transistor needs to be upsized, so it dominates the cell area.

    An alternative approach is the crosspoint architecture, shown in Fig. 3. In a crosspoint array, RRAM cells are sandwiched between wordlines (WLs) and bitlines (BLs), which can achieve the ideal cell size of 4F^2. Moreover, since a crosspoint array permits a stacked structure, the effective cell area is further reduced. Although eliminating the access transistor is beneficial for cell area, it introduces other complexities during write and read operations.

    Fig. 3. 1T1R and crosspoint RRAM array with single-layer and multiple-layer scheme. [9]

    Write reliability is a serious concern in crosspoint arrays. There are two potential problems in write operation: write failure, an unsuccessful write to selected cells, and write disturbance, an undesired write to unselected cells. To store data successfully, the write voltage (VWL - VBL) should be fully applied across the selected cell. In reality, however, neither the wire/switch resistance nor the sneak current is negligible. Hence, the voltage applied across a cell varies with the location of the cell as well as with the data pattern stored in all of the RRAM cells in the array.

    The absence of access transistors makes it difficult to isolate the unselected cells. To prevent write disturbance in a crosspoint array, unselected WLs and BLs should be carefully biased. There are four possible schemes to bias unselected WLs/BLs [10]: HWHB activates the selected WL and BL and half biases the unselected WLs and BLs; FWFB activates the selected WL and BL and leaves the unselected WLs and BLs floating; HWFB (FWHB) activates the selected WL and BL, half biases the unselected WLs (BLs), and leaves the unselected BLs (WLs) floating.

    To minimize energy consumption during write operation, three schemes (HWFB works similarly to FWHB) are compared in terms of energy efficiency. Fig. 4 illustrates the leakage paths in the three schemes, from which we can estimate the total leakage current in terms of the WL and BL numbers (m, n). The leakage currents in the worst case (all unselected cells are RL) are given in (1) to (3).

    HWHB:

    $$I_{leakage} = \frac{V_{SET}}{2}\left(\frac{n-1}{R_L}+\frac{m-1}{R_L}\right) \tag{1}$$

    FWHB:

    $$I_{leakage} = \frac{V_{SET}}{2}\cdot\frac{n-1}{R_L} \tag{2}$$

    FWFB:

    $$I_{leakage} = \frac{V_{SET}\,(n-1)(m-1)}{(n+m-1)\,R_L} \tag{3}$$

    For an array with 16 WLs and 16 BLs, HWHB draws more leakage current than FWHB and FWFB do. However, FWFB has an inherent problem that may result in write disturbance: floating both the unselected BLs and the unselected WLs may place more than VSET/2 across an unselected high-resistance cell (RH) and switch it to a low-resistance cell (RL).

    Fig. 4. Current path through selected and unselected cells in (a) HWHB, (b) FWHB, and (c) FWFB. In all three, the selected WL (WLs) is at VSET and the selected BL (BLs) is at 0; the unselected lines (WLx, BLx) are both at VSET/2 in (a), WLx is floating and BLx is at VSET/2 in (b), and both are floating in (c).

    In read operation, sneak current from unselected cells may reduce the read margin and produce incorrect output data, especially when the selected cell is RH and all unselected cells are RL. To alleviate read disturbance, parallel read can be employed to read all the cells in the same row, eliminating the half-select issue. A simple and intuitive way to read out the cell state is to mirror the cell current and compare it with a reference current. In the parallel read scheme, different data patterns still cause slight differences in BL voltage, which degrade the read margin. To minimize the BL voltage difference, the transistor size in the diode-connected current mirror should be increased. However, since it is still impossible to completely eliminate the leakage current caused by BL voltage differences, the array size is limited. Moreover, PVT variation and wide cell distribution make the sensing scheme more challenging.
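The three worst-case write-leakage expressions for HWHB, FWHB, and FWFB given earlier in this section can be compared numerically. The following is a minimal sketch, not the authors' simulation setup: the function names are hypothetical, and the values of VSET and RL are placeholders chosen only to show the ordering for a 16 x 16 array.

```python
def leakage_hwhb(v_set, r_l, m, n):
    """Eq. (1): unselected WLs and BLs half biased (m WLs, n BLs)."""
    return (v_set / 2) * ((n - 1) / r_l + (m - 1) / r_l)


def leakage_fwhb(v_set, r_l, m, n):
    """Eq. (2): unselected WLs floating, unselected BLs half biased."""
    return (v_set / 2) * (n - 1) / r_l


def leakage_fwfb(v_set, r_l, m, n):
    """Eq. (3): unselected WLs and BLs both floating."""
    return v_set * (n - 1) * (m - 1) / ((n + m - 1) * r_l)


# Placeholder values (assumptions for illustration, not design data).
V_SET = 1.0   # volts
R_L = 10e3    # ohms
M = N = 16    # 16 WLs and 16 BLs, as in the text

results = {
    "HWHB": leakage_hwhb(V_SET, R_L, M, N),
    "FWHB": leakage_fwhb(V_SET, R_L, M, N),
    "FWFB": leakage_fwfb(V_SET, R_L, M, N),
}
for name, amps in results.items():
    print(f"{name}: {amps * 1e6:.1f} uA")
```

With these placeholder values HWHB draws twice the FWHB leakage, and FWFB is slightly below FWHB, which matches the ordering stated in the text. The expressions ignore wire and switch resistance, so they are only first-order estimates.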

    III. DIFFERENTIAL 2R CELL AND CROSSPOINT ARRAY

    A. Cell Characterization

    To start designing a crosspoint RRAM circuit, some cell parameters are required, such as write/read voltages (VSET, VRESET, VREAD), write pulse widths (TSET, TRESET), and high/low resistance values (RH, RL). This information can be extracted from the Verilog-A model, which characterizes the switching behavior of the RRAM cell. Based on simulation with the model, Fig. 5(a) shows the tradeoff between the time (TSET) and voltage (VSET) required to program the cell from RH to RL for different target RL values. Fig. 5(b) plots the relationship between write energy and RL value under different VSET. A higher RL requires less time and energy to program and also suppresses the overall leakage current. However, to maintain sufficient read margin, a smaller RL, i.e., a larger RH/RL ratio, is preferred. Fig. 5(b) shows that it is more energy-efficient to write the cell with a higher voltage and a shorter pulse. The tradeoff is that the RL value is more sensitive to shorter pulses.

    Fig. 5. (a) Write time and (b) write energy of an RRAM cell under different VSET and RL.

    B. Differential 2R Scheme

    To solve the leakage issue when reading cells in a crosspoint array, we propose the differential 2R crosspoint structure, shown in Fig. 6(a). In this scheme, two resistive devices with opposite resistance states together represent one bit of data. To store a 1, Ra is written to the low resistance state (LRS) and Rb to the high resistance state (HRS); to store a 0, Ra is in HRS and Rb is in LRS. Instead of sensing the current flowing through the cell, the state of a differential 2R cell is determined simply by the voltage divider formed by Ra and Rb. In read operation, applying Vread across Ra and Rb produces a BL voltage of Vread*Rb/(Ra+Rb). The BL is then connected to a simple StrongARM sense amplifier with a reference voltage of Vread/2. Therefore, the read operation is immune to the leakage current flowing in from neighboring BLs, which greatly increases the read margin without limiting the block size. Moreover, every differential 2R cell contains both RH and RL, which solves the data pattern issue and suppresses the leakage consumption in read operation.
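The voltage-divider read described above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the comparator is ideal, the function names are hypothetical, and the values Vread = 0.4 V, RH = 90 kΩ, and RL = 8 kΩ are borrowed from Table I of this report.

```python
def bl_voltage(v_read, r_a, r_b):
    """BL voltage of the divider: Vread applied across Ra (top) and Rb (bottom)."""
    return v_read * r_b / (r_a + r_b)


def sense(v_bl, v_ref):
    """Ideal comparator standing in for the sense amplifier (an assumption)."""
    return 1 if v_bl > v_ref else 0


V_READ = 0.4            # volts, from Table I
R_H, R_L = 90e3, 8e3    # ohms, from Table I
V_REF = V_READ / 2      # reference voltage of Vread/2

v_one = bl_voltage(V_READ, r_a=R_L, r_b=R_H)    # stored 1: Ra = LRS, Rb = HRS
v_zero = bl_voltage(V_READ, r_a=R_H, r_b=R_L)   # stored 0: Ra = HRS, Rb = LRS

print(f"stored 1: V_BL = {v_one:.3f} V -> DOUT = {sense(v_one, V_REF)}")
print(f"stored 0: V_BL = {v_zero:.3f} V -> DOUT = {sense(v_zero, V_REF)}")
print(f"margin to Vref: {v_one - V_REF:.3f} V")
```

Because the two states swap Ra and Rb, the BL voltage sits symmetrically above or below Vread/2, so the margin to the reference is the same for a stored 1 and a stored 0 in this idealized model.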

    Thanks to the stackability of RRAM, the differential 2R cell can be constructed between different metal layers without much area penalty. Since Ra and Rb have opposite electrodes connected to WLa and WLb, we can SET one device and RESET the other at the same time by applying the same voltage to WLa and WLb. The operating conditions are listed in Fig. 6(b). In a write-1 operation, both WLa and WLb are connected to a write voltage, Vwrite, and the BL is connected to ground. A Vwrite of positive polarity drops across Ra, which SETs Ra to LRS, and a Vwrite of negative polarity drops across Rb, which RESETs Rb to HRS. To write a 0, simply connect the BL to Vwrite and ground WLa/WLb. Note that the write operation of the differential 2R cell assumes that VSET equals VRESET.
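The write conditions above can be expressed as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the sign convention (Ra sees WL minus BL, Rb sees BL minus WL because its electrodes are reversed) is an assumption made to mirror the description.

```python
def write_bias(bit, v_write):
    """Return (WL voltage, BL voltage); WLa and WLb share the same WL voltage."""
    if bit == 1:
        return v_write, 0.0
    if bit == 0:
        return 0.0, v_write
    raise ValueError("bit must be 0 or 1")


def device_ops(bit, v_write):
    """Operation seen by each device of the differential 2R cell."""
    v_wl, v_bl = write_bias(bit, v_write)
    # Assumed convention: positive voltage means SET, negative means RESET.
    v_ra = v_wl - v_bl   # Ra: positive terminal on the WL side
    v_rb = v_bl - v_wl   # Rb: opposite electrode on the WL side
    return {
        "Ra": "SET" if v_ra > 0 else "RESET",
        "Rb": "SET" if v_rb > 0 else "RESET",
    }


for bit in (1, 0):
    print(f"write-{bit}: bias (WL, BL) = {write_bias(bit, 0.95)}, ops = {device_ops(bit, 0.95)}")
```

A single bias condition always drives the two devices in opposite directions, which is why one write pulse is enough to flip both halves of the cell.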

    Fig. 6. (a) Differential 2R crosspoint array and (b) table of operational condition in write mode.

              Write-1   Write-0
        Ra    SET       RESET
        Rb    RESET     SET
        WL    Vwrite    0
        BL    0         Vwrite

    IV. CIRCUIT IMPLEMENTATION

    In Section II, three schemes for biasing unselected WLs and BLs are discussed; the FWHB scheme is chosen in this work to reduce leakage current and avoid disturbance. According to equation (2), the leakage current in write operation using the FWHB scheme is proportional to the number of cells on one WL, so energy efficiency worsens as the WL length increases. In the differential 2R scheme, the equation should be modified to include the second resistive device, as shown in (4). The leakage current is also no longer data-dependent, since every cell contains both RH and RL. For low power, our target write current is set to around 100 µA, which requires a short WL that is 4 cells wide and an RL of 8 kΩ, with 40% write energy efficiency.

    $$I_{leakage} = \frac{V_{SET}\,(n-1)}{2}\left(\frac{1}{R_H}+\frac{1}{R_L}\right) \tag{4}$$
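Equation (4) makes the dependence on WL length easy to tabulate. The sketch below is a first-order estimate only: the function name is hypothetical, and VSET = 0.95 V, RH = 90 kΩ, and RL = 8 kΩ are assumptions borrowed from Table I. The closed form ignores wire and switch resistance, so it need not match the simulated currents reported later.

```python
def leakage_2r_fwhb(v_set, r_h, r_l, n):
    """Eq. (4): FWHB write leakage with n differential 2R cells on one WL."""
    return v_set * (n - 1) / 2 * (1 / r_h + 1 / r_l)


V_SET = 0.95            # volts (assumed equal to Vwrite in Table I)
R_H, R_L = 90e3, 8e3    # ohms (Table I)

for n in (4, 8, 16, 32):
    i_leak = leakage_2r_fwhb(V_SET, R_H, R_L, n)
    print(f"n = {n:2d}: leakage = {i_leak * 1e6:7.1f} uA")
```

The leakage grows linearly with (n - 1), which is the motivation for keeping the activated WL short.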

    Instead of building a 4x4 array with its own peripheral circuit, we can construct a large array and divide one global WL (GWL) into local WLs (LWLs) [11]. Only one LWL is activated at a time to reduce the write leakage current. Switches need to be inserted every four columns to connect the LWLs to the GWL according to the decoded address signal. Fig. 7 shows the cross-sectional view of the differential 2R array with the divided WL scheme. SWa and SWb connect the LWLs to the GWLs if the block is selected. Although transistors can be hidden beneath the array, additional area is consumed by the connections from the transistors to the higher metal layers, so there is a tradeoff between area penalty and leakage current. For an LWL that is 4 cells wide, the area might be twice as large as that without divided WL. However, the area is still much smaller than that of a 1T1R design.
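The selection of a single LWL from a decoded column address can be sketched as a one-hot enable vector for the GWL-to-LWL switches. This is only an illustration: the function name is hypothetical, columns are assumed to be numbered from 0, and the number of columns is assumed to be a multiple of the LWL width.

```python
LWL_WIDTH = 4  # cells per local WL, as in the text


def lwl_select(column, num_columns, lwl_width=LWL_WIDTH):
    """One-hot enable list for the switches connecting each LWL to the GWL."""
    if not 0 <= column < num_columns:
        raise ValueError("column out of range")
    if num_columns % lwl_width != 0:
        raise ValueError("num_columns must be a multiple of lwl_width")
    selected = column // lwl_width           # LWL segment that holds the column
    segments = num_columns // lwl_width
    return [seg == selected for seg in range(segments)]


enables = lwl_select(column=5, num_columns=16)
print(enables)
```

Only the segment containing the addressed column is connected, so the number of half-selected cells seen by the write driver is bounded by the LWL width rather than by the full row length.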

    Fig. 7. Cross-sectional view of the differential 2R array with divided WL scheme.

    According to the cell model, if SET pulses are repeatedly applied to the same cell, its resistance does not stay at the same intermediate value but keeps dropping until it reaches the lowest resistance level (~0.5 kΩ). This would result in an extremely wide cell distribution and large current consumption. To prevent this over-SET situation, we propose the Sense-before-Write approach. At the beginning of the write cycle, a read operation is conducted first, and the output is fed back to the control circuit to determine whether to enable the write operation. The cell is therefore not written again unless the data differs from its current state. With the Sense-before-Write approach, the cell distribution is narrower and the leakage current can be suppressed. Moreover, endurance is further extended by avoiding unnecessary cell accesses.
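The control flow of Sense-before-Write can be sketched behaviorally. The class and function names below are hypothetical, and the cell is reduced to a stored bit plus a pulse counter; the sketch shows only the decision logic, not the circuit.

```python
class Cell2R:
    """Behavioral stand-in for one differential 2R cell (an assumption)."""

    def __init__(self, bit=0):
        self.bit = bit
        self.write_pulses = 0   # counts how often the devices are stressed

    def read(self):
        return self.bit

    def write(self, bit):
        self.bit = bit
        self.write_pulses += 1


def sense_before_write(cell, data):
    """Read first; issue a write pulse only if the data differs.

    Returns True if a write pulse was issued.
    """
    if cell.read() == data:
        return False            # already in the target state: skip the pulse
    cell.write(data)
    return True


cell = Cell2R(bit=0)
issued = [sense_before_write(cell, d) for d in (1, 1, 1, 0, 0)]
print(issued, cell.write_pulses)
```

Repeated writes of the same value cost only a read in this model, so a cell that is already in LRS never receives another SET pulse.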

    The block diagram of a 64 KB crosspoint RRAM circuit is shown in Fig. 8; it contains 8 I/O blocks. Each I/O block includes 64 sub-blocks, control circuits, WL/BL multiplexers and drivers, StrongARM sense amplifiers, and a voltage generator. In SRAM, WLs are connected to the gates of access transistors, while in a crosspoint array, WLs need to be floating, grounded, or connected to an intermediate voltage (Vwrite or Vread) depending on the input values and operational mode. Therefore, a byte of data is interleaved across the 8 I/O blocks, which are decoupled by separate sets of peripheral circuits. The intermediate voltages, such as the write voltage (Vwrite), the read voltage (Vread), and the unselected BL voltage Vwrite/2 (Vhalf) for preventing disturbance, are provided by the voltage generator, which is not shown in the block diagram.

    Fig. 8. Block diagram of a 64KB crosspoint RRAM circuit.

    The control circuit processes the control signals, such as write enable (WE), read enable (RE), input data (DIN), address (A), and output data (DOUT), to determine the current operational mode and to output the corresponding control signals to the other circuits. The WL/BL multiplexers and drivers take these control signals and switch between the different intermediate voltages for selected and unselected WLs/BLs. StrongARM sense amplifiers (Fig. 9) with PMOS input transistors are used because of the low input common-mode voltage. Each amplifier compares the BL voltage with the reference voltage (Vref) and then outputs DOUT. The voltage-sensing scheme in the differential 2R crosspoint array is less susceptible to cell distribution and data pattern than a current-sensing scheme. Both the WL/BL drivers and the sense amplifiers should be designed as small as possible to fit in the narrow cell pitch.

    Fig. 9. StrongARM sense amplifiers with PMOS input transistors.


    V. SIMULATION RESULTS

    The simulation of a sub-block and its peripheral circuits is conducted using Eldo with the 28/32 nm Predictive Technology Model (PTM) and the RRAM Verilog-A model. The functional testbench first writes a checkerboard pattern, reads it back, then writes the opposite data and reads again. Fig. 10 shows the waveforms of write and read operations. To write to a cell, the selected WLa and WLb are connected to ground and the BL is connected to Vwrite. The unselected WLs are kept floating and are charged to an intermediate voltage of around Vwrite/2, which prevents disturbance. By observing the current flowing through the selected cell (cell01a and cell01b), we can confirm the switching behavior: the current increases during the SET operation and decreases during the RESET operation.
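The functional test sequence (write checkerboard, read, write the opposite data, read again) can be mirrored at the behavioral level. The sketch below is not the Eldo testbench: the array is an ideal stand-in that stores one bit per cell, and all names are hypothetical.

```python
class BehavioralArray:
    """Ideal stand-in for one sub-block: stores one bit per cell."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.bits = [[0] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        self.bits[row][col] = bit

    def read(self, row, col):
        return self.bits[row][col]


def checkerboard(rows, cols, invert=False):
    """Alternating 0/1 pattern; invert=True gives the opposite data."""
    return [[(r + c + int(invert)) % 2 for c in range(cols)] for r in range(rows)]


def run_testbench(array):
    """Write and read back the checkerboard, then the opposite pattern."""
    for invert in (False, True):
        pattern = checkerboard(array.rows, array.cols, invert)
        for r in range(array.rows):
            for c in range(array.cols):
                array.write(r, c, pattern[r][c])
        for r in range(array.rows):
            for c in range(array.cols):
                if array.read(r, c) != pattern[r][c]:
                    return False
    return True


passed = run_testbench(BehavioralArray(4, 4))
print("testbench passed:", passed)
```

Writing the pattern and then its inverse forces every cell through both a SET and a RESET transition, which is what makes the sequence a useful functional check.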

    In read operation, the selected WLa is driven to Vread and WLb is connected to ground. Thus, the BL voltage is set by the resistance ratio of Ra and Rb. The BL voltage is then compared with Vref to determine DOUT. The sense enable (SAEN) signal is triggered in the middle of the clock period to ensure that the voltage difference is fully developed. Table I shows the features of the differential 2R crosspoint RRAM circuit. The average current during a write cycle is 200 µA, and the average current during a read cycle is 100 µA. Note that there is always a voltage drop when the read/write voltage passes through switches. The drop depends on the switch size and the current flowing through it. Therefore, we size the switches for a maximum voltage drop of 50 mV at the maximum current.

    Fig. 10. Waveform of read and write operation in differential 2R crosspoint array (traces: WLa[0], WLb[0], WLa[1], WLb[1], BL[1], DOUT, I(cell01b), I(cell01a)).

    VI. COMPARISON

    To replace SRAM, the differential 2R RRAM must be compared with SRAM in terms of power, area, performance, and endurance. In terms of performance (speed), it is difficult to beat L1 cache, since the yield of RRAM is low when the write pulse is short: the resistance value of an RRAM cell is very sensitive under short pulses. To maintain good yield, the write pulse needs to be long enough that small timing skews can be neglected. However, it is still possible to substitute differential 2R RRAM for L3 or even L2 cache.

    In terms of power consumption, differential 2R RRAM requires additional power to generate the intermediate voltages (Vread/Vwrite/Vhalf). Also, unlike SRAM, which is static logic, resistive cells conduct current during the whole period of read and write operations. We have constructed a simple 6T SRAM cell using the Predictive Technology Model (PTM), and its leakage current is 570 pA/cell at 0.4 V. The leakage current would therefore be about 300 µA in a 64 KB array, which cannot be ignored. RRAM, by contrast, draws no standby current, so it is beneficial to use RRAM as a cache in the memory system of mobile electronics, where the standby period is long and battery life is critical.
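The array-level SRAM leakage figure follows from simple multiplication. The check below takes the per-cell leakage as a current of 570 pA and assumes one 6T cell per bit with 64 KB = 64 x 1024 bytes; both are assumptions of this sketch.

```python
BYTES = 64 * 1024             # 64 KB array
CELLS = BYTES * 8             # one 6T cell per bit (assumed)
LEAK_PER_CELL = 570e-12       # amperes per cell at 0.4 V

total_leakage = CELLS * LEAK_PER_CELL
print(f"{CELLS} cells -> {total_leakage * 1e6:.1f} uA")
```

The product comes out just under 300 µA, consistent with the array-level figure quoted in the text.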

    From the area perspective, the SRAM cell area is around 0.182 µm^2 in 32 nm technology and 0.1 µm^2 in 22 nm technology [12]. For the differential 2R crosspoint RRAM, assuming the metal width and space are both 50 nm, the cell area would be (4 × 50 nm)^2 = 0.04 µm^2 due to divided WL, which is 2.5x smaller than the 22 nm SRAM cell. It is therefore very competitive for high-density memory. Nevertheless, the finite endurance of the RRAM cell, due to its switching mechanism, would greatly constrain its application.
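The area comparison can be reproduced directly from the numbers in the paragraph above. The variable names are hypothetical; the inputs are the 50 nm metal width and space and the SRAM cell areas quoted from [12].

```python
METAL_NM = 50                       # metal width and space, both 50 nm
cell_side_um = 4 * METAL_NM * 1e-3  # (4 x 50 nm) expressed in micrometers
rram_area_um2 = cell_side_um ** 2   # differential 2R cell with divided WL

SRAM_32NM_UM2 = 0.182
SRAM_22NM_UM2 = 0.1

print(f"RRAM cell area: {rram_area_um2:.3f} um^2")
print(f"vs 22 nm SRAM: {SRAM_22NM_UM2 / rram_area_um2:.2f}x smaller")
print(f"vs 32 nm SRAM: {SRAM_32NM_UM2 / rram_area_um2:.2f}x smaller")
```

The 2.5x figure in the text corresponds to the 22 nm SRAM cell; against the 32 nm cell the same arithmetic gives roughly 4.5x.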

    Table I. Features of differential 2R crosspoint RRAM circuit

        Clock frequency              500 MHz
        Density                      64 KB
        Power supply                 1.0 V
        Write voltage (Vwrite)       0.95 V
        Read voltage (Vread)         0.4 V
        Reference voltage (Vref)     0.2 V
        RH/RL                        90 kΩ / 8 kΩ
        Write current (one block)    140 µA
        Read current (one block)     16.6 µA
        Standby current              ~0 A

    VII. CONCLUSION

    In this work, we have proposed a voltage-sensing differential 2R crosspoint structure, which is free from the leakage issue of current-sensing crosspoint array designs. To avoid disturbance and limit the leakage current during write operation, the FWHB scheme is used to bias the unselected WLs and BLs. Furthermore, the divided WL technique with a WL length of 4 cells is adopted to keep the write current below 200 µA. The Sense-before-Write approach prevents cells from being set to a lower resistance, which would lead to large leakage current.

    We have constructed a 64 KB differential 2R crosspoint array with a peripheral control circuit. The preliminary simulation results show that the array can operate at 500 MHz with a write current of 140 µA and a read current of 16.6 µA. The write operation requires two cycles due to the Sense-before-Write scheme. Also, the divided WL scheme trades area for leakage current reduction.

    Since the transistor and RRAM cell models used in this design are both predictive models, they serve only to help us validate our idea; the absolute numbers should not be fully trusted.


    In reality, there are more problems to deal with, such as cell distribution and reliability. To analyze those issues, we need more detailed information and a better model that characterizes variation and reliability.

    Finally, we would like to investigate the possibility of using RRAM as a cache. Given the cell characteristics and the simulation results, it is difficult to build a high-yield RRAM with an operating speed faster than L1 cache. However, we can still aim to replace L2/L3 cache. From the area perspective, the differential 2R cell is much smaller than the SRAM bit cell. From the energy perspective, the active write energy of differential 2R RRAM might be larger than that of SRAM, but the zero-standby-current characteristic of RRAM makes it a good fit for mobile electronics, in which memories stay idle most of the time and battery life is most critical. Nevertheless, finite endurance due to its switching mechanism would be a serious problem that prevents it from being used as a cache.

    For future work, characterizing the cells within the differential 2R array is an important topic for further progress on this technology. Also, suppressing the effect of wide cell distribution and reducing leakage current are key to the peripheral circuit design.

    REFERENCES

    [1] ITRS Roadmap (http://www.itri.net)

    [2] Yan Li, et al., “128Gb 3b/cell NAND Flash Memory in 19nm Technology with 18MB/s Write Rate and 400Mb/s Toggle Mode,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 436-437.

    [3] T. Takashima, et al., “A 100MHz Ladder FeRAM Design With Capacitance-Coupled-Bitline (CCB) Cell,” IEEE Journal of Solid-State Circuits, Vol. 46, No. 3, March 2011.

    [4] T. Shigibayashi, et al., “A 16-Mb Toggle MRAM With Burst Modes,” IEEE Journal of Solid-State Circuits, Vol. 42, No. 11, Nov. 2007.

    [5] D. C. Ralph and M. D. Stiles, “Spin Transfer Torques,” Journal of Magnetism and Magnetic Materials, vol. 320, issue 7, pp. 1190-1216, April 2008.

    [6] R. E. Simpson, et al., “Toward the Ultimate Limit of Phase Change in Ge2Sb2Te5,” Nano Letters, pp. 414-419, 2010.

    [7] Elaine Ou and S. Simon Wong, “Array Architecture for a Nonvolatile 3-Dimensional Cross-Point Resistance-Change Memory,” IEEE J. Solid-State Circuits, vol. 46, no. 9, pp. 2158-2170, Sep. 2011.

    [8] R. Stanley Williams, “How we found the missing memristor,” IEEE Spectrum, vol. 45, no. 12, pp. 28-35, 2008.

    [9] A. Kawahara, et al., “An 8Mb Multi-Layered Cross-Point ReRAM Macro With 443MB/s Write Throughput,” IEEE Journal of Solid-State Circuits, Vol. 48, No. 1, January 2013.

    [10] D. Niu, C. Xu, N. Muralimanohar, N. P. Jouppi, Y. Xie, “Design Trade-Offs for High Density Cross-Point Resistive Memory,” ISLPED, 2012, pp. 209-214.

    [11] M. Yoshimoto, et al., “A Divided Word-line Structure in the Static SRAM and Its Application to a 64K Full CMOS RAM,” IEEE Journal of Solid-State Circuits, Vol. 18, No. 5, Oct. 1983.

    [12] P. Packan, et al., “High Performance 32nm logic technology featuring 2nd generation high-k + metal gate transistors,” in Int. Electron Devices Meeting (IEDM) Tech. Dig. Papers, Dec. 2009, pp. 659-662.