Upload
lytu
View
216
Download
1
Embed Size (px)
Citation preview
1
Kaushik Roy Department of Electrical and Computer Engineering,
Purdue University, West Lafayette, IN https://engineering.purdue.edu/NRL/index.html
Beyond Charge-Based Computing: STT-MRAMs
Why “Beyond Charge based Computing”?
Device 1 Device 2
Channel length
Process Variations
Po
wer
Den
sit
y (
W/c
m2)
Power Density
Tech. generation
Failu
re p
rob
ab
ilit
y
Time
Defects Life time degradation
Reliability
1nm
10nm
100nm
1µm
Devic
e D
imen
sio
n
0.1nm 1980 2000 1960
10µm
2020 2040
Increased Process
Variations, Leakage,
Power Density; Less
Reliability and Yield
Fundamental Limit
Technology Trend
Buried Oxide (BOX)
Substrate
Fully-depleted body
Gate
VG
VSVD
DrainSource
Vback
Buried Oxide (BOX)
Substrate
Fully-depleted body
Gate
VG
VSVD
DrainSource
Vback
Bulk-CMOS
FD/SOI
Carbon nanotube Graphene TFETs III-V devices Spintronics
Single gate device
2003 More Moore
2020
DGMOS
FinFET Trigate
Multi-gate devices
Buried Oxide (BOX)
Substrate
Source Floating Body Drain
GateVS
VG
VD
Buried Oxide (BOX)
Substrate
Source Floating Body Drain
GateVS
VG
VD
PD/SOI
Design methods to exploit the advantages of technology innovations
??
Beyond CMOS
Electron SPIN
Electron is a magnet
rotation (spin)
N
S
Magnetic moment
m
Nucleus
Orbital electrons Electrons with uni-directional
electron spin moments results in
magnet with non-zero moment
(ferro-magnets)
Classical representation of atom
Fe (4), Co (3) and Ni (2) : unpaired electrons per atom 5
Incomplete 3d orbitals is the principle source of magnetic moments in transition metals
Sources of Magnetic Moments in Ferromagnetic Materials
Direction Characteristics in s-, p- and d- orbital's
Iron Lattice Structure
(body central cubic)
Cobalt Lattice Structure (Hexagonal Close Pack)
State Variable: Charge & Spin
SPINTRONICS
LARGE SCALE SYSTEMS
Charge: semiconductor electronics Spin: magnetism
Giant Magneto Resistance (GMR) 1988: Peter Grunberg, Germany
Albert Fert, France 2007: Nobel Prize
Spin Transfer Switching (STS) 1996: Slonczewski, US
Memories: high density, stability, low read/write current, access time, zero leakage.. Logic (Boolean & Non-Boolean): Ultra low voltage switch, Neuromorphic computing,
(all-spin logic – no spin-charge conversion), i/p o/p isolation, zero leakage, Interconnects: Spin channel (short), Ultra low voltage swing for charge based int.
Recent Inventions
Lateral Spin Valves (Local & Non-Local)
2008: Yang, Kimura, Otani 2009: Sun
MgO MgO
MgO
MgO
MgO
MgO
MgO Free Layer Fixed Layer Ru
(a)
(b)(c)
(d)
MOS vs. Magnets Switching Energy
Dissipated Energy
MOS
= 0.5CV2 ≈ N*Eb
Gate
b BE xk T
Magnet (“Collective Entity”)
b BE xk T
N ≈10,000 , x=40
equivalent N =1
1e-15 J / switch
Dissipated Energy ≈ Eb
x=40 1e-19 J / switch Life-time = 7 years
Theoretical switching energy of magnet is 0.1aJ << 1fJ (MOSFET)
Non-volatile
Energy Barrier Modulation: MOSFETs vs. Magnets
Conventional Charge-based MOSFET (Bulk/SOI/FinFETS)
Nano-Magnets with Electron Spin as State Variable
•The energy barrier in magnets is defined as the product of anisotropy and volume (EB = Ku2.V)
b a t
2Ku SAT y x
x
y
tH =4πM (n -n )
b
bFor =2,n =0.32 and
a
n =0.90
•The energy barrier in the active channel region can be modulated using:
•Doping
• Uniform Channel Doping
• Symmetric/Asymmetric Halo Doping
• Source Drain Doping
•Work Function and Material Engineering
•Gate Dielectric and Thickness (TOX) Modulation
•The energy barrier in magnets can be modulated using:
• Shape and Interfacial magnetic Anisotropy (Ku2)
• Volume of the nano-magnet (thickness (t) and cross-sectional area (A))
• Saturation Magnetization (MSAT)
Changing the State of a Magnet
External Magnetic Field
» Current carrying wires, coils, etc.
» Dipolar coupling
» ..
Current Induced Spin-Transfer Torque
(1996)
12
Spin-Torque Transfer Magnetic Embedded Memory
Magnetic Tunneling Junctions (MTJ) Basics: Read Operation
Memory state is detected through difference in resistance using Sense Amplifier
FIXED
LAYER
FREE
LAYER
MgO
High resistance
state (RAP)
Anti-parallel orientation of magnets (AP)
FIXED
LAYER
FREE
LAYER
MgO
Low resistance state
(RP)
Parallel orientation of magnets (P)
AP P
P
R -RTMR= 100 %
R
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Current(P)
Current(AP)
slope=1/RP
slope=1/RAP
Voltage (V)
Curr
ent
(mA
)
RP=1kΩ
RAP=2kΩTMR=100 %
Electrical Characteristics of MTJ
e e e e e e e e
e e e e
Pinned Ferromagnetic Layer (η~90%)
Free Layer (η~50%)
Tunneling Oxide
e e e e e e e e
T1 T1
T2 T2
Electrical Current
Electrical Current
AP→P P→AP
MgO MgO
CoFe
CoFe/Ru/CoFeB
CoFe/Ru/CoFeB
CoFe
MTJ Basics: Spin-transfer torque induced write operation
P→AP magnetization switching is relatively difficult than AP→P due to lower spin injection efficiency of the free layer
e e e e e e e e
1-T, 1 R STT-MRAM
source line (SL)
word line
(WL)
bit line (BL)
Iref
Iread
bias
generator A
P->
P
P->
AP
write
read
Key Advantages:
High Density (just 1 transistor per bit cell)
Non-Volatility
No leakage power in un-accessed cells
Limitations
Write asymmetry
Reliability-limited write speed
Read Write optimization conflict
Contact to MTJ
Transistor common
Back-end MRAM module
Front-end CMOS module
Single MRAM bit cell
Word line
Metal 4 contact stud in via stack
EMBEDDED MRAM PROCESS : Integrated Magnetic Process Cross-Section
Source: Everspin Technologies
Magnetic Stack
Reference Layer
Tunneling Barrier
Free Layer
Magnetic Pinning Layer
Protection Layer
The multilayer of an MTJ stacking is prepared using sputtering and
plasma oxidation
0 20 40 60 80 100 120 140 160 180 200 0
1
2
3
4
5
6
7
8
STT: JC=1e6 A/cm2
STT: JC=5e6 A/cm2
External Field MRAM
Magnet Width (nm)
Sw
itch
ing C
urr
ent
(mA
)
Spin Torque Switching is
superior compared to Field
Based Switching in scaled
geometries
Comparison of STT MRAM and Field Based MRAM
@Iso-Delay
Spin-torque MRAM need much lower switching current at iso-delay for scaled geometries
Bit-Cell Design using STT-MTJ
Design Challenge 1: Read and Write Failures in 1T-1R Bit-cell Structures
0 0.5 1 1.5 2 2.5 3 3.5
300
250
200
150
100
50
0
# o
f o
ccu
rre
nc
es
Write “0” to “1”
Write “1” to “0”
Write current ( Χ10-3 A )
Write “0”failure
“1” to “0”threshold
current
“0” to “1”threshold
current
Write “1”failure
0 0.5 1 1.5 2 2.5 3 3.5
300
250
200
150
100
50
0
# o
f o
ccu
rre
nc
es
Write “0” to “1”
Write “1” to “0”
Write current ( Χ10-3 A )
Write “0”failure
“1” to “0”threshold
current
“0” to “1”threshold
current
Write “1”failure
Write Failure
Read Failure
Insufficient TMR to distinguish
between RAP and RP
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Current(P)
Current(AP)
slope=1/RP
slope=1/RAP
Voltage (V)
Curr
ent
(mA
)
RP=1kΩ
RAP=2kΩTMR=100 %
0 2 4 6 8 10 12 14 16
400
300
200
100
0
# o
f o
cc
urr
en
ce
s
Read “1” current
Read “0” current
Read “1”decision failure
Read current ( Χ10-4 A )
Read “0”decision failure
Reference current
0 2 4 6 8 10 12 14 16
400
300
200
100
0
# o
f o
cc
urr
en
ce
s
Read “1” current
Read “0” current
Read “1”decision failure
Read current ( Χ10-4 A )
Read “0”decision failure
Reference current
Process variations reduce memory yield
RP I > I LH
ILH
I>IHL
IHL IREAD
- 1 0.5 0 0.5 1 0.5
1
1.5
2
2.5
Applied voltage (V)
No
rmalized
MT
J R
esis
tan
ce
Insufficient current to switch MTJ
IWRITE < min (ILH, IHL)
σ/μ=5% in MTJ Area and MgO thickness
Design Challenge 2: Read-Write Conflict
WRITE “1”
R MTJ
BL
SL
WL
R MTJ
BL
WL
WRITE “0”
Current (ILH)
Current (IHL)
R
BL
SL
WL
R
BL
WL
READ
Current (IREAD)
IREAD and IHL have same polarity Read operation can cause unintended write (Read Disturb)
-1 -0.5 0 0.5 1 0.5
1
1.5
2
2.5
Applied voltage (V)
MT
J R
esis
tan
ce (
no
rmalized
)
R AP
R P
I > I HL
I > I LH
Applied voltage (V)
MT
J R
esis
tan
ce (
no
rmalized
)
R AP
R P
I > I HL
I > I LH
0 2 4 6 8 10 12 14 16
400
300
200
100
0
# o
f o
cc
urr
en
ce
s
Read “1” current
Read “0” current
Read current ( Χ10-4 A )
“1” to “0”threshold
current
Read “1”disturbance
failure
0 2 4 6 8 10 12 14 16
400
300
200
100
0
# o
f o
cc
urr
en
ce
s
Read “1” current
Read “0” current
Read current ( Χ10-4 A )
“1” to “0”threshold
current
Read “1”disturbance
failure
Read Disturb Failure (under process variations)
σ/μ=5% in MTJ Area and MgO thickness
Design Challenge3: Stray Fields and its Impacts on MTJ Stack
0 10 20 30 40 501
1.2
1.4
1.6
1.8
HK=150 Oe
HK=75 OeHK=100 Oe
Hstray (Oe)N
orm
alize
dD
ela
y
WRITE time increases for higher stray fields write failures
MgO
Fixed Layer
Free Layer
Victim MTJ
Aggressor MTJ
Aggressor MTJ Aggressor MTJ
Aggressor MTJ
MgO
MgO
MgO
MgO
P to AP switching is opposed by
stray magnetic field (HSTRAY)
HSTRAY
Design Challenge 4: Stochastic Switching Behavior due to Thermal Fluctuations
Angular precession with different T
•Thermally induced initial magnetic oscillations are stochastic in nature (white noise)
•STT switching delay distribution has a longer tail, degrading memory yield
•Higher temperature helps in squeezing the switching delay distribution but degrades thermal stability
300 350 400 450 500 550 60010-8
-710-7
10-6
10-5
10-4
10-3
10-2
10-1
100
101
Temperature (K)
Life-t
ime (in y
ears
) Life-time decreases
exponentially with
temperature
STT-MTJ Stacks: Stability, Read/Write
Conflicts, Write power, ..
BL
WL
SL BL
WL
SL
BL
WL
SL
BL
WL
SL
IRD IWR(‘0’)
Shared Read/Write Current Paths
• Conflicting read and write current requirements
• Limited design space
SL
WL
BL
VX > 0
VGS < VDD
IWR(‘0’)
SL
WL
BL
IWR(‘1’)
Source Degeneration
• May require access transistors to be upsized
to meet IC requirements at fixed VDD
• Leads to excessive IWRITE for writing the
opposite direction
STT-MRAM Design Issues
Dual Pillar MTJ Structure: Spatial and electrical isolation of MTJ read and write
TOXIDE, 1 < TOXIDE, 2
• Built-in read and write stabilities without any mutual conflict → (relaxed memory design space)
• Lower WRITE, DISTURB and DECISION failures → (Robust operation)
• 1T-1MTJ STT-MRAM – Single transistor based data access → (Uncompromised memory density)
• Single supply voltage READ/WRITE operation → (Simplified bit-cell design)
• Single ended voltage sensing for data read-out → (Fast and reliable sensing scheme)
Advantages:
29
GND
VDD VDD GND VDD
VDD
Complementary Polarizers
STT-MRAM (CPSTT)
SL
WL RBL
WBL
Free Layer
Tunneling Oxide
Tunneling Oxide
Read Port
Pinned Layer
Write Port
Pinned Layer
Pinned Layers
WL
BL
SLL SLR
ATxL ATxR
Free
Layer
Tunneling Oxide
Dual-pillar Spin-Transfer
Torque MRAM (DPSTT)
Multi-terminal STT-MRAM Structures
[1] N. N. Mojumder, K. Roy, TED vol. 59, no. 11, pp. 3054-3060, 2012
[2] X. Fong, K. Roy, EDL vol. 34, no. 2, pp. 232-234, 2013
SL
WL RBL
WBL
IWR(‘0’)
IWR(‘1’)
• TOX2 > TOX1
• IWR(‘0’): VSL=0, VWL=VWBL=VRBL=VDD
• IWR(‘1’): VSL=VDD, VWL=VWBL=VRBL=0
TOX1
TOX2
• IWR(‘0’): VSLL=0, VWL=VBL=VSLR=VDD
• IWR(‘1’): VSLR=0, VWL=VBL=VSLL=VDD
WL
BL
SLL SLR
IWR(‘0’) IWR(‘1’)
Write Operations
Read / Sensing Operations
SL
WL RBL
WBL
IRD
• IREF= 0.5×{IRD(‘0’)+IRD(‘1’)} • VWL=VDD, VSL=VWBL=0, VRBL=VREAD
• ‘0’: IRD(‘0’) > IREF
• ‘1’: IRD(‘1’) < IREF
WL
BL
SLL SLR
•VBL=0, VWL=VDD, VSLL=VSLR=VREAD
• ‘0’ : IRDL > IRDR
• ‘1’ : IRDL < IRDR
• Differential sensing
• Self-referencing
IRDL IRDR
Peripheral Circuitry for CPSTT Array
• Differential sensing
using latch based
sense amplifier
• M2-M3 and M6-M7 act
as cross-coupled
inverters with M4-M5
and M8-M9 along
their respective pull-
down paths
• Each pull-down path
goes through a
different pinned layer
SEL
RDEN
M5 M9
WrData
SLLSLR IREAD,R
Write Driver
D Q
CLK
D Q
CLK
Data
DataB
CLK
VDD
REN
RCLKSense Amplifier
M2
M3
IREAD,L
M1
M6
M4
M7
M8
M10
M11
VPreVPre
• Since the free layer is only parallel to one of the pinned layers, the resistance
path through each pinned layer will be different
• SPICE simulations show >1.5GHz read speeds are possible with realistic parasitics
36
Read Write0
1
2
3
No
rma
lize
d L
ate
nc
y
Higher Latency
Read Write0
1
2
3
4
No
rm. A
cti
ve
En
erg
y
Higher Active Energy
Leakage
0.6
0.8
1
No
rma
lize
d P
ow
er
Lower Leakage
4X higher capacity
Leakage
0.6
0.8
1
6T SRAM
STT: Conv.
STT:TMA
Leakage
0.6
0.8
1
6T SRAM
STT: Conv.
STT:TMA
Leakage
0.6
0.8
1
6T SRAM
STT: Conv.
STT:TMA8MB STT MRAM
(Standard) 2MB 6T SRAM 8MB STT MRAM (Tilted
Magnetic Anisotropy)
[1] S. P. Park et al, DAC 2012
Comparison of 1T-1R with SRAM Last Level
Caches with Similar Cache Area
(2MB SRAM, 8MB STT MRAM)
37
Integer Floating Point0
0.5
1
1.5
SPEC 2000 Benchmarks
Ins
tru
cti
on
s p
er
Cy
cle
6T SRAM (2MB)
STT: Conv. (8MB)
STT: TMA (8MB)
Compared to SRAM cache STT MRAM caches offer
• Higher throughput due to larger integration density and lower cache miss rate
• Lower Energy due to low leakage and low utilization
Leakage Only 2% 10% 0
0.02
0.04
0.06
0.08
0.1
0.12
Cache Utilization (CU)T
ota
l E
ne
rgy C
on
su
mp
tio
n (
J)
6T SRAM (2MB)
STT: Conv. (8MB)
STT: TMA (8MB)
109 Cycles
Avg. CU = 2.2%Max. CU = 13%
Std Std
[1] S. P. Park et al, DAC 2012
System Level Implication of STT MRAM Last
Level Caches
Lateral Spin Valve Vertical spin valve structure
Local Spin torque
Ch
arg
e C
urr
en
t
Cu
Fixed Layer
Free Layer
Sp
in C
urr
en
t
Vsupply
Free Layer switches under the
influence of local spin torque
Spin Valves
injection-efficiency is determined by
interface polarization, can be ~90%
for injector polarity ~ 0.9
Ch
arg
e C
urr
en
t
NiFe NiFe
Cu μ+
μ-
Magnet1 Magnet2
Spin potential gradient
Spin Current
Charge Current
Spin Current
NiFe NiFe
Magnet1 Magnet2
local spin torque
Non-local spin torque
Non-Local STT-MRAM (Separate R/W Paths)
non-local
spin absorption
θtilt
dual injectors with tilted
axis anisotropy
read current
write current
metal
channel
MgO
free layer
fixed layer
use of dual injector
enhances spin injection
efficiency by ~2X
tilted anisotropy for
injectors further reduces
the switching current by
~3X
compensate
for low spin-
injection efficiency
for non-local STT
write ??
device structure:
M. Sharad et al., DRC, 2012
Simulation Framework
SPICE MTJ Model
a) S. Yuasa et al., Nature Materials vol. 3, no. 12, pp. 868-871, Dec. 2004.
b) C. J. Lin et al., IEDM, Dec. 2009, pp. 11.6.1-11.6.4.
c) T. Kishi et al., IEDM, Dec. 2008, pp. 12.6.1-12.6.4.
1 1.5 2 2.5 3
101
102
103
104
105
106
107
tMgO
(nm)
RA
(
-m
2)
Anti-parallel
Parallel
EF=2.25eV, E
B=0.865eV, =0.315eV,
mFM
=0.748, mOX
=0.462, a=3e-10,
T=20K, VMTJ
=10mV
-0.8-0.4 0 0.4 0.85
10
15
20
VMTJ
(V)
RA
MT
J (
-m
2)
-0.8 -0.4 0 0.4 0.85
10
15
20
25
30
VMTJ
(V)
RA
MT
J (
-m
2)
NEGF (AP)
NEGF (P)
Data (AP)
Data (P)
-0.8 -0.4 0 0.4 0.85
10
15
20
25
30
VMTJ
(V)
RA
MT
J (
-m
2)
NEGF (AP)
NEGF (P)
Data (AP)
Data (P)(a) (b)
0 5 10
10
20
30
Time (ns)
RM
TJ
(k
)
Standard
Reversed
0 5 10
10
20
30
Time (ns)
RM
TJ (
k
)
Standard
Reversed(c)
• Device level simulation results may be
calibrated to experimental data
• Device parameters are imported into SPICE
model for circuit level simulations
Bit-Cell Configuration
Design Solutions: Circuit and Architecture Optimization
RMTJ
Bitline
(BL)
Wordline (WL)
Sourceline
(SL)
RMTJ
Bitline
(BL)
Wordline (WL)
Sourceline
(SL)
Sizing of driver transistor
RMTJ
BL
Read-Wordline
(WL_r)
SL
Read- NMOS Write-NMOS
Write-Wordline
(WL_w)
RMTJ
BL
Read-Wordline
(WL_r)
SL
Read- NMOS Write-NMOS
Write-Wordline
(WL_w)
Sizing of read and write transistors
Circuit Optimization
Stretched Write Cycle (SWC) based Architecture Optimize bit-cell for read Read : One cycle Write : Two cycle
Circuit and Architecture Optimization
driver transistor
RMTJ
Bitline
(BL)
Wordline (WL)
Sourceline
(SL)
RMTJ
Bitline
(BL)
Wordline (WL)
Sourceline
(SL)
Bit-Cell Design: Summary
0.95
1
1.05
1.1
1.15
1.2
1.25
1.0
Bit–
Cell
Write
Energ
y (
Norm
aliz
ed
wrt
1T
-1R
without variations)
1T-1R
(w/o variations)
1T-1R worst case design
(under variations)
1T-1R optimized
(under variations) area overhead (5.4%)
2T-1R optimized
(under variations)
area overhead (9%)
1T-1R with SWC
(under variations)
throughput reduction (3%)
SWC provide lowest energy for iso-failure probability with minor throughput degradation
Acknowledgement
46
Swarup Bhunia, Case Western U.
Anand Raghunathan, Purdue
Niladri Mojumder, GF
Charles Augustine, Intel
Mrigank Sharad, Purdue
Delian Fang, Purdue
Xuanyao Fong, Purdue
Sumeet Gupta, Purdue
Harsha Choday, Purdue
Sang-Phill Park, Intel
Prof. Supriyo Datta
StarNet, NRI, INDEX, SRC, Intel, NSF, Qualcomm