ITRS ERD July 2008
Hybrid CMOS / Nanoelectronic Circuits (CMOL, FPNI, 3D CMOL, etc.)
Konstantin K. Likharev
Stony Brook University
Acknowledgments:
- useful discussions:
P. Adams, P. Allen, J. Barhen, S. Das, A. DeHon, P. Franzon, D. Hammerstrom, R. Karri, R. Kiehl, P. Kuekes, J. H. Lee, J. Li, X. Liu, J. Lukens, X. Ma, A. Mayr, C. A. Moritz, V. Patel, D. Resnick, N. Simonian, G. Snider, S. V. Sreenivasan, M. Stan, D. Stewart, D. Strukov, Z. Tan, W. Wang, R. Waser, R. S. Williams, T. Zhang
- financial support:
AFOSR, DOD, FCRP (via FENA Center), NSF
Literature: see the White Paper; more online: rsfq1.physics.sunysb.edu/~likharev/nano/
CMOS/NANO HYBRIDS: THE IDEA
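Nanowire crossbar: bottom nanowire level, top nanowire level, similar two-terminal nanodevices at each crosspoint.
Historic (first?) version: J. Heath, P. Kuekes, G. Snider, R. S. Williams (1998)
Basic current version: nanoelectronic add-on over a CMOS stack.
[Figure: nanodevice I-V curve (schematic), axes I and V, with ON and OFF states, thresholds +Vt and -Vt, and marked voltages V+ and V-.]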
RELATION TO DEVICE TECHNOLOGIES DISCUSSED TODAY
Nanoscale devices: applicability to CMOL/FPNI
- NEMS switches: 2-terminal switches OK
- Spin torque transfer devices: STT junctions may be OK
- Carbon-based devices: n/a
- Atomic / electrochemical metal switches: OK
- Collective spin devices: n/a
- Single electron devices: latching switch OK
+ Phase-change cells
+ Organic layers (with and w/o embedded clusters)
BISTABLE TWO-TERMINAL DEVICES (a.k.a. latching switches, a.k.a. programmable diodes)
Several material options demonstrated:
- polymers (with or w/o clusters)
- metal oxides (or sulfides)
- solid electrolytes
- amorphous silicon
- chalcogenides
- molecular SAMs
DC I-V curve (schematically):
[Figure: axes I and V; branches for state 1 and state 0; transitions 0 → 1 and 1 → 0; thresholds +Vt and -Vt; marked voltages V+ and V-.]
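The hysteretic I-V behavior described on this slide can be mimicked by a two-state model. The following is only a minimal sketch under stated assumptions: the switch is taken to turn ON above +Vt and OFF below -Vt, each state is treated as a plain resistor (the diode-like rectification implied by "programmable diode" is ignored), and the values of `v_t`, `r_on`, and `r_off` are hypothetical, not data from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class LatchingSwitch:
    """Idealized bistable two-terminal device (all parameter values are assumptions)."""
    v_t: float = 1.0     # switching threshold magnitude in volts (hypothetical)
    r_on: float = 1e5    # resistance in state 1 (ON) in ohms (hypothetical)
    r_off: float = 1e9   # resistance in state 0 (OFF) in ohms (hypothetical)
    state: int = 0       # 0 = OFF, 1 = ON

    def apply(self, v: float) -> float:
        """Apply voltage v, update the stored state, and return the current in amperes."""
        if v > self.v_t:
            self.state = 1       # assumed: large positive bias sets the ON state
        elif v < -self.v_t:
            self.state = 0       # assumed: large negative bias resets to OFF
        resistance = self.r_on if self.state else self.r_off
        return v / resistance


sw = LatchingSwitch()
sw.apply(1.5)              # write: exceeds +Vt, device latches ON
read_on = sw.apply(0.5)    # read below threshold: state is not disturbed
sw.apply(-1.5)             # erase: goes below -Vt, device latches OFF
read_off = sw.apply(0.5)
```

Reading at a voltage between -Vt and +Vt leaves the state untouched, which is the property that lets a crosspoint device serve as a memory cell or a programmable connection.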
A few examples:
Y.-S. Lai et al. (2005): Poly(N-vinylcarbazole)
L. Bolzano et al. (2004): Aluminum tris(8-hydroxyquinoline)
R. T. Weitz et al. (2006): Copper-2,3-dichloro-5,6-dicyano-p-benzoquinone
METAL-OXIDE LATCHING SWITCHES
“Memory effects” in oxides have been known for a while:
G. Dearnaley et al., Rep. Prog. Phys. (1970): a review with 150+ references
Just a few recent references (with the metal used):
- S. Seo et al., APL (2003): Ni
- B. J. Choi et al., JAP (2005): Ti
- H. Sim et al., Microel. Eng. (2005): Nb
- D. Lee et al., EDL (2005): Zr
- A. Chen et al., IEDM’05: Cu
- M. Kund et al., IEDM’05: Ag
- D. C. Kim et al., APL (2006): Nb
- N. Banno et al., IEICE TE (2006): Cu(S)
- T.-N. Fang et al., ICMTD’07: Cu
- L. Courtade et al., ICMTD’07: Ni
- W. Guan et al., APL (2007): Zr
- S.-W. Kim & Y. Nishi, NVMTS’07: Cu(S)
- D. Stewart, NVMTS’07: Ti
- K.-C. Liu et al., NVMTS’07: Hf
- D. Lee et al., APL (2007): Mo
[Figure: cumulative distribution (percentage, 2.28% to 97.7%) of current (μA, 0.005 to 10) in the ON and OFF states.]
With time, data are becoming more reproducible:
A. Chen et al. (IEDM’05)
Just a few examples:
B. J. Choi et al. (2005) D. Lee et al. (2007)
[Figure: I-V curve, current (A, -20 μ to 50 μ) versus voltage (V, -2 to 4), for a Cu / Cu2O / TE stack. Annotations: Ilim set by transistor Vg; OFF state; trap-filled-limit voltage $V_{TFL} = e N_t L^2 / 2\varepsilon$; ON state: SCLC with shallow traps.]
A. Chen et al. (2005)
RECENT RESULTS: Si / α-Si / M JUNCTIONS
Y. Dong et al., 2008
S. H. Jo and W. Lu, 2008
NANOWIRE CROSSBARS
crossbar with Fnano = 15 nm
J. Green et al. (2007) [Caltech + UCLA]
G.-Y. Jung et al. (2006) [HPL + Caltech]
W. Wu et al. (2005) [HPL]
ADVANCED LITHOGRAPHIES
crossbar with Fnano = 15 nm
J. Green et al. (2007)
www.zeiss.com
IMPRIO 1100 from Molecular Imprints, Inc. (“sub-50nm”)
Options shown: nanoimprint, EUV IL, block-copolymer
B. Wu and A. Kumar (2007)
www.almaden.ibm.com
“CMOL” INTERFACE CONCEPT (I)
[Figure labels: CMOS stack (just a cartoon); interface pins; gold nanowire levels (nanoimprint); MOSFET; nanodevices (latching switches); interface via (“pin”); Si wafer.]
K. L. (2004)
SILICON PIN ARRAYS (developed mostly for field emission)
Tip radii 2-10 nm
http://www.oxfordplasma.de/process/sibo_wtc.htm
http://my.ece.ucsb.edu/mishra/vacuummicroelec/progressb.0157.htm
Main challenge:
Move to the back end of the CMOS process flow (metals?)
POSSIBLE CMOL FABRICATION FLOW
[Figure: fabrication steps, panels (a) to (f).]
K. L. (2007a)
CMOL INTERFACE CONCEPT (II)
Most important feature: pin array tilt by angle α = arcsin(Fnano/FCMOS) = arctan(1/r)
Every nanowire (and hence every crosspoint) may be addressed from CMOS!
[Figure: pin geometry with pins 1, 2A, and 2B and dimensions 2FCMOS, 2Fnano, and 2rFnano.]
K. L. (2004, 2005); D. Strukov and K. L. (2006)
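The tilt-angle relation can be checked numerically. The sketch below assumes the arctan(1/r) form for an integer r and prints nothing; the function names are illustrative, not taken from the cited papers. It also compares arctan(1/r) with arcsin(1/r) for r = 10, since the two expressions on the slide agree only to small-angle accuracy.

```python
import math


def tilt_angle_deg(r: int) -> float:
    """Pin-array tilt angle arctan(1/r), in degrees (form assumed from the slide)."""
    return math.degrees(math.atan(1.0 / r))


def sin_of_tilt(r: int) -> float:
    """sin(alpha) for tan(alpha) = 1/r, i.e. 1 / sqrt(r^2 + 1)."""
    return 1.0 / math.sqrt(r * r + 1)


# Angles for a few integer values of r, rounded to one decimal place.
angles = {r: round(tilt_angle_deg(r), 1) for r in (10, 11, 12, 19, 26, 28, 30)}

# Small-angle comparison of the two forms for r = 10.
arcsin_deg = math.degrees(math.asin(1.0 / 10))
difference_deg = abs(arcsin_deg - tilt_angle_deg(10))
```

For r = 10 the two forms differ by less than 0.05 degrees, so either can be used for rough estimates; the exact geometry depends on how the CMOS cell pitch is defined.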
CMOL: YIELD WITHOUT ALIGNMENT
Shift along the top level:
fine fine bad? bad! fine fine
Shift along the bottom level:
fine fine bad! fine fine fine
Theoretical yield maximum: 100%
K. L. (2007)
RESISTIVE MEMORIES: ARCHITECTURE
D. Strukov and K. L. (2007a)
[Figure panels: Top-level structure; CMOL block; CMOS cell; Barrel shift decoder. Note on the slide: limited data granularity.]
[Labels, top-level structure: external address (block address, block row address, cell addresses); block address decoder; array of blocks; address control; mapping table; ECC unit; data I/O.]
[Labels, block and cell: memory cell array; select decoders; data decoders; address lines Acol1, Acol2, Arow1, Arow2 (Arow2a, Arow2b); select and data lines; data (r2 lines); Rpd; CMOL array data lines; barrel shifter; data in/out.]
RESISTIVE MEMORIES: EVALUATION RESULTS
D. Strukov and K. L. (2007a)
Bottom line:
- density up to 1 Tb/cm2 feasible (see below)
- speed, power OK
- defect tolerance acceptable (~10%)
Final results example: density and defect tolerance
[Figure: area per useful bit, a = A/N(FCMOS)2, versus fraction of bad nanodevices, q (10^-5 to 10^0), for FCMOS/Fnano = 10 and access times of 3, 10, 30, and 100 ns; ideal CMOS and ideal CMOL levels marked.]
Total chip area optimization:
[Figure: area per useful bit, a = A/N(FCMOS)2, versus CMOL array linear size, W (2^5 to 2^15). Curves: total, mapping table, cell decoder, control circuitry, redundant memory cells, useful memory cells. Parameters: n = 255, k = 147, FCMOS/Fnano = 10, g = 8, q = 0.01, Y = 0.9.]
Equivalent circuit for readout delay calculation:
[Figure labels: buffer, sense amplifier, Rsense, ROFF, RON, Rwire, D.]
RECONFIGURABLE LOGIC CIRCUITS
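D. Strukov and K.L. (2005)
(a) Generic CMOL fabric
[Figure labels: 2βFCMOS, 2βFCMOS, 2(r - 1), α. The accompanying expressions for cos α and sin α in terms of r, Fnano, and FCMOS did not survive extraction.]
(b) [Figure labels: input nanowire, output nanowire, CMOS row 1, CMOS row 2, CMOS column 1, CMOS column 2, CMOS inverter.]
(c) [Figure labels: inputs A and B, output F, nanodevices, CMOS inverter, pass transistor, VDD, RON, Rpass, Cwire.]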
CMOL FPGA CIRCUIT: EXAMPLE
D. Strukov and K.L. (2005)
(a) 32-bit Kogge-Stone adder…
[Figure labels: inputs a0, b0, a1, b1, …, a30, b30, a31, b31; outputs s0, s1, …, s30, s31; first-level cell: ai, bi → (gi0, pi0); prefix cell: (gil, pil), (gjl, pjl) → (gil+1, pil+1); output cell: (gil, pil), pi0, ci → si, with ci = gil+1.]
(b) …mapped on the CMOL fabric…
…before…
…and after reconfiguration (@ 50% of bad devices)
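For readers unfamiliar with the adder used as the example circuit, the generate/propagate recurrence of a Kogge-Stone network can be written compactly with bitwise operations. This is a generic software sketch of the textbook algorithm, not the CMOL mapping from the slide; the function name and the word-parallel formulation are choices made here for illustration.

```python
def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b            # level-0 generate bits: g_i = a_i AND b_i
    p = a ^ b            # level-0 propagate bits: p_i = a_i XOR b_i
    p0 = p               # keep level-0 propagate for the final sum
    shift = 1
    while shift < width:
        # Merge the span ending at bit i with the span ending at bit i - shift.
        g = (g | (p & (g << shift))) & mask
        p = (p & (p << shift)) & mask
        shift <<= 1      # spans double at every level: 1, 2, 4, 8, 16 for 32 bits
    carries = (g << 1) & mask   # carry into bit i = group generate of bits 0..i-1
    return p0 ^ carries


total = kogge_stone_add(123456789, 987654321)
wrapped = kogge_stone_add(0xFFFFFFFF, 1)
```

The loop runs log2(width) times, which is why the adder has a logic depth of only five prefix levels for 32 bits, at the price of dense wiring between levels.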
CMOL FPGA: RESULTS (I)
D. Strukov and K.L. (2005)
Defect tolerance:
[Figure: circuit yield Y (%) versus bad nanodevice fraction q, for the crossbar and the adder; curves for r' = 10 with r = 10, 11, 12, 13 and for r' = 17 with r = 17, 18, 19.]
Bottom line: yield > 99% for 22-25% (!) of bad devices
Performance (curves for FCMOS = 45 nm, 32 nm, 22 nm):
FCMOS = 32 nm, Fnano = 9 nm: A ≈ 110 μm2, 0.9 ns
CMOS FPGA with the same FCMOS: A ≈ 70,000 μm2, 1.7 ns
CMOL FPGA CAD 1.0
D. Strukov and K.L. (2006a)
First goal: Toronto 20 benchmark circuit set
Design flow (flowchart blocks):
- Input circuit (BLIF format)
- SIS: technology (NOR gate and latch) mapping
- Circuit pre-processing
- Defective cells (input)
- Initial value of N
- Heuristic placement
- Global router
- Branch conditions: countmax < T-N -∆; countmax > T-N; N = 0; otherwise
- Actions: increase N; decrease N; exit with success; exit without success
Latched CMOL fabric:
[Figure labels: tile boundary, latch cell, basic cell; dimensions 2Fnano and 2aFnano.]
Latch cell:
[Figure labels: in, out, CMOS latch; dimension 4FCMOS.]
Columns, in order:
- Circuit
- CMOS FPGA (FCMOS = 45 nm): Depth; LUTs; Array size (clusters); Area (μm2); Delay (ns)
- CMOL FPGA (FCMOS = 45 nm, Fnano = 4.5 nm, max fan-in = 7): Depth; CMOS cells; Array size (clusters); N; Nanodevices; Area (μm2); Delay (ns)
- Comparison: ACMOS/ACMOL; AnanoPLA/ACMOL
alu4 7 1274 19×19 137700 5.1 23 1854 22×22 5 9788 1004 4.0 137 0.28
apex2 8 1602 21×21 166050 6.0 26 1928 21×21 6 11365 914 4.6 182 3.09
apex4 6 1147 34×34 414619 5.5 19 1176 18×18 6 7781 672 3.6 617 0.58
bigkey 3 1810 22×22 193388 3.1 20 2065 20×20 6 10207 829 2.7 233 1.82
clma 16 6779 42×42 623194 13.1 75 7585 67×67 2 48746 9308 10.2 67 1.74
des 6 1263 19×19 148331 4.2 28 2321 23×23 6 12610 1097 4.5 135 3.21
diffeq 14 987 16×16 100238 6.0 73 2004 24×24 6 10799 1194 10.4 84 2.27
dsip 3 1362 19×19 148331 3.2 26 1615 20×20 7 9905 829 3.4 179 1.63
elliptic 18 2142 24×24 213638 8.6 81 4799 47×47 4 25415 4581 12.7 47 1.63
ex1010 8 4050 33×33 391331 9.0 43 2986 41×41 3 28746 3486 5.7 112 0.28
ex5p 7 950 16×16 100238 5.1 27 902 20×20 4 6875 829 4.3 121 0.19
frisc 23 2320 25×25 230850 11.3 114 4715 45×45 4 25869 4199 17.6 55 2.64
misex3 7 1178 18×18 124538 5.3 24 1397 22×22 4 9211 1004 3.6 124 0.56
pdc 9 3901 32×32 369056 9.6 54 4752 49×49 2 14841 4979 6.8 74 0.15
s298 15 1682 21×21 166050 10.7 45 1030 20×20 4 10161 829 8.1 200 1.33
s38417 11 4773 36×36 462713 7.3 52 8289 67×67 3 53156 9308 7.2 50 1.24
s38584 9 4422 35×35 438413 4.8 64 6502 69×69 3 50275 9872 8.8 44 -
seq 7 1427 20×20 151369 5.4 23 1832 25×25 4 11027 1296 4.0 117 1.15
spla 8 3331 30×30 326025 7.3 40 4240 38×38 3 24808 2994 5.8 109 0.12
tseng 13 781 14×14 78469 6.3 75 1866 24×24 6 4918 1194 11.5 66 2.48
CMOL FPGA: RESULTS (II): Toronto 20 benchmark circuit set
D. Strukov and K.L. (2006a)
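The ACMOS/ACMOL column is simply the ratio of the two area columns. As a quick cross-check, the sketch below recomputes it for four rows copied from the table; the dictionary layout and variable names are illustrative choices, not part of the original tool flow.

```python
# (CMOS FPGA area, CMOL FPGA area) in square micrometers, copied from the table.
rows = {
    "alu4": (137700, 1004),
    "apex2": (166050, 914),
    "clma": (623194, 9308),
    "tseng": (78469, 1194),
}

# Area advantage of CMOL over CMOS, rounded to the nearest integer as in the table.
ratios = {name: round(cmos / cmol) for name, (cmos, cmol) in rows.items()}
```

The recomputed values match the tabulated ratios for these rows (137, 182, 67, and 66).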
2009 2010 2011 2012 2013 Comments
Half-pitch FCMOS (nm) 50 45 40 36 32 Follows ITRS until 2013
Half-pitch Fnano (nm) 20 18 16 14 12 Mostly nanoimprint
Device density n (Giga/cm2) 63 77 98 128 174 Grows fast
Parameter r 10 10 10 11 11
Connectivity N 100 100 100 121 121 Barely changes
Interface rotation angle (°) 5.7 5.7 5.7 5.2 5.2
Nanowire segment L (μm) 4.0 3.6 3.2 2.8 3.0 Decreases slowly
Defect fraction q (%) 20 15 10 7 5 Improves fast
DIGITAL CMOL: PROSPECTS (I)
2016 2019 2022 2025 2028 Comments
Half-pitch FCMOS (nm) 30 28 26 24 22 Decreases very slowly
Half-pitch Fnano (nm) 10 6 4 3.5 3 EUV? Block-copolymers?
Device density n (Tera/cm2) 0.25 0.70 1.0 2.0 2.8 Unprecedented density reached
Parameter r 12 19 26 28 30 Increases substantially
Connectivity N 144 361 676 784 900 Increases fast
Interface rotation angle (°) 4.8 3.0 2.2 2.0 1.9 Decreases
Nanowire segment L (μm) 2.8 4.3 5.4 5.4 5.4 Increases slowly
Defect fraction q (%) 3 1 0.3 0.1 0.03 Improves slower
K. L. and D. Strukov (2007)
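Several rows of the two tables appear to follow from simple relations: connectivity N = r^2, and a device density of one crosspoint per (2Fnano)^2 of area. Both relations are assumptions made here for a sanity check, not formulas stated on the slide. The density relation reproduces most tabulated entries but not all of them (for Fnano = 4 nm it gives about 1.6 Tera/cm2 rather than the tabulated 1.0).

```python
NM_PER_CM = 1e7  # nanometers in one centimeter


def device_density_per_cm2(f_nano_nm: float) -> float:
    """Crosspoint density, assuming one device per (2 * Fnano)^2 of area."""
    pitch_cm = 2.0 * f_nano_nm / NM_PER_CM
    return 1.0 / (pitch_cm * pitch_cm)


def connectivity(r: int) -> int:
    """Connectivity N, assuming N = r^2."""
    return r * r


density_2013_giga = device_density_per_cm2(12) / 1e9    # table: 174 Giga/cm2
density_2028_tera = device_density_per_cm2(3) / 1e12    # table: 2.8 Tera/cm2
connectivity_values = [connectivity(r) for r in (10, 11, 12, 19, 26, 28, 30)]
```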
DIGITAL CMOL: PROSPECTS (II)
Metrics (units) 2009 2010 2011 2012 2013 Comments
Half-pitch FCMOS (nm) 50 45 40 36 32 In accordance with ITRS
Half-pitch Fnano (nm) 20 18 16 14 12 -
CMOS memories (Gbits/cm2) 6.7 8.2 10.5 13 16 Follows ITRS (with A = 6F2CMOS)
CMOL memories (Gbits/cm2) 4 10 23 36 67 Initial progress impacted by q
CMOS FPGA (Mgates/cm2) 0.4 0.5 0.6 0.8 1.0 Rescaled from 0.18 μm rules
CMOL FPGA (Mgates/cm2) 625 775 1,000 1,200 1,500 -
Metrics (units) 2016 2019 2022 2025 2028 Comments
Half-pitch FCMOS (nm) 30 28 26 24 22 Grows slower than in ITRS
Half-pitch Fnano (nm) 10 6 4 3.5 3 -
CMOS memories (Gbits/cm2) 18 21 25 29 35 Follows A = 6F2CMOS
CMOL memories (Gbits/cm2) 100 350 900 1,200 1,700 Spectacular progress at lower q
CMOS FPGA (Mgates/cm2) 1.1 1.3 1.5 1.7 2.1 Rescaled from 0.18 μm rules
CMOL FPGA (Mgates/cm2) 1,700 2,000 2,300 2,700 3,200 -
K. L. and D. Strukov (2007)
SAMPLE DSP TASK: CONVOLUTION (e.g., for FPA image processing)
$T_{x,y} = \sum_{i=0}^{F-1} \sum_{j=0}^{F-1} \varphi_{i,j} \, S_{x+i,\,y+j}, \quad x, y \in [0, N - F]$
[Figure: N × N input plane S, F × F φ-window, output pixel T.]
Parameters selected for our estimates:
- N = 1,024
- F = 32 (i.e. << N)
- Accuracy: nS = nφ = 12
Demands to hardware:
- Add-multiplies: F2N2 ≈ 10^9 per frame
- CMOS μ-processor: ~ 100 ms per frame
D. Strukov and K. L. (2007b)
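The task can be stated in a few lines of software. The sketch below assumes the usual windowed sum, with each output pixel T[x][y] equal to the sum of φ[i][j] · S[x+i][y+j] over the F × F window, and restricts outputs to positions where the window fits inside the input; the function names are illustrative. It also reproduces the operation count quoted on the slide.

```python
def convolve_window(s, phi):
    """Direct windowed sum of an N x N input s with an F x F window phi."""
    n, f = len(s), len(phi)
    out_size = n - f + 1                 # positions where the window fits entirely
    t = [[0] * out_size for _ in range(out_size)]
    for x in range(out_size):
        for y in range(out_size):
            acc = 0
            for i in range(f):
                for j in range(f):
                    acc += phi[i][j] * s[x + i][y + j]   # one add-multiply
            t[x][y] = acc
    return t


def add_multiplies(n: int, f: int) -> int:
    """Operation count per frame as estimated on the slide: F^2 * N^2."""
    return f * f * n * n


small = convolve_window([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
ops_per_frame = add_multiplies(1024, 32)
```

For N = 1,024 and F = 32 the count is 2^30, about 1.07 × 10^9, consistent with the order of magnitude quoted above.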
TWO NEW CELLS:
Control cell and new (programmable) latch:
[Figure labels: in, out, clk; inputs IW, IN, IE, IS; outputs OW, ON, OE, OS; control signals CW, ~CW, CE, ~CE, CS, ~CS, CN, ~CN; added CMOS line for control logic; output pin; input pin (not used); gnd; Vdd; dimension 8 FCMOS.]
Footprint: (3×8 FCMOS)2
[Figure: datapath and layout. Blocks: 12-bit Wallace tree multiplier (partial product generation and reduction), 32-bit Kogge-Stone adder, multiplexers. Signals: S (12 bits), φ (12 bits), M (24 bits), T (32 bits); control signals cT, cM, cA, cB. Layout legend: multiplier, adder, multiplexer, other; control cell, programmable latch cell, basic cell (each marked used or not used).]
ARCHITECTURE AND PERFORMANCE
Calculated performance for N = 1,024, F = 32, n = 12 bits: ~ 25 μs, vs. ~ 3,500 μs for CMOS (per frame)
D. Strukov and K. L. (2007b)
NEUROMORPHIC NETWORKS (“CROSSNETS”)
Basic idea: CMOS “somas” + nanowire “axons” and “dendrites” + nanodevice “synapses”
Generic structure of a feedforward CrossNet:
[Figure: somas j and k connected through nanowires and synapses jk+ and jk-; synaptic weights wjk = {-1, 0, +1}.]
$x_i = \sum_j w_{ij} y_j$
S. Fölling et al. (2001)
Ö. Türel et al. (2004)
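The feedforward operation can be sketched in software as a weighted sum with ternary weights. In the example below each soma input is the sum of w · y over its synapses with weights restricted to -1, 0, and +1; the sign-type activation in `soma_output` is an assumption made for illustration, since the slide does not specify the soma transfer function.

```python
def crossnet_layer(weights, y):
    """Soma inputs x_i = sum_j w_ij * y_j, with ternary synaptic weights."""
    for row in weights:
        for w in row:
            if w not in (-1, 0, 1):
                raise ValueError("synaptic weights must be -1, 0, or +1")
    return [sum(w * y_j for w, y_j in zip(row, y)) for row in weights]


def soma_output(x: int) -> int:
    """Hypothetical sign-type activation (an assumption, not from the slide)."""
    return 1 if x >= 0 else -1


weights = [[1, -1, 0],
           [-1, 0, -1]]
inputs = [1, -1, 1]
soma_inputs = crossnet_layer(weights, inputs)
outputs = [soma_output(x) for x in soma_inputs]
```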
(@ 33 nanodevices per synapse, Fnano = 3 nm, connectivity 10^4):
Synapse footprint: ~ 500 nm2
Synapse density: ~ 2×10^11 cm-2 (> 10^12 bits/cm2)
Neural cell density: ~ 5×10^7 cm-2 (cf. 1.5×10^7 cm-2 in bio)
Intercell latency: ~ 20 ns @ 100 W/cm2 (R ~ 10^10 Ω), or ~ 2,000 ns @ 1 W/cm2 (R ~ 10^12 Ω) (cf. ~10 ms in bio)
CMOL is the first hardware capable of challenging the human cerebral cortex
CROSSNETS: PERFORMANCE ESTIMATES
Ö. Türel et al. (2004)
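Two of these estimates can be cross-checked with simple arithmetic: the synapse density follows from the footprint, and the two latency figures correspond to the same energy per unit area, since the product of power density and latency is the same in both cases. The sketch below is only that arithmetic check; the variable names are illustrative.

```python
NM2_PER_CM2 = 1e14  # square nanometers in one square centimeter


def density_per_cm2(footprint_nm2: float) -> float:
    """Areal density of elements that each occupy footprint_nm2."""
    return NM2_PER_CM2 / footprint_nm2


synapse_density = density_per_cm2(500.0)       # per cm^2, from the ~500 nm^2 footprint

# Power density (W/cm^2) times latency (s) for the two quoted operating points.
energy_fast = 100.0 * 20e-9      # J/cm^2 at 100 W/cm^2 and 20 ns
energy_slow = 1.0 * 2000e-9      # J/cm^2 at 1 W/cm^2 and 2,000 ns
```

Both operating points give 2 μJ/cm2 per intercell step, so the slower mode trades speed for power at fixed energy.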
From: T. Hylton (DARPA), March 2008
ANOTHER CMOL SPECIES: HPL’s FPNI
G. Snider and R. S. Williams (2007)
WEI WANG’s “3D CMOL”
D. Tu et al. (2007)
Features:
- simpler interface pins
- half the area
SINGLE-ELECTRON LATCHING SWITCH:
POSSIBLE MOLECULAR IMPLEMENTATION
[Figure: molecular structure, with R = hexyl (C6H13-).]
- naphthalenediimide group as a single-electron transistor island
- perylenediimide group as a single-electron trap island
- OPE bridges as tunnel junctions
- non-conducting support group
- isocyanide attachment group
Andreas Mayr (SBU) in: K. Likharev et al. (2003)
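NDR IN MOLECULAR-SCALE SETs
Theoretical result: N. Simonian, J. Li, and K. L. (2007)
Experiment: S. Khondaker et al. (2004)
NDR effect: unexpected, but in hindsight, natural:
$I = e \, \frac{\Gamma^{(s)} \Gamma^{(d)}}{\Gamma^{(s)} + \Gamma^{(d)}}, \qquad \Gamma^{(s,d)} \propto \exp\left\{ -\frac{2}{\hbar} \int_{(s,d)} \{2m[U(x) - E]\}^{1/2} \, dx \right\}$
i.e. current is determined by the highest barrier (giving the lowest tunneling rate)
[Figure: barrier profiles at Vsd = 0 and at eVsd ~ U.]
At eVsd ~ U, one barrier is suppressed and the other enhanced, so the current drops: NDR!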
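The statement that the slowest junction sets the current can be illustrated with the standard expression for sequential tunneling through two junctions in series, I = e Γs Γd / (Γs + Γd). The sketch below assumes that expression, and the tunneling rates in it are hypothetical numbers chosen only to show the trend, not values from the cited work.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def series_current(rate_s: float, rate_d: float) -> float:
    """Current through two tunnel junctions in series (sequential tunneling)."""
    return E_CHARGE * rate_s * rate_d / (rate_s + rate_d)


# Hypothetical rates (1/s): equal barriers at low bias...
i_low_bias = series_current(1e9, 1e9)
# ...then one barrier suppressed (faster rate) and the other enhanced (slower rate).
i_high_bias = series_current(1e10, 1e7)

# The slower junction dominates: the current is close to e times the smaller rate.
bottleneck_ratio = i_high_bias / (E_CHARGE * 1e7)
```

Although one rate increased tenfold, the total current fell by a factor of about 50 because the other rate dropped, which is the mechanism behind the negative differential resistance.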
NDR CMOL CONCEPT
[Figure: Goto pair and its I-V curve; axes I and V with marks 0, Vt, and VDD; operating points 1, 2, 3; terminals VCLK and VGND.]
[Figure: four-phase clocking; clock phases 1 to 4 with states REF, EVAL, OFF, OFF; time mark t0.]
[Figure: logic gates formed by upper-layer and lower-layer nanowires; signals VIN1, VIN2, VBIAS, VOUT, VCLK1, VCLK2, Vgnd; points 1, 2, 3 in the REF and EVAL states.]
[Figure: latching to Boolean “1”; I (×10^-5, -1.5 to 1.5) versus Vout (0 to 1).]
D. Strukov and K.L. (2007c)
NDR CMOL (preliminary results)
Pros:
- Different logic gates
- Small cell area (×3↑)
- Nanoscale latch (×1.5↑)
Cons:
- Low fan-in
- Dual rail logic (×2↓)
- Low fan-out (×1.15↓)
- Pipeline buffers (×1.25↓)
Columns, in order: Circuit; CMOL FPGA: Cells, Area (μm2); CMOL NDR: Cells, Area (μm2)
alu4 3902 749 9384 389
apex2 4447 830 11516 478
apex4 3027 531 7525 312
bigkey 4725 672 14334 594
clma 26859 6272 86292 3579
des 5422 1004 12872 534
diffeq 5540 830 12524 519
dsip 3975 600 13174 546
elliptic 17443 2399 29958 1242
ex1010 9862 1745 32080 1330
ex5p 3036 531 7128 296
frisc 17646 2542 28758 1193
misex3 3283 600 9192 381
pdc 19260 3488 50532 2096
s298 2552 467 14633 607
s38417 41576 6277 80862 3354
s38584 29009 4202 59222 2456
seq 4902 915 11832 491
spla 16764 2996 44622 1851
tseng 5911 830 10164 422
Toronto 20 benchmark set
Early summary (compared to CMOL FPGA):
- Area: comparable
- Delay: comparable
- Defect tolerance: comparable
- Data throughput: much better
Better architecture possible?
D. Strukov and K.L. (2007c)
CONCLUSIONS
CMOS/Nano Hybrids:
- possibly, the only way to go beyond the conventional lithography limits
- realistic components, demos above ~15 nm
Possible Impact:
- extending Moore’s Law for 10 to 15 years beyond ITRS 32 nm point
- eventually, first challenge to the cerebral cortex
Hardware Issues:
- Fnano > 10 nm: integration / reproducibility
- Fnano < 10 nm: everything:
  - devices (SAM?)
  - patterning (EUV IL? block-copolymer?)
  - back-end-compatible pins
Software Issues:
- ASIC performance (vs CMOS)
- tolerance to various defects
- advanced information processing tasks and methods
Need better CAD tools.