Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Integrated Circuit Design with Integrated Circuit Design with
NEM RelaysNEM Relays
Fred ChenFred Chen22, Hei Kam, Hei Kam11, Dejan Markovic, Dejan Markovic33, ,
TsuTsu--Jae KingJae King LiuLiu11, Vladimir Stojanovic, Vladimir Stojanovic22, Elad Alon, Elad Alon11
11Department of EECS, University of California, Berkeley, CADepartment of EECS, University of California, Berkeley, CA22Department of EECS, Massachusetts Institute of Technology, Cambridge, MADepartment of EECS, Massachusetts Institute of Technology, Cambridge, MA
33EE Department, University of California, Los Angeles, CAEE Department, University of California, Los Angeles, CA
10.110.1
100
101
102
103
104
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
CMOS is Scaling, Power Density is NotCMOS is Scaling, Power Density is Not
VVdddd and and VVtt not scaling well not scaling well power/area not scalingpower/area not scaling
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 22
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar
Sun’s Surface
Power Density Prediction circa 2000
10.110.1
CMOS is Scaling, Power Density is NotCMOS is Scaling, Power Density is Not
VVdddd and and VVtt not scaling well not scaling well power/area not scalingpower/area not scaling
Parallelism to improve performance within power budgetParallelism to improve performance within power budget
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 33
100
101
102
103
104
0
20
40
60
80
100
No
rmalized
En
erg
y/o
p
1/throughput (ps/op)
Operate at a
lower energy
point
Run in parallel to
recoup performance
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar
Sun’s Surface
Power Density Prediction circa 2000
Core 2
10.110.1
101
102
103
104
105
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
Where Parallelism Doesn’t HelpWhere Parallelism Doesn’t Help
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 44
CMOS circuits have wellCMOS circuits have well--defined minimum energydefined minimum energy
Caused by leakage and finite sub-threshold swing
Need to balance leakage and active energy
Limits energy-efficiency, regardless how slow the circuit runs
Energy/op vs. Vdd
0.1 0.2 0.3 0.4 0.5
5
10
15
20
25
No
rmalized
En
erg
y/c
ycle
Vdd (V)
Etotal
Edynamic
Eleak
Energy/op vs. 1/throughput
Scale Vdd & VT:
10.110.1
What if There Was No Leakage?What if There Was No Leakage?
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 55
No longer need to balance increasing component of No longer need to balance increasing component of energy as energy as VddVdd decreasesdecreases
Relays offer near infinite subRelays offer near infinite sub--threshold slope and threshold slope and no no
leakage currentleakage current
0.0
0.2
0.4
0.6
0.8
1.0
1.2
34 35 36 37
Measured relay I-V
VGB (V)
I DS
(mA
) compliance limit
VDS = 1V
W x L = 2mm x 20mm
H = g = 0.6mm
Normalized Energy/cycle in CMOS
0.1 0.2 0.3 0.4 0.5
5
10
15
20
25
No
rma
lize
d E
ne
rgy
/cyc
le
Vdd (V)
Etotal
Edynamic
10.110.1
OutlineOutline
If we could If we could reliably reliably build relays, would circuits build relays, would circuits
implemented using relays offer implemented using relays offer superior superior
energyenergy--efficiencyefficiency to CMOS?to CMOS?
Compact Circuit Model for NEM RelaysCompact Circuit Model for NEM Relays
Fast yet key functionality captured
Circuit Design Paradigms for NEM RelaysCircuit Design Paradigms for NEM Relays
How to deal with ―slow‖ relays
Compare vs. CMOSCompare vs. CMOS
Digital Logic: 32-bit adder
Mixed Signal I/O: DAC, ADC
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 66
10.110.1 ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 77
Plan-View
Schematic Cross-Section
L
W
Body Electrode(under body
dielectric)
DrainSource
Channel
Contact Dimples
Electrical
Gate
Anchor
NEM Relay Structure & OperationNEM Relay Structure & Operation
Metal Channel Gate DielectricElectrical Gate
Source Drain
Bodyody
Body Dielectric
H
HSource Drain
Body
Body Dielectric
t gap
Htdimple
W
Air gap
Infinite off-state resistance Zero off-state leakage
Open Relay (“OFF”):
|Vgb| < Vpo (pull-out voltage)
Closed Relay (“ON”):
|Vgb| > Vpi (pull-in voltage)
Relatively low on-state resistance (100Ω < Ron < 1kΩ @ 90nm)
10.110.1
NEM Relay ModelNEM Relay Model
Compact Compact VerilogVerilog--A model for circuit simulation:A model for circuit simulation:
Mechanical dynamics: spring (k), damper (b), mass (m)
Electrical parasitics: non-linear gate-body (Cgb), gate-channel
(Cgc), and source/drain-body cap (Csd_b), contact resistance (Rcs,d)
VVpipi and delay verified against device measurementsand delay verified against device measurements
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 88
0.5Rc
RsdRsdCgb
0.5Rc
RcdRcs
DS
Cgc
So DoCsd_bCsd_b
B
Go
C
Cs Cd
Anchor
Spring k Damper b
Mechanical Model
Mass m
10.110.1
NEM Relay ScalingNEM Relay Scaling
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 99
Constant EConstant E--field scaling has similar benefits as CMOSfield scaling has similar benefits as CMOS
Pull-in Voltage with Beam Scaling
Measured pullMeasured pull--in voltages scale linearlyin voltages scale linearly
If we can overcome reliability issues & surface forces scale:
{W,L,tgap} = {90nm,2.3um,10nm} Vpi = 200mV
Mechanical delay also scales linearly Mechanical delay also scales linearly (~10ns @90nm)(~10ns @90nm)
10.110.1
NEM Relay as a Logic ElementNEM Relay as a Logic Element
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1010
44--terminal design mimics MOSFET operationterminal design mimics MOSFET operation Electrostatic actuation is Electrostatic actuation is ambipolarambipolar
NonNon--inverting logic possible: actuation independent of inverting logic possible: actuation independent of drain/source voltagesdrain/source voltages
Mechanical delay (~10ns) >> electrical Mechanical delay (~10ns) >> electrical tt (~1ps)(~1ps) Can stack 200 devices (with 1kW on-resistance) before
electrical delay is comparable to mechanical delay
10.110.1
Digital Circuit Design with RelaysDigital Circuit Design with Relays
CMOS: delay set by electrical time constantCMOS: delay set by electrical time constant
Quadratic delay penalty for stacking devices (pass transistor logic)
Buffer & distribute logical/electrical effort over many stages
Relays: delay dominated by mechanical movementRelays: delay dominated by mechanical movement
Want all relays to switch simultaneously
Implement logic as a single complex gate
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1111
10.110.1
Digital Circuit Design with RelaysDigital Circuit Design with Relays
Delay Comparison vs. CMOSDelay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable loads, relay delay unaffected by fan-out/fan-in
Area Comparison vs. CMOSArea Comparison vs. CMOS
Larger individual devices
Fewer devices needed to implement the same logic function
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1212
4 gate delays 1 mechanical delay
10.110.1
101
102
103
104
105
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
CMOS vs. Relays: Digital LogicCMOS vs. Relays: Digital Logic
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1313
Most processor components exhibit the energy vs. Most processor components exhibit the energy vs.
performance tradeoff of static CMOSperformance tradeoff of static CMOS
Control, Control, DatapathDatapath, Clock, Clock
Adder energy performance tradeoff is representativeAdder energy performance tradeoff is representative
10.110.1
RelayRelay--based Adderbased Adder
Full adder cell:Full adder cell:
12 relays vs. 24
transistors
XOR for free
Use fully complementary
signals to avoid
additional mechanical
delay (to invert)
Relays all sized Relays all sized
minimallyminimally
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1414
10.110.1
NN--bit Relaybit Relay--based Adderbased Adder
Ripple carry Ripple carry
configurationconfiguration
Cascade full adder Cascade full adder
cells to create larger cells to create larger
complex gatecomplex gate
Stack of N relaysStack of N relays——
still 1 mechanical still 1 mechanical
delaydelay
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1515
10.110.1
Snapshot: NEM Relays vs. CMOSSnapshot: NEM Relays vs. CMOS
32-bit adder CMOS Relay
Supply
Voltage
0.5 V 0.32 - 0.9 V
Load Cap
per Output
25 fF 25 fF
Total Gate
Cap
4.0 pF 125 fF
Area 600 mm2 480 mm2
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1616
Compare vs. Compare vs. SklanskySklansky CMOS adderCMOS adder11
For similar area: For similar area:
>9x lower energy/op, >10x greater delay>9x lower energy/op, >10x greater delay
1D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).
9x
10x
Energy/op vs. Delay/op across Vdd
Energy is for adder only
10.110.1
EnergyEnergy--Delay ResultsDelay Results
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1717
Energy/op vs. Delay/op across Vdd & CL
30x capacitance gain30x capacitance gain
Lower device Cg, Cd
Fewer devices
2.4x 2.4x VddVdd gaingain
No leakage energy
Relays always provide a more energy efficient solutionRelays always provide a more energy efficient solution
> 10x better, even in the GOPS range
10.110.1
Area vs. Throughput TradeoffArea vs. Throughput Tradeoff
For high throughputs, relays require area overhead to For high throughputs, relays require area overhead to
achieve similar throughputs as CMOSachieve similar throughputs as CMOS
Area overhead is bounded (max. 6xArea overhead is bounded (max. 6x--25x) : 25x) :
To maintain minimum energy/op for throughputs > 1GOPS,
CMOS needs parallelism too
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1818
0.5 1 1.5 2 2.5 3
5
10
15
20
25
30
Throughput (GOPS)
AR
ela
y/A
CM
OS
W Relay (buffered, 25fF)
W Relay (buffered, 100fF)
Au Relay (25fF)
Au Relay (100fF)
Erelay = 20fJ/op
10.110.1
CMOS vs. Relays: Mixed SignalCMOS vs. Relays: Mixed Signal
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 1919
I/O energy dominant if core is ~30x more efficientI/O energy dominant if core is ~30x more efficient
See if I/O circuits can benefit as wellSee if I/O circuits can benefit as well
Representative examples: DAC & ADCRepresentative examples: DAC & ADC
How to process/generate analog signals?How to process/generate analog signals?
No ―linear gain‖ in relays No ―linear gain‖ in relays –– rely on switchingrely on switching
10.110.1
NEM Relay Based DACNEM Relay Based DAC
Same topology as CMOS except:Same topology as CMOS except:
Need to add passive R to set impedance
Don’t have to buffer up or size relays to drive output
Relays’ gate voltage independent of I/O voltage
Energy dominated by I/O energy:Energy dominated by I/O energy:
@VIO=200mV, CL=1pF: 62.8fJ/bit vs. ~780fJ/bit for CMOS
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2020
Voltage OutputDAC Architecture AC Impedance
2
2BIT IO LE V C
10.110.1
NEM Relay Based Flash ADCNEM Relay Based Flash ADC
Topologically similar to CMOSTopologically similar to CMOS
Sample/Hold input
Flash Converter – bank of
comparators
Key differences…Key differences…
Sample & Hold: Unbalanced ―on‖ vs.
―off‖ switching times
Comparators: Body terminal used as
threshold, ambipolar actuation,
hysteretic switching
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2121
10.110.1
Body terminal used to set Body terminal used to set
thresholdthreshold
Hysteretic switching impactHysteretic switching impact
Comparator will have different
threshold if previous state varies
Need to reset all comparators to
the same state before each
evaluation dynamic logic
Relay Based ComparatorsRelay Based Comparators
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2222
VT
10.110.1
AmbipolarAmbipolar actuation impactactuation impact
Comparators above & below input will evaluate need a different
decoder
Each comparator has a ―deadEach comparator has a ―dead--zone‖ where it stays offzone‖ where it stays off
Can use ―Max‖ or ―Min‖ of this range to determine code
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2323
-0.2 0 0.2 0.4 0.6 0.8 10
10
20
30
40
50
60
Input Voltage (V)
Ou
tpu
t C
od
e
Max
Min
Relay Based ComparatorsRelay Based Comparators
VT
10.110.1
ADC ResultsADC Results
Energy dominated by resistor string (320fJ out of 350fJ)Energy dominated by resistor string (320fJ out of 350fJ)
Largely dependent on input dynamic range (VREF), not core voltage
FOM improves with more resolution as resistor string still
dominates
Most advantages stem from lower RMost advantages stem from lower Ronon at smaller Cat smaller Cgg’s’s
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2424
Vcore = 0.3V
C = 500fF
R = 4kW
VREF = 1V
fsamp = 10 MS/s
6-bit res: 5.5fJ/conv
10.110.1
ConclusionsConclusions
NEM relays offer unique characteristics NEM relays offer unique characteristics
Superior Ion/Ioff ratio vs. CMOS
Superior Ron*Cg vs. CMOS
Switching delay largely independent of electrical t
Relay circuits show Relay circuits show order of magnitudeorder of magnitude energy energy
efficiency improvements over CMOSefficiency improvements over CMOS
Use parallelism (and area) to achieve comparable
throughputs in digital circuits
Mixed signal circuits see similar benefits from low Ron at
small Cgs
Key challenge is scaling devices reliablyKey challenge is scaling devices reliably
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2525
10.110.1
AcknowledgementsAcknowledgements
Berkeley Wireless Research CenterBerkeley Wireless Research Center
NSFNSF
DARPADARPA
FCRP (C2S2)FCRP (C2S2)
Intel Foundation Graduate FellowshipIntel Foundation Graduate Fellowship
ICCAD 2008 ICCAD 2008 –– November 12, 2008November 12, 2008 2626