August 9, 2006 Agrawal: VDAT'06 Tutorial II 1
Low-Power Electronics and Systems
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, USA
http://www.eng.auburn.edu/[email protected]
Contents
• Introduction
• Dynamic power
– Short circuit power
– Reduced supply voltage operation
– Glitch elimination
• Static (leakage) power reduction
• Low power systems
– State encoding
– Processor and multi-core design
• Books on low-power design
Introduction
Why is it a concern?
Power Consumption of VLSI Chips
ISSCC, Feb. 2001, Keynote
“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now.
“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”
Patrick P. Gelsinger, Senior Vice President, General Manager, Digital Enterprise Group, INTEL CORP.
VLSI Chip Power Density
[Figure: power density (W/cm², log scale from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: Intel]
Meaning of Low-Power Design
• Design practices that reduce power consumption by at least one order of magnitude; in practice a 50% reduction is often acceptable.
• General considerations in low-power design
– Algorithms and architectures
– High-level and software techniques
– Gate and circuit-level methods
– Power estimation techniques
– Test power
Topics in Low-Power
• Power dissipation in CMOS circuits
• Device technology
– Low-power CMOS technologies
– Energy recovery methods
• Circuit and gate level methods
– Logic synthesis
– Dynamic power reduction techniques
– Leakage power reduction
• System level methods
– Microprocessors
– Arithmetic circuits
– Low power memory technology
• Test power
• Power estimation methods and tools
Power in a CMOS Gate
[Figure: CMOS gate connected between VDD and Ground, drawing supply current iDD(t)]
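Power Dissipation in CMOS Logic (0.25µ)
\[ P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage} \]
Approximate shares of the three terms: 75% (switching), 20% (short-circuit) and 5% (leakage).
[Figure: CMOS gate with supply VDD driving load capacitance CL]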
Power and Energy
• Instantaneous power (Watts)
\[ P(t) = i_{DD}(t)\, V_{DD} \]
• Peak power (Watts)
\[ P_{peak} = \max \{ P(t) \} \]
• Average power (Watts)
\[ P_{av} = \frac{1}{T} \int_0^T P(t)\, dt \]
• Energy (Joules)
\[ E = \int_0^T P(t)\, dt \]
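The four definitions above can be evaluated directly from a sampled supply-current waveform. The following is a minimal sketch, assuming uniformly spaced samples and trapezoidal integration; the function name `power_metrics` and its parameters are illustrative and do not come from the tutorial.

```python
def power_metrics(i_dd, v_dd, dt):
    """Return (peak power, average power, energy) for supply-current samples.

    i_dd: supply current samples in amperes, uniformly spaced by dt seconds
    v_dd: supply voltage in volts (assumed constant)
    """
    # Instantaneous power P(t) = iDD(t) * VDD at every sample
    p = [i * v_dd for i in i_dd]
    # Energy E = integral of P(t) dt, approximated with the trapezoidal rule
    energy = sum(0.5 * (p[k] + p[k + 1]) * dt for k in range(len(p) - 1))
    total_time = dt * (len(p) - 1)
    # Average power is the energy divided by the observation interval T
    return max(p), energy / total_time, energy


peak, average, energy = power_metrics([0.0, 1e-3, 2e-3, 1e-3, 0.0], v_dd=2.5, dt=1e-9)
```

Peak power matters for supply noise, whereas average power and energy determine heat and battery life, which is why the three are reported separately.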
Low-Power Design Techniques
• Circuit and gate level methods
– Reduced supply voltage
– Adiabatic switching and charge recovery
– Logic design for reduced activity
– Reduced glitches
– Transistor sizing
– Pass-transistor logic
– Pseudo-nMOS logic
– Multi-threshold gates
Low-Power Design Techniques
• Functional and architectural methods
– Clock suppression
– Clock frequency reduction
– Supply voltage reduction
– Power down
– Algorithmic and software methods
Test Power
• Power grid on a VLSI chip is designed for certain current capacity during functional operation:
– Average current → heat dissipation
– Peak current → noise, ground bounce
• Problem – Tests like scan or BIST are nonfunctional and may cause higher circuit activity than functional operation does; a functionally good chip can fail the test.
Power Estimation Methods
• Spice: Accurate but expensive
• Logic-level
– Event-driven simulation
– Statistical
– Probabilistic
• High-level: Hierarchical
Components of Power
• Dynamic
– Signal transitions
• Logic activity
• Glitches
– Short-circuit
• Static
– Leakage
\[ P_{total} = P_{dyn} + P_{stat} = P_{tran} + P_{sc} + P_{stat} \]
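Power of a Transition: Ptran
[Figure: CMOS inverter between VDD and Ground with input vi(t) and output vo(t); one transistor is on (resistance Ron), the other is off (R = large), and current ic(t) flows into the load capacitance CL]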
Charging of a Capacitor
[Figure: voltage source V charging capacitor C through resistance R; the switch closes at t = 0, current i(t), capacitor voltage v(t)]
Charge on capacitor, q(t) = C v(t)
Current, i(t) = dq(t)/dt = C dv(t)/dt
\[ i(t) = C \frac{dv(t)}{dt} = \frac{V - v(t)}{R} \]
\[ \frac{dv(t)}{dt} = \frac{V - v(t)}{RC} \]
\[ \int \frac{dv(t)}{V - v(t)} = \int \frac{dt}{RC} \]
\[ \ln [V - v(t)] = \frac{-t}{RC} + A \]
Initial condition, t = 0, v(t) = 0 → A = ln V
\[ v(t) = V \left[ 1 - \exp\left( \frac{-t}{RC} \right) \right] \]
\[ i(t) = C \frac{dv(t)}{dt} = \frac{V}{R} \exp\left( \frac{-t}{RC} \right) \]
Total Energy Per Charging Transition from Power Supply
\[ E_{trans} = \int_0^\infty V\, i(t)\, dt = \int_0^\infty \frac{V^2}{R} \exp\left( \frac{-t}{RC} \right) dt = C V^2 \]
Energy Dissipated per Transition in Resistance (R) of “On” Transistors
\[ R \int_0^\infty i^2(t)\, dt = R\, \frac{V^2}{R^2} \int_0^\infty \exp\left( \frac{-2t}{RC} \right) dt = \frac{1}{2} C V^2 \]
Energy Stored in Charged Capacitor
\[ \int_0^\infty v(t)\, i(t)\, dt = \int_0^\infty V \left[ 1 - \exp\left( \frac{-t}{RC} \right) \right] \frac{V}{R} \exp\left( \frac{-t}{RC} \right) dt = \frac{1}{2} C V^2 \]
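The three integrals can be checked numerically. The sketch below integrates the RC charging waveforms with the midpoint rule over 20 time constants; the function name, step count and component values are assumptions chosen for illustration only.

```python
import math


def charging_energies(c, r, v, steps=200_000, horizon=20.0):
    """Numerically integrate supply, resistor and capacitor energies.

    The integration runs from t = 0 to `horizon` time constants (RC),
    which stands in for the upper limit of infinity.
    """
    tau = r * c
    dt = horizon * tau / steps
    e_supply = e_resistor = e_capacitor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                  # midpoint of the k-th interval
        decay = math.exp(-t / tau)
        i = (v / r) * decay                 # i(t) = (V/R) exp(-t/RC)
        v_cap = v * (1.0 - decay)           # v(t) = V [1 - exp(-t/RC)]
        e_supply += v * i * dt              # energy drawn from the supply
        e_resistor += r * i * i * dt        # energy dissipated in R
        e_capacitor += v_cap * i * dt       # energy stored on C
    return e_supply, e_resistor, e_capacitor


e_supply, e_resistor, e_capacitor = charging_energies(c=15e-15, r=1e3, v=2.5)
```

The result does not depend on R: a larger resistance slows the transition but the supply still delivers CV², half of which is dissipated and half stored.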
Transition Power
• Gate output rising transition
– Energy dissipated in pMOS transistor = CV²/2
– Energy stored in capacitor = CV²/2
• Gate output falling transition
– Energy dissipated in nMOS transistor = CV²/2
• Energy dissipated per transition = CV²/2
• Power dissipation:
\[ P_{trans} = E_{trans}\, \alpha f_{ck} = \alpha f_{ck} C V^2 / 2 \]
α = activity factor
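Short Circuit Current, isc(t)
[Figure: inverter between VDD and GND with input Vi(t) and output Vo(t); voltage and current waveforms over 0 to 1 ns show the short-circuit current pulse isc(t), with peak Iscmaxf, flowing from time tB to tE while the input is between VTn and VDD - VTp]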
Short-Circuit Energy per Transition
• \( E_{scf} = \int_{t_B}^{t_E} V_{DD}\, i_{sc}(t)\, dt = (t_E - t_B)\, I_{scmaxf} V_{DD} / 2 \)
• Escf = tf (VDD- |VTp| -VTn) Iscmaxf /2
• Escr = tr (VDD- |VTp| -VTn) Iscmaxr /2
• Escf = 0, when VDD = |VTp| + VTn
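The triangular-pulse approximation above translates into a one-line function. This is a sketch under stated assumptions: the function and parameter names are illustrative, and the energy is clamped to zero whenever VDD is at or below |VTp| + VTn, on the assumption that the two transistors then never conduct at the same time.

```python
def short_circuit_energy(t_transition, v_dd, v_tp, v_tn, i_sc_max):
    """Short-circuit energy (J) for one input transition of an inverter.

    t_transition: input fall (or rise) time in seconds
    v_tp, v_tn:   pMOS and nMOS threshold voltages in volts
    i_sc_max:     peak short-circuit current in amperes
    """
    overlap = v_dd - abs(v_tp) - v_tn
    if overlap <= 0:
        # Assumption: no simultaneous conduction, hence no short-circuit energy
        return 0.0
    return t_transition * overlap * i_sc_max / 2


e_sc = short_circuit_energy(0.2e-9, v_dd=2.5, v_tp=-0.5, v_tn=0.5, i_sc_max=100e-6)
```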
Short-Circuit Power and Voltage Scaling
• Short-circuit power decreases and eventually becomes zero when VDD is scaled down but the threshold voltages are not scaled down.
• References:
– M. A. Ortega and J. Figueras, “Short Circuit Power Modeling in Submicron CMOS,” PATMOS’96, Aug. 1996, pp. 147-166.
– T. Sakurai and A. Newton, “Alpha-power Law MOSFET model and Its Application to a CMOS Inverter,” IEEE J. Solid State Circuits, vol. 25, April 1990, pp. 584-594.
Psc and Output Capacitance
[Figure: CMOS inverter between VDD and Ground with load capacitance CL; labels Ron, R = large, input vi(t) with fall and rise times tf and tr, output vo(t), supply current ic(t) + isc(t), and the ratio vo(t)/R↑]
isc and Output Capacitance
\[ I_{sc}(t) = \frac{v_o(t)}{R_{\uparrow t_f}(t)} = \frac{V_{DD} \left[ 1 - \exp\left( \dfrac{-t}{R_{\downarrow t_f}(t)\, C} \right) \right]}{R_{\uparrow t_f}(t)} \]
iscmax and Output Capacitance
[Figure: output voltage vo(t) and current i versus time t over the fall time tf, compared for small C and large C; the curve 1/R↑tf(t) and the peak iscmax are marked]
Psc, Output Rise Times, Capacitance
• For given input rise and fall times short circuit power decreases as output capacitance increases.
• Short circuit power increases with increase of input rise and fall times.
• Short circuit power is reduced if output rise and fall times are not smaller than the input rise and fall times.
Effects of Scaling Down
• 1-16% short-circuit power at 0.7 micron
• 4-37% at 0.35 micron
• 12-60% at 0.17 micron
• Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.
Summary: Short-Circuit Power
• Short-circuit power is consumed by each transition (increases with input transition time).
• Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power).
• Increasing the output load capacitance reduces short-circuit power.
• Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.
Dynamic Power
[Figure: CMOS inverter between VDD and Ground with input Vi, output Vo, transistor resistances R, load capacitance CL and short-circuit current isc]
\[ \text{Dynamic Power} = C_L V_{DD}^2 / 2 + P_{sc} \]
Dynamic Power Reduction
• Reduce power per transition
– Reduced voltage operation – voltage scaling
– Capacitance minimization – device sizing
• Reduce number of transitions
– Glitch elimination
CMOS Dynamic Power
\[ \text{Dynamic Power} = \sum_{\text{all gates } i} 0.5\, \alpha_i f_{clk} C_{Li} V_{DD}^2 \approx 0.5\, \alpha f_{clk} C_L V_{DD}^2 \approx \alpha_{01} f_{clk} C_L V_{DD}^2 \]
where
α = average gate activity factor
α01 = 0.5α, average 0→1 transitions
fclk = clock frequency
CL = total load capacitance
VDD = supply voltage
Example: 0.25μm CMOS Chip
• f = 500MHz
• Average capacitance = 15fF/gate
• VDD = 2.5V
• 10^6 gates
• Power = α01 f CL VDD²
= α01 × 500×10^6 × (15×10^-15 × 10^6) × 2.5²
= 46.9W, for α01 = 1.0
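The arithmetic of this example can be reproduced in a few lines. The sketch below assumes the simplified formula Power = α01 f CL VDD² with all gates lumped into one total capacitance; the function and argument names are illustrative.

```python
def dynamic_power(alpha01, f_clk, c_per_gate, n_gates, v_dd):
    """Dynamic power in watts from the lumped formula alpha01 * f * C_L * VDD^2."""
    c_total = c_per_gate * n_gates          # total load capacitance C_L
    return alpha01 * f_clk * c_total * v_dd ** 2


# Values from the example: 500 MHz, 15 fF per gate, one million gates, 2.5 V
worst_case = dynamic_power(1.0, 500e6, 15e-15, 1_000_000, 2.5)
typical = dynamic_power(0.1, 500e6, 15e-15, 1_000_000, 2.5)  # alpha01 = 0.1 is an assumed activity
```

Because power is linear in α01, a more realistic activity factor scales the 46.9 W worst case down proportionally.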
Signal Activity, α
[Figure: waveforms over clock period T = 1/f; the clock has α01 = 1.0 and two combinational signals each have α01 = 0.5]
Reducing Dynamic Power
• Dynamic power reduction is
– Quadratic with reduction of supply voltage
– Linear with reduction of capacitance
0.25μm CMOS Inverter, VDD=2.5V
[Figure: two plots versus Vin (0 to 2.5 V): the transfer characteristic Vout (0 to 2.5 V) and the gain (0 to -20)]
0.25μm CMOS Inverter, VDD< 2.5V
[Figure: transfer characteristics Vout versus Vin for reduced supply voltages, over 0 to 2.5 V and in a magnified view over 0 to 0.2 V, with the Gain = -1 points marked]
Lower Bound on VDD
• For proper operation of gate, the magnitude of the maximum gain (for Vin = VDD/2) should be greater than 1.
• Gainmax = -(1/n)[exp(VDD /2ΦT) – 1] = -1
• n = 1.5
• ΦT = kT/q = 26mV
• VDD = 48mV
• VDDmin > 2 to 4 times kT/q or ~100mV at room temperature (27°C)
• Ref.: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
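Solving the unity-gain condition for the supply voltage gives VDD = 2ΦT ln(1 + n). The short sketch below evaluates it for the values on the slide; the function name is an assumption made for illustration.

```python
import math


def vdd_for_unity_gain(n=1.5, phi_t=0.026):
    """Supply voltage (V) at which the maximum inverter gain magnitude equals 1.

    Derived from (1/n) * [exp(VDD / (2 * phi_t)) - 1] = 1.
    """
    return 2.0 * phi_t * math.log(1.0 + n)


v_min = vdd_for_unity_gain()   # about 0.048 V, i.e. roughly 48 mV
```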
Impact of VDD on Performance
\[ \text{Inverter delay} = K\, \frac{C_L V_{DD}}{(V_{DD} - V_t)^{\alpha}} \]
[Figure: delay (0 to 40 ns) and power (log scale) versus VDD from 0.6V to 3.0V, with the point VDD = Vt marked]
Optimum Power × Delay
\[ \text{Power} \times \text{Delay},\ PD = \text{constant} \times \frac{V_{DD}^3}{(V_{DD} - V_t)^{\alpha}} \]
For minimum power-delay product, d(PD)/dVDD = 0
\[ V_{DD} = \frac{3 V_t}{3 - \alpha} \]
For long channel devices, α = 2, VDD = 3Vt
For very short channel devices, α = 1, VDD = 1.5Vt
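The intermediate step between the power-delay expression and the optimum supply voltage is a single differentiation. In the sketch below the constant is written as k, a symbol introduced here for convenience.

```latex
\begin{align*}
PD &= k\, \frac{V_{DD}^{3}}{(V_{DD} - V_t)^{\alpha}} \\
\frac{d(PD)}{dV_{DD}} &= k\, \frac{3 V_{DD}^{2} (V_{DD} - V_t) - \alpha V_{DD}^{3}}{(V_{DD} - V_t)^{\alpha + 1}} = 0 \\
3 (V_{DD} - V_t) &= \alpha V_{DD}
\quad \Longrightarrow \quad
V_{DD} = \frac{3 V_t}{3 - \alpha}
\end{align*}
```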
Transistor Sizing for Performance
• Problem: If we increase W/L to speed up the charging or discharging of the load capacitance, then the increased W increases the load for the driving gate.
[Figure: gate with input capacitance Cin driving load capacitance CL]
Fixed-Taper Buffer
[Figure: chain of n inverters from Vin to Vout with relative sizes 1, α, α², . . ., α^(i-1), . . ., α^(n-1); input capacitance Cin, load capacitance CL]
Ci = α^(i-1) Cin
CL = α^n Cin
Delay = t0
Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.
Buffer (Cont.)
\[ \alpha^n = C_L / C_{in} \]
\[ n = \frac{\ln (C_L / C_{in})}{\ln \alpha} \]
ith stage delay, ti = αt0, i = 1, . . . n, because each stage drives a stage α times bigger than itself.
\[ \text{Total delay} = \sum_{i=1}^{n} t_i = n \alpha t_0 = \frac{\ln (C_L / C_{in})\, \alpha t_0}{\ln \alpha} \]
Buffer (Cont.)
Differentiating total delay with respect to α and equating to 0, we get
αopt = e ≈ 2.7
The optimum number of stages is
nopt = ln(CL/Cin)
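The effect of the taper factor on total delay is easy to explore numerically. This sketch evaluates the formulas above; the function name is illustrative, and the number of stages is left as a real number, whereas an actual design would round it to an integer.

```python
import math


def tapered_buffer(c_load, c_in, t0, alpha=math.e):
    """Return (number of stages, total delay) for a fixed-taper buffer.

    Uses n = ln(CL/Cin) / ln(alpha) and total delay = n * alpha * t0.
    """
    n = math.log(c_load / c_in) / math.log(alpha)
    return n, n * alpha * t0


# Assumed example: a load 1000 times the input capacitance, unit delay t0 = 1
for taper in (2.0, math.e, 4.0):
    stages, delay = tapered_buffer(1000.0, 1.0, 1.0, taper)
    print(f"alpha = {taper:.2f}: n = {stages:.2f}, delay = {delay:.2f} t0")
```

The minimum around α = e is shallow, so a somewhat larger taper saves stages (and area and power) at a small cost in delay.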
Further Reading
B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 99-111, March 1995.
Logic Activity and Glitches
[Figure: example circuit with signal nodes numbered 1 to 7 and gate delays labelled d = 1 and d = 2]
Glitch Power Reduction
• Design a digital circuit for minimum transient energy consumption by eliminating hazards
Theorem 1
• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
– Output logic state changes: one transition is necessary
– Output logic state unchanged: no transition is necessary
Inertial Delay of a Gate (Inverter)
[Figure: inverter input Vin and output Vout versus time, showing the high-to-low delay dHL and the low-to-high delay dLH]
\[ d = \frac{d_{HL} + d_{LH}}{2} \]
Theorem 2
• Given that events occur at the input of a gate with inertial delay d at times t1 ≤ . . . ≤ tn, the number of events at the gate output cannot exceed
\[ \min \left( n,\ 1 + \frac{t_n - t_1}{d} \right) \]
[Figure: time axis with input events at t1, t2, t3, . . ., tn spanning the interval tn - t1]
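The bound of Theorem 2 can be computed for a given list of input event times. In this sketch the quotient is rounded down, an assumption made here because the number of events is an integer; the function name is illustrative.

```python
import math


def max_output_events(event_times, d):
    """Upper bound on output events of a gate with inertial delay d."""
    if not event_times:
        return 0
    times = sorted(event_times)
    n = len(times)
    spread = times[-1] - times[0]           # t_n - t_1
    # Assumption: the bound is floored to an integer event count
    return min(n, 1 + math.floor(spread / d))


bound = max_output_events([0.0, 1.0, 2.5], d=3.0)   # inputs arrive within one gate delay
```

When all input events arrive within less than one inertial delay, the bound collapses to a single output event, which is the condition exploited for glitch-free design.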
Minimum Transient Design
• Minimum transient energy condition for a Boolean gate:
\[ | t_i - t_j | < d \]
where ti and tj are arrival times of input events and d is the inertial delay of the gate
Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays]
Hazard Filter Method
• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delays]
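Both methods aim at the same condition: at every gate, the spread of input arrival times must be smaller than the gate's inertial delay. The sketch below checks that condition on a small hypothetical netlist (gates g1, g2 and a buffer buf, all invented for illustration) and shows how either a slower gate or an inserted buffer satisfies it.

```python
def glitch_free_gates(fanins, delay, order):
    """Return {gate: bool}, True when (latest - earliest input arrival) < gate delay.

    fanins: gate name -> list of driving nodes (primary inputs have no entry)
    delay:  gate name -> inertial delay
    order:  all nodes in topological order
    """
    early, late, ok = {}, {}, {}
    for node in order:
        inputs = fanins.get(node, [])
        if not inputs:                      # primary input: event at time 0
            early[node] = late[node] = 0.0
            continue
        first = min(early[i] for i in inputs)
        last = max(late[i] for i in inputs)
        ok[node] = (last - first) < delay[node]
        early[node] = first + delay[node]
        late[node] = last + delay[node]
    return ok


# Hypothetical circuit: g1 = f(a, b), g2 = f(g1, a), unit delays
original = glitch_free_gates({"g1": ["a", "b"], "g2": ["g1", "a"]},
                             {"g1": 1, "g2": 1}, ["a", "b", "g1", "g2"])
# Hazard filter: slow g2 down so that its delay exceeds the arrival spread
filtered = glitch_free_gates({"g1": ["a", "b"], "g2": ["g1", "a"]},
                             {"g1": 1, "g2": 2}, ["a", "b", "g1", "g2"])
# Balanced delay: insert a unit-delay buffer on the short path a -> g2
balanced = glitch_free_gates({"g1": ["a", "b"], "buf": ["a"], "g2": ["g1", "buf"]},
                             {"g1": 1, "buf": 1, "g2": 1}, ["a", "b", "g1", "buf", "g2"])
```

In the original circuit g2 can glitch because its inputs arrive one delay unit apart; the hazard filter lengthens the circuit delay, while the balanced version keeps it at the cost of one buffer.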
Glitch-Free Design by Linear Programming
• Variables: gate and buffer delays
• Objective: minimize number of buffers
• Subject to: overall circuit delay
• Subject to: minimum transient condition for multi-input gate
Variables for Full-Adder
• Gate delay variables d4 . . . d12
• Buffer delay variables d15 . . . d29
Delay variables are located at the checkpoints of the circuit.
[Figure: full-adder circuit with the delay variables marked]
Objective Function
• Ideal: minimize the number of non-zero delay buffers
• Actual: minimize sum of buffer delays
Specify Critical Path Delay
[Figure: original full-adder design with gate delays of 1 and buffer delays of 0]
Sum of delays on critical path ≤ maxdel
Multi-Input Gate Condition
[Figure: two-input gate of delay d whose inputs arrive through delays d1 and d2]
d1 - d2 ≤ d and d2 - d1 ≤ d ≡ |d1 - d2| ≤ d
Results: 1-Bit Adder
R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
AMPL Solution: maxdel = 6
[Figure: full-adder circuit annotated with the optimized gate and buffer delays for maxdel = 6]
AMPL Solution: maxdel = 7
[Figure: full-adder circuit annotated with the optimized gate and buffer delays for maxdel = 7]
AMPL Solution: maxdel ≥ 11
[Figure: full-adder circuit annotated with the optimized gate delays for maxdel ≥ 11]
Removing a Limitation
• Constraints are written by path enumeration.
• Since number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits.
• Example: c880 has 6.96M constraints.
• Solution: A linear complexity method. See,
– T. Raja, Master’s Thesis, Rutgers University, 2002.
– T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th International Conf. VLSI Design, 2003, pp. 527-532.
Comparison of Constraints
[Figure: number of constraints versus number of gates in circuit]
Benchmark Circuits
Circuit | Maxdel. (gates) | No. of Buffers | Normalized Power, Average | Normalized Power, Peak
C432 | 17 | 95 | 0.72 | 0.67
C432 | 34 | 66 | 0.62 | 0.60
C880 | 24 | 62 | 0.68 | 0.54
C880 | 48 | 34 | 0.68 | 0.52
C6288 | 47 | 294 | 0.40 | 0.36
C6288 | 94 | 120 | 0.36 | 0.34
c7552 | 43 | 366 | 0.38 | 0.34
c7552 | 86 | 111 | 0.36 | 0.32
c7552: 3,500-gate CMOS Circuit
[Figure: instantaneous energy (Joules) versus clock cycles]
References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997, pp. 45-51.
• T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey, 2002.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, pp. 121-128, 2006.
Static (Leakage) Power
• Dynamic
– Signal transitions
• Logic activity
• Glitches
– Short-circuit
• Static
– Leakage
Leakage Power
[Figure: cross-section of an nMOS transistor with n+ source and drain regions, connected to Ground and to VDD through R, showing the leakage currents IG, ID, Isub, IPT and IGIDL]
Leakage Current Components
• Subthreshold conduction, Isub
• Reverse bias pn junction conduction, ID
• Gate induced drain leakage, IGIDL due to tunneling at the gate-drain overlap
• Drain source punchthrough, IPT due to short channel and high drain-source voltage
• Gate tunneling, IG through thin oxide
Subthreshold Current
\[ I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp \{ (V_{GS} - V_{TH}) / n V_t \} \]
μ0: carrier surface mobility
Cox: gate oxide capacitance per unit area
L: channel length
W: gate width
Vt = kT/q: thermal voltage
n: a technology parameter
IDS for Short Channel Device
\[ I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp \{ (V_{GS} - V_{TH} + \eta V_{DS}) / n V_t \} \]
VDS = drain to source voltage
η: a proportionality factor
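The exponential dependence on threshold voltage is what makes leakage so sensitive to scaling. The sketch below evaluates the expression above, lumping μ0 Cox (W/L) into a single prefactor k; the function name, the prefactor and the numeric values are assumptions for illustration.

```python
import math


def subthreshold_current(k, v_gs, v_th, n, v_t=0.026, eta=0.0, v_ds=0.0):
    """Subthreshold current; k stands for mu0 * Cox * (W/L) (assumed lumped)."""
    return k * v_t ** 2 * math.exp((v_gs - v_th + eta * v_ds) / (n * v_t))


n = 1.5
swing = n * 0.026 * math.log(10)            # threshold change per decade of current, in volts
i_high_vth = subthreshold_current(1e-4, v_gs=0.0, v_th=0.40, n=n)
i_low_vth = subthreshold_current(1e-4, v_gs=0.0, v_th=0.40 - swing, n=n)
ratio = i_low_vth / i_high_vth              # one decade more leakage
```

With n = 1.5 at room temperature, lowering the threshold by roughly 90 mV multiplies the off-state current by ten.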
Increased Subthreshold Leakage
[Figure: log Isub versus gate voltage for a device with threshold VTH and a scaled device with lower threshold VTH’; the current level Ic is marked]
Reducing Leakage Power
• Leakage power as a fraction of the total power increases as clock frequency drops. Turning supply off in unused parts can save power.
• For a gate it is a small fraction of the total power; it can be significant for very large circuits.
• Scaling down features requires lowering the threshold voltage, which increases leakage power; leakage roughly doubles with each shrink.
• Multiple-threshold devices are used to reduce leakage power.
Problem Statement
• Problem: To design a CMOS circuit,
– using dual-threshold devices to globally minimize subthreshold leakage
– using delay elements to eliminate all glitches
– maintaining specified performance
– allowing performance-power tradeoff
• Reference: Y. Lu and V. D. Agrawal, “Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing,” Proc. PATMOS, 2005, pp. 217-226.
MILP: Mixed Integer Linear Program
\[ \text{Minimize} \left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\} \]
where
Xi = 1: gate i has low Vth, and its leakage is ILi
Xi = 0: gate i has high Vth, and its leakage is IHi
Δdij = delay inserted between gates i and j for glitch suppression
Xi ∈ {0, 1} is an integer; Δdij is a real variable
ILi and IHi are constants for gate i obtained by SPICE simulation
MILP - Constraints
Circuit delay constraint for each PO i: Tmax can be the delay of critical path or clock period specified by the
circuit designer.
Glitch suppression constraint for each gate i:
(1)
(2)
(3)
Constraints (1), (2) and (3) make sure that Ti - ti < di for each gate, so glitches are eliminated.
Ti is the latest signal arrival time at the output of gate i.
ti is the earliest signal arrival time at the output of gate i.
iiHiiLii
HiiLiijiji
HiiLiijiji
tTDXDX
DXDXdtt
DXDXdTT
)1(
)1(
)1(
,
,
maxTTi
Power-Delay Tradeoff Example
14-Gate Full Adder (Unoptimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0, built from low Vth gates, with the critical path marked]
Ileak = 161 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0, showing low Vth gates, high Vth gates, delay buffers (high Vth) and the critical path]
Ileak = 73 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = 1.25Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0, showing low Vth gates, high Vth gates, delay buffers (high Vth) and the critical path]
Ileak = 16 pA
Leakage Reduction and Performance Tradeoff @ 27℃, 70nm
Circuit | # gates | Critical Path Delay Tc (ns) | Unoptimized Ileak (μA) | Optimized Ileak (μA), Tmax = Tc | Leakage Reduction | SunOS 5.7 CPU secs. | Optimized Ileak (μA), Tmax = 1.25Tc | Leakage Reduction | SunOS 5.7 CPU secs.
C432 | 160 | 0.751 | 2.620 | 1.022 | 61.0% | 0.42 | 0.132 | 95.0% | 0.3
C499 | 182 | 0.391 | 4.293 | 3.464 | 19.3% | 0.08 | 0.225 | 94.8% | 1.8
C880 | 328 | 0.672 | 4.406 | 0.524 | 88.1% | 0.24 | 0.153 | 96.5% | 0.3
C1355 | 214 | 0.403 | 4.388 | 3.290 | 25.0% | 0.1 | 0.294 | 93.3% | 2.1
C1908 | 319 | 0.573 | 6.023 | 2.023 | 66.4% | 59 | 0.204 | 96.6% | 1.3
C2670 | 362 | 1.263 | 5.925 | 0.659 | 90.4% | 0.38 | 0.125 | 97.9% | 0.16
C3540 | 1097 | 1.748 | 15.622 | 0.972 | 93.8% | 3.9 | 0.319 | 98.0% | 0.74
C5315 | 1165 | 1.589 | 19.332 | 2.505 | 87.1% | 140 | 0.395 | 98.0% | 0.71
C6288 | 1189 | 2.177 | 23.142 | 6.075 | 73.8% | 277 | 0.678 | 97.1% | 7.48
C7552 | 1046 | 1.915 | 22.043 | 0.872 | 96.0% | 1.1 | 0.445 | 98.0% | 0.58
Leakage, Dynamic and Total Power Comparison @ 90℃, 70nm
Circuit | # Gates | Pleak1* (uW) | Pleak2* (uW) | Leakage Reduction | Pdyn1* (uW) | Pdyn2* (uW) | Dynamic Reduction | Ptotal1* (uW) | Ptotal2* (uW) | Total Reduction
C432 | 160 | 35.77 | 11.87 | 66.8% | 101.0 | 73.3 | 27.4% | 136.8 | 85.2 | 37.7%
C499 | 182 | 50.36 | 39.94 | 20.7% | 225.7 | 160.3 | 29.0% | 276.1 | 200.2 | 27.5%
C880 | 328 | 85.21 | 11.05 | 87.0% | 177.3 | 128.0 | 27.8% | 262.5 | 139.1 | 47.0%
C1355 | 214 | 54.12 | 39.96 | 26.3% | 293.3 | 165.7 | 43.5% | 347.4 | 205.7 | 40.8%
C1908 | 319 | 92.17 | 29.69 | 67.8% | 254.9 | 197.7 | 22.4% | 347.1 | 227.4 | 34.5%
C2670 | 362 | 115.4 | 11.32 | 90.2% | 128.6 | 100.8 | 21.6% | 244.0 | 112.1 | 54.1%
C3540 | 1097 | 302.8 | 17.98 | 94.1% | 333.2 | 228.1 | 31.5% | 636.0 | 246.1 | 61.3%
C5315 | 1165 | 421.1 | 49.79 | 88.2% | 465.5 | 304.3 | 34.6% | 886.6 | 354.1 | 60.1%
C6288 | 1189 | 388.5 | 97.17 | 75.0% | 1691.2 | 405.6 | 76.0% | 2079.7 | 502.8 | 75.8%
C7552 | 1046 | 444.4 | 18.75 | 95.8% | 380.9 | 227.8 | 40.2% | 825.3 | 246.6 | 70.1%
* 1: unoptimized circuits; 2: optimized circuits.
Low-Power System Design
• State encoding
– Bus encoding
– Finite state machine
• Clock gating
– Flip-flop
– Shift register
• Microprocessors
– Single processor
– Multi-core processor
Bus Encoding
• Example: Four bit bus
– 0000→1110 has three transitions.
– If bits of second pattern are inverted, then 0000→0001 will have only one transition.
• Bit-inversion encoding for N-bit bus:
[Figure: number of bit transitions after inversion encoding versus number of bit transitions, both axes marked 0, N/2 and N]
Bus-Inversion Encoding Logic
[Figure: sent data passes through polarity decision logic and the bus register; a polarity bit travels with the bus and is used to recover the received data]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
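The polarity decision can be sketched in software as follows. This is a simplified illustration, not the circuit from the cited paper: it assumes the bus register starts at all zeros, inverts a word whenever more than half of the data lines would toggle, and ignores transitions on the polarity line itself. All names are illustrative.

```python
def bus_invert_encode(words, n_bits):
    """Encode words for an n-bit bus; returns a list of (sent_word, polarity_bit)."""
    mask = (1 << n_bits) - 1
    previous = 0                            # assumed initial bus state
    encoded = []
    for word in words:
        toggles = bin((word ^ previous) & mask).count("1")
        if toggles > n_bits / 2:
            sent, polarity = (~word) & mask, 1   # send the inverted pattern
        else:
            sent, polarity = word & mask, 0
        encoded.append((sent, polarity))
        previous = sent
    return encoded


def bus_invert_decode(encoded, n_bits):
    """Recover the original words from (sent_word, polarity_bit) pairs."""
    mask = (1 << n_bits) - 1
    return [(~sent) & mask if polarity else sent for sent, polarity in encoded]


encoded = bus_invert_encode([0b1110, 0b0001, 0b0110], n_bits=4)
decoded = bus_invert_decode(encoded, n_bits=4)
```

For the four-bit example, 1110 following 0000 is sent as 0001 with the polarity bit set, so only one data line toggles.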
FSM State Encoding
[Figure: the same three-state machine under two state assignments, (11, 01, 00) and (01, 11, 00), with edge probabilities 0.1, 0.1, 0.3, 0.4, 0.6, 0.6 and 0.9]
Expected number of state-bit transitions:
First assignment: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
Second assignment: 1(0.3+0.4+0.1) + 2(0.1) = 1.0
Transition probability based on PI statistics
State encoding can be selected using a power-based cost function.
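A power-based cost function of this kind is simply the probability-weighted Hamming distance between state codes. In the sketch below the state names S1 to S3 and the edge directions are hypothetical; they are chosen only so that the sums match the two expressions above, since the original state diagram is not reproduced here.

```python
def expected_bit_transitions(encoding, transitions):
    """Sum over edges of probability times Hamming distance between state codes."""
    total = 0.0
    for (source, target), probability in transitions.items():
        hamming = bin(encoding[source] ^ encoding[target]).count("1")
        total += probability * hamming
    return total


# Hypothetical edges consistent with the weights used in the two sums
transitions = {
    ("S1", "S1"): 0.6, ("S1", "S3"): 0.3, ("S1", "S2"): 0.1,
    ("S2", "S2"): 0.9, ("S2", "S3"): 0.1,
    ("S3", "S3"): 0.6, ("S3", "S1"): 0.4,
}
first = expected_bit_transitions({"S1": 0b11, "S2": 0b01, "S3": 0b00}, transitions)
second = expected_bit_transitions({"S1": 0b01, "S2": 0b11, "S3": 0b00}, transitions)
```

Self-loops contribute nothing because the state code does not change, so the cost is dominated by the most frequent state-to-state edges; giving those edges adjacent codes lowers the expected switching.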
FSM: Clock-Gating
• Moore machine: Outputs depend only on the state variables.
– If a state has a self-loop in the state transition graph (STG), then clock can be stopped whenever a self-loop is to be executed.
[Figure: STG fragment with states Si, Sj and Sk and edges labelled Xi/Zk, Xj/Zk and Xk/Zk]
Clock can be stopped when (Xk, Sk) combination occurs.
Clock-Gating in Moore FSM
[Figure: combinational logic with primary inputs PI and primary outputs PO, state flip-flops, and clock activation logic with a latch gating the clock CK]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop with data input D, output Q and clock CK]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
Reduced-Power Shift Register
[Figure: input D feeds two rows of D flip-flops clocked by CK(f/2); a multiplexer combines the rows into the output]
Flip-flops are operated at full voltage and half the clock frequency.
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
– Voltage reduction for dynamic power
– Dual-threshold devices for leakage reduction
– Clock gating, frequency reduction
– Sleep mode
• Architecture:
– Instruction set
– Hardware organization
• Software methods
SIA Roadmap for Processors (1999)
Year | 1999 | 2002 | 2005 | 2008 | 2011 | 2014
Feature size (nm) | 180 | 130 | 100 | 70 | 50 | 35
Logic transistors/cm² | 6.2M | 18M | 39M | 84M | 180M | 390M
Clock (GHz) | 1.25 | 2.1 | 3.5 | 6.0 | 10.0 | 16.9
Chip size (mm²) | 340 | 430 | 520 | 620 | 750 | 900
Power supply (V) | 1.8 | 1.5 | 1.2 | 0.9 | 0.6 | 0.5
High-perf. Power (W) | 90 | 130 | 160 | 170 | 175 | 183
Source: http://www.semichips.org
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
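The chain of reductions can be replayed step by step. In this sketch the voltage factor is computed from the quadratic dependence on supply voltage and the frequency factor from the linear dependence on clock rate; the remaining factors are taken directly from the list above. The variable and function names are illustrative.

```python
def apply_reductions(initial_power, steps):
    """Divide the power by each reduction factor in turn; return the history."""
    power = initial_power
    history = []
    for label, factor in steps:
        power /= factor
        history.append((label, power))
    return history


steps = [
    ("Reduce voltage 3.45V -> 1.5V", (3.45 / 1.5) ** 2),   # quadratic in VDD, about 5.3x
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200 / 160),        # linear in f, 1.25x
]
history = apply_reductions(26.0, steps)
for label, watts in history:
    print(f"{label}: {watts:.1f} W")
```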
Low-Power Datapath Architecture
• Lower supply voltage
– This slows down circuit speed
– Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.
A Reference Datapath
[Figure: input register, combinational logic and output register clocked by CK, with switched capacitance Cref]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: \( P_{ref} = C_{ref} V_{ref}^2 f \)
A Parallel Architecture
[Figure: the input feeds N registers clocked at f/N, each driving one copy of the combinational logic (Copy 1 to Copy N); an N to 1 multiplexer and an output register clocked at f produce the output; a multiphase clock generator derives the clocks and mux control from CK]
A copy processes every Nth input, operates at reduced voltage
Supply voltage: VN ≤ V1 = Vref
N = Deg. of parallelism
Control Signals, N = 4
[Figure: timing of CK and the four clock phases, Phase 1 to Phase 4]
Power
\[ P_N = P_{proc} + P_{overhead} \]
\[ P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f / N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f \]
\[ P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f \]
\[ P_N = [1 + \delta (N - 1)]\, C_{ref} V_N^2 f \]
\[ \frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{V_N^2}{V_{ref}^2} \]
Voltage vs. Speed
\[ \text{Delay of a gate, } T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2} \]
where
I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
Vt is threshold voltage
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking Vref = 5V (N=1), V2 = 2.9V (N=2) and V3 (N=3)]
Voltage reduction slows down as we get closer to Vt
Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for Vt = 0V (extreme case), Vt = 0.4V and Vt = 0.8V; 1.2μ CMOS, Vref = 5V]
Extreme Cases: Vt = 0
Delay, T ∝ 1/Vref
For N processing elements, delay = NT → VN = Vref/N
\[ \frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{1}{N^2} \]
which falls off as 1/N for large N.
For negligible overhead, δ→0
\[ \frac{P_N}{P_1} \approx \frac{1}{N^2} \]
For Vt > 0, power reduction is less and there will be an optimum value of N.
Example: Multiplier Core
• Specification:
– 200MHz Clock
– 15W dissipation @ 5V
– Low voltage operation, VDD ≥ 1.5 volts
\[ \text{Relative clock rate} = \frac{(V_{DD} - 0.5)^2}{20.25} \]
• Problem:
– Integrate multiplier core on a SOC
– Power budget for multiplier ~ 5W
A Multicore Design
[Figure: the input feeds registers clocked at 40MHz, each driving one multiplier core (Core 1 to Core 5); a 5 to 1 mux and an output register clocked at 200MHz produce the output; a multiphase clock generator derives the clocks and mux control from the 200MHz CK]
Core clock frequency = 200/N, N should divide 200.
How Many Cores?
• For N cores:
– clock frequency = 200/N MHz
– Supply voltage, \( V_{DDN} = 0.5 + (20.25/N)^{1/2} \) Volts
– Assuming 10% overhead per core,
\[ \text{Power dissipation} = 15\, [1 + 0.1 (N - 1)] \left( \frac{V_{DDN}}{5} \right)^2 \text{ watts} \]
Design Tradeoffs
Number of cores N | Clock (MHz) | Core supply VDDN (Volts) | Total Power (Watts)
1 | 200 | 5.00 | 15.0
2 | 100 | 3.68 | 8.94
4 | 50 | 2.75 | 5.90
5 | 40 | 2.51 | 5.29
8 | 25 | 2.10 | 4.50
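The tradeoff table follows from the two expressions for supply voltage and power. The sketch below evaluates them without intermediate rounding, so its results agree with the table only to within the table's rounding (for N = 8 it gives about 2.09 V and 4.46 W). Function names and default arguments are illustrative.

```python
import math


def core_supply(n_cores):
    """Supply voltage (V) at which a core runs at 1/N of the full clock rate.

    Solves (VDD - 0.5)^2 / 20.25 = 1 / N for VDD.
    """
    return 0.5 + math.sqrt(20.25 / n_cores)


def total_power(n_cores, overhead=0.1, p_ref=15.0, v_ref=5.0):
    """Total power (W) of N cores with the given fractional overhead per added core."""
    v = core_supply(n_cores)
    return p_ref * (1.0 + overhead * (n_cores - 1)) * (v / v_ref) ** 2


for n in (1, 2, 4, 5, 8):
    print(f"N = {n}: clock = {200 / n:.0f} MHz, VDD = {core_supply(n):.2f} V, "
          f"power = {total_power(n):.2f} W")
```

With a 5 W budget, five cores are just over the limit and eight cores fit, at the price of a larger chip area.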
Pipeline Architecture
[Figure: a processor between input and output registers clocked at f, compared with a two-stage pipeline in which two half-processors are separated by a register, also clocked at f]
Reference processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
Approximate Trend
Quantity | n-parallel proc. | n-stage pipeline proc.
Capacitance | nC | C
Voltage | V/n | V/n
Frequency | f/n | f
Power | CV²f/n² | CV²f/n²
Chip area | n times | 10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Multicore Processors
[Figure: performance, based on SPECint2000 and SPECfp2000 benchmarks, of multicore and single-core processors from 2000 to 2008. Source: Computer, May 2005, p. 12]
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.
Cell - Cell Broadband Engine Architecture
[Photo, L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops
Cell’s Nine-Processor Chip
[Figure: chip photo. © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6GHz (max), 44.8 Gflops
Books on Low-Power Design (1)
• L. Benini and G. De Micheli, Dynamic Power Management Design Techniques and CAD Tools, Boston: Springer, 1998.
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002.
• A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995.
• A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998.
• J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999.
• M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
• S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998.
• J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience, 1999.
• J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997.
• S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005.
• W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.
Books on Low-Power Design (2)
• N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
• V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
• M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
• C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
• J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
• S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.
• K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
• E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
• W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
• S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
• G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
• G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
• K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.
Other Books Useful in Low-Power Design
• A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press, 2001.
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts, Addison-Wesley, 2005.
• S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996.
• E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005.
• J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003.
• J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.