Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Methods for True Power MinimizationMethods for True Power Minimization
Robert W. Brodersen1, Mark A. Horowitz2, Dejan Markovic1, Borivoje Nikolic1 and Vladimir Stojanovic2
1Berkeley Wireless Research Center, UC Berkeley 2Stanford University
Power limited operation
Delay
Unoptimized design
Emax
DmaxDmin
Energy/op
Emin
Achieve the highest performance under the power cap
Power limited operation
Delay
Unoptimized design
Var1
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Achieve the highest performance under the power cap
Power limited operation
Delay
Unoptimized design
Var1
Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Achieve the highest performance under the power cap
Power limited operation
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
How far away are we from the optimal solution?
Power limited operation
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Global
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Global optimum – best performance
Power limited operation
Delay
Unoptimized design
Emax
DmaxDmin
Energy/op
Emin
Maximize throughput for given energy orMinimize energy for given throughput
Design optimization♦ There are many sets of parameters to adjust♦ Tuning variables
Circuit(sizing, supply, threshold)
Logic style(domino, pass-gate, )
Block topology (adder: CLA, CSA, )
Micro-architecture (parallel, pipelined)
topology A
topology BDelay
Ener
gy/o
p
Design optimization♦ There are many sets of parameters to adjust♦ Tuning variables
Circuit(sizing, supply, threshold)
Logic style(domino, pass-gate, )
Block topology (adder: CLA, CSA, )
Micro-architecture (parallel, pipelined)
Globally optimal boundary curve: pieces of E-D curves for different topologies
topology A
topology BDelay
Ener
gy/o
p
Outline
♦ Circuit optimization♦ Joint optimization♦ Select the most promising sets of tuning variables
♦ Circuit & µArchitecture examples♦ Adder ♦ Add-Compare
♦ Conclusions
Energy-delay sensitivity
( )*dd dd
dddd
dd V V
EV
Sens VD
V=
∂∂
= −∂
∂
♦ Proposed by Zyban at ISLPED02
♦ ∆E = Sens(A)(-∆D) + Sens(B)∆D
At the optimal point,all sensitivities should be the same
Alpha-power based delay model
( ) dpard dd out
pin indd on
WK V WtW WV V α
⋅= ⋅ +
−
♦ Fitting parametersVon, αd, Kd
Alpha-power based delay model
( ) dpard dd out
pin indd on
WK V WtW WV V α
⋅= ⋅ +
− heff
♦ Fitting parametersVon, αd, Kd
♦ Effective fanout, heff
Energy model
♦ Switching energy
( ) 20 1 ( ) ( )Sw out par ddE C W C W Vα →= ⋅ + ⋅
♦ Leakage energy( )
00 ( )
th ddV VV
Lk in in ddE W I S e V Dγ− −
= ⋅ ⋅ ⋅ ⋅
Sensitivity to sizing and supply♦ Gate sizing (Wi)
( ), , 1
Swi i
nom eff i eff ii
EW ec
D h hW τ −
∂∂
− =∂ ⋅ −∂
∞ for equal heff (Dmin)
♦ Supply voltage (Vdd)
12
1
Swdd Sw v
d vdd
EV E x
D D xV α
∂∂ −
− = ⋅∂ − +
∂ Vth Vdd
Vddnom
Sens(Vdd)
0
xv = (Von+∆Vth)/Vdd
Sensitivity to Vth
♦ Threshold voltage (Vth)
0
( )1
( )
th dd on thLk
dth
EV V V V
PD VV α
∂ ∂ ∆ − − ∆
− = ⋅ − ∂ ⋅ ∂ ∆
0 Vth
Vthnom
Sens(Vth)
Low initial leakage ⇒ speedup comes for “free”
Optimization setup
♦ Reference/nominal circuit sized for Dmin @ Vddnom, Vthnom known average activity
♦ Set delay constraint
♦ Minimize energy under delay constraint gate sizing Vdd , Vth scaling
Kogge-Stone tree adder topology
S0
S15
(A0, B0)
(A15, B15)
♦ Off-path load (gates + wires)♦ Reconvergence (inside ●-block)
Tree adder: Sizing optimization♦ Reference: all paths are critical
Nominal (Dmin, Enom)
Sizing opt. (1.1Dmin, 0.3Enom)
Internal energy peaks ⇒Big savings for small delay penalty with resizing
Joint optimization: sizing and Vdd
Nominal design
minnomD
( , )nomddf W V( , )newddf W V
( )ddf VEn
ergy
/op
Delay
∆E = Sens(Vdd)•(-∆D) + Sens(W)•∆D
Results of joint optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Nominal Design(Dnom, Enom)
Energy efficient curve f(W,Vdd,Vth)
(Dnom, Emin)
80% of energy saved without delay penalty
1 (reference)(Dnom, Emin)1.5Vdd
0.2∞(Dnom, Enom)VthWSens
Sensitivity table
Results of joint optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Nominal Design(Dnom, Enom)
Energy efficient curve f(W,Vdd,Vth)
(Dnom, Emin)
(Dmin,Enom)
80% of energy saved without delay penalty40% speedup for same energy
22161 (reference)(Dnom, Emin)22(Dmin, Enom)
1.5Vdd
0.2∞(Dnom, Enom)VthWSens
Sensitivity table
A look at tuning variablesSupply Threshold
reliabilitylimit
Sens(Vdd)=Sens(W)=Sens(Vth)=1
Limited range of tuning variables
A look at tuning variablesSupply Threshold
reliabilitylimit
Sens(Vdd)=16Sens(Vth)=Sens(W)=22
Sens(Vdd)=Sens(W)=Sens(Vth)=1
Limited range of tuning variables
Reducing the number of dimensionsEn
ergy
[Eno
m]
Delay [Dnom]
Threshold and sizing nearly optimal around the nominal point
Scope of circuit optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Effective region +/-30% around nominal delay
Circuit & µArchitecture optimization♦ Revisit the old argument for parallelism
A B A B
A BA B
f f2f
2f
f
f f f
(a) reference
(c) pipeline(b) parallel
♦ What happens if we can choose optimal Vdd and Vth for each design?
Balance of leakage and switching energy
2
ln
Lk
Sw Opt dtech
avg
EE L K
α
=
−
Optimal designs have high leakage current
Conclusions♦ All design levels need to be optimized jointly
♦ Equal marginal costs ⇒ Energy-efficient design
♦ Peak performance is VERY power inefficient
♦ Todays designs are not leaky enough to be truly power-optimal
♦ Pipelining starts to gain advantage over parallelism