Methods for True Power Minimization
Robert W. Brodersen¹, Mark A. Horowitz², Dejan Markovic¹, Borivoje Nikolic¹ and Vladimir Stojanovic²
¹Berkeley Wireless Research Center, UC Berkeley; ²Stanford University


  • Methods for True Power Minimization

    Robert W. Brodersen¹, Mark A. Horowitz², Dejan Markovic¹, Borivoje Nikolic¹ and Vladimir Stojanovic²

    ¹Berkeley Wireless Research Center, UC Berkeley; ²Stanford University

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point, with Emin, Emax, Dmin and Dmax marked on the axes.]

    Achieve the highest performance under the power cap

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point and the design optimization curve for Var1, with Emin, Emax, Dmin and Dmax marked on the axes.]

    Achieve the highest performance under the power cap

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point and the design optimization curves for Var1 and Var2, with Emin, Emax, Dmin and Dmax marked on the axes.]

    Achieve the highest performance under the power cap

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point and the design optimization curves for Var1, Var2 and Var1 + Var2, with Emin, Emax, Dmin and Dmax marked on the axes.]

    How far away are we from the optimal solution?

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point and the design optimization curves for Var1, Var2, Var1 + Var2 and Global, with Emin, Emax, Dmin and Dmax marked on the axes.]

    Global optimum – best performance

  • Power limited operation

    [Figure: energy/op versus delay, showing the unoptimized design point, with Emin, Emax, Dmin and Dmax marked on the axes.]

    Maximize throughput for a given energy, or minimize energy for a given throughput

  • Design optimization

    ♦ There are many sets of parameters to adjust
    ♦ Tuning variables

    Circuit (sizing, supply, threshold)

    Logic style (domino, pass-gate, ...)

    Block topology (adder: CLA, CSA, ...)

    Micro-architecture (parallel, pipelined)

    [Figure: energy/op versus delay curves for topology A and topology B.]

  • Design optimization

    ♦ There are many sets of parameters to adjust
    ♦ Tuning variables

    Circuit (sizing, supply, threshold)

    Logic style (domino, pass-gate, ...)

    Block topology (adder: CLA, CSA, ...)

    Micro-architecture (parallel, pipelined)

    Globally optimal boundary curve: pieces of E-D curves for different topologies

    [Figure: energy/op versus delay curves for topology A and topology B.]

  • Outline

    ♦ Circuit optimization
    ♦ Joint optimization
    ♦ Select the most promising sets of tuning variables

    ♦ Circuit & µArchitecture examples
    ♦ Adder
    ♦ Add-Compare

    ♦ Conclusions

  • Energy-delay sensitivity

    $$\mathrm{Sens}(V_{dd}) = -\left.\frac{\partial E / \partial V_{dd}}{\partial D / \partial V_{dd}}\right|_{V_{dd} = V_{dd}^{*}}$$

    ♦ Proposed by Zyuban at ISLPED'02

    ♦ ∆E = Sens(A)·(−∆D) + Sens(B)·∆D

    At the optimal point, all sensitivities should be the same
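The equal-sensitivity condition can be checked numerically. The following is a minimal sketch, not the authors' tooling: `sensitivity` estimates −(∂E/∂x)/(∂D/∂x) by central differences for any tuning variable x, and `delta_energy` evaluates the slide's ∆E relation. The function names, the toy curves E(x) = x² and D(x) = 1/x, and the sensitivity values are all assumptions made for illustration.

```python
def sensitivity(energy, delay, x, h=1e-6):
    """Energy-delay sensitivity -(dE/dx)/(dD/dx) at x, by central differences."""
    d_energy = (energy(x + h) - energy(x - h)) / (2 * h)
    d_delay = (delay(x + h) - delay(x - h)) / (2 * h)
    return -d_energy / d_delay


def delta_energy(sens_a, sens_b, delta_d):
    """Net energy change when A absorbs +delta_d of delay and B wins it back.

    Mirrors the slide's relation: dE = Sens(A)*(-dD) + Sens(B)*dD.
    """
    return sens_a * (-delta_d) + sens_b * delta_d


# Toy curves (assumed): E(x) = x^2 and D(x) = 1/x, so the sensitivity at x = 1 is 2.
print(sensitivity(lambda x: x ** 2, lambda x: 1 / x, 1.0))

# Unequal sensitivities leave a net saving; equal ones leave nothing to trade.
print(delta_energy(4.0, 1.0, 0.1))  # negative
print(delta_energy(2.0, 2.0, 0.1))  # zero
```

When Sens(A) exceeds Sens(B), slowing A and speeding up B by the same ∆D lowers energy at constant delay, which is why the optimum requires all sensitivities to match.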

  • Alpha-power based delay model

    $$t_p = \frac{K_d \cdot V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \cdot \left(\frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}}\right)$$

    ♦ Fitting parameters: Von, αd, Kd

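  • Alpha-power based delay model

    $$t_p = \frac{K_d \cdot V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \cdot \left(\frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}}\right)$$

    ♦ Fitting parameters: Von, αd, Kd

    ♦ Effective fanout, heff (annotated on the width-ratio term)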
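As a rough illustration of the delay model, the sketch below evaluates a gate delay of the form K_d·V_dd/(V_dd − V_on)^α_d · (W_out/W_in + W_par/W_in). The function name `gate_delay` and all numeric values are hypothetical; real values of K_d, V_on and α_d would come from fitting to a specific process.

```python
def gate_delay(k_d, v_dd, v_on, alpha_d, w_out, w_par, w_in):
    """Alpha-power-law gate delay.

    delay = K_d * Vdd / (Vdd - Von)**alpha_d * (Wout/Win + Wpar/Win)
    """
    if v_dd <= v_on:
        raise ValueError("model is only defined for Vdd > Von")
    drive = k_d * v_dd / (v_dd - v_on) ** alpha_d
    return drive * (w_out / w_in + w_par / w_in)


# Hypothetical fitting parameters, chosen only to show the trend.
fast = gate_delay(k_d=1.0, v_dd=1.2, v_on=0.4, alpha_d=1.3, w_out=4.0, w_par=1.0, w_in=1.0)
slow = gate_delay(k_d=1.0, v_dd=0.9, v_on=0.4, alpha_d=1.3, w_out=4.0, w_par=1.0, w_in=1.0)
print(fast, slow)  # lowering Vdd toward Von increases the delay
```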

  • Energy model

    ♦ Switching energy

    $$E_{Sw} = \alpha_{0 \to 1} \cdot \left(C(W_{out}) + C(W_{par})\right) \cdot V_{dd}^{2}$$

    ♦ Leakage energy

    $$E_{Lk} = W_{in} \cdot I_0(S_{in}) \cdot e^{-\frac{V_{th} - \gamma V_{dd}}{V_0}} \cdot V_{dd} \cdot D$$
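The two energy terms can be sketched in a few lines, assuming the switching energy reads α(0→1)·(C(W_out) + C(W_par))·V_dd² and the leakage energy reads W_in·I_0(S_in)·exp(−(V_th − γ·V_dd)/V_0)·V_dd·D. In this sketch the capacitances and the state-dependent current I_0(S_in) are passed in as already-evaluated numbers, which is a simplification, and every numeric value is hypothetical.

```python
import math


def switching_energy(alpha_01, c_out, c_par, v_dd):
    """Switching energy: activity * (output + parasitic capacitance) * Vdd^2."""
    return alpha_01 * (c_out + c_par) * v_dd ** 2


def leakage_energy(w_in, i0, v_th, gamma, v_dd, v0, delay):
    """Leakage energy over one cycle of length `delay`.

    Leakage current falls exponentially with Vth; gamma models the
    supply dependence of the effective threshold.
    """
    return w_in * i0 * math.exp(-(v_th - gamma * v_dd) / v0) * v_dd * delay


# Hypothetical values: raising Vth by V0*ln(10) cuts leakage energy tenfold.
base = leakage_energy(1.0, 1.0, 0.30, 0.1, 1.0, 0.04, 1.0)
higher_vth = leakage_energy(1.0, 1.0, 0.30 + 0.04 * math.log(10), 0.1, 1.0, 0.04, 1.0)
print(base / higher_vth)
print(switching_energy(alpha_01=0.1, c_out=2.0, c_par=1.0, v_dd=1.0))
```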

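  • Sensitivity to sizing and supply

    ♦ Gate sizing (Wi)

    $$-\frac{\partial E_{Sw} / \partial W_i}{\partial D / \partial W_i} = \frac{ec_i}{\tau_{nom} \cdot \left(h_{eff,i} - h_{eff,i-1}\right)}$$

    ∞ for equal heff (Dmin)

    ♦ Supply voltage (Vdd)

    $$-\frac{\partial E_{Sw} / \partial V_{dd}}{\partial D / \partial V_{dd}} = \frac{E_{Sw}}{D} \cdot \frac{2 \cdot (1 - x_v)}{\alpha_d - 1 + x_v}$$

    xv = (Von+∆Vth)/Vdd

    [Figure: Sens(Vdd) versus Vdd, with 0, Vth and Vddnom marked on the axes.]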
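The two closed forms on this slide are easy to evaluate directly. The sketch below assumes the sizing sensitivity reads ec_i / (τ_nom·(h_eff,i − h_eff,i−1)) and the supply sensitivity reads (E_Sw/D)·2(1 − x_v)/(α_d − 1 + x_v) with x_v = (V_on + ∆V_th)/V_dd. Function names and numeric inputs are hypothetical.

```python
import math


def sens_sizing(ec_i, tau_nom, h_eff_i, h_eff_prev):
    """Sizing sensitivity for stage i; infinite when adjacent effective fanouts match."""
    diff = h_eff_i - h_eff_prev
    if diff == 0:
        return math.inf  # the Dmin sizing: any delay relief is "infinitely" cheap
    return ec_i / (tau_nom * diff)


def sens_vdd(e_sw, delay, v_dd, v_on, delta_vth, alpha_d):
    """Supply sensitivity: (E_Sw / D) * 2(1 - x_v) / (alpha_d - 1 + x_v)."""
    x_v = (v_on + delta_vth) / v_dd
    return (e_sw / delay) * 2.0 * (1.0 - x_v) / (alpha_d - 1.0 + x_v)


# Hypothetical numbers. E_Sw and D are held fixed here to isolate the x_v factor;
# in a real circuit both also change with Vdd.
print(sens_sizing(ec_i=1.0, tau_nom=1.0, h_eff_i=4.0, h_eff_prev=4.0))
print(sens_vdd(1.0, 1.0, v_dd=1.2, v_on=0.4, delta_vth=0.0, alpha_d=1.3))
print(sens_vdd(1.0, 1.0, v_dd=0.4, v_on=0.4, delta_vth=0.0, alpha_d=1.3))
```

The last call sets V_dd equal to V_on + ∆V_th, so x_v = 1 and the supply sensitivity vanishes, which matches the 0 mark on the slide's plot.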

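  • Sensitivity to Vth

    ♦ Threshold voltage (Vth)

    $$-\frac{\partial E_{Lk} / \partial (\Delta V_{th})}{\partial D / \partial (\Delta V_{th})} = P_{Lk} \cdot \left(\frac{V_{dd} - V_{on} - \Delta V_{th}}{\alpha_d \cdot V_0} - 1\right)$$

    [Figure: Sens(Vth) versus Vth, with 0 and Vthnom marked on the axes.]

    Low initial leakage ⇒ speedup comes for “free”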
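The "free speedup" remark follows from the sensitivity being proportional to leakage power. A minimal sketch, assuming the expression reads P_Lk·((V_dd − V_on − ∆V_th)/(α_d·V_0) − 1); the function name and all values are hypothetical.

```python
def sens_vth(p_lk, v_dd, v_on, delta_vth, alpha_d, v0):
    """Threshold sensitivity: P_Lk * ((Vdd - Von - dVth) / (alpha_d * V0) - 1)."""
    return p_lk * ((v_dd - v_on - delta_vth) / (alpha_d * v0) - 1.0)


# Same operating point, two hypothetical leakage levels.
low_leak = sens_vth(p_lk=1e-6, v_dd=1.2, v_on=0.4, delta_vth=0.0, alpha_d=1.3, v0=0.04)
high_leak = sens_vth(p_lk=1e-3, v_dd=1.2, v_on=0.4, delta_vth=0.0, alpha_d=1.3, v0=0.04)
print(low_leak, high_leak)
```

With little leakage to begin with, the energy cost per unit of delay gained by lowering Vth is tiny, so the first steps of threshold reduction buy speed almost for nothing.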

  • Optimization setup

    ♦ Reference/nominal circuit
      sized for Dmin @ Vddnom, Vthnom
      known average activity

    ♦ Set delay constraint

    ♦ Minimize energy under delay constraint
      gate sizing
      Vdd, Vth scaling
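The setup above can be sketched as a small constrained search. This is a toy, single-gate stand-in rather than the optimizer used for the results in these slides: it omits gate sizing, scans only V_dd and a threshold shift ∆V_th on a grid, and uses delay and energy expressions shaped like the models in this deck. Every constant (K_D, V_ON, ALPHA_D, V0, GAMMA, I0, ALPHA_01, CAP) and the nominal point (1.2 V, ∆V_th = 0) is hypothetical.

```python
import math

# Hypothetical model constants, chosen only for illustration.
K_D, V_ON, ALPHA_D = 1.0, 0.3, 1.5
V0, GAMMA, I0 = 0.04, 0.1, 5e-5
ALPHA_01, CAP = 0.1, 1.0


def delay(v_dd, d_vth):
    """Alpha-power delay with the threshold shift d_vth added to Von."""
    return K_D * v_dd / (v_dd - V_ON - d_vth) ** ALPHA_D


def energy(v_dd, d_vth):
    """Switching plus leakage energy per operation."""
    e_sw = ALPHA_01 * CAP * v_dd ** 2
    e_lk = I0 * math.exp(-(d_vth - GAMMA * v_dd) / V0) * v_dd * delay(v_dd, d_vth)
    return e_sw + e_lk


def minimize_energy(d_max, vdd_grid, dvth_grid):
    """Grid search: lowest-energy (Vdd, dVth) whose delay does not exceed d_max."""
    best = None
    for v_dd in vdd_grid:
        for d_vth in dvth_grid:
            if v_dd - V_ON - d_vth <= 0:
                continue  # outside the delay model's valid range
            if delay(v_dd, d_vth) > d_max:
                continue  # violates the delay constraint
            e = energy(v_dd, d_vth)
            if best is None or e < best[0]:
                best = (e, v_dd, d_vth)
    return best


# Delay constraint = delay of the assumed nominal point.
d_nom = delay(1.2, 0.0)
vdd_grid = [i / 100 for i in range(60, 151)]
dvth_grid = [i / 100 for i in range(-20, 21)]
print(energy(1.2, 0.0), minimize_energy(d_nom, vdd_grid, dvth_grid))
```

A grid search is used here only because it is easy to verify by hand; a gradient-based or convex solver would be the natural choice once sizing of many gates is included.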

  • Kogge-Stone tree adder topology

    [Figure: Kogge-Stone tree adder with inputs (A0, B0) through (A15, B15) and sum outputs S0 through S15.]

    ♦ Off-path load (gates + wires)
    ♦ Reconvergence (inside ●-block)

  • Tree adder: Sizing optimization

    ♦ Reference: all paths are critical

    Nominal (Dmin, Enom)

    Sizing opt. (1.1 Dmin, 0.3 Enom)

    Internal energy peaks ⇒ big savings for a small delay penalty with resizing

  • Joint optimization: sizing and Vdd

    [Figure: energy/op versus delay, showing the nominal design at D_min^nom and the curves f(W, Vdd^nom), f(W, Vdd^new) and f(Vdd).]

    ∆E = Sens(Vdd)·(−∆D) + Sens(W)·∆D

  • Results of joint optimization

    [Figure: energy [Enom] versus delay [Dnom], showing the nominal design (Dnom, Enom), the energy-efficient curve f(W, Vdd, Vth), and the point (Dnom, Emin).]

    80% of energy saved without delay penalty

    Sensitivity table

    Sens    (Dnom, Enom)    (Dnom, Emin)
    W       ∞               1 (reference)
    Vdd     1.5             1 (reference)
    Vth     0.2             1 (reference)

  • Results of joint optimization

    [Figure: energy [Enom] versus delay [Dnom], showing the nominal design (Dnom, Enom), the energy-efficient curve f(W, Vdd, Vth), and the points (Dnom, Emin) and (Dmin, Enom).]

    80% of energy saved without delay penalty
    40% speedup for same energy

    Sensitivity table

    Sens    (Dnom, Enom)    (Dnom, Emin)     (Dmin, Enom)
    W       ∞               1 (reference)    22
    Vdd     1.5             1 (reference)    16
    Vth     0.2             1 (reference)    22

  • A look at tuning variables

    [Figure: two panels, Supply and Threshold, with the reliability limit marked and the point Sens(Vdd)=Sens(W)=Sens(Vth)=1 annotated.]

    Limited range of tuning variables

  • A look at tuning variables

    [Figure: two panels, Supply and Threshold, with the reliability limit marked and two points annotated: Sens(Vdd)=16, Sens(Vth)=Sens(W)=22; and Sens(Vdd)=Sens(W)=Sens(Vth)=1.]

    Limited range of tuning variables

  • Reducing the number of dimensions

    [Figure: energy [Enom] versus delay [Dnom].]

    Threshold and sizing nearly optimal around the nominal point

  • Scope of circuit optimization

    [Figure: energy [Enom] versus delay [Dnom].]

    Effective region +/-30% around nominal delay

  • Circuit & µArchitecture optimization

    ♦ Revisit the old argument for parallelism

    [Figure: three datapaths built from blocks A and B, with clock rates labeled f and 2f: (a) reference, (b) parallel, (c) pipeline.]

    ♦ What happens if we can choose optimal Vdd and Vth for each design?

  • Balance of leakage and switching energy

    $$\left.\frac{E_{Lk}}{E_{Sw}}\right|_{Opt} = \frac{2}{\ln\left(\frac{L_d}{\alpha_{avg}}\right) - K_{tech}}$$

    Optimal designs have high leakage current
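To get a feel for the magnitude of the optimal leakage-to-switching ratio, the sketch below evaluates 2 / (ln(L_d/α_avg) − K_tech). Several things here are assumptions rather than statements from the slide: the minus sign in front of K_tech, the reading of L_d as logic depth, α_avg as average activity and K_tech as a technology-dependent constant, and every numeric value.

```python
import math


def optimal_leakage_ratio(l_d, alpha_avg, k_tech):
    """Assumed form of the optimum: E_Lk / E_Sw = 2 / (ln(L_d / alpha_avg) - K_tech)."""
    denom = math.log(l_d / alpha_avg) - k_tech
    if denom <= 0:
        raise ValueError("expression is only meaningful for ln(L_d/alpha_avg) > K_tech")
    return 2.0 / denom


# Hypothetical inputs: the ratio depends only logarithmically on depth and activity.
print(optimal_leakage_ratio(l_d=20, alpha_avg=0.1, k_tech=1.3))
print(optimal_leakage_ratio(l_d=20, alpha_avg=0.01, k_tech=1.3))
```

Under these assumed values the optimal ratio stays a sizeable fraction of the switching energy, which is consistent with the slide's remark that optimal designs carry high leakage current.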

  • Conclusions

    ♦ All design levels need to be optimized jointly

    ♦ Equal marginal costs ⇒ Energy-efficient design

    ♦ Peak performance is VERY power inefficient

    ♦ Today's designs are not leaky enough to be truly power-optimal

    ♦ Pipelining starts to gain advantage over parallelism